Apache Airflow: A Birds-Eye View

Definition

Apache Airflow is a workflow scheduler.

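It defines what should run, when it should run, and in what order. It coordinates the work rather than being a processing engine itself: task code runs on workers and often delegates the heavy lifting to external systems such as databases, APIs, or compute clusters.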

Core Building Blocks

DAG (Directed Acyclic Graph)

A DAG is a workflow definition, written in Python.

  • A DAG contains multiple tasks arranged in a directed, acyclic order.

  • DAGs define workflows; they do not execute them directly.

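```python
from airflow import DAG
with DAG(...) as dag:
    # task_a, task_b and task_c are tasks defined inside this block (omitted here)
    task_a >> task_b >> task_c
```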
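The chained `>>` builds a directed graph of dependencies, and "acyclic" means that graph must have a valid run order. The toy model below illustrates both ideas. It is not Airflow's implementation. `Task` and `execution_order` are hypothetical names. Only the `>>` semantics mirror Airflow: `a >> b` makes `b` downstream of `a` and returns `b`, which is what makes chaining work.

```python
from graphlib import CycleError, TopologicalSorter


class Task:
    """Toy stand-in for an Airflow task (hypothetical): an id plus upstream ids."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()  # task_ids that must finish before this task

    def __rshift__(self, other):
        # a >> b records "b runs after a"; returning b allows a >> b >> c chains
        other.upstream.add(self.task_id)
        return other


def execution_order(tasks):
    """Return task_ids in dependency order; raise CycleError if a loop exists."""
    graph = {t.task_id: t.upstream for t in tasks}
    return list(TopologicalSorter(graph).static_order())


task_a, task_b, task_c = Task("task_a"), Task("task_b"), Task("task_c")
task_a >> task_b >> task_c
print(execution_order([task_a, task_b, task_c]))  # ['task_a', 'task_b', 'task_c']

# Adding task_c >> task_a closes a loop, so the graph is no longer acyclic.
task_c >> task_a
try:
    execution_order([task_a, task_b, task_c])
    cycle_found = False
except CycleError:
    cycle_found = True
print(cycle_found)  # True
```

Airflow also rejects cyclic DAGs when it parses the file. It does not wait until run time.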
  • DAG files are continuously parsed by the Scheduler

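  • Top-level code in a DAG file runs on every parse, so it must be fast and free of side effects (no database queries or API calls outside of tasks)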
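A DAG file is ordinary Python, so the scheduler executes every top-level statement each time it re-parses the file. The sketch below simulates repeated parsing with `exec` to show why side effects belong inside task callables. Everything in it is hypothetical and only illustrates the principle: `fake_api_call`, the file strings, and `parse` are not Airflow APIs.

```python
calls = []  # records every simulated external call


def fake_api_call(name):
    """Hypothetical stand-in for a slow or side-effecting call (DB query, HTTP)."""
    calls.append(name)
    return name


BAD_DAG_FILE = """
config = fake_api_call("top-level")   # runs on EVERY parse
"""

GOOD_DAG_FILE = """
def task_callable():
    return fake_api_call("inside-task")  # runs only when the task executes
"""


def parse(source, times):
    """Simulate the scheduler re-parsing a DAG file `times` times."""
    namespace = {}
    for _ in range(times):
        namespace = {"fake_api_call": fake_api_call}
        exec(source, namespace)
    return namespace


parse(BAD_DAG_FILE, times=3)
print(calls.count("top-level"))    # 3

ns = parse(GOOD_DAG_FILE, times=3)
print(calls.count("inside-task"))  # 0: defining the function calls nothing
ns["task_callable"]()              # executing the "task" makes the call once
print(calls.count("inside-task"))  # 1
```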

Task

A Task is a single unit of work.

Examples:

  • Run Python code

  • Execute SQL

  • Call an API

Tasks:

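  • Should be stateless: they share no in-memory state, so data passed between tasks goes through external storage or Airflow's XCom mechanism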

  • Can succeed or fail independently

  • Are executed by workers
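Here is a rough illustration of "stateless" and "succeed or fail independently." Each task below runs in isolation, and its outcome is recorded outside the task. A failure in one task does not stop another from succeeding. `run_task`, `extract`, and `call_api` are hypothetical helpers, not Airflow code.

```python
def run_task(task_id, fn, results):
    """Run one task, record its outcome, and never let an exception escape."""
    try:
        results[task_id] = ("success", fn())
    except Exception as exc:
        results[task_id] = ("failed", repr(exc))
    return results[task_id][0]


def extract():
    return [1, 2, 3]


def call_api():
    raise ConnectionError("API unreachable")


results = {}  # plays the role of recorded task state; tasks share no globals
print(run_task("extract", extract, results))    # success
print(run_task("call_api", call_api, results))  # failed
print(results["extract"])                       # ('success', [1, 2, 3])
```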

Operator

An Operator is a template for creating tasks.

Examples:

  • PythonOperator

  • PostgresOperator (deprecated in recent Postgres provider releases in favor of SQLExecuteQueryOperator)

  • BashOperator

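```python
from airflow.operators.python import PythonOperator  # Airflow 2.x import path

def method_name():
    print("running inside the task")  # any Python function can be the callable
PythonOperator(
    task_id="task_id",            # unique name of the task within its DAG
    python_callable=method_name,  # function the worker calls when the task runs
)
```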

Operator = how a task runs

Task = a specific instance of that operator
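The operator/task split is the familiar class/instance relationship. Real Airflow operators subclass `BaseOperator` and implement an `execute(context)` method. The toy classes below only mimic that shape. `ToyOperator` and `ToyPythonOperator` are hypothetical names, not Airflow classes.

```python
class ToyOperator:
    """Plays the role of an operator class: it defines HOW work is done."""

    def __init__(self, task_id):
        self.task_id = task_id

    def execute(self, context):
        raise NotImplementedError


class ToyPythonOperator(ToyOperator):
    """Toy analogue of PythonOperator: executing it calls a Python function."""

    def __init__(self, task_id, python_callable):
        super().__init__(task_id)
        self.python_callable = python_callable

    def execute(self, context):
        return self.python_callable()


# Two tasks: two configured instances of the same operator class.
greet = ToyPythonOperator(task_id="greet", python_callable=lambda: "hello")
count = ToyPythonOperator(task_id="count", python_callable=lambda: 42)

for task in (greet, count):
    print(task.task_id, type(task).__name__, task.execute(context={}))
# greet ToyPythonOperator hello
# count ToyPythonOperator 42
```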

Scheduler

The Scheduler continuously evaluates DAGs and task dependencies to decide when tasks should be queued for execution.

It:

  • Parses DAG files

  • Determines when DAG runs should occur

  • Queues tasks when dependencies are met
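"Dependencies are met" has a concrete meaning. Under Airflow's default `all_success` trigger rule, a task becomes eligible once every upstream task has succeeded. The sketch below applies only that rule. It ignores other trigger rules, pools, and concurrency limits. The DAG and `ready_tasks` are hypothetical.

```python
upstream = {  # task_id -> set of upstream task_ids (hypothetical DAG)
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}


def ready_tasks(upstream, states):
    """Return not-yet-started tasks whose upstream tasks have all succeeded."""
    return sorted(
        task
        for task, deps in upstream.items()
        if task not in states and all(states.get(d) == "success" for d in deps)
    )


print(ready_tasks(upstream, {}))                      # ['extract']
print(ready_tasks(upstream, {"extract": "success"}))  # ['transform', 'validate']
print(ready_tasks(upstream, {"extract": "success",
                             "transform": "success",
                             "validate": "failed"}))  # []
```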

Executor

The Executor determines how tasks are dispatched to workers.

It lives inside the scheduler and defines the execution strategy.

Common executors include:

  • SequentialExecutor

  • LocalExecutor

  • CeleryExecutor

  • KubernetesExecutor
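To see what "execution strategy" means, the sketch below runs the same tasks under two dispatch strategies. The names echo SequentialExecutor and LocalExecutor, but this is plain `concurrent.futures`, not Airflow code. Threads stand in for the separate processes LocalExecutor actually uses. Queue-based (Celery) and pod-based (Kubernetes) dispatch are not modeled.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def task(n):
    time.sleep(0.1)  # stand-in for real work such as a query or API call
    return n * n


def run_sequential(tasks):
    """One task at a time, in order: the idea behind SequentialExecutor."""
    return [t() for t in tasks]


def run_parallel(tasks, slots=4):
    """Up to `slots` tasks at once on one machine: the idea behind LocalExecutor."""
    with ThreadPoolExecutor(max_workers=slots) as pool:
        futures = [pool.submit(t) for t in tasks]
        return [f.result() for f in futures]


jobs = [lambda n=n: task(n) for n in range(4)]

for strategy in (run_sequential, run_parallel):
    start = time.perf_counter()
    results = strategy(jobs)
    elapsed = time.perf_counter() - start
    print(strategy.__name__, results, f"{elapsed:.2f}s")
```

Both strategies return `[0, 1, 4, 9]`, and only the dispatch differs. With four 0.1-second tasks and four slots, the parallel run should take roughly a quarter of the sequential time.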

Worker

A Worker executes tasks.

It:

  • Runs Python, SQL, or shell commands

  • Produces logs

  • Reports success or failure back to Airflow
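A worker's job reduces to three steps: run the task, keep its output as a log, and report a state. The toy worker below does exactly that for a command and derives the state from the exit code. `run_as_worker` is hypothetical, and it uses `sys.executable` so the example runs anywhere Python does. A real task might call bash or a SQL client instead.

```python
import subprocess
import sys


def run_as_worker(task_id, command):
    """Run a command, capture its output as the log, and report a state."""
    proc = subprocess.run(command, capture_output=True, text=True)
    state = "success" if proc.returncode == 0 else "failed"
    return {"task_id": task_id, "state": state, "log": proc.stdout + proc.stderr}


ok = run_as_worker("say_hello", [sys.executable, "-c", "print('hello from a task')"])
bad = run_as_worker("crash", [sys.executable, "-c", "raise SystemExit(1)"])

print(ok["state"], ok["log"].strip())  # success hello from a task
print(bad["state"])                    # failed
```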

Metadata Database

The Metadata Database stores Airflow’s state.

It contains:

  • DAG runs

  • Task states

  • Connections

  • Variables

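If this database is reset, Airflow loses its run history, along with any connections and variables stored there. Settings in airflow.cfg live outside the database and are unaffected.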
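The sketch below uses an in-memory SQLite database to show why a reset wipes history. The table names echo Airflow's `dag_run` and `task_instance` tables. The columns are a hypothetical simplification, not Airflow's real schema.

```python
import sqlite3


def create_db():
    """Create an empty toy metadata database (simplified, hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE dag_run (dag_id TEXT, run_id TEXT, state TEXT);
        CREATE TABLE task_instance (dag_id TEXT, run_id TEXT, task_id TEXT, state TEXT);
    """)
    return db


db = create_db()
db.execute("INSERT INTO dag_run VALUES ('etl', 'manual_1', 'success')")
db.executemany(
    "INSERT INTO task_instance VALUES ('etl', 'manual_1', ?, ?)",
    [("extract", "success"), ("load", "success")],
)
history = db.execute(
    "SELECT task_id, state FROM task_instance ORDER BY task_id"
).fetchall()
print(history)  # [('extract', 'success'), ('load', 'success')]

# "Resetting" means starting over with an empty database: the history is gone.
db = create_db()
print(db.execute("SELECT COUNT(*) FROM dag_run").fetchone()[0])  # 0
```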

Local Run

Airflow provides a simple way to run everything locally for development and learning.

The command is:

```bash
airflow standalone
```

This command:

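  1. Creates a metadata database (SQLite by default) and an admin user, whose password is printed to the console

  2. Starts:

    1. Scheduler

    2. Webserver (http://localhost:8080)

    3. Triggerer

    Tasks run locally through the scheduler's executor, so no separate worker service is started.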

  3. Scans $AIRFLOW_HOME/dags

  4. Parses DAG files

  5. Waits for triggers (schedule-based or manual)
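To know where the "Scans $AIRFLOW_HOME/dags" step looks, you can resolve the path yourself. The sketch below assumes the defaults: `AIRFLOW_HOME` falls back to `~/airflow`, and `dags_folder` is `$AIRFLOW_HOME/dags`. A custom `dags_folder` in airflow.cfg would override this. Both helpers are hypothetical. Airflow's real scan applies more rules, such as honoring `.airflowignore` files.

```python
import os
from pathlib import Path


def default_dags_folder():
    """Resolve $AIRFLOW_HOME/dags, assuming AIRFLOW_HOME defaults to ~/airflow."""
    airflow_home = Path(os.environ.get("AIRFLOW_HOME", Path.home() / "airflow"))
    return airflow_home / "dags"


def candidate_dag_files(folder):
    """List .py files under the folder (a simplified, hypothetical scan)."""
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.rglob("*.py"))


folder = default_dags_folder()
print(folder)
print(candidate_dag_files(folder))
```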

Birds-Eye View

At a high level, Airflow works as a coordination loop between the user, the scheduler, and task execution.

User ↔ Web UI ↔ Metadata DB ↔ Scheduler ↔ Executor ↔ Worker

How to read this flow

  • User

    Interacts with Airflow by writing DAGs, triggering runs, and monitoring execution.

  • Web UI

    Provides a visual interface and API for interacting with Airflow.

    Actions taken in the UI, such as triggering a run, pausing a DAG, or clearing a task, are recorded in the metadata database.

  • Metadata Database

    Acts as Airflow’s source of truth.

    It stores DAG state, task state, schedules, connections, and execution history.

  • Scheduler

    Continuously reads from the metadata database and DAG files to decide:

    • When a DAG should run

    • Which tasks are ready to execute

  • Executor

    Determines how ready tasks are dispatched (locally, via queues, or as containers).

  • Worker

    Executes the task code and reports the result back to the metadata database.

Airflow’s architecture separates definition (DAGs), decision-making (Scheduler + Executor), and execution (Workers) to enable scalable and fault-tolerant workflows.
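As a final illustration, the loop below ties the pieces together. A dict stands in for the metadata database, and each function plays one component of the flow. This is a toy model under heavy simplification and not how Airflow's components are implemented. All names are hypothetical, and every task is assumed to succeed.

```python
DAG_DEPS = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
metadata_db = {"dag_runs": [], "task_states": {}}


def ui_trigger(run_id):
    """User / Web UI: a manual trigger is just a record written to the database."""
    metadata_db["dag_runs"].append(run_id)


def scheduler_tick():
    """Scheduler: read state from the database and pick tasks that are ready."""
    if not metadata_db["dag_runs"]:
        return []
    states = metadata_db["task_states"]
    return [t for t, deps in DAG_DEPS.items()
            if t not in states and all(states.get(d) == "success" for d in deps)]


def executor_dispatch(tasks):
    """Executor: decide how ready tasks reach workers (here: one at a time)."""
    for task_id in tasks:
        worker_run(task_id)


def worker_run(task_id):
    """Worker: run the task (assumed to succeed) and report back to the database."""
    metadata_db["task_states"][task_id] = "success"


ui_trigger("manual_1")
while ready := scheduler_tick():
    executor_dispatch(ready)

print(metadata_db["task_states"])
# {'extract': 'success', 'transform': 'success', 'load': 'success'}
```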
