Apache Airflow: A Bird's-Eye View

Definition
Apache Airflow is a workflow scheduler.
It defines what should run, when it should run, and in what order — but it does not perform the work itself.
Core Building Blocks
DAG (Directed Acyclic Graph)
A DAG is a workflow definition, written in Python.
A DAG contains multiple tasks arranged in a directed, acyclic order.
DAGs define workflows; they do not execute them directly.
with DAG(...) as dag:
    task_a >> task_b >> task_c
DAG files are continuously parsed by the Scheduler
Parsing must be fast and free of side effects
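The "directed, acyclic" property can be illustrated without Airflow at all. The following is a minimal sketch using Python's standard-library graphlib module (Python 3.9+); it is not how Airflow stores DAGs internally. The task names mirror the snippet above, and the dictionary layout is an assumption made for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Map each task to the set of tasks it depends on (its upstream tasks).
# This mirrors task_a >> task_b >> task_c; the dict layout is illustrative only.
dependencies = {
    "task_a": set(),
    "task_b": {"task_a"},
    "task_c": {"task_b"},
}

# A valid execution order exists only because the graph has no cycles.
order = list(TopologicalSorter(dependencies).static_order())
print(order)

# Making task_a depend on task_c closes a loop, so no valid order exists.
cyclic = {**dependencies, "task_a": {"task_c"}}
try:
    list(TopologicalSorter(cyclic).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)
```

The second half shows why the "acyclic" requirement matters: once a cycle is present, there is no order in which every task's upstream tasks finish first.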
Task
A Task is a single unit of work.
Examples:
Run Python code
Execute SQL
Call an API
Tasks:
Are stateless (a task should not rely on in-memory state left behind by another task or an earlier run)
Can succeed or fail independently
Are executed by workers
Operator
An Operator is a template for creating tasks.
Examples:
PythonOperator
PostgresOperator
BashOperator
PythonOperator(
    task_id="task_id",
    python_callable=method_name,
)
Operator = how a task runs
Task = a specific instance of that operator
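The template-versus-instance distinction can be sketched in plain Python. ToyPythonOperator below is a hypothetical stand-in written for this illustration, not Airflow's PythonOperator; the extract and load functions are likewise invented.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyPythonOperator:
    """Hypothetical stand-in for an operator: a template that knows HOW to run."""

    task_id: str
    python_callable: Callable[[], str]

    def execute(self) -> str:
        # The operator defines the mechanism: call the wrapped function.
        return self.python_callable()


def extract() -> str:
    return "extracted"


def load() -> str:
    return "loaded"


# Two tasks: separate instances of the same operator template.
extract_task = ToyPythonOperator(task_id="extract", python_callable=extract)
load_task = ToyPythonOperator(task_id="load", python_callable=load)

results = {task.task_id: task.execute() for task in (extract_task, load_task)}
print(results)
```

Both tasks share one class (the "how") but differ in their task_id and callable (the "what").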
Scheduler
The Scheduler continuously evaluates DAGs and task dependencies to decide when tasks should be queued for execution.
It:
Parses DAG files
Determines when DAG runs should occur
Queues tasks when dependencies are met
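The "queue tasks when dependencies are met" rule can be sketched as a small pure function. This is a simplified model under stated assumptions, not Airflow's scheduling logic: the ready_tasks helper, the state strings, and the dictionary shapes are all hypothetical.

```python
def ready_tasks(upstream: dict[str, set[str]], states: dict[str, str]) -> list[str]:
    """Return tasks that have not run yet and whose upstream tasks all succeeded."""
    ready = []
    for task, parents in upstream.items():
        not_started = states.get(task) is None
        parents_done = all(states.get(parent) == "success" for parent in parents)
        if not_started and parents_done:
            ready.append(task)
    return sorted(ready)


# Same shape as task_a >> task_b >> task_c.
upstream = {"task_a": set(), "task_b": {"task_a"}, "task_c": {"task_b"}}

first = ready_tasks(upstream, {})                       # nothing has run yet
second = ready_tasks(upstream, {"task_a": "success"})   # task_a finished
blocked = ready_tasks(upstream, {"task_a": "failed"})   # task_a failed
print(first, second, blocked)
```

In this toy model a failed upstream task simply blocks everything downstream of it.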
Executor
The Executor determines how tasks are dispatched to workers.
It lives inside the scheduler and defines the execution strategy.
Common executors include:
SequentialExecutor
LocalExecutor
KubernetesExecutor
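The idea of an "execution strategy" can be shown with two interchangeable dispatch functions. This is only a loose analogy built on Python's standard library, not how Airflow's executors are implemented; the function names are hypothetical, and threads are used purely to keep the sketch short.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Task = Callable[[], str]


def run_sequentially(tasks: dict[str, Task]) -> dict[str, str]:
    """Strategy 1: one task at a time, in the calling process."""
    return {task_id: fn() for task_id, fn in tasks.items()}


def run_in_pool(tasks: dict[str, Task], max_workers: int = 2) -> dict[str, str]:
    """Strategy 2: several tasks at once, using a local pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {task_id: pool.submit(fn) for task_id, fn in tasks.items()}
        return {task_id: future.result() for task_id, future in futures.items()}


tasks = {"a": lambda: "A done", "b": lambda: "B done"}
sequential_results = run_sequentially(tasks)
pooled_results = run_in_pool(tasks)
print(sequential_results == pooled_results)
```

The tasks themselves are unchanged between the two strategies; only the way they are dispatched differs, which is the role the executor plays.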
Worker
A Worker executes tasks.
It:
Runs Python, SQL, or shell commands
Produces logs
Reports success or failure back to Airflow
Metadata Database
The Metadata Database stores Airflow’s state.
It contains:
DAG runs
Task states
Connections
Variables
If this database is reset, Airflow loses its run history along with any connections and variables stored in it.
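To make the role of stored state concrete, here is a small sketch using Python's built-in sqlite3 module. The table name, columns, and rows are hypothetical and far simpler than Airflow's actual schema; the point is only that history lives in the database and disappears when it is reset.

```python
import sqlite3

# In-memory database standing in for the metadata database (illustrative schema only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_state (dag_id TEXT, task_id TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO task_state VALUES (?, ?, ?)",
    [("example_dag", "task_a", "success"), ("example_dag", "task_b", "failed")],
)
conn.commit()

# Any component can reconstruct what happened by querying the stored state.
failed = conn.execute(
    "SELECT task_id FROM task_state WHERE state = 'failed'"
).fetchall()
print(failed)

# Resetting the database discards that history.
conn.execute("DELETE FROM task_state")
remaining = conn.execute("SELECT COUNT(*) FROM task_state").fetchone()[0]
print(remaining)
```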
Local Run
Airflow provides a simple way to run everything locally for development and learning.
The command is:
airflow standalone
This command:
Creates a metadata database (SQLite by default)
Starts:
Scheduler
Webserver (http://localhost:8080)
Worker
Scans $AIRFLOW_HOME/dags
Parses DAG files
Waits for triggers (schedule-based or manual)
Bird's-Eye View
At a high level, Airflow works as a coordination loop between the user, the scheduler, and task execution.
User ↔ Web UI ↔ Metadata DB ↔ Scheduler ↔ Executor ↔ Worker
How to read this flow
User
Interacts with Airflow by writing DAGs, triggering runs, and monitoring execution.
Web UI
Provides a visual interface and API for interacting with Airflow.
All actions in the UI are recorded in the metadata database.
Metadata Database
Acts as Airflow’s source of truth.
It stores DAG state, task state, schedules, connections, and execution history.
Scheduler
Continuously reads from the metadata database and DAG files to decide:
When a DAG should run
Which tasks are ready to execute
Executor
Determines how ready tasks are dispatched (locally, via queues, or as containers).
Worker
Executes the task code and reports the result back to the metadata database.
Airflow’s architecture separates definition (DAGs), decision-making (Scheduler + Executor), and execution (Workers) to enable scalable and fault-tolerant workflows.
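The separation of definition, decision-making, and execution can be simulated in a few lines of plain Python. This is a toy model under stated assumptions, not Airflow code: the task names, the state dictionary standing in for the metadata database, and the worker and scheduler_pass helpers are all hypothetical.

```python
from typing import Callable


def fail() -> None:
    raise RuntimeError("simulated task failure")


# Definition: task callables plus their upstream dependencies.
tasks: dict[str, Callable[[], None]] = {
    "extract": lambda: None,
    "transform": fail,
    "load": lambda: None,
    "notify": lambda: None,
}
upstream = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"extract"},
}

state: dict[str, str] = {}  # stands in for the metadata database


def worker(task_id: str) -> str:
    """Execution: run the task code and report success or failure."""
    try:
        tasks[task_id]()
        return "success"
    except Exception:
        return "failed"


def scheduler_pass() -> list[str]:
    """Decision-making: pick tasks not yet run whose upstream tasks all succeeded."""
    return [
        task_id
        for task_id, parents in upstream.items()
        if task_id not in state
        and all(state.get(parent) == "success" for parent in parents)
    ]


# Coordination loop: schedule, dispatch, record, repeat until nothing is ready.
while ready := scheduler_pass():
    for task_id in ready:  # the "executor" here dispatches one task at a time
        state[task_id] = worker(task_id)

print(state)
```

In this run, transform fails independently of notify, and load is never scheduled because its upstream task did not succeed. Swapping the inner for loop for a parallel dispatch would change the execution strategy without touching the definitions or the scheduling rule.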



