Intermediate

Airflow Workflow Basics

Learn how DAGs, tasks, and schedules coordinate real pipelines.

What Is Airflow?

Apache Airflow is an open-source platform for authoring, scheduling, and monitoring workflows written as Python code. Data engineers use it to orchestrate ETL/ELT jobs. It runs tasks in the right order and on the right schedule, and it surfaces failures in its web UI.

DAGs and Tasks

Airflow workflows are defined as DAGs (directed acyclic graphs). Each node is a task, and each edge is a dependency between two tasks. Because the graph is acyclic, at least one valid execution order always exists, and no task can end up depending on itself through a loop.
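Airflow works out the ordering for you, but a short plain-Python sketch shows why acyclicity matters. It uses the standard library's `graphlib` module, not Airflow, and is not how Airflow's scheduler is implemented. The dependency map and the `run_order` helper are hypothetical, with task names borrowed from the extract/transform/load pipeline in this tutorial:

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map: each task lists the tasks that must finish before it.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def run_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order, or raise CycleError if the graph loops."""
    return list(TopologicalSorter(graph).static_order())


print(run_order(pipeline))  # ['extract', 'transform', 'load']

# Making extract depend on load closes a loop, so no valid order exists.
broken = {**pipeline, "extract": {"load"}}
try:
    run_order(broken)
except CycleError as err:
    print("cycle detected:", err.args[1])
```

A graph with a cycle has no valid order. That is why Airflow rejects a DAG file whose dependencies form a loop instead of trying to run it.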

A Simple DAG Example

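```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


# Placeholder callables: replace each body with real pipeline logic.
def extract(): ...
def transform(): ...
def load(): ...


# `schedule` replaces `schedule_interval`, which was deprecated in Airflow 2.4 and removed in Airflow 3.
# catchup=False stops Airflow from creating a run for every past interval since start_date.
with DAG(
    dag_id="daily_revenue_pipeline",
    start_date=datetime(2026, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)

    # extract runs first, then transform, then load
    t1 >> t2 >> t3
```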

The >> operator sets dependencies. Here, transform runs only after extract succeeds, and load runs only after transform succeeds.
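Airflow provides >> through Python operator overloading (`__rshift__`, plus `__rrshift__` for a list on the left). The `Task` class and `edges` helper below are hypothetical, simplified stand-ins written only to illustrate the mechanism. They are not Airflow's code. The sketch also covers the list form (`a >> [b, c]`), which Airflow accepts for fan-out and fan-in:

```python
from __future__ import annotations


class Task:
    """Hypothetical stand-in for an Airflow operator, used only to show how >> can work."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.downstream: list[Task] = []

    def __rshift__(self, other: Task | list[Task]) -> Task | list[Task]:
        # Record each right-hand task as downstream of this one.
        targets = other if isinstance(other, list) else [other]
        for t in targets:
            self.downstream.append(t)
        # Return the right operand so a >> b >> c evaluates as (a >> b) >> c.
        return other

    def __rrshift__(self, other: list[Task]) -> Task:
        # Handles [a, b] >> self: lists have no >>, so Python calls this method instead.
        for t in other:
            t >> self
        return self


def edges(*tasks: Task) -> list[tuple[str, str]]:
    """List (upstream, downstream) pairs in the order they were recorded."""
    return [(t.task_id, d.task_id) for t in tasks for d in t.downstream]


extract, transform, load = Task("extract"), Task("transform"), Task("load")
extract >> transform >> load
print(edges(extract, transform, load))
# [('extract', 'transform'), ('transform', 'load')]
```

Returning the right-hand operand is the design choice that makes chaining work. Each >> hands the next expression the task it should attach to.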