Building a Simple Data Pipeline

9 min read By DataQron Team Updated January 2026

Combine extraction, transformation, and loading into one working example.

Overview

A simple data pipeline connects the three ETL stages (extract, transform, load) into one repeatable script. This example reads raw order data from a CSV file, removes incomplete and duplicate rows, and writes a per-customer summary to an output CSV file.

Step 1: Extract

PYTHON

import pandas as pd

def extract(path):
    # Read the raw CSV into a DataFrame; pandas infers the column types.
    return pd.read_csv(path)

Step 2: Transform

PYTHON

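def transform(df):
    # Drop rows that have no customer to attribute the order to.
    df = df.dropna(subset=['customer_id'])
    # Keep one row per order so repeated rows are not double-counted.
    df = df.drop_duplicates(subset=['order_id'])
    # Aggregate to one row per customer: order count and revenue total.
    summary = df.groupby('customer_id').agg(
        total_orders=('order_id', 'count'),
        total_revenue=('order_amount', 'sum')
    ).reset_index()
    return summary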
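
To make the effect of each cleaning step concrete, here is a small worked example. The sample rows are hypothetical and chosen only to exercise both filters: order 2 appears twice, and order 3 has no customer. The transform function is repeated so the snippet runs on its own.

```python
import pandas as pd

def transform(df):
    # Same logic as the transform step in this tutorial.
    df = df.dropna(subset=['customer_id'])
    df = df.drop_duplicates(subset=['order_id'])
    summary = df.groupby('customer_id').agg(
        total_orders=('order_id', 'count'),
        total_revenue=('order_amount', 'sum')
    ).reset_index()
    return summary

# Hypothetical sample data: one duplicated order and one order with no customer.
raw = pd.DataFrame({
    'order_id': [1, 2, 2, 3, 4],
    'customer_id': ['A', 'A', 'A', None, 'B'],
    'order_amount': [10.0, 20.0, 20.0, 5.0, 7.5],
})

summary = transform(raw)
print(summary)
```

With this input, customer A ends up with 2 orders totaling 30.0 and customer B with 1 order totaling 7.5. The duplicate copy of order 2 and the customerless order 3 do not contribute to either total.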

Step 3: Load

PYTHON

def load(df, destination_path):
    # index=False keeps the DataFrame's row index out of the output file.
    df.to_csv(destination_path, index=False)
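
A quick way to see why the load step passes index=False is to write a summary to a temporary directory and read it back. The sketch below is only an illustration: the sample summary and the file names are made up, and the second file is written with the pandas default (index included) for comparison.

```python
import os
import tempfile

import pandas as pd

def load(df, destination_path):
    # Same logic as the load step in this tutorial.
    df.to_csv(destination_path, index=False)

# Hypothetical summary in the shape the transform step produces.
summary = pd.DataFrame({
    'customer_id': ['A', 'B'],
    'total_orders': [2, 1],
    'total_revenue': [30.0, 7.5],
})

with tempfile.TemporaryDirectory() as tmp:
    clean_path = os.path.join(tmp, 'customer_revenue.csv')
    load(summary, clean_path)
    round_trip = pd.read_csv(clean_path)

    # For comparison: the default to_csv call also writes the row index.
    indexed_path = os.path.join(tmp, 'with_index.csv')
    summary.to_csv(indexed_path)
    with_index = pd.read_csv(indexed_path)

print(list(round_trip.columns))
print(list(with_index.columns))
```

The file written by load reads back with exactly the three summary columns. The file written with the default settings picks up an extra leading column, which pandas names "Unnamed: 0" when it is read back.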

Putting It Together

PYTHON

def run_pipeline():
    raw = extract('orders_raw.csv')
    summary = transform(raw)
    load(summary, 'customer_revenue.csv')
    print('Pipeline complete:', len(summary), 'customers processed')

if __name__ == '__main__':
    run_pipeline()

In production, the same pattern is usually scheduled with an orchestrator such as Airflow, and it reads from and writes to real databases instead of local files.
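
As a rough sketch of the database variant, the example below swaps the CSV files for an in-memory SQLite database using Python's built-in sqlite3 module. Everything specific here is an assumption made for illustration: the table names orders and customer_revenue, the helper names extract_from_db and load_to_db, and the sample rows. A production system would more likely connect to a server database, and scheduling with an orchestrator is not shown.

```python
import sqlite3

import pandas as pd

def extract_from_db(conn):
    # Hypothetical source table with the same columns as the CSV example.
    query = 'SELECT order_id, customer_id, order_amount FROM orders'
    return pd.read_sql_query(query, conn)

def transform(df):
    # Same logic as the transform step in this tutorial.
    df = df.dropna(subset=['customer_id'])
    df = df.drop_duplicates(subset=['order_id'])
    summary = df.groupby('customer_id').agg(
        total_orders=('order_id', 'count'),
        total_revenue=('order_amount', 'sum')
    ).reset_index()
    return summary

def load_to_db(df, conn):
    # Replace the summary table on every run so reruns do not append duplicates.
    df.to_sql('customer_revenue', conn, if_exists='replace', index=False)

# Set up an in-memory database with made-up sample rows.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE orders (order_id INTEGER, customer_id TEXT, order_amount REAL)'
)
conn.executemany(
    'INSERT INTO orders VALUES (?, ?, ?)',
    [(1, 'A', 10.0), (2, 'A', 20.0), (2, 'A', 20.0), (3, None, 5.0), (4, 'B', 7.5)],
)
conn.commit()

# Run the three stages against the database instead of local files.
load_to_db(transform(extract_from_db(conn)), conn)

rows = conn.execute(
    'SELECT customer_id, total_orders, total_revenue '
    'FROM customer_revenue ORDER BY customer_id'
).fetchall()
print(rows)
conn.close()
```

Only the extract and load functions change; the transform step is untouched. Keeping the three stages as separate functions is what makes this kind of swap cheap.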
