Glossary
Plain-language definitions for common data engineering, database, and analytics terms.
Airflow
An open-source workflow orchestration tool used to schedule and monitor data pipelines.
ACID
A set of properties (Atomicity, Consistency, Isolation, Durability) guaranteeing reliable database transactions.
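As a rough illustration of atomicity, the sketch below uses Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the balances are all hypothetical; the point is only that when the second statement fails, the first one is rolled back with it.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This debit fails the CHECK constraint when funds are insufficient.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # debit fails, credit is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the failed transfer, `balances` reflects only the first, successful one: the credit to `bob` from the second attempt does not survive.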
Batch Processing
Processing data in scheduled chunks rather than continuously as it arrives.
Big Data
Datasets whose volume, velocity, or variety exceeds the capacity of traditional data processing tools.
Data Lake
A centralized repository that stores raw data at scale in its native format, whether structured, semi-structured, or unstructured.
Data Warehouse
A storage system optimized for analytics, designed for large-scale reporting and queries over structured data.
Data Pipeline
An automated sequence of steps that moves and transforms data from source to destination.
Dimension Table
A table storing descriptive attributes, such as customer or product details, in a star schema.
ETL
Extract, Transform, Load — a pattern where data is transformed before loading into its destination.
ELT
Extract, Load, Transform — a pattern where raw data is loaded first, then transformed in place.
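The difference between ETL and ELT comes down to where the transformation runs. The sketch below is one way to picture it, using an in-memory SQLite database as a stand-in for the destination; the table names and the sample rows are made up for illustration.

```python
import sqlite3

raw_rows = [("  Alice ", "10.50"), ("BOB", "3")]  # made-up source records

conn = sqlite3.connect(":memory:")

# ETL: transform in application code, then load the cleaned rows.
conn.execute("CREATE TABLE etl_orders (customer TEXT, amount REAL)")
cleaned = [(name.strip().lower(), float(amount)) for name, amount in raw_rows]
conn.executemany("INSERT INTO etl_orders VALUES (?, ?)", cleaned)

# ELT: load the raw rows untouched, then transform inside the database.
conn.execute("CREATE TABLE raw_orders (customer TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", raw_rows)
conn.execute(
    "CREATE TABLE elt_orders AS "
    "SELECT lower(trim(customer)) AS customer, CAST(amount AS REAL) AS amount "
    "FROM raw_orders"
)

etl_result = conn.execute(
    "SELECT customer, amount FROM etl_orders ORDER BY customer"
).fetchall()
elt_result = conn.execute(
    "SELECT customer, amount FROM elt_orders ORDER BY customer"
).fetchall()
```

Both paths end with the same cleaned rows. In the ELT path the untouched `raw_orders` table also remains available for later re-transformation.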
Fact Table
A table storing measurable, quantitative data such as transactions or events in a star schema.
Idempotency
A property where running an operation multiple times produces the same result as running it once, with no duplicated or otherwise unwanted side effects.
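A minimal sketch of the idea, assuming a pipeline step that writes one day's rows into a target store (modeled here as a plain dictionary). The function names and the partition key are hypothetical. Overwriting a partition is idempotent; blindly appending is not.

```python
def load_partition(target, partition_key, rows):
    """Idempotent: overwrite the partition, so re-runs leave the same state."""
    target[partition_key] = list(rows)
    return target

def append_rows(target, partition_key, rows):
    """Not idempotent: every re-run appends the same rows again."""
    target.setdefault(partition_key, []).extend(rows)
    return target

rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": 5}]
idempotent_store, appending_store = {}, {}

# Simulate a job that is retried three times with the same input.
for _ in range(3):
    load_partition(idempotent_store, "2024-01-01", rows)
    append_rows(appending_store, "2024-01-01", rows)
```

After three runs the overwriting store still holds two rows, while the appending store holds six.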
Indexing
The practice of creating indexes: auxiliary database structures that speed up data retrieval at the cost of extra storage and slower writes.
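One way to see the effect is to ask the database how it plans to run a query before and after an index exists. The sketch below uses SQLite's `EXPLAIN QUERY PLAN`; the `events` table, the `idx_events_user` index, and the `plan` helper are illustrative names, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)"
)
conn.executemany(
    "INSERT INTO events (user_id, payload) VALUES (?, ?)",
    [(i % 100, "x") for i in range(1000)],
)

query = "SELECT * FROM events WHERE user_id = 42"

def plan(sql):
    """Return SQLite's query-plan description lines for `sql`."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_before = plan(query)  # without an index: a scan of the whole table
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
plan_after = plan(query)   # with the index: a search through idx_events_user
```

The trade-off in the definition shows up on the write side: every later insert or update on `events` must also maintain `idx_events_user`.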
Normalization
The process of organizing database tables to reduce redundancy and improve data integrity.
NoSQL
A category of non-relational databases optimized for flexibility and horizontal scale.
Orchestration
The coordination of pipeline tasks, including scheduling, dependencies, and failure handling.
Schema
The structural definition of how data is organized within a database or table.
Star Schema
A data warehouse modeling pattern with a central fact table linked to dimension tables.
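To make the pattern concrete, here is a small sketch with one fact table and one dimension table in an in-memory SQLite database. The table names (`fact_sales`, `dim_product`) and the data are invented for illustration; real star schemas usually have several dimensions around the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: descriptive attributes
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        category   TEXT
    );
    -- Fact table: measurable events, keyed to the dimension
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO fact_sales VALUES (1, 1, 12.0), (2, 1, 8.0), (3, 2, 30.0);
""")

# Typical star-schema query: aggregate facts, grouped by a dimension attribute.
revenue_by_category = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
```

Here `revenue_by_category` holds one row per category, with the summed `amount` from the fact table.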
Streaming
Processing data continuously as events occur, rather than in scheduled batches.
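The contrast with batch processing can be sketched in a few lines of plain Python. The event shape and function names below are assumptions made for the example: the batch function needs the full set of events before it produces anything, while the streaming function emits an updated result after each event.

```python
def batch_total(events):
    """Batch: wait for the whole chunk, then compute one result."""
    return sum(event["amount"] for event in events)

def streaming_totals(event_source):
    """Streaming: update state per event and emit a result each time."""
    total = 0
    for event in event_source:
        total += event["amount"]
        yield total

events = [{"amount": 10}, {"amount": 5}, {"amount": 20}]

batch_result = batch_total(events)                     # one answer at the end
stream_results = list(streaming_totals(iter(events)))  # one answer per event
```

Both approaches arrive at the same final total; the streaming version simply makes intermediate results available as events arrive.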
Window Function
A SQL function that performs a calculation across a set of rows related to the current row, without collapsing those rows into a single result the way GROUP BY does.
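A running total is a common example. The sketch below runs one through Python's `sqlite3` module and assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added; the `sales` table and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2024-01-01", 10), ("2024-01-02", 5), ("2024-01-03", 20)],
)

# SUM(...) OVER (...) is evaluated per row, over that row and all earlier days.
running = conn.execute("""
    SELECT day,
           amount,
           SUM(amount) OVER (ORDER BY day) AS running_total
    FROM sales
    ORDER BY day
""").fetchall()
```

Each input row is preserved in `running` and gains a `running_total` column.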