A

Airflow

An open-source workflow orchestration tool used to schedule and monitor data pipelines.

ACID

A set of properties (Atomicity, Consistency, Isolation, Durability) guaranteeing reliable database transactions.

B

Batch Processing

Processing data in scheduled chunks rather than continuously as it arrives.

Big Data

Datasets characterized by high volume, velocity, and variety that exceed what traditional data processing systems can handle.

D

Data Lake

A centralized repository that stores raw data in its native format, whether structured, semi-structured, or unstructured, at scale.

Data Warehouse

An analytics-optimized storage system designed for large-scale reporting and queries.

Data Pipeline

An automated sequence of steps that moves and transforms data from source to destination.

Dimension Table

A table storing descriptive attributes, such as customer or product details, in a star schema.

E

ETL

Extract, Transform, Load: a pattern where data is transformed before it is loaded into its destination.

ELT

Extract, Load, Transform: a pattern where raw data is loaded first, then transformed inside the destination system.
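
The difference between the two patterns is mostly about where the transformation runs. A minimal sketch, using Python's built-in sqlite3 module as a stand-in destination; the sample records and the table names (orders_etl, orders_raw, orders_elt) are hypothetical and chosen only for illustration:

```python
import sqlite3

# Hypothetical extracted records: amounts arrive as untidy strings.
RAW_ORDERS = [("a1", " 19.5 "), ("a2", "5.0"), ("a3", " 12.25")]


def run_etl(conn):
    """ETL: clean the values in Python, then load only the cleaned rows."""
    conn.execute("CREATE TABLE orders_etl (order_id TEXT, amount REAL)")
    cleaned = [(order_id, float(amount.strip())) for order_id, amount in RAW_ORDERS]
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?)", cleaned)


def run_elt(conn):
    """ELT: load the raw strings as-is, then transform inside the database."""
    conn.execute("CREATE TABLE orders_raw (order_id TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?)", RAW_ORDERS)
    conn.execute(
        "CREATE TABLE orders_elt AS "
        "SELECT order_id, CAST(TRIM(amount) AS REAL) AS amount FROM orders_raw"
    )


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)

etl_rows = conn.execute(
    "SELECT order_id, amount FROM orders_etl ORDER BY order_id"
).fetchall()
elt_rows = conn.execute(
    "SELECT order_id, amount FROM orders_elt ORDER BY order_id"
).fetchall()
```

Both paths end with the same cleaned rows. In the ELT path the raw table is still available afterwards, which is one common reason to prefer it when the destination can do the transformation itself.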

F

Fact Table

A table storing measurable, quantitative data such as transactions or events in a star schema.

I

Idempotency

A property where running an operation more than once leaves the system in the same state as running it once, with no unwanted side effects such as duplicate rows.
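
As an illustration, the sketch below contrasts a load that appends rows with one that writes rows by key. The function names and the sample batch are hypothetical, and plain Python containers stand in for a real target table:

```python
def append_load(target, rows):
    """Not idempotent: every run appends the whole batch again."""
    target.extend(rows)


def keyed_load(target, rows):
    """Idempotent: rows are written by key, so a re-run overwrites instead of duplicating."""
    for row in rows:
        target[row["id"]] = row


batch = [{"id": 1, "total": 10}, {"id": 2, "total": 20}]

# Running the append-style load twice duplicates the batch.
appended = []
append_load(appended, batch)
append_load(appended, batch)

# Running the keyed load twice leaves the target unchanged after the first run.
keyed = {}
keyed_load(keyed, batch)
after_first_run = dict(keyed)
keyed_load(keyed, batch)
```

In a database, the keyed version typically corresponds to an upsert or a delete-then-insert on a well-defined key or partition.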

Indexing

The use of auxiliary database structures (indexes) that speed up data retrieval at the cost of extra storage and slower writes.
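
One way to see the effect is to compare a query plan before and after creating an index. A minimal sketch with Python's built-in sqlite3 module; the events table and the idx_events_user index are hypothetical, and the exact wording of the plan text varies between SQLite versions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)"
)
conn.executemany(
    "INSERT INTO events (user_id, payload) VALUES (?, ?)",
    [(i % 100, "x") for i in range(1000)],
)

QUERY = "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = ?"


def plan(conn):
    """Return the human-readable plan text (the last column of each plan row)."""
    return " ".join(row[-1] for row in conn.execute(QUERY, (42,)))


plan_before = plan(conn)  # without an index, SQLite scans the whole table
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
plan_after = plan(conn)   # with the index, the plan refers to idx_events_user
```

Printing plan_before and plan_after shows the lookup switching from a full scan to an index search. The trade-off is that every insert or update on events now has to maintain the index as well.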

N

Normalization

The process of organizing database tables to reduce redundancy and improve data integrity.

NoSQL

A category of non-relational databases optimized for flexibility and horizontal scale.

O

Orchestration

The coordination of pipeline tasks, including scheduling, dependencies, and failure handling.
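
To make the dependency and failure-handling parts concrete, here is a small sketch built on the standard library's graphlib module (Python 3.9+). It is not how any particular orchestrator is implemented; the task names, the run_pipeline helper, and the max_attempts parameter are assumptions made for illustration, and scheduling is left out entirely:

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def run_pipeline(tasks, dependencies, max_attempts=2):
    """Run tasks in dependency order, retrying a failed task up to max_attempts times."""
    completed = []
    for name in TopologicalSorter(dependencies).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception:
                # Give up and surface the error once the attempts are used up.
                if attempt == max_attempts:
                    raise
    return completed


log = []
tasks = {
    "extract": lambda: log.append("extracted"),
    "transform": lambda: log.append("transformed"),
    "load": lambda: log.append("loaded"),
}
completed = run_pipeline(tasks, dependencies)
```

Because a task only runs after everything it depends on has finished, a failure in transform stops load from running at all.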

S

Schema

The structural definition of how data is organized within a database or table.

Star Schema

A data warehouse modeling pattern with a central fact table linked to dimension tables.
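
A minimal sketch of the pattern, using Python's built-in sqlite3 module. The fact_sales and dim_product tables and their columns are hypothetical; a real warehouse would usually have several dimensions around the fact table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension table: descriptive attributes of each product.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        category   TEXT
    );

    -- Fact table: one row per sale, with measures and a key into the dimension.
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product (product_id),
        quantity   INTEGER,
        amount     REAL
    );

    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO fact_sales VALUES (1, 1, 2, 30.0), (2, 1, 1, 15.0), (3, 2, 1, 60.0);
    """
)

# A typical star-schema query: aggregate a measure, grouped by a dimension attribute.
revenue_by_category = conn.execute(
    """
    SELECT d.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
    """
).fetchall()
# revenue_by_category == [('books', 45.0), ('games', 60.0)]
```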

Streaming

Processing data continuously as events occur, rather than in scheduled batches.
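
The contrast with batch processing can be sketched in a few lines. The function names below are hypothetical, and a Python generator stands in for a real event stream:

```python
def batch_total(events):
    """Batch: wait until the whole chunk is available, then process it at once."""
    return sum(events)


def streaming_totals(events):
    """Streaming: update the result as each event arrives and emit it immediately."""
    total = 0
    for event in events:
        total += event
        yield total


events = [3, 4, 5]

total_from_batch = batch_total(events)                # one result at the end: 12
totals_from_stream = list(streaming_totals(iter(events)))  # a result per event: [3, 7, 12]
```

Both approaches reach the same final total; the streaming version simply has an up-to-date answer after every event instead of only at the end of the batch.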

W

Window Function

A SQL function that performs a calculation across a set of rows related to the current row without collapsing those rows into a single output row.
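
For example, a running total can be computed with SUM(...) OVER (...). The sketch below runs the query through Python's built-in sqlite3 module and assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added; the sales table and its values are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2024-01-01", 10), ("2024-01-02", 20), ("2024-01-03", 5)],
)

# Every input row is kept; the window adds a running total alongside it.
rows = conn.execute(
    """
    SELECT day,
           amount,
           SUM(amount) OVER (ORDER BY day) AS running_total
    FROM sales
    ORDER BY day
    """
).fetchall()
# rows == [('2024-01-01', 10, 10), ('2024-01-02', 20, 30), ('2024-01-03', 5, 35)]
```

A plain GROUP BY with SUM would return a single row here, whereas the window version returns one row per day.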