Databases
Databases are organized systems for storing, retrieving, and managing data — the foundation that almost every application and pipeline relies on.
Databases generally fall into two broad categories: relational databases, which store data in structured tables with defined relationships, and non-relational (NoSQL) databases, which store data in flexible formats like documents, key-value pairs, or graphs.
Relational databases are queried with SQL and use transactions to keep data consistent, making them a natural fit for structured, transactional data such as orders or account balances. Non-relational databases relax the fixed schema, and in many systems some consistency guarantees, in exchange for flexibility and horizontal scalability, which suits high-volume data or data whose shape changes often.
Understanding database fundamentals — indexing, normalization, and transactions — helps data engineers design schemas that are both fast to query and safe to update.
Relational vs Non-Relational
Two broad families of databases, each suited to different workloads.
Relational (SQL)
Structured tables with defined schemas and relationships, queried using SQL.
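As a rough illustration of the relational model, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The customers and orders tables, their columns, and the sample rows are assumptions made up for this example, not part of any particular system.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total       REAL NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders (id, customer_id, total) VALUES
        (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);
""")

# Join the two tables through the declared relationship and aggregate per customer.
totals = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
""").fetchall()

print(totals)  # [('Ada', 40.0), ('Grace', 40.0)]
```

Each customer's name is stored once and orders refer to it by id, so the join rebuilds the combined view at query time.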
Non-Relational (NoSQL)
Flexible document, key-value, or graph stores optimized for scale and flexibility.
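To make the document and key-value idea concrete, here is a minimal sketch in which a plain Python dict stands in for the store. No real NoSQL engine is involved, and the keys and fields are hypothetical; the point is only that records under the same collection need not share a schema.

```python
import json

# A plain dict stands in for a key-value/document store (assumption:
# this is a toy model, not a real NoSQL engine).
store = {}

# Documents in the same collection do not have to share the same fields.
store["user:1"] = {"name": "Ada", "email": "ada@example.com"}
store["user:2"] = {"name": "Grace", "tags": ["admin"], "address": {"city": "NYC"}}

# Lookup is by key; nested data travels with the document, so no join is needed.
doc = store["user:2"]
city = doc["address"]["city"]

# Documents are commonly serialized as JSON or a binary variant of it.
serialized = json.dumps(store["user:1"], sort_keys=True)

print(city)
print(serialized)
```

The trade-off is that nothing here stops two documents from disagreeing about field names or types; that checking moves into application code.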
Indexing
Data structures, most commonly B-trees, that speed up lookups at the cost of extra storage and slower writes, since every insert, update, or delete must also maintain the index.
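One way to see an index at work is to compare the query plan before and after creating it. The sketch below uses sqlite3 and SQLite's EXPLAIN QUERY PLAN statement; the events table and the idx_events_user_id index name are assumptions for illustration, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT)"
)
conn.executemany(
    "INSERT INTO events (user_id, kind) VALUES (?, ?)",
    [(i % 100, "click") for i in range(1000)],
)

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)

query = "SELECT id FROM events WHERE user_id = 42"

before = plan(query)   # without an index: a full scan of the table
conn.execute("CREATE INDEX idx_events_user_id ON events (user_id)")
after = plan(query)    # with the index: a search that names the index

print(before)
print(after)
```

The plan changes from scanning every row to searching the index, which is the lookup speedup; the cost is that each later write to events also has to update idx_events_user_id.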
Core Database Concepts
Concepts every data engineer should be comfortable with.
- Normalization — organizing tables to reduce data duplication
- Primary and foreign keys — defining relationships between tables
- Transactions — grouping operations so they succeed or fail together
- Indexes — speeding up common query patterns
- ACID properties — atomicity, consistency, isolation, durability
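Several of these concepts can be seen together in a short sqlite3 sketch: a foreign key that rejects orphan rows, and a transaction whose partial work is rolled back when one statement fails (atomicity). The accounts and transfers tables and the transfer helper are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys unenforced by default
conn.executescript("""
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    );
    CREATE TABLE transfers (
        id         INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL REFERENCES accounts(id),
        amount     INTEGER NOT NULL
    );
    INSERT INTO accounts (id, balance) VALUES (1, 100), (2, 50);
""")

def transfer(src, dst, amount):
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )

transfer(1, 2, 30)        # succeeds: balances become 70 and 80
try:
    transfer(1, 2, 500)   # the debit violates the CHECK constraint
except sqlite3.IntegrityError:
    pass                  # the credit made earlier in the same block is undone too

# A row that points at a missing account is rejected by the foreign key.
try:
    with conn:
        conn.execute(
            "INSERT INTO transfers (account_id, amount) VALUES (?, ?)", (999, 10)
        )
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

balances = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
print(balances)      # [(1, 70), (2, 80)]
print(fk_rejected)   # True
```

The failed transfer leaves both balances exactly as they were after the first one, which is what "succeed or fail together" means in practice.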
Related Guides
Continue building context around this topic.
Data Warehousing
See how analytical warehouses build on database concepts to support reporting.
SQL Guides
Practice writing queries against relational database structures.
Big Data
Learn how databases scale to handle very large datasets.