How a Database Thinks

A database is not a spreadsheet with superpowers. It is a system built around one central idea: data has structure, and that structure should be enforced by the system itself, not by convention or careful human attention.

Understanding that distinction is worth spending a moment on before you write a single query. Once it clicks, the rest of SQL becomes intuitive.

The problem with spreadsheets

Consider a bookstore tracking customers and orders in a spreadsheet. One natural approach is to put everything in one sheet: customer name, customer email, order date, order total, all in a single row.

This feels convenient right up until a customer changes their email address. Now you have to find and update every row for that customer. Miss one, and the data is wrong. It gets worse when two employees are editing the spreadsheet at the same time, or when you want to ask questions like "how many unique customers ordered last month?" The structure of the data fights you at every step.

A relational database solves this by separating concerns. Customers live in a customers table. Orders live in an orders table. Each order stores the customer's ID, not the customer's name or email. To look up a customer's email from an order, you follow the ID to the customers table. When the customer changes their email, you update one row in one place, and every order automatically reflects the change.

This practice of organizing data to eliminate redundancy is called normalization. You do not need to memorize the formal rules right now. The instinct is enough: if the same fact appears in more than one place, that is a problem waiting to happen.

Tables, rows, and columns

Everything in a relational database lives in tables. A table is a collection of things of the same kind. The customers table has one row per customer. The orders table has one row per order. The columns describe attributes of each thing.

-- The customers table
id   | name          | email
-----|---------------|------------------
1    | Alice Chen    | alice@example.com
2    | Bob Okafor    | bob@example.com

-- The orders table
id  | customer_id | total  | order_date
----|-------------|--------|------------
101 | 1           | 49.99  | 2024-03-15
102 | 2           | 12.50  | 2024-03-16
103 | 1           | 89.00  | 2024-03-18

The id column in each table is the primary key: a value guaranteed to be unique for every row. It is how tables refer to each other. The customer_id column in orders is a foreign key: it holds the id of a row in customers.

Notice that Alice (id = 1) has two orders without her name or email being duplicated anywhere. The relationship is expressed through the ID.

Every column has a type

This is another place where databases differ from spreadsheets. In a spreadsheet, a cell can hold whatever you put in it. In a database, every column has a declared type, and the database will reject values that do not match.

A column defined as INTEGER will not accept the string "hello." A column defined as DATE will not accept "March 15th." A column with a NOT NULL constraint will refuse to be left empty.

These constraints feel restrictive at first. Over time, you will come to see them as a feature. The database is enforcing the rules so your application code does not have to.

SQL is declarative

When you write SQL, you describe what you want, not how to get it. The database figures out the most efficient path to the result.

This is different from most programming languages. If you wanted to filter a list in Python, you would write a loop that examines each element. In SQL, you write a WHERE clause that describes your criteria, and the database decides whether to use an index, scan the table, or take some other approach entirely.

That distinction will matter more as your queries grow more complex. For now, it means you can focus on what question you are asking rather than the mechanics of answering it.