How a Database Thinks

Before you write a single query, it helps to understand what a database actually is - not as a definition to memorize, but as a design philosophy. Get this right and the rest of SQL starts to click.

A database is not a spreadsheet with superpowers. It is a system built around one central idea: data has structure, and that structure should be enforced by the system itself, not by convention or careful human attention.

The problem with spreadsheets

Picture a bookstore tracking customers and orders in a spreadsheet. The obvious approach is to put everything in one sheet - customer name, customer email, order date, order total, all in a single row.

This feels convenient right up until a customer changes their email address. Now you have to find and update every row for that customer. Miss one, and the data is wrong. It gets worse when two people are editing the sheet at the same time, or when you want to ask something like "how many unique customers ordered last month?" The structure of the data fights you at every step.

A relational database solves this by separating concerns. Customers live in a customers table. Orders live in an orders table. Each order stores the customer's ID, not the customer's name or email. When the customer changes their email, you update one row in one place, and every order automatically reflects the change.

This practice of organizing data to eliminate redundancy is called normalization. You do not need to memorize the formal rules. The instinct is enough: if the same fact appears in more than one place, that is a problem waiting to happen.

Tables, rows, and columns

Everything in a relational database lives in tables. A table holds one kind of thing. The customers table has one row per customer, the orders table has one row per order, and columns describe the attributes of each.

-- The customers table
id   | name          | email
-----|---------------|------------------
1    | Alice Chen    | alice@example.com
2    | Bob Okafor    | bob@example.com

-- The orders table
id  | customer_id | total  | order_date
----|-------------|--------|------------
101 | 1           | 49.99  | 2024-03-15
102 | 2           | 12.50  | 2024-03-16
103 | 1           | 89.00  | 2024-03-18

The id column in each table is the primary key: a value guaranteed to be unique for every row. It is how tables refer to each other. The customer_id column in orders is a foreign key: it holds the id of a row in customers.

Notice that Alice has two orders and her name and email are not duplicated anywhere. The relationship is expressed entirely through the ID.

Every column has a type

In a spreadsheet, a cell can hold whatever you put in it. Databases are different. Every column has a declared type, and the database will reject values that do not match.

A column defined as INTEGER will not accept the string "hello." A column defined as DATE will not accept "March 15th." A column with a NOT NULL constraint will refuse to be left empty.

This trips people up at first - it feels like the database is being difficult. Over time you come to see it as a feature. The database is enforcing the rules so your application code does not have to.

SQL is declarative

SQL is different from most programming languages in a specific way: you describe what you want, not how to get it. The database figures out the most efficient path to the result.

Think about filtering a list in Python - you write a loop that examines each element. In SQL, you write a WHERE clause that describes your criteria and let the database decide whether to use an index, scan the table, or something else entirely. You never specify the mechanics.

That distinction matters more as your queries grow more complex. For now it just means you can focus on the question you are asking, not on how to answer it. That is a good deal.