Aggregation: COUNT, SUM, AVG
Collapsing many rows into a single summary value.
Every query you have written so far has returned rows that correspond one-to-one with rows in the table. Aggregation works differently: it takes an entire set of rows and computes a single summary value from them. This is how you answer questions like "how many orders did we receive?" or "what is the average salary in the engineering department?"
Understanding how aggregation works, and how it interacts with NULL, is essential before moving on to GROUP BY.
The five core aggregate functions
| Function | What it returns |
|---|---|
COUNT(*) | Number of rows in the set |
COUNT(column) | Number of non-NULL values in the column |
SUM(column) | Total of all non-NULL values |
AVG(column) | Arithmetic mean of all non-NULL values |
MIN(column) | Smallest non-NULL value |
MAX(column) | Largest non-NULL value |
SELECT
COUNT(*) AS total_employees,
AVG(salary) AS avg_salary,
MAX(salary) AS highest_salary,
MIN(salary) AS lowest_salary,
SUM(salary) AS total_payroll,
MIN(hire_date) AS earliest_hire
FROM employees;
When you use an aggregate function without a GROUP BY clause, the result is a single row summarizing the entire table. The intermediate rows disappear; you see only the summary.
COUNT(*) versus COUNT(column)
The difference matters more than most beginners expect, and getting it wrong produces silent errors.
COUNT(*) counts every row, regardless of whether any column in that row is NULL. COUNT(column) counts only the rows where that specific column is not NULL.
SELECT
COUNT(*) AS total_orders,
COUNT(shipped_at) AS shipped_orders
FROM orders;
If 80 orders exist but only 65 have a shipped_at timestamp, the first column returns 80 and the second returns 65. Use COUNT(column) when you specifically want to count the presence of a value, and COUNT(*) when you want to count rows regardless.
How NULL affects aggregates
All aggregate functions except COUNT(*) silently ignore NULL values. This is usually the right behavior, but it can surprise you in specific situations.
Consider a bonus column where some employees have no bonus recorded (NULL):
SELECT
SUM(bonus) AS total_bonuses, -- ignores NULLs; sums only what exists
AVG(bonus) AS avg_bonus, -- divides by non-NULL count, not total rows
COUNT(bonus) AS has_bonus -- count of employees with a bonus on record
FROM employees;
If you want AVG to treat missing bonuses as zero rather than ignoring them, you have to be explicit:
SELECT AVG(COALESCE(bonus, 0)) AS avg_bonus_including_zero
FROM employees;
The choice between these two calculations is a modeling question: does "no bonus" mean zero, or does it mean the information is absent? Both are valid interpretations depending on your domain.
DISTINCT inside aggregates
Adding DISTINCT inside an aggregate function makes it operate on unique values only:
SELECT COUNT(DISTINCT customer_id) AS unique_customers
FROM orders;
Without DISTINCT, this would count one entry per order, including multiple orders from the same customer. With DISTINCT, it counts how many different customers have placed at least one order.
You can apply DISTINCT to SUM and other aggregates too, though COUNT(DISTINCT ...) is by far the most common usage.
Aggregation as a foundation
Bare aggregation, summarizing an entire table with no grouping, is useful for quick data checks. Before diving into detailed analysis, it is worth running a few aggregate queries just to understand the data: how many rows exist, what the range of values looks like, whether there are unexpected NULLs. The next guide covers GROUP BY, which applies these same aggregate functions to subgroups of the data rather than the whole table.