Window Functions

Window functions are the feature that separates intermediate SQL from advanced SQL. Most people discover them too late - after writing several painful self-joins or correlated subqueries that could have been a single clean expression.

The key insight: a window function computes a value for each row by looking at a set of related rows, but unlike GROUP BY, it does not collapse the rows. The original row stays intact. You get the aggregate sitting right next to the detail that produced it.

The OVER clause

The OVER clause is what turns an ordinary aggregate function into a window function:

SELECT
  name,
  department,
  salary,
  AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;

The result looks like this:

name    | department  | salary | dept_avg
--------|-------------|--------|----------
Alice   | Engineering | 95000  | 102000
Bob     | Engineering | 105000 | 102000
Carol   | Engineering | 106000 | 102000
Diana   | Marketing   | 72000  | 68000
Evan    | Marketing   | 64000  | 68000

Every row keeps its own salary. Every row also gains a dept_avg reflecting the average for its department - no subquery, no join, no second query. The PARTITION BY clause tells the function to reset its calculation for each department, the same way GROUP BY would, except the rows are not collapsed.

If you omit PARTITION BY, the window spans the entire result set:

AVG(salary) OVER ()  -- company-wide average on every row

Ranking functions

Three ranking functions assign positions within a window, and choosing the wrong one is a classic interview mistake:

Function	Behavior on ties
`ROW_NUMBER()`	Always unique; ties get arbitrary sequential numbers
`RANK()`	Tied rows share a rank; the next rank skips (1, 1, 3...)
`DENSE_RANK()`	Tied rows share a rank; no gaps (1, 1, 2...)

SELECT
  name,
  department,
  salary,
  ROW_NUMBER()  OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
  RANK()        OVER (PARTITION BY department ORDER BY salary DESC) AS rank,
  DENSE_RANK()  OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank
FROM employees;

The most common real-world use is finding the top-N rows per group. You cannot filter on a window function in the same WHERE clause where it is computed - you have to wrap it first:

SELECT * FROM (
  SELECT *,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
  FROM employees
) ranked
WHERE rn = 1;

This finds the single highest-paid employee in each department. Ties are broken arbitrarily - if two people share the top salary, you get whichever one the planner picks. Switch to RANK() = 1 if you want all tied employees returned.

Running totals

Add ORDER BY inside OVER and any aggregate becomes cumulative. The function uses all rows from the start of the partition up to and including the current row:

SELECT
  order_date,
  amount,
  SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;

Each row's running_total accumulates all orders up to that date. Running balances, cumulative revenue charts, and streak tracking all follow this same pattern.

LAG and LEAD

Before window functions, comparing a row to its neighbor meant a self-join. LAG and LEAD make that trivial:

SELECT
  order_date,
  amount,
  LAG(amount)  OVER (ORDER BY order_date) AS prev_amount,
  LEAD(amount) OVER (ORDER BY order_date) AS next_amount,
  amount - LAG(amount) OVER (ORDER BY order_date) AS change_from_prev
FROM orders;

LAG looks one row back by default. LEAD looks one row forward. Both accept an offset as a second argument and a default value as a third - handy for the first or last row where no neighbor exists:

LAG(amount, 1, 0) OVER (ORDER BY order_date)  -- defaults to 0 for the first row

NTILE

NTILE(n) divides rows into n roughly equal buckets and returns the bucket number for each row:

SELECT name, salary, NTILE(4) OVER (ORDER BY salary) AS quartile
FROM employees;

Bottom quarter gets 1, top quarter gets 4. The trick is that "roughly equal" means uneven distributions when the row count is not divisible by n - earlier buckets get the extra rows. For clean percentile segmentation without worrying about exact cutoffs, NTILE is your friend.

Frame clauses

By default, a window with ORDER BY uses a frame that spans from the first row of the partition to the current row - which is why SUM(...) OVER (ORDER BY date) gives you a running total without any extra syntax. You can override that default with a frame clause:

-- 7-day rolling average (including today and the 6 preceding days)
SELECT
  order_date,
  amount,
  AVG(amount) OVER (
    ORDER BY order_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS rolling_7day_avg
FROM orders;

ROWS BETWEEN counts physical rows. RANGE BETWEEN counts by value, which behaves differently when multiple rows share the same date. For rolling windows, ROWS BETWEEN is almost always what you want.