Subqueries

Sometimes the answer to one question depends on the answer to another question. Subqueries let you express that directly. A subquery is a complete SELECT statement nested inside another query - the inner query runs first, and its result feeds into the outer query. No temp tables, no application code in between, no round trips to the database.

The placement of the subquery determines what it does. Same idea, four different behaviors.

Subquery in WHERE

The most common pattern: filter the outer query based on a set of values produced by the inner one.

SELECT name, salary
FROM employees
WHERE department_id IN (
  SELECT id FROM departments WHERE location = 'New York'
);

The inner query runs, returns a list of IDs, and the outer query filters against that list. You could write this as a join - and for large tables, a join usually wins on performance. The subquery version shines when the inner logic is self-contained and you just want to read it top-to-bottom.

One thing that trips people up: NOT IN behaves strangely when the subquery can return NULLs. If even one NULL slips into that list, NOT IN returns nothing - no rows at all. See the NULL guide if you hit this.

Scalar subquery

A subquery that returns exactly one row and one column can stand in for any single value - in a SELECT list, in arithmetic, in a comparison.

SELECT
  name,
  salary,
  salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;

The inner query runs once, produces a single number, and that number gets applied to every row in the outer query. The surprising part is that "once" - it does not re-execute per row, so this is cheaper than it looks.

If a scalar subquery ever returns more than one row, you get a runtime error. Guard against it if you are not certain.

Subquery in FROM (derived table)

You can use any subquery as if it were a table - just put it in the FROM clause and give it an alias. The result is called a derived table.

SELECT dept, avg_salary
FROM (
  SELECT department AS dept, AVG(salary) AS avg_salary
  FROM employees
  GROUP BY department
) AS dept_averages
WHERE avg_salary > 80000;

The question people ask here is: why not just use HAVING? Often you can. But derived tables unlock more: you can join them to other tables, layer another aggregation on top, or express transforms that HAVING cannot reach.

Correlated subquery

Here is where subqueries get genuinely interesting. A correlated subquery references a column from the outer query, so it runs once per outer row - not once total.

SELECT name, salary, department
FROM employees e1
WHERE salary = (
  SELECT MAX(salary)
  FROM employees e2
  WHERE e2.department = e1.department
);

For each employee, the inner query recalculates the max salary for that employee's department. Ten thousand employees means ten thousand inner-query executions. That is the cost of the power. On large tables, window functions (covered in the Intermediate section) are usually the better tool for "highest value per group" problems. Correlated subqueries earn their keep for cases window functions cannot express.

EXISTS

EXISTS answers a yes/no question: does this subquery return any rows? It does not care what columns come back, only whether the result set is empty or not.

SELECT name
FROM customers c
WHERE EXISTS (
  SELECT 1 FROM orders o
  WHERE o.customer_id = c.id
    AND o.total > 500
);

The SELECT 1 is just a convention - you are signaling "I only care about row existence, not values." EXISTS stops the moment it finds a match, so it can be fast. It also handles NULLs correctly, which IN does not always do. For "find rows in A where a matching row exists in B," EXISTS is often the clearest and safest choice.

Subqueries vs joins vs CTEs