Guide/Beginner/Subqueries

Subqueries

Queries inside queries: how to use one result as input for another.

A subquery is a complete SELECT statement nested inside another query. The inner query executes first, and its result is handed to the outer query to use. This lets you express multi-step logic in a single statement rather than running multiple queries and combining the results yourself.

Subqueries can appear in several places in a SQL statement, and each placement has a different purpose.

Subquery in WHERE

The most common use: generate a list of values to filter against.

SELECT name, salary
FROM employees
WHERE department_id IN (
  SELECT id FROM departments WHERE location = 'New York'
);

The inner query runs first and returns a list of department IDs. The outer query then filters employees to those in that list. You could write this as a join instead, and for large tables a join often performs better. But subqueries in WHERE can be easier to read when the logic is self-contained.

A NOT IN version finds employees not in those departments. Be careful: if the subquery can return NULL values, NOT IN behaves unexpectedly. Review the NULL guide if you encounter this.

Scalar subquery

A subquery that returns exactly one row and one column can be used anywhere a single value is expected: in the SELECT clause, in arithmetic, in comparisons.

SELECT
  name,
  salary,
  salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;

The inner query runs once, returns a single number (the company-wide average), and that number is then subtracted from each employee's salary in turn. This is efficient because the subquery executes once regardless of how many rows the outer query returns.

A scalar subquery that returns more than one row is a runtime error. If you are not certain a subquery will return exactly one row, guard against it.

Subquery in FROM (derived table)

You can treat any subquery as if it were a table by placing it in the FROM clause. The result is called a derived table, and you must give it an alias.

SELECT dept, avg_salary
FROM (
  SELECT department AS dept, AVG(salary) AS avg_salary
  FROM employees
  GROUP BY department
) AS dept_averages
WHERE avg_salary > 80000;

This pattern is useful when you need to filter on an aggregated value. You might think HAVING could handle this, and often it can. But derived tables let you write more complex logic: you can join the derived table to other tables, apply additional aggregations on top of it, or express transformations that HAVING cannot.

Correlated subquery

A correlated subquery references a column from the outer query. Unlike the other types, it runs once for every row the outer query processes.

SELECT name, salary, department
FROM employees e1
WHERE salary = (
  SELECT MAX(salary)
  FROM employees e2
  WHERE e2.department = e1.department
);

For each employee in the outer query, the inner query finds the maximum salary in that employee's department. If the outer query processes 10,000 employees, the inner query runs 10,000 times. This is powerful but can be slow on large tables.

Window functions, covered in the Intermediate section, are usually the better tool for this kind of "per-group maximum" problem. But correlated subqueries remain useful for cases that window functions cannot express.

EXISTS

EXISTS checks whether a subquery returns any rows at all. It does not care what columns the subquery returns, only whether the result is empty or not.

SELECT name
FROM customers c
WHERE EXISTS (
  SELECT 1 FROM orders o
  WHERE o.customer_id = c.id
    AND o.total > 500
);

The SELECT 1 inside the subquery is a convention: you are not interested in the values, only the presence of rows. EXISTS stops as soon as it finds the first matching row, which makes it efficient.

EXISTS also handles NULL correctly, which IN does not always do. For "find rows in A where a related row exists in B," EXISTS is often the safest and clearest choice.

Subqueries vs joins vs CTEs

The same problem can often be solved with a subquery, a join, or a CTE. In general: use a join when you need data from both sides of the relationship. Use a subquery when the inner result is self-contained and you only need a filter or a scalar value. Use a CTE (covered next) when the subquery is complex or needs to be referenced more than once.