Guide/Intermediate/Self-Joins

Self-Joins

Joining a table to itself to find relationships within the same dataset.

A self-join is a join where both sides of the join reference the same table. You alias the table twice, giving each instance a different name, and the database treats them as if they were two separate tables. It sounds circular, but it is the most natural way to express relationships that live within a single table: hierarchies, comparisons between rows, and finding pairs.

The most common use case: hierarchies

Many tables contain a self-referential relationship. An employees table where each employee has a manager_id pointing to another row in the same table is the canonical example:

SELECT
  e.name  AS employee,
  m.name  AS manager,
  e.title AS employee_title
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

Here e represents the employee and m represents the manager. Both aliases point to the same employees table, but the join treats them as two separate sources. The result:

employee  | manager  | employee_title
----------|----------|----------------
Alice     | Carol    | Engineer
Bob       | Carol    | Engineer
Carol     | NULL     | VP Engineering

The LEFT JOIN is important. Carol has no manager, so her manager_id is NULL. An INNER JOIN would exclude her from the result. The LEFT JOIN keeps her and fills the manager columns with NULL.

Comparing rows within the same table

Self-joins let you compare rows against each other. A common question: which employees earn more than their direct manager?

SELECT
  e.name              AS employee,
  e.salary            AS employee_salary,
  m.name              AS manager,
  m.salary            AS manager_salary,
  e.salary - m.salary AS difference
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;

Without the self-join, answering this question would require either a correlated subquery or pulling the data into application code.

Generating unique pairs

When you want to find all pairs of rows that share some attribute, a self-join works well. The key is to add a condition that prevents duplicates and self-pairings:

SELECT a.name AS person_a, b.name AS person_b
FROM employees a
JOIN employees b ON a.department = b.department
  AND a.id < b.id;

The a.id < b.id condition is doing two things at once. It prevents a row from being paired with itself (since a row's ID cannot be less than itself), and it ensures each pair appears only once (if Alice and Bob are both in Engineering, you get the pair (Alice, Bob) but not (Bob, Alice)). Without this condition, you would get every combination twice plus each employee paired with themselves.

Finding rows with no matching pair

You can use a self-join to find rows that have no related row in the same table. For example, managers who have no direct reports:

SELECT e.name
FROM employees e
LEFT JOIN employees reports ON reports.manager_id = e.id
WHERE reports.id IS NULL;

The left join attempts to find at least one employee who reports to e. When no such employee exists, all of the reports columns are NULL. The WHERE clause keeps only those employees where no direct report was found.

When self-joins are not enough

Self-joins work well for one level of hierarchy: employee to manager, or employee to skip-level. But if you need to traverse an org chart all the way from CEO to individual contributors, you would need to write one self-join per level. If the hierarchy is three levels deep, that is three joins. If the depth is unknown, self-joins do not scale.

For arbitrary-depth traversal, recursive CTEs are the right tool. They are covered in the Advanced section.