Reference Guides/Beginner/Distinct

Distinct

How to eliminate duplicate rows from query results.

SELECT DISTINCT removes duplicate rows from the result set. It is one of the simplest SQL keywords - but a few of its behaviors catch people off guard, especially around multi-column selects and COUNT.

Basic DISTINCT

SELECT DISTINCT country FROM customers;

Without DISTINCT, you get one row per customer - many duplicates if customers share countries. With DISTINCT, each country appears exactly once.

DISTINCT across multiple columns

DISTINCT applies to the entire row, not to a single column. This query returns each unique (country, city) pair:

SELECT DISTINCT country, city FROM customers;

You cannot write SELECT DISTINCT country, city and expect only country to be deduplicated. The deduplication always covers every selected column together.

COUNT(DISTINCT column)

To count how many unique values exist in a column, combine COUNT with DISTINCT:

SELECT COUNT(DISTINCT customer_id) AS unique_customers FROM orders;

This counts unique customer IDs, not total orders. COUNT(*) would give the total number of rows; COUNT(DISTINCT customer_id) gives the number of distinct customers who placed at least one order.

DISTINCT vs GROUP BY

Both produce unique combinations of the selected columns. The difference is purpose and output:

-- DISTINCT: just deduplicate
SELECT DISTINCT department FROM employees;

-- GROUP BY: deduplicate and optionally aggregate
SELECT department, COUNT(*) AS headcount FROM employees GROUP BY department;

When you only need unique values and no aggregation, DISTINCT is cleaner. When you need aggregate functions, you must use GROUP BY. Functionally, SELECT DISTINCT col FROM t and SELECT col FROM t GROUP BY col return identical results - the planner may even produce the same execution plan.

Performance

On large tables, DISTINCT can be expensive - the database has to sort or hash every row to eliminate duplicates. That cost is worth paying when you genuinely need deduplication. When you find yourself reaching for DISTINCT because a join is producing unexpected extra rows, that is usually a sign the join condition is wrong. Fix the join; do not hide the problem behind deduplication.