Partitioning | Unnested

Splitting large tables into smaller pieces for better query performance and data management.

At some point your orders table hits a hundred million rows, and suddenly things slow down - even queries with good indexes. You want to drop old data, but a DELETE on 50 million rows holds a lock for minutes. Index maintenance starts dragging down writes. Partitioning is the tool built for exactly this situation.

The idea is simple: split one large table into smaller physical pieces called partitions. To any query or application, the table still looks like a single unified table. Under the hood, the database stores each partition separately, routes writes to the correct partition, and can skip entire partitions when answering queries whose filters exclude them.

Why partitioning helps

On a table with 500 million rows, even a well-indexed query is working against a large B-tree. The index itself has millions of entries. Partitioning adds a coarser layer of organization above the index: before the database even looks at an index, it can eliminate 90% of the physical storage if the query filter matches the partition key.

Beyond query performance, partitioning helps with data retention. Dropping old data normally requires a large, slow, lock-holding DELETE. When each month's data lives in its own partition, you drop an entire month instantly:

DROP TABLE orders_2022_01;  -- immediate, no rows to scan

This is often the primary reason to partition time-series tables.

Range partitioning

The most common pattern: split by a range of values in a column, almost always a date or timestamp:

CREATE TABLE orders (
  id          BIGINT,
  customer_id BIGINT,
  created_at  TIMESTAMPTZ NOT NULL,
  total       NUMERIC
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2024 PARTITION OF orders
  FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

CREATE TABLE orders_2025 PARTITION OF orders
  FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

The TO value is exclusive. A row with created_at = '2025-01-01 00:00:00' goes to orders_2025.

Partition pruning

When a query includes a filter on the partition key, the database eliminates irrelevant partitions at planning time:

SELECT * FROM orders WHERE created_at >= '2025-01-01';
-- Only scans orders_2025; orders_2024 is skipped entirely

This is called partition pruning. The surprising part is how strict it is: pruning only kicks in when the partition key appears literally in the filter. A query like SELECT COUNT(*) FROM orders with no date filter scans every partition.

List partitioning

Split by a fixed set of discrete values:

CREATE TABLE orders (...) PARTITION BY LIST (region);
CREATE TABLE orders_us PARTITION OF orders FOR VALUES IN ('US', 'CA', 'MX');
CREATE TABLE orders_eu PARTITION OF orders FOR VALUES IN ('DE', 'FR', 'UK', 'NL');

List partitioning makes sense when queries frequently filter on a low-cardinality column and the groups have roughly equal sizes.

Hash partitioning

Split rows evenly across N partitions using a hash of the partition key:

CREATE TABLE orders (...) PARTITION BY HASH (customer_id);
CREATE TABLE orders_0 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_1 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_2 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_3 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 3);

Hash partitioning achieves even distribution when there is no natural range or list to split on. The tradeoff is real though: no query can prune by the partition key, since any customer could land in any shard. You give up the pruning benefit entirely. What you get instead is even write distribution, which can reduce hot-partition contention on high-throughput tables.

Indexes on partitioned tables

An index created on the parent table is automatically propagated to each partition, with its own index file per partition. The result is that after pruning eliminates irrelevant partitions, the planner can use a targeted index within the one it actually needs to scan.

When partitioning is not the answer

Partitioning adds real complexity: more objects to manage, more schema maintenance, and some genuine gotchas around foreign keys and unique constraints across partitions. People often reach for it too early.

It does not help when:

The table lacks a clear, frequently-filtered partition key
Queries routinely span all partitions (a full table count or aggregation scans everything regardless)
The table has tens of millions of rows but not hundreds of millions - good indexes are likely sufficient

Get your indexes right first. Reach for partitioning when the table is genuinely large, you have a specific performance or data retention problem, and there is a natural partition key that your most important queries actually filter on. If you cannot name those queries, you are not ready to partition yet.