JSON in SQL

Storing and querying semi-structured data with PostgreSQL's JSONB.

Relational databases shine when data has a fixed, well-defined shape. Reality does not always cooperate. Event payloads from third-party APIs arrive in different structures depending on the event type. User preference objects differ from one account to the next. Feature flags accumulate faster than your migration schedule. For these cases, PostgreSQL's JSONB type lets you store flexible, schema-less data right inside a relational database - without giving up the ability to query and index it.

JSON vs JSONB

PostgreSQL has two JSON types:

| Type  | Storage            | Key order     | Duplicates | Query speed |
|-------|--------------------|---------------|------------|-------------|
| JSON  | Raw text           | Preserved     | Preserved  | Slower      |
| JSONB | Binary, decomposed | Not preserved | Last wins  | Faster      |

Use JSONB by default. The binary format is faster to query and supports indexing. JSON is only useful when you need to preserve the original text byte-for-byte, which is rare in practice.
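You can see the normalization directly by casting the same literal to both types. JSON keeps the text exactly as written; JSONB decomposes it, reordering keys and collapsing duplicates:

```sql
-- JSON preserves the original text, duplicate key and all
SELECT '{"b": 1, "a": 2, "a": 3}'::json;
-- {"b": 1, "a": 2, "a": 3}

-- JSONB normalizes: keys sorted, last duplicate wins
SELECT '{"b": 1, "a": 2, "a": 3}'::jsonb;
-- {"a": 3, "b": 1}
```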

Storing JSON data

CREATE TABLE events (
  id      BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  user_id BIGINT,
  type    TEXT,
  payload JSONB
);

INSERT INTO events (user_id, type, payload)
VALUES (1, 'purchase', '{"amount": 49.99, "product_id": 123, "currency": "USD"}');

Each row can have a completely different structure inside payload. No schema change required - that is the whole point.

Accessing JSON fields

PostgreSQL gives you operators for navigating into a JSON document:

| Operator | Returns      | Example                                          |
|----------|--------------|--------------------------------------------------|
| `->`     | JSON value   | `payload -> 'amount'` returns `49.99` (JSON)     |
| `->>`    | Text value   | `payload ->> 'amount'` returns `49.99` (text)    |
| `#>`     | JSON at path | `payload #> '{address,city}'`                    |
| `#>>`    | Text at path | `payload #>> '{address,city}'`                   |

SELECT payload ->> 'currency' AS currency FROM events;
SELECT (payload ->> 'amount')::NUMERIC AS amount FROM events;
SELECT payload -> 'address' ->> 'city' AS city FROM users;

The thing that trips people up: ->> always returns text. If you need to compare or do math with that value, you have to cast it explicitly - the planner will not do it for you.
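The cast matters for correctness, not just for type errors: without it, comparisons happen on text, where ordering is lexicographic. A quick illustration:

```sql
-- Text comparison is byte-wise, so '9' sorts after '100'
SELECT '9' > '100';       -- true

-- Numeric comparison after an explicit cast behaves as expected
SELECT 9::NUMERIC > 100;  -- false
```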

Filtering on JSON fields

-- Equality on a JSON field
WHERE payload ->> 'currency' = 'USD'

-- Numeric comparison (requires cast)
WHERE (payload ->> 'amount')::NUMERIC > 100

-- Containment: does this JSON value contain this sub-document?
WHERE payload @> '{"currency": "USD", "status": "completed"}'

The @> operator is worth knowing well. It checks whether the left JSONB value contains all the key-value pairs on the right side - so you can match multiple fields in one clean expression. More importantly, it can be indexed efficiently, which the text-extraction approach above cannot.
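Containment is also recursive, so the right-hand side can describe nested structure. A sketch (the `shipping` and `discount_code` keys are illustrative, not from the schema above):

```sql
-- Match rows whose payload contains this nested sub-document
WHERE payload @> '{"shipping": {"country": "DE"}}'

-- Key-existence check: does the top level have this key at all?
WHERE payload ? 'discount_code'
```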

Indexing JSON

A GIN (Generalized Inverted Index) index covers containment and key-existence checks across the whole document:

CREATE INDEX idx_events_payload ON events USING GIN (payload);

This makes @> and ? (key exists) queries fast. The tradeoff is size - it indexes every key and value in every document, so it is considerably larger than a B-tree index.
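If containment is all you need, the `jsonb_path_ops` operator class builds a smaller, faster GIN index that supports only `@>` - key-existence (`?`) queries will not use it:

```sql
-- GIN index specialized for @> containment queries
CREATE INDEX idx_events_payload_path ON events
  USING GIN (payload jsonb_path_ops);
```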

When you mostly filter on one specific key, an expression index is cheaper and more selective:

CREATE INDEX idx_events_currency ON events ((payload ->> 'currency'));

Think of this as a regular B-tree index on the extracted value. Use it when one or two fields carry most of your query predicates.

Constructing and transforming JSON

-- Build a JSON object from column values
SELECT jsonb_build_object('name', name, 'email', email) FROM users;

-- Aggregate multiple rows into a JSON array
SELECT jsonb_agg(jsonb_build_object('id', id, 'name', name)) FROM products;

-- Expand a JSON array into individual rows
SELECT elem FROM jsonb_array_elements('["a","b","c"]'::jsonb) AS elem;

-- Expand a JSON object into (key, value) rows
SELECT key, value FROM jsonb_each('{"status": "active", "tier": "pro"}'::jsonb);

These functions let you reshape JSON data to fit a query - turning a nested array into rows you can join against, or building a JSON response directly in SQL without a round-trip to the application layer.
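Combined with a LATERAL join, the expansion functions turn a nested array into rows you can filter and aggregate. A sketch, assuming each purchase payload carries an `items` array (the `sku` and `qty` fields are illustrative):

```sql
-- One output row per element of the payload's "items" array
SELECT e.id,
       item ->> 'sku'        AS sku,
       (item ->> 'qty')::INT AS qty
FROM events e
CROSS JOIN LATERAL jsonb_array_elements(e.payload -> 'items') AS item
WHERE e.type = 'purchase';
```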

When to use JSON and when not to

JSON columns solve a specific problem. They are not a general alternative to schema design, and treating them that way causes real pain later.

Use JSON when the data structure genuinely varies by row, when you are ingesting external payloads you do not control, or when the schema is evolving so fast that running a migration every week is impractical.

Avoid it when the same fields appear on most rows. Those fields belong in dedicated columns. A status field in a JSON column cannot have a CHECK constraint. A user_id buried in JSON cannot have a foreign key. Your queries become harder to read and the planner has less information to optimize them.

The right mental model: JSONB is an escape hatch for genuinely flexible data. Reach for it when the varying structure is the actual problem - not as a way to avoid thinking through the schema.