JSON in SQL
Storing and querying semi-structured data with PostgreSQL's JSONB.
Relational databases are at their best when data has a fixed, well-defined structure. But reality does not always cooperate. Event payloads from third-party systems arrive in varying shapes. User preferences differ from one account to the next. Feature flags accumulate. For these cases, PostgreSQL's JSONB type lets you store flexible, schema-less data inside a relational database, without abandoning the ability to query and index it.
JSON vs JSONB
PostgreSQL has two JSON types:
| Type | Storage | Key order | Duplicates | Query speed |
|---|---|---|---|---|
JSON | Raw text | Preserved | Preserved | Slower |
JSONB | Binary decomposed | Not preserved | Last wins | Faster |
Use JSONB by default. The binary representation is faster to query and supports indexing. JSON is useful only when you need to preserve the original text byte-for-byte, which is rare.
Storing JSON data
CREATE TABLE events (
id BIGINT PRIMARY KEY,
user_id BIGINT,
type TEXT,
payload JSONB
);
INSERT INTO events (user_id, type, payload)
VALUES (1, 'purchase', '{"amount": 49.99, "product_id": 123, "currency": "USD"}');
The payload column stores the entire JSON document. Different rows can have completely different structures inside payload with no schema changes required.
Accessing JSON fields
PostgreSQL provides operators for navigating into JSON:
| Operator | Returns | Example |
|---|---|---|
-> | JSON value | payload -> 'amount' returns 49.99 (JSON) |
->> | Text value | payload ->> 'amount' returns "49.99" (text) |
#> | JSON at path | payload #> '{address,city}' |
#>> | Text at path | payload #>> '{address,city}' |
SELECT payload ->> 'currency' AS currency FROM events;
SELECT (payload ->> 'amount')::NUMERIC AS amount FROM events;
SELECT payload -> 'address' ->> 'city' AS city FROM users;
The ->> operator returns text, so if you need the value as a number, cast it explicitly.
Filtering on JSON fields
-- Equality on a JSON field
WHERE payload ->> 'currency' = 'USD'
-- Numeric comparison (requires cast)
WHERE (payload ->> 'amount')::NUMERIC > 100
-- Containment: does this JSON value contain this sub-document?
WHERE payload @> '{"currency": "USD", "status": "completed"}'
The containment operator @> checks whether the left JSONB value contains all the key-value pairs in the right JSONB value. It is the most flexible way to filter on multiple JSON fields at once, and it can be indexed efficiently.
Indexing JSON
A GIN (Generalized Inverted Index) index supports containment queries and key-existence checks across the entire payload document:
CREATE INDEX idx_events_payload ON events USING GIN (payload);
This makes @> and ? (key exists) queries fast. It indexes every key and value in every document, so it is larger than a B-tree index.
For queries that filter on a specific key, an expression index on just that key is more selective:
CREATE INDEX idx_events_currency ON events ((payload ->> 'currency'));
This is a regular B-tree index on the extracted value. Use it when you frequently filter on one or two specific fields.
Constructing and transforming JSON
-- Build a JSON object from column values
SELECT jsonb_build_object('name', name, 'email', email) FROM users;
-- Aggregate multiple rows into a JSON array
SELECT jsonb_agg(jsonb_build_object('id', id, 'name', name)) FROM products;
-- Expand a JSON array into individual rows
SELECT elem FROM jsonb_array_elements('["a","b","c"]'::jsonb) AS elem;
-- Expand a JSON object into (key, value) rows
SELECT key, value FROM jsonb_each('{"status": "active", "tier": "pro"}'::jsonb);
These functions are useful for transforming JSON data into a shape that fits a query: turning a nested array into rows, or building a JSON response directly in SQL.
When to use JSON and when not to
JSON columns are a tool for specific situations, not a general alternative to schema design.
Use JSON when the data structure genuinely varies by row, when you are storing external payloads you do not control, or when the schema is evolving so rapidly that adding columns every week is impractical.
Avoid JSON when the same fields appear on most rows. Those fields belong in dedicated columns with proper types, indexes, and constraints. A status field stored in a JSON column cannot have a CHECK constraint. A user_id stored in JSON cannot have a foreign key. Querying JSON columns is harder to read and harder for the planner to optimize.
The right mental model: JSON is an escape hatch for genuinely flexible data. Use it when the structure is the problem, not as a shortcut around designing a schema.