Good database schema design in 2026 still rests on the same unglamorous discipline it always has: model the domain honestly, normalize first, and denormalize only after a specific query bottleneck proves it earns its keep. What has changed is the toolkit around that discipline — PostgreSQL is the default relational choice, and PostgreSQL 18 shipped genuinely useful new primitives that change a handful of long-standing schema decisions. This reference assumes you have already settled the engine question; if you have not, our SQL versus NoSQL decision matrix maps access patterns, consistency, and scale to the right database before you reach the schema.

The cost of getting schema wrong is asymmetric. Application code is cheap to refactor; a primary-key choice or a denormalized table baked into a billion rows is not. The decisions in this reference — primary keys, JSONB versus columns, normalization level, indexing, multi-tenancy, and migration tooling — are the ones that are expensive to reverse once data accumulates, which is exactly why they deserve deliberate thought up front.

This guide is organized as a working reference, not a tutorial. It opens with two original decision tables, walks the primary-key and JSONB trade-offs with the numbers that actually drive them, covers the seven Postgres index types you will reach for, and closes on multi-tenancy, row-level security, and the 2026 migration-tooling landscape. Postgres specifics are kept current as of PostgreSQL 18.4 (May 14, 2026).

Key takeaways

01
Normalize first; denormalize only on evidence.Start at Third Normal Form. Denormalize a specific column or introduce a materialized view only after EXPLAIN ANALYZE proves a join is the measured bottleneck — not on a hunch.
02
Default to BIGSERIAL; reach for UUID v7 deliberately.BIGSERIAL benchmarks 20–40x faster on reads and uses half the on-disk width (8 bytes vs 16). UUID v7 is the modern timestamp-ordered key, justified only when IDs must be unique across databases without coordination.
03
Typed columns for core fields, JSONB for the remainder.Bare JSONB was roughly 160x slower than a column in one 50K-row benchmark. The EAV anti-pattern is orders of magnitude worse still. Extract frequently-queried fields to typed columns and keep JSONB for the genuinely variable tail.
04
PostgreSQL 18 changed real schema decisions.Native uuidv7(), virtual generated columns by default, WITHOUT OVERLAPS temporal constraints, B-tree skip scan, and OLD/NEW in RETURNING are all new in the September 25, 2025 release and affect how you write DDL today.
05
Multi-tenancy is a cost-versus-isolation trade.Shared schema with tenant_id is cheapest but highest data-leak risk; database-per-tenant is the most isolated and runs 3–5x more expensive to maintain. Row-Level Security makes the shared model safer when deployed under an unprivileged role.

01 — First PrinciplesOne decision matrix for the choices that are expensive to reverse.

Most schema-design advice is scattered across a dozen separate guides — one for primary keys, another for JSONB, a third for normalization. In practice you make these decisions together, at the same table, on the same afternoon. The matrix below consolidates the eight decisions that matter into a single scannable view, with the 2026 default, the condition under which you should deviate, the performance reason, and — crucially — what PostgreSQL 18 changed for each.

Schema design decision matrix mapping eight common decisions to their recommended 2026 default, when to deviate, the performance implication, and the PostgreSQL 18 change. Sources: PostgreSQL official documentation, BSWEN JSONB benchmark, dasroot.net multi-tenancy patterns, and Encore ORM comparison, retrieved June 15, 2026.
Decision	Recommended default 2026	When to deviate	Performance implication	PostgreSQL 18 change
Primary key strategy	BIGSERIAL / identity column	Switch to UUID v7 only when IDs must be unique across multiple databases without coordination	BIGSERIAL reads benchmark 20–40x faster; UUID is 16 bytes vs 8	Native uuidv7() function added
Variable attributes	Typed columns for core fields	Add one JSONB column for the genuinely variable remainder	Bare JSONB was 160x slower in one 50K-row benchmark; GIN narrows the gap	Parallel GIN index builds
Normalization level	Third Normal Form (3NF)	Denormalize a specific column only after EXPLAIN ANALYZE proves a join is the bottleneck	Over-normalization can force 10+ joins on simple reads	No change
Deleting rows	Soft delete via nullable deleted_at	Hard delete when regulation or storage demands true removal	Partial index WHERE deleted_at IS NULL keeps unique constraints honest	OLD / NEW in RETURNING aids audit trails
Multi-tenancy model	Shared schema + tenant_id	Schema- or database-per-tenant when isolation outranks cost	Database-per-tenant runs 3–5x more expensive to maintain	No change
Default index type	B-tree	GIN for JSONB / arrays / full-text; BRIN for append-only time-series	B-tree is O(log n); BRIN only helps when values track physical order	B-tree skip scan can omit leading columns
Migration tooling	Forward-only with feature flags	Atlas or Liquibase when automated reverse migrations are mandatory	Rollbacks treated as a last resort, not a routine	No change
Audit columns	created_at, updated_at, deleted_at on every table	Add a dedicated history table for full change capture	updated_at maintained by trigger; near-zero read cost	Virtual generated columns reduce write overhead

Read the matrix as a set of starting hypotheses, not commandments. Every row has a deviation condition for a reason: the right answer depends on your access patterns, read-to-write ratio, and isolation requirements. What the table buys you is a defensible default for each decision and a clear signal of the one piece of evidence that should make you change your mind.

02 — Primary KeysUUID v7 versus BIGSERIAL, decided by one question.

The primary-key debate has a clean decision boundary that most posts bury under nuance. Ask one question: do your IDs need to be unique across multiple databases without a central coordinator? If no — which covers the single-cluster Postgres deployment most SaaS products run on — BIGSERIAL is the right default. If yes (sharded systems, client-generated IDs for offline-first apps), UUID v7 is the modern answer.

The reason BIGSERIAL wins on a single cluster is mechanical. BIGSERIAL is an 8-byte monotonically increasing integer, so inserts always append to the right edge of the B-tree. UUID v7 fixes the historic UUID problem — it is timestamp-ordered, so unlike random UUID v4 it also appends to the end of the index rather than scattering page splits across the tree — but it is still a 16-byte value, double the on-disk width. Benchmarks put BIGSERIAL at roughly 20–40x faster on reads, though that figure varies sharply with index size, cache hit rate, and concurrency, so treat it as directional rather than a fixed measurement.

Single cluster

BIGSERIAL / identity column

8-byte sequential integer. Reads benchmark 20–40x faster than UUID v7 and use half the on-disk width (8 bytes vs 16). The right default for the single-Postgres-cluster deployment most SaaS products run on.

Default to BIGSERIAL

Distributed

UUID v7 — timestamp-ordered

16-byte, timestamp-ordered so inserts append to the end of the B-tree instead of scattering page splits like UUID v4. Justified when IDs must be globally unique across databases without coordination — sharded systems, offline-first client-generated IDs.

Pick UUID v7 deliberately

Avoid

UUID v4 as a primary key

Random UUID v4 causes the page splits that gave UUIDs their performance reputation. If you need a UUID at all, prefer v7. PostgreSQL 18 keeps uuidv4() as an alias for gen_random_uuid() but adds native uuidv7() precisely so the ordered variant is one function call away.

Prefer v7 over v4

PostgreSQL 18 · native UUID v7

Before PostgreSQL 18 you generated UUID v7 in application code or with an extension. The September 25, 2025 release ships a native uuidv7() function for timestamp-ordered UUIDs, with uuidv4() added as an alias for gen_random_uuid(). If you do decide UUID v7 is warranted, the database can now generate it for you — no client library required.

03 — JSONB vs ColumnsThe escalation ladder from EAV to extracted columns.

Storing variable attributes is where schemas quietly rot. The worst answer is the Entity-Attribute-Value (EAV) anti-pattern — one tall table of entity_id, attribute, value rows that forces a self-join for every field you want back. In Bytebase’s testing, EAV ran orders of magnitude slower than the JSONB equivalent for unindexed queries; the published figure is dramatic but carries no stated methodology, so treat it qualitatively: EAV is the pattern to escape, not to tune.

JSONB is the right escape hatch, but it is not free. In one 50,000-row benchmark a bare JSONB query took 3.2 seconds against 0.02 seconds for an equivalent typed column — about 160x slower — and at a million rows the same data carried roughly 26 MB of storage against 4 MB for a column, a 6.5x overhead driven by JSON repeating every key name on every row. A GIN index closes much of the read gap but does not erase the storage cost. The discipline is a four-rung ladder: escape EAV, land on JSONB, index it with GIN when you query it, and extract the fields you query most into typed columns.

JSONB cost vs a typed column · illustrative, one benchmark

Source: BSWEN JSONB vs Columns benchmark (2026-04-24), single 50K-row test

Typed column (baseline)50K rows · 0.02s query

Bare JSONB, no index50K rows · 3.2s query · BSWEN benchmark

~160x slower

JSONB storage at 1M rows26 MB vs 4 MB for a column · key-name repetition

~6.5x

BSWEN’s row-count thresholds give a practical rule of thumb: bare JSONB is acceptable below roughly 100,000 rows, you should extract frequently-queried fields to columns in the 100K–1M range, and above a million rows JSONB should be reserved for genuinely non-core attributes. Remember the numbers above come from a single blog benchmark on one workload — use them to set direction, then measure on your own data and query shapes before committing the schema.

Rung 1 → 2

Escape the anti-pattern

EAV→ JSONB

EAV forces a self-join per attribute and runs orders of magnitude slower than JSONB for unindexed queries (Bytebase, methodology unstated). The first move is always: collapse EAV into a single JSONB column.

Bytebase

Rung 2 → 3

Index what you query

GIN

A GIN index makes JSONB containment and key lookups fast. It is slower to write but faster to read, which suits document columns queried far more than they are updated. PostgreSQL 18 added parallel GIN index builds.

PostgreSQL 18

Rung 3 → 4

Extract hot fields to columns

100Krows

Past roughly 100K rows, pull the fields you filter and sort on into typed columns. Above a million rows, keep JSONB strictly for the non-core variable tail — the storage and query overhead stops being acceptable.

BSWEN thresholds

04 — IndexingSeven index types, and the one you default to.

PostgreSQL ships several index types and the official documentation describes each in isolation. The table below puts the ones you actually reach for side by side, with write overhead, read characteristics, and the PostgreSQL 18 enhancement for each. The default is almost always B-tree; the value of knowing the others is recognizing the specific cases — JSONB, append-only time-series, index-only scans — where a different type is decisively better. For a deeper treatment, see our companion reference on indexing strategy for read-heavy workloads.

PostgreSQL index type selection guide comparing B-tree, Hash, GiST, SP-GiST, GIN, BRIN, and covering (INCLUDE) indexes by use case, example column type, write overhead, read characteristics, and PostgreSQL 18 enhancement. Source: PostgreSQL official index-type documentation and PostgreSQL 18 release notes, retrieved June 15, 2026.
Index type	Use case	Example column	Write overhead	Read profile	PG18 enhancement
B-tree	Default — equality, ranges, ORDER BY	Most scalar columns	Low	O(log n)	Skip scan: omit leading columns
Hash	Equality only (=)	High-cardinality lookup keys	Low	Fast on =	No change
GiST	Geometric / spatial / range overlap	Geometry, ranges	Medium	Medium	Supports INCLUDE covering
SP-GiST	Non-balanced structures (k-d trees)	Quadtrees, IP ranges	Medium	Medium	Supports INCLUDE covering
GIN	Arrays, JSONB, full-text	Document columns, tags	High	Fast on contains	Parallel index builds
BRIN	Large physically-ordered tables	Append-only logs, time-series	Very low	Range only, if correlated	No change
Covering (INCLUDE)	Index-only scans without heap access	Hot read paths	Medium	Index-only	B-tree dedup disabled with INCLUDE

Two operational details matter more than the type choice itself. First, BRIN earns its tiny footprint only when column values correlate with physical row order — append-only logs and time-series qualify, randomly-inserted data gets no benefit at all. Second, on a live table you must build with CREATE INDEX CONCURRENTLY, which avoids locking writes at the cost of two table scans instead of one, and cannot run inside a transaction block. Forgetting the CONCURRENTLY keyword on a production table is one of the most common ways a routine index addition becomes an outage.

PostgreSQL 18’s B-tree skip scan is the quietly significant change here: queries can now use a multi-column index even when they do not constrain the leading column, which makes some existing indexes useful for queries they previously could not serve. It does not remove the need to order composite indexes thoughtfully, but it softens the penalty for getting that order slightly wrong.

05 — NormalizationThe breakpoint nobody publishes.

The normalization target for most production systems is Third Normal Form. 3NF eliminates transitive dependencies; BCNF cleans up the edge cases 3NF misses; and 4NF and beyond, which handle multi-valued dependencies, are rarely needed outside data warehousing. The advice to “normalize first, denormalize later” is correct but useless without a breakpoint — a concrete signal for when later has arrived.

Here is a heuristic the standard guides omit. Run EXPLAIN ANALYZE on the slow query. If a sequential scan or a specific join dominates the cost, measure how often the columns you would denormalize actually change. If the read-to-write ratio on those columns is high — say beyond 10:1 — denormalization or a materialized view pays off, because you amortize the duplication cost across many reads per write. If writes are frequent, you will spend more keeping the copies consistent than you save on reads, and you should leave the schema normalized. A Redis caching layer is often the better answer than denormalizing the table at all.

The best PostgreSQL schema design is usually not fully normalized or fully denormalized. It is a deliberate balance based on query patterns, update frequency, and operational complexity.— Elysiate Engineering Blog, PostgreSQL Normalization vs Denormalization Guide

Anti-pattern · over-normalization

Normalization has a failure mode too. Bytebase documents a fintech startup that collapsed under peak load after decomposing currency codes, transaction types, and timestamp components into separate lookup tables — turning simple reads into queries requiring 10+ joins. The lesson is symmetric with denormalization: more tables is not automatically more correct. And whatever you do, never store money as FLOAT — use DECIMAL / NUMERIC for exact precision and no floating-point rounding errors.

06 — Audit & LifecycleAudit columns, soft deletes, and temporal constraints.

Three columns belong on essentially every table: created_at TIMESTAMP NOT NULL DEFAULT now(), updated_at TIMESTAMP NOT NULL DEFAULT now() maintained by a trigger, and a nullable deleted_at TIMESTAMP for soft deletes. The soft-delete pattern sets deleted_at to now() instead of removing the row, which preserves history and referential integrity. It also creates the classic trap: a user soft-deletes their account, then tries to re-register with the same email, and a naive unique constraint rejects them. The fix is a partial unique index — WHERE deleted_at IS NULL — so the constraint applies only to live rows.

Soft-delete and audit columns are part of the same reliability story as request-level deduplication. If your writes arrive over HTTP, pair these lifecycle columns with idempotency keys and audit tables so retried requests do not create duplicate rows or corrupt the audit trail.

PostgreSQL 18 · WITHOUT OVERLAPS

PostgreSQL 18’s temporal constraints are an underreported win for scheduling schemas. The new WITHOUT OVERLAPS clause lets PRIMARY KEY, UNIQUE, and FOREIGN KEY constraints range over time periods — so the database itself rejects overlapping bookings for the same room, clinic slot, or vehicle, with no application-layer guard or trigger needed. Pair it with virtual generated columns (computed on read, the new default) and OLD / NEW values in RETURNING for audit comparisons, and a class of hand-written safeguards moves into the schema where it belongs.

07 — Multi-TenancyCost versus isolation, and where RLS fits.

Multi-tenancy is the schema decision with the widest cost spread. Three patterns sit on a spectrum: a shared schema with a tenant_id column is the cheapest to operate but carries the highest data-leak risk; schema-per-tenant offers moderate isolation at the price of migration complexity; and database-per-tenant delivers maximum isolation but runs 3–5x more expensive to maintain. In Nile’s benchmark across a thousand tenants, the shared schema showed the highest CPU efficiency with the weakest isolation while database-per-tenant inverted that — a useful directional signal, but one stated by a vendor with a commercial interest in the shared-schema model, so verify against your own load before treating it as definitive.

Lowest cost

Shared schema + tenant_id

one schema · one tenant_id column

Cheapest to run, highest data-leak risk. Every indexed column should lead with tenant_id — e.g. CREATE INDEX idx_orders_status ON orders(tenant_id, status). Row-Level Security is what makes this model defensible.

Best for most SaaS at start

Moderate isolation

Schema-per-tenant

one Postgres schema per tenant

One Postgres schema per tenant inside a shared database. Stronger isolation than a tenant_id column, but migrations now run across N schemas, which is the recurring operational tax of this pattern.

Migration complexity

Maximum isolation

Database-per-tenant — 3–5x cost

one database per tenant

The strongest isolation available and the obvious choice when a single tenant's data must be physically separable. It also runs roughly 3–5x more expensive to maintain, so reserve it for tenants or regulations that genuinely require it.

Isolation outranks cost

If you choose the shared-schema model, PostgreSQL Row-Level Security is the mechanism that keeps tenants apart. RLS has a property worth internalizing: define a policy that is too restrictive, or forget one entirely, and queries simply fail closed — a far safer failure mode than a hand-written WHERE tenant_id = ? that someone eventually forgets to add. The pitfalls are specific: superusers always bypass RLS, so deploy under an unprivileged role; thread-local tenant context must be reset after every request; and a policy that queries its own table can recurse, which you resolve with a SECURITY DEFINER function.

The biggest benefit of RLS is that if you define a policy that's too restrictive, or forget to define a policy, things just fail, compared to dynamic queries where forgetting to add a WHERE will leak data.— Nile Blog, Multi-Tenant SaaS with PostgreSQL RLS

08 — MigrationsForward-only by default, and the 2026 tooling field.

The single most important migration rule is one of discipline, not tooling: prefer forward-only migrations with feature flags to control exposure, and treat rollbacks as a last resort. This is not a limitation to work around — it is the design. Neither Prisma nor Drizzle ships built-in rollback by design; Atlas auto-calculates reverse migrations and Flyway’s free tier expects you to write a new migration to undo a change. The safe path on a production database is to make additive, reversible-by-construction changes and gate the cutover behind a flag.

On the ORM side, the two dominant TypeScript options remain Drizzle and Prisma. Prisma 7, released November 19, 2025, replaced its Rust query engine with a TypeScript/WASM core; its vendor-reported bundle size dropped from about 14 MB to roughly 1.6 MB, narrowing but not fully closing the gap to Drizzle. The figure is Prisma-adjacent, so read it as a direction of travel rather than an audited measurement.

TypeScript-first

Drizzle Kit

Generates SQL diffs from a TypeScript schema. No built-in rollback by design. The lean choice when you want SQL to stay visible and the bundle small.

TS-first, SQL-visible

Managed workflow

Prisma Migrate

Shadow-database workflow with a managed migration history. Prisma 7's TypeScript/WASM rewrite cut bundle size sharply. No built-in rollback — forward-only with flags.

Managed, forward-only

Declarative

Atlas — auto-reverse

Declarative HCL, Terraform-like, and the one tool here that auto-calculates reverse migrations. Reach for Atlas (or JVM-based Liquibase, which pairs explicit rollback scripts) when automated rollback is a hard requirement.

When rollback is mandatory

Language-agnostic

Flyway / Liquibase

Versioned SQL files (Flyway, ~400K downloads, approximate) or JVM-based changelogs with paired rollback scripts (Liquibase). The pick when migrations must be language-agnostic and run outside the application runtime.

Outside the app runtime

The tooling choice is downstream of two questions: do you want SQL to stay visible (Drizzle, Flyway) or abstracted (Prisma, Liquibase), and do you need automated reverse migrations (Atlas, Liquibase) or are forward-only with flags acceptable? For most teams on the default stack, Drizzle or Prisma with forward-only discipline is the pragmatic answer — and if you are standing up a new Postgres-backed product, our web development engagements start from exactly these defaults so the schema is sound before the first table fills up.

09 — ConclusionThe decisions that are expensive to reverse.

Schema design, June 2026

The fundamentals held; the toolkit around them got better.

The throughline of every decision in this reference is the same: model honestly, normalize first, and change a default only on evidence you can point to. The numbers — 20–40x for BIGSERIAL reads, roughly 160x for bare JSONB in one benchmark, 3–5x for database-per-tenant — are not commandments. They are reasons to set a defensible default and signals for when to revisit it.

What genuinely shifted in 2026 is the platform underneath. PostgreSQL 18 turned several schema decisions that used to need application code or extensions into one-line DDL: native uuidv7(), temporal WITHOUT OVERLAPS constraints, virtual generated columns by default, and B-tree skip scan. The forward-looking move is to learn what your current Postgres version actually changed and let the database absorb the safeguards you used to hand-write — then keep the discipline that no benchmark can replace.

Schema is the one layer of an application where a wrong call gets more expensive every day it survives. Spend the deliberate hour on it up front, lean on PostgreSQL 18 where it now does the work for you, and treat every denormalization, every JSONB column, and every tenancy choice as a hypothesis you measure rather than a verdict you assume.

Database Schema Design in 2026: Engineering Reference