# PostgreSQL Schema Design Reference

## Table of Contents

1. [Normalization Quick Guide](#normalization-quick-guide)
2. [Data Types Deep Dive](#data-types-deep-dive)
   - [JSONB](#jsonb)
   - [Arrays](#arrays)
   - [Range Types](#range-types)
   - [Composite Types](#composite-types)
   - [Domain Types](#domain-types)
3. [Constraints](#constraints)
4. [Generated Columns](#generated-columns)
5. [Table Inheritance and Partitioning](#table-inheritance-and-partitioning)
6. [Row-Level Security](#row-level-security)

---

## Normalization Quick Guide

### 1NF - First Normal Form
Each column holds atomic values; no repeating groups; each row uniquely identified.

```sql
-- Violates 1NF: phone_numbers is a comma-separated list
CREATE TABLE contacts_bad (
    id      integer PRIMARY KEY,
    name    text,
    phones  text   -- "555-1234, 555-5678"
);

-- 1NF compliant: one phone per row
CREATE TABLE contacts (
    id   integer PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE contact_phones (
    contact_id integer REFERENCES contacts(id),
    phone      text NOT NULL,
    PRIMARY KEY (contact_id, phone)
);
```

### 2NF - Second Normal Form
Must be 1NF. Every non-key column depends on the *entire* primary key (eliminates partial dependencies in composite-key tables).

```sql
-- Violates 2NF: product_name depends only on product_id, not the full key
CREATE TABLE order_items_bad (
    order_id     integer,
    product_id   integer,
    product_name text,    -- partial dependency
    quantity     integer,
    PRIMARY KEY (order_id, product_id)
);

-- 2NF compliant: move product_name to products table
CREATE TABLE products (
    id   integer PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE order_items (
    order_id   integer,
    product_id integer REFERENCES products(id),
    quantity   integer NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
```

### 3NF - Third Normal Form
Must be 2NF. No transitive dependencies (non-key columns depending on other non-key columns).

```sql
-- Violates 3NF: zip_code -> city, zip_code -> state (transitive)
CREATE TABLE employees_bad (
    id        integer PRIMARY KEY,
    name      text,
    zip_code  text,
    city      text,   -- depends on zip_code, not id
    state     text    -- depends on zip_code, not id
);

-- 3NF compliant
CREATE TABLE zip_codes (
    zip   text PRIMARY KEY,
    city  text NOT NULL,
    state text NOT NULL
);

CREATE TABLE employees (
    id       integer PRIMARY KEY,
    name     text NOT NULL,
    zip_code text REFERENCES zip_codes(zip)
);
```

### When to Denormalize

Denormalization trades write complexity for read performance. Justify it with EXPLAIN ANALYZE evidence, not intuition.

| Scenario | Denormalization Approach |
|----------|--------------------------|
| Frequent aggregate reads | Materialized view or stored summary column |
| Immutable reference data | Embed directly (e.g., country name at order time) |
| Hot join path with no writes | Redundant column with trigger to keep in sync |
| Reporting / OLAP workload | Star schema, wide fact tables |

```sql
-- Example: store calculated total on order to avoid summing line items every read
ALTER TABLE orders ADD COLUMN total_cents integer NOT NULL DEFAULT 0;

-- Keep in sync via trigger
CREATE FUNCTION recalc_order_total() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    UPDATE orders
    SET total_cents = (
        SELECT COALESCE(SUM(unit_price_cents * quantity), 0)
        FROM order_items
        WHERE order_id = COALESCE(NEW.order_id, OLD.order_id)
    )
    WHERE id = COALESCE(NEW.order_id, OLD.order_id);
    RETURN NEW;
END;
$$;

CREATE TRIGGER trg_order_items_total
AFTER INSERT OR UPDATE OR DELETE ON order_items
FOR EACH ROW EXECUTE FUNCTION recalc_order_total();
```

---

## Data Types Deep Dive

### JSONB

JSONB stores JSON as a binary decomposed format. Supports indexing; operators work directly on the stored value. Use `jsonb` over `json` unless you need to preserve key order or duplicate keys.

#### Operators

```sql
-- @>  containment: does left contain right?
SELECT * FROM products WHERE attributes @> '{"color": "red"}';

-- ->  extract field as jsonb
SELECT data -> 'address' FROM users;

-- ->> extract field as text
SELECT data ->> 'email' FROM users;

-- #>  extract at path as jsonb
SELECT data #> '{address, city}' FROM users;

-- #>> extract at path as text
SELECT data #>> '{address, city}' FROM users;

-- jsonb_path_query (SQL/JSON path, PG12+)
SELECT jsonb_path_query(data, '$.orders[*].amount ? (@ > 100)') FROM users;

-- jsonb_path_exists
SELECT * FROM users WHERE jsonb_path_exists(data, '$.tags[*] ? (@ == "premium")');

-- Modifying JSONB
UPDATE users SET data = data || '{"verified": true}';         -- merge/overwrite key
UPDATE users SET data = data - 'temp_field';                   -- remove key
UPDATE users SET data = jsonb_set(data, '{address,zip}', '"90210"');
```

#### Indexing JSONB

```sql
-- GIN default: supports @>, ?, ?|, ?& on all keys and values
CREATE INDEX idx_products_attrs ON products USING gin(attributes);

-- GIN jsonb_path_ops: supports only @> but uses less space and is faster for containment
CREATE INDEX idx_products_attrs_path ON products USING gin(attributes jsonb_path_ops);

-- B-tree on extracted scalar: for equality/range on a known field
CREATE INDEX idx_users_email ON users ((data ->> 'email'));

-- B-tree on cast extracted value
CREATE INDEX idx_orders_amount ON orders ((data ->> 'amount')::numeric);
```

#### When to Use JSONB vs Relational Columns

| Use JSONB When | Use Relational Columns When |
|----------------|----------------------------|
| Schema varies per row (EAV alternative) | Column is queried in WHERE, JOIN, or ORDER BY frequently |
| Optional metadata with sparse keys | Column participates in foreign key |
| Storing external API payloads as-is | Strong type enforcement required |
| Prototyping before schema stabilizes | Aggregate functions (SUM, AVG) on the field |

---

### Arrays

PostgreSQL native arrays allow storing multiple values of the same type in a single column.

```sql
CREATE TABLE articles (
    id   integer PRIMARY KEY,
    tags text[]
);

INSERT INTO articles (id, tags) VALUES (1, ARRAY['postgres', 'sql', 'performance']);
INSERT INTO articles (id, tags) VALUES (2, '{"nosql","databases"}');  -- literal syntax
```

#### Operators

```sql
-- ANY: value matches any element
SELECT * FROM articles WHERE 'postgres' = ANY(tags);

-- ALL: condition holds for every element
SELECT * FROM articles WHERE 5 > ALL(ARRAY[1,2,3,4]);

-- @>  contains (left contains right)
SELECT * FROM articles WHERE tags @> ARRAY['sql', 'postgres'];

-- <@  is contained by
SELECT * FROM articles WHERE ARRAY['sql'] <@ tags;

-- &&  overlap (share at least one element)
SELECT * FROM articles WHERE tags && ARRAY['postgres', 'mysql'];

-- Appending / removing
UPDATE articles SET tags = tags || ARRAY['new-tag'] WHERE id = 1;
UPDATE articles SET tags = array_remove(tags, 'old-tag') WHERE id = 1;

-- Array length and access
SELECT array_length(tags, 1), tags[1] FROM articles;  -- 1-indexed
```

#### Indexing Arrays

```sql
-- GIN index for @>, <@, &&, ANY equality
CREATE INDEX idx_articles_tags ON articles USING gin(tags);
```

#### Arrays vs Junction Tables

| Use Arrays When | Use Junction Tables When |
|-----------------|--------------------------|
| List is small and bounded | Elements have their own attributes |
| No referential integrity needed | Many-to-many with query filters on the joined entity |
| Queries use containment/overlap operators | Need to query "all articles for a tag" efficiently |
| Ordering within the list matters | Cardinality is high or unbounded |

---

### Range Types

Range types represent a range of values of a base type. Built-in range types: `int4range`, `int8range`, `numrange`, `tsrange`, `tstzrange`, `daterange`.

```sql
CREATE TABLE room_bookings (
    id          serial PRIMARY KEY,
    room_id     integer NOT NULL,
    booked_at   tsrange NOT NULL
);

INSERT INTO room_bookings (room_id, booked_at) VALUES
    (1, '[2024-03-01 09:00, 2024-03-01 11:00)'),  -- inclusive start, exclusive end
    (1, '[2024-03-01 14:00, 2024-03-01 16:00)');
```

#### Operators

```sql
-- && overlap
SELECT * FROM room_bookings WHERE booked_at && '[2024-03-01 10:00, 2024-03-01 12:00)';

-- @> contains a point
SELECT * FROM room_bookings WHERE booked_at @> '2024-03-01 10:30'::timestamptz;

-- <@ is contained by
SELECT * FROM room_bookings WHERE booked_at <@ '[2024-03-01 00:00, 2024-03-02 00:00)';

-- Boundary extraction
SELECT lower(booked_at), upper(booked_at) FROM room_bookings;

-- Adjacency
SELECT * FROM schedules WHERE period1 -|- period2;  -- ranges are adjacent

-- daterange example
SELECT * FROM subscriptions
WHERE validity @> CURRENT_DATE::date;
```

#### Exclusion Constraints (prevent overlaps)

```sql
-- Requires btree_gist extension for non-geometric types
CREATE EXTENSION IF NOT EXISTS btree_gist;

ALTER TABLE room_bookings
ADD CONSTRAINT no_double_booking
EXCLUDE USING gist (room_id WITH =, booked_at WITH &&);

-- Multi-column exclusion with additional equality condition
ALTER TABLE room_bookings
ADD CONSTRAINT no_double_booking_per_tenant
EXCLUDE USING gist (tenant_id WITH =, room_id WITH =, booked_at WITH &&);
```

#### Custom Range Types

```sql
CREATE TYPE floatrange AS RANGE (subtype = float8, subtype_diff = float8mi);

SELECT '[1.5, 2.5]'::floatrange @> 2.0;  -- true
```

---

### Composite Types

Composite types group multiple fields into a single reusable type.

```sql
-- Define a composite type
CREATE TYPE address AS (
    street  text,
    city    text,
    state   text,
    zip     text
);

-- Use in a table
CREATE TABLE customers (
    id              serial PRIMARY KEY,
    name            text NOT NULL,
    billing_address address,
    shipping_address address
);

-- Insert and access
INSERT INTO customers (name, billing_address)
VALUES ('Acme Corp', ROW('123 Main St', 'Springfield', 'IL', '62701'));

SELECT (billing_address).city FROM customers;
SELECT * FROM customers WHERE (billing_address).state = 'IL';

-- Update a field within composite
UPDATE customers
SET billing_address.zip = '62702'
WHERE id = 1;
```

Composite types are also implicitly created for every table and are used as the row type in PL/pgSQL functions.

---

### Domain Types

Domains are named data types with optional constraints, providing centralized validation logic.

```sql
-- Email domain with CHECK constraint
CREATE DOMAIN email_address AS text
CHECK (VALUE ~ '^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$');

-- Non-negative money (in cents)
CREATE DOMAIN positive_cents AS integer
CHECK (VALUE > 0);

-- Non-empty text
CREATE DOMAIN nonempty_text AS text
CHECK (VALUE <> '' AND VALUE IS NOT NULL)
NOT NULL;

-- Use domains in tables
CREATE TABLE invoices (
    id            serial PRIMARY KEY,
    customer_email email_address NOT NULL,
    amount_cents   positive_cents NOT NULL,
    description    nonempty_text
);

-- Domain constraints can be altered without modifying tables
ALTER DOMAIN positive_cents ADD CONSTRAINT allow_zero CHECK (VALUE >= 0);
```

---

## Constraints

### CHECK Constraints

```sql
-- Column-level
CREATE TABLE products (
    id         serial PRIMARY KEY,
    price      numeric CHECK (price >= 0),
    status     text CHECK (status IN ('active', 'inactive', 'archived'))
);

-- Table-level (can reference multiple columns)
CREATE TABLE discounts (
    id              serial PRIMARY KEY,
    discount_pct    numeric,
    discount_flat   numeric,
    CONSTRAINT one_discount_type CHECK (
        (discount_pct IS NULL) != (discount_flat IS NULL)
    )
);

-- Named constraint for clearer error messages
ALTER TABLE orders ADD CONSTRAINT chk_positive_total
CHECK (total_cents > 0);
```

### UNIQUE Constraints

```sql
-- Single column
CREATE TABLE users (
    id    serial PRIMARY KEY,
    email text UNIQUE NOT NULL
);

-- Composite unique
CREATE TABLE team_members (
    team_id integer,
    user_id integer,
    UNIQUE (team_id, user_id)
);

-- Partial unique (unique only within a condition)
CREATE UNIQUE INDEX idx_users_active_email
ON users (email) WHERE deleted_at IS NULL;
```

### EXCLUDE Constraints

Exclusion constraints generalize UNIQUE by allowing any operator, not just equality. Require a GiST or SP-GiST index.

```sql
-- No two bookings for the same room may overlap
CREATE EXTENSION btree_gist;

CREATE TABLE bookings (
    id      serial PRIMARY KEY,
    room    text,
    during  tsrange,
    EXCLUDE USING gist (room WITH =, during WITH &&)
);
```

### Foreign Key Options

```sql
CREATE TABLE orders (
    id          serial PRIMARY KEY,
    customer_id integer,

    -- ON DELETE options:
    -- CASCADE     - delete order when customer deleted
    -- SET NULL    - set customer_id to NULL
    -- SET DEFAULT - set to column default
    -- RESTRICT    - error if customer has orders (default behavior)
    -- NO ACTION   - like RESTRICT but deferred-constraint-friendly

    CONSTRAINT fk_orders_customer
        FOREIGN KEY (customer_id)
        REFERENCES customers(id)
        ON DELETE RESTRICT
        ON UPDATE CASCADE
);
```

### Deferrable Constraints

Deferrable constraints are checked at transaction commit instead of statement time, enabling circular references and bulk data loading.

```sql
-- Define as deferrable
ALTER TABLE employees ADD CONSTRAINT fk_manager
FOREIGN KEY (manager_id) REFERENCES employees(id)
DEFERRABLE INITIALLY DEFERRED;

-- Or defer within a transaction
BEGIN;
SET CONSTRAINTS fk_manager DEFERRED;
-- Insert records that temporarily violate the constraint
INSERT INTO employees (id, manager_id, name) VALUES (1, 2, 'Alice');
INSERT INTO employees (id, manager_id, name) VALUES (2, 1, 'Bob');
COMMIT;  -- constraint checked here, both records now exist
```

---

## Generated Columns

Generated columns compute their value automatically from other columns. PG12+ supports STORED (persisted to disk). PG16+ added experimental VIRTUAL (computed on read, not stored).

```sql
-- STORED generated column
CREATE TABLE measurements (
    id            serial PRIMARY KEY,
    value_celsius numeric NOT NULL,
    -- Automatically computed and stored
    value_fahrenheit numeric GENERATED ALWAYS AS (value_celsius * 9/5 + 32) STORED
);

INSERT INTO measurements (value_celsius) VALUES (100);
SELECT value_celsius, value_fahrenheit FROM measurements;
-- Returns: 100, 212

-- Full name from parts
CREATE TABLE persons (
    id         serial PRIMARY KEY,
    first_name text NOT NULL,
    last_name  text NOT NULL,
    full_name  text GENERATED ALWAYS AS (first_name || ' ' || last_name) STORED
);

-- Searchable slug from title
CREATE TABLE posts (
    id    serial PRIMARY KEY,
    title text NOT NULL,
    slug  text GENERATED ALWAYS AS (
        lower(regexp_replace(trim(title), '[^a-zA-Z0-9]+', '-', 'g'))
    ) STORED
);

CREATE INDEX idx_posts_slug ON posts(slug);
```

Restrictions: generation expression cannot reference other generated columns, user-defined functions must be IMMUTABLE, cannot have a DEFAULT, cannot be written to directly.

---

## Table Inheritance and Partitioning

### Traditional Inheritance (pre-PG10)

```sql
CREATE TABLE events (
    id         bigserial PRIMARY KEY,
    occurred_at timestamptz NOT NULL,
    payload    jsonb
);

CREATE TABLE click_events (
    element_id text NOT NULL
) INHERITS (events);

-- Queries on parent include child rows
SELECT count(*) FROM events;  -- includes click_events rows
SELECT count(*) FROM ONLY events;  -- excludes child tables
```

Traditional inheritance is largely superseded by declarative partitioning for the partition use case.

### Declarative Partitioning (PG10+)

#### Range Partitioning

```sql
CREATE TABLE events (
    id          bigint NOT NULL,
    occurred_at timestamptz NOT NULL,
    payload     jsonb
) PARTITION BY RANGE (occurred_at);

CREATE TABLE events_2024_q1 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

CREATE TABLE events_2024_q2 PARTITION OF events
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

-- Default partition catches unmatched rows
CREATE TABLE events_default PARTITION OF events DEFAULT;

-- Index on partition key (propagates to all partitions)
CREATE INDEX ON events (occurred_at);
```

#### List Partitioning

```sql
CREATE TABLE orders (
    id      bigint NOT NULL,
    region  text NOT NULL,
    total   numeric
) PARTITION BY LIST (region);

CREATE TABLE orders_us PARTITION OF orders FOR VALUES IN ('US', 'CA');
CREATE TABLE orders_eu PARTITION OF orders FOR VALUES IN ('DE', 'FR', 'GB');
CREATE TABLE orders_other PARTITION OF orders DEFAULT;
```

#### Hash Partitioning

```sql
CREATE TABLE user_events (
    user_id bigint NOT NULL,
    event   text
) PARTITION BY HASH (user_id);

CREATE TABLE user_events_0 PARTITION OF user_events FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE user_events_1 PARTITION OF user_events FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE user_events_2 PARTITION OF user_events FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE user_events_3 PARTITION OF user_events FOR VALUES WITH (MODULUS 4, REMAINDER 3);
```

#### Sub-partitioning

```sql
CREATE TABLE metrics (
    tenant_id integer NOT NULL,
    recorded_at date NOT NULL,
    value numeric
) PARTITION BY LIST (tenant_id);

CREATE TABLE metrics_tenant1 PARTITION OF metrics
FOR VALUES IN (1) PARTITION BY RANGE (recorded_at);

CREATE TABLE metrics_tenant1_2024 PARTITION OF metrics_tenant1
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
```

---

## Row-Level Security

RLS restricts which rows a user can see or modify. Enabled per table; policies define the filter predicate.

### Enabling RLS

```sql
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

-- Without this, the table owner bypasses all policies!
ALTER TABLE documents FORCE ROW LEVEL SECURITY;
```

### Policy Types

```sql
-- PERMISSIVE (default): policies are OR'd together; user sees rows matching ANY policy
-- RESTRICTIVE: policies are AND'd; user must match ALL restrictive policies

-- Allow users to see only their own rows
CREATE POLICY user_isolation ON documents
AS PERMISSIVE
FOR ALL
TO application_role
USING (owner_id = current_setting('app.user_id')::integer);

-- Separate read and write policies
CREATE POLICY documents_select ON documents
FOR SELECT
TO application_role
USING (owner_id = current_setting('app.user_id')::integer OR is_public = true);

CREATE POLICY documents_insert ON documents
FOR INSERT
TO application_role
WITH CHECK (owner_id = current_setting('app.user_id')::integer);

CREATE POLICY documents_update ON documents
FOR UPDATE
TO application_role
USING (owner_id = current_setting('app.user_id')::integer)
WITH CHECK (owner_id = current_setting('app.user_id')::integer);

CREATE POLICY documents_delete ON documents
FOR DELETE
TO application_role
USING (owner_id = current_setting('app.user_id')::integer);
```

### Multi-Tenant Pattern

```sql
-- Set tenant context at session start (via connection pooler or app middleware)
SET app.tenant_id = '42';

-- RLS policy using session variable
CREATE POLICY tenant_isolation ON orders
USING (tenant_id = current_setting('app.tenant_id')::integer);

-- Superuser bypass: use a dedicated non-superuser role for the app
CREATE ROLE app_user NOLOGIN;
GRANT SELECT, INSERT, UPDATE, DELETE ON orders TO app_user;

-- Service role that bypasses RLS (for admin tasks)
CREATE ROLE service_role BYPASSRLS LOGIN;
```

### RESTRICTIVE Policies

```sql
-- Combine PERMISSIVE (what user owns) AND RESTRICTIVE (not deleted)
CREATE POLICY only_active ON documents
AS RESTRICTIVE
FOR ALL
USING (deleted_at IS NULL);

CREATE POLICY owner_access ON documents
AS PERMISSIVE
FOR ALL
USING (owner_id = current_setting('app.user_id')::integer);

-- Result: user sees rows where deleted_at IS NULL AND owner_id matches
```

### Common Pitfalls

| Pitfall | Fix |
|---------|-----|
| Table owner bypasses RLS silently | Add `FORCE ROW LEVEL SECURITY` to the table |
| No policy defined means no rows visible | Always define at least one PERMISSIVE policy per operation |
| Superuser always bypasses RLS | Use a non-superuser application role |
| `current_user` vs session variable | Use `current_setting()` for app-set context; `current_user` reflects DB login role |
| Performance: predicate not pushed down | Create index on the tenant/owner column used in policy USING clause |

```sql
-- Verify your policies are working
SET ROLE app_user;
SET app.user_id = '1';
SELECT count(*) FROM documents;  -- should only return user 1's documents
RESET ROLE;
```