# PostgreSQL Configuration & Tuning Reference

## Table of Contents

1. [Memory Settings](#memory-settings)
   - shared_buffers
   - work_mem
   - maintenance_work_mem
   - effective_cache_size
   - huge_pages
2. [WAL & Checkpoint Settings](#wal--checkpoint-settings)
   - wal_level
   - wal_buffers
   - checkpoint_completion_target
   - max_wal_size and min_wal_size
   - full_page_writes
3. [Query Planner Settings](#query-planner-settings)
   - random_page_cost and seq_page_cost
   - effective_io_concurrency
   - JIT compilation
4. [Parallelism Settings](#parallelism-settings)
5. [Connection Settings](#connection-settings)
6. [Logging](#logging)
7. [OLTP vs OLAP Profiles](#oltp-vs-olap-profiles)
8. [Extensions](#extensions)
   - pg_stat_statements
   - pg_trgm
   - PostGIS
   - timescaledb
   - pgcrypto
   - auto_explain

---

## Memory Settings

### shared_buffers

The PostgreSQL buffer cache: how much memory the server reserves for caching data pages.

```ini
shared_buffers = 8GB   # Recommended: 25% of total RAM
```

Rules of thumb:
- Start at 25% of RAM. Going above 40% rarely helps and can hurt because the OS page cache also buffers the same pages.
- On dedicated database servers, 25% is conservative but safe. Profile with `pg_buffercache` to measure actual cache hit rates.
- Requires a server restart to take effect.

Check cache hit ratio:

```sql
SELECT
    sum(heap_blks_hit)  AS heap_hit,
    sum(heap_blks_read) AS heap_read,
    round(
        sum(heap_blks_hit)::numeric /
        nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) * 100, 2
    ) AS hit_ratio_pct
FROM pg_statio_user_tables;
-- Target: > 99% for OLTP, > 95% for OLAP
```

Identify which tables consume the most buffer space (requires `pg_buffercache`):

```sql
CREATE EXTENSION pg_buffercache;

SELECT
    relname,
    count(*) * 8192 / 1024 / 1024 AS cached_mb,
    round(count(*) * 100.0 / (SELECT count(*) FROM pg_buffercache), 2) AS pct_of_cache
FROM pg_buffercache bc
JOIN pg_class c ON bc.relfilenode = c.relfilenode
WHERE c.relkind = 'r'
GROUP BY relname
ORDER BY cached_mb DESC
LIMIT 20;
```

### work_mem

Memory granted per sort, hash, or merge operation. Each query node (sort, hash join, hash aggregate) can use up to `work_mem` individually.

```ini
work_mem = 64MB   # Default 4MB is usually too low
```

Critical nuance: if a query has 5 sort nodes and 20 parallel workers, it can consume `5 * 20 * work_mem` = 100x `work_mem`. For a 32GB server running 100 connections, setting `work_mem = 320MB` is catastrophic.

Sizing strategy:
1. Estimate concurrent queries: `max_connections * avg_active_fraction`
2. Reserve memory for OS + shared_buffers + maintenance_work_mem
3. Divide remainder: `work_mem = remaining / (active_connections * avg_nodes_per_query)`

For most OLTP systems: 16-64MB. For analytics: 256MB-1GB with fewer connections.

Override per session for specific heavy queries:

```sql
SET work_mem = '512MB';
SELECT ... FROM large_table ORDER BY ...;
RESET work_mem;
```

Monitor actual temporary file creation to detect under-allocation:

```ini
log_temp_files = 0   # Log all temp files (0 = log everything, N = only above N bytes)
```

```sql
-- Check existing temp file usage stats
SELECT query, temp_blks_written
FROM pg_stat_statements
WHERE temp_blks_written > 0
ORDER BY temp_blks_written DESC
LIMIT 10;
```

### maintenance_work_mem

Memory for maintenance operations: VACUUM, ANALYZE, CREATE INDEX, ALTER TABLE ADD FOREIGN KEY, CLUSTER.

```ini
maintenance_work_mem = 2GB   # Recommended: up to 10% RAM or 1-4GB
```

Larger values dramatically speed up `CREATE INDEX` and VACUUM on large tables. Unlike `work_mem`, there are never many concurrent maintenance operations, so you can set this aggressively.

Override per session before a large index build:

```sql
SET maintenance_work_mem = '4GB';
CREATE INDEX CONCURRENTLY idx_events_created_at ON events (created_at);
RESET maintenance_work_mem;
```

### effective_cache_size

A hint to the query planner about total memory available for caching (RAM + OS page cache). It does not allocate memory; it only influences cost estimates.

```ini
effective_cache_size = 24GB   # Recommended: 75% of total RAM
```

Higher values make the planner prefer index scans (which benefit from caching) over sequential scans. Too low a value causes the planner to choose sequential scans even when an index scan would be faster.

### huge_pages

Huge pages (2MB pages on Linux instead of 4KB) reduce TLB pressure and can improve throughput on large `shared_buffers` values (above 8GB).

```ini
huge_pages = try    # 'try' falls back gracefully; use 'on' to enforce
```

Linux OS setup (must be done before starting PostgreSQL):

```bash
# Calculate pages needed: shared_buffers / 2MB
# For shared_buffers = 16GB: 16384 MB / 2 MB = 8192 huge pages, add 10% buffer
echo 9000 > /proc/sys/vm/nr_hugepages

# Persist across reboots
echo "vm.nr_hugepages = 9000" >> /etc/sysctl.conf
sysctl -p

# Verify allocation
grep HugePages /proc/meminfo
```

---

## WAL & Checkpoint Settings

### wal_level

Controls how much information is written to WAL.

```ini
wal_level = replica    # Minimum for streaming replication
wal_level = logical    # Required for logical replication (writes more)
```

`wal_level = minimal` disables replication and reduces WAL volume slightly. Use only for standalone servers where you never need PITR.

### wal_buffers

Memory for WAL writes before flushing to disk. PostgreSQL auto-tunes this to 1/32 of `shared_buffers`, capped at 16MB.

```ini
wal_buffers = 64MB   # Manual override; auto value is usually fine
```

Rarely needs manual tuning. Increase only if you see contention on `WALBufMappingLock` in `pg_stat_activity`.

### checkpoint_completion_target

Fraction of the checkpoint interval over which to spread checkpoint I/O. Reduces I/O spikes at checkpoint time.

```ini
checkpoint_completion_target = 0.9   # Recommended (default is 0.9 in PG14+)
```

With `max_wal_size = 4GB` and `checkpoint_completion_target = 0.9`, PostgreSQL spreads writes over 90% of the checkpoint interval instead of flushing all at once.

### max_wal_size and min_wal_size

Control WAL retention between checkpoints. Larger values reduce checkpoint frequency (less I/O) at the cost of more WAL on disk and longer crash recovery time.

```ini
min_wal_size = 1GB     # Minimum WAL to retain (default 80MB)
max_wal_size = 8GB     # Triggers checkpoint when exceeded (default 1GB)
```

For write-heavy workloads, increase `max_wal_size` to reduce checkpoint frequency. Monitor checkpoint frequency:

```sql
SELECT checkpoints_timed, checkpoints_req, checkpoint_write_time, checkpoint_sync_time
FROM pg_stat_bgwriter;
-- checkpoints_req >> checkpoints_timed means max_wal_size is too small
```

### full_page_writes

After a checkpoint, PostgreSQL writes the full page image of a modified page the first time it is touched. This protects against torn page writes when the OS crashes mid-write.

```ini
full_page_writes = on   # NEVER disable this
```

Disabling `full_page_writes` can cause unrecoverable data corruption after an OS crash. The only safe way to reduce full-page write overhead is to use a filesystem or storage that guarantees atomic page writes (ZFS, some SAN configurations) and you fully understand the implications.

---

## Query Planner Settings

### random_page_cost and seq_page_cost

Control the planner's cost model for I/O. Lower values make the planner favor the corresponding access method.

```ini
# For NVMe/SSD storage:
random_page_cost = 1.1
seq_page_cost = 1.0

# For traditional HDD:
random_page_cost = 4.0
seq_page_cost = 1.0
```

The default `random_page_cost = 4.0` is calibrated for spinning disk. On SSD, it causes the planner to undervalue index scans, leading to unnecessary sequential scans. Always set `random_page_cost = 1.1` on SSD-based servers.

Override per session to diagnose planner choices:

```sql
SET random_page_cost = 1.1;
EXPLAIN ANALYZE SELECT ...;
```

### effective_io_concurrency

Number of concurrent I/O operations the planner assumes the storage can handle. Affects bitmap index scan prefetching.

```ini
effective_io_concurrency = 200   # NVMe SSD (high parallelism)
effective_io_concurrency = 2     # Traditional HDD (low parallelism)
effective_io_concurrency = 1     # NFS/SAN (conservative)
```

### JIT Compilation

JIT (Just-In-Time compilation via LLVM) can speed up CPU-intensive queries (complex aggregations, many expressions) but adds compilation overhead that hurts short OLTP queries.

```ini
jit = on                  # Enable JIT globally (default on in PG11+)
jit_above_cost = 100000   # Only JIT-compile queries above this cost
jit_optimize_above_cost = 500000  # Apply expensive optimizations above this cost
jit_inline_above_cost = 500000    # Inline functions above this cost
```

For OLTP workloads where queries are fast and simple:

```ini
jit = off   # Disable entirely to avoid overhead
```

Check if JIT was used in a query:

```sql
EXPLAIN (ANALYZE, VERBOSE, FORMAT TEXT)
SELECT sum(total) FROM orders WHERE created_at > now() - interval '1 year';
-- Look for "JIT:" section in output
```

---

## Parallelism Settings

PostgreSQL can parallelize sequential scans, aggregations, joins, and index scans.

```ini
# Total background workers available to the instance
max_worker_processes = 16           # Default 8; should be >= CPU cores

# Maximum parallel workers available for queries at any time
max_parallel_workers = 8            # Default 8; cap at physical CPU cores

# Workers per individual query node
max_parallel_workers_per_gather = 4 # Default 2; practical limit 4-8

# Minimum table size before considering parallel scan
min_parallel_table_scan_size = 8MB  # Default; lower to enable on smaller tables
min_parallel_index_scan_size = 512kB

# Include leader process in parallel work (default on)
parallel_leader_participation = on
```

Force parallelism for testing (dangerous in production):

```sql
SET max_parallel_workers_per_gather = 8;
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
EXPLAIN ANALYZE SELECT count(*) FROM large_table;
```

Disable parallelism for a session (useful when debugging):

```sql
SET max_parallel_workers_per_gather = 0;
```

---

## Connection Settings

### max_connections

PostgreSQL creates one process per connection. High connection counts waste memory and cause lock contention.

```ini
max_connections = 200   # Keep below 300; use pgBouncer for more
```

Each idle connection consumes ~5MB RAM just for the process overhead. With `work_mem = 64MB` and a sort-heavy query, one connection can briefly use 64MB * N sort nodes.

Use PgBouncer in transaction mode for OLTP:

```ini
# pgbouncer.ini
pool_mode = transaction
max_client_conn = 2000
default_pool_size = 20   # Connections to PostgreSQL per database/user pair
```

```ini
# Reserve connections for superusers (DBA access during emergencies)
superuser_reserved_connections = 5
```

### TCP Keepalives

Detect dead connections (e.g., after network partition) without relying on the application:

```ini
tcp_keepalives_idle = 60      # Start keepalives after 60s idle
tcp_keepalives_interval = 10  # Retry every 10s
tcp_keepalives_count = 6      # Drop connection after 6 failed probes (1 minute)
```

Monitor current connections and their state:

```sql
SELECT
    state,
    count(*),
    max(now() - state_change) AS longest_in_state
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY state
ORDER BY count DESC;

-- Find idle connections older than 10 minutes
SELECT pid, usename, application_name, state, state_change, query
FROM pg_stat_activity
WHERE state = 'idle'
  AND state_change < now() - interval '10 minutes';
```

---

## Logging

### Slow Query Logging

```ini
log_min_duration_statement = 1000   # Log queries taking > 1 second (ms)
                                     # Set to 0 to log all; -1 to disable
```

### Statement-Level Logging

```ini
log_statement = 'ddl'   # Recommended for most production servers
# Options: none | ddl | mod | all
# 'ddl'  = CREATE, DROP, ALTER, TRUNCATE
# 'mod'  = ddl + INSERT, UPDATE, DELETE, COPY
# 'all'  = everything (very verbose, for debugging only)
```

### Lock Logging

```ini
log_lock_waits = on       # Log if a query waits for a lock
deadlock_timeout = 1s     # Time before checking for deadlock (and logging wait)
```

Deadlocks are logged automatically at `log_error_verbosity` level. Lock waits (not deadlocks) require `log_lock_waits = on`:

```ini
# Also useful for identifying lock contention:
log_min_duration_statement = 500    # Catch queries slow due to lock waits
```

Query current lock waits:

```sql
SELECT
    blocked.pid                   AS blocked_pid,
    blocked.query                 AS blocked_query,
    blocking.pid                  AS blocking_pid,
    blocking.query                AS blocking_query,
    now() - blocked.query_start   AS wait_duration
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
    ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE blocked.cardinality(pg_blocking_pids(blocked.pid)) > 0;
```

### auto_explain

Automatically log EXPLAIN ANALYZE for slow queries without modifying application code:

```ini
# Load as a shared library (requires restart)
shared_preload_libraries = 'pg_stat_statements, auto_explain'

# auto_explain settings (in postgresql.conf or per session)
auto_explain.log_min_duration = 5000    # Log plans for queries > 5 seconds
auto_explain.log_analyze = on           # Include ANALYZE (actual vs estimated rows)
auto_explain.log_buffers = on           # Include buffer usage
auto_explain.log_format = text          # text | json | yaml | xml
auto_explain.log_verbose = off          # Include column-level output (very noisy)
auto_explain.log_nested_statements = off # Exclude PL/pgSQL internal queries
auto_explain.sample_rate = 1.0          # Sample 100% of queries; set lower under load
```

Enable per session without restart:

```sql
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '1s';
SET auto_explain.log_analyze = true;
```

---

## OLTP vs OLAP Profiles

Two complete configuration profiles showing key differences.

### OLTP Profile (32GB RAM, NVMe SSD, 200 connections)

```ini
# Memory
shared_buffers = 8GB
work_mem = 32MB
maintenance_work_mem = 1GB
effective_cache_size = 24GB
huge_pages = try

# WAL & Checkpoints
wal_level = replica
wal_buffers = 64MB
checkpoint_completion_target = 0.9
max_wal_size = 4GB
min_wal_size = 1GB
full_page_writes = on

# Planner
random_page_cost = 1.1
seq_page_cost = 1.0
effective_io_concurrency = 200
jit = off                        # Short queries don't benefit; avoid overhead

# Parallelism - conservative for OLTP
max_worker_processes = 16
max_parallel_workers = 4
max_parallel_workers_per_gather = 2

# Connections
max_connections = 200
superuser_reserved_connections = 5
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 6

# Logging
log_min_duration_statement = 500
log_statement = 'ddl'
log_lock_waits = on
deadlock_timeout = 1s

# Extensions
shared_preload_libraries = 'pg_stat_statements, auto_explain'
pg_stat_statements.track = all
auto_explain.log_min_duration = 2000
auto_explain.log_analyze = on
auto_explain.log_buffers = on
```

### OLAP Profile (128GB RAM, NVMe SSD, 20 connections, analytics workload)

```ini
# Memory - larger allocations per query
shared_buffers = 32GB
work_mem = 2GB                   # Large sorts and hash joins for analytics
maintenance_work_mem = 4GB
effective_cache_size = 96GB
huge_pages = on

# WAL & Checkpoints - less frequent, larger checkpoints
wal_level = replica
wal_buffers = 64MB
checkpoint_completion_target = 0.9
max_wal_size = 16GB              # Fewer checkpoints for write-heavy ETL
min_wal_size = 4GB
full_page_writes = on

# Planner - favor parallel plans and large scans
random_page_cost = 1.1
seq_page_cost = 1.0
effective_io_concurrency = 200
jit = on                         # CPU-heavy aggregations benefit from JIT
jit_above_cost = 50000           # Lower threshold to engage JIT sooner

# Parallelism - aggressive for analytics
max_worker_processes = 32
max_parallel_workers = 24
max_parallel_workers_per_gather = 12
min_parallel_table_scan_size = 1MB
min_parallel_index_scan_size = 128kB
parallel_leader_participation = on

# Connections - low count, use pooling at application layer
max_connections = 50
superuser_reserved_connections = 5
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 6

# Logging
log_min_duration_statement = 5000   # Only log very slow queries
log_statement = 'ddl'
log_lock_waits = on
deadlock_timeout = 5s

# Extensions
shared_preload_libraries = 'pg_stat_statements, auto_explain'
pg_stat_statements.track = all
auto_explain.log_min_duration = 10000
auto_explain.log_analyze = on
auto_explain.log_buffers = on
auto_explain.log_verbose = on        # Column-level detail useful for analytics
```

---

## Extensions

### pg_stat_statements

Tracks cumulative execution statistics for all SQL statements. Essential for identifying slow queries.

```ini
# postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all          # top | all (includes nested statements)
pg_stat_statements.max = 10000          # Max distinct statements tracked
pg_stat_statements.track_utility = on   # Track VACUUM, CREATE, etc.
```

```sql
CREATE EXTENSION pg_stat_statements;

-- Top 10 queries by total execution time
SELECT
    round(total_exec_time::numeric, 2) AS total_ms,
    calls,
    round(mean_exec_time::numeric, 2) AS mean_ms,
    round(stddev_exec_time::numeric, 2) AS stddev_ms,
    rows,
    round(100.0 * total_exec_time / sum(total_exec_time) OVER (), 2) AS pct,
    left(query, 120) AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Queries with high I/O (temp file usage)
SELECT query, calls, total_exec_time, temp_blks_written
FROM pg_stat_statements
WHERE temp_blks_written > 0
ORDER BY temp_blks_written DESC
LIMIT 10;

-- Reset statistics
SELECT pg_stat_statements_reset();
```

### pg_trgm

Trigram similarity enables fast fuzzy text search and LIKE/ILIKE acceleration with GIN or GiST indexes.

```sql
CREATE EXTENSION pg_trgm;

-- Similarity search (0 to 1 score)
SELECT name, similarity(name, 'PostgreSQL') AS sim
FROM products
WHERE similarity(name, 'PostgreSQL') > 0.3
ORDER BY sim DESC;

-- Accelerate LIKE/ILIKE with GIN index
CREATE INDEX idx_products_name_trgm ON products USING gin (name gin_trgm_ops);

-- Now this query uses the index:
EXPLAIN ANALYZE SELECT * FROM products WHERE name ILIKE '%ostgre%';

-- Word similarity (better for phrase matching)
SELECT word_similarity('PostgreSQL', 'Postgres SQL tutorial');
```

### PostGIS

Spatial and geographic data types, indexing, and functions. Use GiST indexes for geometry columns.

```sql
CREATE EXTENSION postgis;

-- Spatial columns
CREATE TABLE locations (
    id      bigserial PRIMARY KEY,
    name    text,
    geom    geometry(Point, 4326)   -- WGS84 lat/lng
);

CREATE INDEX idx_locations_geom ON locations USING gist (geom);

-- Find points within 10km of a given point
SELECT name, ST_Distance(geom::geography, ST_MakePoint(-73.9857, 40.7484)::geography) AS dist_m
FROM locations
WHERE ST_DWithin(geom::geography, ST_MakePoint(-73.9857, 40.7484)::geography, 10000)
ORDER BY dist_m;
```

### timescaledb

Automatically partitions time-series data into chunks, enables continuous aggregates, and provides compression.

```sql
CREATE EXTENSION timescaledb;

-- Convert a regular table to a hypertable (partitioned by time)
CREATE TABLE metrics (
    time        timestamptz NOT NULL,
    device_id   int,
    temperature double precision
);
SELECT create_hypertable('metrics', 'time', chunk_time_interval => interval '1 day');

-- Automatic compression for old chunks
ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_orderby = 'time DESC',
    timescaledb.compress_segmentby = 'device_id'
);
SELECT add_compression_policy('metrics', interval '7 days');

-- Continuous aggregate (materialized, auto-refreshed)
CREATE MATERIALIZED VIEW metrics_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket, device_id, avg(temperature) AS avg_temp
FROM metrics
GROUP BY bucket, device_id;
```

### pgcrypto

Cryptographic functions for hashing, encryption, and key generation.

```sql
CREATE EXTENSION pgcrypto;

-- Password hashing (bcrypt)
INSERT INTO users (email, password_hash)
VALUES ('user@example.com', crypt('user_password', gen_salt('bf', 12)));

-- Verify password
SELECT id FROM users
WHERE email = 'user@example.com'
  AND password_hash = crypt('supplied_password', password_hash);

-- Symmetric encryption (AES via pgp_sym_encrypt)
SELECT pgp_sym_encrypt('sensitive data', 'encryption_key');
SELECT pgp_sym_decrypt(encrypted_col, 'encryption_key') FROM secrets;

-- Generate random UUID
SELECT gen_random_uuid();

-- Generate cryptographically secure random bytes
SELECT encode(gen_random_bytes(32), 'hex');
```

### auto_explain

Logs query execution plans automatically for slow queries. Configured as a shared library (see [Logging](#logging) section). No SQL setup required beyond loading the library.

Load temporarily in a session for debugging without a server restart:

```sql
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;    -- Log everything in this session
SET auto_explain.log_analyze = true;
SET auto_explain.log_buffers = true;

-- Run your query; check PostgreSQL logs for the plan
SELECT * FROM orders WHERE customer_id = 12345 ORDER BY created_at DESC LIMIT 100;
```

Sample only a fraction of queries under high load to reduce log volume:

```ini
auto_explain.sample_rate = 0.01   # Log plans for ~1% of qualifying queries
```