Introduction
PostgreSQL performance optimization is essential for applications at scale. Slow queries, missing indexes, and improper configuration can cripple application performance. Senior engineers must understand query planning, indexing strategies, and configuration tuning to maintain responsive systems.
Analyzing Query Performance
EXPLAIN ANALYZE
EXPLAIN ANALYZE SELECT * FROM orders
WHERE user_id = 123
AND created_at > '2024-01-01'
ORDER BY created_at DESC
LIMIT 10;
Output interpretation:
- Seq Scan: Full table scan (often bad for large tables)
- Index Scan: Using an index (good)
- Index Only Scan: Best—data from the index alone
- Bitmap Heap Scan: Combines multiple index conditions
- actual time: Real execution time
- rows: Actual rows processed
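For the query above, a plan that hits a composite index on (user_id, created_at) might look roughly like this (illustrative output only; costs and timings vary by data and hardware):

```
Limit  (cost=0.43..12.15 rows=10 width=64) (actual time=0.031..0.058 rows=10 loops=1)
  ->  Index Scan Backward using idx_orders_user_date on orders
        (cost=0.43..117.23 rows=98 width=64) (actual time=0.029..0.054 rows=10 loops=1)
        Index Cond: ((user_id = 123) AND (created_at > '2024-01-01'::date))
Planning Time: 0.112 ms
Execution Time: 0.081 ms
```

The key things to check: the node type (Index Scan, not Seq Scan), the actual rows versus estimated rows, and whether the LIMIT lets the scan stop early.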
Identifying Slow Queries
-- Enable slow query logging in postgresql.conf (conf files use # comments):
--   log_min_duration_statement = 1000   # log statements running longer than 1 second
-- Find currently running queries
SELECT pid, now() - query_start AS duration, state, query
FROM pg_stat_activity
WHERE state <> 'idle'
AND pid <> pg_backend_pid() -- exclude this session
ORDER BY duration DESC;
-- Find long-running queries
SELECT datname, pid, state, query, age(clock_timestamp(), query_start) AS age
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY age DESC;
Kill Long-Running Queries
-- Cancel a running query (graceful); use a pid from pg_stat_activity
SELECT pg_cancel_backend(pid);
-- Terminate the whole connection (forceful; rolls back its open transaction)
SELECT pg_terminate_backend(pid);
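The two can be combined: a sketch that cancels every query running longer than five minutes (adjust the threshold for your workload, and review what you are cancelling first):

```sql
SELECT pid, query, pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE state <> 'idle'
  AND pid <> pg_backend_pid()           -- never cancel this session
  AND now() - query_start > interval '5 minutes';
```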
Indexing Strategies
Basic Indexes
-- Single-column index
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Composite index (column order matters!)
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
-- Unique index
CREATE UNIQUE INDEX idx_users_email ON users(email);
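To see why column order matters, compare which predicates the composite index above can serve (a sketch against the same orders table):

```sql
-- Served by idx_orders_user_date: the leading column (user_id) is constrained
SELECT * FROM orders WHERE user_id = 123 AND created_at > '2024-01-01';

-- Not served efficiently: the leading column is absent, so the index
-- on (user_id, created_at) cannot be used for this range alone
SELECT * FROM orders WHERE created_at > '2024-01-01';
```

A rule of thumb: put equality-filtered columns first, range-filtered columns last.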
Partial Indexes
Index only rows matching a condition:
-- Index only active users
CREATE INDEX idx_active_users ON users(email)
WHERE status = 'active';
-- Index only recent orders
CREATE INDEX idx_recent_orders ON orders(user_id, created_at)
WHERE created_at > '2024-01-01';
Expression Indexes
-- Index on lowercase email
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- Index on extracted year (note: this works on a plain timestamp column;
-- with timestamptz, EXTRACT is not IMMUTABLE and the index cannot be created)
CREATE INDEX idx_orders_year ON orders(EXTRACT(YEAR FROM created_at));
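An expression index is only considered when the query repeats the same expression; a sketch with a hypothetical email value:

```sql
-- Can use idx_users_email_lower: the predicate matches the indexed expression
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

-- Cannot use it: the raw column is compared, not LOWER(email)
SELECT * FROM users WHERE email = 'alice@example.com';
```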
Covering Indexes (INCLUDE)
Include non-indexed columns to enable index-only scans:
CREATE INDEX idx_orders_user ON orders(user_id)
INCLUDE (total, status, created_at);
-- Query can now be satisfied entirely from the index
SELECT total, status, created_at FROM orders WHERE user_id = 123;
When to Use Different Index Types
| Index Type | Use Case |
|---|---|
| B-tree | Default, equality and range queries |
| Hash | Equality only (rarely better than B-tree) |
| GIN | Full-text search, JSONB, arrays |
| GiST | Geometric data, full-text search |
| BRIN | Large tables with natural ordering |
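As an illustration of the last row, a BRIN index on a hypothetical append-only events table stays tiny because it stores one summary per block range rather than one entry per row:

```sql
-- Effective when rows are physically ordered by created_at (e.g. insert-only logs)
CREATE INDEX idx_events_created_brin ON events USING brin(created_at);
```

On a table where physical order does not correlate with the column, BRIN degrades to scanning most blocks; use B-tree instead.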
JSONB Indexing
-- GIN index for JSONB containment queries
CREATE INDEX idx_products_metadata ON products USING gin(metadata);
-- Query: WHERE metadata @> '{"category": "electronics"}'
-- Index a specific JSONB path
CREATE INDEX idx_products_category ON products((metadata->>'category'));
Query Optimization
Avoid SELECT *
-- Bad
SELECT * FROM orders WHERE user_id = 123;
-- Good
SELECT id, total, status, created_at FROM orders WHERE user_id = 123;
Use LIMIT with ORDER BY
-- Bad: Sorts all rows
SELECT * FROM orders ORDER BY created_at DESC;
-- Good: Stops early with index
SELECT * FROM orders ORDER BY created_at DESC LIMIT 10;
Efficient Pagination
-- Bad: OFFSET scans and discards rows
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 10000;
-- Good: Keyset pagination
SELECT * FROM orders
WHERE id < 12345 -- Last ID from previous page
ORDER BY id DESC
LIMIT 20;
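When the sort column is not unique, keyset pagination needs a tie-breaker; a sketch using a row comparison (the literal values come from the last row of the previous page):

```sql
SELECT * FROM orders
WHERE (created_at, id) < ('2024-06-01', 12345)  -- last (created_at, id) seen
ORDER BY created_at DESC, id DESC
LIMIT 20;
```

A composite index on (created_at, id) lets this predicate seek directly to the page boundary.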
Batch Operations
-- Bad: Many small transactions
INSERT INTO logs (message) VALUES ('log 1');
INSERT INTO logs (message) VALUES ('log 2');
-- ... 1000 more
-- Good: Single transaction with multiple rows
INSERT INTO logs (message) VALUES
('log 1'), ('log 2'), ('log 3'), ...;
-- Or use COPY for bulk loads
COPY logs(message) FROM '/path/to/data.csv' CSV;
Avoid N+1 in Application Code
-- Instead of querying users, then querying orders once per user,
-- fetch both in one round trip with a JOIN (or a LATERAL join):
SELECT u.*, o.*
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.id IN (1, 2, 3);
PostGIS Optimization
Geometry Queries
-- Store point
INSERT INTO locations (name, coordinates)
VALUES ('Office', ST_SetSRID(ST_MakePoint(-73.935242, 40.730610), 4326));
-- Query as GeoJSON
SELECT ST_AsGeoJSON(ST_Transform(coordinates, 3857))
FROM locations;
-- Find points within a radius
SELECT * FROM locations
WHERE ST_DWithin(
coordinates::geography,
ST_MakePoint(-73.935, 40.730)::geography,
1000 -- meters
);
-- Spatial index (essential!)
CREATE INDEX idx_locations_geo ON locations USING gist(coordinates);
-- The ST_DWithin(::geography) query above needs an index on the same cast:
CREATE INDEX idx_locations_geog ON locations USING gist((coordinates::geography));
Configuration Tuning
Memory Settings
# postgresql.conf
# Shared memory for caching (25% of RAM)
shared_buffers = 4GB
# Memory per sort/hash operation (allocated per plan node, per query;
# keep it modest when many connections run concurrently)
work_mem = 256MB
# Memory for maintenance operations
maintenance_work_mem = 1GB
# Query planner's estimate of cache
effective_cache_size = 12GB
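These can also be set from SQL via ALTER SYSTEM instead of editing the file. Note that work_mem takes effect on reload, while shared_buffers still requires a server restart:

```sql
ALTER SYSTEM SET work_mem = '256MB';
SELECT pg_reload_conf();  -- apply reloadable settings without restarting
```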
Connection Settings
# Maximum connections
max_connections = 200
# Connection pooling (use PgBouncer for better scaling)
Write Performance
# WAL settings for write-heavy workloads
wal_buffers = 64MB
checkpoint_completion_target = 0.9
max_wal_size = 4GB
Table Maintenance
VACUUM and ANALYZE
-- Update statistics (helps query planner)
ANALYZE orders;
-- Reclaim space from deleted rows
VACUUM orders;
-- Full cleanup (locks table)
VACUUM FULL orders;
-- Automatic vacuum settings
-- autovacuum = on (default)
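To confirm autovacuum is keeping up, check when each table was last vacuumed and analyzed; tables with NULL timestamps and heavy write traffic deserve attention:

```sql
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY last_autovacuum NULLS FIRST;
```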
REINDEX
-- Rebuild bloated index
REINDEX INDEX idx_orders_user_id;
-- Rebuild all indexes on table
REINDEX TABLE orders;
-- Concurrently: rebuild without blocking writes (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_orders_user_id;
Table Statistics
-- Check table size
SELECT pg_size_pretty(pg_total_relation_size('orders'));
-- Check index size
SELECT pg_size_pretty(pg_relation_size('idx_orders_user_id'));
-- Largest tables by total size (a starting point for bloat investigation)
SELECT schemaname, tablename,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) as size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC;
Connection Pooling
PgBouncer Configuration
# pgbouncer.ini
[databases]
mydb = host=localhost port=5432 dbname=mydb
[pgbouncer]
listen_port = 6432
listen_addr = *
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
Monitoring
Essential Queries
-- Index usage statistics
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
-- Unused indexes (candidates for removal)
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND schemaname = 'public';
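Once an index is confirmed unused (check every replica too, since these statistics are per-server), it can be dropped without blocking writes; the index name here is hypothetical:

```sql
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_unused;
```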
-- Table access patterns
SELECT schemaname, relname, seq_scan, idx_scan,
n_tup_ins, n_tup_upd, n_tup_del
FROM pg_stat_user_tables
ORDER BY seq_scan DESC;
-- Cache hit ratio (should be > 99%)
SELECT
sum(heap_blks_read) as heap_read,
sum(heap_blks_hit) as heap_hit,
sum(heap_blks_hit)::numeric / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) as ratio
FROM pg_statio_user_tables;
JSONB Manipulation
Rename JSON Field Properties
-- Example: Rename 'wrongname' to 'name' in nested JSONB
UPDATE products AS p
SET metadata = (
jsonb_set(
p.metadata::jsonb #- '{objectfield,wrongname}',
'{objectfield,name}',
p.metadata::jsonb #> '{objectfield,wrongname}'
)
)::json
WHERE p.metadata::jsonb ? 'objectfield'
AND p.metadata::jsonb #> '{objectfield}' ? 'wrongname';
Query JSONB Arrays
-- Find items where an array contains a specific value
SELECT * FROM products
WHERE metadata->'tags' ? 'featured';
-- Query nested array elements
SELECT * FROM products
WHERE metadata @> '{"categories": [{"name": "electronics"}]}';
-- Aggregate JSONB array elements
SELECT id, jsonb_array_elements(metadata->'variants') as variant
FROM products;
Data Anonymization
For development/staging environments, anonymize production data:
-- Anonymize user data for non-production use
UPDATE users SET
first_name = SUBSTR(MD5(RANDOM()::text), 1, 2 + FLOOR(3*RANDOM())::int),
last_name = SUBSTR(MD5(RANDOM()::text), 1, 4 + FLOOR(4*RANDOM())::int),
address = CONCAT(
SUBSTR(MD5(RANDOM()::text), 1, 5 + FLOOR(5*RANDOM())::int), ' ',
SUBSTR(MD5(RANDOM()::text), 1, 2 + FLOOR(2*RANDOM())::int), ', ',
FLOOR(100*RANDOM())::int, ', ',
SUBSTR(MD5(RANDOM()::text), 1, 2 + FLOOR(3*RANDOM())::int)
),
email = CONCAT(
SUBSTR(MD5(RANDOM()::text), 1, 5 + FLOOR(5*RANDOM())::int), '@',
SUBSTR(MD5(RANDOM()::text), 1, 2 + FLOOR(3*RANDOM())::int), '.test'
),
phone = CONCAT('+1', FLOOR(1000000000 + 8999999999*RANDOM())::bigint);
Finding Large Tables
Identify tables consuming the most storage:
SELECT
schemaname AS schema,
tablename AS table,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
pg_size_pretty(pg_relation_size(schemaname || '.' || tablename)) AS table_size,
pg_size_pretty(pg_indexes_size(schemaname || '.' || tablename)) AS indexes_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 20;
Foreign Key Management
Temporarily Disable for Bulk Operations
-- Disable FK and other triggers for this session
-- (requires superuser; use with caution!)
SET session_replication_role = 'replica';
-- Perform bulk operations
TRUNCATE TABLE orders CASCADE;
-- Or complex data migrations...
-- Re-enable foreign key checks
SET session_replication_role = 'origin';
Best Practices Checklist
- [ ] Index columns used in WHERE, JOIN, ORDER BY
- [ ] Use EXPLAIN ANALYZE for slow queries
- [ ] Implement connection pooling
- [ ] Configure shared_buffers appropriately
- [ ] Run ANALYZE after bulk data changes
- [ ] Monitor and remove unused indexes
- [ ] Use keyset pagination for large datasets
- [ ] Keep statistics updated
- [ ] Use GIN indexes for JSONB queries
- [ ] Anonymize data in non-production environments
Conclusion
PostgreSQL performance optimization requires understanding query execution, proper indexing, and configuration tuning. Start with EXPLAIN ANALYZE to identify bottlenecks, add appropriate indexes, and tune configuration based on workload. Regular monitoring ensures performance doesn't degrade over time.