Introduction
PostgreSQL performance optimization is essential for applications at scale. Slow queries, missing indexes, and improper configuration can cripple application performance. Senior engineers must understand query planning, indexing strategies, and configuration tuning to maintain responsive systems.
Analyzing Query Performance
EXPLAIN ANALYZE
EXPLAIN ANALYZE SELECT * FROM orders
WHERE user_id = 123
AND created_at > '2024-01-01'
ORDER BY created_at DESC
LIMIT 10;
Output interpretation:
- Seq Scan: Full table scan (often bad for large tables)
- Index Scan: Using an index (good)
- Index Only Scan: Best—data from the index alone
- Bitmap Heap Scan: Combines multiple index conditions
- actual time: Real execution time
- rows: Actual rows processed
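For the query above, a plan that hits a composite index on (user_id, created_at) might look roughly like this (illustrative output only; costs and timings vary by data and hardware):

```
Limit  (cost=0.43..12.15 rows=10 width=64) (actual time=0.031..0.058 rows=10 loops=1)
  ->  Index Scan Backward using idx_orders_user_date on orders
        (cost=0.43..117.23 rows=98 width=64) (actual time=0.029..0.054 rows=10 loops=1)
        Index Cond: ((user_id = 123) AND (created_at > '2024-01-01'::date))
Planning Time: 0.112 ms
Execution Time: 0.081 ms
```

The key things to check: the node type (Index Scan, not Seq Scan), the actual rows versus estimated rows, and whether the LIMIT lets the scan stop early.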
Identifying Slow Queries
-- Enable slow query logging in postgresql.conf (conf files use # comments):
--   log_min_duration_statement = 1000   # log statements running longer than 1 second
-- Find currently running queries
SELECT pid, now() - query_start AS duration, state, query
FROM pg_stat_activity
WHERE state <> 'idle'
AND pid <> pg_backend_pid() -- exclude this session
ORDER BY duration DESC;
-- Find long-running queries
SELECT datname, pid, state, query, age(clock_timestamp(), query_start) AS age
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY age DESC;
Kill Long-Running Queries
-- Cancel a running query (graceful); use a pid from pg_stat_activity
SELECT pg_cancel_backend(pid);
-- Terminate the whole connection (forceful; rolls back its open transaction)
SELECT pg_terminate_backend(pid);
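The two can be combined: a sketch that cancels every query running longer than five minutes (adjust the threshold for your workload, and review what you are cancelling first):

```sql
SELECT pid, query, pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE state <> 'idle'
  AND pid <> pg_backend_pid()           -- never cancel this session
  AND now() - query_start > interval '5 minutes';
```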
Indexing Strategies
Basic Indexes
-- Single-column index
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Composite index (column order matters!)
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
-- Unique index
CREATE UNIQUE INDEX idx_users_email ON users(email);
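To see why column order matters, compare which predicates the composite index above can serve (a sketch against the same orders table):

```sql
-- Served by idx_orders_user_date: the leading column (user_id) is constrained
SELECT * FROM orders WHERE user_id = 123 AND created_at > '2024-01-01';

-- Not served efficiently: the leading column is absent, so the index
-- on (user_id, created_at) cannot be used for this range alone
SELECT * FROM orders WHERE created_at > '2024-01-01';
```

A rule of thumb: put equality-filtered columns first, range-filtered columns last.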
Partial Indexes
Index only rows matching a condition:
-- Index only active users
CREATE INDEX idx_active_users ON users(email)
WHERE status = 'active';
-- Index only recent orders
CREATE INDEX idx_recent_orders ON orders(user_id, created_at)
WHERE created_at > '2024-01-01';
Expression Indexes
-- Index on lowercase email
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- Index on extracted year (note: this works on a plain timestamp column;
-- with timestamptz, EXTRACT is not IMMUTABLE and the index cannot be created)
CREATE INDEX idx_orders_year ON orders(EXTRACT(YEAR FROM created_at));
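An expression index is only considered when the query repeats the same expression; a sketch with a hypothetical email value:

```sql
-- Can use idx_users_email_lower: the predicate matches the indexed expression
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

-- Cannot use it: the raw column is compared, not LOWER(email)
SELECT * FROM users WHERE email = 'alice@example.com';
```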
Covering Indexes (INCLUDE)
Include non-indexed columns to enable index-only scans:
CREATE INDEX idx_orders_user ON orders(user_id)
INCLUDE (total, status, created_at);
-- Query can now be satisfied entirely from the index
SELECT total, status, created_at FROM orders WHERE user_id = 123;
When to Use Different Index Types
| Index Type | Use Case |
|---|---|
| B-tree | Default, equality and range queries |
| Hash | Equality only (rarely better than B-tree) |
| GIN | Full-text search, JSONB, arrays |
| GiST | Geometric data, full-text search |
| BRIN | Large tables with natural ordering |
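As an illustration of the last row, a BRIN index on a hypothetical append-only events table stays tiny because it stores one summary per block range rather than one entry per row:

```sql
-- Effective when rows are physically ordered by created_at (e.g. insert-only logs)
CREATE INDEX idx_events_created_brin ON events USING brin(created_at);
```

On a table where physical order does not correlate with the column, BRIN degrades to scanning most blocks; use B-tree instead.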
JSONB Indexing
-- GIN index for JSONB containment queries
CREATE INDEX idx_products_metadata ON products USING gin(metadata);
-- Query: WHERE metadata @> '{"category": "electronics"}'
-- Index a specific JSONB path
CREATE INDEX idx_products_category ON products((metadata->>'category'));
Query Optimization
Avoid SELECT *
-- Bad
SELECT * FROM orders WHERE user_id = 123;
-- Good
SELECT id, total, status, created_at FROM orders WHERE user_id = 123;
Use LIMIT with ORDER BY
-- Bad: Sorts all rows
SELECT * FROM orders ORDER BY created_at DESC;
-- Good: Stops early with index
SELECT * FROM orders ORDER BY created_at DESC LIMIT 10;
Efficient Pagination
-- Bad: OFFSET scans and discards rows
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 10000;
-- Good: Keyset pagination
SELECT * FROM orders
WHERE id < 12345 -- Last ID from previous page
ORDER BY id DESC
LIMIT 20;
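When the sort column is not unique, keyset pagination needs a tie-breaker; a sketch using a row comparison (the literal values come from the last row of the previous page):

```sql
SELECT * FROM orders
WHERE (created_at, id) < ('2024-06-01', 12345)  -- last (created_at, id) seen
ORDER BY created_at DESC, id DESC
LIMIT 20;
```

A composite index on (created_at, id) lets this predicate seek directly to the page boundary.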
Batch Operations
-- Bad: Many small transactions
INSERT INTO logs (message) VALUES ('log 1');
INSERT INTO logs (message) VALUES ('log 2');
-- ... 1000 more
-- Good: Single transaction with multiple rows
INSERT INTO logs (message) VALUES
('log 1'), ('log 2'), ('log 3'), ...;
-- Or use COPY for bulk loads
COPY logs(message) FROM '/path/to/data.csv' CSV;
Avoid N+1 in Application Code
-- Instead of querying users, then querying orders once per user,
-- fetch both in one round trip with a JOIN (or a LATERAL join):
SELECT u.*, o.*
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.id IN (1, 2, 3);
PostGIS Optimization
Geometry Queries
-- Store point
INSERT INTO locations (name, coordinates)
VALUES ('Office', ST_SetSRID(ST_MakePoint(-73.935242, 40.730610), 4326));
-- Query as GeoJSON
SELECT ST_AsGeoJSON(ST_Transform(coordinates, 3857))
FROM locations;
-- Find points within a radius
SELECT * FROM locations
WHERE ST_DWithin(
coordinates::geography,
ST_MakePoint(-73.935, 40.730)::geography,
1000 -- meters
);
-- Spatial index (essential!)
CREATE INDEX idx_locations_geo ON locations USING gist(coordinates);
-- The ST_DWithin(::geography) query above needs an index on the same cast:
CREATE INDEX idx_locations_geog ON locations USING gist((coordinates::geography));
Configuration Tuning
Memory Settings
# postgresql.conf
# Shared memory for caching (25% of RAM)
shared_buffers = 4GB
# Memory per sort/hash operation (allocated per plan node, per query;
# keep it modest when many connections run concurrently)
work_mem = 256MB
# Memory for maintenance operations
maintenance_work_mem = 1GB
# Query planner's estimate of cache
effective_cache_size = 12GB
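These can also be set from SQL via ALTER SYSTEM instead of editing the file. Note that work_mem takes effect on reload, while shared_buffers still requires a server restart:

```sql
ALTER SYSTEM SET work_mem = '256MB';
SELECT pg_reload_conf();  -- apply reloadable settings without restarting
```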
Connection Settings
# Maximum connections
max_connections = 200
# Connection pooling (use PgBouncer for better scaling)
Write Performance
# WAL settings for write-heavy workloads
wal_buffers = 64MB
checkpoint_completion_target = 0.9
max_wal_size = 4GB
Table Maintenance
VACUUM and ANALYZE
-- Update statistics (helps query planner)
ANALYZE orders;
-- Reclaim space from deleted rows
VACUUM orders;
-- Full cleanup (locks table)
VACUUM FULL orders;
-- Automatic vacuum settings
-- autovacuum = on (default)
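To confirm autovacuum is keeping up, check when each table was last vacuumed and analyzed; tables with NULL timestamps and heavy write traffic deserve attention:

```sql
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY last_autovacuum NULLS FIRST;
```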
REINDEX
-- Rebuild bloated index
REINDEX INDEX idx_orders_user_id;
-- Rebuild all indexes on table
REINDEX TABLE orders;
-- Concurrently: rebuild without blocking writes (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_orders_user_id;
Table Statistics
-- Check table size
SELECT pg_size_pretty(pg_total_relation_size('orders'));
-- Check index size
SELECT pg_size_pretty(pg_relation_size('idx_orders_user_id'));
-- Largest tables by total size (a starting point for bloat investigation)
SELECT schemaname, tablename,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) as size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC;
Connection Pooling
PgBouncer Configuration
# pgbouncer.ini
[databases]
mydb = host=localhost port=5432 dbname=mydb
[pgbouncer]
listen_port = 6432
listen_addr = *
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
Monitoring
Essential Queries
-- Index usage statistics
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
-- Unused indexes (candidates for removal)
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND schemaname = 'public';
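Once an index is confirmed unused (check every replica too, since these statistics are per-server), it can be dropped without blocking writes; the index name here is hypothetical:

```sql
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_unused;
```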
-- Table access patterns
SELECT schemaname, relname, seq_scan, idx_scan,
n_tup_ins, n_tup_upd, n_tup_del
FROM pg_stat_user_tables
ORDER BY seq_scan DESC;
-- Cache hit ratio (should be > 99%)
SELECT
sum(heap_blks_read) as heap_read,
sum(heap_blks_hit) as heap_hit,
sum(heap_blks_hit)::numeric / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) as ratio
FROM pg_statio_user_tables;
JSONB Manipulation
Rename JSON Field Properties
-- Example: Rename 'wrongname' to 'name' in nested JSONB
UPDATE products AS p
SET metadata = (
jsonb_set(
p.metadata::jsonb #- '{objectfield,wrongname}',
'{objectfield,name}',
p.metadata::jsonb #> '{objectfield,wrongname}'
)
)::json
WHERE p.metadata::jsonb ? 'objectfield'
AND p.metadata::jsonb #> '{objectfield}' ? 'wrongname';
Query JSONB Arrays
-- Find items where an array contains a specific value
SELECT * FROM products
WHERE metadata->'tags' ? 'featured';
-- Query nested array elements
SELECT * FROM products
WHERE metadata @> '{"categories": [{"name": "electronics"}]}';
-- Aggregate JSONB array elements
SELECT id, jsonb_array_elements(metadata->'variants') as variant
FROM products;
Data Anonymization
For development/staging environments, anonymize production data:
-- Anonymize user data for non-production use
UPDATE users SET
first_name = SUBSTR(MD5(RANDOM()::text), 1, 2 + FLOOR(3*RANDOM())::int),
last_name = SUBSTR(MD5(RANDOM()::text), 1, 4 + FLOOR(4*RANDOM())::int),
address = CONCAT(
SUBSTR(MD5(RANDOM()::text), 1, 5 + FLOOR(5*RANDOM())::int), ' ',
SUBSTR(MD5(RANDOM()::text), 1, 2 + FLOOR(2*RANDOM())::int), ', ',
FLOOR(100*RANDOM())::int, ', ',
SUBSTR(MD5(RANDOM()::text), 1, 2 + FLOOR(3*RANDOM())::int)
),
email = CONCAT(
SUBSTR(MD5(RANDOM()::text), 1, 5 + FLOOR(5*RANDOM())::int), '@',
SUBSTR(MD5(RANDOM()::text), 1, 2 + FLOOR(3*RANDOM())::int), '.test'
),
phone = CONCAT('+1', FLOOR(1000000000 + 8999999999*RANDOM())::bigint);
Finding Large Tables
Identify tables consuming the most storage:
SELECT
schemaname AS schema,
tablename AS table,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
pg_size_pretty(pg_relation_size(schemaname || '.' || tablename)) AS table_size,
pg_size_pretty(pg_indexes_size(schemaname || '.' || tablename)) AS indexes_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 20;
Foreign Key Management
Temporarily Disable for Bulk Operations
-- Disable FK and other triggers for this session
-- (requires superuser; use with caution!)
SET session_replication_role = 'replica';
-- Perform bulk operations
TRUNCATE TABLE orders CASCADE;
-- Or complex data migrations...
-- Re-enable foreign key checks
SET session_replication_role = 'origin';
Best Practices Checklist
- [ ] Index columns used in WHERE, JOIN, ORDER BY
- [ ] Use EXPLAIN ANALYZE for slow queries
- [ ] Implement connection pooling
- [ ] Configure shared_buffers appropriately
- [ ] Run ANALYZE after bulk data changes
- [ ] Monitor and remove unused indexes
- [ ] Use keyset pagination for large datasets
- [ ] Keep statistics updated
- [ ] Use GIN indexes for JSONB queries
- [ ] Anonymize data in non-production environments
Conclusion
PostgreSQL performance optimization requires understanding query execution, proper indexing, and configuration tuning. Start with EXPLAIN ANALYZE to identify bottlenecks, add appropriate indexes, and tune configuration based on workload. Regular monitoring ensures performance doesn't degrade over time.