Autovacuum Falling Behind

warningIncident Response

Dead tuple accumulation indicates autovacuum is not keeping up with update/delete workload, risking bloat and performance degradation.

Prompt: My Postgres tables are showing millions of dead tuples and autovacuum doesn't seem to be keeping up. How do I know if I need to tune autovacuum settings or if there's something blocking it?

With vs Without Schema

Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.

Tokens
1,029vs3,185
Latency
20.6smsvs1.2mms
Turns
2vs32
Tool Calls
0vs16
Schema Calls
0vs11
Response
2,874 charsvs2,771 chars
Evaluation
Without Schema
B+
vs
With Schema
A-
The response with Schema provides deeper technical insight by explaining the mathematical reason why default autovacuum settings fail on large tables (a 10M row table requires 2M+ changes before triggering vacuum), while the response without Schema provides this tuning advice without the underlying rationale. The Schema-informed response also includes a more comprehensive list of blocking scenarios (prepared transactions, replication slots, hot_standby_feedback) and cites authoritative sources (Citus, pganalyze, PostgreSQL docs) that allow the operator to verify the guidance. However, the response without Schema offers a slightly more immediately actionable diagnostic workflow with four copy-paste-ready SQL queries. Both responses correctly identify the core issues and provide accurate guidance, but the Schema-informed answer gives the operator better understanding of the root cause and more confidence through external references.
Without Schema
I'll help you diagnose your autovacuum issues. Here's a systematic approach to identify whether autovacuum is blocked or just needs tuning: ## 1. **Check for Long-Running Transactions (Most Common Blocker)** ```sql -- Find long-running transactions that block autovacuum SELECT pid, usename, application_name, state, age(clock_timestamp(), xact_start) AS xact_age, age(clock_timestamp(), query_start) AS query_age, query FROM pg_stat_activity WHERE state != 'idle' AND xact_start IS NOT NULL ORDER BY xact_start; ``` **Why this matters:** Even a single long-running transaction prevents autovacuum from cleaning up rows. The transaction's snapshot keeps those "dead" tuples visible. ## 2. **Check Current Autovacuum Activity** ```sql -- See what autovacuum is currently doing SELECT pid, age(clock_timestamp(), query_start) AS duration, usename, query, wait_event_type, wait_event FROM pg_stat_activity WHERE query LIKE '%autovacuum%' AND pid != pg_backend_pid(); ``` ## 3. **Identify Tables with Bloat** ```sql -- Check dead tuple stats per table SELECT schemaname, relname, n_live_tup, n_dead_tup, round(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_pct, last_autovacuum, last_vacuum FROM pg_stat_user_tables WHERE n_dead_tup > 1000 ORDER BY n_dead_tup DESC LIMIT 20; ``` ## 4. **Check Autovacuum Configuration** ```sql -- Current autovacuum settings SELECT name, setting, unit, context FROM pg_settings WHERE name LIKE '%autovacuum%' OR name LIKE '%vacuum%' ORDER BY name; ``` ## Key Diagnostics **If autovacuum is blocked:** - Long-running queries in step #1 (especially `xact_age > 1 hour`) - Idle transactions holding locks - Replication slots lagging behind **If autovacuum needs tuning:** - `last_autovacuum` is NULL or very old - Dead tuple % keeps climbing - Autovacuum processes are running but falling behind ## Quick Fixes **If blocked by transactions:** ```sql -- Find and kill the blocking transaction (use with caution) SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE pid = <blocking_pid>; ``` **If autovacuum is too conservative:** ```sql -- Temporarily more aggressive settings (requires reload) ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.05; -- default: 0.2 ALTER SYSTEM SET autovacuum_vacuum_threshold = 50; -- default: 50 ALTER SYSTEM SET autovacuum_max_workers = 5; -- default: 3 SELECT pg_reload_conf(); ``` **Manual vacuum for immediate relief:** ```sql VACUUM VERBOSE ANALYZE your_table_name; ``` --- Run these queries and share the results - I can help interpret them and recommend specific tuning based on what we find. The most common culprit is usually an abandoned idle transaction or a slowly-consuming application connection.
With Schema
16 tool calls11 Schema
I'll check Schema.ai's expert guidance on diagnosing autovacuum issues with dead tuples. Based on Schema.ai's expert guidance, here's how to diagnose whether you need to tune autovacuum settings or if something is blocking it: ## First: Check if autovacuum is blocked Run this query to find long-running transactions that prevent vacuum from cleaning up dead tuples: ```sql SELECT pid, datname, usename, state, backend_xmin, age(backend_xmin) as xmin_age, now() - xact_start AS duration FROM pg_stat_activity WHERE backend_xmin IS NOT NULL ORDER BY age(backend_xmin) DESC; ``` **What blocks autovacuum:** - **Long-running transactions** (on any table, not just the one being vacuumed) - **Stale replication slots** - check `pg_replication_slots` for high `age(xmin)` - **Abandoned prepared transactions** - check `pg_prepared_xacts` - **Long queries on standbys** (if `hot_standby_feedback = on`) If you see "dead tuples not yet removable" in autovacuum logs, this is the smoking gun that something is holding back the xmin horizon. ## Second: Check current dead tuple accumulation ```sql SELECT schemaname, relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum, n_dead_tup::float / NULLIF(n_live_tup, 0) AS dead_ratio FROM pg_stat_user_tables WHERE n_dead_tup > 0 ORDER BY n_dead_tup DESC LIMIT 20; ``` ## Third: Determine if tuning is needed **For large tables (>1M rows)**, the default settings cause problems: - Default: `autovacuum_vacuum_threshold = 50` + `autovacuum_vacuum_scale_factor = 0.2` (20%) - A 10M row table needs **2M+ changes** before autovacuum triggers - This causes massive dead tuple accumulation **Fix for large tables:** ```sql ALTER TABLE your_large_table SET ( autovacuum_vacuum_scale_factor = 0.05, -- or even 0.01 autovacuum_vacuum_threshold = 1000 ); ``` **Global tuning options** (if many tables affected): - `autovacuum_max_workers` - increase concurrent vacuum processes - `autovacuum_naptime` - reduce time between autovacuum runs - `autovacuum_vacuum_cost_limit` - increase to make vacuum more aggressive ## Resolution steps 1. **If blocked**: Terminate blocking transactions with `pg_terminate_backend(pid)` 2. **If misconfigured**: Adjust scale factors for large tables 3. **Never disable autovacuum** - it risks transaction ID wraparound **Sources:** - [Citus: Debugging autovacuum problems](https://www.citusdata.com/blog/2022/07/28/debugging-postgres-autovacuum-problems-13-tips) - [pganalyze VACUUM Advisor](https://pganalyze.com/blog/introducing-vacuum-advisor-postgres) - [PostgreSQL docs: Routine Vacuuming](https://www.postgresql.org/docs/current/routine-vacuuming.html) - [pghealth.io production guide](https://pghealth.io/blog/postgresql_health_check_production_guide)

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When autovacuum falls behind, start by quantifying the dead tuple accumulation to understand severity, then verify autovacuum is actually running by checking last vacuum timestamps. The most common blockers are long-running transactions preventing cleanup and worker pool saturation on busy systems; if those aren't the issue, you likely need to tune autovacuum thresholds and cost limits to match your write workload.

1Quantify dead tuple accumulation and identify affected tables
Check `dead-tuple` (n_dead_tup) and calculate dead tuple percentage using `live-tuple` across all tables. Tables with >1000 dead tuples or >10% dead ratio are bloated—query pg_stat_user_tables ordered by n_dead_tup DESC to find the worst offenders. This baseline tells you whether you have a widespread problem or just a few high-churn tables, which determines whether you need global config changes or per-table tuning.
2Check if autovacuum is running at all
Look at `last-vacuum` and `postgresql-last-vacuum-age` for your high-dead-tuple tables. If last_autovacuum is NULL or >24 hours old on active tables, autovacuum isn't running or is severely delayed. Check `postgresql-vacuum-count` to see if the counter is incrementing over time—if autovacuum hasn't run in 24+ hours on tables accumulating dead tuples, something is blocking or disabling it entirely.
3Look for long-running transactions blocking vacuum
Query pg_stat_activity for transactions with xact_start older than 1+ hours. Even idle transactions hold a transaction ID snapshot that prevents autovacuum from reclaiming rows visible to that snapshot. A single connection sitting idle in transaction for hours can block autovacuum across the entire database—this is the most common cause of autovacuum appearing to "not work" despite being configured correctly.
4Check autovacuum worker pool saturation
Count active autovacuum workers with `SELECT count(*) FROM pg_stat_activity WHERE query LIKE 'autovacuum:%'`. If this count frequently equals autovacuum_max_workers (default 3), your workers are saturated and tables are queuing. With dozens of tables needing vacuum but only 3 workers, each table waits its turn while bloat accumulates—increase autovacuum_max_workers to 5-10 on busy systems to allow parallel processing.
5Evaluate autovacuum threshold and cost settings for your workload
For large or high-write tables, the default autovacuum_vacuum_scale_factor of 0.2 means autovacuum won't trigger until 20% of rows change—on a 10M row table, that's 2M changes before vacuum starts. Check `insert-since-vacuum` and `mod-since-analyze` to see how many changes are queued. Lower scale_factor to 0.05-0.1 for high-churn tables, increase autovacuum_vacuum_cost_limit from default 200 to 500-1000, and reduce cost_delay to 10-20ms to make vacuum more aggressive.
6Check for transaction ID wraparound pressure
Review `postgresql-before-xid-wraparound` to see how many transactions remain before wraparound. If this is below 10 million, PostgreSQL will force aggressive emergency autovacuum that can saturate workers and delay normal maintenance vacuum. Wraparound pressure can make it seem like autovacuum is "falling behind" when it's actually prioritizing anti-wraparound work over bloat cleanup.

Technologies

Related Insights

Autovacuum lag causes bloat and planner estimate degradation
outcomewarning
Autovacuum lag causes table bloat and performance degradation
warning
Table bloat from suboptimal autovacuum settings slows query performance
warning
Table bloat from insufficient vacuuming requires VACUUM FULL
warning
High dead tuple percentage degrades query performance and bloats tables
warning
Bloated tables with dead rows skew performance metrics and waste storage
warning
Autovacuum worker starvation delays vacuum on high-churn tables
warning
Autovacuum has not run in 24+ hours on active tables
warning
Table bloat from dead tuples degrades performance and wastes disk space
warning
Index bloat wastes disk space and degrades query performance
warning
long-running transactions preventing vacuum
Stub insight created from scenario discovery hint: long-running transactions preventing vacuum
approaching transaction ID wraparound
Stub insight created from scenario discovery hint: approaching transaction ID wraparound
Autovacuum falling behind on tables causes bloat accumulation
diagnostic_stepwarning
Autovacuum falls behind on high-write tables due to conservative defaults
prerequisitewarning
autovacuum workers insufficient for workload
prerequisite
Stub insight created from scenario discovery hint: autovacuum workers insufficient for workload
autovacuum cost limits too conservative
prerequisite
Stub insight created from scenario discovery hint: autovacuum cost limits too conservative
Blocked autovacuum allows unbounded bloat accumulation
prerequisitecritical
Dead tuple accumulation causes table bloat and performance degradation
outcomewarning
Autovacuum worker pool saturation delays emergency vacuum completion
prerequisitewarning

Relevant Metrics

Monitoring Interfaces

PostgreSQL Prometheus
PostgreSQL Datadog
PostgreSQL Native
PostgreSQL CloudWatch Metrics
PostgreSQL Prometheus Exporter (pg_exporter)
PostgreSQL Datadog Integration