I'll help you diagnose your autovacuum issues. Here's a systematic approach to identify whether autovacuum is blocked or just needs tuning:
## 1. **Check for Long-Running Transactions (Most Common Blocker)**
```sql
-- Find long-running transactions that block autovacuum
SELECT pid,
usename,
application_name,
state,
age(clock_timestamp(), xact_start) AS xact_age,
age(clock_timestamp(), query_start) AS query_age,
query
FROM pg_stat_activity
WHERE state != 'idle'
AND xact_start IS NOT NULL
ORDER BY xact_start;
```
**Why this matters:** Even a single long-running transaction prevents autovacuum from cleaning up rows. The transaction's snapshot keeps those "dead" tuples visible.
## 2. **Check Current Autovacuum Activity**
```sql
-- See what autovacuum is currently doing
SELECT pid,
age(clock_timestamp(), query_start) AS duration,
usename,
query,
wait_event_type,
wait_event
FROM pg_stat_activity
WHERE query LIKE '%autovacuum%'
AND pid != pg_backend_pid();
```
## 3. **Identify Tables with Bloat**
```sql
-- Check dead tuple stats per table
SELECT schemaname,
relname,
n_live_tup,
n_dead_tup,
round(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_pct,
last_autovacuum,
last_vacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC
LIMIT 20;
```
## 4. **Check Autovacuum Configuration**
```sql
-- Current autovacuum settings
SELECT name, setting, unit, context
FROM pg_settings
WHERE name LIKE '%autovacuum%'
OR name LIKE '%vacuum%'
ORDER BY name;
```
## Key Diagnostics
**If autovacuum is blocked:**
- Long-running queries in step #1 (especially `xact_age > 1 hour`)
- Idle transactions holding locks
- Replication slots lagging behind
**If autovacuum needs tuning:**
- `last_autovacuum` is NULL or very old
- Dead tuple % keeps climbing
- Autovacuum processes are running but falling behind
## Quick Fixes
**If blocked by transactions:**
```sql
-- Find and kill the blocking transaction (use with caution)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid = <blocking_pid>;
```
**If autovacuum is too conservative:**
```sql
-- Temporarily more aggressive settings (requires reload)
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.05; -- default: 0.2
ALTER SYSTEM SET autovacuum_vacuum_threshold = 50; -- default: 50
ALTER SYSTEM SET autovacuum_max_workers = 5; -- default: 3
SELECT pg_reload_conf();
```
**Manual vacuum for immediate relief:**
```sql
VACUUM VERBOSE ANALYZE your_table_name;
```
---
Run these queries and share the results - I can help interpret them and recommend specific tuning based on what we find. The most common culprit is usually an abandoned idle transaction or a slowly-consuming application connection.
With Schema16 tool calls11 Schema
I'll check Schema.ai's expert guidance on diagnosing autovacuum issues with dead tuples.
Based on Schema.ai's expert guidance, here's how to diagnose whether you need to tune autovacuum settings or if something is blocking it:
## First: Check if autovacuum is blocked
Run this query to find long-running transactions that prevent vacuum from cleaning up dead tuples:
```sql
SELECT pid, datname, usename, state, backend_xmin,
age(backend_xmin) as xmin_age,
now() - xact_start AS duration
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;
```
**What blocks autovacuum:**
- **Long-running transactions** (on any table, not just the one being vacuumed)
- **Stale replication slots** - check `pg_replication_slots` for high `age(xmin)`
- **Abandoned prepared transactions** - check `pg_prepared_xacts`
- **Long queries on standbys** (if `hot_standby_feedback = on`)
If you see "dead tuples not yet removable" in autovacuum logs, this is the smoking gun that something is holding back the xmin horizon.
## Second: Check current dead tuple accumulation
```sql
SELECT schemaname, relname, n_live_tup, n_dead_tup,
last_vacuum, last_autovacuum,
n_dead_tup::float / NULLIF(n_live_tup, 0) AS dead_ratio
FROM pg_stat_user_tables
WHERE n_dead_tup > 0
ORDER BY n_dead_tup DESC
LIMIT 20;
```
## Third: Determine if tuning is needed
**For large tables (>1M rows)**, the default settings cause problems:
- Default: `autovacuum_vacuum_threshold = 50` + `autovacuum_vacuum_scale_factor = 0.2` (20%)
- A 10M row table needs **2M+ changes** before autovacuum triggers
- This causes massive dead tuple accumulation
**Fix for large tables:**
```sql
ALTER TABLE your_large_table SET (
autovacuum_vacuum_scale_factor = 0.05, -- or even 0.01
autovacuum_vacuum_threshold = 1000
);
```
**Global tuning options** (if many tables affected):
- `autovacuum_max_workers` - increase concurrent vacuum processes
- `autovacuum_naptime` - reduce time between autovacuum runs
- `autovacuum_vacuum_cost_limit` - increase to make vacuum more aggressive
## Resolution steps
1. **If blocked**: Terminate blocking transactions with `pg_terminate_backend(pid)`
2. **If misconfigured**: Adjust scale factors for large tables
3. **Never disable autovacuum** - it risks transaction ID wraparound
**Sources:**
- [Citus: Debugging autovacuum problems](https://www.citusdata.com/blog/2022/07/28/debugging-postgres-autovacuum-problems-13-tips)
- [pganalyze VACUUM Advisor](https://pganalyze.com/blog/introducing-vacuum-advisor-postgres)
- [PostgreSQL docs: Routine Vacuuming](https://www.postgresql.org/docs/current/routine-vacuuming.html)
- [pghealth.io production guide](https://pghealth.io/blog/postgresql_health_check_production_guide)