Cache Hit Ratio Degradation
warningProactive Health
Buffer cache hit ratio has dropped, indicating more disk I/O and potential need for memory tuning.
Prompt: “I noticed our PostgreSQL cache hit ratio dropped from 99% to 85% over the past week. Is this a problem and should I increase shared_buffers or is something else going on?”
With vs Without Schema
Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.
Tokens
917vs1,625
Latency
18.1smsvs34.6sms
Turns
2vs14
Tool Calls
0vs6
Schema Calls
0vs4
Response
2,154 charsvs1,812 chars
Evaluation
Without Schema
B+
vs
With Schema
A-
Both responses correctly emphasize diagnosing the root cause before adjusting shared_buffers. The response without Schema provides more comprehensive diagnostic coverage (four potential causes vs. two query approaches) and specific SQL for investigating sequential scans and per-table cache behavior. However, the response with Schema delivers more decisive, actionable guidance by providing specific configuration thresholds (25% of RAM for shared_buffers, 50-75% for effective_cache_size) with concrete examples, plus a source citation for credibility. The key differentiator is completeness: without Schema ends by asking the user another question, while with Schema provides both diagnosis steps and the specific configuration values needed if tuning is warranted.
Without Schema
A drop from 99% to 85% is significant and worth investigating. Before increasing `shared_buffers`, you should determine **why** the ratio dropped - the cause matters more than the symptom.
## Common causes (in order of likelihood):
1. **Workload change** - New queries, larger tables, or different access patterns
- Check: Query patterns, table sizes, recent deploys
2. **Data growth** - Working set outgrew available cache
- Check: `pg_stat_user_tables` for table size growth
3. **Cache was flushed** - Server restart, config change, or maintenance
- Check: `pg_postmaster_start_time` and uptime
4. **Increased sequential scans** - Full table scans bypass/pollute the cache
- Check: `pg_stat_user_tables.seq_scan` and `seq_tup_read` trends
## How to investigate:
```sql
-- Check cache stats per database
SELECT datname,
blks_hit,
blks_read,
round(100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0), 2) AS hit_ratio
FROM pg_stat_database
WHERE datname IS NOT NULL;
-- Find tables with low cache hit ratios
SELECT schemaname, relname,
heap_blks_hit,
heap_blks_read,
round(100.0 * heap_blks_hit / NULLIF(heap_blks_hit + heap_blks_read, 0), 2) AS hit_ratio
FROM pg_statio_user_tables
WHERE heap_blks_read > 0
ORDER BY heap_blks_read DESC
LIMIT 20;
-- Check for sequential scans
SELECT schemaname, relname, seq_scan, seq_tup_read,
idx_scan, n_live_tup
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_scan DESC
LIMIT 20;
```
## When to increase shared_buffers:
- ✅ If working set grew legitimately and you have RAM to spare
- ✅ If cache hit ratio has been declining gradually over time
- ❌ If the drop was sudden (investigate the cause first)
- ❌ If you're seeing excessive sequential scans (add indexes instead)
- ❌ If server was recently restarted (cache is still warming up)
**85% isn't catastrophic**, but if it stays there, you're reading 15% of data from disk unnecessarily. Find the root cause first - it might be a missing index, inefficient query, or legitimate growth.
What does your monitoring show about query patterns or table growth in the past week?
With Schema
6 tool calls4 Schema
I'll help you diagnose this PostgreSQL cache hit ratio drop. Let me get expert guidance on this issue.
**Yes, this is a problem** — an 85% cache hit ratio means 15% of your reads are hitting disk, which is significant performance degradation. However, don't immediately increase `shared_buffers` without diagnosing first.
## Diagnose the root cause:
**1. Verify the current buffer hit ratio:**
```sql
SELECT sum(blks_hit) * 100.0 / sum(blks_hit + blks_read) as buffer_hit_ratio
FROM pg_stat_database;
```
**2. Identify which tables are causing cache misses:**
```sql
SELECT relname,
heap_blks_hit * 100.0 / nullif(heap_blks_hit + heap_blks_read, 0) as hit_ratio
FROM pg_statio_user_tables
WHERE heap_blks_read > 0
ORDER BY hit_ratio;
```
## What to look for:
- **If specific tables have low hit ratios**: Your working set may have grown (new data, increased query patterns). This supports increasing `shared_buffers`.
- **If the drop is sudden**: Check for workload changes, new queries, or data growth over the past week.
## Configuration tuning:
If diagnosis confirms you need more buffer cache:
- **`shared_buffers`**: Set to **25% of total RAM** (max 40%)
- Example: 4GB for a 16GB server, 16GB for a 64GB server
- **`effective_cache_size`**: Verify it's set to **50-75% of total RAM** (this helps the planner estimate available OS + PostgreSQL cache)
## Important:
The 99% threshold is based on production incidents — 85% means you're doing significantly more physical I/O. But increasing `shared_buffers` without understanding *why* the working set grew could mask a different issue (inefficient queries, missing indexes, unexpected data growth).
**Source:** [PostgreSQL Production Configuration](https://oneuptime.com/blog/post/2026-01-21-postgresql-production-configuration/view)
Agent Playbook
When an agent encounters this scenario, Schema provides these diagnostic steps automatically.
A drop in cache hit ratio from 99% to 85% is significant and warrants investigation. Start by confirming the calculation and identifying which databases are affected, then check if shared_buffers is undersized relative to your working set. However, don't immediately increase memory — first rule out workload changes like new sequential scans or table growth that could be addressed with better indexing or query optimization.
1Calculate cache hit ratio per database and confirm the drop
First, verify the drop using the standard formula: `blks_hit / (blks_hit + blks_read) * 100` from pg_stat_database. Look at `block-hit` and `postgresql-blocks-read` metrics over the past week to confirm when the drop occurred and which databases are affected. A system-wide drop points to a memory or infrastructure issue, while a single-database drop suggests a workload or schema change. The `low-buffer-hit-ratio` and `cache-hit-ratio-below-99-percent` insights apply here — anything below 99% warrants attention, and 85% is concerning.
2Check shared_buffers configuration and utilization
Check your current shared_buffers setting in postgresql.conf — if it's still at the default 128MB, that's almost certainly your problem (see `insufficient-shared-buffers-cache` and `insufficient-shared-buffers-disk-reads`). Look at `postgresql-buffercache-used-buffers` and `buffercache-summary-buffer-used` to see if you're using close to 100% of allocated buffers. If buffers are maxed out and the setting is less than 25% of available RAM, that's a strong signal you need more memory. However, don't restart PostgreSQL to change this until you complete the next steps — there may be query issues you can fix first.
3Identify increases in sequential scans and heap reads
Check if `sequential-tuple-read` has spiked over the past week — large sequential scans bypass shared_buffers and read directly from disk, which tanks your cache hit ratio. Compare current `heap-block-read` and `index-block-read` to historical baselines. If you see a sudden increase in sequential scans on tables with >100K rows, you likely have a missing index or a new query pattern that's doing full table scans. This is the `large-sequential-scans-bypassing-cache` scenario, and it's better fixed with indexes than by throwing more RAM at it.
4Analyze per-table cache hit ratios using pg_statio
Query pg_statio_user_tables to find which specific tables have poor cache hit ratios. Look at `postgresql-heap-blocks-read` vs `postgresql-heap-blocks-hit` and `postgresql-index-blocks-read` vs `postgresql-index-blocks-hit` per table. If a few large tables dominate disk reads and they're part of your working set, that suggests `working-set-larger-than-shared-buffers`. If those tables weren't heavily queried a week ago, that's a workload change. If they've always been queried but recently grew significantly, that's data growth requiring more memory.
5Correlate the timing with application changes or data growth
Look at exactly when the cache hit ratio dropped — was it gradual over a week or a sudden cliff? Correlate this with application deployments, new features, batch job changes, or significant data growth. Check `postgresql-blocks-read` and `read` metrics around that time. A gradual decline suggests your working set is outgrowing shared_buffers. A sudden drop points to a specific code or schema change. If you find the trigger event, you might be able to fix it with query optimization or an index rather than adding memory.
6Make the decision: tune queries vs increase shared_buffers
Based on your findings, decide your next step. If shared_buffers is undersized (<25% of RAM) and buffers are fully utilized with no obvious query issues, increase shared_buffers to 25-40% of available RAM as recommended by `insufficient-shared-buffers-disk-reads` and `buffer-cache-hit-ratio-low`. If you found problematic sequential scans or missing indexes, fix those first — they'll give you immediate improvement without a restart. If your working set legitimately grew and you're already at 25% RAM, consider upgrading your instance size. Monitor `buffer-cache-hit-ratio-low` and aim to get back above 95%, ideally above 99%.
Technologies
Related Insights
Low buffer hit ratio indicates insufficient shared_buffers or cache pressure
warning
Insufficient shared_buffers causes excessive disk reads and slow queries
prerequisitewarning
cache hit ratio below healthy threshold
Stub insight created from scenario discovery hint: cache hit ratio below healthy threshold
workload change causing more disk I/O
prerequisite
Stub insight created from scenario discovery hint: workload change causing more disk I/O
shared_buffers undersized
Stub insight created from scenario discovery hint: shared_buffers undersized
large sequential scans bypassing cache
prerequisite
Stub insight created from scenario discovery hint: large sequential scans bypassing cache
need for memory increase
Stub insight created from scenario discovery hint: need for memory increase
Cache hit ratio below 99% indicates undersized shared_buffers or high disk I/O
threshold_indicatorwarning
Cache hit ratio below 99% indicates insufficient shared_buffers memory
prerequisitewarning
Buffer cache hit ratio below 95% causes excessive disk I/O
diagnostic_stepwarning
Cache hit ratio below optimal causes excessive disk I/O
outcomewarning
working set larger than shared_buffers
prerequisite
Stub insight created from scenario discovery hint: working set larger than shared_buffers
Insufficient shared_buffers limits PostgreSQL cache effectiveness
prerequisitewarning
Relevant Metrics
postgresql.blocks_hitpostgresql.blocks_readpostgresql.heap_blocks_hitpostgresql.heap_blocks_read(monitor)postgresql.index_blocks_hitpostgresql.index_blocks_read(monitor)heap_blks_readidx_blks_readhitsreadsbuffer_hit(primary_diagnostic)blks_hit(primary_diagnostic)blks_written(secondary_diagnostic)disk_read(secondary_diagnostic)heap_blks_hit(secondary_diagnostic)idx_blks_hit(secondary_diagnostic)buffercache_summary_buffer_used(monitor)buffercache_used_buffers(secondary_diagnostic)seq_tup_read(monitor)Monitoring Interfaces
PostgreSQL Native