Cache Hit Ratio Degradation

warningProactive Health

Buffer cache hit ratio has dropped, indicating more disk I/O and potential need for memory tuning.

Prompt: “I noticed our PostgreSQL cache hit ratio dropped from 99% to 85% over the past week. Is this a problem and should I increase shared_buffers or is something else going on?”

With vs Without Schema

Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.

Tokens

917vs1,625

Latency

18.1smsvs34.6sms

Turns

2vs14

Tool Calls

0vs6

Schema Calls

0vs4

Response

2,154 charsvs1,812 chars

Evaluation

Without Schema

B+

With Schema

A-

Both responses correctly emphasize diagnosing the root cause before adjusting shared_buffers. The response without Schema provides more comprehensive diagnostic coverage (four potential causes vs. two query approaches) and specific SQL for investigating sequential scans and per-table cache behavior. However, the response with Schema delivers more decisive, actionable guidance by providing specific configuration thresholds (25% of RAM for shared_buffers, 50-75% for effective_cache_size) with concrete examples, plus a source citation for credibility. The key differentiator is completeness: without Schema ends by asking the user another question, while with Schema provides both diagnosis steps and the specific configuration values needed if tuning is warranted.

Without Schema

A drop from 99% to 85% is significant and worth investigating. Before increasing `shared_buffers`, you should determine **why** the ratio dropped - the cause matters more than the symptom.

## Common causes (in order of likelihood):

1. **Workload change** - New queries, larger tables, or different access patterns
   - Check: Query patterns, table sizes, recent deploys
   
2. **Data growth** - Working set outgrew available cache
   - Check: `pg_stat_user_tables` for table size growth
   
3. **Cache was flushed** - Server restart, config change, or maintenance
   - Check: `pg_postmaster_start_time` and uptime
   
4. **Increased sequential scans** - Full table scans bypass/pollute the cache
   - Check: `pg_stat_user_tables.seq_scan` and `seq_tup_read` trends

## How to investigate:

```sql
-- Check cache stats per database
SELECT datname, 
       blks_hit, 
       blks_read,
       round(100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0), 2) AS hit_ratio
FROM pg_stat_database
WHERE datname IS NOT NULL;

-- Find tables with low cache hit ratios
SELECT schemaname, relname,
       heap_blks_hit,
       heap_blks_read,
       round(100.0 * heap_blks_hit / NULLIF(heap_blks_hit + heap_blks_read, 0), 2) AS hit_ratio
FROM pg_statio_user_tables
WHERE heap_blks_read > 0
ORDER BY heap_blks_read DESC
LIMIT 20;

-- Check for sequential scans
SELECT schemaname, relname, seq_scan, seq_tup_read, 
       idx_scan, n_live_tup
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_scan DESC
LIMIT 20;
```

## When to increase shared_buffers:

- ✅ If working set grew legitimately and you have RAM to spare
- ✅ If cache hit ratio has been declining gradually over time
- ❌ If the drop was sudden (investigate the cause first)
- ❌ If you're seeing excessive sequential scans (add indexes instead)
- ❌ If server was recently restarted (cache is still warming up)

**85% isn't catastrophic**, but if it stays there, you're reading 15% of data from disk unnecessarily. Find the root cause first - it might be a missing index, inefficient query, or legitimate growth.

What does your monitoring show about query patterns or table growth in the past week?

With Schema

6 tool calls4 Schema

I'll help you diagnose this PostgreSQL cache hit ratio drop. Let me get expert guidance on this issue.
**Yes, this is a problem** — an 85% cache hit ratio means 15% of your reads are hitting disk, which is significant performance degradation. However, don't immediately increase `shared_buffers` without diagnosing first.

## Diagnose the root cause:

**1. Verify the current buffer hit ratio:**
```sql
SELECT sum(blks_hit) * 100.0 / sum(blks_hit + blks_read) as buffer_hit_ratio 
FROM pg_stat_database;
```

**2. Identify which tables are causing cache misses:**
```sql
SELECT relname, 
       heap_blks_hit * 100.0 / nullif(heap_blks_hit + heap_blks_read, 0) as hit_ratio 
FROM pg_statio_user_tables 
WHERE heap_blks_read > 0 
ORDER BY hit_ratio;
```

## What to look for:

- **If specific tables have low hit ratios**: Your working set may have grown (new data, increased query patterns). This supports increasing `shared_buffers`.
- **If the drop is sudden**: Check for workload changes, new queries, or data growth over the past week.

## Configuration tuning:

If diagnosis confirms you need more buffer cache:

- **`shared_buffers`**: Set to **25% of total RAM** (max 40%)
  - Example: 4GB for a 16GB server, 16GB for a 64GB server
- **`effective_cache_size`**: Verify it's set to **50-75% of total RAM** (this helps the planner estimate available OS + PostgreSQL cache)

## Important:

The 99% threshold is based on production incidents — 85% means you're doing significantly more physical I/O. But increasing `shared_buffers` without understanding *why* the working set grew could mask a different issue (inefficient queries, missing indexes, unexpected data growth).

**Source:** [PostgreSQL Production Configuration](https://oneuptime.com/blog/post/2026-01-21-postgresql-production-configuration/view)

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

A drop in cache hit ratio from 99% to 85% is significant and warrants investigation. Start by confirming the calculation and identifying which databases are affected, then check if shared_buffers is undersized relative to your working set. However, don't immediately increase memory — first rule out workload changes like new sequential scans or table growth that could be addressed with better indexing or query optimization.

1Calculate cache hit ratio per database and confirm the drop

First, verify the drop using the standard formula: `blks_hit / (blks_hit + blks_read) * 100` from pg_stat_database. Look at `block-hit` and `postgresql-blocks-read` metrics over the past week to confirm when the drop occurred and which databases are affected. A system-wide drop points to a memory or infrastructure issue, while a single-database drop suggests a workload or schema change. The `low-buffer-hit-ratio` and `cache-hit-ratio-below-99-percent` insights apply here — anything below 99% warrants attention, and 85% is concerning.

Low buffer hit ratio indicates insufficient shared_buffers or cache pressure Cache hit ratio below 99% indicates undersized shared_buffers or high disk I/O blks_hitpostgresql.blocks_readpostgresql.blocks_hit

2Check shared_buffers configuration and utilization

Check your current shared_buffers setting in postgresql.conf — if it's still at the default 128MB, that's almost certainly your problem (see `insufficient-shared-buffers-cache` and `insufficient-shared-buffers-disk-reads`). Look at `postgresql-buffercache-used-buffers` and `buffercache-summary-buffer-used` to see if you're using close to 100% of allocated buffers. If buffers are maxed out and the setting is less than 25% of available RAM, that's a strong signal you need more memory. However, don't restart PostgreSQL to change this until you complete the next steps — there may be query issues you can fix first.

Insufficient shared_buffers limits PostgreSQL cache effectiveness Insufficient shared_buffers causes excessive disk reads and slow queries shared_buffers undersized buffercache_used_buffersbuffercache_summary_buffer_used

3Identify increases in sequential scans and heap reads

Check if `sequential-tuple-read` has spiked over the past week — large sequential scans bypass shared_buffers and read directly from disk, which tanks your cache hit ratio. Compare current `heap-block-read` and `index-block-read` to historical baselines. If you see a sudden increase in sequential scans on tables with >100K rows, you likely have a missing index or a new query pattern that's doing full table scans. This is the `large-sequential-scans-bypassing-cache` scenario, and it's better fixed with indexes than by throwing more RAM at it.

large sequential scans bypassing cache workload change causing more disk I/O seq_tup_readheap_blks_readidx_blks_read

4Analyze per-table cache hit ratios using pg_statio

Query pg_statio_user_tables to find which specific tables have poor cache hit ratios. Look at `postgresql-heap-blocks-read` vs `postgresql-heap-blocks-hit` and `postgresql-index-blocks-read` vs `postgresql-index-blocks-hit` per table. If a few large tables dominate disk reads and they're part of your working set, that suggests `working-set-larger-than-shared-buffers`. If those tables weren't heavily queried a week ago, that's a workload change. If they've always been queried but recently grew significantly, that's data growth requiring more memory.

working set larger than shared_buffers Cache hit ratio below optimal causes excessive disk I/O postgresql.heap_blocks_readpostgresql.heap_blocks_hitpostgresql.index_blocks_readpostgresql.index_blocks_hit

5Correlate the timing with application changes or data growth

Look at exactly when the cache hit ratio dropped — was it gradual over a week or a sudden cliff? Correlate this with application deployments, new features, batch job changes, or significant data growth. Check `postgresql-blocks-read` and `read` metrics around that time. A gradual decline suggests your working set is outgrowing shared_buffers. A sudden drop points to a specific code or schema change. If you find the trigger event, you might be able to fix it with query optimization or an index rather than adding memory.

workload change causing more disk I/O need for memory increase postgresql.blocks_readreads

6Make the decision: tune queries vs increase shared_buffers

Based on your findings, decide your next step. If shared_buffers is undersized (<25% of RAM) and buffers are fully utilized with no obvious query issues, increase shared_buffers to 25-40% of available RAM as recommended by `insufficient-shared-buffers-disk-reads` and `buffer-cache-hit-ratio-low`. If you found problematic sequential scans or missing indexes, fix those first — they'll give you immediate improvement without a restart. If your working set legitimately grew and you're already at 25% RAM, consider upgrading your instance size. Monitor `buffer-cache-hit-ratio-low` and aim to get back above 95%, ideally above 99%.

Insufficient shared_buffers causes excessive disk reads and slow queries Buffer cache hit ratio below 95% causes excessive disk I/O Cache hit ratio below 99% indicates insufficient shared_buffers memory blks_hitpostgresql.blocks_read