Comprehensive Operational Readiness Review
Proactive Health
Baseline assessment of PostgreSQL instance health, configuration, and operational maturity across all dimensions.
Prompt: “I set up PostgreSQL a while ago and the workload has evolved a lot since then. How do I make sure it's well configured and provisioned? I want a comprehensive health check covering performance, capacity, reliability, and cost.”
Agent Playbook
When an agent encounters this scenario, Schema provides these diagnostic steps automatically.
For a comprehensive operational readiness review, start by understanding how your workload has evolved since initial setup, then ensure your reliability fundamentals (backups, replication, critical risks) are sound before diving into resource constraints, autovacuum effectiveness, and configuration tuning opportunities. This prioritizes preventing outages over optimizing performance.
1. Establish baseline health and understand workload evolution
Start with `overall-instance-health-status` and `workload-characteristics-and-trends` to understand what's actually changed. Check `postgresql-backends` and `postgresql-connections-by-process` to see current connection patterns, and `postgresql-database-size` to understand growth. If your database has grown 10x but your config hasn't changed, you'll find issues in every other area. This baseline tells you what to prioritize in subsequent steps.
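If you have direct psql access, the same baseline can be gathered from the standard statistics views (the hyphenated names above are this playbook's insight identifiers, not SQL objects). A minimal sketch:

```sql
-- Database sizes, largest first: the raw input behind growth/baseline checks
SELECT datname,
       pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
WHERE NOT datistemplate
ORDER BY pg_database_size(datname) DESC;

-- Current connections grouped by state and application
SELECT state, application_name, count(*) AS sessions
FROM pg_stat_activity
GROUP BY state, application_name
ORDER BY count(*) DESC;
```

Run these periodically (or compare against an old snapshot) to see how size and connection mix have shifted since initial setup.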
2. Verify backup and disaster recovery readiness
Check `backup-and-recovery-readiness` and `replication-health` before tuning anything else—if you can't recover from failure, performance tuning is meaningless. If you have replicas, verify `replication-lag` is consistently under 10 seconds; sustained lag over 60s means your replica can't keep up. Test your backup restoration process if you haven't in the last quarter; too many teams discover broken backups during an outage.
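A quick way to sanity-check replication lag directly, assuming PostgreSQL 10 or later (where the `*_lsn` names apply):

```sql
-- On the primary: per-replica state and replay lag in bytes
SELECT application_name, state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- On a standby: approximate replay delay as an interval
SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay;
```

Note the standby query overstates lag on an idle primary (no new transactions to replay), so interpret it alongside the byte-based figure.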
3. Assess critical reliability risks that cause outages
Check `reliability-risk-assessment` for silent killers like transaction ID wraparound. Look at `postgresql-before-xid-wraparound`—if this is below 100 million, you're at risk of forced shutdown. Check `long-running-transaction-oldest-timestamp`—transactions open for hours block autovacuum and can cascade into performance emergencies. These risks tend to grow slowly as workload evolves, then bite you suddenly.
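Both risks can be checked with two queries against the system catalogs; a minimal sketch:

```sql
-- Transaction ID age per database; the hard wraparound limit is ~2.1 billion
-- (2^31), and PostgreSQL warns and eventually refuses new transactions
-- well before reaching it
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

-- Oldest open transactions (long-runners block autovacuum cleanup)
SELECT pid, usename, state,
       now() - xact_start AS xact_duration,
       left(query, 60) AS query_snippet
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 5;
```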
4. Identify resource bottlenecks and capacity headroom
Check `resource-bottleneck-identification` and `capacity-headroom-adequacy` to see if you're running out of headroom. Look at `postgresql-percent-usage-connections`—if you're consistently above 80%, you'll start seeing connection errors. High `postgresql-temp-bytes` means `work_mem` is too low for your current workload. Review `postgresql-locks` for contention patterns that didn't exist when the workload was smaller.
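These three signals map directly onto catalog queries; a sketch, assuming psql access:

```sql
-- Connection headroom as a percentage of max_connections
SELECT count(*) * 100.0
         / current_setting('max_connections')::int AS pct_connections_used
FROM pg_stat_activity;

-- Temp-file spill per database (high values suggest work_mem is too low)
SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_spilled
FROM pg_stat_database
ORDER BY temp_bytes DESC;

-- Sessions currently waiting on locks (a proxy for contention)
SELECT count(*) AS waiting_on_locks
FROM pg_locks
WHERE NOT granted;
```

Temp and lock counters are cumulative since the last stats reset, so compare deltas between snapshots rather than absolute values.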
5. Evaluate autovacuum effectiveness and table bloat
Check `autovacuum-effectiveness` because configs that worked at low volume often fall behind as workload grows. Look at `dead-tuple` counts and `postgresql-table-bloat`—bloat consistently above 20% means autovacuum can't keep up with your update/delete rate. This degrades performance slowly over time, so it's easy to miss until queries that used to take milliseconds now take seconds. Tuning autovacuum is often the highest-impact fix for evolved workloads.
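The dead-tuple ratio in `pg_stat_user_tables` is a rough but readily available bloat proxy; a sketch:

```sql
-- Tables with the most dead tuples, plus when autovacuum last ran on them
SELECT schemaname, relname,
       n_live_tup, n_dead_tup,
       round(100.0 * n_dead_tup
             / nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

A high `dead_pct` combined with a stale `last_autovacuum` on a hot table is the classic signature of autovacuum falling behind an evolved update/delete rate.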
6. Review configuration tuning and performance optimization opportunities
Check `configuration-tuning-needed` and `performance-optimization-opportunities` to align settings with your current workload. Look at `postgresql-buffer-hit`—if this is below 99%, your `shared_buffers` may be undersized for your current working set. Review `checkpoint-timed` frequency—if checkpoints happen too often, you're creating unnecessary I/O; too rarely, and recovery time suffers. Initial configurations rarely scale optimally with workload evolution.
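Both signals are available from the statistics views; a sketch (the checkpoint view changed location in PostgreSQL 17):

```sql
-- Cache hit ratio per database; below ~99% suggests an undersized
-- shared_buffers for the current working set
SELECT datname,
       round(100.0 * blks_hit
             / nullif(blks_hit + blks_read, 0), 2) AS hit_pct
FROM pg_stat_database
WHERE blks_hit + blks_read > 0;

-- Timed vs. requested checkpoints (PostgreSQL 16 and earlier; on 17+
-- see pg_stat_checkpointer instead)
SELECT checkpoints_timed, checkpoints_req
FROM pg_stat_bgwriter;
```

A high share of requested (non-timed) checkpoints usually means `max_wal_size` is too small for the current write rate.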
7. Identify cost optimization opportunities
Finally, check `cost-optimization-opportunities` now that you understand reliability and performance needs. Look for over-provisioned resources—if `postgresql-percent-usage-connections` is consistently below 30%, you may have room to reduce connection pooling overhead or consolidate instances. Review if monitoring and alerting gaps from `monitoring-and-alerting-gaps` mean you're over-provisioned "just in case." Cost optimization comes last because you need reliability and performance baselines first.
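A one-row snapshot of provisioned limits versus observed usage is a useful starting point; sample it over time (e.g. daily at peak) before deciding to downsize or consolidate. A sketch:

```sql
-- Provisioned limits vs. usage right now; a single sample is not enough
-- to justify downsizing—collect this across peak periods first
SELECT current_setting('max_connections')::int             AS max_connections,
       (SELECT count(*) FROM pg_stat_activity)             AS connections_now,
       round(100.0 * (SELECT count(*) FROM pg_stat_activity)
             / current_setting('max_connections')::int, 1) AS pct_used,
       current_setting('shared_buffers')                   AS shared_buffers;
```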
Technologies
PostgreSQL
Related Insights
All of the following are stub insights created from scenario discovery hints; where a type was recorded, it appears in parentheses.
- overall instance health status (diagnostic_step)
- backup and recovery readiness
- replication health
- cost optimization opportunities (diagnostic_step)
- version upgrade considerations
- monitoring and alerting gaps
- operational best practice adherence
- workload characteristics and trends (diagnostic_step)
- resource bottleneck identification
- configuration tuning needed (diagnostic_step)
- autovacuum effectiveness (diagnostic_step)
- capacity headroom adequacy (diagnostic_step)
- reliability risk assessment (diagnostic_step)
- performance optimization opportunities (supporting_evidence)
Relevant Metrics
- hits
- postgresql.backends (primary_diagnostic)
- percent_usage_connections (threshold_indicator)
- n_dead_tup (primary_diagnostic)
- buffer_hit (primary_diagnostic)
- checkpoint_timed (primary_diagnostic)
- postgresql.database.size (primary_diagnostic)
- before_xid_wraparound (threshold_indicator)
- replication_lag_seconds (primary_diagnostic)
- table_bloat (secondary_diagnostic)
- postgresql.locks (secondary_diagnostic)
- postgresql.temp_bytes (secondary_diagnostic)
- long_running_transaction_oldest_timestamp (secondary_diagnostic)
- uptime (monitor)
- connections_by_process (primary_diagnostic)
- transactions_open (monitor)
Monitoring Interfaces
PostgreSQL Native