Right-sizing PostgreSQL cloud instances

infoCapacity Planning

Determining optimal CPU, memory, and storage configuration based on actual workload patterns to avoid over or under-provisioning

Prompt: Help me determine whether my Postgres deployment on Google Cloud SQL should be provisioned up or down based on my current workload. I'm seeing 45% CPU utilization on average but occasional spikes to 85%.

With vs Without Schema

Same prompt, same model (claude-sonnet-4-5-20250929) — the only difference is whether Schema's knowledge graph was available.

Tokens
1,388vs1,449
Latency
30.9smsvs33.3sms
Turns
7vs7
Tool Calls
2vs2
Schema Calls
0vs0
Response
386 charsvs362 chars
Evaluation
Without Schema
D+
vs
With Schema
D+
Both responses take the same approach: deferring actionable guidance in favor of asking clarifying questions. Neither response interprets the CPU utilization data provided (45% average with spikes to 85%), which could have been used to provide at least preliminary guidance on spike patterns, headroom concerns, or typical Cloud SQL best practices. The responses are nearly identical in substance and tone, showing no meaningful advantage from Schema.ai access. Without seeing the actual questions asked by each response, both appear to provide minimal immediate value to the operator seeking sizing guidance.
Without Schema
2 tool calls
I'd be happy to help you evaluate your Cloud SQL Postgres sizing. To give you a well-informed recommendation, I need to gather some more information about your workload and current setup. I've asked you some key questions about your workload. Once you answer these, I'll be able to give you a specific recommendation on whether to scale up, down, or optimize your current configuration.
With Schema
2 tool calls
I can help you analyze whether to scale your Cloud SQL Postgres instance. To give you a solid recommendation, I need to understand your workload better. I've asked you a few questions to better understand your workload. Once you provide those details, I can give you a specific recommendation on whether to scale up, down, or optimize your current configuration.

Agent Playbook

When an agent encounters this scenario, Schema provides these diagnostic steps automatically.

When right-sizing a PostgreSQL Cloud SQL instance, start by establishing whether you're actually resource-constrained or just seeing normal headroom. Check cache efficiency and connection overhead before assuming you need more compute — most instances are over-provisioned on CPU but under-tuned on memory parameters. Finally, validate whether CPU spikes correlate with actual performance issues or are just healthy burst activity.

1Establish cache efficiency baseline
Calculate cache hit ratio: `postgresql.blocks_hit` ÷ (`postgresql.blocks_hit` + `postgresql.blocks_read`). This should be >99% for a healthy instance. If you're at 95-98%, your instance is likely under-provisioned on memory, not over-provisioned — adding CPU won't help. Low cache hit ratio means frequent disk reads that spike CPU during I/O waits, which would explain your 85% spikes.
2Check for memory waste from connection overhead
Look at `postgresql.backends` — if you're running 200+ connections, that's 1-2GB RAM consumed before any queries execute. This makes your instance appear memory-starved when you actually need PgBouncer connection pooling, not a bigger instance. Each PostgreSQL connection is a forked process with 5-10MB overhead, and excessive connections also cause context-switching that drives CPU spikes.
3Detect disk spill from insufficient work_mem
Check `postgresql.temp_files` and `postgresql.temp_bytes` — any non-zero values mean queries are spilling sorts/hashes to disk because work_mem is too low. This causes CPU spikes as PostgreSQL does extra I/O work and can make you think you need more compute when you actually just need memory tuning. If temp_bytes is growing steadily, increase work_mem to 64-128MB rather than upsizing the instance.
4Analyze I/O wait vs compute time
Compare `postgresql.blk_read_time` and `postgresql.blk_write_time` to total query execution time. If I/O wait exceeds 20% of query time, your bottleneck is storage tier, not CPU — upsizing the instance class won't help performance. You'd need to upgrade storage tier (e.g., from pd-standard to pd-ssd) or improve caching via shared_buffers tuning instead of adding vCPUs.
5Correlate CPU spikes with workload patterns
Cross-reference your CPU spike timestamps with `postgresql.queries.count` and `active_queries`. Are the 85% spikes during known batch jobs, ETL loads, or query bursts? If spikes last <5 minutes and don't cause connection timeouts or query failures, that's healthy burst capacity — your current instance provides good headroom. Sustained 85% for >15 minutes during normal operations indicates genuine under-provisioning.
6Validate downsize safety criteria
You can safely downsize one instance tier if all these hold: cache hit ratio >99%, `postgresql.temp_files` near zero over 7 days, `postgresql.backends` <100, and CPU spikes don't exceed 85% for more than 5 consecutive minutes. Your 45% average CPU with brief 85% spikes suggests over-provisioning if the other metrics check out. Test by downsizing during low-traffic hours and monitoring for 24-48 hours before committing.

Technologies

Related Insights

Insufficient shared_buffers limits PostgreSQL cache effectiveness
warning
Insufficient work_mem causes disk-based sorting and hashing
warning
Inadequate work_mem causes query spill to disk
warning
Double buffering wastes RAM when shared_buffers exceeds 40% of total memory
warning
Cache hit ratio below 99% indicates insufficient shared_buffers memory
warning
Connection limit exhaustion causes memory pressure and context-switching overhead
critical
resource bottleneck identification
diagnostic_step
Stub insight created from scenario discovery hint: resource bottleneck identification
instance over-provisioned for workload
supporting_evidence
Stub insight created from scenario discovery hint: instance over-provisioned for workload
need for vertical scaling
supporting_evidence
Stub insight created from scenario discovery hint: need for vertical scaling
cost optimization opportunity
supporting_evidence
Stub insight created from scenario discovery hint: cost optimization opportunity
workload characteristics and trends
supporting_evidence
Stub insight created from scenario discovery hint: workload characteristics and trends
safe to downsize instance class
diagnostic_step
Stub insight created from scenario discovery hint: safe to downsize instance class
storage tier optimization possible
diagnostic_step
Stub insight created from scenario discovery hint: storage tier optimization possible

Relevant Metrics

Monitoring Interfaces

PostgreSQL CloudWatch Metrics
PostgreSQL Datadog Integration
PostgreSQL Prometheus Exporter (pg_exporter)
PostgreSQL Prometheus
PostgreSQL Datadog