When an agent encounters this scenario, Schema provides these diagnostic steps automatically.
When right-sizing a PostgreSQL Cloud SQL instance, start by establishing whether you're actually resource-constrained or just seeing normal headroom. Check cache efficiency and connection overhead before assuming you need more compute — most instances are over-provisioned on CPU but under-tuned on memory parameters. Finally, validate whether CPU spikes correlate with actual performance issues or are just healthy burst activity.
1Establish cache efficiency baseline
Calculate cache hit ratio: `postgresql.blocks_hit` ÷ (`postgresql.blocks_hit` + `postgresql.blocks_read`). This should be >99% for a healthy instance. If you're at 95-98%, your instance is likely under-provisioned on memory, not over-provisioned — adding CPU won't help. Low cache hit ratio means frequent disk reads that spike CPU during I/O waits, which would explain your 85% spikes.
2Check for memory waste from connection overhead
Look at `postgresql.backends` — if you're running 200+ connections, that's 1-2GB RAM consumed before any queries execute. This makes your instance appear memory-starved when you actually need PgBouncer connection pooling, not a bigger instance. Each PostgreSQL connection is a forked process with 5-10MB overhead, and excessive connections also cause context-switching that drives CPU spikes.
3Detect disk spill from insufficient work_mem
Check `postgresql.temp_files` and `postgresql.temp_bytes` — any non-zero values mean queries are spilling sorts/hashes to disk because work_mem is too low. This causes CPU spikes as PostgreSQL does extra I/O work and can make you think you need more compute when you actually just need memory tuning. If temp_bytes is growing steadily, increase work_mem to 64-128MB rather than upsizing the instance.
4Analyze I/O wait vs compute time
Compare `postgresql.blk_read_time` and `postgresql.blk_write_time` to total query execution time. If I/O wait exceeds 20% of query time, your bottleneck is storage tier, not CPU — upsizing the instance class won't help performance. You'd need to upgrade storage tier (e.g., from pd-standard to pd-ssd) or improve caching via shared_buffers tuning instead of adding vCPUs.
5Correlate CPU spikes with workload patterns
Cross-reference your CPU spike timestamps with `postgresql.queries.count` and `active_queries`. Are the 85% spikes during known batch jobs, ETL loads, or query bursts? If spikes last <5 minutes and don't cause connection timeouts or query failures, that's healthy burst capacity — your current instance provides good headroom. Sustained 85% for >15 minutes during normal operations indicates genuine under-provisioning.
6Validate downsize safety criteria
You can safely downsize one instance tier if all these hold: cache hit ratio >99%, `postgresql.temp_files` near zero over 7 days, `postgresql.backends` <100, and CPU spikes don't exceed 85% for more than 5 consecutive minutes. Your 45% average CPU with brief 85% spikes suggests over-provisioning if the other metrics check out. Test by downsizing during low-traffic hours and monitoring for 24-48 hours before committing.