How We Cut Rails on GKE Costs by 60%: The "Efficiency First" Roadmap

Source: DEV Community
tl;dr: We reduced Google Kubernetes Engine (GKE) costs by 60%. The biggest wins came not from Kubernetes tuning, but from understanding why our Rails app needed so many Pods in the first place:

- Rails was running 1 Puma worker with 33 threads. Ruby's GVL made this effectively single-core. We switched to 4 workers with 8 threads.
- API authentication ran bcrypt on every request. We replaced it with a lighter method.
- Our GKE node generation was outdated: upgrading from n1 to n2d gave 56% more CPU and 23% more RAM for 3% less cost.
- Only after fixing per-Pod efficiency did we add KEDA Cron autoscaling and GKE node autoscaling.

The order mattered: we improved per-Pod efficiency first, then used autoscaling to stop paying for idle capacity. The interesting part was not any one change by itself, but why these four changes reinforced each other. The rest of this article walks through the reasoning behind each one.

I work on a B2B SaaS platform that runs on Google Kubernetes Engine. The API server is a
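The Puma change above can be sketched as a `config/puma.rb` fragment. This is a minimal illustration of the before/after, not the author's actual config; the `WEB_CONCURRENCY` and `RAILS_MAX_THREADS` environment-variable names are conventional Rails defaults, and the fallback values here assume the article's 4×8 target.

```ruby
# config/puma.rb — sketch of the change described above.
#
# Before: workers 1, threads 33 — with one process, Ruby's GVL means only
# one thread executes Ruby code at a time, so the Pod is effectively
# single-core no matter how many threads it has.
#
# After: 4 worker processes × 8 threads — each worker is a separate
# process with its own GVL, so four cores can do Ruby work in parallel,
# while threads still cover I/O waits within each worker.
workers Integer(ENV.fetch("WEB_CONCURRENCY", 4))

threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 8))
threads threads_count, threads_count

# Load the app before forking so workers share memory via copy-on-write.
preload_app!
```

With this shape, a Pod's CPU request can be sized to roughly the worker count, which is what makes the later autoscaling math predictable.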
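The article doesn't name the lighter authentication method it adopted, so the following is a hypothetical illustration of the general pattern: bcrypt is deliberately slow (each verification burns tens of milliseconds of CPU by design), which is right for passwords but wasteful when run on every API request. For high-entropy API tokens, a fast SHA-256 digest plus a constant-time comparison is a common alternative. The function names here are made up for the sketch.

```ruby
require "digest"
require "openssl"

# Hypothetical sketch — not the article's actual implementation.
# bcrypt's cost factor makes every check expensive on purpose; an API
# token with enough entropy doesn't need that, so a single fast hash
# plus a constant-time compare does the job in microseconds.

# At token creation time: store only the digest, never the raw token.
def token_digest(raw_token)
  Digest::SHA256.hexdigest(raw_token)
end

# On every request: one SHA-256 hash and a timing-safe comparison,
# instead of a full bcrypt verification.
def valid_token?(raw_token, stored_digest)
  OpenSSL.secure_compare(token_digest(raw_token), stored_digest)
end
```

`OpenSSL.secure_compare` is used so the comparison time doesn't leak how many leading characters of the digest match.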