Category: Infrastructure

In this article, we will explore a case where one of our services scaled to its maximum pod count, and how we changed our alerting to stop this from becoming an issue in the future. The service we use as an example is deployed on Kubernetes (K8s) with autoscaling enabled. We scale based on requests per second (RPS), and K8s is configured to keep the average RPS per pod at 50.
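A setup like this is typically expressed as a HorizontalPodAutoscaler scaling on a custom per-pod metric. A minimal sketch, assuming a custom metric named `http_requests_per_second` is exposed through the custom metrics API (the metric name, replica bounds, and Deployment name here are illustrative, not our actual config):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-service        # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-service
  minReplicas: 2               # illustrative bounds
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second  # assumed custom metric
        target:
          type: AverageValue
          averageValue: "50"   # keep each pod at ~50 RPS
```

With this in place, the HPA adds pods whenever the per-pod average climbs above 50 RPS, up to `maxReplicas` — which is exactly the ceiling the rest of this article is concerned with.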

The alert is set to a 15-minute duration because briefly bursting to maximum pods and scaling back down is not the end of the world; we only want to be paged when the service is pinned at its ceiling.
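A pod-based alert with that duration might look like the following Prometheus rule, assuming kube-state-metrics is installed (the exact metric names vary by version, and the rule shown is a sketch rather than our production config):

```yaml
groups:
  - name: autoscaling
    rules:
      - alert: HPAAtMaxReplicas
        # Fires only if the HPA has been pinned at its maximum for 15
        # minutes, so short bursts that scale back down do not page anyone.
        expr: |
          kube_horizontalpodautoscaler_status_current_replicas
            >= kube_horizontalpodautoscaler_spec_max_replicas
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "HPA {{ $labels.horizontalpodautoscaler }} has been at max replicas for 15m"
```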

The alert should be changed to consider the number of requests rather than the number of pods in use: pod count only tells us the autoscaler has run out of headroom, while the request rate tells us how close the service actually is to the traffic it can serve.
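One way to express a request-based alert is to compare the observed request rate against the capacity ceiling implied by the HPA (max replicas × 50 RPS per pod). A sketch under the same assumptions as above, with illustrative metric and service names:

```yaml
      - alert: ServiceNearRequestCapacity
        # Fires when sustained traffic exceeds 80% of the ceiling the
        # autoscaler can serve (maxReplicas * 50 RPS per pod), giving
        # warning before the service is actually pinned at max pods.
        expr: |
          sum(rate(http_requests_total{service="example-service"}[5m]))
            > 0.8 * (
                kube_horizontalpodautoscaler_spec_max_replicas{horizontalpodautoscaler="example-service"}
                * 50
              )
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "example-service traffic is above 80% of autoscaling capacity"
```

The advantage of this form is that it alerts on the trend (traffic approaching capacity) rather than the symptom (pods already maxed out), so there is time to raise `maxReplicas` before requests start queueing.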
