In cloud computing, autoscaling is the process of dynamically adjusting the number of instances or resources in response to changes in demand or workload. Horizontal autoscaling is a process by which the number of running instances of a service or application is automatically increased or decreased (“scaling out” and “scaling in,” respectively). Meanwhile, vertical autoscaling involves dynamically adjusting the resources allocated to an instance, such as the amount of memory, CPU or disk storage.
Rather than adding more replicas to a workload, the Kubernetes Vertical Pod Autoscaler (VPA) scales it vertically by adjusting the CPU and memory requests of its pods. It changes how much each pod is given, not the capacity of the cluster itself.
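The Kubernetes Horizontal Pod Autoscaler (HPA) scales a workload horizontally by adjusting its number of pod replicas so that an observed metric, such as average CPU utilization, stays near a configured target. It changes the pod count, not the number of nodes in the cluster.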
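To make the horizontal case concrete, here is a minimal Python sketch of the replica calculation described in the Kubernetes HPA documentation: the desired count is the current count multiplied by the ratio of the observed metric to the target, rounded up. The function name, the parameter names, and the replica bounds are illustrative assumptions, and the 0.1 tolerance mirrors the documented default rather than any particular cluster's setting. The real controller also handles missing metrics, pod readiness, and stabilization windows, which this sketch leaves out.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     tolerance=0.1, min_replicas=1, max_replicas=10):
    """Sketch of the HPA scaling rule (hypothetical helper, not a Kubernetes API)."""
    # How far the observed metric is from the target, as a ratio.
    ratio = current_metric / target_metric

    # Skip scaling when the ratio is close enough to 1.0 to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    # Scale proportionally and round up so capacity is never under-provisioned.
    desired = math.ceil(current_replicas * ratio)

    # Keep the result inside the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 80% CPU against a 50% target: scale out.
    print(desired_replicas(4, 80, 50))
    # 4 pods averaging 20% CPU against a 50% target: scale in.
    print(desired_replicas(4, 20, 50))
```

With a 50% target, four pods averaging 80% CPU give a ratio of 1.6, so the sketch asks for more replicas; at 20% the ratio is 0.4 and it asks for fewer. A reading of 52% falls inside the tolerance band and leaves the count unchanged.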