Intro
Welcome to part 3 of our 'How to scale your Kubernetes Clusters' blog series. If you missed any of the previous parts, you can check out part 1: horizontal pod autoscaler here and part 2: cluster autoscaler here. In this part, we're going to cover how to use the Vertical Pod Autoscaler to dynamically give your pods more resources as they need them, instead of spinning up another pod (which is what the HPA would do).
Let's get into it.
What is the Vertical Pod Autoscaler?
The Kubernetes Vertical Pod Autoscaler (VPA) automatically adjusts the resource requests and limits of pods based on their actual resource usage. By dynamically matching the CPU and memory requests of pods to their observed usage patterns, it optimizes resource allocation in Kubernetes clusters. This makes it much easier to scale your pods without having to guess what resources you need.
Traditionally, you need to manually set resource requests and limits for each pod, which can be challenging because it's hard to know the resource usage ahead of time. If you set the resource requests too low, it can impact the performance of your application, and if they are set too high, you may be paying for resources you don't need. The VPA solves this problem by continuously monitoring the resource utilization of pods and automatically adjusting their resource requests and limits as needed.
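To make that concrete, here's the kind of guesswork the VPA takes off your plate. This is a hypothetical deployment (the name service-deployment, the nginx image, and all the numbers are placeholders we'll reuse later in this post):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-deployment        # hypothetical app used throughout this post
spec:
  replicas: 1
  selector:
    matchLabels:
      app: service-deployment
  template:
    metadata:
      labels:
        app: service-deployment
    spec:
      containers:
      - name: service-deployment
        image: nginx:1.25          # stand-in image
        resources:
          requests:
            cpu: 250m              # guessed baseline: too low hurts performance
            memory: 256Mi
          limits:
            cpu: 500m              # guessed ceiling: too high wastes money
            memory: 512Mi
The VPA's job is to replace those hand-picked values with ones derived from what the container actually uses.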
VPA vs HPA
The main difference between the VPA and the HPA (Horizontal Pod Autoscaler) is that the VPA scales the existing pod while the HPA scales the number of pods. So the HPA creates more pods in order to handle requests, while the VPA grows the existing pod to be able to handle more requests. Here's a graphic to explain that a little further:
On the VPA side, you can see how we scale the existing pod from 2 vCPU to 4 vCPU. We're not creating new pods, just scaling the existing one. On the HPA side, you can see how we're scaling from 1 pod with 2 vCPU to 2 pods, each with 2 vCPU. Both the VPA and HPA work on scaling pods, just in two different ways.
One other thing to mention here is that in most cases you should not use the VPA and HPA together. This is mainly because they both key off CPU and memory metrics, and using both at the same time will likely cause your pods to behave in unexpected ways. For more advanced use cases, you could pair the VPA with an HPA that listens to custom metrics instead, as sketched below.
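Here's a rough sketch of what that advanced setup could look like on the HPA side. It assumes you already expose a custom metric (the name http_requests_per_second here is illustrative) through a metrics adapter such as the Prometheus Adapter, so the HPA scales on traffic while the VPA handles CPU and memory:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: service-deployment-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: service-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # custom metric, not CPU/memory
      target:
        type: AverageValue
        averageValue: "100"              # target ~100 req/s per pod
Because this HPA never looks at CPU or memory, it won't fight the VPA over the same signals.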
How does the VPA actually work?
Now that we know how the VPA works at a high level, let's dig into the details. The VPA is made up of three components:
- Recommender - monitors the current and past resource consumption and provides recommended CPU and memory request values for a container.
- Updater - determines which pods should be restarted based on the resource recommendations calculated by the Recommender. It evicts pods that should be updated, but it doesn't perform the actual resource update on the pod.
- Admission Controller - Sets the correct resource requests on new pods (that is, pods just created or recreated by their controller due to changes made by the Updater).
These three components work together to identify pods that should be vertically scaled, restart those pods, and then dynamically update the resource values for those pods. Additionally, there are four main modes that the VPA can run in to update values:
- Auto - VPA will automatically evict the pods that should have their resource requests changed so they restart with the new values. This isn't recommended for production use since it could cause workload disruption.
- Initial - VPA will apply the recommended resource values it calculates to newly created pods, for example, when a new service is deployed.
- Off - VPA just stores the calculated resource values for reference and doesn't update the pods at all.
- Recreate - VPA assigns resource requests on pod creation as well as updates them on existing pods by evicting them when the requested resources differ significantly from the new recommendation (respecting the Pod Disruption Budget, if defined). This mode should be used rarely, only if you need to ensure that the pods are restarted whenever the resource request changes. Otherwise, prefer the "Auto" mode which may take advantage of restart-free updates once they are available.
Let's take a look at a diagram and then a step-by-step workflow of the VPA in Auto mode.
- The Recommender pulls the resource utilization metrics from the metrics server (required here, just as with the HPA) and provides pod resource recommendations to the VPA
- The Updater reads the recommendations from the Recommender and initiates pod termination
- The deployment sees that the pod was terminated and recreates the pod
- While the pod is being created, the Admission Controller gets the pod resource recommendation and injects the updated resource values into the new pod's spec
One important thing to note here is that the VPA doesn't have visibility into node resources and therefore might recommend more resources for a pod than any node has, which would cause Kubernetes to fail to schedule the pod onto a node. You can use the cluster autoscaler here to help with this, but generally we recommend setting a LimitRange to the maximum available resources to ensure that the VPA doesn't recommend more resources than are available in the node. A sketch of such a LimitRange follows.
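Here's a minimal sketch of that safety net, assuming nodes with roughly 2 vCPU and 4 GiB allocatable; the name and values are placeholders you'd size to your own nodes, and keep in mind a LimitRange only applies within its namespace:
apiVersion: v1
kind: LimitRange
metadata:
  name: vpa-limit-range
spec:
  limits:
  - type: Container
    max:
      cpu: "2"         # cap at what a node can actually fit
      memory: 4Gi
    min:
      cpu: 50m
      memory: 64Mi
With this in place, the VPA's recommendations get capped at values that can still be scheduled.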
Installing the VPA
Let's walk through how to install the VPA into your cluster. We're going to use AWS as our cloud provider.
Similar to how we started with the HPA, the first step is to install the metrics server. Luckily, this is pretty straightforward.
You can deploy the metrics server using the following kubectl command:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Verify that the metrics-server deployment is running in the kube-system namespace:
kubectl get deployment metrics-server -n kube-system
You should see something that looks like this:
NAME READY UP-TO-DATE AVAILABLE AGE
metrics-server 1/1 1 1 6m
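If you want an extra sanity check that metrics are actually flowing, you can query the metrics API directly; both commands should return numbers rather than errors:
kubectl top nodes
kubectl top pods -n kube-system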
Now that we have the metrics server installed, we can go ahead with the VPA installation.
Pull down the VPA source code:
git clone https://github.com/kubernetes/autoscaler.git
Inside the vertical-pod-autoscaler directory, run:
./hack/vpa-up.sh
You should see output similar to the following:
customresourcedefinition.apiextensions.k8s.io/verticalpodautoscalercheckpoints.autoscaling.k8s.io created
customresourcedefinition.apiextensions.k8s.io/verticalpodautoscalers.autoscaling.k8s.io created
clusterrole.rbac.authorization.k8s.io/system:metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:vpa-actor created
clusterrole.rbac.authorization.k8s.io/system:vpa-checkpoint-actor created
clusterrole.rbac.authorization.k8s.io/system:evictioner created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/system:vpa-actor created
clusterrolebinding.rbac.authorization.k8s.io/system:vpa-checkpoint-actor created
clusterrole.rbac.authorization.k8s.io/system:vpa-target-reader created
clusterrolebinding.rbac.authorization.k8s.io/system:vpa-target-reader-binding created
clusterrolebinding.rbac.authorization.k8s.io/system:vpa-evictionter-binding created
serviceaccount/vpa-admission-controller created
clusterrole.rbac.authorization.k8s.io/system:vpa-admission-controller created
clusterrolebinding.rbac.authorization.k8s.io/system:vpa-admission-controller created
clusterrole.rbac.authorization.k8s.io/system:vpa-status-reader created
clusterrolebinding.rbac.authorization.k8s.io/system:vpa-status-reader-binding created
serviceaccount/vpa-updater created
deployment.apps/vpa-updater created
serviceaccount/vpa-recommender created
deployment.apps/vpa-recommender created
Generating certs for the VPA Admission Controller in /tmp/vpa-certs.
Generating RSA private key, 2048 bit long modulus (2 primes)
........................+++++
.................................+++++
e is 65537 (0x010001)
Generating RSA private key, 2048 bit long modulus (2 primes)
.....................................................+++++
..........+++++
e is 65537 (0x010001)
Signature ok
subject=CN = vpa-webhook.kube-system.svc
Getting CA Private Key
Uploading certs to the cluster.
secret/vpa-tls-certs created
Deleting /tmp/vpa-certs.
deployment.apps/vpa-admission-controller created
service/vpa-webhook created
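Before moving on, it's worth confirming that the three VPA components came up; they're deployed into the kube-system namespace:
kubectl get pods -n kube-system | grep vpa
You should see the vpa-admission-controller, vpa-recommender, and vpa-updater pods in the Running state.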
Nice! Now that we have the VPA installed, we can create a few custom resources to manage the deployments that we want the VPA to automatically scale. Let's look at an example manifest titled service-deployment-vpa.yaml:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: service-deployment-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: service-deployment
  updatePolicy:
    updateMode: "Off"
A couple of things to point out here. The first is the name field in the targetRef section. This should match exactly the name of the deployment that you want to watch. Second is the updateMode in the updatePolicy section, which we have set to "Off". This tells the VPA to run in the Off mode and not actually update the pod's resource values, but still provide the recommendations.
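As a side note, the VPA spec also supports a resourcePolicy section that bounds what the Recommender can suggest. Here's a variant of the manifest above with illustrative bounds (the numbers are placeholders you'd tune yourself):
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: service-deployment-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: service-deployment
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"    # applies to every container in the pod
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: "1"            # illustrative ceiling, size to your nodes
        memory: 1Gi
For this walkthrough, though, we'll stick with the simpler manifest above.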
Let's create this resource by applying it with:
kubectl apply -f service-deployment-vpa.yaml
We can let our pod run for a few minutes and then check back in to see the recommended resources with the following command:
kubectl describe vpa service-deployment-vpa
This will print out the description of the service-deployment-vpa resource we just created, and in the recommendation section we should see something that looks like this:
recommendation:
  containerRecommendations:
  - containerName: service-deployment
    lowerBound:
      cpu: 50m
      memory: 1100k
    target:
      cpu: 60m
      memory: 2000k
    upperBound:
      cpu: 872m
      memory: 6000k
We can now see the resource recommendations from the VPA working. The target is what the Admission Controller would apply to newly created pods, while the lowerBound and upperBound are the thresholds the Updater uses to decide whether an existing pod's requests have drifted far enough to warrant eviction. And that's it! For every deployment that you want the VPA to monitor, you'll just have to create a VerticalPodAutoscaler resource like we did above.
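If you later want the VPA to act on these numbers rather than just report them, you can switch updateMode to "Auto" and reapply. Assuming your deployment carries an app=service-deployment label like the earlier example, you can then watch the requests change on the recreated pod:
# edit service-deployment-vpa.yaml to set updateMode: "Auto", then:
kubectl apply -f service-deployment-vpa.yaml
# once the Updater evicts the pod and it's recreated, inspect its requests:
kubectl get pod -l app=service-deployment -o jsonpath='{.items[*].spec.containers[*].resources.requests}'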
Wrapping up
In this blog, we've talked about what the Vertical Pod Autoscaler is and how it can help teams automatically scale their pods. We covered a few limitations with the VPA, walked through installing it, and got live resource recommendations for a sample deployment. Keep an eye out for the next part of the series. Until then!