

How to scale your Kubernetes Clusters Part 3: Vertical Pod Autoscaler

Evis Drenova






Welcome to part 3 of our 'How to scale your Kubernetes Clusters' blog series. If you missed the previous parts, check out part 1 on the Horizontal Pod Autoscaler (HPA) and part 2 on the Cluster Autoscaler. In part 3, we're covering how to use the Vertical Pod Autoscaler (VPA) to dynamically give your pods more resources as they need them, instead of spinning up additional pods (which is what the HPA would do).

Let's get into it.

What is the Vertical Pod Autoscaler?

The Kubernetes Vertical Pod Autoscaler (VPA) automatically adjusts the CPU and memory requests and limits of pods to match their actual resource usage patterns. This optimizes resource allocation in your Kubernetes cluster and makes it much easier to size your pods without having to guess what resources they need.

Traditionally, you have to set resource requests and limits for each pod manually, which is challenging because it's hard to know resource usage ahead of time. Set requests too low and your application's performance can suffer; set them too high and you pay for resources you don't use. The VPA solves this problem by continuously monitoring the resource utilization of pods and adjusting their requests and limits as needed.
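To make "adjusting requests to match usage" concrete, here is a minimal sketch that turns a list of observed CPU usage samples into a suggested request by taking a high percentile and adding a safety margin. This is a simplified illustration, not the Recommender's actual algorithm (which keeps decaying histograms of usage over time); the 90th percentile, the 15% margin, and the sample values are assumptions chosen for the example.

```python
import math

# Assumed safety margin for illustration only.
SAFETY_MARGIN = 0.15


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile of `samples`, with p in (0, 100]."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def recommend_cpu_millicores(usage: list[float], p: float = 90.0) -> int:
    """Suggest a CPU request: p-th percentile of observed usage plus a margin."""
    return math.ceil(percentile(usage, p) * (1 + SAFETY_MARGIN))


# Hypothetical CPU usage samples for one container, in millicores.
usage = [40, 45, 50, 55, 60, 62, 58, 70, 48, 52]
print(recommend_cpu_millicores(usage))  # -> 72
```

Using a high percentile rather than the peak keeps a single spike from inflating the request, while the margin leaves headroom for normal variation.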


The main difference between the VPA and the HPA (Horizontal Pod Autoscaler) is that the VPA scales the resources of existing pods while the HPA scales the number of pods. In other words, the HPA creates more pods to handle more requests, while the VPA gives an existing pod more resources so it can handle more requests. Here's a graphic to explain that a little further:


On the VPA side, you can see how we scale the existing pod from 2 vCPU to 4 vCPU. We're not creating new pods, just scaling the existing one. On the HPA side, you can see how we go from 1 pod with 2 vCPU to 2 pods, each with 2 vCPU. Both the VPA and the HPA scale pods, just in two different ways.
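The same scenario can be worked through numerically. The sketch below applies the replica formula documented for the HPA (desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)) next to a VPA-style answer that keeps one pod and raises its request. The 100% utilization target is an assumption for the example, and the sketch ignores the HPA's tolerance and stabilization behavior.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float) -> int:
    """HPA scaling formula from the Kubernetes docs:
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_metric / target_metric)


# Scenario from the graphic: one pod requesting 2 vCPU now faces 4 vCPU of load.
pod_request_vcpu = 2.0
demand_vcpu = 4.0

# HPA: utilization is measured against the request (4 / 2 = 200%). With an
# assumed 100% utilization target, it doubles the pod count.
utilization_pct = demand_vcpu / pod_request_vcpu * 100
replicas = hpa_desired_replicas(1, utilization_pct, 100.0)

# VPA: keep a single pod and raise its request to cover the demand.
vpa_request_vcpu = max(pod_request_vcpu, demand_vcpu)

print(f"HPA: {replicas} pods x {pod_request_vcpu} vCPU")  # HPA: 2 pods x 2.0 vCPU
print(f"VPA: 1 pod x {vpa_request_vcpu} vCPU")            # VPA: 1 pod x 4.0 vCPU
```

Both end up with 4 vCPU of total capacity; the difference is whether that capacity lives in one larger pod or in more pods of the same size.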

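One other thing to mention here is that in most cases you should not use the VPA and the HPA together on the same workload. Both act on CPU and memory metrics, so running them at the same time will likely make your pods behave in unexpected ways. For example, when the VPA raises a pod's requests, the pod's measured utilization (which the HPA computes relative to requests) drops, which in turn changes the HPA's scaling decision. For more advanced use cases, you can combine the two if the HPA scales on custom metrics instead.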
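If you manage many workloads, a rule like this is easy to check automatically. The following is a hypothetical lint function, not part of any Kubernetes tooling: it takes simplified dictionaries describing an HPA and a VPA (the field names and the `requests_per_second` custom metric are made up for illustration) and flags the risky combination of an active VPA and an HPA scaling on CPU or memory for the same target.

```python
# Metrics that both autoscalers act on.
RESOURCE_METRICS = {"cpu", "memory"}


def conflicts(hpa: dict, vpa: dict) -> bool:
    """Flag an HPA/VPA pair that would fight over the same signal.

    Inputs are simplified, hypothetical dicts (not real Kubernetes objects):
      hpa = {"target": "service-deployment", "metrics": ["cpu"]}
      vpa = {"target": "service-deployment", "update_mode": "Auto"}
    Any mode other than "Off" is treated as active, which is deliberately
    conservative.
    """
    same_target = hpa["target"] == vpa["target"]
    vpa_active = vpa["update_mode"] != "Off"
    hpa_on_resources = bool(RESOURCE_METRICS & set(hpa["metrics"]))
    return same_target and vpa_active and hpa_on_resources


vpa = {"target": "service-deployment", "update_mode": "Auto"}
print(conflicts({"target": "service-deployment", "metrics": ["cpu"]}, vpa))  # True
print(conflicts({"target": "service-deployment",
                 "metrics": ["requests_per_second"]}, vpa))                 # False
```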

How does the VPA actually work?

Now that we know what the VPA does at a high level, let's dig into the details. The VPA is made up of three components:

  1. Recommender - monitors the current and past resource consumption and provides recommended CPU and memory request values for a container.
  2. Updater - determines which pods should be restarted based on the resource recommendations calculated by the Recommender. It evicts pods that should be updated, but it doesn't apply the new resource values to the pods itself.
  3. Admission Controller - Sets the correct resource requests on new pods (that is, pods just created or recreated by their controller due to changes made by the Updater).
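The Updater's job can be sketched as a simple range check. In this minimal, simplified version (an assumption, not the Updater's full logic, which also weighs how far a request is from the target, pod age, and PodDisruptionBudgets), a pod is a candidate for eviction when its current request falls outside the recommended lower and upper bounds. The `Recommendation` class and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    """Simplified container recommendation; CPU values in millicores."""
    lower_bound: int
    target: int
    upper_bound: int


def should_evict(current_request: int, rec: Recommendation) -> bool:
    """Simplified Updater rule: evict when the pod's current request falls
    outside the recommended [lower_bound, upper_bound] range."""
    return not (rec.lower_bound <= current_request <= rec.upper_bound)


# Hypothetical recommendation and current requests.
rec = Recommendation(lower_bound=100, target=250, upper_bound=500)
for request in (50, 300, 800):
    print(request, "evict" if should_evict(request, rec) else "keep")
# 50 evict / 300 keep / 800 evict
```

Using a range instead of the exact target avoids restarting pods over small differences.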

These three components work together to identify pods that should be vertically scaled, restart those pods, and set updated resource values on the recreated pods. Additionally, the VPA can run in one of four main modes that control how it applies those values:

  1. Auto - VPA automatically evicts pods whose resource requests should change so that they are recreated with the new values. Upstream documentation notes that Auto currently behaves the same as Recreate. This mode isn't recommended for production use since evictions can disrupt workloads.
  2. Initial - VPA applies the recommended resource values it calculates only when pods are created (for example, when a new service is deployed) and never changes them afterwards.
  3. Off - VPA just stores the calculated resource values for reference and doesn't update the pods at all.
  4. Recreate - VPA assigns resource requests on pod creation and also updates them on existing pods by evicting them when the requested resources differ significantly from the new recommendation (respecting the PodDisruptionBudget, if defined). Use this mode rarely, only if you need to ensure that pods are restarted whenever the resource request changes. Otherwise, prefer Auto mode, which may take advantage of restart-free updates once they are available.
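The four modes differ along two axes: whether recommendations are applied to newly created pods, and whether running pods are evicted to pick them up. The sketch below encodes the modes listed above that way. The enum and function are illustrative names, not a Kubernetes API; Auto is modeled like Recreate, following the upstream documentation's note that the two are currently equivalent.

```python
from enum import Enum


class UpdateMode(Enum):
    OFF = "Off"
    INITIAL = "Initial"
    RECREATE = "Recreate"
    AUTO = "Auto"


def behavior(mode: UpdateMode) -> tuple[bool, bool]:
    """Return (sets_requests_on_new_pods, evicts_running_pods) for a mode.

    Auto is modeled like Recreate, matching the upstream docs' note that
    the two are currently equivalent.
    """
    if mode is UpdateMode.OFF:
        return (False, False)     # recommendations only
    if mode is UpdateMode.INITIAL:
        return (True, False)      # applied at pod creation, never afterwards
    return (True, True)           # Recreate and Auto: creation plus eviction


for mode in UpdateMode:
    print(mode.value, behavior(mode))
```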

Let's take a look at a diagram and then a step-by-step workflow of the VPA in Auto mode.


  1. The Recommender pulls resource utilization metrics from the metrics server (required here, just as it is for the HPA) and writes pod resource recommendations to the VPA object.
  2. The Updater reads the recommendations and evicts pods whose current requests should be updated.
  3. The Deployment (through its ReplicaSet) sees that the pod was terminated and recreates it.
  4. While the new pod is being created, the Admission Controller gets the pod's resource recommendation and injects the updated resource values into the new pod's spec.
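Step 4 is essentially a rewrite of the pod spec. Here is a minimal sketch of that mutation on a simplified container dictionary (not a real API object): requests are replaced with the recommended target, and any existing limit is rescaled so the original limit-to-request ratio is preserved, which mirrors the VPA's default behavior. The container name and all numbers are hypothetical.

```python
import copy


def inject_recommendation(container: dict, target: dict) -> dict:
    """Return a copy of `container` with requests set to the VPA target.

    Existing limits are rescaled to keep their original limit/request
    ratio. Values are plain integers (millicores for cpu, bytes for
    memory); `container` is a simplified dict, not a real API object.
    """
    patched = copy.deepcopy(container)
    resources = patched.setdefault("resources", {})
    old_requests = resources.get("requests", {})
    limits = resources.get("limits", {})

    new_limits = dict(limits)
    for name, new_request in target.items():
        old_request = old_requests.get(name)
        if name in limits and old_request:
            ratio = limits[name] / old_request
            new_limits[name] = round(new_request * ratio)

    resources["requests"] = {**old_requests, **target}
    if new_limits:
        resources["limits"] = new_limits
    return patched


# Hypothetical container spec before admission: limit is 2x the request.
container = {
    "name": "service-deployment",
    "resources": {"requests": {"cpu": 100, "memory": 128_000_000},
                  "limits": {"cpu": 200}},
}
patched = inject_recommendation(container, {"cpu": 300, "memory": 256_000_000})
print(patched["resources"])
# {'requests': {'cpu': 300, 'memory': 256000000}, 'limits': {'cpu': 600}}
```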

One important thing to note here is that the VPA isn't aware of node capacity, so it might recommend more resources for a pod than any node has, which would leave the pod unschedulable (stuck in Pending). You can use the cluster autoscaler to help with this, but generally we recommend setting a LimitRange whose maximum matches the resources available on your nodes, so that the VPA doesn't recommend more than a node can provide.
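The effect of such a cap is a per-resource clamp. The sketch below shows the arithmetic, assuming the cap comes from a LimitRange max (the VPA object also supports a per-container `maxAllowed` in `resourcePolicy.containerPolicies`, which has the same effect on recommendations). The node size and recommendation values are hypothetical.

```python
def cap_recommendation(target: dict, max_allowed: dict) -> dict:
    """Clamp each resource in `target` to `max_allowed` (resources without a
    cap pass through unchanged)."""
    return {name: min(value, max_allowed.get(name, value))
            for name, value in target.items()}


# Hypothetical numbers: the VPA wants 6 vCPU, but nodes only leave about
# 3.5 vCPU allocatable, so CPU is capped at 3500 millicores.
target = {"cpu": 6000, "memory": 4_000_000_000}
node_max = {"cpu": 3500}
print(cap_recommendation(target, node_max))
# {'cpu': 3500, 'memory': 4000000000}
```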

Installing the VPA

Let's walk through how to install the VPA into your cluster. We're going to use AWS as our cloud provider.

Similar to how we started with the HPA, the first step is to install the metrics server. Luckily, this is pretty straightforward.

Deploy the metrics server with the following kubectl command:

kubectl apply -f

Verify that the metrics-server deployment is running in the kube-system namespace.

kubectl get deployment metrics-server -n kube-system

You should see something that looks like this:

NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   1/1     1            1           6m

Now that we have the metrics server installed, we can go ahead with the VPA installation.

Pull down the VPA source code:

git clone

Inside the vertical-pod-autoscaler directory, run:

./hack/

You should see output similar to the following (abridged):

serviceaccount/vpa-admission-controller created
serviceaccount/vpa-updater created
deployment.apps/vpa-updater created
serviceaccount/vpa-recommender created
deployment.apps/vpa-recommender created
Generating certs for the VPA Admission Controller in /tmp/vpa-certs.
Generating RSA private key, 2048 bit long modulus (2 primes)
e is 65537 (0x010001)
Generating RSA private key, 2048 bit long modulus (2 primes)
e is 65537 (0x010001)
Signature ok
subject=CN = vpa-webhook.kube-system.svc
Getting CA Private Key
Uploading certs to the cluster.
secret/vpa-tls-certs created
Deleting /tmp/vpa-certs.
deployment.apps/vpa-admission-controller created
service/vpa-webhook created

Nice! Now that we have the VPA installed, we can create VerticalPodAutoscaler custom resources for the deployments that we want the VPA to watch. Let's look at an example manifest titled service-deployment-vpa.yaml:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: service-deployment-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       service-deployment
  updatePolicy:
    updateMode: "Off"

A couple of things to point out here. First, the name field in the targetRef section must exactly match the name of the deployment that you want to watch. Second, the updateMode in the updatePolicy section is set to "Off". This tells the VPA to run in Off mode: it won't update the pod's resource values, but it still provides recommendations. Keep "Off" in quotes, because YAML 1.1 parsers can read an unquoted Off as the boolean false.

Let's create this resource by applying it with:

kubectl apply -f service-deployment-vpa.yaml

We can let our pod run for a few minutes and then check back in to see the recommended resources with the following command:

kubectl describe vpa service-deployment-vpa

This prints the description of the service-deployment-vpa resource we just created. In the recommendation section, we should see something that looks like this:

    containerRecommendations:
    - containerName: service-deployment
      lowerBound:
        cpu: 50m
        memory: 1100k
      target:
        cpu: 60m
        memory: 2000k
      upperBound:
        cpu: 872m
        memory: 6000k
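The recommendation values use Kubernetes quantity suffixes: `m` means thousandths (so 60m CPU is 0.06 of a core) and `k` means 1,000 (so 2000k memory is 2,000,000 bytes, a little under 2 MiB). To compare these numbers in scripts, a small parser helps. This is a simplified sketch covering the common decimal and binary suffixes, not the official Kubernetes quantity parser.

```python
from decimal import Decimal

# Suffix multipliers from the Kubernetes quantity format.
SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20, "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40, "Pi": Decimal(2) ** 50, "Ei": Decimal(2) ** 60,
    "n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"),
    "k": Decimal("1e3"), "M": Decimal("1e6"), "G": Decimal("1e9"),
    "T": Decimal("1e12"), "P": Decimal("1e15"), "E": Decimal("1e18"),
}


def parse_quantity(text: str) -> Decimal:
    """Parse simple quantities such as '60m', '2000k', or '1Mi'."""
    # Try two-letter binary suffixes before one-letter decimal ones.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * SUFFIXES[suffix]
    return Decimal(text)


cpu_target = parse_quantity("60m")    # cores
mem_target = parse_quantity("2000k")  # bytes
print(f"{cpu_target:f} cores, {int(mem_target)} bytes")  # 0.060 cores, 2000000 bytes
```

Decimal is used instead of float so values like 0.872 cores stay exact.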

We can now see the VPA's resource recommendations. And that's it! For every workload that you want the VPA to monitor, just create a VerticalPodAutoscaler object like the one above.
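With many workloads, writing these objects by hand gets repetitive. The sketch below generates one VerticalPodAutoscaler object per Deployment, using the same structure as the service-deployment-vpa.yaml example, and wraps them in a `v1` `List` that `kubectl apply -f` accepts as JSON. The `worker-deployment` name and the `<name>-vpa` naming convention are hypothetical; the mode check only covers the four modes discussed in this post.

```python
import json


def vpa_manifest(deployment: str, update_mode: str = "Off") -> dict:
    """Build a VerticalPodAutoscaler object targeting one Deployment."""
    if update_mode not in {"Off", "Initial", "Recreate", "Auto"}:
        raise ValueError(f"unknown updateMode: {update_mode}")
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": f"{deployment}-vpa"},
        "spec": {
            "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment",
                          "name": deployment},
            "updatePolicy": {"updateMode": update_mode},
        },
    }


# Hypothetical deployment names; replace with your own workloads.
deployments = ["service-deployment", "worker-deployment"]
bundle = {"apiVersion": "v1", "kind": "List",
          "items": [vpa_manifest(d) for d in deployments]}
print(json.dumps(bundle, indent=2))
```

Saving the output to a file (for example vpas.json) and running kubectl apply -f on it creates all the objects at once. Emitting JSON also sidesteps the YAML quoting issue with "Off", since it is always a string in JSON.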

Wrapping up

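In this blog, we've talked about what the Vertical Pod Autoscaler is and how it can help teams automatically scale their pods. We also covered a few of its limitations: it generally shouldn't be combined with an HPA that scales on CPU or memory, and it isn't aware of node capacity, so it can recommend more resources than a node has.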

Until then!

