Intro
One of the most compelling reasons to use Kubernetes is that it helps developers and DevOps teams effortlessly scale applications to handle variable workloads and millions of requests. In practice, that means being able to quickly and automatically scale your clusters and pods up and down as demand changes.
Kubernetes (and other projects in the CNCF landscape) provides add-ons that you can install into your cluster to control how your applications scale. These are called autoscalers. Different kinds of autoscalers suit different scenarios and workloads depending on the context of your application, so it's important to know which autoscaler to use and why.
In the first part of this three-part series covering Kubernetes autoscalers, we're going to take a look at how the Horizontal Pod Autoscaler works and how you can install it into your clusters.
Let’s dive in.
What are Kubernetes Autoscalers?
Kubernetes Autoscalers automatically adjust the number of running pods/nodes or the resources of a given pod/cluster based on current demand or workload metrics. Their goal: maintain performance levels across the system, even as application loads fluctuate.
This leads to a number of benefits across the platform infrastructure:
- Resource efficiency. Autoscalers drive efficient resource allocation by actively working to avoid both over-provisioning and under-utilization.
- Enhanced performance & scalability. Autoscalers ensure applications can handle varying levels of traffic and workload demands by reducing response times, improving availability, and preventing performance bottlenecks.
- Improved availability. Autoscalers contribute to application availability through a proactive scaling approach built to minimize downtime.
- Cost optimization. By avoiding upfront overprovisioning of resources, autoscalers help to cut down on your cloud bill.
- Ease of management & mental load relief. Once properly configured, autoscalers continuously monitor the workload and adjust the infrastructure accordingly, freeing up engineers, developers, and ops to focus on other critical tasks.
Types of Kubernetes Autoscalers
There are three types of Kubernetes autoscalers. Each has unique advantages and disadvantages, depending on your specific challenges and needs:
- Horizontal Pod Autoscalers (HPA) adjust the number of pod replicas for a workload
- Cluster Autoscalers (CA) adjust the number of nodes in a cluster
- Vertical Pod Autoscalers (VPA) adjust the resource requests and limits of containers in the cluster
In this part, we'll be looking at the Horizontal Pod Autoscaler. In future posts, we'll check out the Cluster Autoscaler and the Vertical Pod Autoscaler.
Horizontal Pod Autoscaler
The Kubernetes HPA automatically adjusts the number of pods (or pod replicas) in a Kubernetes Deployment, ReplicaSet, or ReplicationController. It adjusts these replicas based on workload metrics like CPU and memory utilization collected from the Metrics Server, or custom metrics exposed through the custom metrics API.
As a DevOps engineer or developer, you can set a metric threshold, for example 80% CPU utilization, and the HPA will automatically spin up another pod once the average CPU utilization across a given ReplicaSet passes 80%. What exactly does that mean?
ReplicaSets tell Kubernetes how many pods should be running at any point in time within a cluster. The HPA queries the pods within a ReplicaSet and takes their average CPU utilization (or whatever standard or custom metric you've set). If that average is above the threshold you set (80% in our case), it spins up another pod to spread the load. Once average CPU usage subsides, Kubernetes will (gracefully, ideally!) shut down pods to get back down to a maintainable count. One detail that trips people up: CPU utilization is measured as a percentage of each container's CPU request, so the Deployment being scaled must set `resources.requests` on its containers, as in the sketch below.
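Here's a minimal sketch of what such a target Deployment might look like. The name `service-1`, the image, and the request values are all assumptions for this example:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-1
spec:
  replicas: 2
  selector:
    matchLabels:
      app: service-1
  template:
    metadata:
      labels:
        app: service-1
    spec:
      containers:
        - name: service-1
          image: nginx:1.25 # placeholder image for this example
          resources:
            requests:
              cpu: 250m # the HPA computes utilization relative to this request
              memory: 128Mi
```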
We can summarize this in the following workflow, where the HPA:
- Collects metrics from the Metrics Server (e.g. CPU utilization, memory utilization, or any custom metric defined by the user)
- Compares the observed metric values against the target values to determine whether the system needs to scale up or down
- Interacts with the Kubernetes API server to determine the necessary number of replicas for the target Deployment and its ReplicaSet
- Adjusts the replica count by creating or terminating pods as needed (the calculation is shown below)
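The replica calculation itself follows the formula documented for the HPA controller:

```
desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
```

For example, 3 pods averaging 90% CPU against an 80% target gives ceil(3 × 90 / 80) = ceil(3.375) = 4 replicas.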
Let's now take a look at the HPA manifest and how to configure and install it. Here's an example of a straightforward HPA YAML file:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: service-1-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: service-1
  minReplicas: 2 # Minimum number of replicas
  maxReplicas: 5 # Maximum number of replicas
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
A couple of things to point out here. The first is the `scaleTargetRef` section under `spec`: its `name` field must match the name of your deployment (here, `service-1`) and tells the HPA which deployment to watch, while the `name` under `metadata` is just the name of the HPA object itself. Second are `minReplicas` and `maxReplicas`. These tell the HPA the minimum and maximum number of pods to have at any one point in time. In our case, `minReplicas` is 2, meaning that the HPA will always maintain at least 2 pods for the given deployment or service. The last thing to point out is the `metrics` section. Here we're defining `cpu` as the metric we want the HPA to query and the target `type` as `Utilization`, meaning that we want the HPA to watch CPU utilization and, once it passes an `averageUtilization` of 70%, spin up another pod.
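The `autoscaling/v2` API also lets you list more than one metric; the HPA computes a desired replica count for each and scales to the largest. As a sketch (the 75% memory target is just an example value), you could extend the `metrics` section like this:

```yaml
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
```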
Before we install the HPA, let's first set up our Metrics Server. Luckily, this is pretty straightforward.
You can deploy the Metrics Server using the following `kubectl` command:
```shell
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```
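One caveat: on local clusters such as minikube or kind, the Metrics Server often fails kubelet TLS verification out of the box. A common workaround, assuming the default manifest layout where metrics-server is the first container in the deployment, is to add the `--kubelet-insecure-tls` flag (don't do this in production):

```shell
kubectl patch deployment metrics-server -n kube-system --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'
```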
Verify that the `metrics-server` deployment is running in the `kube-system` namespace:
```shell
kubectl get deployment metrics-server -n kube-system
```
You should see something that looks like this:
```
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   1/1     1            1           6m
```
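You can also confirm that the metrics pipeline is actually serving data:

```shell
kubectl top pods -n kube-system
```

If this prints CPU and memory figures rather than an error, the HPA will be able to fetch the metrics it needs.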
Now that we have the Metrics Server installed, we can apply our `service-1-hpa.yaml` config file from above using the following `kubectl` command:
```shell
kubectl apply -f service-1-hpa.yaml -n <namespace-of-your-service>
```
Verify that HPA is running with the following command:
```shell
kubectl get hpa
```

```
NAME            REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
service-1-hpa   Deployment/service-1   0%/70%    2         5         0          8s
```
The output of the `kubectl get hpa` command tells us that `service-1-hpa` is running and watching the `service-1` deployment, targeting a 70% average CPU utilization and maintaining between 2 and 5 replicas. And that's it. We've successfully configured and installed the Kubernetes Horizontal Pod Autoscaler!
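If you'd like to watch the autoscaler react, one approach, borrowed from the Kubernetes HPA walkthrough and assuming `service-1` is exposed as a Service reachable at that DNS name in the same namespace, is to run a throwaway load generator:

```shell
kubectl run load-generator --rm -it --image=busybox --restart=Never -- \
  /bin/sh -c "while sleep 0.01; do wget -q -O- http://service-1; done"
```

Then, in a second terminal, watch the HPA respond:

```shell
kubectl get hpa service-1-hpa --watch
```

As the load pushes average utilization past 70%, you should see REPLICAS step up toward 5; stop the load generator and the count will eventually settle back down to 2.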
Autoscaling on Nucleus
Nucleus automates autoscaling for teams who want a more hands-off approach to scaling their infrastructure. By default, when you deploy a Nucleus Environment, the HPA comes preinstalled and configured. All you have to do is set the min and max replicas you'd like Nucleus to maintain and a target average CPU utilization. Here's what it looks like in the Nucleus dashboard:
Easy as that. Nucleus takes all of the hassle out of implementing and managing the HPA across your different services.
Wrapping up
In this blog, we've talked about what the Horizontal Pod Autoscaler is and how it can help teams automatically scale their services and deployments. We've also seen how to implement it both with and without Nucleus.
In the next part of this series, we'll take a look at Cluster Autoscaling and how teams are scaling their nodes to meet their performance demands.
Until then!