Create Horizontal Pod Autoscaler for Deployment
Horizontal Pod Autoscaler (HPA) is a feature integrated in the Advanced Edition. The Horizontal Pod Autoscaler automatically scales the number of Pods in a deployment based on observed CPU utilization or memory usage. The controller periodically adjusts the number of replicas in a deployment so that the observed average CPU utilization matches the target value specified by the user.
How the HPA Works
The Horizontal Pod Autoscaler is implemented as a control loop, with a period controlled by the controller manager's --horizontal-pod-autoscaler-sync-period flag (default: 15 seconds). For per-pod resource metrics (like CPU), the controller fetches the metrics from the resource metrics API for each Pod targeted by the HorizontalPodAutoscaler. If a target utilization value is set, the controller calculates the utilization as a percentage of the equivalent resource request on the containers in each Pod. If a target raw value is set, the raw metric values are used directly. The controller then takes the mean of the utilization or the raw value (depending on the type of target specified) across all targeted Pods and produces a ratio used to scale the number of desired replicas.
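In other words, each sync cycle the controller computes roughly desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), clamped between the configured minimum and maximum. With the 50% CPU target used later in this guide, if the 2 original Pods average 55% CPU utilization against their requests, the desired replica count becomes ceil(2 × 55 / 50) = 3, which matches the behavior observed later in this walkthrough.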
Objective
This document walks you through an example of configuring a Horizontal Pod Autoscaler for an Nginx deployment.
We will create a load-generator deployment that sends an infinite loop of queries to the Nginx service and observe how the HPA reacts to the increased CPU load. This simulates many users accessing the service at the same time, demonstrating the autoscaling behavior and the HPA principle.
Prerequisites
- You need to create a workspace and project; see the Admin Quick Start if you have not done so yet.
- You need to sign in with project-regular and enter the corresponding project.
Estimated Time
About 25 minutes.
Create HPA
Step 1: Create a Deployment
- Sign in with project-regular, enter a project (e.g. demo-namespace), then select Workload → Deployments and click the Create Deployment button.
- Then fill in the basic information in the pop-up window.
- Name: A concise and clear name for this deployment, convenient for users to browse and search, e.g. hpa-example.
- Alias: Helps you better distinguish resources and supports Chinese characters.
- Description: A brief introduction to this deployment.
Click Next when you’re done.
Step 2: Configure the HPA
Choose Horizontal Pod Autoscaling and fill in the fields as follows (a roughly equivalent kubectl command is sketched after this list):
- Min Replicas Number: 2
- Max Replicas Number: 8
- CPU Request Target (%): 50, the target average CPU utilization, expressed as a percentage of each Pod's CPU request
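For reference, the settings above are roughly what you would get by creating an HPA from the command line against the finished deployment; a sketch, assuming the hpa-example deployment and the demo-namespace project used in this guide:
# Roughly equivalent kubectl command (run it after the deployment exists)
kubectl autoscale deployment hpa-example --min=2 --max=8 --cpu-percent=50 -n demo-namespace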
Step 3: Add a Container
- Click Add Container, then fill in the Pod template with the following values.
- Container Name: nginx
- Image: nginx
Click Save to save these settings.
Leave the Update Strategy as RollingUpdate, then click Next. There is no need to set a volume, so click Next again to skip the Volume Settings.
Leave the Label as app: hpa-example, then click Create. Now the Nginx deployment has been created successfully.
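Note that the HPA computes utilization as a percentage of each container's CPU request (see above), so the nginx container needs a CPU request for the 50% target to be meaningful; depending on your environment, the console may set a default one for you. If you have kubectl access, you can check the generated deployment, for example:
# Look for spec.template.spec.containers[].resources.requests.cpu in the output
kubectl get deployment hpa-example -n demo-namespace -o yaml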
Create a Service
Step 1: Basic Information
- Choose Network & Services → Services on the left menu, then click Create Service.
- Fill in the basic information: enter hpa-example as the name, then click Next.
Step 2: Service Settings
- Since the service is only accessed within the cluster, choose the first option (Virtual IP) for the service settings.
- In the Selector field, click Specify Workload and select hpa-example as the backend workload.
- For Ports and Session Affinity, fill in the blanks with the following values.
Ports:
- Name: port
- Protocol: TCP
- Port: 80
- Target Port: 80
- Session Affinity: Leave it as None.
Then click Save when you’re done.
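For reference, this Service is roughly what you would get by exposing the deployment from the command line; a sketch, assuming the names used in this guide:
# Roughly equivalent kubectl command
kubectl expose deployment hpa-example --name=hpa-example --port=80 --target-port=80 -n demo-namespace
Either way, the Service is reachable inside the cluster at http://hpa-example.demo-namespace.svc.cluster.local, which is the address the load generator will request below.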
Step 3: Label Settings
Leave the Label as app: hpa-example, then click Create. Now the Nginx service has been created successfully. Next, we will create a load generator.
Create Load-generator
Step 1: Fill in the Basic Information
- In the current project, select Workload → Deployments and click the Create Deployment button.
- Then fill in the basic information in the pop-up window.
- Name: A concise and clear name for this deployment, convenient for users to browse and search, e.g. load-generator.
- Alias: Helps you better distinguish resources and supports Chinese characters.
- Description: A brief introduction to this deployment.
Click Next when you’re done.
Step 2: Pod Template
- Click Edit Mode and change the replicas to 50, then click Edit Mode again to return to the creation form.
- Click Add Container, and fill in the Pod template as following:
- Container Name: busybox-container
- Image: busybox
- Click on the Advanced Options to expand the table.
- Fill in the commands and arguments as shown below. Click Add command and Add argument to add the second line.
# Commands
sh
-c
# Arguments (the service address follows the pattern http://{service-name}.{project-name}.svc.cluster.local)
while true; do wget -q -O- http://hpa-example.demo-namespace.svc.cluster.local; done
Click on the Save button when you’re done, then click Next.
Step 3: Label Settings
Click Next to skip the Volume Settings, and leave the label as app: load-generator.
Click Create to complete creation.
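If you prefer the command line, a single temporary busybox Pod can generate similar (though lighter) load; a sketch adapted from the upstream Kubernetes HPA walkthrough, assuming the service address above:
# Run a temporary load-generator Pod; press Ctrl+C to stop and remove it
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -n demo-namespace -- /bin/sh -c "while true; do wget -q -O- http://hpa-example.demo-namespace.svc.cluster.local; done"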
So far, we have created 2 deployments (i.e. hpa-example and load-generator) and 1 service (hpa-example).
Verify HPA
Step 1: Inspect the Status
Click into hpa-example and inspect the changes. Pay attention to the HPA status and the CPU utilization, as well as the Pods monitoring graphs.
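If you also have kubectl access, you can watch the same information from the command line while the test runs, for example:
# Watch the HPA status and the deployment's replica count (Ctrl+C to stop)
kubectl get hpa -n demo-namespace -w
kubectl get deployment hpa-example -n demo-namespace -w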
Step 2: Verify the Scaling
When all of the load-generator Pods are successfully created and begin to access the hpa-example service, as shown in the following figure, the CPU utilization increases significantly after refreshing the page, rising to around 55% here, and both the desired replicas and current replicas rise to 3.
The load-generator requests the hpa-example service in a loop, making the CPU utilization rise rapidly. Once the Horizontal Pod Autoscaler starts working, it quickly scales out the backend of the service to handle the large number of requests, and the number of hpa-example replicas keeps increasing as the CPU utilization rises, which demonstrates the working principle of the HPA.
From the CPU monitoring curves of the Pods, you can see that the CPU usage of the two Pods we originally created shows a clear upward trend. After the HPA starts working, their CPU usage drops noticeably and eventually levels off, while CPU usage rises on the newly created Pods.
Note: After the HPA takes effect, it may take a few minutes for the number of replicas of the deployment to stabilize, and a few more minutes for the Pods to return to normal after the load is removed; recent Kubernetes versions deliberately delay scale-down with a stabilization window (5 minutes by default). Because of environmental differences, the number of replicas may differ between environments.
Stop Load
- Select Workload → Deployments and delete load-generator to stop generating load.
- Look at the status of hpa-example again; you can see that its CPU utilization slowly drops to 0% within a few minutes, and the HPA reduces the replicas back to 2 (the initial value), eventually returning to the normal level. The trend shown by the monitoring curves can also help you further understand the working principle of the HPA.
- You can also inspect the monitoring status of the deployment to see the CPU utilization trend and network inbound/outbound traffic; they match the behavior expected from this HPA example.
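You can confirm the scale-down from the command line as well, for example:
# CPU utilization should fall back towards 0% and the replicas back to 2
kubectl get hpa -n demo-namespace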
Modify HPA
If you need to modify the HPA settings, click into the deployment and choose More → Horizontal Pod Autoscaler. The page that opens is where you reset the HPA.
Cancel HPA
Click the ··· button on the right and choose Cancel if you no longer need the HPA for this deployment.
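Behind the scenes, both actions manage a HorizontalPodAutoscaler object. If you work from the command line, the rough equivalents are shown below, assuming the HPA object carries the deployment's name (this may differ in your environment):
# Edit or remove the HPA object directly
kubectl edit hpa hpa-example -n demo-namespace
kubectl delete hpa hpa-example -n demo-namespace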
By now, you should be familiar with the basics of setting up a Horizontal Pod Autoscaler for deployments.