Create Horizontal Pod Autoscaler for Deployment


The Horizontal Pod Autoscaler (HPA) automatically scales the number of Pods in a deployment based on observed CPU utilization or memory usage. The controller periodically adjusts the number of replicas in the deployment so that the observed average CPU utilization or memory usage matches the target value specified by the user.

How Does the HPA Work

The Horizontal Pod Autoscaler is implemented as a control loop with a default period of 30 seconds, controlled by the controller manager's --horizontal-pod-autoscaler-sync-period flag. For per-Pod resource metrics like CPU, the controller fetches the metrics from the resource metrics API for each Pod targeted by the Horizontal Pod Autoscaler. See Horizontal Pod Autoscaler for more details.

HPA Arch
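
At a high level, each sync period the controller compares the observed metric with the target and computes the desired replica count using the formula from the Kubernetes documentation (simplified here, ignoring tolerances and stabilization windows):

  desiredReplicas = ceil( currentReplicas * currentMetricValue / desiredMetricValue )

For example, one replica observed at 200% CPU against a 50% target yields ceil(1 × 200 / 50) = 4 desired replicas, bounded by the configured minimum and maximum.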

Objective

This document walks you through an example of configuring a Horizontal Pod Autoscaler for a deployment running the hpa-example image. In addition, we will create a load-generator deployment that sends an infinite loop of queries to the hpa service, demonstrating the autoscaling behavior and the HPA principle.

Estimated Time

About 25 minutes

Prerequisites

You need an account with permission to create workloads in a project. This lab uses the project-regular account and the demo-project project.

Hands-on Lab

Step 1: Create Stateless Service

1.1. Log in with the project-regular account, enter demo-project, then navigate to Application Workloads → Services.

Service List

1.2. Click Create Service and choose Stateless service, name it hpa, then click Next.

Create Service

1.3. Click Add Container Image, input mirrorgooglecontainers/hpa-example, and press the return key. The image information is searched and loaded automatically; choose Use Default Ports.

Add Container

1.4. Save the container settings and click Next. Then skip Mount Volumes and Advanced Settings, and click Create. At this point, the stateless service hpa has been created successfully.

Note: At the same time, the corresponding Deployment and Service have been created in KubeSphere.

HPA Service
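
If you have kubectl access to the cluster, you can optionally verify this from the command line. The sketch below assumes the project maps to a namespace named demo-project:

  kubectl get deployments,services -n demo-project
  # Both the hpa deployment and the hpa service should appear in the output.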

Step 2: Configure HPA

2.1. Choose Workloads → Deployments, then click hpa to open its detail page.

Deployment List

2.2. Choose More → Horizontal Pod Autoscaler.

HPA Menu

2.3. Set sample values for the HPA configuration as follows, then click OK to finish the configuration.

  • CPU Request Target (%): 50 (the target CPU utilization, expressed as a percentage of the Pod's CPU request)
  • Min Replicas Number: 1
  • Max Replicas Number: 10

Note: After the HPA is set for the deployment, KubeSphere creates a HorizontalPodAutoscaler object in Kubernetes to perform the autoscaling.

HPA Settings
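
For reference, the console configuration above corresponds to an autoscaling policy that could also be created with kubectl. This is only an equivalent sketch, assuming the demo-project namespace and a deployment named hpa:

  # Create an HPA targeting 50% CPU, scaling between 1 and 10 replicas
  kubectl autoscale deployment hpa --cpu-percent=50 --min=1 --max=10 -n demo-project
  # List the resulting HorizontalPodAutoscaler object
  kubectl get hpa -n demo-project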

Step 3: Create Load-generator

3.1. In the current project, navigate to Workloads → Deployments. Click Create, fill in the basic information in the pop-up window, name it load-generator, and click Next.

3.2. Click Add Container Image, enter busybox into the Image field, and press the return key.

3.3. Scroll down to Start command and add the command and parameters as follows. They are used to request the hpa service continuously and generate CPU load.

Run command

sh,-c

Parameters

Note: the HTTP address follows the pattern http://{$service-name}.{$project-name}.svc.cluster.local. Replace the service name and project name in the address below with your actual values.

while true; do wget -q -O- http://hpa.demo-project.svc.cluster.local; done

Load Generator configuration
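
As an alternative to creating the load-generator deployment in the console, a similar one-off load generator can be started with kubectl. This is a hedged sketch: it runs a temporary Pod rather than a Deployment and assumes the demo-project namespace:

  kubectl run load-generator --rm -i --tty --image=busybox --restart=Never -n demo-project \
    -- /bin/sh -c "while true; do wget -q -O- http://hpa.demo-project.svc.cluster.local; done"
  # Press Ctrl+C to stop; the temporary Pod is cleaned up automatically because of --rm.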

3.4. Save the container settings when you are done, then click Next. We do not use volumes in this demo, so click Next → Create to complete the creation.

So far, we have created two deployments (hpa and load-generator) and one service (hpa).

Deployments

Step 4: Verify HPA

View Deployment Status

Choose Workloads → Deployments and click the deployment hpa to view its detail page. Pay attention to the replica count, Pod status, and CPU utilization, as well as the Pod monitoring graphs.

Deployment Status

View HPA Status

Once the load-generator Pod is running, it continuously requests the hpa service. As shown in the following screenshot, the CPU utilization increases significantly after refreshing the page. In this example it rises to 1012%, and the desired and current replicas rise to 10/10.

HPA Status
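
The same scaling activity can be followed from the command line, assuming kubectl access and the demo-project namespace:

  # Watch the HPA: the TARGETS column shows observed vs. target CPU, REPLICAS shows the scale
  kubectl get hpa -n demo-project -w
  # Show per-Pod CPU usage from the metrics API that the HPA itself consumes
  kubectl top pods -n demo-project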

After around two minutes, the CPU utilization drops to 509% as the additional replicas share the load, which demonstrates how the HPA works.

HPA Changed Status

Step 5: Verify Monitoring

5.1. Scroll down to the Pods list and pay attention to the first Pod that was created. In its monitoring graph, the CPU usage first shows a significant upward trend; once the HPA starts scaling out, the CPU usage clearly decreases and finally levels off.

HPA Monitoring

View Workload Monitoring

5.2. Switch to the Monitoring tab and select Last 30 minutes in the filter.

Detailed Monitoring

View All Replicas Monitoring

5.3. Click View all replicas on the right of the monitoring graph to inspect the monitoring graphs of all replicas.

Replicas Monitoring

Step 6: Stop Load Generation

6.1. Go back to Workloads → Deployments and delete load-generator to stop generating load.

6.2. Inspect the status of the hpa deployment again. You will find that its CPU utilization slowly drops to 10% within a few minutes, and the HPA eventually reduces the replicas back to one, the initial value. The trend in the monitoring curves also helps illustrate how the HPA works.

Stop Load Generator
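
Note that scaling down is deliberately slower than scaling up: the controller waits for a cool-down/stabilization window (roughly five minutes by default, depending on the Kubernetes version and controller-manager flags) before removing replicas. If you want to follow the scale-down, the HPA events record each step:

  kubectl describe hpa -n demo-project
  # The Events section shows when and why the HPA scaled the deployment up or down.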

6.3. Now drill into the Pod detail page from the Pod list and inspect the monitoring graphs, reviewing the CPU utilization and network inbound/outbound trends. The trends match what we expect from this HPA example.

HPA Result

6.4. Then drill into the container of this Pod; its monitoring shows the same trends as the Pod.

Pod Monitoring

Modify HPA Settings

If you need to modify the HPA settings, go to the deployment detail page, click More → Horizontal Pod Autoscaler, and edit the values in the pop-up window as needed.

Cancel HPA

If you no longer need the HPA for the deployment, click ··· → Cancel.

Cancel HPA
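
Cancelling the HPA in the console is equivalent to deleting the HorizontalPodAutoscaler object. From the command line this would look like the following sketch, where <hpa-name> is a placeholder for the actual object name:

  kubectl get hpa -n demo-project
  kubectl delete hpa <hpa-name> -n demo-project
  # Only the autoscaling policy is removed; the deployment itself is kept.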

Congratulations! You are now familiar with how to set up an HPA for a deployment through the KubeSphere console.