Performance
Notes on performance testing (or load-testing)
This page is designed to help you get a realistic and representative view of the performance of OpenFaaS whether you want to run a performance test, load-test or benchmark.
You may have started using OpenFaaS already, or perhaps you have been asked to do some due diligence before starting a new project. Before proceeding, please run through the project checklist to make sure your environment is properly tuned and that you are using an appropriate function template tuned for performance.
The default configuration for OpenFaaS targets a development environment, not production, which is why you should pay attention to both your method and your configuration.
Checklist
Load-testing should only be carried out with Kubernetes.
Method:
- I have created a test-plan with a hypothesis and have documented my method so I can share it with the project team
- I’m using a performance-testing tool such as hey, JMeter, LoadRunner or Gatling (see the example after this list)
- My environment is hosted in an isolated and repeatable environment
- I understand the difference between a benchmark and a “DoS attack”
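For instance, hey can apply load under controlled conditions. A minimal sketch - the gateway URL and function name are placeholders:

```sh
# 50 concurrent workers, each rate-limited to 2 req/s, for 30 seconds
# Replace the URL with your own gateway and function
hey -z 30s -c 50 -q 2 http://127.0.0.1:8080/function/bench
```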
HA:
- I have scaled the gateway service in proportion to the load, with more than one replica (see the example after this list)
- I have set min / max replicas and understand how auto-scaling works (the stack.yml excerpt further down shows the scale labels)
- I have read and applied production-environment recommendations
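As a minimal sketch, assuming OpenFaaS was installed into the `openfaas` namespace with the official helm chart:

```sh
# Scale the gateway to 3 replicas directly
kubectl -n openfaas scale deployment/gateway --replicas=3

# Or persist the setting through the helm chart
helm upgrade openfaas openfaas/openfaas --namespace openfaas --reuse-values \
  --set gateway.replicas=3
```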
Project tuning:
- I have extended or removed memory limits / quotas for each service and function (see the stack.yml excerpt after this list)
- I have created my own function using one of the new HTTP templates and the of-watchdog (see below for a list)
- I understand the difference between the original default watchdog, which forks one process per request, and the new of-watchdog’s HTTP mode, and I am using the latter
- I have turned off `write_debug` and `read_debug` so that the logs for the function are kept sparse
- I am monitoring / collecting logs from the core services and the function under test
- I am monitoring the system for feedback through Prometheus and / or Grafana - i.e. throughput and 200/500 errors
- I am using Kubernetes 1.13 or newer
- I am not using Docker Swarm
- If running on Docker Swarm anyway, I have verified that I am using a proper HEALTHCHECK (read more in the watchdog README)
- I am using Endpoint load-balancing or Linkerd2
A note on DNS: there are known issues with CoreDNS under high load; you should consider implementing one of the approaches described in KEP 30: NodeLocal DNS Cache.
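Several of the items above come together in the function’s stack.yml. A minimal, hypothetical excerpt - the function name, image and values are placeholders to adapt:

```yaml
provider:
  name: openfaas
  gateway: http://127.0.0.1:8080

functions:
  bench:
    lang: golang-http                # an of-watchdog HTTP template
    handler: ./bench
    image: example/bench:latest
    environment:
      write_debug: false             # keep function logs sparse
      read_debug: false
    limits:
      memory: 256Mi                  # extend or remove for high-load tests
    labels:
      com.openfaas.scale.min: "5"    # auto-scaling floor
      com.openfaas.scale.max: "20"   # auto-scaling ceiling
```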
Watchdog differences
The current version of the OpenFaaS templates uses the original `watchdog`, which forks one process per request - a bit like CGI. The newer `of-watchdog` is more similar to FastCGI/HTTP and should be used for any benchmarking or performance testing, along with one of the newer templates.
Read more on the differences in the docs.
of-watchdog templates:
- Golang HTTP template with the Go stdlib
- Node10 HTTP template with Express.js
- Python3 HTTP template with gevent/flask
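For example, assuming faas-cli is installed, a template can be pulled from the template store and used to scaffold a function (the function name and image prefix are placeholders):

```sh
# Fetch the golang-http (of-watchdog) template and create a new function
faas-cli template store pull golang-http
faas-cli new bench --lang golang-http --prefix example
```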
Common mistakes for performance-testing a project:
- Not communicating intent
Communicate your intent to the project team so that we can ensure you haven’t missed anything and can share results we have obtained during our own testing. Asking arbitrary questions out of context will result in a poor interaction with the community and project.
- Not documenting method and environment
The method and approach should be documented, including any important details such as: the networking between the test machine and the cluster under test; the Linux, Kubernetes, Docker and OpenFaaS component versions; the underlying filesystem; and the specs of both the test cluster and the test runner, including the network overlay driver being used for Kubernetes.
- Using an inappropriate method
There is a difference between performance testing and a Denial-of-Service (DoS) attack (i.e. flooding with `siege`). You should use tools which allow a gradual ramp-up and controlled conditions, such as hey, JMeter, LoadRunner or Gatling (see the example below).
See also: Lab9 - auto-scaling in OpenFaaS
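A gradual ramp-up can be scripted by stepping the concurrency in stages rather than flooding the endpoint at full rate from the start; the URL is a placeholder:

```sh
# Step up concurrency, leaving time for auto-scaling to react at each stage
for c in 10 25 50 100; do
  echo "=== $c concurrent workers ==="
  hey -z 60s -c "$c" http://127.0.0.1:8080/function/bench
done
```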
- Choosing an inappropriate test environment
Do not try to performance-test OpenFaaS on your laptop within a VM - this carries the overhead of virtualisation and will likely cause contention.
The test environment needs to replicate the production environment you are likely to use. Take note that most AWS virtual machines are subject to CPU throttling and a credits system which will make performance testing hard and unscientific.
- Poor choice of test function
There are several sample functions provided in the project, but that does not automatically qualify them for high-throughput benchmarking or load-testing. It’s important to create your own function and understand exactly what is being put into it so you can measure it effectively. You should also use an OpenFaaS of-watchdog template for this or your own microservice conforming to the required healthchecks.
- Not using the of-watchdog
Only the watchdog and of-watchdog implement the correct healthcheck and shutdown signals to be compatible with Kubernetes’ startup, scaling and termination mechanisms. If you are running a container exposing port `8080` which does not use the of-watchdog, then the results are not representative. You will need to mirror the mechanisms of the of-watchdog or use it as a shim (see the sketch below).
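As a sketch of the shim approach, following the multi-stage pattern from the of-watchdog README - the base images, version tag and internal port here are assumptions for illustration:

```dockerfile
FROM ghcr.io/openfaas/of-watchdog:0.9.11 AS watchdog

FROM node:18-alpine
# Place the of-watchdog binary in front of an existing HTTP service
COPY --from=watchdog /fwatchdog /usr/bin/fwatchdog
WORKDIR /app
COPY . .

# Fork the service once, then proxy each request to it over HTTP
ENV mode=http
ENV upstream_url="http://127.0.0.1:3000"
ENV fprocess="node server.js"

# The lock file written by the watchdog doubles as the health check
HEALTHCHECK --interval=3s CMD [ -e /tmp/.lock ] || exit 1

EXPOSE 8080
CMD ["fwatchdog"]
```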
- Only picking the best/worst case figure
When using a scientific method you need to carry out multiple test runs and account for caching/memory/paging of the operating system including any additional background processes that may be running. The 99th percentile figures should be used, not the best or worst case figure from arbitrary runs.
- Ignoring CPU / memory limits
OpenFaaS enforces memory limits on its core services. If you are going to perform a high-load test you will want to extend these beyond the defaults or remove them completely (see the sketch below).
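For the core services these limits live in the helm chart’s values. An assumption-laden sketch - check the chart’s values.yaml for the exact paths in your version:

```sh
# Raise the gateway's memory allocation via the openfaas helm chart
helm upgrade openfaas openfaas/openfaas --namespace openfaas --reuse-values \
  --set gateway.resources.requests.memory=250Mi \
  --set gateway.resources.limits.memory=1Gi
```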
- Ignoring the effects of keep-alive connections
HTTP clients enable keep-alive by default, and a kept-alive TCP connection pins all of a client’s requests to a single function replica. If your client uses keep-alive (which is extremely likely), then you need to enable endpoint load-balancing so that the load is spread between the function replicas, bypassing the kept-alive connection for the last hop of the invocation. See the checklist, and the sketch below.
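A sketch of both sides of this, assuming the helm chart’s `gateway.directFunctions` toggle controls endpoint load-balancing, with hey’s keep-alive flag used for comparison:

```sh
# Let the provider pick a fresh function endpoint per invocation
helm upgrade openfaas openfaas/openfaas --namespace openfaas --reuse-values \
  --set gateway.directFunctions=false

# For comparison, hey can also disable keep-alive on the client side
hey -z 30s -c 50 -disable-keepalive http://127.0.0.1:8080/function/bench
```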