Building an End to End load test automation system on top of Kubernetes

Learn how we built an end-to-end load test automation system to make load tests a routine task.

Amila Kumaranayaka

Senior Software Engineer

Carsten Timpert

Software Engineer

Posted on Mar 02, 2021


At Zalando we continuously invent new ways for customers to interact with fashion. To provide an excellent customer experience, we must ensure our systems can handle high-traffic events such as Cyber Week or other sales campaigns; we have published a detailed article on how Zalando prepares for Cyber Week. Checkout and payment related systems are particularly important during sales events. As we continuously evolve our systems and add new features to optimize the customer experience, manually testing our systems' capability to handle high traffic is cumbersome and expensive.

Our department is responsible for Zalando's payment processing systems, which must maintain high availability and reliability. To achieve high stability, we set out to build an automated end-to-end load testing system capable of simulating real user behaviour across the whole system of microservices. This testing system automatically steers the generated traffic based on a dynamically adjusted orders-per-minute configuration. To really push our services to the edge, we run the load testing system in our test cluster, which lets us break things when necessary without causing customer impact. These tests can be conveniently managed and triggered by our team and serve as the first quality gate of the payment system. As part of the Cyber Week preparation, we formed a dedicated project team tasked with making this vision come to life.

To summarize, we wanted to build a load testing tool with the following features:

  • Automatic load test execution based on a schedule.
  • Simple API through which developers can manually trigger a load test.
  • A load test tool that runs in our test environment, scales our Kubernetes services and Amazon ECS1 (Elastic Container Service) environment up to our production configuration, and then executes the load tests.
  • Automated alarms if a load test causes SLO (Service Level Objective) breaches.
  • The generated load test traffic must imitate our customers' checkout flow.

The diagram below illustrates how the testing system (NodePool A) and our Payment platform (NodePool B and ECS) are deployed: Load Test Flow

Traffic generation

Our first step was to select a load testing framework. We considered multiple options such as Locust, Vegeta and JMeter, and narrowed the list down to Locust and Vegeta because JMeter was not popular internally. We chose Locust as it was the more widely used tool within our development teams, which makes the test suite easier to maintain. We have also blogged before about how we leveraged Locust in earlier preparations for sales events.

Locust works in both standalone and distributed mode. In distributed mode it runs a controller that coordinates multiple workers; such a setup is required to generate higher loads and overcome the resource limits of a single machine. We created Locust scripts covering multiple business processes, mimicking real-world traffic patterns to our services. These scripts were then packaged as a Docker container and deployed as a distributed Locust system.

Mock External Dependencies

When we defined the scope of the load tests, we agreed to focus only on testing internal service components and not to involve external dependencies in routine tests. Therefore we decided to mock these dependencies.

The table below compares a variety of tools that can be used to implement mocks.

| Criterion                | Tool A      | Tool B         | Tool C    | Tool D    | Tool E         |
| ------------------------ | ----------- | -------------- | --------- | --------- | -------------- |
| GitHub stars/forks       | 1289/173    | 3453/934       | 2280/616  | 1402/631  | 468/131        |
| Config (API, route, ...) | JSON config | JSON           | JS config | JS config | JSON           |
| Latency simulation       | Fixed       | Fixed / Random | Fixed     | Fixed     | Fixed / Random |
| Fault simulation         | Yes         | Yes            | Yes       | Yes       | Yes            |
| Stateful behaviour       | No          | State machine  | No        | No        | Key-value map  |
| Easy to extend           | No          | Yes            | Yes       | No        | Yes            |
| Response templating      | Yes         | Yes            | No        | Yes       | Yes            |
| Request matching         | Yes         | Yes            | Yes       | No        | Yes            |
| Record & replay          | No          | No             | Yes       | No        | Yes            |

After evaluating multiple options we settled on Hoverfly as the mocking solution. Hoverfly makes it easy to set up mocks with static or dynamic responses, and we created and deployed mocks for multiple external dependencies. Furthermore, we wanted to run the load tests against services that could simultaneously be used for other tests. This meant that a service needed to dynamically switch a dependency between the real service and its mock. For this we leveraged header-based routing with Skipper: by examining whether a request belongs to a load test, a service can decide whether to call the mock or the actual dependency.
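In Skipper's eskip route syntax, such a header-based switch could look roughly like the sketch below; the route names, the header name and the backend addresses are assumptions for illustration:

```
// Requests marked as load test traffic are routed to the Hoverfly mock.
mock_route: Header("Load-Test", "true") -> "http://hoverfly-mock.default.svc:8500";

// All other traffic goes to the real external dependency.
real_route: * -> "https://real-dependency.example.com";
```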

Hoverfly example simulation mocking a service with a PATCH endpoint:

    "data": {
        "pairs": [
                "request": {
                    "path": [{
                        "matcher": "exact",
                        "value": "/test"
                    "method": [{
                        "matcher": "exact",
                        "value": "PATCH"
                "response": {
                    "status": 204,
                    "body": "",
                    "encodedBody": false,
                    "headers": {
                        "Date": [
                            "{{ currentDateTime 'Mon, 02 Jan 2020 15:04:05 GMT' }}"
                        "Load-Test": [
                    "templated": true
        "globalActions": {
            "delays": []
    "meta": {
        "schemaVersion": "v5",
        "hoverflyVersion": "v1.1.2",
        "timeExported": "2020-01-07T13:21:02+02:00"

To start Hoverfly using this configuration, one can simply run:

hoverfly -webserver -import simulation.json

Load Test Conductor

In order to meet our goal of running automated load tests in the test cluster, we needed to design a system that could manage the full lifecycle of a load test and ensure the cluster and the deployed applications match our production configuration: each application in the load test environment is updated to match the resource allocation, instance count and application version of its production counterpart.

Load test lifecycle

We defined the lifecycle of one load test as follows:

  1. Deploy all applications in the test environment to be the same version as production.
  2. Scale up the applications in the test environment to meet the resource configuration of the production environment.
  3. Generate load test traffic that replicates real customer behaviour.
  4. Scale down applications in the test environment after the test as a cost saving measure.
  5. Clean up databases and remove unnecessary test data.

For this purpose, we built a microservice in Go called the load-test-conductor, which executes and manages these load test phases and transitions. Our design was heavily influenced by the declarative approach Kubernetes popularized for infrastructure management: the service provides a simple API through which engineers run load tests by defining the desired state of a load test. Executing a load test is now just one API call away!
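For illustration, a desired-state request to the conductor might look like the following; the endpoint and all field names are assumptions, not the service's actual API:

```
POST /load-tests
{
    "targetOrdersPerMinute": 1000,
    "rampUpMinutes": 30,
    "plateauMinutes": 60,
    "deployProductionVersions": true,
    "scaleToProduction": true
}
```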

The diagram below shows the system components of the Load Test Conductor: Conductor Components

Deployment and Scaling

To ensure that the exact version of the service running in production is deployed and that services are pre-scaled, we automated deployment and scaling within the Load Test Conductor. We use our Continuous Delivery Platform (CDP) together with the Kubernetes client to find the version deployed in production and trigger a new deployment of this exact version in our staging environment. The applications to include in a load test are provided as environment-specific configuration. The Deployer component triggers the deployments and waits until all of them are completed; afterwards, the Scaler component scales the services to the target configuration. The load test conductor currently supports scaling resources in Kubernetes and AWS ECS environments, and it also scales everything back down to the previous state once the load test has completed or failed.
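The pre-test phase described above can be sketched as follows; `Deployer` and `Scaler` are stand-ins for the real components, and their interfaces are assumptions:

```python
# Sketch of the pre-test orchestration: deploy production versions,
# wait for the rollout, then scale up -- remembering the previous state
# so everything can be scaled back down after the test.

def prepare_environment(deployer, scaler, services, target_replicas):
    # Deploy the exact version currently running in production.
    for svc in services:
        version = deployer.production_version(svc)
        deployer.deploy(svc, version)
    deployer.wait_until_rolled_out(services)

    # Remember the current replica counts before scaling up.
    previous_replicas = {svc: scaler.current_replicas(svc) for svc in services}
    for svc in services:
        scaler.scale(svc, target_replicas[svc])
    return previous_replicas  # restored once the test completes or fails
```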

Load generation

We chose to run Locust in distributed mode to mimic customer traffic. Each Locust worker executes our test scripts and interacts with our microservices to simulate the customer journey through our systems. Since we wanted to test different load scenarios, we implemented an algorithm in the load-test-conductor that steers the load through the API Locust provides, which allows changing the number of simulated users and the rate at which they are spawned. The algorithm ramps up load based on a business KPI: orders placed per minute. Users of the test system define a ramp-up time, a plateau time and the target orders per minute the test should reach; the algorithm then hatches simulated users accordingly, dynamically recalculating the hatch rate and user count needed to reach the configured target.

Load generation pseudo code

set initial number of users to 1
set calculation interval to 60 seconds
while load test time has not exceeded
    get locust status
    calculate orders per calculation interval
    calculate orders per minute
    set number of orders to the number of orders reported by locust

    if user count in locust status is equal to zero
        print "load test is being initialized."
        set loadtest hatch rate to one
        set loadtest user count to initial number of users
        set loadtest orders per minute to 0
        set loadtest number of orders to 0
    else if orders per minute is equal to zero
        print "load test stalled due to no orders getting generated."
        set loadtest hatch rate to one
        set loadtest user count to one
    else
        calculate total users needed to achieve target orders per minute rate
        using the current users and the current orders per minute rate
        calculate users that need to be created
        calculate time left for the load test
        calculate iterations left for the load test
        calculate users to spawn in this iteration
        calculate hatch rate
        set loadtest hatch rate to calculated hatch rate
        set loadtest user count to calculated users
    update locust with load test parameters, this triggers load generation
    sleep for calculation interval
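The core of the pseudocode above can be sketched as a pure function; the parameter names and the assumption that orders scale linearly with the number of simulated users are ours:

```python
# Sketch of one ramp-up step: given the target and current orders per
# minute (opm), decide the next user count and hatch rate.

def next_ramp_step(target_opm, current_opm, current_users,
                   seconds_left, interval=60, initial_users=1):
    """Return (user_count, hatch_rate) for the next calculation interval."""
    if current_users == 0:
        # Load test is being initialized.
        return initial_users, 1
    if current_opm == 0:
        # Stalled: no orders generated, fall back to a single user.
        return 1, 1
    # Assume orders scale linearly with users, so the per-user rate
    # observed so far predicts how many users the target requires.
    opm_per_user = current_opm / current_users
    total_users_needed = target_opm / opm_per_user
    users_missing = max(total_users_needed - current_users, 0)
    # Spread the remaining users over the remaining intervals.
    iterations_left = max(seconds_left // interval, 1)
    users_to_spawn = users_missing / iterations_left
    user_count = round(current_users + users_to_spawn)
    hatch_rate = max(users_to_spawn / interval, 1)
    return user_count, hatch_rate
```

For example, at 100 orders per minute with 10 users and a 1000 opm target, each step closes a tenth of the gap when ten minutes remain.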

Test Execution & Test Evaluation

To trigger the load test, we used a Kubernetes CronJob that calls the API of the load test conductor. For our Payment system, load tests take about 2 hours to complete.
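A minimal CronJob for this could look like the sketch below; the schedule, names and the conductor endpoint are illustrative assumptions:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: payment-load-test
spec:
  schedule: "0 1 * * 1"        # every Monday at 01:00 UTC
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: trigger-load-test
              image: curlimages/curl:latest
              args: ["-X", "POST", "http://load-test-conductor/load-tests"]
```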

To monitor the system during test execution, we leverage Grafana dashboards that provide insight into the most important metrics, for example latency, throughput and response code rates. Additionally, alerts trigger whenever a service does not meet its SLO during a test. The test results themselves still have to be evaluated manually, by inspecting these graphs, to decide whether a run was successful; this is sufficient for us for the time being.


Overall, the solution fulfilled its goal: a successful preparation and scaling of our applications. However, running load tests on the test cluster posed several challenges. Sometimes new deployments were rolled out during tests, which caused the service to point to pods with minimal resources instead of the scaled-up ones. Several infrastructure components, such as the cluster node type, databases and centrally managed event queues (Nakadi), had to be adjusted to resemble the production environment, which required considerable communication and alignment with the teams managing those services.

We made deploying the production versions of the applications an optional feature, so that developers can also test their feature-branch code. The load test tool has become our standard way to verify, for every developed change, that the applications can handle peak production traffic.

Giving developers the possibility to run load tests by a simple API call encourages and enables them to thoroughly load test applications.

Since these load tests are conducted in a non-production environment, we could stress the services until they failed. In combination with load tests in production, this was essential for preparing our production services for higher load.

  1. ECS is only used by a small set of isolated services; all other services run on Kubernetes