At the recent Codemotion in Rome, Katie Koschland delivered an interesting talk about load testing.

She talked about her experience of chasing a performance problem on the Financial Times website. Although she and her team didn’t manage to get to the root of the problem, she did learn a few key lessons that could help anyone in the same situation. In this article, we will look at what she faced and what the outcomes were. But first of all, let’s talk about load testing.

What is Load Testing and how does it work?

Load testing is a type of performance test that measures how your system behaves under a high load of users. The important part is collecting metrics and runtime errors, so that you understand how your system performs, where its limits are, and what you can tune. A more aggressive type of performance test, usually run in conjunction with a load test, is stress testing, in which the aim is to break your system using predictable or unpredictable scenarios.

But how does load testing technically work? It mainly consists of exercising a scenario based on a URL (a website page or a REST API endpoint, for example) that is called many times to simulate user interactions. The two main parameters of a load test are the execution time and the number of concurrent virtual users.

Today there are many frameworks available for load testing. Which one to pick mainly depends on your needs and on how you would like to run this kind of test. Generally, load testing frameworks require some hardware infrastructure (such as test controllers and many test agents) to provide a more realistic testing environment. If you can’t afford it, you can always use a cloud service such as Microsoft Azure Cloud Load Test, which gives you a ready-to-use infrastructure in no time.

Back to Katie’s situation: the load testing framework she used is Artillery, chosen because it is simple to install and use. We just need to install it on a machine using npm:

npm install -g artillery

With Artillery, we can execute a one-shot test, or we can use a YAML script to configure a scenario with one or more specific stages. Here’s an example of a simple script:
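The listing below is a minimal sketch reconstructed from the description that follows, not the script shown in the talk. The file name test-site.yml is taken from the run command used later in this article. Mapping "20 virtual users" to arrivalRate is an assumption: in Artillery, arrivalRate is the number of new virtual users started per second.

```yaml
# test-site.yml (sketch; values assumed from the surrounding description)
config:
  target: "https://artillery.io"   # site under test
  phases:
    - duration: 60                 # run the phase for 60 seconds
      arrivalRate: 20              # assumption: 20 new virtual users per second
scenarios:
  - flow:
      - get:
          url: "/"                 # each virtual user requests the home page
```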

In this script, we are testing the website artillery.io using 20 virtual users for 60 seconds. We can run the script using:

artillery run test-site.yml

At the end of the execution, Artillery provides a detailed report that can help us identify bottlenecks and, more generally, any performance issue. Lastly, because Artillery runs from the command line, you can easily integrate such load tests into your delivery pipeline with the CI tool of your choice.

Collect your metrics

When you load test your application, you need to plan which scenario and how much load you want to apply to it. Usually, you define levels of load in terms of the number of concurrent virtual users and the execution time.

The right way to plan it is to define different phases, each applying a different amount of load, and observe how your system performs in each one.

In Katie’s situation, she defined three different phases:

  1. warm-up: 10 virtual users for 60 seconds;
  2. ramp-up: from 10 to 25 virtual users for 120 seconds;
  3. cruise: 25 virtual users for 1200 seconds.
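The three phases above could be expressed in an Artillery script roughly as follows. This is a sketch under assumptions, not the configuration used at the Financial Times: the target URL is a placeholder, and the "virtual users" figures are mapped to arrivalRate (new virtual users per second), with rampTo used for the gradual increase.

```yaml
config:
  target: "https://example.com"    # placeholder, not the real system under test
  phases:
    - name: warm-up
      duration: 60                 # seconds
      arrivalRate: 10
    - name: ramp-up
      duration: 120
      arrivalRate: 10              # start at 10...
      rampTo: 25                   # ...and increase linearly to 25
    - name: cruise
      duration: 1200
      arrivalRate: 25              # sustained load
scenarios:
  - flow:
      - get:
          url: "/"                 # hypothetical scenario: request the home page
```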

She also stress tested her system with a final break phase, configured as follows:

  1. crash: 100 virtual users for 30 seconds.
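In an Artillery script, a break phase like this could be sketched as one more entry in the phases list, run after the load phases. As before, treating "100 virtual users" as an arrivalRate is an assumption, and the target is a placeholder.

```yaml
config:
  target: "https://example.com"    # placeholder target
  phases:
    # ...load phases (warm-up, ramp-up, cruise) would come first...
    - name: crash
      duration: 30                 # short burst of 30 seconds
      arrivalRate: 100             # assumption: 100 new virtual users per second
scenarios:
  - flow:
      - get:
          url: "/"                 # hypothetical scenario
```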

As we said before, the aim of load testing is to collect metrics: they are the only way to find out which parts of your system are not performing as expected. Depending on the testing framework, you can choose which metrics to collect and at what level of detail. You can even send your results to external monitoring tools such as Datadog, Grafana or Telegraf to get a broader view of how your system is performing.
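With Artillery, one way to do this is the publish-metrics plugin. The fragment below is a sketch that assumes the plugin is available in your Artillery installation and that a Datadog API key is stored in an environment variable named DD_API_KEY (a hypothetical name). The prefix and tag values are illustrative, and the template syntax for environment variables may differ between Artillery versions.

```yaml
config:
  target: "https://example.com"          # placeholder target
  plugins:
    publish-metrics:
      - type: datadog                    # send metrics to Datadog
        apiKey: "{{ $processEnvironment.DD_API_KEY }}"  # read from the environment
        prefix: "artillery."             # illustrative metric name prefix
        tags:
          - "test:load"                  # illustrative tag
```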


Katie Koschland showing load testing phases

If my system breaks, what can I do?

Load testing also helps you understand what you can do when your system breaks. You need to be prepared for situations in which your application is down and you have to bring it back up as soon as possible.

For this reason, you need an emergency plan that lets you:

  1. scale your application when it is under stress;
  2. roll back to a previous deployment if new features are not performing well.

It’s also a good design choice to have a courtesy page to show users when your application is hitting runtime errors.

What are the outcomes?

Finally, let’s look at the outcomes. Katie didn’t succeed in finding out what the problem in her system was. But, in her case, the experience was a first step towards learning and improving. She learned how to manage these kinds of situations and, most of all, collected a set of tools that are now her everyday toolbox.