How to Run Benchmarks

June 4, 2020

There can be many reasons to do benchmarking:

In this post we’ll take a look at the various options for running RabbitMQ benchmarks. But before we do, you’ll need a way to see the results and look at system metrics.

RabbitMQ Observability

You have routed X number of messages per second through your RabbitMQ cluster and concluded that you have reached peak throughput, but have you considered:

You have to be able to see more than just a throughput number. 

Since 3.8.0, we have made the rabbitmq_prometheus plugin available. See how to get Grafana, Prometheus and RabbitMQ to work together and get amazing insight into your RabbitMQ instances.

See our published Grafana dashboards for insight into not only the queue counts, connection counts, message rates, etc., but also into what is going on under the hood from the Erlang perspective. There are also countless system metrics dashboards and agents you can install to see system metrics such as CPU, RAM, network and disk IO. For example, check out the node_exporter, which gives insights into hardware and OS behaviour.

Another reason to use a solution like Prometheus is that when you push RabbitMQ to its limit, the management UI can become sluggish or unresponsive. The UI is trying to operate on a machine that might already be close to 100% CPU utilisation.
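To make the Prometheus option a little more concrete, here is a minimal sketch of reading the plugin's output directly, without Grafana. It assumes the rabbitmq_prometheus plugin is enabled and serving the Prometheus text format on its default port 15692 at /metrics. The helper names are hypothetical, and the parser is deliberately simplified: it assumes samples carry no trailing timestamp.

```python
from urllib.request import urlopen

# Assumption: rabbitmq_prometheus is enabled and listening on its default port.
METRICS_URL = "http://localhost:15692/metrics"


def parse_metrics(text):
    """Parse Prometheus text-format output into a {series: value} dict.

    Simplified on purpose: comment lines (# HELP, # TYPE) are skipped,
    labels stay part of the key, and samples are assumed to have no
    trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The sample value is the last space-separated field on the line.
        series, _, value = line.rpartition(" ")
        try:
            samples[series.strip()] = float(value)
        except ValueError:
            continue  # ignore anything that does not look like a sample
    return samples


def fetch_metrics(url=METRICS_URL, timeout=5):
    """Fetch and parse the metrics endpoint of a single RabbitMQ node."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling fetch_metrics() against a node at the start and end of a benchmark run gives you a crude before/after snapshot. For anything beyond a quick check, a real Prometheus server scraping the endpoint is the better choice.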

Some Benchmarking Do’s and Don’ts


If you run a benchmark both in an isolated environment and in your main IT infrastructure, it can help you isolate and optimise sub-optimal areas of your prod/qa environment.


Some Common Impacts on Performance

Below are some things you can expect as you vary different aspects of a benchmark.

One common pattern is that once you get past a few tens of queues and/or clients, total throughput will drop. The more connections and queues there are, the more context switching there is and the less efficient things become. There are only a limited number of CPU cores. Having thousands of queues and clients is not a bad thing in itself, but realise that you may not get the same total throughput as when you have tens or hundreds of clients/queues.

Option #1 - Use Your Existing Applications

If you need to benchmark for capacity planning or to find the best configuration, then using your existing applications is most likely to yield the most useful results.

The trouble with synthetic benchmarks is that they tell you how your RabbitMQ installation will cope with loads generated by the chosen load generator, which may be quite different to your real usage.

The trouble with using your real applications is that generating load may take some work to set up. 

Secondly, unless you already have it instrumented, you won’t get end-to-end latency metrics. Of course, you can add that: put a timestamp in a message header, extract that header in the consumer and publish the metric. Most languages have libraries for emitting metrics efficiently, without the need to hand-roll anything. Also take into account that without clock synchronisation such as NTP, the end-to-end latency metrics will not be accurate, and even with it there may be jitter.
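The timestamp-in-a-header idea can be sketched in a few lines. This is an illustration only: the header name and both helper functions are hypothetical. With a real client library you would attach the returned dict as the message headers when publishing and read it back from the delivered message in the consumer.

```python
import time

# Hypothetical header name; any key the publisher and consumer agree on works.
TIMESTAMP_HEADER = "x-published-at-ms"


def _now_ms():
    """Current wall-clock time in milliseconds since the Unix epoch."""
    return int(time.time() * 1000)


def stamp_headers(headers=None, now_ms=None):
    """Publisher side: return a copy of the headers with a publish timestamp."""
    stamped = dict(headers or {})
    stamped[TIMESTAMP_HEADER] = _now_ms() if now_ms is None else now_ms
    return stamped


def latency_ms(headers, now_ms=None):
    """Consumer side: end-to-end latency in ms, or None if the header is absent.

    The result compares two wall clocks, so it is only as good as the clock
    synchronisation between publisher and consumer hosts. It can even be
    negative when the clocks are skewed.
    """
    sent = headers.get(TIMESTAMP_HEADER)
    if sent is None:
        return None
    received = _now_ms() if now_ms is None else now_ms
    return received - sent
```

The value returned by latency_ms is what you would hand to your metrics library, ideally as a histogram, so that you can look at percentiles rather than just an average.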

Option #2 - PerfTest

PerfTest is our recommended tool for doing synthetic benchmarking of simple workloads with RabbitMQ. PerfTest is on GitHub and has some nice instructions. To run it yourself, please follow the installation instructions.

It even has its own Grafana dashboard.
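If you script your benchmark runs, it can help to build the PerfTest arguments in code so that each run is reproducible. The sketch below is one way to do that. The function and its defaults are hypothetical, and the option names (--uri, --producers, --consumers, --queue, --size, --time, --rate, --confirm) are PerfTest's long options as I understand them, so check them against the help output of your PerfTest version.

```python
import shlex


def perf_test_args(uri="amqp://localhost:5672", producers=1, consumers=1,
                   queue="bench-q", size_bytes=1024, seconds=60,
                   rate=None, confirm=None):
    """Build a PerfTest argument list for a simple one-queue workload.

    rate:    per-producer publish rate limit in msg/s (None = unlimited)
    confirm: max unconfirmed publishes per producer (None = no confirms)
    """
    args = [
        "--uri", uri,
        "--producers", str(producers),
        "--consumers", str(consumers),
        "--queue", queue,
        "--size", str(size_bytes),
        "--time", str(seconds),
    ]
    if rate is not None:
        args += ["--rate", str(rate)]
    if confirm is not None:
        args += ["--confirm", str(confirm)]
    return args


if __name__ == "__main__":
    # Print a shell-safe argument string to append to your PerfTest launcher.
    print(shlex.join(perf_test_args(producers=2, consumers=4, confirm=100)))
```

The printed string is appended to whichever launch command the installation instructions give you for your setup. Keeping the arguments in version control alongside the results makes it much easier to repeat a run later.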

Option #3 - PerfTest + CloudFoundry

See our workloads project on GitHub. It will show you how to deploy and test for various workloads on CloudFoundry.

Option #4 - RabbitTestTool

This is an experimental tool that I use (and built) personally to do automated exploratory testing. It is a powerful but complex tool and probably not your ideal choice for that reason. It’s more of a QA tool than for customers to benchmark their own setups.

But it has some features that might interest you.

Firstly, it has a model-driven, property-based test mode that detects data loss, ordering violations (without redelivered flag), duplicate delivery (without redelivered flag) and loss of availability. It is the data loss and availability detection that might interest you. Duplicate and ordering detection is only useful in our alpha and pre-alpha builds, which might have bugs in new features.

You can use this tool to practice a blue/green deployment or a rolling upgrade, to ensure that you can perform the operation without data loss or loss of availability.

It also has highly customisable EC2 deployment and benchmark orchestration. It is possible to set up many side-by-side benchmarks on different AWS hardware, RabbitMQ versions and configurations. But, again, this is so configurable that it is also complex.

Option #5 - RabbitMQ Benchmark X-Project

OK, so the name is a work in progress, but it represents a new project that we hope will benefit both our own team and the wider community. It will be the one benchmark project to rule them all.

The plan is to use a benchmark tool like PerfTest plus the Kubernetes API to combine both orchestration (the brokers, the disks etc) and the benchmark tool itself. Orchestration (deployment of RabbitMQ, load gen, observability) is often the most onerous part of benchmarking so we hope this project will solve that once and for all - not just for us but for everyone.


Benchmarking can be hard to get right, but if done correctly it can provide valuable information. The key is to use as much of the observability tooling as possible and to try and model your actual workloads as closely as possible. It is hugely important to understand the usage of publisher confirms and consumer acknowledgements, their role in flow control and the impact they have on performance.
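As a rough illustration of why confirms matter for throughput, consider a publisher that allows at most N unconfirmed messages in flight. By Little's law it cannot sustain more than N divided by the average confirm latency. The sketch below is a back-of-the-envelope model only, not a description of how RabbitMQ implements flow control, and the function name is hypothetical.

```python
def max_publish_rate(max_in_flight, confirm_latency_s):
    """Upper bound on one publisher's rate (msg/s) under an in-flight limit.

    Little's law: in-flight messages = rate * latency, so the rate cannot
    exceed max_in_flight / confirm_latency_s.
    """
    if max_in_flight <= 0 or confirm_latency_s <= 0:
        raise ValueError("both arguments must be positive")
    return max_in_flight / confirm_latency_s


if __name__ == "__main__":
    # Compare in-flight limits at an assumed 5 ms average confirm latency.
    for limit in (1, 10, 100):
        print(limit, "in flight:", round(max_publish_rate(limit, 0.005)), "msg/s at most")
```

With a 5 ms confirm latency, waiting for each confirm before publishing the next message caps a publisher at roughly 200 msg/s, while allowing 100 in flight raises the cap to roughly 20,000 msg/s. The same reasoning applies to consumer prefetch and acknowledgements.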

In subsequent blog posts that cover performance I’ll be including the PerfTest arguments that you’ll need to recreate the load generation side of things.

Written by: Jack Vanlightly

Categories: Performance