Archive for the 'Performance' Category

Erlang 24 Support Roadmap

March 23, 2021 by Michael Klishin

TL;DR

Read More...

Cluster Sizing Case Study – Quorum Queues Part 2

June 22, 2020 by Jack Vanlightly

In the last post we started a sizing analysis of our workload using quorum queues. We focused on the happy scenario where consumers keep up, meaning there are no queue backlogs and all brokers in the cluster are operating normally. By running a series of benchmarks modelling our workload at different intensities, we identified the top 5 cluster size and storage volume combinations in terms of cost per 1000 msg/s per month.

  1. Cluster: 7 nodes, 8 vCPUs (c5.2xlarge), gp2 SSD. Cost: $54
  2. Cluster: 9 nodes, 8 vCPUs (c5.2xlarge), gp2 SSD. Cost: $69
  3. Cluster: 5 nodes, 8 vCPUs (c5.2xlarge), st1 HDD. Cost: $93
  4. Cluster: 5 nodes, 16 vCPUs (c5.4xlarge), gp2 SSD. Cost: $98
  5. Cluster: 7 nodes, 16 vCPUs (c5.4xlarge), gp2 SSD. Cost: $107

There are more tests to run to ensure these clusters can handle broker failures and the large backlogs that can accumulate during outages or system slowdowns.

All quorum queues are declared with the following properties:

  • x-quorum-initial-group-size=3
  • x-max-in-memory-length=0

The x-max-in-memory-length property forces the quorum queue to remove message bodies from memory as soon as it is safe to do so. You can set it to a higher limit; zero is the most aggressive setting, designed to avoid large memory growth at the cost of more disk reads when consumers do not keep up. Without this property, message bodies are kept in memory at all times, which can drive memory growth to the point of triggering memory alarms that severely impact the publish rate - something we want to avoid in this workload case study.
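As a reference, a minimal sketch of declaring a queue with these arguments via the RabbitMQ Java client might look like the following; the queue name, host and surrounding code are illustrative assumptions and not part of the benchmarked workload.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

import java.util.HashMap;
import java.util.Map;

public class DeclareQuorumQueue {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumption: a local broker for illustration

        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {

            // Queue arguments used throughout this case study
            Map<String, Object> arguments = new HashMap<>();
            arguments.put("x-queue-type", "quorum");         // declare a quorum queue
            arguments.put("x-quorum-initial-group-size", 3); // replication factor of 3
            arguments.put("x-max-in-memory-length", 0);      // evict message bodies from memory as soon as possible

            // Quorum queues must be durable, non-exclusive and non-auto-delete
            channel.queueDeclare("orders", true, false, false, arguments);
        }
    }
}
```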

Read More...

Cluster Sizing Case Study – Quorum Queues Part 1

June 21, 2020 by Jack Vanlightly

In the first post of this sizing series we covered the workload, the tests, and the cluster and storage volume configurations on AWS EC2. In this post we’ll run a sizing analysis with quorum queues. We also ran a sizing analysis on mirrored queues.

In this post we’ll run the increasing intensity tests that will measure our candidate cluster sizes at varying publish rates, under ideal conditions. In the next post we’ll run resiliency tests that measure whether our clusters can handle our target peak load under adverse conditions.

All quorum queues are declared with the following properties:

  • x-quorum-initial-group-size=3 (replication factor)
  • x-max-in-memory-length=0

The x-max-in-memory-length property forces the quorum queue to remove message bodies from memory as soon as it is safe to do so. You can set it to a higher limit; zero is the most aggressive setting, designed to avoid large memory growth at the cost of more disk reads when consumers do not keep up. Without this property, message bodies are kept in memory at all times, which can drive memory growth to the point of triggering memory alarms that severely impact the publish rate - something we want to avoid in this workload case study.

Read More...

Cluster Sizing Case Study – Mirrored Queues Part 2

June 20, 2020 by Jack Vanlightly

In the last post we started a sizing analysis of our workload using mirrored queues. We focused on the happy scenario where consumers keep up, meaning there are no queue backlogs and all brokers in the cluster are operating normally. By running a series of benchmarks modelling our workload at different intensities, we identified the top 5 cluster size and storage volume combinations in terms of cost per 1000 msg/s per month.

  1. Cluster: 5 nodes, 8 vCPUs, gp2 SSD. Cost: $58
  2. Cluster: 7 nodes, 8 vCPUs, gp2 SSD. Cost: $81
  3. Cluster: 5 nodes, 8 vCPUs, st1 HDD. Cost: $93
  4. Cluster: 5 nodes, 16 vCPUs, gp2 SSD. Cost: $98
  5. Cluster: 9 nodes, 8 vCPUs, gp2 SSD. Cost: $104

There are more tests to run to ensure these clusters can handle broker failures and the large backlogs that can accumulate during outages or system slowdowns.

Read More...

Cluster Sizing Case Study - Mirrored Queues Part 1

June 19, 2020 by Jack Vanlightly

In the first post of this sizing series we covered the workload and the cluster and storage volume configurations on AWS EC2. In this post we’ll run a sizing analysis with mirrored queues.

The first phase of our sizing analysis will be assessing what intensities each of our clusters and storage volumes can handle easily and which are too much.

All tests use the following policy:

  • ha-mode: exactly
  • ha-params: 2
  • ha-sync-mode: manual
Read More...

Cluster Sizing and Other Considerations

June 18, 2020 by Jack Vanlightly

This is the start of a short series where we look at sizing your RabbitMQ clusters. The actual sizing wholly depends on your hardware and workload, so rather than tell you how many CPUs and how much RAM you should provision, we’ll create some general guidelines and use a case study to show what things you should consider.

Read More...

How to Run Benchmarks

June 4, 2020 by Jack Vanlightly

There can be many reasons to do benchmarking:

  • Sizing and capacity planning
  • Product assessment (can RabbitMQ handle my load?)
  • Discover the best configuration for your workload

In this post we’ll take a look at the various options for running RabbitMQ benchmarks. But before we do, you’ll need a way to see the results and look at system metrics.

Read More...

Quorum Queues and Flow Control - Stress Tests

May 15, 2020 by Jack Vanlightly

In the last post we ran some simple benchmarks on a single queue to see what effect pipelining publisher confirms and consumer acknowledgements had on flow control. 

Specifically we looked at:

  • Publishers: Restricting the number of in-flight messages (messages sent but pending a confirm).
  • Consumers: Prefetch (the number of in-flight messages the broker will allow on the channel)
  • Consumers: Ack Interval (multiple flag usage)

Unsurprisingly, we saw that when we restricted publishers and the brokers to a small number of in-flight messages at a time, throughput was low. When we increased that limit, throughput increased, but only to a point, after which we saw no further throughput gains, just increased latency. We also saw that allowing consumers to use the multiple flag was beneficial to throughput.

In this post we’re going to look at those same three settings, but with many clients, many queues and different amounts of load, including stress tests. We’ll see that publisher confirms and consumer acknowledgements play a role in flow control to help prevent overload of a broker. 
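As a rough illustration of how those three settings map onto client code, here is a minimal RabbitMQ Java client sketch; the queue name, payload size, in-flight limit, prefetch and ack interval below are illustrative assumptions, not the values used in these benchmarks.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ConfirmsAndAcksSketch {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumption: local broker for illustration

        int inFlightLimit = 1000; // publisher: max unconfirmed messages (illustrative value)
        int prefetch = 100;       // consumer: max unacknowledged deliveries (illustrative value)
        int ackInterval = 10;     // consumer: ack every N deliveries using the multiple flag

        try (Connection connection = factory.newConnection()) {

            // --- Consumer: prefetch plus multiple-flag acknowledgements ---
            Channel consumeChannel = connection.createChannel();
            // hypothetical queue; quorum queue arguments omitted for brevity
            consumeChannel.queueDeclare("flow-test", true, false, false, null);
            consumeChannel.basicQos(prefetch);
            AtomicInteger unacked = new AtomicInteger();
            consumeChannel.basicConsume("flow-test", false, (consumerTag, delivery) -> {
                if (unacked.incrementAndGet() >= ackInterval) {
                    // acknowledge this delivery and all earlier unacknowledged ones in one frame
                    consumeChannel.basicAck(delivery.getEnvelope().getDeliveryTag(), true);
                    unacked.set(0);
                }
                // note: a tail of up to ackInterval - 1 deliveries stays unacknowledged in this
                // simplification; a real consumer would ack the remainder on shutdown
            }, consumerTag -> { });

            // --- Publisher: confirms with a bounded in-flight window ---
            Channel publishChannel = connection.createChannel();
            publishChannel.confirmSelect();
            ConcurrentNavigableMap<Long, byte[]> outstanding = new ConcurrentSkipListMap<>();
            publishChannel.addConfirmListener(
                    (tag, multiple) -> {                  // ack: drop confirmed tags
                        if (multiple) outstanding.headMap(tag, true).clear();
                        else outstanding.remove(tag);
                    },
                    (tag, multiple) -> {                  // nack: real code would republish these
                        if (multiple) outstanding.headMap(tag, true).clear();
                        else outstanding.remove(tag);
                    });

            byte[] body = new byte[1024]; // illustrative 1 kB payload
            for (int i = 0; i < 100_000; i++) {
                while (outstanding.size() >= inFlightLimit) {
                    Thread.sleep(1); // back-pressure: wait for confirms to catch up
                }
                long tag = publishChannel.getNextPublishSeqNo();
                outstanding.put(tag, body);
                publishChannel.basicPublish("", "flow-test", null, body);
            }
        }
    }
}
```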

Read More...

Quorum Queues and Flow Control - Single Queue Benchmarks

May 14, 2020 by Jack Vanlightly

In the last post we covered what flow control is, both as a general concept and the various flow control mechanisms available in RabbitMQ. We saw that publisher confirms and consumer acknowledgements are not just data safety measures, but also play a role in flow control. 

In this post we’re going to look at how application developers can use publisher confirms and consumer acknowledgements to get a balance of safety and high performance, in the context of a single queue. 

Flow control becomes especially important when a broker is being overloaded. A single queue is unlikely to overload your broker. If you send large messages then sure, you can saturate your network, or if you only have a single CPU core, then one queue could max it out. But most of us are on 8, 16 or 30+ core machines. Even so, it’s interesting to break down the effects of confirms and acks on a single queue. From there we can take our learnings and see if they apply to larger deployments (the next post).

Read More...

Quorum Queues and Flow Control - The Concepts

May 4, 2020 by Jack Vanlightly

As part of our quorum queue series we’re taking a look at flow control, how it protects RabbitMQ from being overloaded and how that relates to quorum queues.

What is Flow Control?

Flow control is a concept that has been in computer networking and networked software for decades. Essentially it is a mechanism for applying back pressure to senders to avoid overloading receivers. Receivers typically buffer incoming packets/messages as a way of dealing with a send rate that exceeds their processing rate. But receiver buffers cannot grow forever, so either the send rate should only transiently exceed receiver processing capacity (bursty traffic) or the senders must be slowed down (back pressure).

Flow control is a way of applying this back pressure to senders, slowing them down so that receivers’ buffers do not overflow and latencies do not grow too large. In a chain of senders/receivers, this back pressure can propagate up the chain to the origin of the traffic. In more complex graphs of connected components, flow control can balance incoming traffic between fast and slow senders, avoiding overload but allowing the system to reach full utilisation despite different numbers of senders, different rates and different load patterns (steady or bursty).

Read More...

Quorum queues and why disks matter

April 21, 2020 by Jack Vanlightly

Quorum queues are still relatively new to RabbitMQ, and many people have still not made the jump from classic mirrored queues. Before you migrate to this new queue type, you need to make sure that your hardware can support your workload, and a big factor in that is which storage drives you use.

In this blog post we’re going to take a closer look at quorum queues and their performance characteristics on different storage configurations.

HDD or SSD? One drive or multiple drives?

The TL;DR is that we highly recommend SSDs when using quorum queues. The reason for this is that quorum queues are sensitive to IO latency and SSDs deliver lower latency IO than HDDs. With higher IO latency, you’ll see lower throughput, higher end-to-end latency and some other undesirable effects.

Further down in this post we’ll demonstrate why we recommend this, using various benchmarks with different SSD and HDD configurations.

Read More...

RabbitMQ Java Client Metrics with Micrometer and Datadog

April 10, 2018 by Arnaud Cogoluègnes

In this post we’ll cover how the RabbitMQ Java client library gathers runtime metrics and sends them to monitoring systems like JMX and Datadog.
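As a taste of what the post covers, here is a minimal sketch of plugging a Micrometer registry into the Java client; it assumes the amqp-client and micrometer-core libraries are on the classpath, and uses a simple in-memory registry as a stand-in for the Datadog (or JMX) registry you would use in practice.

```java
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.impl.MicrometerMetricsCollector;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class MicrometerMetricsSketch {
    public static void main(String[] args) throws Exception {
        // Any Micrometer MeterRegistry can be plugged in here; in practice you would use a
        // Datadog or JMX registry from the corresponding micrometer-registry-* module.
        SimpleMeterRegistry registry = new SimpleMeterRegistry();

        ConnectionFactory factory = new ConnectionFactory();
        factory.setMetricsCollector(new MicrometerMetricsCollector(registry));

        try (Connection connection = factory.newConnection()) {
            // From this point on, connection, channel and message counts (published,
            // consumed, acknowledged, rejected) are recorded in the registry.
        }
    }
}
```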

Read More...

New Reactive Client for RabbitMQ HTTP API

October 18, 2017 by Arnaud Cogoluègnes

The RabbitMQ team is happy to announce the release of version 2.0 of HOP, the RabbitMQ HTTP API client for Java and other JVM languages. This new release introduces a new reactive client based on Spring Framework 5.0 WebFlux.

Read More...

Metrics support in RabbitMQ Java Client 4.0

November 30, 2016 by Arnaud Cogoluègnes

Version 4.0 of the RabbitMQ Java Client brings support for runtime metrics. This can be especially useful for understanding how a client application is behaving. Let’s see how to enable metrics collection and how to monitor those metrics on JMX or even inside a Spring Boot application.
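For a flavour of what that looks like, here is a minimal sketch of enabling the Dropwizard-based collector and exposing its registry over JMX; it assumes Dropwizard Metrics 3.x on the classpath (in 4.x the JmxReporter moved to the separate metrics-jmx module).

```java
import com.codahale.metrics.JmxReporter; // Dropwizard Metrics 3.x location
import com.codahale.metrics.MetricRegistry;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.impl.StandardMetricsCollector;

public class JmxMetricsSketch {
    public static void main(String[] args) throws Exception {
        MetricRegistry registry = new MetricRegistry();

        ConnectionFactory factory = new ConnectionFactory();
        factory.setMetricsCollector(new StandardMetricsCollector(registry));

        // Expose the Dropwizard registry over JMX so the client metrics appear in
        // JConsole or VisualVM alongside the JVM's own MBeans.
        JmxReporter reporter = JmxReporter.forRegistry(registry)
                .inDomain("com.rabbitmq.client.jmx")
                .build();
        reporter.start();

        try (Connection connection = factory.newConnection()) {
            // connection, channel and message metrics are now being recorded
        }
    }
}
```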

Read More...

New Credit Flow Settings on RabbitMQ 3.5.5

October 6, 2015 by Alvaro Videla

In order to prevent fast publishers from overflowing the broker with more messages than it can handle at any particular moment, RabbitMQ implements an internal mechanism called credit flow, which is used by the various systems inside RabbitMQ to throttle down publishers while allowing the message consumers to catch up. In this blog post we are going to see how credit flow works, and what we can do to tune its configuration for optimal behaviour.

Read More...

Finding bottlenecks with RabbitMQ 3.3

April 14, 2014 by Simon MacMullen

One of the goals for RabbitMQ 3.3 was that you should be able to find bottlenecks in running systems more easily. Older versions of RabbitMQ let you see that you were rate-limited but didn’t easily let you see why. In this blog post we’ll talk through some of the new performance indicators in version 3.3.

Read More...

Consumer Bias in RabbitMQ 3.3

April 10, 2014 by Simon MacMullen

I warn you before we start: this is another wordy blog post about performance-ish changes in RabbitMQ 3.3. Still with us? Good.

So in the previous post I mentioned “a new feature which I’ll talk about in a future blog post”. That feature is consumer bias.

Read More...

An end to synchrony: performance improvements in 3.3

April 3, 2014 by Simon MacMullen

Well, we got the bad news out of the way yesterday, so today let’s talk about (some of) the good news: some types of publishing and consuming are now a great deal faster, especially in clusters.

Read More...

RabbitMQ Performance Measurements, part 2

April 25, 2012 by Simon MacMullen

Welcome back! Last time we talked about flow control and latency; today let’s talk about how different features affect the performance we see. Here are some simple scenarios. As before, they’re all variations on the theme of one publisher and one consumer publishing as fast as they can.

Read More...

RabbitMQ Performance Measurements, part 1

April 16, 2012 by Simon MacMullen

So today I would like to talk about some aspects of RabbitMQ’s performance. There are a huge number of variables that feed into the overall level of performance you can get from a RabbitMQ server, and today we’re going to try tweaking some of them and seeing what we can see.

Read More...