Archive for year 2020

RabbitMQ Kubernetes Operator reaches 1.0

November 17, 2020 by Yaron Parasol

We are pleased to announce that the RabbitMQ Operator for Kubernetes is now generally available. The RabbitMQ Operator makes it easy to provision and manage RabbitMQ clusters consistently on any certified Kubernetes distribution. Operators inform the Kubernetes container orchestration system how to provision and control specific applications. The Kubernetes (hereafter K8s) Operator pattern is a way to extend the K8s API and state management to include the provisioning and management of custom resources – resources not provided in a default K8s deployment. In this post, we’ll discuss how the Operator enables the K8s system to control a RabbitMQ cluster.

Read More...

This Month in RabbitMQ, Aug/Sep 2020 Recap

November 6, 2020 by Michael Klishin

This month in RabbitMQ features a blog post by Michael Klishin on deploying RabbitMQ on Kubernetes. Also this month: RabbitMQ consumers on AWS, a three-part series on developing microservices with Lumen and RabbitMQ, and several articles on RabbitMQ and ASP.NET Core.

Read More...

This Month in RabbitMQ, July 2020 Recap

August 31, 2020 by Michael Klishin

It’s not the holidays yet, but the RabbitMQ community has presents for you anyway! The RabbitMQ Kubernetes cluster operator is now open-sourced and developed in the open on GitHub. Also, Gavin Roy has a new Python app that migrates queues between types. Finally, there is a webinar on RabbitMQ consumers from Ayanda Dube, Head of RabbitMQ Engineering at Erlang Solutions.

Read More...

Deploying RabbitMQ to Kubernetes: What's Involved?

August 10, 2020 by Michael Klishin

Over time, we have seen the number of Kubernetes-related queries on our community mailing list and Slack channels soar. In this post we’d like to explain the basics of a DIY deployment of RabbitMQ on Kubernetes: what Kubernetes resources will be necessary, how to make sure RabbitMQ nodes use durable storage, how to approach configuration of sensitive values, and so on.

Read More...

This Month in RabbitMQ, June 2020 Recap

July 30, 2020 by Michael Klishin

This month in RabbitMQ features the release of the RabbitMQ Cluster Kubernetes Operator, benchmarks and cluster sizing case studies by Jack Vanlightly (@vanlightly), and a write-up of RabbitMQ cluster migration by Tobias Schoknecht (@tobischo), plus lots of other tutorials by our vibrant community!

Read More...

Disaster Recovery and High Availability 101

July 7, 2020 by Jack Vanlightly
Be aware that this post contains out-of-date information.
RabbitMQ now has Disaster Recovery capabilities in the commercial editions via the Warm Standby Replication feature.

In this post I am going to cover perhaps the most commonly asked question I have received regarding RabbitMQ in the enterprise.

How can I make RabbitMQ highly available and what architectures/practices are recommended for disaster recovery?

RabbitMQ offers features to support high availability and disaster recovery but before we dive straight in I’d like to prepare the ground a little. First I want to go over Business Continuity Planning and frame our requirements in those terms. From there we need to set some expectations about what is possible. There are fundamental laws such as the speed of light and the CAP theorem which both have serious impacts on what kind of DR/HA solution we decide to go with.

Finally we’ll look at the RabbitMQ features available to us and their pros/cons.

Read More...

This Month in RabbitMQ, May 2020 Recap

June 30, 2020 by Michael Klishin

This month, Jack Vanlightly continues his blog series on Quorum Queues in RabbitMQ. Also, be sure to watch the replay of his related webinar.

Finally, Episode 5 of TGI RabbitMQ is out – Gerhard Lazu walks us through how to run RabbitMQ on Kubernetes. Don’t miss it!

Read More...

How quorum queues deliver locally while still offering ordering guarantees

June 23, 2020 by Jack Vanlightly

The team was recently asked about whether and how quorum queues can offer the same message ordering guarantees as classic queues given that they will deliver messages from a local queue replica (leader or follower) when possible. Mirrored queues always deliver from the master (the leader), so delivering from any queue replica sounds like it could impact those guarantees. 

That is the subject of this post. Be warned, this post is a technical deep dive for the curious and the distributed systems enthusiast. We’ll take a look at how quorum queues can deliver messages from any queue replica, leader or follower, without additional coordination (extra to Raft) but maintaining message ordering guarantees.

Read More...

Cluster Sizing Case Study – Quorum Queues Part 2

June 22, 2020 by Jack Vanlightly

In the last post we started a sizing analysis of our workload using quorum queues. We focused on the happy scenario in which consumers are keeping up, meaning that there are no queue backlogs and all brokers in the cluster are operating normally. By running a series of benchmarks modelling our workload at different intensities we identified the top 5 cluster size and storage volume combinations in terms of cost per 1000 msg/s per month.

  1. Cluster: 7 nodes, 8 vCPUs (c5.2xlarge), gp2 SSD. Cost: $54
  2. Cluster: 9 nodes, 8 vCPUs (c5.2xlarge), gp2 SSD. Cost: $69
  3. Cluster: 5 nodes, 8 vCPUs (c5.2xlarge), st1 HDD. Cost: $93
  4. Cluster: 5 nodes, 16 vCPUs (c5.4xlarge), gp2 SSD. Cost: $98
  5. Cluster: 7 nodes, 16 vCPUs (c5.4xlarge), gp2 SSD. Cost: $107
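
To make the ranking metric concrete, here is a minimal Python sketch of how a "cost per 1000 msg/s per month" figure could be computed. It assumes the metric is simply the monthly cluster cost divided by the sustained throughput expressed in thousands of messages per second; the function name and the input numbers are hypothetical and are not taken from the case study.

```python
def cost_per_1000_msgs(monthly_cost_usd: float, sustained_msgs_per_sec: float) -> float:
    """Monthly cost divided by sustained throughput in units of 1000 msg/s.

    Assumption: this is how the ranking metric is derived; the posts
    themselves define the exact method.
    """
    if sustained_msgs_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return monthly_cost_usd / (sustained_msgs_per_sec / 1000.0)


# Hypothetical inputs for illustration only, not figures from the benchmarks.
example_cost = cost_per_1000_msgs(monthly_cost_usd=1500.0, sustained_msgs_per_sec=30000.0)
print(example_cost)
```

A metric like this rewards clusters that sustain more throughput for the same spend, which is why a larger cluster can still rank above a smaller one.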

There are more tests to run to ensure these clusters can handle adverse conditions, such as brokers failing and large backlogs accumulating during outages or system slowdowns.

All quorum queues are declared with the following properties:

  • x-quorum-initial-group-size=3
  • x-max-in-memory-length=0

The x-max-in-memory-length property forces the quorum queue to remove message bodies from memory as soon as it is safe to do so. You can set it to a higher limit; 0 is the most aggressive setting, designed to avoid large memory growth at the cost of more disk reads when consumers do not keep up. Without this property, message bodies are kept in memory at all times, which can drive memory growth to the point of triggering memory alarms. That severely impacts the publish rate, something we want to avoid in this workload case study.
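
As a rough illustration, the two properties above could be assembled into a queue arguments table like this. It is a sketch rather than the code used in the benchmarks: the helper name is hypothetical, and it assumes the queue is also declared with `x-queue-type` set to `quorum`, which the excerpt does not show.

```python
def quorum_queue_arguments(initial_group_size: int = 3, max_in_memory_length: int = 0) -> dict:
    """Build the optional arguments for declaring a quorum queue.

    Hypothetical helper; defaults mirror the values listed in the post.
    """
    if initial_group_size < 1:
        raise ValueError("initial group size must be at least 1")
    if max_in_memory_length < 0:
        raise ValueError("in-memory length limit cannot be negative")
    return {
        # Assumption: the queue type is selected via this argument.
        "x-queue-type": "quorum",
        # Number of replicas created when the queue is first declared.
        "x-quorum-initial-group-size": initial_group_size,
        # 0 means message bodies are dropped from memory as early as is safe.
        "x-max-in-memory-length": max_in_memory_length,
    }


arguments = quorum_queue_arguments()
print(arguments)
```

With a client library such as pika, a dictionary like this would typically be passed as the `arguments` parameter of `queue_declare`, together with `durable=True`.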

Read More...

Cluster Sizing Case Study – Quorum Queues Part 1

June 21, 2020 by Jack Vanlightly

In the first post in this sizing series we covered the workload, the tests, and the cluster and storage volume configurations on AWS EC2. In this post we’ll run a sizing analysis with quorum queues. We also ran a sizing analysis on mirrored queues.

In this post we’ll run the increasing intensity tests that will measure our candidate cluster sizes at varying publish rates, under ideal conditions. In the next post we’ll run resiliency tests that measure whether our clusters can handle our target peak load under adverse conditions.

All quorum queues are declared with the following properties:

  • x-quorum-initial-group-size=3 (replication factor)
  • x-max-in-memory-length=0

The x-max-in-memory-length property forces the quorum queue to remove message bodies from memory as soon as it is safe to do so. You can set it to a higher limit; 0 is the most aggressive setting, designed to avoid large memory growth at the cost of more disk reads when consumers do not keep up. Without this property, message bodies are kept in memory at all times, which can drive memory growth to the point of triggering memory alarms. That severely impacts the publish rate, something we want to avoid in this workload case study.

Read More...

Cluster Sizing Case Study – Mirrored Queues Part 2

June 20, 2020 by Jack Vanlightly

In the last post we started a sizing analysis of our workload using mirrored queues. We focused on the happy scenario in which consumers are keeping up, meaning that there are no queue backlogs and all brokers in the cluster are operating normally. By running a series of benchmarks modelling our workload at different intensities we identified the top 5 cluster size and storage volume combinations in terms of cost per 1000 msg/s per month.

  1. Cluster: 5 nodes, 8 vCPUs, gp2 SSD. Cost: $58
  2. Cluster: 7 nodes, 8 vCPUs, gp2 SSD. Cost: $81
  3. Cluster: 5 nodes, 8 vCPUs, st1 HDD. Cost: $93
  4. Cluster: 5 nodes, 16 vCPUs, gp2 SSD. Cost: $98
  5. Cluster: 9 nodes, 8 vCPUs, gp2 SSD. Cost: $104

There are more tests to run to ensure these clusters can handle adverse conditions, such as brokers failing and large backlogs accumulating during outages or system slowdowns.

Read More...

Cluster Sizing Case Study - Mirrored Queues Part 1

June 19, 2020 by Jack Vanlightly

In the first post in this sizing series we covered the workload, cluster and storage volume configurations on AWS EC2. In this post we’ll run a sizing analysis with mirrored queues.

The first phase of our sizing analysis will be assessing what intensities each of our clusters and storage volumes can handle easily and which are too much.

All tests use the following policy:

  • ha-mode: exactly
  • ha-params: 2
  • ha-sync-mode: manual
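
For readers who have not set a mirroring policy before, the three keys above form the policy definition, which is usually supplied as JSON. The sketch below only builds that JSON document; the helper name is hypothetical, and the policy name and queue pattern you would pair it with (for example via `rabbitmqctl set_policy`) are not given in the post.

```python
import json


def mirroring_policy_definition(replicas: int = 2, sync_mode: str = "manual") -> dict:
    """Build a classic mirrored queue policy definition.

    Hypothetical helper; defaults mirror the policy listed in the post.
    """
    if replicas < 1:
        raise ValueError("replica count must be at least 1")
    if sync_mode not in ("manual", "automatic"):
        raise ValueError("sync mode must be 'manual' or 'automatic'")
    return {
        "ha-mode": "exactly",       # keep a fixed number of replicas
        "ha-params": replicas,      # that number: the leader plus mirrors
        "ha-sync-mode": sync_mode,  # new mirrors are not synchronised automatically
    }


definition_json = json.dumps(mirroring_policy_definition())
print(definition_json)
```
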
Read More...

Cluster Sizing and Other Considerations

June 18, 2020 by Jack Vanlightly

This is the start of a short series where we look at sizing your RabbitMQ clusters. The actual sizing wholly depends on your hardware and workload, so rather than tell you how many CPUs and how much RAM you should provision, we’ll create some general guidelines and use a case study to show what things you should consider.

Read More...

How to Run Benchmarks

June 4, 2020 by Jack Vanlightly

There can be many reasons to do benchmarking:

  • Sizing and capacity planning
  • Product assessment (can RabbitMQ handle my load?)
  • Discover best configuration for your workload

In this post we’ll take a look at the various options for running RabbitMQ benchmarks. But before we do, you’ll need a way to see the results and look at system metrics.

Read More...

This Month in RabbitMQ, April 2020 Recap

June 1, 2020 by Michael Klishin

A Webinar on Quorum Queues

Before we start with RabbitMQ project and community updates from April, we have a webinar to announce! Jack Vanlightly, a RabbitMQ core team member, will present on High Availability and Data Safety in Messaging on June 11th, 2020.

In this webinar, Jack Vanlightly will explain quorum queues, a new replicated queue type in RabbitMQ. Quorum queues were introduced in RabbitMQ 3.8 with a focus on data safety and efficient, predictable recovery from node failures. Jack will cover and contrast the design of quorum and classic mirrored queues.

After this webinar, you’ll understand:

  • Why quorum queues offer better data safety than mirrored queues
  • How and why server resource usage changes when switching to quorum queues from mirrored queues
  • Some best practices when using quorum queues
Read More...

Quorum Queues and Flow Control - Stress Tests

May 15, 2020 by Jack Vanlightly

In the last post we ran some simple benchmarks on a single queue to see what effect pipelining publisher confirms and consumer acknowledgements had on flow control. 

Specifically we looked at:

  • Publishers: Restricting the number of in-flight messages (messages sent but pending a confirm).
  • Consumers: Prefetch (the number of in-flight messages the broker will allow on the channel)
  • Consumers: Ack Interval (multiple flag usage)

Unsurprisingly, we saw when we restricted publishers and the brokers to a small number of in-flight messages at a time, that throughput was low. When we increased that limit, throughput increased, but only to a point, after which we saw no more throughput gains but instead just latency increases. We also saw that allowing consumers to use the multiple flag was beneficial to throughput.
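
The idea of capping in-flight messages and acknowledging several at once can be modelled without a broker. The following is a simplified sketch, not RabbitMQ client code: `InFlightWindow` is a hypothetical class that tracks unconfirmed sequence numbers, refuses to publish past a limit, and treats a confirm with `multiple=True` as covering every sequence number up to and including the one given.

```python
class InFlightWindow:
    """Tracks published-but-unconfirmed messages against a fixed limit."""

    def __init__(self, limit: int):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self._next_seq = 1
        self._unconfirmed = set()

    @property
    def in_flight(self) -> int:
        return len(self._unconfirmed)

    def can_publish(self) -> bool:
        return self.in_flight < self.limit

    def publish(self) -> int:
        """Record a publish and return its sequence number."""
        if not self.can_publish():
            raise RuntimeError("in-flight limit reached; wait for confirms")
        seq = self._next_seq
        self._next_seq += 1
        self._unconfirmed.add(seq)
        return seq

    def confirm(self, seq: int, multiple: bool = False) -> None:
        """Handle a confirm; multiple=True covers everything up to seq."""
        if multiple:
            self._unconfirmed = {s for s in self._unconfirmed if s > seq}
        else:
            self._unconfirmed.discard(seq)


window = InFlightWindow(limit=3)
for _ in range(3):
    window.publish()
blocked = not window.can_publish()   # the publisher must now wait
window.confirm(2, multiple=True)     # one confirm releases messages 1 and 2
print(blocked, window.in_flight, window.can_publish())
```

A small limit keeps the publisher idle while it waits for confirms, which matches the low throughput described above; a single confirm with the multiple flag frees several slots at once, which is why it helps throughput.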

In this post we’re going to look at those same three settings, but with many clients, many queues and different amounts of load, including stress tests. We’ll see that publisher confirms and consumer acknowledgements play a role in flow control to help prevent overload of a broker. 

Read More...

Quorum Queues and Flow Control - Single Queue Benchmarks

May 14, 2020 by Jack Vanlightly

In the last post we covered what flow control is, both as a general concept and the various flow control mechanisms available in RabbitMQ. We saw that publisher confirms and consumer acknowledgements are not just data safety measures, but also play a role in flow control. 

In this post we’re going to look at how application developers can use publisher confirms and consumer acknowledgements to get a balance of safety and high performance, in the context of a single queue. 

Flow control becomes especially important when a broker is being overloaded. A single queue is unlikely to overload your broker. If you send large messages then sure, you can saturate your network, or if you only have a single CPU core, then one queue could max it out. But most of us are on 8, 16 or 30+ core machines. Still, it’s interesting to break down the effects of confirms and acks on a single queue. From there we can take our learnings and see if they apply to larger deployments (the next post).

Read More...

Quorum Queues and Flow Control - The Concepts

May 4, 2020 by Jack Vanlightly

As part of our quorum queue series we’re taking a look at flow control, how it protects RabbitMQ from being overloaded and how that relates to quorum queues.

What is Flow Control?

Flow control is a concept that has been in computer networking and networked software for decades. Essentially it is a mechanism for applying back pressure to senders to avoid overloading receivers. Receivers typically buffer incoming packets/messages as a way of dealing with a send rate that exceeds their processing rate. But receiver buffers cannot grow forever, so either the send rate should only transiently exceed receiver processing capacity (bursty traffic) or the sender must be slowed down (back pressure).

Flow control is a way of applying this back pressure on the sender, slowing them down so that the receiver’s buffers do not overflow and latencies do not grow too large. In a chain of sender/receivers, this back pressure can propagate up the chain to the origin of the traffic. In more complex graphs of connected components, flow control can balance incoming traffic between fast and slow senders, avoiding overload but allowing the system to reach full utilisation despite different numbers of senders, different rates and different load patterns (steady or bursty).
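
A bounded buffer is the simplest way to see back pressure in action. The sketch below uses Python's standard `queue.Queue` with a maximum size as a stand-in for a receiver's buffer; it is a generic illustration of the concept and not how RabbitMQ implements flow control. The helper name `try_send` is hypothetical.

```python
import queue


def try_send(buffer: queue.Queue, message) -> bool:
    """Attempt to hand a message to the receiver; False signals back pressure."""
    try:
        buffer.put_nowait(message)
        return True
    except queue.Full:
        return False


# The receiver can hold at most two unprocessed messages.
receiver_buffer = queue.Queue(maxsize=2)

# A burst of three sends: the third is pushed back because the buffer is full.
accepted = [try_send(receiver_buffer, n) for n in range(3)]

# Once the receiver processes one message, the sender may proceed again.
receiver_buffer.get_nowait()
accepted_after_drain = try_send(receiver_buffer, 3)

print(accepted, accepted_after_drain)
```

In a real system the sender would block or slow down rather than drop the rejected message, but the signal is the same: a full buffer tells the sender to wait.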

Read More...

Quorum queues and why disks matter

April 21, 2020 by Jack Vanlightly

Quorum queues are still relatively new to RabbitMQ and many people have still not made the jump from classic mirrored queues. Before you migrate to this new queue type you need to make sure that your hardware can support your workload and a big factor in that is what storage drives you use.

In this blog post we’re going to take a closer look at quorum queues and their performance characteristics on different storage configurations.

HDD or SSD? One drive or multiple drives?

The TL;DR is that we highly recommend SSDs when using quorum queues. The reason for this is that quorum queues are sensitive to IO latency and SSDs deliver lower latency IO than HDDs. With higher IO latency, you’ll see lower throughput, higher end-to-end latency and some other undesirable effects.

Further down in this post we’ll demonstrate why we recommend this, using various benchmarks with different SSD and HDD configurations.

Read More...

RabbitMQ Gets an HA Upgrade

April 20, 2020 by Jack Vanlightly

This is the first part of a series on quorum queues, our new replicated queue type. We’ll be covering everything from what quorum queues are, to hardware requirements, migration from mirrored queues and best practices.

Introducing Quorum Queues

Mirrored queues, also known as HA queues, have been the de facto option for years when requiring extra data safety guarantees for your messages. Quorum queues are the next generation of replicated queue that aim to replace most use cases for mirrored queues and are available from the 3.8 release onward.

In this blog series we’re going to cover the following:

Read More...

This Month in RabbitMQ: March 2020 Recap

April 13, 2020 by Michael Klishin

Due to the uncertainties of the COVID-19 virus, the RabbitMQ Summit team is canceling the Berlin Summit in June 2020. We do still hope that we can proceed with the plans for a summit in November in New York. Check back for updates.

Among other contributions this month, we have resources on using RabbitMQ successfully in a microservices architecture, why you should use messaging in your project with Rabbit and SpringBoot, and many other tips and tricks. So dive in, the water’s fine! And please stay safe, everyone.

Read More...

This Month in RabbitMQ, February 2020 Recap

March 10, 2020 by Michael Klishin

RabbitMQ Summit is coming again! This time, the gathering will be in Berlin on June 9 and the call for proposals (to speak at the event) is open until March 22.

Mark your calendars, brush up on your Deutsch, and buy your tickets for the next chance to immerse yourself in all things RabbitMQ. I’m sure there will be at least a couple of RabbitMQ influencers there, too :)

Read More...

This Month in RabbitMQ, January 2020 Recap

February 12, 2020 by Michael Klishin

Introducing TGI RabbitMQ! Inspired by TGI Kubernetes, RabbitMQ engineer Gerhard Lazu has begun a series of tutorial videos. Tune in at the end of each month for the latest release. In January, Gerhard covered upgrading from 3.7 to 3.8. Star and watch the repository for future episode updates.

Also, be sure to check out the dashboards we’ve published to Grafana. These are a great way to get started with the new Prometheus and Grafana support in 3.8.

Read More...

This Month in RabbitMQ, December 2019 Recap

January 9, 2020 by Michael Klishin

Happy new year! 3.8.x has been available for over three months now and we’re seeing a lot of great uptake. This is good news, since the upgrade process is even easier with the addition of feature flags. Keep up the upgrading!

Over at the CloudAMQP blog, you’ll now find video transcripts of all the RabbitMQ Summit talks. Those are useful if you didn’t make it to the event and want to know what’s in the talk before watching the full 30-minute replay.

Take a look at Observe and Understand RabbitMQ, for example.

We also published a new case study about LAIKA, the animation company that brought you Coraline, The BoxTrolls, and Missing Link. If you are interested in having your use case for RabbitMQ profiled on rabbitmq.com, drop a note in the mailing list or email info@rabbitmq.com.

Read More...