
RabbitMQ 4.0: New Quorum Queue Features

12 min read

RabbitMQ 4.0 (currently in beta) includes new quorum queue features:

  • message priorities
  • consumer priorities combined with Single Active Consumer
  • default delivery limit is now 20 (breaking change!)
  • faster recovery of long queues

Message Priorities

Support for message priorities has probably been the most requested quorum queue feature, mostly by existing classic mirrored queue users who want to migrate to quorum queues (remember, support for classic queue mirroring was removed in 4.0).

However, the way priorities are supported is significantly different from how classic queues handle them. Classic queues require the x-max-priority argument to define the maximum number of priorities for a given queue (if this argument is not provided, the queue treats all messages equally). Values up to 255 are technically allowed, although no more than a handful of priorities per queue should really be used. Quorum queues don't require any upfront declaration (there is no need to enable priorities for a given queue), but there are exactly two priorities per queue: normal and high. The behavior matches the AMQP 1.0 specification (see section 3.2.1):

  • a priority value between 0 and 4 (inclusive) is treated as the normal priority
  • any value above 4 is considered a high priority
  • if the publisher doesn't specify the priority of a message, the value of 4 is assumed (normal priority)
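
Here's a minimal sketch of publishing both priority levels to a quorum queue. It assumes a local broker and the pika Python client (not what this post's test uses - that's omq, shown below); the queue name qq is just an example.

# A sketch using the pika (AMQP 0.9.1) Python client; connection details
# and the queue name are illustrative.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Declare a quorum queue; unlike classic queues, no x-max-priority argument is needed.
channel.queue_declare(queue="qq", durable=True,
                      arguments={"x-queue-type": "quorum"})

# Priority 4 or lower (or no priority at all) means normal; 5 and above means high.
channel.basic_publish(exchange="", routing_key="qq", body=b"normal priority",
                      properties=pika.BasicProperties(priority=1))
channel.basic_publish(exchange="", routing_key="qq", body=b"high priority",
                      properties=pika.BasicProperties(priority=10))

connection.close()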

If a quorum queue contains both normal and high priority messages, consumers will receive a mix of both, with a ratio of 2 high priority messages for every 1 normal priority message. This approach avoids starvation: regardless of the number of high priority messages, progress is made on the normal priority messages as well. This is in contrast with the classic queue implementation, which always delivers higher priority messages first, if there are any, so normal priority messages may never get delivered (or, more likely, their delivery latency will be very high).

Here's a visual representation of how this works. In preparation for this test, we first published 100k normal priority messages and then 100k high priority messages. Since quorum queues were not priority-aware before 4.0, if we did that in an older version and then started a consumer, it would simply receive the normal priority messages first (as they are older) and then all the high priority messages. With 4.0, we can see that the consumer immediately starts receiving a mix of about 1500 normal priority messages per second and twice as many high priority messages, for a total of ~4500 messages per second (the actual delivery rates are not important here, as they depend on many factors; the 2:1 high/normal ratio is what matters in the context of priorities). Once the queue delivers all the high priority messages, the consumer starts receiving ~4500 normal priority messages per second - as many as it can handle in this test scenario. The blue dotted line (with the axis scale on the right) is the number of ready messages in the queue (total for both priorities) - we can see it starts at 200k and ultimately drops to zero.

Consumer receives a mix of high (yellow) and normal priority (green) messages

Let's consider the opposite scenario - what if we publish all the high priority messages first and only then all the normal priority messages? In this case, the consumer will receive the messages in order of publishing. There's simply no reason for a normal priority message to overtake a higher priority message.

Normal priority messages (green) don't overtake high priority (yellow) messages published first

How Was This Test Performed?

For this test we used omq, a testing client for AMQP 1.0, MQTT and STOMP. The quorum queue behaviour doesn't depend on the protocol used - AMQP 1.0 was simply used because omq emits message consumption metrics by message priority.

# declare a quorum queue (you can use the Management UI or any other method)
rabbitmqadmin declare queue name=qq queue_type=quorum
# publish normal priority messages (10 publishers, 10k messages each)
omq amqp --publishers 10 --consumers 0 --publish-to /queues/qq --message-priority 1 --pmessages 10000
# publish high priority messages
omq amqp --publishers 10 --consumers 0 --publish-to /queues/qq --message-priority 10 --pmessages 10000
# consume all messages from the queue
omq amqp --publishers 0 --consumers 1 --consume-from /queues/qq --consumer-credits 100

For the second scenario, just run the publishing commands in the reverse order.

What If I Need More Control?

If two priorities with a 2:1 delivery ratio don't meet your requirements, we can recommend two things:

  1. Reconsider your requirements. 😄 Reasoning about the message delivery order with many priorities is really hard. It's probably easier to make sure that all your messages are delivered sufficiently quickly and use the priorities just to make sure that in case of an occasional long backlog, important messages can skip the queue.
  2. If you really need more priorities and/or more control over how different priorities are handled, using multiple queues is your best bet. You can develop a consumer that subscribes to multiple queues and then decides which queue to consume from; see the sketch after this list.
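
As a sketch of the second option, a consumer could check the more important queue first and fall back to the other one. The example below is hypothetical and assumes the pika Python client and two pre-declared quorum queues named qq-high and qq-normal (these names are not part of this post's test setup).

# A hypothetical multi-queue consumer sketch; queue names are illustrative.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

def get_next_message():
    # Check the high priority queue first, then fall back to the normal one.
    for queue in ("qq-high", "qq-normal"):
        method, properties, body = channel.basic_get(queue=queue)
        if method is not None:
            return queue, method, body
    return None, None, None

queue, method, body = get_next_message()
if method is not None:
    # ... process the message, then acknowledge it ...
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection.close()

Polling with basic_get keeps the sketch short; a real consumer would more likely use long-lived subscriptions and its own logic for balancing the queues.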

Consumer Priorities Combined with Single Active Consumer

Starting with RabbitMQ 4.0, quorum queues will take the consumer priority into account when selecting the single active consumer. If a higher priority consumer becomes available (subscribes), the quorum queue will switch over to it. This is particularly useful if you have multiple queues that should each have a single consumer, but you don't want a single instance of your application to be the consumer for all of them, which is likely to happen when the first application instance to start subscribes to all those single-active-consumer queues. Now you can pick a different priority when subscribing to different queues, to make sure that each instance consumes only from its "favorite" queue and only serves as a backup consumer for the other queues.

To better explain this functionality, let's review all the moving parts. Single Active Consumer is a queue property (set via a queue argument) that prevents a queue from delivering messages to more than one consumer, regardless of how many are subscribed to the queue. One consumer is active; all other consumers are not. If the active consumer disconnects, one of the other consumers is activated. This feature is used when a strict message processing order needs to be maintained.

Consumer Priority allows you to specify that rather than delivering messages to all subscribed consumers in a fair round-robin fashion (which is the default behavior of both classic and quorum queues), a certain consumer should be preferred.

Until version 4.0, these features were effectively mutually exclusive - if Single Active Consumer was enabled, a new consumer would never become active, regardless of its priority, as long as the previous consumer remained active. Starting with 4.0, if the new consumer's priority is higher than the currently active consumer's, the quorum queue will switch over to the higher priority consumer: it will stop delivering messages to the current consumer, wait for all outstanding messages to be acknowledged, deactivate the old consumer, and activate the higher priority one instead.
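
For reference, here's a minimal sketch of combining the two features from an application, assuming the pika Python client; the queue name qq-sac matches the perf-test example further below, everything else is illustrative.

# A sketch: a Single Active Consumer queue plus a consumer priority.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Single Active Consumer is enabled with a queue argument at declaration time.
channel.queue_declare(queue="qq-sac", durable=True,
                      arguments={"x-queue-type": "quorum",
                                 "x-single-active-consumer": True})

def handle(ch, method, properties, body):
    ch.basic_ack(delivery_tag=method.delivery_tag)

# The consumer priority is a consumer argument; with x-priority=10 this consumer
# takes over from an active consumer that has a lower priority.
channel.basic_qos(prefetch_count=1000)
channel.basic_consume(queue="qq-sac", on_message_callback=handle,
                      arguments={"x-priority": 10})
channel.start_consuming()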

The graph below shows this behavior. There are three metrics on this graph:

  • the green line shows the number of messages consumed by the first (default priority) consumer (which happens to be configured to consume 10 msgs/s)
  • yellow, shows the same value but for the second, higher priority consumer
  • blue, shows the number of unacknowledged messages (axis scale on the right)

Single Active Consumer switchover: the normal-priority consumer (green) gets deactivated after it has acknowledged its messages, then the higher-priority consumer (yellow) gets activated

Initially, we only have one consumer and, as expected, it consumes 9-10 msgs/s (the jumps between 9 and 10 are simply a result of how the metrics are emitted and then displayed). This consumer is configured with a prefetch of 1000 messages, and since there were many messages in the queue, the prefetch buffer is used to the maximum. Then the yellow line appears, initially at 0 msgs/s. This is the higher priority consumer, which is already connected but not yet active. From the moment it connected, we can see the number of unacknowledged messages going down, since the queue no longer delivers messages to the original consumer. Once all of them are acknowledged, the new consumer becomes the single active consumer and receives 1000 messages, since that's its prefetch value. It then happily consumes around 10 msgs/s, as configured.

How Was This Test Performed?

For this test we used perf-test, a testing client for AMQP 0.9.1.

# Publish 5000 messages to have a backlog (perf-test will declare a quorum queue `qq-sac`)
perf-test --quorum-queue --queue qq-sac --pmessages 5000 --confirm 100 -qa x-single-active-consumer=true --consumers 0
# Start a consumer with the default priority and prefetch of 1000; consume ~10 msgs/s
perf-test --producers 0 --predeclared --queue qq-sac --consumer-latency 100000 --qos 1000
# In another window, some time after starting the first consumer, start a higher priority consumer
perf-test --producers 0 --predeclared --queue qq-sac --consumer-latency 100000 --qos 1000 --consumer-args x-priority=10

After some time you should see that the first consumer stopped receiving messages (no more output from perf-test), while the second one receives messages.

note

The settings used in this example were chosen to highlight the switchover process and are not great for real world scenarios. If a consumer can only process 10 msgs/s, there's usually no reason to configure the prefetch value as high as 1000.

Delivery Limit is Now 20 by Default

warning

This can be a breaking change for some applications

Quorum queues now have the delivery limit set to 20 by default. In the past, the limit wasn't set by default, and therefore quorum queues would attempt redelivery forever, until the message was either acknowledged or discarded by a consumer. This could lead to a situation where a message that can never be successfully processed gets redelivered over and over, effectively stuck in the queue.

The downside of this change is that if dead lettering is not configured, messages will be dropped after 20 unsuccessful delivery attempts. Therefore, it is highly recommended to configure dead lettering for all quorum queues.
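
Here's a minimal sketch of such a setup, assuming the pika Python client; the exchange and queue names (dlx, qq-dead-letters, qq) are illustrative.

# A sketch: a quorum queue that dead-letters messages instead of dropping them.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# An exchange and a queue to collect dead-lettered messages.
channel.exchange_declare(exchange="dlx", exchange_type="fanout", durable=True)
channel.queue_declare(queue="qq-dead-letters", durable=True,
                      arguments={"x-queue-type": "quorum"})
channel.queue_bind(queue="qq-dead-letters", exchange="dlx")

# The main quorum queue: route exhausted messages to dlx. The delivery limit
# can also be set explicitly if the new default of 20 is not appropriate.
channel.queue_declare(queue="qq", durable=True,
                      arguments={"x-queue-type": "quorum",
                                 "x-dead-letter-exchange": "dlx",
                                 "x-delivery-limit": 20})

connection.close()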

Faster Recovery of Long Queues

This is less of a feature and more of an internal change, but it is certainly worth mentioning. Up until now, if a RabbitMQ node was restarted, all quorum queues on that node had to read through all of their data (the Raft log) since the last snapshot to rebuild their in-memory state. For example, if you publish a few million messages to a quorum queue right now and then restart a node, you will see that after the node is up, the queue will report 0 ready messages for quite some time (at least a few seconds) and you won't be able to start consuming these messages. The queue is simply not yet ready to serve traffic - it's still reading the data from disk (note: this doesn't mean that all that data is then kept in memory; the vast majority of it is not, but an index / summary of the queue data is). Starting with RabbitMQ 4.0, quorum queues create checkpoint files which capture the state of the queue at a certain point in time. Upon startup, the queue can read the most recent checkpoint and only the part of the Raft log written after that point. This means that quorum queues take significantly less time to start.

For example, a RabbitMQ node with one quorum queue containing 10 million 12-byte messages takes about 30 seconds to start on my machine. With RabbitMQ 4.0, it takes just a fraction of a second.

You may wonder what the difference is between a snapshot and a checkpoint. In many ways, they are the same - they actually share the code that writes them to disk. The difference is that a snapshot is only created when the Raft log is truncated. For many common queue use cases, this is all that is needed - older messages are consumed, we create a snapshot that no longer contains them, and we truncate the log. At this point the queue has no memory of those messages ever being present. Checkpoints, on the other hand, are created periodically when we can't truncate the log. The test scenario above is a good example - since we didn't consume any messages, the oldest messages are still there and we can't just forget about them. But a checkpoint still allows the queue to start more quickly. A checkpoint can be promoted to a snapshot when the log is truncated (in this example - after some of the older messages are consumed).

How Can I Try This?

Once again, we'll use perf-test to declare the queue and publish messages:

# Publish 10 million 12-byte messages (feel free to play with other values)
perf-test --quorum-queue --queue qq --consumers 0 --pmessages 5000000 --confirm 1000 --producers 2
# restart the node
rabbitmqctl stop_app && rabbitmqctl start_app
# list the queues (repeat this command until the number of messages is 10 million instead of 0)
rabbitmqctl list_queues

Summary

RabbitMQ 4.0 is a significant milestone for RabbitMQ. With the removal of classic queue mirroring, quorum queues become the only option for highly available, replicated queues (note: streams are also highly available and replicated, but technically not queues; nevertheless, they might still be a good choice for some use cases where classic mirrored queues were used in the past). Quorum queues have offered higher data safety guarantees and much better performance than mirrored queues for years, and with these latest improvements, they become even more robust and performant in a wider range of scenarios.

You can play with RabbitMQ 4.0 beta now: https://github.com/rabbitmq/rabbitmq-server/releases/tag/v4.0.0-beta.5