Rolling (in-place) Upgrade
Rolling upgrade is a popular upgrade strategy, in which nodes are upgraded one by one: each node is stopped, upgraded and then started. Upgraded nodes rejoin the cluster, which temporarily works in a mixed-version mode: some nodes run the old version, some run the new one.
While all nodes have to be restarted during the upgrade, the cluster as a whole remains available throughout the process (unless it only has one node, of course).
Rolling upgrades don't support skipping versions, except patch releases (for example, you can upgrade directly from 3.13.0 to 3.13.7, but you cannot upgrade directly from 3.12.x to 4.0). Moreover, additional constraints may apply to specific upgrades. Please refer to the version upgradability table for more information.
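To make the "no skipping, except patch releases" rule concrete, here is a minimal Python sketch of the check. It is a simplification under stated assumptions: the oldest-first list of release series is a hypothetical input you would fill in yourself from the version upgradability table, and the additional per-upgrade constraints that table may list are not modeled.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a version string such as '3.13.7' into a comparable tuple (3, 13, 7)."""
    return tuple(int(part) for part in version.split("."))


def can_roll_directly(current: str, target: str,
                      series_order: list[tuple[int, int]]) -> bool:
    """Simplified check of the 'no skipping, except patch releases' rule.

    `series_order` is a caller-supplied, oldest-first list of (major, minor)
    release series; it is an assumption, not data shipped with RabbitMQ.
    Per-upgrade constraints from the upgradability table are not modeled.
    """
    cur, tgt = parse_version(current), parse_version(target)
    cur_series, tgt_series = cur[:2], tgt[:2]
    if cur_series == tgt_series:
        # Same series: patch releases may be skipped (3.13.0 -> 3.13.7),
        # as long as this is an upgrade and not a downgrade.
        return tgt >= cur
    if cur_series not in series_order or tgt_series not in series_order:
        return False  # unknown series: consult the upgradability table
    # Otherwise the target must be in the very next release series.
    return series_order.index(tgt_series) == series_order.index(cur_series) + 1


# Hypothetical ordering used only for illustration.
SERIES = [(3, 11), (3, 12), (3, 13), (4, 0)]
print(can_roll_directly("3.13.0", "3.13.7", SERIES))   # True: patch upgrade
print(can_roll_directly("3.12.14", "4.0.1", SERIES))   # False: skips 3.13
```

A result of False means "plan an intermediate upgrade"; a result of True only means the simplified rule is satisfied, and the upgradability table remains the authority.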
Before the Upgrade
Investigate If the Current and Target Versions Have a Rolling Upgrade Path
Please refer to the version upgradability table for information about the supported upgrade paths.
Check Erlang Version Requirements
Refer to Erlang Version Requirements.
If the same Erlang version is supported by both the current and target RabbitMQ versions, you can leave Erlang as is. However, you may want to upgrade Erlang to the latest supported version at the same time: both Erlang and RabbitMQ upgrades require a node restart, so it may be more convenient to do both during the same restart.
If the target RabbitMQ version requires a newer Erlang version, you need to prepare to upgrade Erlang together with RabbitMQ.
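The decision described in the two paragraphs above can be sketched in Python as follows. The supported Erlang range of the target RabbitMQ release is an input you would copy by hand from the Erlang Version Requirements page; the version numbers in the usage lines are placeholders, not actual requirements. The sketch assumes the current Erlang version already works with the current RabbitMQ version, since the node is running on it.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """'26.2.5' -> (26, 2, 5)."""
    return tuple(int(part) for part in version.split("."))


def erlang_plan(current_erlang: str, target_min: str, target_max: str) -> str:
    """Decide what to do with Erlang when upgrading RabbitMQ.

    `target_min`/`target_max` are the oldest and newest Erlang versions the
    target RabbitMQ release supports (copied by hand; hypothetical inputs).
    A bound such as '27' covers every 27.x release.
    """
    cur = parse_version(current_erlang)
    low, high = parse_version(target_min), parse_version(target_max)
    if cur < low:
        # The target requires a newer Erlang: upgrade both together.
        return "upgrade"
    if cur[:len(high)] > high:
        # Newer than the target supports; the rules above do not cover this.
        return "unsupported"
    # Supported by both releases: keep it, or optionally move to the latest.
    return "keep"


# Placeholder bounds for illustration only, not real requirements.
print(erlang_plan("25.3.2", "26.0", "27"))  # upgrade
print(erlang_plan("26.2.5", "26.0", "27"))  # keep
```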
Carefully Read the Release Notes Up to the Selected RabbitMQ Version
The release notes may indicate specific additional upgrade steps. Always consult the release notes of all versions between the one currently deployed and the target one.
Verify All Stable Feature Flags Are Enabled
All stable feature flags should be enabled after each upgrade. Otherwise, the upgrade process is not really complete, since some of the changes are not in effect. If you follow this advice, there should be nothing to do with regard to feature flags before the upgrade, since they were all enabled after the previous upgrade.
However, since attempting an upgrade with disabled feature flags may lead to serious issues, it is good practice to check that all stable feature flags are enabled before starting the upgrade. It is safe to run the following command, which does nothing if all flags are already enabled:
rabbitmqctl enable_feature_flag all
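If you want to see which flags would be affected before running that command, the flag states can be listed with rabbitmqctl list_feature_flags. The Python sketch below assumes that listing has already been parsed into dictionaries with name and state keys (some versions also report a stability value); the exact output format differs between RabbitMQ versions, so treat the record shape as an assumption. Experimental flags are deliberately excluded, since the advice above applies to stable flags only.

```python
def disabled_stable_flags(flags: list[dict]) -> list[str]:
    """Return names of feature flags that are not enabled and not experimental.

    Assumes each record has `name` and `state` keys (state 'enabled' or
    'disabled'); a record without a `stability` key is treated as stable.
    """
    return sorted(
        flag["name"]
        for flag in flags
        if flag.get("state") != "enabled"
        and flag.get("stability", "stable") != "experimental"
    )


# Hypothetical, already-parsed sample data.
sample = [
    {"name": "flag_a", "state": "enabled", "stability": "stable"},
    {"name": "flag_b", "state": "disabled", "stability": "stable"},
    {"name": "flag_c", "state": "disabled", "stability": "experimental"},
]
print(disabled_stable_flags(sample))  # ['flag_b']
```

An empty result suggests the cluster is ready from a feature-flag point of view.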
Make Sure All Package Dependencies (including Erlang) are Available
If you are using Debian or RPM packages, you must ensure that all dependencies are available, in particular the correct version of Erlang. You may have to set up additional third-party package repositories to achieve that.
Please read recommendations for Debian-based and RPM-based distributions to find the appropriate repositories for Erlang.
Assess Cluster Health
Make sure nodes are healthy and that there are no network partitions and no disk or memory alarms in effect.
The RabbitMQ management UI, CLI tools, or HTTP API can be used to assess the health of the system.
The overview page in the management UI displays the effective RabbitMQ and Erlang versions, as well as multiple cluster-wide metrics and rates. On this page, make sure that all nodes are running and that they are all "green" with respect to file descriptors, memory, disk space, and so on.
We recommend recording the number of durable queues, the number of messages they hold, and any other relevant information about the topology. This data will help verify that the system operates within reasonable parameters after the upgrade.
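One way to record this data is to take a snapshot of the queue listing. The Python sketch below assumes the listing comes from the management HTTP API (GET /api/queues) and has already been parsed from JSON into a list of dictionaries with vhost, name, durable and, usually, messages fields; fetching it and storing the result are left to you.

```python
def summarize_queues(queues: list[dict]) -> dict:
    """Summarize a queue listing for later comparison.

    Assumes `queues` is the parsed JSON of GET /api/queues, where each entry
    has `vhost`, `name`, `durable` and (usually) `messages`. Queues without
    an integer message count are recorded as None.
    """
    durable = [q for q in queues if q.get("durable")]
    per_queue = {}
    for q in durable:
        count = q.get("messages")
        per_queue[(q["vhost"], q["name"])] = count if isinstance(count, int) else None
    return {
        "queues": len(queues),
        "durable_queues": len(durable),
        "messages_in_durable_queues": sum(
            c for c in per_queue.values() if c is not None
        ),
        "messages_by_queue": per_queue,
    }
```

Keeping per-queue counts, not just totals, makes it easier to pinpoint which queue changed if the numbers look different after the upgrade.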
Use node health checks to vet individual nodes.
Queues in flow state or blocked/blocking connections might not be a problem, depending on your workload. It's up to you to determine whether this is a normal situation or the cluster is under unexpected load, and thus to decide whether it's safe to continue with the upgrade.
However, if there are queues in an undefined state (a.k.a. NaN or "ghost" queues), you should first understand what is wrong before starting an upgrade.
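As a rough aid for spotting such queues, the Python sketch below scans a queue listing, assumed to be the parsed JSON of the management HTTP API endpoint GET /api/queues. It is a heuristic, not an official check: it assumes a ghost queue shows up with a missing, non-numeric or NaN messages value, or with a missing or unexpected state. The set of expected states is an assumption as well; "flow" is included because, as noted above, flow control can be normal under load.

```python
import math

# Assumed set of states that are not by themselves a reason to stop.
EXPECTED_STATES = {"running", "idle", "flow"}


def suspicious_queues(queues: list[dict]) -> list[tuple[str, str]]:
    """Return (vhost, name) of queues that look like 'ghost' queues.

    Heuristic only: a queue is flagged when its `messages` value is missing,
    non-numeric or NaN, or when its `state` is missing or not in
    EXPECTED_STATES.
    """
    flagged = []
    for q in queues:
        messages = q.get("messages")
        bad_count = (
            isinstance(messages, bool)
            or not isinstance(messages, (int, float))
            or (isinstance(messages, float) and math.isnan(messages))
        )
        if bad_count or q.get("state") not in EXPECTED_STATES:
            flagged.append((q.get("vhost", "?"), q.get("name", "?")))
    return flagged
```

Any queue this returns deserves a manual look before proceeding; an empty result does not prove that every queue is healthy.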
Ensure Cluster Has the Capacity for Upgrading
Please refer to changes in system resource usage for information about how the upgrade process can affect resource usage.
Perform the Upgrade
The main part of the upgrade process is performed by stopping, upgrading and starting each node one by one. The following steps should be performed for all nodes.
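The ordering of these steps, and the rule of never moving on while a node is unhealthy, can be sketched as a small Python driver. Every callable here is a hypothetical hook you would supply yourself (for example, wrappers around your service manager, package manager and rabbitmqctl); the function only encodes the one-node-at-a-time sequence and stops at the first node that does not come back healthy.

```python
from typing import Callable, Optional

NodeAction = Callable[[str], None]


def rolling_upgrade(nodes: list[str],
                    stop: NodeAction,
                    upgrade: NodeAction,
                    start: NodeAction,
                    is_healthy: Callable[[str], bool],
                    backup: Optional[NodeAction] = None) -> list[str]:
    """Upgrade nodes strictly one at a time; return the nodes upgraded.

    All callables are hypothetical caller-supplied hooks; this function
    only encodes the ordering and the stop-on-failure rule.
    """
    upgraded = []
    for node in nodes:
        stop(node)
        if backup is not None:
            backup(node)  # optional: copy the data directory while stopped
        upgrade(node)     # install the new RabbitMQ (and Erlang, if needed)
        start(node)
        if not is_healthy(node):
            # Do not touch the next node while this one has not rejoined.
            raise RuntimeError(f"{node} did not rejoin the cluster; aborting")
        upgraded.append(node)
    return upgraded
```

Raising instead of continuing keeps the cluster in a state where at most one node is in trouble, which is what preserves availability during a rolling upgrade.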
Stop the Node
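The exact way to stop a node depends on how it was started. For example, a node installed from a Debian or RPM package is usually managed by systemd and can be stopped with systemctl stop rabbitmq-server.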
Take a Backup
Optionally, while the node is stopped, you can back up its data directory.
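A minimal Python sketch of such a backup, assuming you already know the node's data directory (its location depends on the installation method and configuration, so the path is a hypothetical input). It copies the directory into a new timestamped folder and must only be run while the node is stopped, so that the files are not changing during the copy.

```python
import shutil
import time
from pathlib import Path


def backup_data_dir(data_dir: str, backup_root: str) -> Path:
    """Copy a stopped node's data directory into a timestamped folder.

    `data_dir` and `backup_root` are hypothetical, installation-specific paths.
    """
    src = Path(data_dir)
    if not src.is_dir():
        raise FileNotFoundError(src)
    dest = Path(backup_root) / f"{src.name}-{time.strftime('%Y%m%d-%H%M%S')}"
    # copytree creates missing parent folders and refuses to overwrite an
    # existing destination, so an earlier backup is never clobbered.
    shutil.copytree(src, dest)
    return dest
```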
Upgrade the Node
Install the new version of RabbitMQ and other packages if necessary.
Make sure you have an Erlang version compatible with the new RabbitMQ version.
Start the Node
Start the node and verify that it joins the cluster.
You can perform the following checks to ensure that the node started and rejoined the cluster successfully:
- run rabbitmqctl cluster_status and verify the output:
  - the upgraded node should be listed as running
  - there should be no network partitions or active alarms
- check the management UI:
  - all nodes should be listed on the main page
  - resource usage should be within acceptable limits
- check the logs:
  - there should be no errors
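The cluster_status part of these checks can be automated along the lines of the Python sketch below. It assumes the command's output has been obtained in machine-readable form (recent versions of rabbitmqctl accept a JSON formatter) and parsed into a dictionary; the key names running_nodes, partitions and alarms are assumptions to verify against the output of your RabbitMQ version.

```python
def rejoin_problems(status: dict, node: str) -> list[str]:
    """Return problems found in a parsed `rabbitmqctl cluster_status` report.

    Assumes `running_nodes`, `partitions` and `alarms` keys (verify these
    names for your version). An empty list means these checks passed.
    """
    problems = []
    if node not in status.get("running_nodes", []):
        problems.append(f"{node} is not listed as running")
    if status.get("partitions"):
        problems.append("network partitions detected")
    if status.get("alarms"):
        problems.append("active alarms present")
    return problems
```

This covers only the CLI part of the list; the management UI and log checks still need to be done separately.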
After the Upgrade
Verify that the Upgrade Has Succeeded
As you did before the upgrade, check the health and monitoring data to make sure that all cluster nodes are in good shape and that the service is running again.
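If you recorded per-queue message counts before the upgrade, a simple comparison against a fresh snapshot can highlight what needs a closer look. This Python sketch assumes both snapshots map a queue identifier to its message count (None when unavailable); the 10% tolerance is an arbitrary placeholder, and since counts naturally drift while applications keep publishing and consuming, the findings are prompts for investigation rather than proof of data loss.

```python
def compare_snapshots(before: dict, after: dict, tolerance: float = 0.1) -> list[str]:
    """Compare two {queue id: message count} snapshots taken before and after.

    `tolerance` is the relative change still treated as normal (placeholder).
    """
    findings = []
    for queue in sorted(set(before) - set(after)):
        findings.append(f"queue missing after upgrade: {queue}")
    for queue in sorted(set(before) & set(after)):
        old, new = before[queue], after[queue]
        if old is None or new is None:
            findings.append(f"{queue}: message count unavailable")
        elif old and abs(new - old) / old > tolerance:
            findings.append(f"{queue}: message count changed from {old} to {new}")
    return findings
```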