If you want to be notified when your RabbitMQ deployments have a problem, you can now set up the RabbitMQ monitoring and alerting that we have made available in the RabbitMQ Cluster Operator repository. Rather than asking you to follow a series of steps for setting up RabbitMQ monitoring & alerting, we have combined them into a single command. This is a Kubernetes-specific quick-start. You can use these Prometheus alerts outside of Kubernetes, but the setup will require more consideration and effort on your part. We share the quick & easy approach, open source and free for all.
When everything is set up and there is a problem with RabbitMQ, this is an example of a notification that you can expect:
The above is a good example of a problem that may not be obvious when it happens, and takes a few steps to troubleshoot. Instead of messages being lost silently due to a misconfiguration, this notification makes it clear when incoming messages are not routed within RabbitMQ.
You will need the following:
kubectl pointing to your Kubernetes deployment and matching the Kubernetes server version
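One way to check the version match before continuing is to compare the client and server minor versions. The sketch below assumes the usual Kubernetes skew policy of at most one minor version between kubectl and the API server. The function name `minor_skew_ok` is made up for this example.

```shell
# Hypothetical helper: succeeds when the kubectl (client) and Kubernetes
# (server) minor versions differ by at most one.
minor_skew_ok() {
  # Keep only the leading digits, so "21+" is treated as 21.
  skew_client="${1%%[!0-9]*}"
  skew_server="${2%%[!0-9]*}"
  skew_diff=$((skew_client - skew_server))
  # Drop a leading minus sign to get the absolute difference.
  [ "${skew_diff#-}" -le 1 ]
}

# Example with literal minor versions; the real values are reported
# by `kubectl version`.
if minor_skew_ok 21 22; then
  echo "kubectl is within one minor version of the server"
fi
```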
Now you are ready to run the following in your terminal:
    git clone https://github.com/rabbitmq/cluster-operator.git

    # Optionally, set the name of the Slack channel and the Slack Webhook URL
    # If you don't have a Slack Webhook URL, create one via https://api.slack.com/messaging/webhooks
    # export SLACK_CHANNEL='#my-channel'
    # export SLACK_API_URL='https://hooks.slack.com/services/paste/your/token'

    ./cluster-operator/observability/quickstart.sh
The last command takes about 5 minutes, and it sets up the entire RabbitMQ on Kubernetes stack:
RabbitMQ Cluster Operator, which defines RabbitmqCluster as a custom resource definition (CRD) and manages all RabbitMQ clusters in Kubernetes
ServiceMonitor custom resource definitions
To trigger an alert, we need a RabbitMQ cluster. This is the easiest way to create one:
    # Add kubectl-rabbitmq plugin to PATH so that it can be used directly
    export PATH="$PWD/cluster-operator/bin:$PATH"

    # Use kubectl-rabbitmq plugin to create RabbitmqClusters via kubectl
    kubectl rabbitmq create myrabbit --replicas 3
To trigger the NoMajorityOfNodesReady alert, we stop the rabbit application on two out of three nodes:
    kubectl exec myrabbit-server-0 --container rabbitmq -- rabbitmqctl stop_app
    kubectl exec myrabbit-server-1 --container rabbitmq -- rabbitmqctl stop_app
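The "two out of three" figure generalises. A cluster of N nodes keeps a majority only while more than half of its nodes are ready, so stopping N minus N/2 nodes (integer division) is enough to lose it. The sketch below does that arithmetic and prints the matching commands rather than running them. It assumes pods are named `<cluster>-server-<index>`, as in the commands above, and the function names are made up for this example.

```shell
# Hypothetical helper: smallest number of nodes to stop so that the
# remaining ready nodes no longer form a strict majority of the cluster.
nodes_to_stop() {
  echo $(( $1 - $1 / 2 ))
}

# Hypothetical helper: print (not run) one stop_app command per node,
# starting at pod index 0.
print_stop_commands() {
  psc_cluster="$1"
  psc_count="$(nodes_to_stop "$2")"
  psc_i=0
  while [ "$psc_i" -lt "$psc_count" ]; do
    echo "kubectl exec ${psc_cluster}-server-${psc_i} --container rabbitmq -- rabbitmqctl stop_app"
    psc_i=$((psc_i + 1))
  done
}

print_stop_commands myrabbit 3
```

For a three-node cluster this prints the same two commands as above. Printing instead of executing leaves room to review the commands before pasting them into a terminal.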
Within 2 minutes, two out of three RabbitMQ nodes will be shown as not ready:
    kubectl rabbitmq get myrabbit
    NAME                      READY   STATUS    RESTARTS   AGE
    - pod/myrabbit-server-0   1/1     Running   0          70s
    + pod/myrabbit-server-0   0/1     Running   0          3m
    - pod/myrabbit-server-1   1/1     Running   0          70s
    + pod/myrabbit-server-1   0/1     Running   0          3m
      pod/myrabbit-server-2   1/1     Running   0          3m
The pods are still Running because the rabbitmqctl stop_app command leaves the Erlang VM system process running.
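Because the STATUS column stays at Running, the READY column is the one to watch. Here is a small sketch that counts not-ready pods from output shaped like the listing above. It assumes pod lines start with `pod/` and that the second column is READY in `ready/total` form. `count_not_ready` is a made-up name.

```shell
# Hypothetical helper: read pod lines on stdin and print how many pods
# have fewer ready containers than total containers.
count_not_ready() {
  awk '
    $1 ~ /^pod\// {
      split($2, ready, "/")
      if (ready[1] + 0 < ready[2] + 0) not_ready++
    }
    END { print not_ready + 0 }
  '
}

# Sample input modelled on the listing above (state after stop_app).
count_not_ready <<'EOF'
NAME                    READY   STATUS    RESTARTS   AGE
pod/myrabbit-server-0   0/1     Running   0          3m
pod/myrabbit-server-1   0/1     Running   0          3m
pod/myrabbit-server-2   1/1     Running   0          3m
EOF
```

Against a live cluster, the same function could be fed with `kubectl rabbitmq get myrabbit | count_not_ready`.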
To see the NoMajorityOfNodesReady alert triggered in Prometheus, we open the Prometheus UI in our browser: http://localhost:9090/alerts.
For this to work, we forward local port 9090 to Prometheus port 9090 running inside Kubernetes:
kubectl -n kube-prometheus port-forward svc/prom-kube-prometheus-stack-prometheus 9090
The NoMajorityOfNodesReady alert is first orange, which means it is in a pending state.
After 5 minutes the colour changes to red and the state becomes firing.
This will send an alert to Alertmanager.
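The delay between orange and red comes from how Prometheus evaluates alerting rules. A rule whose condition has just become true is pending, and it only starts firing once the condition has held for the rule's `for` duration. The toy function below mimics that decision. The 300-second value corresponds to the 5 minutes mentioned above, and `alert_state` is a made-up name, not part of Prometheus.

```shell
# Toy model of Prometheus alert states (not Prometheus code).
# $1: "true" when the alert condition currently holds
# $2: seconds the condition has held continuously
# $3: the rule's "for" duration in seconds
alert_state() {
  if [ "$1" != "true" ]; then
    echo inactive
  elif [ "$2" -lt "$3" ]; then
    echo pending
  else
    echo firing
  fi
}

alert_state true 120 300   # condition true for 2 minutes: pending
alert_state true 300 300   # condition true for 5 minutes: firing
```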
After we port-forward, same as above, we open the Alertmanager UI: http://localhost:9093
kubectl -n kube-prometheus port-forward svc/prom-kube-prometheus-stack-alertmanager 9093
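This walkthrough relies on three port-forwards: Prometheus and Alertmanager so far, and Grafana further down. As a convenience they can be collected in one place. This sketch only prints the commands, using the namespace, service names, and ports exactly as they appear in this post. The function name `port_forward_command` is made up.

```shell
# Hypothetical helper: print the port-forward command for one component.
port_forward_command() {
  case "$1" in
    prometheus)
      echo "kubectl -n kube-prometheus port-forward svc/prom-kube-prometheus-stack-prometheus 9090" ;;
    alertmanager)
      echo "kubectl -n kube-prometheus port-forward svc/prom-kube-prometheus-stack-alertmanager 9093" ;;
    grafana)
      echo "kubectl -n kube-prometheus port-forward svc/prom-grafana 3000:80" ;;
    *)
      echo "unknown component: $1" >&2
      return 1 ;;
  esac
}

for component in prometheus alertmanager grafana; do
  port_forward_command "$component"
done
```

Each port-forward blocks while it is active, so the printed commands are meant to be run in separate terminals.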
Alertmanager groups alerts by
You see a single alert which Alertmanager forwards to your configured Slack channel:
Congratulations, you triggered your first RabbitMQ alert! To resolve the alert, start the rabbit application on both nodes:
    kubectl exec myrabbit-server-0 --container rabbitmq -- rabbitmqctl start_app
    kubectl exec myrabbit-server-1 --container rabbitmq -- rabbitmqctl start_app
The alert will transition to green in Prometheus, it will be removed from Alertmanager, and a RESOLVED notification will be sent to your Slack channel.
To see all past and current RabbitMQ alerts across all your RabbitMQ clusters, look at the RabbitMQ-Alerts Grafana dashboard: http://localhost:3000/d/jjCq5SLMk (username:
admin & password:
kubectl -n kube-prometheus port-forward svc/prom-grafana 3000:80
In the example above, we have triggered multiple alerts across multiple RabbitMQ clusters.
We have shared the simplest and most useful alerts that we could think of. Some of you have already asked us about missing alerts such as memory threshold, Erlang processes & atoms, and message redeliveries. Commercial customers have asked us for runbooks and automated alert resolution.
What are your thoughts on the current alerting rules? What alerts are you missing? Let us know via a GitHub discussion.