If you want to be notified when your RabbitMQ deployments have a problem, you can now set up the RabbitMQ monitoring and alerting that we have made available in the RabbitMQ Cluster Operator repository. Rather than asking you to follow a series of steps to set up RabbitMQ monitoring & alerting, we have combined everything into a single command. While this is a Kubernetes-specific quick start, you can use these Prometheus alerts outside of Kubernetes; the setup will simply require more consideration and effort on your part. Here we share the quick & easy approach, open source and free for all.
When everything is set up and there is a problem with RabbitMQ, this is an example of a notification that you can expect:
The above is a good example of a problem that may not be obvious when it happens and that takes a few steps to troubleshoot. Rather than silently losing messages to a misconfiguration, you get a notification that makes it clear when incoming messages are not being routed within RabbitMQ.
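The exact alerting rule lives in the repository linked above; as a rough illustration only, a Prometheus query along the following lines surfaces the same signal. The metric name here is an assumption based on the rabbitmq_prometheus plugin, and the query assumes the Prometheus port-forward shown later in this post is in place:
# Assumed metric name from the rabbitmq_prometheus plugin; adjust to your RabbitMQ version
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=rate(rabbitmq_channel_messages_unroutable_dropped_total[5m]) > 0'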
You will need the following:
- kubectl pointing to your Kubernetes deployment and matching the Kubernetes server version
- helm version 3
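Both prerequisites can be checked in a few seconds:
# Confirm the kubectl client version matches your Kubernetes server version
kubectl version
# Confirm Helm 3 is on the PATH
helm version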
Now you are ready to run the following in your terminal:
git clone https://github.com/rabbitmq/cluster-operator.git
# Optionally, set the name of the Slack channel and the Slack Webhook URL
# If you don't have a Slack Webhook URL, create one via https://api.slack.com/messaging/webhooks
# export SLACK_CHANNEL='#my-channel'
# export SLACK_API_URL='https://hooks.slack.com/services/paste/your/token'
./cluster-operator/observability/quickstart.sh
The last command takes about 5 minutes and sets up the entire RabbitMQ on Kubernetes stack:
- the RabbitMQ Cluster Operator, which defines RabbitmqCluster as a custom resource definition (CRD) and manages all RabbitMQ clusters in Kubernetes
- the kube-prometheus stack (Prometheus, Alertmanager and Grafana), which defines the PrometheusRule and ServiceMonitor custom resource definitions
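Before creating a cluster, it is easy to confirm that both sets of custom resource definitions are in place. The namespace below matches the defaults used throughout this post:
# Cluster Operator CRDs live in the rabbitmq.com API group,
# Prometheus Operator CRDs in monitoring.coreos.com
kubectl get customresourcedefinitions | grep -E 'rabbitmq.com|monitoring.coreos.com'
# The kube-prometheus stack runs in its own namespace
kubectl -n kube-prometheus get pods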
To trigger an alert, we need a RabbitMQ cluster. This is the easiest way to create one:
# Add kubectl-rabbitmq plugin to PATH so that it can be used directly
export PATH="$PWD/cluster-operator/bin:$PATH"
# Use kubectl-rabbitmq plugin to create RabbitmqClusters via kubectl
kubectl rabbitmq create myrabbit --replicas 3
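Cluster formation takes a minute or two. One way to block until all three pods are ready (the pod names follow the <cluster-name>-server-<index> convention visible in the output later in this post):
kubectl wait --for=condition=Ready pod/myrabbit-server-0 pod/myrabbit-server-1 pod/myrabbit-server-2 --timeout=10m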
To trigger the NoMajorityOfNodesReady alert, we stop the rabbit application on two out of three nodes:
kubectl exec myrabbit-server-0 --container rabbitmq -- rabbitmqctl stop_app
kubectl exec myrabbit-server-1 --container rabbitmq -- rabbitmqctl stop_app
Within 2 minutes, two out of three RabbitMQ nodes will be shown as not READY:
kubectl rabbitmq get myrabbit
  NAME                    READY   STATUS    RESTARTS   AGE
- pod/myrabbit-server-0   1/1     Running   0          70s
+ pod/myrabbit-server-0   0/1     Running   0          3m
- pod/myrabbit-server-1   1/1     Running   0          70s
+ pod/myrabbit-server-1   0/1     Running   0          3m
  pod/myrabbit-server-2   1/1     Running   0          3m
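The READY column reflects each pod's readiness condition; the same information can be read directly with a plain kubectl query, no operator-specific tooling required:
# Prints "False" while the rabbit application is stopped on this node
kubectl get pod myrabbit-server-0 -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'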
The pods are still Running because the rabbitmqctl stop_app command leaves the Erlang VM system process running.
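A quick way to see this distinction from the command line is the built-in health check, which fails while the application is stopped even though the VM process is alive:
# Exits non-zero because the rabbit application is stopped on this node
kubectl exec myrabbit-server-0 --container rabbitmq -- rabbitmq-diagnostics check_running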
To see the NoMajorityOfNodesReady alert triggered in Prometheus, we open the Prometheus UI in our browser: http://localhost:9090/alerts.
For this to work, we forward local port 9090 to port 9090 of the Prometheus service running inside Kubernetes:
kubectl -n kube-prometheus port-forward svc/prom-kube-prometheus-stack-prometheus 9090
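With the port-forward in place, the same alert list is also available from the Prometheus HTTP API, which is convenient for scripting:
curl -s http://localhost:9090/api/v1/alerts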
The NoMajorityOfNodesReady alert is first shown in orange, which means it is in the pending state. After 5 minutes, the colour changes to red and the state becomes firing.
This will send an alert to Alertmanager. After we set up another port-forward, just as we did for Prometheus, we open the Alertmanager UI: http://localhost:9093
kubectl -n kube-prometheus port-forward svc/prom-kube-prometheus-stack-alertmanager 9093
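As with Prometheus, the same view is available from Alertmanager's HTTP API:
curl -s http://localhost:9093/api/v2/alerts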
Alertmanager groups alerts by namespace and rabbitmq_cluster. You see a single alert, which Alertmanager forwards to your configured Slack channel:
Congratulations, you triggered your first RabbitMQ alert! To resolve the alert, start the rabbit application on both nodes:
kubectl exec myrabbit-server-0 --container rabbitmq -- rabbitmqctl start_app
kubectl exec myrabbit-server-1 --container rabbitmq -- rabbitmqctl start_app
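To confirm that both nodes have rejoined before waiting for the alert to resolve, check the cluster status from any node:
kubectl exec myrabbit-server-0 --container rabbitmq -- rabbitmqctl cluster_status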
The alert will transition to green in Prometheus, be removed from Alertmanager, and a RESOLVED notification will be sent to your Slack channel.
To see all past and current RabbitMQ alerts across all your RabbitMQ clusters, look at the RabbitMQ-Alerts Grafana dashboard: http://localhost:3000/d/jjCq5SLMk (username: admin & password: admin)
kubectl -n kube-prometheus port-forward svc/prom-grafana 3000:80
In the example above, we have triggered multiple alerts across multiple RabbitMQ clusters.
We have shared the simplest and most useful alerts that we could think of. Some of you have already asked us about missing alerts such as memory thresholds, Erlang processes & atoms, message redeliveries, etc. Commercial customers have asked us for runbooks and automated alert resolution.
What are your thoughts on the current alerting rules? What alerts are you missing? Let us know via a GitHub discussion.
Tags: monitoring, prometheus, kubernetes
Written by: David Ansari, Gerhard Lazu
Categories: New Features, Kubernetes