Monitoring with Prometheus and Grafana
Overview
This guide covers RabbitMQ monitoring with two popular tools: Prometheus, a monitoring toolkit; and Grafana, a metrics visualisation system.
These tools together form a powerful toolkit for long-term metric collection and monitoring of RabbitMQ clusters. While the RabbitMQ management UI also provides access to a subset of metrics, by design it does not try to be a long-term metric collection solution.
Please read through the main guide on monitoring first. The monitoring principles and available metrics it describes remain mostly relevant when Prometheus and Grafana are used.
Some key topics covered by this guide are:
- Prometheus support basics
- Grafana support basics
- Quick Start for local experimentation
- Installation steps for production systems
- Two types of scraping endpoint responses: Aggregated vs. Individual Entity Metrics
Grafana dashboards follow a number of conventions to make the system more observable and anti-patterns easier to spot. Their design decisions are explained in a number of sections:
- RabbitMQ Overview Dashboard
- Health indicators on the Overview dashboard
- Graph colour labelling conventions
- Graph thresholds
- Relevant documentation for each graph (metric)
- Spotting Anti-patterns
- Other available dashboards
- TLS support for Prometheus scraping endpoint
Built-in Prometheus Support
RabbitMQ ships with built-in Prometheus & Grafana support.
Support for the Prometheus metric collector ships in the rabbitmq_prometheus plugin.
The plugin exposes all RabbitMQ metrics on a dedicated TCP port, in Prometheus text format.
These metrics provide deep insights into the state of RabbitMQ nodes and the runtime. They make reasoning about the behaviour of RabbitMQ, applications that use it and various infrastructure elements a lot more informed.
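To make the Prometheus text format mentioned above more concrete, here is a minimal Python sketch that parses a small hard-coded sample into (name, labels, value) tuples. The metric names in the sample are made up for illustration and are not actual RabbitMQ metrics, and `parse_metrics` is a hypothetical helper, not part of the plugin. The parser is deliberately simplified: it skips comment lines, ignores optional timestamps and does not unescape label values.

```python
import re

# One sample line: metric name, optional {labels}, then a numeric value.
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: key="value" (escapes are kept as-is).
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text format into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this simplified parser understands
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Illustrative input only: these names are NOT real RabbitMQ metrics.
SAMPLE = """\
# HELP example_queue_messages Illustrative gauge for this sketch.
# TYPE example_queue_messages gauge
example_queue_messages{queue="orders"} 42
example_connections 3
"""

if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

In practice Prometheus itself does this parsing when it scrapes the endpoint; the sketch is only meant to show what the exposed text looks like and how little structure it has.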
Grafana Support
Collected metrics are not very useful unless they are visualised. Team RabbitMQ provides a prebuilt set of Grafana dashboards that visualise a large number of available RabbitMQ and runtime metrics in context-specific ways.
There are a number of dashboards available:
- an overview dashboard
- a runtime memory allocators dashboard
- an inter-node communication (Erlang distribution) dashboard
- a Raft metric dashboard
and others. Each is meant to provide an insight into a specific part of the system. When used together, they are able to explain RabbitMQ and application behaviour in detail.
Note that the Grafana dashboards are opinionated and use a number of conventions, for example, to spot system health issues more quickly or to make cross-graph referencing possible. Like all Grafana dashboards, they are also highly customisable. The conventions they assume are considered to be good practices and are thus recommended.
An Example
When RabbitMQ is integrated with Prometheus and Grafana, this is what the RabbitMQ Overview dashboard looks like:
Quick Start
Before We Start
This section explains how to set up a RabbitMQ cluster with Prometheus and Grafana dashboards, as well as some applications that will produce some activity and meaningful metrics.
With this setup you will be able to interact with RabbitMQ, Prometheus & Grafana running locally. You will also be able to try out different load profiles to see how it all fits together, make sense of the dashboards, panels and so on.
This is merely an example; the rabbitmq_prometheus plugin and our Grafana dashboards do not require the use of Docker Compose demonstrated below.
Prerequisites
The instructions below assume a host machine that has a certain set of tools installed:
- A terminal to run the commands
- Git to clone the repository
- Docker Desktop to use Docker Compose locally
- A Web browser to browse the dashboards
Their installation is out of scope of this guide. To verify that the necessary tools are available, run the following on the command line:
git version
docker info && docker-compose version
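The same prerequisite check can be scripted. The following is a minimal Python sketch, assuming the tools are expected on PATH under the names git, docker and docker-compose; `missing_tools` and `REQUIRED_TOOLS` are hypothetical names introduced here. It only checks that the executables can be found, so unlike docker info it does not confirm that the Docker daemon is actually running.

```python
import shutil

# Executable names taken from the verification commands above.
REQUIRED_TOOLS = ("git", "docker", "docker-compose")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

The `which` parameter exists only so the lookup can be swapped out, for example in a test; by default the standard library's `shutil.which` is used.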