Please review the metrics overview if you haven’t yet.
Once you have Connexion, Prometheus, and Grafana configured, you can use this infrastructure to view built-in and custom metrics. Built-in metrics are compiled into each version of Connexion. Custom metrics can be written in compiled custom devices as well as in code-based custom devices.
Metrics Configuration
The metrics configuration screen can be used to enable and disable metrics, as well as to configure which channels, plugins, and repositories are associated with a specific metric.
Metrics calculations can consume considerable resources. Do not enable metrics arbitrarily on production systems without ensuring that:
The metrics will actually be used or monitored
The additional load will not degrade message-processing performance
Always verify proper system operation and performance after changing the metrics configuration.
This screen follows the same pattern as the Alerting, Authorization, and Monitoring screens. You create a metric group to configure the enabled state of one or more metrics (which may be tied to a specific set of channels, plugins, repositories, etc.). By creating multiple metric groups, you can build a fine-grained set of enabled metrics. The metrics system searches all metric groups to determine whether a metric should be written for a specific Connexion item (channel, plugin, repository, etc.).
By default, all channels, plugins, and repositories are selected. To enable a metric only for specific objects, you must explicitly deselect the objects to be ignored.
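The following C# sketch illustrates one way multiple metric groups could combine. It assumes, as a simplification, that a metric is written for an object when at least one group enables that metric and has not deselected the object. The group model, the object names, and the ShouldWrite helper are hypothetical; they are not Connexion's API or a statement of its exact evaluation rules.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of metric groups: each group lists the metrics it enables and
// the objects (channels, plugins, repositories) deselected in it. Objects that are
// not deselected count as selected, mirroring the "everything selected" default.
var groups = new List<(HashSet<string> EnabledMetrics, HashSet<string> Deselected)>
{
    (new HashSet<string> { "connexion_channel_message_count" }, new HashSet<string> { "ChannelB" }),
    (new HashSet<string> { "connexion_repository_disk_space" }, new HashSet<string>()),
};

// Assumed rule: write the metric for an object if ANY group enables the metric
// and has not deselected that object.
bool ShouldWrite(string metric, string item) =>
    groups.Any(g => g.EnabledMetrics.Contains(metric) && !g.Deselected.Contains(item));

Console.WriteLine(ShouldWrite("connexion_channel_message_count", "ChannelA")); // True
Console.WriteLine(ShouldWrite("connexion_channel_message_count", "ChannelB")); // False: deselected in the only group enabling it
```

Under this assumed rule, adding a second group can only widen the set of objects a metric is written for; deselecting an object in one group does not hide it from another group that enables the same metric.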
Metric Association
Most metrics are associated with Connexion objects. For example, the connexion_channel_message_count metric is only written for selected channels, and the connexion_repository_disk_space metric is only written for selected repositories.
In some cases, the same metric exists in multiple locations. For example, connexion_channel_message_count (calculated at the channel level) is also provided at the system level by connexion_system_message_counts. The typical (Grafana) method for calculating system-level message counts would be to sum all the connexion_channel_message_count metrics; however, this would require Connexion to publish this metric for every channel. In a system with thousands of channels, this may be unnecessary overhead when the goal is to display a single system-level count.
Some metrics, such as the process-level metrics, are simply on or off.
Enabling a Metric
To enable a metric, click the checkbox beside its name. If you want to write this metric only for a specific set of Connexion objects, navigate to the appropriate tab and set the corresponding checked state. Save the metric.
Metrics are typically created on first use. This means you may need to start a channel or process a single message before the metric is published to the HTTP endpoint.
You can verify that a metric is being properly published by visiting the metrics endpoint of your Connexion server (https://your-server:8092/metrics) and searching for the metric name.
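When checking the endpoint, a quick way to confirm that a metric is present is to filter the Prometheus text output for its series. The C# sketch below uses only the .NET base library. FindMetric and FetchMetricsAsync are hypothetical helper names, the sample text and its label values are made up, and the URL is the placeholder from this page.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: returns the sample lines (not the # HELP / # TYPE comments)
// for one metric, including the _bucket, _sum, and _count series of a histogram.
string[] FindMetric(string exposition, string metricName)
{
    var seriesNames = new[] { metricName, metricName + "_bucket", metricName + "_sum", metricName + "_count" };
    return exposition.Split('\n')
        .Select(line => line.TrimEnd('\r'))
        .Where(line => line.Length > 0 && line[0] != '#')
        .Where(line => seriesNames.Contains(line.Split('{', ' ')[0]))
        .ToArray();
}

// Hypothetical helper: downloads the Prometheus text output from the metrics endpoint.
async Task<string> FetchMetricsAsync(string url)
{
    using var client = new HttpClient();
    return await client.GetStringAsync(url);
}

// Against a live server (placeholder URL from this page):
// var lines = FindMetric(await FetchMetricsAsync("https://your-server:8092/metrics"), "connexion_channel_message_count");

// Offline demonstration with made-up sample text:
var sample =
    "# HELP connexion_channel_message_count Hypothetical help text\n" +
    "connexion_channel_message_count{Group=\"X\",Tab=\"Y\",Channel=\"Z\"} 42\n" +
    "process_cpu_seconds_total 1.5\n";

foreach (var line in FindMetric(sample, "connexion_channel_message_count"))
    Console.WriteLine(line);
```

An empty result usually means the metric is disabled for that object or has not been used yet, since metrics are created on first use.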
Understanding Prometheus Metrics
Prometheus metrics are composed of two parts: a name and zero or more labels. The name identifies the metric (connexion_channel_message_count in the image above), and the labels identify any association(s). The labels in the above image (Group=X, Tab=Y, Channel=Z, etc.) denote the group/tab/channel for which we are writing message count values. In Prometheus (and Grafana) we can query on these labels, allowing us to create a graph or table for a specific channel. We can also use functions to derive other metrics, such as the sum, average, or delta (among many others) of all queued messages with a tab label equal to ‘XYZ’.
If you create your own metrics, you will need to think about the types of visualizations you want to support, and ensure you supply labels which support this.
Metric Types
Connexion supports three types of metrics: Counters, Gauges, and Histograms.
Counters are for values that only ever increase (such as a request count); they reset to zero when the service restarts. When you want to show the rate or speed of an operation, you typically want a Counter.
Gauges are like counters, but their values can both increase and decrease. For example, CPU and memory usage metrics would be gauges.
Histograms let you measure distribution, count, and rate over time. For example, if you wanted to measure the size of a payload, a histogram would show you how many of your payloads were between X and Y in size, how many were between Y and Z in size, and so on. It can also calculate the rate of payload operations as well as more advanced metrics (such as how many payloads were above or below a certain threshold).
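To make the differences between the three types concrete, the C# sketch below models each one with plain variables. It follows Prometheus conventions for histograms (cumulative "less than or equal" buckets, a final +Inf bucket, and a running sum and count), but it is only an illustration, not Connexion's or prometheus-net's implementation; the bucket bounds and observed payload sizes are hypothetical.

```csharp
using System;
using System.Linq;

// Counter: only ever increases (a restart starts it again at zero).
double requestCount = 0;
void IncCounter(double by = 1)
{
    if (by < 0) throw new ArgumentOutOfRangeException(nameof(by), "Counters cannot decrease.");
    requestCount += by;
}

// Gauge: can go up and down.
double memoryInUseMb = 0;

// Histogram: cumulative bucket counts (le = "less than or equal to"), plus sum and count.
double[] upperBounds = { 1_000, 10_000, 100_000 };       // payload size buckets in bytes
long[] bucketCounts = new long[upperBounds.Length + 1];   // last slot is the +Inf bucket
double observedSum = 0;
long observedCount = 0;

void Observe(double value)
{
    for (int i = 0; i < upperBounds.Length; i++)
        if (value <= upperBounds[i]) bucketCounts[i]++;
    bucketCounts[upperBounds.Length]++; // every observation falls in +Inf
    observedSum += value;
    observedCount++;
}

IncCounter();
IncCounter();
memoryInUseMb = 512;
memoryInUseMb -= 128;
foreach (var size in new double[] { 500, 2_500, 40_000, 250_000 }) Observe(size);

Console.WriteLine($"requests={requestCount} memoryMb={memoryInUseMb}");
for (int i = 0; i < upperBounds.Length; i++)
    Console.WriteLine($"le={upperBounds[i]}: {bucketCounts[i]}");
Console.WriteLine($"le=+Inf: {bucketCounts[^1]} sum={observedSum} count={observedCount}");
```

Because buckets are cumulative, the 500-byte payload is counted in every bucket while the 250,000-byte payload appears only in +Inf; rates are then derived in Prometheus from how the count and sum change over time.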
Creating Metrics
The vast majority of custom metrics will be part of compiled custom devices. As an example, let’s assume you’re a device author and your device sends messages to a web service. You’d like to track the duration of the web service call, the upload volume (bytes uploaded) and rate (bytes per second), and the same metrics on the download side. You also want to track the size of the messages sent to the service.
For tracking operation duration, you will need a histogram. You will also need a histogram to track the message size, transfer rate, and total transfer volume (one histogram for upload, and one for download).
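Choosing bucket boundaries is the main design decision for these histograms. The C# sketch below generates exponentially spaced upper bounds, the same idea behind prometheus-net's Histogram.ExponentialBuckets helper; whether Connexion re-exposes that helper is not covered here, and the start values and growth factors chosen for message size and send duration are hypothetical.

```csharp
using System;
using System.Linq;

// Builds 'count' upper bounds starting at 'start', each 'factor' times the previous one.
double[] ExponentialBuckets(double start, double factor, int count)
{
    if (start <= 0 || factor <= 1 || count < 1)
        throw new ArgumentException("Need start > 0, factor > 1, and count >= 1.");
    var bounds = new double[count];
    bounds[0] = start;
    for (int i = 1; i < count; i++) bounds[i] = bounds[i - 1] * factor;
    return bounds;
}

// Hypothetical choices for the web-service device described above:
var messageSizeBuckets = ExponentialBuckets(1024, 4, 6);  // 1 KiB .. 1 MiB (bytes)
var durationBuckets = ExponentialBuckets(0.005, 2, 10);   // 5 ms .. ~2.56 s (seconds)

Console.WriteLine(string.Join(", ", messageSizeBuckets));
Console.WriteLine(string.Join(", ", durationBuckets));
```

Exponential spacing keeps fine resolution for small values without needing dozens of buckets for large ones; observations above the largest bound are counted only in the implicit +Inf bucket.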
Connexion’s BaseDevice class, from which all custom devices derive, contains scaffolding for declaring metrics. There are three methods exposed for creating metrics: GetOrCreateCounter, GetOrCreateGauge, and GetOrCreateHistogram. The MetricsProvider is also exposed should you need more advanced functionality.
Connexion’s metrics implementation is closely related to the prometheus-net library. Many of our conventions and signatures mimic those of this library.
Your metrics should be created in the InitializeMetrics() override. By using this method, your metrics will automatically be updated as users enable and disable metrics via the UI. Let’s look at an example device class:
```csharp
using System.Threading;
using System.Threading.Tasks;
// ...plus the using directives for the Connexion types used below

public class MetricSample : BaseDevice<MetricSampleConfiguration>
{
    private IHistogram m_SendDurationHistogram; // measures the time taken to send the payload

    public override void Start()
    {
        // make sure you call InitializeMetrics
        InitializeMetrics();
    }

    protected override void InitializeMetrics()
    {
        // construct your metrics here
        m_SendDurationHistogram = GetOrCreateHistogram(
            "MyDevice.send_duration_in_s",                          // name of your metric. See notes about X.y syntax
            "The duration (in seconds) to send to the foo service", // description
            new HistogramConfiguration                              // configuration
            {
                LabelNames = new[] { "Group", "Tab", "Channel", "ChannelKey" },        // associate this metric with the owner channel
                Buckets = new double[] { 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10 } // buckets (in seconds)
            })
            .WithLabels(MessageChannel.GroupName, MessageChannel.TabName, MessageChannel.ChannelName, MessageChannel.ChannelKeyString);
    }

    // 'async' is required because the method body uses 'await'.
    public override async Task ProcessMessageAsync(IMessageContext context, CancellationToken cancellationToken)
    {
        // ...your code

        // Histogram.NewTimer() measures the duration of the wrapped code.
        using (m_SendDurationHistogram.NewTimer())
        {
            await SendToFooServiceAsync(...).ConfigureAwait(false);
        }

        // ...your code
    }
}
```
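The using block in the example device times the awaited send. Assuming Connexion's IHistogram.NewTimer() behaves like its prometheus-net counterpart, which observes the elapsed time in seconds when the timer is disposed, the block is roughly equivalent to the try/finally pattern in this C# sketch. The sketch is synchronous for brevity, and both the list standing in for the histogram and the TimeAndObserve helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

// Stand-in for the histogram: collects each observed duration in seconds.
var observedSeconds = new List<double>();

// Hypothetical helper approximating 'using (histogram.NewTimer()) { body }':
// start a stopwatch, run the body, and observe the elapsed seconds when the
// block exits, whether it completes normally or throws.
void TimeAndObserve(Action body)
{
    var stopwatch = Stopwatch.StartNew();
    try
    {
        body();
    }
    finally
    {
        observedSeconds.Add(stopwatch.Elapsed.TotalSeconds);
    }
}

// Stand-in for a successful call to the foo service (about 50 ms).
TimeAndObserve(() => Thread.Sleep(50));

// Stand-in for a failed call: the exception propagates, but the duration is still observed.
try
{
    TimeAndObserve(() => throw new TimeoutException("simulated failure"));
}
catch (TimeoutException)
{
    // Handled here only so the demonstration can continue.
}

Console.WriteLine($"observations={observedSeconds.Count}, first=~{observedSeconds[0]:F3}s");
```

Because the observation happens during disposal, failed or timed-out calls are still recorded, and in the async device method the using block likewise spans the whole awaited call.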