Create & Configure Metrics

Please review the metrics overview if you haven’t yet.

Once you have Connexion, Prometheus, and Grafana configured, you can leverage this infrastructure to view built-in and custom metrics. Built-in metrics are compiled with each version of Connexion. Custom metrics can be written in compiled custom devices as well as custom code-based devices.

Metrics Configuration

The metrics configuration screen can be used to enable and disable metrics, as well as configure which channel(s), plugin(s), and repositories are associated with a specific metric.

Metrics calculations can consume considerable resources. Do not arbitrarily enable metrics on production systems without ensuring:

  • The metrics will be used / monitored

  • The additional load will not degrade message processing performance

Always verify proper system operation and performance after changing the metrics configuration.

This screen follows the same pattern as the Alerting, Authorization, and Monitoring screens. You create a metric group to configure the enabled state for one or more metrics (which may be tied to a specific set of channels, plugins, repositories, etc.). By creating multiple metric groups, you can build a fine-grained set of enabled metrics. The metrics system searches all metric groups to determine whether a metric should be written for a specific Connexion item (channel, plugin, repository, etc.).

By default, all channels, plugins, and repositories are enabled. If you wish to enable a metric only for specific objects, you must explicitly deselect the objects to be ignored.

Metric Association

Most metrics are associated with Connexion objects. For example, the connexion_channel_message_count metric is only written for selected channels. The connexion_repository_disk_space metric is only written for selected repositories.

In some cases, the same metric exists in multiple locations. For example, connexion_channel_message_count (calculated at the channel level) is also provided at the system level by connexion_system_message_counts. The typical (Grafana) method for calculating the system-level message counts would be to sum all the connexion_channel_message_count metrics. However, this would require Connexion to publish this metric for every channel. In a system with thousands of channels, this may be unnecessary overhead when the goal is to display a single system-level count.

Some metrics may simply be on or off, such as the process-level metrics.

Enabling a Metric

To enable a metric, click the checkbox beside its name. If you wish to write this metric only for a specific set of Connexion objects, navigate to the appropriate tab and set the corresponding checked state. Save the metric.

You can verify that a metric is being properly published by visiting the metrics endpoint of your Connexion server (https://your-server:8092/metrics) and searching for the metric name.
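The same check can be scripted. The following is a minimal sketch, assuming the endpoint returns the standard Prometheus text format (one sample per line, with comment lines starting with #). The class and method names are hypothetical and are not part of Connexion; the URL is the placeholder used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper (not part of Connexion) for checking the metrics endpoint.
public static class MetricEndpointCheck
{
    // Returns the sample lines whose name starts with metricName.
    // Comment lines ("# HELP ...", "# TYPE ...") are skipped. A prefix match
    // also picks up histogram series such as <name>_bucket, <name>_sum, <name>_count.
    public static List<string> FindMetricLines(string exposition, string metricName)
    {
        return exposition
            .Split('\n')
            .Select(line => line.Trim())
            .Where(line => line.Length > 0 && !line.StartsWith("#", StringComparison.Ordinal))
            .Where(line => line.StartsWith(metricName, StringComparison.Ordinal))
            .ToList();
    }

    // Downloads the endpoint body and filters it.
    // endpointUrl is an assumption, e.g. "https://your-server:8092/metrics".
    public static async Task<List<string>> FetchMetricLinesAsync(string endpointUrl, string metricName)
    {
        using var client = new HttpClient();
        string body = await client.GetStringAsync(endpointUrl).ConfigureAwait(false);
        return FindMetricLines(body, metricName);
    }
}
```

An empty result suggests the metric is disabled, is not associated with any selected object, or has not yet been written.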

Understanding Prometheus Metrics

Prometheus metrics are composed of two parts: a name, and zero or more labels. The name identifies the metric (connexion_channel_message_count in the image above), and the labels identify any association(s). The labels in the above image (Group=X, Tab=Y, Channel=Z, etc.) denote the group/tab/channel for which we are writing message count values. In Prometheus (and Grafana) we can query on these labels, allowing us to create a graph or table for a specific channel. We can also use functions to create other metrics, such as a sum, average, or delta (among many others) of all queued messages with a Tab label equal to ‘XYZ’.
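To make the role of labels concrete, here is a small sketch that mimics, in plain C#, what a label filter and a sum over a label do. It does not query Prometheus. The Sample record, the class name, and the values used with it are hypothetical and exist only for illustration (records require C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One series of a metric: its label values plus the current value (illustrative only).
public sealed record Sample(string Group, string Tab, string Channel, double Value);

public static class LabelQueries
{
    // Roughly what filtering on a Channel label does: keep only the matching series.
    public static IEnumerable<Sample> ForChannel(IEnumerable<Sample> samples, string channel) =>
        samples.Where(s => s.Channel == channel);

    // Roughly what a sum over a Tab label filter does: add up every series on that tab.
    public static double SumForTab(IEnumerable<Sample> samples, string tab) =>
        samples.Where(s => s.Tab == tab).Sum(s => s.Value);
}
```

Without the Tab and Channel labels, neither query would be possible, which is why the label set should be chosen with the intended visualizations in mind.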

If you create your own metrics, you will need to think about the types of visualizations you want to support, and ensure you supply labels which support this.

Read about labels

Metric Types

Connexion supports 3 types of metrics: Counters, Gauges, and Histograms.

Counters are for data that only ever increases (such as request count) and reset to zero when the service restarts. When you want to show the rate/speed of an operation, you typically want a Counter.

Gauges are like counters, but allow values to increase and decrease. For example, CPU and memory metrics would be gauges.

Histograms allow you to measure the distribution, count, and rate/time of an operation. For example, if you wanted to measure the size of a payload, a histogram would show you how many of your payloads were between X and Y in size, how many were between Y and Z in size, and so on. It can also calculate the rate of payload operations as well as other more advanced metrics (such as how many payloads were above/below a certain threshold).
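Returning to counters: because a counter only increases and resets to zero on restart, a rate can be derived from two readings. The sketch below is a simplified illustration of that idea and is not the exact algorithm used by Prometheus or Grafana. The class and method names are hypothetical.

```csharp
using System;

public static class CounterRate
{
    // Average per-second rate between two counter readings taken elapsedSeconds apart.
    public static double PerSecond(double previous, double current, double elapsedSeconds)
    {
        if (elapsedSeconds <= 0)
            throw new ArgumentOutOfRangeException(nameof(elapsedSeconds));

        // A counter never decreases, so a smaller reading means the service
        // restarted and the counter began again from zero.
        double increase = current >= previous ? current - previous : current;
        return increase / elapsedSeconds;
    }
}
```

A gauge cannot be treated this way, because a lower reading is a legitimate value rather than a sign of a restart.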

Read more about metric types

Creating Metrics

The vast majority of custom metrics will be part of custom compiled devices. As an example, let’s assume you’re a device author and your device sends messages to a web service. You’d like to track the duration of the web service call, the upload volume (bytes uploaded) and rate (bytes per second) as well as the same metrics on the download side. You also want to track the message size sent to the service.

For tracking operation duration, you will need a histogram. You will also need a histogram to track the message size, transfer rate, and total transfer volume (one histogram for upload, and one for download).

Connexion’s BaseDevice class, from which all custom devices derive, contains scaffolding for declaring metrics. There are three methods exposed for creating metrics: GetOrCreateCounter, GetOrCreateGauge, and GetOrCreateHistogram. The MetricsProvider is also exposed should you need more advanced functionality.

Your metrics should be created in the InitializeMetrics() override. By using this method, your metrics will automatically be updated as users enable and disable metrics via the UI. Let’s look at an example device class:

    public class MetricSample : BaseDevice<MetricSampleConfiguration>
    {
        private IHistogram m_SendDurationHistogram; // measures the time taken to send the payload

        public override void InitializeMetrics()
        {
            // construct your metrics here
            m_SendDurationHistogram = GetOrCreateHistogram(
                "MyDevice.send_duration_in_s", // name of your metric (see the notes on the Namespace.metric_name syntax)
                "The duration (in seconds) to send to the foo service", // description
                new HistogramConfiguration // configuration
                {
                    // associate this metric with the owner channel
                    LabelNames = new[] { "Group", "Tab", "Channel", "ChannelKey" },
                    // buckets (in seconds)
                    Buckets = new double[] { 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10 }
                }).WithLabels(MessageChannel.GroupName, MessageChannel.TabName, MessageChannel.ChannelName, MessageChannel.ChannelKeyString);
        }

        // async is required because the method body awaits the send operation
        public override async Task ProcessMessageAsync(IMessageContext context, CancellationToken cancellationToken)
        {
            // ...your code

            // Histogram.NewTimer() measures the duration of the wrapped code
            using (m_SendDurationHistogram.NewTimer())
            {
                await SendToFooServiceAsync(...).ConfigureAwait(false);
            }

            // ... your code
        }
    }

  • The histogram is defined on line 3 (using the interface type)

  • The InitializeMetrics() method is called (automatically) after the device is loaded (after the Load method). Any label values used within the InitializeMetrics() method must be initialized in your constructor or Load method.

  • The histogram is created in the InitializeMetrics method.

  • The operation to be measured is wrapped with the .NewTimer() method. The NewTimer method is a specialized function of the histogram type for measuring duration (in seconds). Other operations (like size) are measured using the .Observe(X) method.

The GetOrCreateX methods all take a name, description, and configuration parameter. Names must be globally unique, so you should namespace your metric using the notation Namespace.metric_name. The portion in front of the . is the namespace, which is displayed in the UI as a separate block; this is typically your device name. The portion after the . is the name as presented to Prometheus. The prefix connexion_ is automatically added to your name for ease of use within Grafana. Note that the convention is to use all lower case with underscore separators for the name (the namespace can be upper case and is removed from the Prometheus name).
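As an illustration of the naming rule, the following sketch splits a full metric name into its namespace and the name Prometheus would see. It is not Connexion's implementation. The class name is hypothetical, and the assumption that connexion_ is prepended directly to the portion after the . is inferred from the description above.

```csharp
using System;

// Hypothetical helper illustrating the Namespace.metric_name convention.
public static class MetricNameSketch
{
    // Splits at the first '.' and applies the assumed connexion_ prefix.
    public static (string Namespace, string PrometheusName) Split(string fullName)
    {
        int dot = fullName.IndexOf('.');
        if (dot <= 0 || dot == fullName.Length - 1)
            throw new ArgumentException("Expected the form Namespace.metric_name", nameof(fullName));

        string ns = fullName.Substring(0, dot);     // shown in the UI as a separate block
        string name = fullName.Substring(dot + 1);  // the part presented to Prometheus
        return (ns, "connexion_" + name);
    }
}
```

Under these assumptions, MyDevice.send_duration_in_s yields the namespace MyDevice and the Prometheus name connexion_send_duration_in_s.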

The configuration object defines the labels and buckets for the histogram:

    new HistogramConfiguration
    {
        LabelNames = new[] { "Group", "Tab", "Channel", "ChannelKey" },
        Buckets = new double[] { 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10 }
    }).WithLabels(MessageChannel.GroupName, MessageChannel.TabName, MessageChannel.ChannelName, MessageChannel.ChannelKeyString);

The LabelNames collection defines the labels which will be supplied for this metric. It is convention to include Group, Tab, Channel, and ChannelKey labels. Add additional labels as needed. For example, a “TargetUri” label would enable you to break down metrics on a per-target basis.

The Buckets collection defines the buckets into which each observed value will be placed. Duration is measured in seconds in Grafana, so our buckets are [up to 1ms], [1ms to 5ms], [5ms to 10ms] … [5s to 10s]. Values greater than 10 seconds go into a special infinite (+Inf) bucket.
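The sketch below shows how an observed duration maps to one of these ranges. It is an illustration only, with hypothetical class and method names. Note that Prometheus exposes histogram buckets cumulatively, so each published bucket also counts every observation that fell into a smaller one; the sketch returns only the smallest bucket that contains the value.

```csharp
using System;

public static class BucketSketch
{
    // The upper bounds (in seconds) from the configuration above.
    public static readonly double[] Buckets = { 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10 };

    // Returns the upper bound of the smallest bucket containing the value,
    // or positive infinity when the value exceeds every configured bound.
    // upperBounds is assumed to be sorted in ascending order.
    public static double BucketFor(double seconds, double[] upperBounds)
    {
        foreach (double bound in upperBounds)
        {
            if (seconds <= bound)
                return bound;
        }
        return double.PositiveInfinity;
    }
}
```

For example, a 3 ms call (0.003) lands in the bucket with upper bound 0.005, and a 12 second call lands in the infinite bucket. Choosing bounds close to the durations you expect gives the most useful distribution.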

The .WithLabels(...) method creates a child metric with the given label values. In this case, the label values never change, so we can define our class-level histogram using the .WithLabels(...) method. In some cases, the label values will change dynamically. In these cases, you should define the class-level histogram without the .WithLabels(...) method, and then call .WithLabels(...) each time you write a value.

    public override void InitializeMetrics()
    {
        // no .WithLabels(...) here: the label values are supplied on each write
        m_SendDurationHistogram = GetOrCreateHistogram(
            "MyDevice.send_duration_in_s",
            "The duration (in seconds) to send to the foo service",
            new HistogramConfiguration
            {
                LabelNames = new[] { "Group", "Tab", "Channel", "ChannelKey", "DynamicValue" },
                Buckets = new double[] { 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10 }
            });
    }

    public override async Task ProcessMessageAsync(IMessageContext context, CancellationToken cancellationToken)
    {
        // ...your code

        var dynamicValue = context.Message.SomeValueThatChangesWithEachMessage;

        // supply all five label values, in the same order as LabelNames
        using (m_SendDurationHistogram.WithLabels(MessageChannel.GroupName, MessageChannel.TabName, MessageChannel.ChannelName, MessageChannel.ChannelKeyString, dynamicValue).NewTimer())
        {
            await SendToFooServiceAsync(...).ConfigureAwait(false);
        }

        // ... your code
    }

Once you have deployed your device and processed at least one message, your metric will be published on the Connexion HTTP metrics endpoint. At this point you can open Grafana in a browser and start creating a dashboard.

Custom Metrics in the UI

In order for custom metrics to appear in the Connexion UI metric tab, they must be created/accessed at least once. Calling a GetOrCreateX method registers the metric within Connexion. Currently, you must restart the Connexion UI (or navigate away from the metric tab and back again) for newly registered metrics to be displayed.