Create & Configure Metrics

Please review the metrics overview if you haven’t yet.

Once you have Connexion, Prometheus, and Grafana configured, you can leverage this infrastructure to view built-in and custom metrics. Built-in metrics are compiled with each version of Connexion. Custom metrics can be written in compiled custom devices as well as custom code-based devices.

Metrics Configuration

The metrics configuration screen can be used to enable and disable metrics, as well as configure which channel(s), plugin(s), and repositories are associated with a specific metric.

Metrics calculations can consume considerable resources. Do not arbitrarily enable metrics on production systems without ensuring:

The metrics will be used / monitored
The additional load will not degrade message processing performance

Always verify proper system operation and performance after changing the metrics configuration.

This screen follows the same pattern as the Alerting, Authorization, and Monitoring screens. You create a metric group to configure the enabled state for one or more metrics (which may be tied to a specific set of channels, plugins, repositories, etc.). By creating multiple metric groups, you can build a fine-grained set of enabled metrics. The metrics system searches all metric groups to determine whether a metric should be written for a specific Connexion item (channel, plugin, repository, etc.).

By default, all channels, plugins, and repositories are enabled. If you wish to enable a metric only for specific objects, you must explicitly deselect the objects to be ignored.

Metric Association

Most metrics are associated with Connexion objects. For example, the connexion_channel_message_count metric is only written for selected channels. The connexion_repository_disk_space metric is only written for selected repositories.

In some cases, the same metric exists in multiple locations. For example, connexion_channel_message_count (calculated at the channel level) is also provided at the system level by connexion_system_message_counts. The typical (Grafana) method for calculating the system-level message counts would be to sum all the connexion_channel_message_count metrics. However, this would require Connexion to publish this metric for every channel. In a system with thousands of channels, this may be unnecessary overhead when the goal is to display a single system-level count.

Some metrics may simply be on or off, such as the process-level metrics.

Enabling a Metric

To enable a metric, click the checkbox beside its name. If you wish to write this metric only for a specific set of Connexion objects, navigate to the appropriate tab and set the corresponding checked state. Save the metric.

Metrics are typically created on first use. This means you may need to start a channel or process a single message before the metric is published to the HTTP endpoint.

You can verify that a metric is being properly published by visiting the metrics endpoint of your Connexion server (https://your-server:8092/metrics) and searching for the metric name.
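The same check can be scripted. The following is a minimal sketch, assuming a .NET 6 or later console application and an endpoint that returns the standard Prometheus text format; the MetricsEndpointCheck class and FindSamples helper are hypothetical names used only for illustration, and the host is the placeholder from the URL above.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class MetricsEndpointCheck
{
    // Returns the sample lines whose text starts with the metric name.
    // Comment lines ("# HELP ...", "# TYPE ...") are skipped. This is a prefix
    // match, so the _bucket, _sum and _count series of a histogram are included,
    // as is any other metric whose name begins with the same text.
    public static string[] FindSamples(string expositionText, string metricName)
    {
        return expositionText
            .Split('\n')
            .Select(line => line.Trim())
            .Where(line => line.Length > 0 && !line.StartsWith("#", StringComparison.Ordinal))
            .Where(line => line.StartsWith(metricName, StringComparison.Ordinal))
            .ToArray();
    }

    public static async Task Main(string[] args)
    {
        // Placeholder host from the documentation; pass your own URL as the first argument.
        var url = args.Length > 0 ? args[0] : "https://your-server:8092/metrics";
        var metricName = args.Length > 1 ? args[1] : "connexion_channel_message_count";

        using var client = new HttpClient();
        var body = await client.GetStringAsync(url);

        var samples = FindSamples(body, metricName);
        Console.WriteLine(samples.Length == 0
            ? $"'{metricName}' is not published (yet)."
            : string.Join(Environment.NewLine, samples));
    }
}
```

If the metric is reported as not published, remember that metrics are typically created on first use, so start the channel or process a message and run the check again.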

Understanding Prometheus Metrics

Prometheus metrics consist of two parts: a name and zero or more labels. The name identifies the metric (connexion_channel_message_count in the image above), and the labels identify its association(s). The labels in the image above (Group=X, Tab=Y, Channel=Z, etc.) denote the group, tab, and channel for which message count values are being written. In Prometheus (and Grafana) we can query on these labels, allowing us to create a graph or table for a specific channel. We can also use functions to derive other values, such as a sum, average, or delta (among many others) of all queued messages with a Tab label equal to ‘XYZ’.

If you create your own metrics, you will need to think about the types of visualizations you want to support, and ensure you supply labels which support this.

Read about labels
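To make the name/label split concrete, here is a minimal sketch written directly against the prometheus-net library rather than Connexion's own wrappers (an assumption made so the example is self-contained; it requires the prometheus-net NuGet package). The metric name and the LabelSketch class are hypothetical.

```csharp
using Prometheus;

public static class LabelSketch
{
    // Hypothetical metric, declared with three label names.
    private static readonly Counter MessageCount = Metrics.CreateCounter(
        "sample_channel_message_count",
        "Number of messages processed, per channel.",
        new CounterConfiguration { LabelNames = new[] { "Group", "Tab", "Channel" } });

    public static void RecordMessage(string group, string tab, string channel)
    {
        // Label values are supplied in the same order as LabelNames.
        // Each distinct combination of values becomes its own time series.
        MessageCount.WithLabels(group, tab, channel).Inc();
    }
}
```

Because every distinct combination of label values is published as a separate series, choose labels that match the groupings you intend to query on, and avoid labels with unbounded values.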

Metric Types

Connexion supports 3 types of metrics: Counters, Gauges, and Histograms.

Counters are for data that only ever increases (such as request count) and reset to zero when the service restarts. When you want to show the rate/speed of an operation, you typically want a Counter.

Gauges are like counters, but allow values to increase and decrease. For example, CPU and memory metrics would be gauges.

Histograms allow you to measure the distribution, count, and rate/time. For example, if you wanted to measure the size of a payload, a histogram would show you how many of your payloads were between X and Y in size, how many were between Y and Z in size, and so on. It can also calculate the rate of payload operations as well as other more advanced metrics (such as how many payloads were above/below a certain threshold).
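The three types can be sketched side by side. This example uses the prometheus-net library directly, not Connexion's GetOrCreate methods (an assumption made to keep it self-contained; it requires the prometheus-net NuGet package). All metric names and the MetricTypeSketch class are hypothetical.

```csharp
using Prometheus;

public static class MetricTypeSketch
{
    // Counter: only ever increases; suited to rates such as requests per second.
    private static readonly Counter Requests = Metrics.CreateCounter(
        "sample_requests_total", "Total requests handled.");

    // Gauge: a value that can go up and down.
    private static readonly Gauge QueueDepth = Metrics.CreateGauge(
        "sample_queue_depth", "Messages currently queued.");

    // Histogram: counts observations into buckets by upper bound.
    private static readonly Histogram PayloadSize = Metrics.CreateHistogram(
        "sample_payload_size_bytes",
        "Size of each payload in bytes.",
        new HistogramConfiguration
        {
            // Six upper bounds, each 4x the previous: 1 KiB, 4 KiB, 16 KiB, 64 KiB, 256 KiB, 1 MiB.
            Buckets = Histogram.ExponentialBuckets(1024, 4, 6)
        });

    public static void OnMessage(byte[] payload, int queued)
    {
        Requests.Inc();                      // one more request handled
        QueueDepth.Set(queued);              // current queue depth, may rise or fall
        PayloadSize.Observe(payload.Length); // record this payload's size
    }
}
```

Bucket boundaries are fixed when the histogram is created, so pick them to bracket the range of values you expect to see.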

Creating Metrics

The vast majority of custom metrics will be part of custom compiled devices. As an example, let’s assume you’re a device author and your device sends messages to a web service. You’d like to track the duration of the web service call, the upload volume (bytes uploaded) and rate (bytes per second) as well as the same metrics on the download side. You also want to track the message size sent to the service.

For tracking operation duration, you will need a histogram. You will also need a histogram to track the message size, transfer rate, and total transfer volume (one histogram for upload, and one for download).

Connexion’s BaseDevice class, from which all custom devices derive, contains scaffolding for declaring metrics. There are three methods exposed for creating metrics: GetOrCreateCounter, GetOrCreateGauge, and GetOrCreateHistogram. The MetricsProvider is also exposed should you need more advanced functionality.

Connexion’s metrics implementation is closely related to the prometheus-net library. Many of our conventions and signatures mimic those of this library.

Your metrics should be created in the InitializeMetrics() override. By using this method, your metrics will automatically be updated as users enable and disable metrics via the UI. Let’s look at an example device class:

  public class MetricSample : BaseDevice<MetricSampleConfiguration>
  {
    private IHistogram m_SendDurationHistogram;   // measure the time taken to send the payload

    public override void Start()
    {
      // make sure you call InitializeMetrics
      InitializeMetrics();
    }
    protected override void InitializeMetrics()
    {
      // construct your metrics here
      m_SendDurationHistogram = GetOrCreateHistogram("MyDevice.send_duration_in_s",                           // name of your metric. See notes about X.y syntax
                                                     "The duration (in seconds) to send to the foo service",  // description
                                                     new HistogramConfiguration                               // configuration
                                                     {
                                                       LabelNames = new[] { "Group", "Tab", "Channel", "ChannelKey" },          // associate this metric with the owner channel
                                                       Buckets = new double[] { 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10 }  // buckets (in seconds)
                                                     }).WithLabels(MessageChannel.GroupName, MessageChannel.TabName, MessageChannel.ChannelName, MessageChannel.ChannelKeyString);
    }

    public override async Task ProcessMessageAsync(IMessageContext context, CancellationToken cancellationToken)   // async is required because the body awaits
    {
      // ...your code           
      using (m_SendDurationHistogram.NewTimer())    // the Histogram.NewTimer() method measures the duration of the wrapped call
      { 
        await SendToFooServiceAsync(...).ConfigureAwait(false);
      }
      // ... your code
    }
  }