Building a Custom Metric & Dashboard
In this document, we’re going to build a custom device and a dashboard. Together they will:
Measure the elapsed time between a message being ingested by a remote agent, processed in Connexion, and returned to the remote agent.
Show the ‘message latency’ over time, along with an SLA (service-level agreement) metric. The SLA metric can be used to identify periods where the SLA is not met.
Visual Studio is required to create a custom device.
The metrics feature must be enabled in the Configuration Wizard. Please review previous metrics topics on how to enable this feature.
We’re going to leverage the Message Tracing features of Connexion to calculate the amount of time elapsed since message ingestion. We’ll begin by creating a new custom device project in Connexion.
We’re going to start with the Device.cs file (this class contains your non-UI device logic). Replace the contents of the file with the following:
using Connexion.Core;
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

namespace MessageTraceMetricReporter
{
    [DevicePlugin("MessageTraceMetricReporter", "Writes a metric tracking the elapsed time based on message tracing", DeviceDefinitionFlags.None, typeof(object), typeof(object), typeof(MessageTraceMetricReporterFactory))]
    public class MessageTraceMetricReporter : BaseDevice<MessageTraceMetricReporterConfiguration>
    {
        // We're using a histogram metric to measure message age. Lots of information about Prometheus histograms online.
        private IHistogram m_MessageAgeHistogram;

        public MessageTraceMetricReporter()
            : this(Guid.Empty, null)
        {
        }

        public MessageTraceMetricReporter(Guid deviceKey, IMessageChannelDevice messageChannelDevice)
            : base(deviceKey, messageChannelDevice)
        {
        }

        protected override void InitializeMetrics()
        {
            // We're initializing the histogram with labels 'Group', 'Tab', 'Channel', and 'ChannelKey'. These labels
            // let us create dashboards for a specific channel, or, plot multiple channels on a single chart.
            // You can see we're passing in the label values using the .WithLabels(...) method.
            m_MessageAgeHistogram = GetOrCreateHistogram("MessageTrace.message_age_in_s", "The elapsed time since the message was ingested", new HistogramConfiguration
            {
                LabelNames = new[] { "Group", "Tab", "Channel", "ChannelKey" },
                Buckets = new double[] { 0.25, 0.5, 1, 2.5, 5, 10, 30, 60 }
            }).WithLabels(MessageChannel.GroupName, MessageChannel.TabName, MessageChannel.ChannelName, MessageChannel.ChannelKeyString);
        }

        // this method is called when the upstream queue is in 'regular' mode (processing one message at-a-time)
        public override Task ProcessMessageAsync(IMessageContext context, CancellationToken cancellationToken)
        {
            if (m_MessageAgeHistogram.IsEnabled)
                GetMessageAge(context);

            return Task.CompletedTask;
        }

        // this method is called when the upstream queue is in 'batch' mode (we can process multiple messages at-a-time)
        public Task ProcessMessagesAsync(IMessageBatchContext batch, CancellationToken cancellationToken)
        {
            if (!m_MessageAgeHistogram.IsEnabled)
                return Task.CompletedTask;

            foreach (var context in batch.GetMessageContexts())
            {
                GetMessageAge(context);
            }

            return Task.CompletedTask;
        }

        // Calculate the elapsed time between when the message was first ingested and now. Message trace information flows with
        // a message through remote agent connector devices, branches, and secure senders. It does not flow through other device
        // types.
        private void GetMessageAge(IMessageContext context)
        {
            var trace = context.MessageExtension.GetMessageTrace();
            var ingestionEntry = trace.FirstOrDefault();
            if (ingestionEntry != null)
                m_MessageAgeHistogram.Observe(DateTime.UtcNow.Subtract(ingestionEntry.Utc).TotalSeconds);
        }

        // since this device does not change the message, setting this to false will help performance
        public override bool DeviceChangesMessage
        {
            get { return false; }
        }
    }
}
The ProcessMessageAsync(...) and ProcessMessagesAsync(...) methods are the entry points for ‘regular’ message processing and ‘batch’ message processing. They both call the GetMessageAge(...) method, which calculates the age of the message.
The age of the message (in seconds) is then passed to the histogram (which will subsequently publish the metric).
Note the bucket configuration in the InitializeMetrics() method:
Buckets = new double[] { 0.25, 0.5, 1, 2.5, 5, 10, 30, 60 }
Each message age will be placed in a specific bucket. In this case the buckets are <=0.25 seconds (250 ms), >0.25 and <=0.5 seconds, and so on, up to 60 seconds. Depending on the use case, other bucket boundaries may be more appropriate.
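To make the bucket boundaries concrete, here is a minimal sketch that finds the smallest bucket bound containing a given message age. It is a standalone illustration, not part of the Connexion API: the BucketSketch class and UpperBoundFor method are hypothetical names, and only the bucket values come from the device code. In Prometheus-style output, bucket counts are cumulative, so an observation is also counted in every larger bucket, and values above the last bound fall into an implicit +Inf bucket.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Connexion.
public static class BucketSketch
{
    // Same upper bounds (in seconds) as the device's histogram configuration.
    public static readonly double[] Buckets = { 0.25, 0.5, 1, 2.5, 5, 10, 30, 60 };

    // Returns the smallest upper bound that contains the value, or
    // positive infinity when the value exceeds every configured bound.
    public static double UpperBoundFor(double seconds)
    {
        foreach (var bound in Buckets)
        {
            if (seconds <= bound)
                return bound;
        }
        return double.PositiveInfinity;
    }
}
```

For example, BucketSketch.UpperBoundFor(0.3) returns 0.5, and an age of 75 seconds returns positive infinity. If most of your messages land in only one or two buckets, the boundaries are probably too coarse for your use case.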
Since this device has no configuration and no user interface, we can ignore and/or remove the boilerplate code from the other classes.
Compile the device and import it into Connexion.
In our scenario, we’re going to measure the time it takes for a message to be received (via HL7 socket) by a remote agent, transferred to Connexion, processed in Connexion (for example, stored to a database), and then returned as a result to the remote agent.
For this sample we have two remote agent channels. The first will receive HL7 and transfer it to Connexion, and the second will receive from Connexion and send the result back out via HL7 socket.
On the Connexion side, we have a single channel which receives from the RA, does some trivial processing, and then returns the message to the “From Cloud to Customer” channel on the RA. Note that we have added our new metrics device to the end of this channel.
By placing our new device after the RA Sender device, the message(s) are stored on the remote agent side before we make our measurement.
Once you have configured and started your channels, you will need to enable your metric. Navigate to the metrics tab and enable it. Remember to click the Save button after enabling the metric.
Let’s confirm the metric is being published by visiting the Connexion metric endpoint (typically https://yourconnexionserver:8092/metrics).
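If you would rather check the endpoint from code than from a browser, the sketch below downloads the metrics page and prints only the lines that mention the metric. The MetricEndpointCheck class and its methods are hypothetical names. It also assumes the endpoint returns plain text with one series per line and that the server certificate is trusted by the machine running the check. Because the exact name under which the metric is published is not confirmed here, the sketch matches on a name fragment such as "message_age" rather than on the full name.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not part of Connexion.
public static class MetricEndpointCheck
{
    // Returns the lines of a metrics response that mention the given fragment.
    public static string[] FindMetricLines(string responseBody, string nameFragment)
    {
        return responseBody
            .Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.TrimEnd('\r'))
            .Where(line => line.IndexOf(nameFragment, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToArray();
    }

    // Downloads the endpoint and writes the matching lines to the console.
    public static async Task PrintMetricLinesAsync(string url, string nameFragment)
    {
        using (var client = new HttpClient())
        {
            var body = await client.GetStringAsync(url);
            foreach (var line in FindMetricLines(body, nameFragment))
                Console.WriteLine(line);
        }
    }
}
```

A call such as await MetricEndpointCheck.PrintMetricLinesAsync("https://yourconnexionserver:8092/metrics", "message_age") should print the histogram's series once the metric is enabled and at least one message has been processed.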
Great! Now we can start building up our dashboard. It’s always a good idea to process some messages prior to building the dashboard. This gives you feedback while you’re working with the metric within the dashboard.
Navigate to Grafana and create a new dashboard. Add a new panel and begin to fill it in as follows:
Let’s tweak a few settings to display the legend on the right and set the Y-axis to display time.
Click the Apply button, then update the dashboard’s time period and refresh interval.
Great! We have a chart showing the age of a message (averaged over 30 seconds). Now let’s create an SLA chart showing the percentage of messages being processed within 1 second. Create a new panel and fill it out as follows:
Give the chart a title and click the Apply button.
Great! Now we can see the percentage of messages which are being processed within one second (our theoretical SLA requires messages to be processed within 1 second, 90% of the time).
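The arithmetic behind both charts can be sketched in a few lines. This is an illustration of the calculation only, not the query Grafana runs, and the SlaSketch class and its method names are hypothetical. It assumes you have, for some time window, the increase in the histogram's sum, the increase in its total count, and the increase in the count of the bucket with an upper bound of 1 second.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Connexion or Grafana.
public static class SlaSketch
{
    // Average message age over a window: growth of the histogram's sum
    // divided by growth of its observation count.
    public static double AverageSeconds(double sumDelta, double countDelta)
    {
        return countDelta > 0 ? sumDelta / countDelta : double.NaN;
    }

    // Fraction of messages at or below a bucket bound during the window.
    // bucketDelta is the growth of the cumulative count for that bound.
    public static double FractionWithin(double bucketDelta, double countDelta)
    {
        return countDelta > 0 ? bucketDelta / countDelta : double.NaN;
    }

    // True when the fraction reaches the target (0.9 for a 90% SLA).
    // An empty window yields NaN, which compares as false.
    public static bool MeetsSla(double bucketDelta, double countDelta, double target)
    {
        return FractionWithin(bucketDelta, countDelta) >= target;
    }
}
```

For example, if 50 messages were processed in the window and 45 of them were counted in the 1-second bucket, SlaSketch.MeetsSla(45, 50, 0.9) returns true; with only 40 of 50 it returns false. Whether a window with no traffic should count as a miss is a design choice; this sketch reports it as not met.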
There are many other charts which can be derived from this single metric. Here is an example dashboard with some other charts defined (it can be imported into Grafana).
The Custom Device VS Project: