Understanding Elastic Heartbeat time metrics

I wanted to play a little with Grafana while having Elasticsearch as a back-end and decided to use Elastic Heartbeat as my data generator. It’s an easy, no fuss, to set up the Heartbeat itself as well as the first Heartbeat HTTP monitor, but when I saw all the available Heartbeat time metrics for the HTTP monitor I got a bit overwhelmed. So decided to to gradually progress from ICMP through TCP and finally to HTTP Heartbeat monitors and that the way this post is going to evolve as well:

Part 1 – Elasticsearch Heartbeat ICMP time metrics
Part 2 – Elasticsearch Heartbeat TCP time metrics (work in progress)
Part 3 – Elasticsearch Heartbeat HTTP time metrics (work in progress)

Monitors Configuration In order to understand the Heartbeat ICMP monitor time metrics I’ve set up a simple monitor, that “pings” localhost and 127.0.0.1. You might say: “Wait a second, aren’t these that same?”, but you will see the reason why I set it up this way soon. You can see the extract from the heartbeat.yml configuration file relevant to the ICMP monitor

- type: icmp

  # List or urls to query
  hosts: ["localhost", "127.0.0.1"]

  # Configure task schedule
  schedule: '@every 60s'

  # Configure IP protocol types to ping on if hostnames are configured.
  # Ping all resolvable IPs if `mode` is `all`, or only one IP if `mode` is `any`.
  ipv4: true
  ipv6: false
  mode: all

  # Total running time per ping test.
  timeout: 16s

  # Waiting duration until another ICMP Echo Request is emitted.
  wait: 1s

The data returned by the monitors After letting it run for a while let’s check what data Heartbeat is sending to Elasticsearch For the 127.0.0.1 monitor

{
  "@timestamp": "2017-10-17T05:36:25.687Z",
  "beat": {
    "hostname": "DESKTOP-123456",
    "name": "DESKTOP-123456",
    "version": "5.6.1"
  },
  "duration": {
    "us": 0
  },
  "icmp_rtt": {
    "us": 0
  },
  "ip": "127.0.0.1",
  "monitor": "icmp-ip@127.0.0.1",
  "type": "icmp",
  "up": true
}

For the localhost monitor

{
  "@timestamp": "2017-10-17T05:36:25.687Z",
  "beat": {
    "hostname": "DESKTOP-123456",
    "name": "DESKTOP-123456",
    "version": "5.6.1"
  },
  "duration": {
    "us": 1007
  },
  "host": "localhost",
  "icmp_rtt": {
    "us": 501
  },
  "ip": "127.0.0.1",
  "monitor": "icmp-host-ip4@localhost",
  "resolve_rtt": {
    "us": 506
  },
  "type": "icmp",
  "up": true
}

Time metrics We can see that there are only 3 values that have are measured “us” (which suggest that these are time metrics)

duration
resolve_rtt
icmp_rtt

After going through some Elasticsearch Heartbeat docos that’s what I came up with:

duration – Total monitoring test duration

As I understand, it includes the time that Heartbeat actually reads the the monitor definition, executes it and has the final results

resolve_rtt – Duration required to resolve an IP from hostname

Since the first thing that Heartbeat needs to do is to understand which IP you want it to “ping” it tries to translate the hostname you’ve provided to a valid IP. By the way, if you are to provide an IP in the monitor settings, this step will not be require and the metric will exist no more.

icmp_rtt – ICMP Echo Request and Reply round trip time

The actual round trip time of the monitor. The timing starts the moment the client sent the request, and ends when it receives a reply from the server, so basically it includes network time (in each direction) + server time.

Visualization

As a bonus for those who have read it so for here are some nice shiny Kibana graphs to visualize the Heartbeat ICMP monitor duration metrics

Kibana graph for Elastic Heartbeat ICMP monitor that “pings” 127.0..0.1

Kibana graph for Elastic Heartbeat ICMP monitor that “pings” localhost

As you can see in the first graph we didn’t have the resolve_rtt metric since we were “pinging” a known IP, while in the second graph it appears since Heartbeat ICMP monitor needs to resolve the “localhost” to an IP.

Below is a visualization of a “ping” to google.com so and you can see that resolve_rtt would in “real life” be only a small fraction of the total duration (vs localhost scenario)

Kibana graph for Elastic Heartbeat ICMP monitor that “pings” google.com

In Part 2 I will go over the Elastic Heartbeat TCP Monitor time metrics and finally Part 3 will look into the HTTP ones

Please leave your comment if you think that anything is missing.

11 thoughts on “Understanding Elastic Heartbeat time metrics – ICMP”

Hey,

Thank you for this blog post, looking forward to part 2 and 3.

Grtz

Willem

Ilya Reshetnikov says:

March 5, 2018 at 6:46 AM

Thank you for the feedback Willem.
I kind of have most of the details written for these, but didn’t have time to put it in a post and add pretty pictures.

Reply

what are the steps used to calculate monitor.us value. How its get calculated. I gone through docs of heartbeat but can’t find a solution
Is it a time to read the monitor definition, then how its get read and generate the value..

Ilya Reshetnikov says:

February 12, 2019 at 6:37 AM

Hi Steven,
That’s my understanding as well, as I wrote above:
“…duration – Total monitoring test duration
As I understand, it includes the time that Heartbeat actually reads the the monitor definition, executes it and has the final results…”

Regards,
ILYA Reshetnikov

Reply

Where is part 2 and part 3. I tried searching on this website but can’t find it

Ilya Reshetnikov says:

March 19, 2019 at 6:38 AM

I kind of got sidetracked to other stuff, still hope to write them some day 🙂

Reply
Ilya Reshetnikov says:

March 21, 2019 at 6:16 AM

Hi Steven,
Part 2 is here https://isbyr.com/understanding-elastic-heartbeat-time-metrics-tcp/

Reply

How do you see the data returned by the monitor. I visualized it on kibana only. Is there any other way to see it.

Ilya Reshetnikov says:

March 20, 2019 at 4:27 PM

Hi Steven,
You will be able to see the data returned by the monitor on under logs folder of you Elastic Heartbeat installation folder.
You will need to enable debug logging level in the heartbeat.yml file by uncommenting the following line:
#logging.level: debug

Regards,
ILYA

Reply

Very nicely written and thanks for taking the time to explain

Ilya Reshetnikov says:

March 25, 2019 at 4:36 PM

Thank you for the feedback Ravi.

Reply

ISbyR

Understanding Elastic Heartbeat time metrics – ICMP

The data returned by the monitors After letting it run for a while let’s check what data Heartbeat is sending to Elasticsearch For the 127.0.0.1 monitor

Time metrics We can see that there are only 3 values that have are measured “us” (which suggest that these are time metrics)

Visualization

11 thoughts on “Understanding Elastic Heartbeat time metrics – ICMP”

Leave a Reply Cancel reply

Infrequent Smarts by Reshetnikov