Splunk Eventgen Jinja templating

I’ve recently dipped my toes into Splunk Eventgen (Jinja templating). It’s an awesome app that allows you to generate sample events that can be ingested by Splunk (or for any other reason).

EventGen has two ways of configuring the event content generation:

  • Traditional – where you specify a sample file and provide regexes that will be used to replace static content in the sample file with the required values
  • Jinja Templating – where you use Jinja templating engine to create the events.

While the traditional way is quite straightforward, the event format I was after had a few nuances that made it unsuitable, so I had to fiddle with Splunk Eventgen Jinja templating.

Requirements

Generate Skype For Business (Media Quality Summary) MOS data. This data is basically “call” records, so it will have:

  • Call Start and End timestamps
  • Participants (callee and caller) details (username/device used/ IP)
  • Quality of the call
  • The Skype for Business pool used

I spent most of my time on the timestamp issues: purely random timestamps will not cut it, as the Start timestamp has to be before the End timestamp. I also wanted to be able to define the range of the randomness, i.e. the minimum and the maximum duration of a call.
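To make the requirement concrete, here is a minimal Python sketch of the idea (outside of Eventgen, and not the approach the template ends up using): pick an end time inside a window, pick a bounded random duration, and derive the start from the two. The function name random_call_window and its parameters are hypothetical and only serve the illustration.

```python
import random
from datetime import datetime, timedelta, timezone


def random_call_window(window_start, window_end, min_duration, max_duration, rng=random):
    """Return (start, end): end lies inside the window, and the call lasts
    between min_duration and max_duration seconds (both inclusive)."""
    window_seconds = int((window_end - window_start).total_seconds())
    # Pick the end of the call somewhere inside the generation window.
    end = window_start + timedelta(seconds=rng.randint(0, window_seconds))
    # Pick a bounded duration and walk back from the end to get the start.
    duration = timedelta(seconds=rng.randint(min_duration, max_duration))
    return end - duration, end


window_end = datetime(2022, 4, 11, 15, 25, 0, tzinfo=timezone.utc)
start, end = random_call_window(window_end - timedelta(seconds=15), window_end, 10, 180)
print(start.isoformat(), end.isoformat(), (end - start).total_seconds())
```

Deriving the start from the end guarantees the ordering, which two independently generated random timestamps would not.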

Result

Eventgen can either be installed as a Splunk app or can be installed “outside” of Splunk Enterprise and be used as a command-line tool. In my case, I took the former approach.

Install and enable the Splunk Eventgen app

Download the app from Splunk Base.

Install it (either via the Splunk UI or by extracting the Eventgen app into your <SPLUNK_HOME>/etc/apps folder).

Create a new inputs.conf file, either under <SPLUNK_HOME>/etc/apps/SA-Eventgen/local/ (that’s the default extracted folder name for the Eventgen app) or under <SPLUNK_HOME>/etc/apps/my_app/local/ (that’s my preference), with the following stanza to enable Eventgen:

[modinput_eventgen://default]
disabled = false

Create your Eventgen configuration

Eventgen generators are defined in eventgen.conf files, so let’s create one (as mentioned before, I’ll be doing it under <SPLUNK_HOME>/etc/apps/my_app/local/).

[conversations_mos_jinja.sample]
earliest = -15s
latest = now
interval = 15
count = 10

outputMode = file
fileName = /tmp/mos_jinja.log

generator = jinja
jinja_template_dir = templates
jinja_target_template = mos_jinja.template
jinja_variables = { \
    "min_duration": 10, \
    "max_duration": 180, \
    "timezone_offset": 10, \
    "corp_domain": "whatever.com", \
    "capture_device_list": [ "Device Type 1","Device Type 2","Device Type 3","Device Type 4","Device Type 5" ], \
    "cpu_name_list": ["CPU @ 1.60GHz","CPU @ 2.60GHz","CPU @ 3.60GHz","CPU @ 4.60GHz","CPU @ 5.60GHz"],\
    "ip_range_list": ["10.0","192.168"]}

Let’s decipher all these lines:

stanza name – with the Jinja templating method this is just a meaningful name (with the traditional approach, the stanza name is the name of the sample file that is used to generate events).

earliest – the earliest timestamp an event can have

latest – the latest timestamp an event can have

interval – how often the generator runs, in seconds

count – how many events should be generated each run

outputMode – I needed to write to a file on disk, but there are other options, for example ingesting directly into Splunk or sending the events to HEC.

fileName – which file to write the results to, a few notes here:

  • make sure that the user that is running Splunk service has permissions to write to this destination
  • Eventgen has a built-in file rotation mechanism, so you don’t need to worry about that. If the default values of up to 5 files of 10MB each don’t work for you, they can be overridden using fileBackupFiles and fileMaxBytes respectively.

generator – which generator to use, obviously (I mean that’s the main purpose of this whole post) we are using the jinja one

jinja_template_dir – path to the Jinja template folder relative to Eventgen’s sample folder

jinja_target_template – the name of the “root” Jinja template that will be used.

jinja_variables – here we can pass variables that will be used by the Splunk Jinja templating engine
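Since jinja_variables is written as a JSON object spread over backslash-continued lines, a stray comma or quote is easy to miss. Below is a small Python sketch for sanity-checking such a value before restarting Splunk. It assumes the value has to be valid JSON once the continuation backslashes are removed; the helper name parse_conf_json is hypothetical and is not part of Eventgen.

```python
import json

# A shortened copy of the value from eventgen.conf, continuation backslashes included.
raw_value = r'''{ \
    "min_duration": 10, \
    "max_duration": 180, \
    "ip_range_list": ["10.0","192.168"]}'''


def parse_conf_json(value):
    """Strip the trailing continuation backslashes, join the lines,
    and parse the result as JSON (raises ValueError if it is malformed)."""
    joined = "".join(line.rstrip().rstrip("\\") for line in value.splitlines())
    return json.loads(joined)


variables = parse_conf_json(raw_value)
print(variables["max_duration"], variables["ip_range_list"])
```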

Craft your Jinja template

I’ll dive into the template in a bit, but first:

Variables available for every sample

eventgen_count – The current count

eventgen_maxcount – The max count requested in the stanza

eventgen_earliest – The earliest specified item in ISO8601

eventgen_earliest_epoch – earliest converted to epoch time based on specified value and host time

eventgen_latest – the latest specified time item in ISO8601

eventgen_latest_epoch – latest time converted to epoch

Timestamps in Splunk Eventgen Jinja Templating are a pain…

I think many will agree that time/timestamps/dates are always hard, no matter what programming language / tool you are using and this is no different in Splunk Eventgen Jinja templating.

Eventgen GitHub repo has a sample of how to work with time in Jinja templates, so that was my source of inspiration.

The 2 custom Jinja functions that are exposed and can be used in the Jinja templates are:

Eventgen Jinja time functions:

time_now – tells the time module to find the current spot in time based on the host machine’s current time.

  • Parameter: date_format – Python date format you want the result to be formatted in
  • Returns: time_now_epoch and time_now_formatted

time_slice – used to divide up time based on earliest and latest. Let’s say I wanted to know, “If I gave you a start window and an end window, and wanted you to divide that time slot into set buckets of time. Given my 3rd event, give me back a time period that fits the expected slice.”

  • Parameter: earliest – earliest time in epoch, where the slicing starts
  • Parameter: latest – latest time in epoch, where the slicing ends
  • Parameter: count – which slice to use
  • Parameter: slices – total number of slices to divide the time into
  • Parameter: date_format – Python date format you want the result to be formatted in
  • Returns: time_target_epoch and time_target_formatted (these are the variables the template further down reads after calling time_slice)
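To illustrate the slicing idea, here is a rough Python sketch of dividing a time window into equal buckets and returning the bounds of one of them. This is not Eventgen’s implementation: the function name slice_bounds, the 0-based count, and returning both bounds of the bucket are assumptions made for the example.

```python
def slice_bounds(earliest, latest, count, slices):
    """Split [earliest, latest] into `slices` equal buckets and return the
    (lower, upper) epoch bounds of bucket number `count` (0-based here)."""
    if slices <= 0:
        raise ValueError("slices must be positive")
    size = (latest - earliest) / slices
    lower = earliest + size * count
    return lower, lower + size


# A 180-second window cut into 180 slices gives one-second buckets.
print(slice_bounds(1000, 1180, 3, 180))  # (1003.0, 1004.0)
```

With slices equal to the length of the window in seconds, as in the example call, picking a random count amounts to picking a random second inside the window.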

As you can see, there is no function that takes an epoch timestamp and returns a nicely formatted date and time. While I know Python well enough to extend the jinja generator with such a function, I decided not to, for the sake of easier maintenance later (one would have to re-patch the Jinja generator every time the Eventgen app is updated).

Also, while you can specify date_format in the above 2 functions, it uses Python’s out-of-the-box strftime function, which lacked some flexibility in my case (for example, it offers microseconds but not milliseconds, and I was not able to get a : into the timezone offset).
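For reference, this is roughly what the missing helper could look like in plain Python. It is only a sketch of the target format (milliseconds and a colon in the offset) and is not callable from an Eventgen template without patching the generator; the name format_epoch_ms and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def format_epoch_ms(epoch_ms, offset_hours):
    """Format an epoch timestamp given in milliseconds as ISO 8601 with
    millisecond precision and a +HH:MM style timezone offset."""
    tz = timezone(timedelta(hours=offset_hours))
    # Build the moment from an integer millisecond count to avoid float rounding.
    moment = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=epoch_ms)
    return moment.astimezone(tz).isoformat(timespec="milliseconds")


print(format_epoch_ms(0, 10))  # 1970-01-01T10:00:00.000+10:00
```

Here isoformat does the work that a strftime format string could not: timespec="milliseconds" trims the fraction to three digits, and a timezone-aware datetime prints its offset with a colon.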

My timestamp whinging is done so let’s…

Create your Jinja template.

Based on content of eventgen.conf mine was <SPLUNK_HOME>/etc/apps/my_app/samples/templates/mos_jinja.template.

Here is how it looks inside

{# 
session_seq,start_time,end_time,conversational_mos,caller_ip_addr,callee_ip_addr,caller_capture_dev,callee_capture_dev,caller_cpu_name,callee_cpu_name,caller_render_dev,callee_render_dev,callee_pool,caller_pool,pool
1,2022-03-24T01:36:31.277+11:00,2022-03-24T01:37:31.277+11:00,1.2,10.0.12.150,10.0.12.123,Transmit (2- Plantronics DA45),Transmit (2- xyz),Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz,Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz,Remote Audio,Remote Audio,ABCPool2.asdf,XYZPool45.asdf,BATPOOL123.asdfas.com.au
#}

{%- time_now -%}
{%- time_slice earliest=eventgen_earliest_epoch-(timezone_offset*3600), latest=eventgen_latest_epoch-(timezone_offset*3600), count=(range(min_duration, max_duration) | random) , slices=max_duration -%}
{% set end_time = time_target_epoch %}
{% set end_time_formatted = time_target_formatted %}

{%- time_slice earliest=end_time-max_duration, latest=end_time, count=(range(min_duration, max_duration) | random), slices=max_duration -%}
{% set start_time = time_target_epoch %}
{% set start_time_formatted = time_target_formatted %}
{% set duration = end_time - start_time %}

{% set callee_ip_addr = ip_range_list|random + "." + range(1,254)|random|string + "."  + range(1,254)|random|string  %}
{% set caller_ip_addr = ip_range_list|random + "." + range(1,254)|random|string + "."  + range(1,254)|random|string  %}
{% set callee_capture_dev = capture_device_list|random %}
{% set caller_capture_dev = capture_device_list|random %}
{% set callee_cpu_name = cpu_name_list|random %}
{% set caller_cpu_name = cpu_name_list|random %}
{% set callee_pool = "POOL00" + range(0,5)|random|string + "." + corp_domain%}
{% set caller_pool = "POOL00" + range(0,5)|random|string + "." + corp_domain%}
{% set pool = "POOL00" + range(0,5)|random|string + "." + corp_domain%}

{"_time":"{{ time_now_epoch }}", "_raw":"1,{{ start_time_formatted }}.000+{{ timezone_offset}}:00,{{ end_time_formatted }}.000+{{ timezone_offset}}:00,{{ (range(0, 50) | random)/10 }},{{ callee_ip_addr }},{{ caller_ip_addr }},{{ callee_capture_dev }},{{ caller_capture_dev }},{{ callee_cpu_name }},{{ caller_cpu_name }},Remote Audio,Remote Audio,{{ callee_pool }},{{ caller_pool }},{{ pool }}"}

Time to decipher the template (I will skip lines that do something similar to a line already described):

{# ...#} – it’s just a comment that reminds me (or anyone looking at the template) what the expected sample format is

{%- time_now -%} – we are calling the time_now function to be able to use the time_now_epoch variable

{%- time_slice earliest=eventgen_earliest_epoch-(timezone_offset*3600), latest=eventgen_latest_epoch-(timezone_offset*3600), count=(range(min_duration, max_duration) | random) , slices=max_duration -%}

  • we are calling the time_slice function to get a random timestamp for the call end_time that is within the time range of the current generation cycle
  • for some reason the timestamp returned didn’t respect the machine timezone, so I was getting events in the future, which is why timezone_offset*3600 is subtracted from both earliest and latest

{% set end_time = time_target_epoch %} – create a new variable end_time and assign it the epoch value that was returned by the time_slice function
{% set end_time_formatted = time_target_formatted %} – create a new variable end_time_formatted and assign it the nicely formatted value from the time_slice function

{%- time_slice earliest=end_time-max_duration, latest=end_time, count=(range(min_duration, max_duration) | random), slices=max_duration -%} – run the time_slice function again to get a random start_time for a call

{% set callee_capture_dev = capture_device_list|random %} – assign the callee_capture_dev variable a value that is randomly chosen from the capture_device_list that we have defined in eventgen.conf

{% set callee_pool = "POOL00" + range(0,5)|random|string + "." + corp_domain%} – here we create a callee_pool variable whose value is a concatenation of the fixed string “POOL00”, a random integer from 0 to 4 (range(0,5) excludes the upper bound, and the integer needs to be converted to a string for concatenation purposes), “.” and the corp_domain (that was defined in eventgen.conf).

{"_time":"{{ time_now_epoch }}", "_raw":" 1,{{ start_time_formatted..... – this line builds the actual event. We don’t need _time for our purposes (dumping the samples to a file), but it is required in other output modes.
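The random selections in the template map quite directly onto Python, which can be handy for checking what values to expect. The sketch below mirrors the IP, pool and MOS expressions; the function names are hypothetical and the code is independent of Eventgen.

```python
import random


def random_ip(prefixes, rng=random):
    """Mirror of: ip_range_list|random + "." + range(1,254)|random + "." + range(1,254)|random"""
    # range(1, 254) stops at 253, exactly as in the template.
    return "{}.{}.{}".format(
        rng.choice(prefixes), rng.choice(range(1, 254)), rng.choice(range(1, 254))
    )


def random_pool(domain, rng=random):
    """Mirror of: "POOL00" + range(0,5)|random|string + "." + corp_domain"""
    # range(0, 5) yields 0..4, so POOL005 is never produced.
    return "POOL00{}.{}".format(rng.choice(range(0, 5)), domain)


def random_mos(rng=random):
    """Mirror of: (range(0, 50) | random)/10, i.e. a value from 0.0 to 4.9."""
    return rng.choice(range(0, 50)) / 10


print(random_ip(["10.0", "192.168"]), random_pool("whatever.com"), random_mos())
```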

Resulting samples

1,2022-04-11T15:23:49.000+10:00,2022-04-11T15:25:02.000+10:00,0.3,192.168.17.72,10.0.137.210,Device Type 5,Device Type 4,CPU @ 1.60GHz,CPU @ 2.60GHz,Remote Audio,Remote Audio,POOL000.whatever.com,POOL002.whatever.com,POOL002.whatever.com
1,2022-04-11T15:24:05.000+10:00,2022-04-11T15:25:01.000+10:00,4.3,192.168.207.220,192.168.15.245,Device Type 4,Device Type 5,CPU @ 2.60GHz,CPU @ 5.60GHz,Remote Audio,Remote Audio,POOL004.whatever.com,POOL002.whatever.com,POOL001.whatever.com
1,2022-04-11T15:24:34.000+10:00,2022-04-11T15:25:01.000+10:00,1.5,192.168.101.237,192.168.217.187,Device Type 5,Device Type 2,CPU @ 3.60GHz,CPU @ 1.60GHz,Remote Audio,Remote Audio,POOL002.whatever.com,POOL003.whatever.com,POOL003.whatever.com
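A quick way to confirm that the generated records satisfy the original requirement (start before end) is to parse them back. The Python sketch below does that for the first four columns of the three samples above; the function name call_durations is hypothetical.

```python
import csv
from datetime import datetime

# First four columns of the three sample records shown above.
SAMPLE = """1,2022-04-11T15:23:49.000+10:00,2022-04-11T15:25:02.000+10:00,0.3
1,2022-04-11T15:24:05.000+10:00,2022-04-11T15:25:01.000+10:00,4.3
1,2022-04-11T15:24:34.000+10:00,2022-04-11T15:25:01.000+10:00,1.5
"""


def call_durations(text):
    """Return the duration of every record in seconds, failing loudly if a
    record starts at or after its own end time."""
    durations = []
    for row in csv.reader(text.splitlines()):
        start = datetime.fromisoformat(row[1])
        end = datetime.fromisoformat(row[2])
        if start >= end:
            raise ValueError("start is not before end: {}".format(row))
        durations.append((end - start).total_seconds())
    return durations


print(call_durations(SAMPLE))  # [73.0, 56.0, 27.0]
```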
