I’ve recently dipped my toes into Splunk Eventgen (Jinja templating). It’s an awesome app that lets you generate sample events that can be ingested by Splunk (or used for any other purpose).
Eventgen has two ways of configuring the event content generation:
- Traditional – where you specify a sample file and provide regexes that will be used to replace static content in the sample file with the required values
- Jinja Templating – where you use the Jinja templating engine to create the events.
While the traditional way is quite straightforward, the event format I was after had a few nuances that made that approach unsuitable, so I had to fiddle with Splunk Eventgen Jinja templating.
Requirements
Generate Skype For Business (Media Quality Summary) MOS data. This data is basically “call” records, so it will have:
- Call Start and End timestamps
- Participants (callee and caller) details (username/device used/ IP)
- Quality of the call
- The Skype for Business pool used
I spent most of my time on timestamp issues: purely random timestamps won’t cut it, because the Start timestamp has to be before the End timestamp, and I also wanted to control the range of the randomness, i.e. the minimum and the maximum duration of a call.
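The timestamp requirement boils down to picking the end of the call first and then stepping back by a bounded random duration. Here is a minimal Python sketch of that idea, outside Eventgen; `random_call_window` and its parameters are names I made up for illustration:

```python
import random


def random_call_window(window_start, window_end, min_duration, max_duration, rng=random):
    """Return (start, end) epoch seconds for one call.

    The end lands inside [window_start, window_end] and the call lasts
    between min_duration and max_duration seconds (both inclusive).
    """
    end = rng.randint(window_start, window_end)
    duration = rng.randint(min_duration, max_duration)
    return end - duration, end


start, end = random_call_window(1_649_654_700, 1_649_654_715, 10, 180)
print(start, end, end - start)
```

Because the start is derived from the end, it can never come after it, and the duration stays inside the configured bounds.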
Result
Eventgen can either be installed as a Splunk app or can be installed “outside” of Splunk Enterprise and be used as a command-line tool. In my case, I took the former approach.
Install and enable the Splunk Eventgen app
Download the app from Splunk Base.
Install it (either via the Splunk UI or by extracting the Eventgen app into your <SPLUNK_HOME>/etc/apps folder).
Create a new inputs.conf file, either under <SPLUNK_HOME>/etc/apps/SA-Eventgen/local/ (that’s the default extracted folder name for the Eventgen app) or under <SPLUNK_HOME>/etc/apps/my_app/local/ (that’s my preference), with the following stanza to enable Eventgen:
[modinput_eventgen://default]
disabled = false
Create your Eventgen configuration
Eventgen generators are defined in eventgen.conf files, so let’s create one (as mentioned before, I’ll be doing it under <SPLUNK_HOME>/etc/apps/my_app/local/).
[conversations_mos_jinja.sample]
earliest = -15s
latest = now
interval = 15
count = 10
outputMode = file
fileName = /tmp/mos_jinja.log
generator = jinja
jinja_template_dir = templates
jinja_target_template = mos_jinja.template
jinja_variables = { \
    "min_duration": 10, \
    "max_duration": 180, \
    "timezone_offset": 10, \
    "corp_domain": "whatever.com", \
    "capture_device_list": [ "Device Type 1","Device Type 2","Device Type 3","Device Type 4","Device Type 5" ], \
    "cpu_name_list": ["CPU @ 1.60GHz","CPU @ 2.60GHz","CPU @ 3.60GHz","CPU @ 4.60GHz","CPU @ 5.60GHz"],\
    "ip_range_list": ["10.0","192.168"]}
Let’s decipher all these lines:
stanza name
– with the Jinja templating method this is just a meaningful name (with the traditional approach, the stanza name is actually the name of the sample file that is used to generate events).
earliest
– the earliest timestamp an event can have
latest
– the latest timestamp an event can have
interval
– how often the generator runs (in seconds)
count
– how many events should be generated each run
outputMode
– I needed to write to a file on disk, but there are other modes that ingest directly into Splunk, send to HEC, and a few more.
fileName
– which file to write the results to, a few notes here:
- make sure that the user running the Splunk service has permission to write to this destination
- Eventgen has a built-in file rotation mechanism, so you don’t need to worry about that. If the defaults of up to 5 files of 10MB each don’t work for you, they can be overridden using fileBackupFiles and fileMaxBytes respectively.
generator
– which generator to use; since that’s the whole point of this post, we are using the jinja one
jinja_template_dir
– path to the Jinja template folder, relative to Eventgen’s samples folder
jinja_target_template
– the name of the “root” Jinja template that will be used.
jinja_variables
– here we can pass variables that will be used by the Splunk Jinja templating engine
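Since jinja_variables is one long value spread over backslash-continued lines, a stray comma or quote is easy to miss. Below is a small Python sketch for sanity-checking the value before restarting Eventgen. It assumes the value is treated as JSON once the continued lines are joined; `parse_jinja_variables` is a hypothetical helper, not something Eventgen ships, and the value is shortened to a few keys:

```python
import json

# A shortened jinja_variables value as it appears in eventgen.conf, with the
# trailing backslashes that continue the value onto the next line.
raw_value = r'''{ \
"min_duration": 10, \
"max_duration": 180, \
"timezone_offset": 10, \
"corp_domain": "whatever.com", \
"ip_range_list": ["10.0","192.168"]}'''


def parse_jinja_variables(value):
    """Join continuation lines and parse the result as JSON (assumed format)."""
    joined = "".join(line.rstrip().rstrip("\\") for line in value.splitlines())
    return json.loads(joined)


variables = parse_jinja_variables(raw_value)
print(variables["max_duration"] - variables["min_duration"])
```

If the value is malformed, `json.loads` raises an error that points at the offending position, which is quicker than hunting for it in the Eventgen logs.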
Craft your Jinja template
I’ll dive into the template in a bit, but first:
Variables available for every sample
eventgen_count
– The current count
eventgen_maxcount
– The max count requested in the stanza
eventgen_earliest
– The earliest specified item in ISO8601
eventgen_earliest_epoch
– earliest converted to epoch time based on specified value and host time
eventgen_latest
– the latest specified time item in ISO8601
eventgen_latest_epoch
– latest time converted to epoch
Timestamps in Splunk Eventgen Jinja Templating are a pain…
I think many will agree that time/timestamps/dates are always hard, no matter what programming language / tool you are using and this is no different in Splunk Eventgen Jinja templating.
The Eventgen GitHub repo has a sample of how to work with time in Jinja templates, so that was my source of inspiration.
The 2 custom Jinja functions that are exposed and can be used in the Jinja templates are:
Function | Description | Parameters | Returns |
---|---|---|---|
time_now | Will tell the time module to find the current spot in time based on the current host’s machine time. | date_format | time_now_epoch and time_now_formatted |
time_slice | Used to divide up time based on earliest and latest. Let’s say I wanted to know, “If I gave you a start window and an end window, and wanted you to divide that time slot into set buckets of time. Given my 3rd event, give me back a time period that fits the expected slice.” | earliest – earliest time in epoch to start slicing time periods; latest – latest time in epoch to end slicing time periods; count – which slice to use; slices – total number of slices to divide time into; date_format – Python date format you want the results to be formatted in | time_target_epoch and time_target_formatted |
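One way to read the time_slice description is sketched below in Python. It assumes the window is cut into equal parts and that the lower bound of the chosen part is what comes back. It only illustrates the arithmetic under that assumption and is not Eventgen’s actual code; `slice_start` is a hypothetical name:

```python
def slice_start(earliest, latest, count, slices):
    """Return the epoch at which slice number `count` begins.

    Assumption: the window [earliest, latest] is cut into `slices`
    equal parts and the lower bound of the chosen part is returned.
    """
    slice_size = (latest - earliest) / slices
    return earliest + slice_size * count


# A 180-second window cut into 180 slices gives one-second steps,
# so slice 25 starts 25 seconds after the beginning of the window.
print(slice_start(1000, 1180, count=25, slices=180))  # 1025.0
```

Read this way, passing a random count turns time_slice into a random timestamp picker with a resolution of one slice.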
As the table shows, there is no function that takes an epoch timestamp and returns a nicely formatted date and time. While I know Python well enough to extend the jinja generator with such a function, I’ve decided not to do it for the sake of easier maintenance later (one would have to re-patch the Jinja generator every time the Eventgen app is updated).
Also, while you can specify date_format in the above 2 functions, it relies on Python’s out-of-the-box strftime function, which lacked some flexibility in my case (like offering microseconds but not milliseconds, and not being able to put a : in the timezone offset).
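For comparison, this is roughly what such an epoch formatting helper could look like in plain Python. It is not part of Eventgen, and `format_epoch` is a hypothetical name; it just shows that `isoformat` produces both the milliseconds and the colon in the offset that strftime could not give me:

```python
from datetime import datetime, timedelta, timezone


def format_epoch(epoch, offset_hours):
    """Format epoch seconds as ISO 8601 with milliseconds and a +HH:MM offset."""
    tz = timezone(timedelta(hours=offset_hours))
    return datetime.fromtimestamp(epoch, tz=tz).isoformat(timespec="milliseconds")


print(format_epoch(1.5, 10))  # 1970-01-01T10:00:01.500+10:00
```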
My timestamp whinging is done so let’s…
Create your Jinja template.
Based on the content of eventgen.conf, mine was <SPLUNK_HOME>/etc/apps/my_app/samples/templates/mos_jinja.template.
Here is how it looks inside:
{#
session_seq,start_time,end_time,conversational_mos,caller_ip_addr,callee_ip_addr,caller_capture_dev,callee_capture_dev,caller_cpu_name,callee_cpu_name,caller_render_dev,callee_render_dev,callee_pool,caller_pool,pool
1,2022-03-24T01:36:31.277+11:00,2022-03-24T01:37:31.277+11:00,1.2,10.0.12.150,10.0.12.123,Transmit (2- Plantronics DA45),Transmit (2- xyz),Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz,Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz,Remote Audio,Remote Audio,ABCPool2.asdf,XYZPool45.asdf,BATPOOL123.asdfas.com.au
#}
{%- time_now -%}
{%- time_slice earliest=eventgen_earliest_epoch-(timezone_offset*3600), latest=eventgen_latest_epoch-(timezone_offset*3600), count=(range(min_duration, max_duration) | random) , slices=max_duration -%}
{% set end_time = time_target_epoch %}
{% set end_time_formatted = time_target_formatted %}
{%- time_slice earliest=end_time-max_duration, latest=end_time, count=(range(min_duration, max_duration) | random), slices=max_duration -%}
{% set start_time = time_target_epoch %}
{% set start_time_formatted = time_target_formatted %}
{% set duration = end_time - start_time %}
{% set callee_ip_addr = ip_range_list|random + "." + range(1,254)|random|string + "." + range(1,254)|random|string %}
{% set caller_ip_addr = ip_range_list|random + "." + range(1,254)|random|string + "." + range(1,254)|random|string %}
{% set callee_capture_dev = capture_device_list|random %}
{% set caller_capture_dev = capture_device_list|random %}
{% set callee_cpu_name = cpu_name_list|random %}
{% set caller_cpu_name = cpu_name_list|random %}
{% set callee_pool = "POOL00" + range(0,5)|random|string + "." + corp_domain%}
{% set caller_pool = "POOL00" + range(0,5)|random|string + "." + corp_domain%}
{% set pool = "POOL00" + range(0,5)|random|string + "." + corp_domain%}
{"_time":"{{ time_now_epoch }}", "_raw":"1,{{ start_time_formatted }}.000+{{ timezone_offset}}:00,{{ end_time_formatted }}.000+{{ timezone_offset}}:00,{{ (range(0, 50) | random)/10 }},{{ callee_ip_addr }},{{ caller_ip_addr }},{{ callee_capture_dev }},{{ caller_capture_dev }},{{ callee_cpu_name }},{{ caller_cpu_name }},Remote Audio,Remote Audio,{{ callee_pool }},{{ caller_pool }},{{ pool }}"}
Template decipher time (I will skip some lines that perform a function similar to one previously described):
{# ...#}
– it’s just a comment that reminds me (or anyone looking at the template) what the expected sample format is
{%- time_now -%}
– we are calling the time_now function to be able to use the time_now_epoch variable
{%- time_slice earliest=eventgen_earliest_epoch-(timezone_offset*3600), latest=eventgen_latest_epoch-(timezone_offset*3600), count=(range(min_duration, max_duration) | random) , slices=max_duration -%}
–
- we are calling the time_slice function to get a random timestamp for the call end_time that is within the time range of the current generation cycle
- for some reason the timestamp returned didn’t respect the machine timezone, so I was getting events in the future; that is why timezone_offset*3600 is subtracted from both earliest and latest
{% set end_time = time_target_epoch %}
– create a new variable end_time and assign it the value that was returned by the time_slice function
{% set end_time_formatted = time_target_formatted %}
– create a new variable end_time_formatted and assign it the nicely formatted value from the time_slice function
{%- time_slice earliest=end_time-max_duration, latest=end_time, count=(range(min_duration, max_duration) | random), slices=max_duration -%}
– run the time_slice function again to get a random start_time for the call
{% set callee_capture_dev = capture_device_list|random %}
– assign the callee_capture_dev variable a value chosen at random from the capture_device_list that we defined in eventgen.conf
{% set callee_pool = "POOL00" + range(0,5)|random|string + "." + corp_domain%}
– here we create a callee_pool variable whose value is a concatenation of the fixed string “POOL00”, a random integer from 0 to 4 (range(0,5) excludes the upper bound, and the integer has to be converted to a string before concatenation), “.” and the corp_domain (that was defined in eventgen.conf).
{"_time":"{{ time_now_epoch }}", "_raw":" 1,{{ start_time_formatted.....
– this line builds the actual event. We don’t need _time for our purposes (dumping these samples to a file), but it is required in other output modes.
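If the filter chains are hard to read, here is the same random field logic as a short Python sketch. It only mirrors what the template expressions do; `random_ip` and `random_pool` are hypothetical names and nothing here is used by Eventgen:

```python
import random


def random_ip(prefixes, rng=random):
    """Mirror ip_range_list|random + "." + range(1,254)|random|string + ..."""
    # range(1, 254) stops at 253, exactly like the Jinja range() in the template.
    octets = [str(rng.choice(range(1, 254))) for _ in range(2)]
    return ".".join([rng.choice(prefixes)] + octets)


def random_pool(domain, rng=random):
    """Mirror "POOL00" + range(0,5)|random|string + "." + corp_domain."""
    return "POOL00" + str(rng.choice(range(0, 5))) + "." + domain


print(random_ip(["10.0", "192.168"]))
print(random_pool("whatever.com"))
```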
Resulting samples
1,2022-04-11T15:23:49.000+10:00,2022-04-11T15:25:02.000+10:00,0.3,192.168.17.72,10.0.137.210,Device Type 5,Device Type 4,CPU @ 1.60GHz,CPU @ 2.60GHz,Remote Audio,Remote Audio,POOL000.whatever.com,POOL002.whatever.com,POOL002.whatever.com
1,2022-04-11T15:24:05.000+10:00,2022-04-11T15:25:01.000+10:00,4.3,192.168.207.220,192.168.15.245,Device Type 4,Device Type 5,CPU @ 2.60GHz,CPU @ 5.60GHz,Remote Audio,Remote Audio,POOL004.whatever.com,POOL002.whatever.com,POOL001.whatever.com
1,2022-04-11T15:24:34.000+10:00,2022-04-11T15:25:01.000+10:00,1.5,192.168.101.237,192.168.217.187,Device Type 5,Device Type 2,CPU @ 3.60GHz,CPU @ 1.60GHz,Remote Audio,Remote Audio,POOL002.whatever.com,POOL003.whatever.com,POOL003.whatever.com
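A quick way to confirm that the generated records meet the timestamp requirement is to parse them back and compute the call durations. Here is a minimal Python sketch using only the first four columns of the three events above; `call_durations` is a name I made up for illustration:

```python
import csv
from datetime import datetime

# Sequence, start, end and MOS columns of the three events shown above;
# the remaining columns are left out to keep the example short.
sample = """\
1,2022-04-11T15:23:49.000+10:00,2022-04-11T15:25:02.000+10:00,0.3
1,2022-04-11T15:24:05.000+10:00,2022-04-11T15:25:01.000+10:00,4.3
1,2022-04-11T15:24:34.000+10:00,2022-04-11T15:25:01.000+10:00,1.5
"""


def call_durations(text):
    """Return the duration in seconds of every call record in `text`."""
    durations = []
    for row in csv.reader(text.splitlines()):
        start = datetime.fromisoformat(row[1])
        end = datetime.fromisoformat(row[2])
        durations.append((end - start).total_seconds())
    return durations


durations = call_durations(sample)
print(durations)  # [73.0, 56.0, 27.0]
```

All three durations are positive and below max_duration, so every call starts before it ends.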