Splunk Archives - ISbyR (Infrequent Smarts by Reshetnikov)
https://isbyr.com/tag/splunk/

Predicting multiple metrics in Splunk
https://isbyr.com/predicting-multiple-metrics-in-splunk/ - Thu, 15 Dec 2022

Splunk has a predict command that can be used to predict a future value of a metric based on historical values. This is not a Machine Learning or an Artificial Intelligence functionality, but a plain-old-statistical analysis.

So if we have a single metric, we can produce a nice prediction for the future (over a definable span) based on historical results, but predicting multiple metrics in Splunk might not be as straightforward.

For example, I want to see a prediction for a used_pct metric for the next 30 days, based on the max daily reading for the last 30 days.

Easy (for a single metric/dimension):

Just get the metric (field) you want to predict into a timechart, then add the predict command followed by the field name of interest and the future_timespan parameter:

| mstats max("latest_capacity|used_space") as used_space, max("latest_capacity|total_capacity") as total_capacity WHERE "index"="telegraf_metrics" "name"="vmware_vrops_datastore" sourcename=bla_vsan01 span=1d BY sourcename 
| eval used_pct=round(used_space/total_capacity*100,2) 
| timechart max(used_pct) by sourcename useother=false span=1d
| predict bla_vsan01 future_timespan=30

The resulting graph shows the historical (last 30 days) readings (the wiggly line) and the prediction: a straight line for the average predicted value with a "fan" of upper/lower 95th-percentile bounds around it.

Splunk single metric prediction timechart

BUT! “we don’t know how many storage instances we will have! we can’t just hardcode all the fields in the predict command or create a panel/search for each!”

I know, I know.

Predicting multiple metrics

Here it becomes a bit tricky. Remember that you need to specify all the fields you want to predict as part of the predict command: | predict MY_FIELD1, MY_FIELD2 ....

The way to deal with it is the map command https://docs.splunk.com/Documentation/Splunk/9.0.1/SearchReference/Map.

What it does is execute the enclosed parametrised search once for each input result, substituting the provided parameters.

| mstats max("latest_capacity|used_space") as used_space WHERE "index"="telegraf_metrics" "name"="vmware_vrops_datastore" sourcename != "*local*" span=1d BY sourcename 
| stats values(sourcename) as sourcename 
| mvexpand sourcename 
| map 
    [| mstats max("latest_capacity|used_space") as used_space, max("latest_capacity|total_capacity") as total_capacity WHERE "index"="telegraf_metrics" "name"="vmware_vrops_datastore" sourcename=$sourcename$ span=1d by sourcename 
    | eval used_pct=round(used_space/total_capacity*100,2) 
    | timechart max(used_pct) span=1d by sourcename useother=false limit=0 
    | predict $sourcename$ future_timespan=30 
        ] maxsearches=30

The first part of the SPL (from | mstats up to | map) is used to prepare the list of sourcenames (storages) that will be passed in place of the $sourcename$ token.

The second part is the same SPL from before, with the literal sourcename replaced by the token.

Here is the visualisation of the above SPL using the “Line Chart” option:

Splunk predict visualisation of a single metrics

I personally find the same visualisation a bit more readable when using an "Area Chart".

The actual measurements are visualised using the shaded areas, while the predictions are drawn as lines.

Some thoughts about putting it all in a dashboard

  • Save the search as a saved search that will run hourly or maybe even daily. Then update the SPL to load results from the saved search. Pass <USER>:<APP>:<SAVED_SEARCH_NAME> to the loadjob command

| loadjob savedsearch="admin:my_app:vmware_vrops_datastore usage prediction"

  • Add a dropdown selector to allow us to either see all storages or focus on a single storage at a time, and update the SPL with tokens:

| fields + _time $tkn_datastore$, prediction($tkn_datastore$), lower95(prediction($tkn_datastore$)), upper95(prediction($tkn_datastore$))

  • Add static "threshold" lines by adding the following to the SPL:

| eval WARN=70, ERROR=90

The resulting SPL:

| loadjob savedsearch="admin:def_sahara:vmware_vrops_datastore usage prediction"
| fields + _time $tkn_datastore$, prediction($tkn_datastore$), lower95(prediction($tkn_datastore$)), upper95(prediction($tkn_datastore$))
| eval WARN=70, ERROR=90
  • And maybe add an annotation "tick" to show where "now" is, indicating that everything before it (to the left) is collected data and everything after it (to the right) is a prediction. That is done by adding the following <search type="annotation"> section to the panel's XML.
<search type="annotation">
  <query>
| makeresults 
| bin _time span=1d 
| eval annotation_label = "Now" , annotation_color = "#808080"
| table _time annotation_label annotation_color
  </query>
  <earliest>-24h@h</earliest>
  <latest>now</latest>
</search>
Splunk annotation snippet

If you have too many storages the graph might become unreadable, so one option is to first pre-calculate the predictions and then chart only the metrics that are predicted to cross a threshold, for example:
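A rough sketch of that idea, assuming the saved search from the dashboard section above and a 70% threshold (the ds_pred and predicted_peak field names are made up for the illustration):

| loadjob savedsearch="admin:my_app:vmware_vrops_datastore usage prediction"
| fields - lower*, upper*
| untable _time series value
| rex field=series "^prediction\((?<ds_pred>.+)\)$"
| eval ds = coalesce(ds_pred, series)
| eventstats max(eval(if(isnotnull(ds_pred), value, null()))) as predicted_peak by ds
| where predicted_peak > 70
| fields - ds, ds_pred, predicted_peak
| xyseries _time series value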

Another option is to show all our "final" predictions in a table format (instead of the timechart):

| loadjob savedsearch="admin:my_app:vmware_vrops_datastore usage prediction" 
| fields - upper*, lower* 
| stats latest(*) as current_* 
| transpose 
| rename "column" AS m, "row 1" AS r 
| eval r = round(r,2) 
| rex field=m "(?:(current_prediction\((?<datastore_name_predict>.*)\))|(current_(?<datastore_name_current>.*)))" 
| eval datastore_name=coalesce(datastore_name_predict, datastore_name_current), r_type = if(isnull(datastore_name_predict),"current","prediction") 
| stats values(r) as r by datastore_name, r_type 
| xyseries datastore_name, r_type, r 
| eval WARN=40, ERROR=60 
| eval status = if(prediction > ERROR,"ERROR",if(prediction > WARN, "WARN","OK")) 
| sort - prediction 
| fields - ERROR, WARN 
| table datastore_name, current, prediction, status 
| rename datastore_name AS "Datastore", current as "Last Value", prediction as "Predicted Value", status as "Predicted Status"
Splunk predict visualisation of multiple metrics in a table format

More posts about Splunk

Splunk Failed to apply rollup policy to index… Summary span… cannot be cron scheduled
https://isbyr.com/splunk-failed-to-apply-rollup-policy-to-index-summary-span-cannot-be-cron-scheduled/ - Wed, 14 Dec 2022

I started playing with Splunk Metrics rollups, but then I tried to step out of the box and got a "Failed to apply rollup policy to index='…'. Summary span='7d' cannot be cron scheduled" error.

Splunk metrics rollup configuration

While you can configure the rollup policies in the Splunk UI, you can also do it using Splunk's REST API or Splunk configuration files (metric_rollups.conf).
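For reference, here is a minimal sketch of what a metric_rollups.conf stanza can look like (the index name, target index and aggregation below are made up; check the metric_rollups.conf spec for your Splunk version):

[index:telegraf_metrics]
defaultAggregation = avg
rollup.1.rollupIndex = telegraf_metrics_rollup_1h
rollup.1.span = 1h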

Here is the interesting part: your rollup span value is limited to the values available in the UI drop-down, no matter which configuration method you use.

The options that you can use are:

  • Minute spans: 1m*, 2m*, 3m*, 5m, 6m, 10m, 12m, 20m, 30m, 60m (* – requires a limits.conf update)
  • Hourly spans: 1h (= 60m), 3h, 4h, 6h, 8h, 12h, 24h
  • Daily span: 1d (= 24h)

Splunk rollup spans allowed for metrics rollups

If you try to use any other value, you will see the following error in splunkd.log: 12-12-2022 14:04:08.645 +1100 ERROR MetricsRollupPolicy [12181677 TcpChannelThread] - Failed to apply rollup policy to index='...'. Summary span='7d' cannot be cron scheduled.

This is not (as of 15/12/2022) a documented “feature”.

Other posts about Splunk

How to collect StatsD metrics from rippled server using Splunk
https://isbyr.com/how-to-collect-statsd-metrics-from-rippled-server-using-splunk/ - Fri, 02 Dec 2022

The XRP Ledger (XRPL) is a decentralized, public blockchain, and the rippled server software (rippled in future references) powers it. rippled connects to the peer-to-peer network, processes transactions, and maintains some ledger history.

rippled is capable of sending its telemetry data using StatsD protocol to 3rd party systems like Splunk.

Overview

Prerequisites – rippled and Splunk are installed

Configure Splunk – Configure indexes and Inputs in Splunk

Configure rippled – Configure rippled to send metrics

View rippled metrics in Splunk – enjoy the fruits of your labour 🙂

Prerequisites

Since we are talking about collecting data from rippled servers into Splunk Enterprise (or simply "Splunk" in future references), it makes sense that we need these 2 pieces of software running.

As a rule of thumb, rippled and Splunk Enterprise should not be running on the same machine unless it’s a dev environment.

Make sure that there is network connectivity from the Universal Forwarder (UF) to Splunk Enterprise over port 9997 (the default port for Splunk-to-Splunk data flow) and port 8089 (optional, only needed if you intend to use the Splunk Deployment Server functionality, which I will not dive into in this post).

rippled to Splunk StatsD metrics flow

Configure Splunk

Splunk Enterprise

Create a new Metrics index, either in Splunk UI or in indexes.conf .

[xrpl_metrics]
coldPath = $SPLUNK_DB/xrpl_metrics/colddb
homePath = $SPLUNK_DB/xrpl_metrics/db
thawedPath = $SPLUNK_DB/xrpl_metrics/thaweddb
datatype = metric

If using a UF to send data, enable a receiving splunktcp input in the Splunk UI or in inputs.conf:

[splunktcp://9997]

Splunk UF

I’m using UF in my deployment, as it will not only collect telemetry data, but also the rippled logs.

Configure inputs.conf to listen to StatsD feed that will be coming out of rippled

[udp://6025]
connection_host = none
index = xrpl_metrics
sourcetype = statsd

Configure outputs.conf for UF to send data to Splunk (if that is not already configured)

[tcpout]
defaultGroup = primary_indexers
forceTimebasedAutoLB = true
forwardedindex.2.whitelist = (_audit|_introspection|_internal)

[tcpout:primary_indexers]
server = splunk_enterprise_server:9997

Configure rippled

Add the following stanza to your rippled.cfg (located by default in /opt/rippled/etc folder)

[insight]
server=statsd
address=127.0.0.1:6025
prefix=rippled

Restart rippled server [sudo] systemctl restart rippled.service

View rippled metrics in Splunk

If you use the index name from this post, your rippled StatsD metrics will be visible in the xrpl_metrics index. You can preview them:

| mpreview where index=xrpl_metrics

Preview of rippled StatsD metrics in Splunk

You will notice that metric_name also includes rippled, as we have provided it as the prefix in rippled.cfg.

One of the interesting metrics is the rippled.State_Accounting.*_duration which shows you how long the rippled server was in a certain state.

What is convenient is that only the metric that reflects the current state is reported at any given time (in StatsD).

So you can easily create a dashboard panel that shows which state your rippled server was in over a period of time:

| mstats count("rippled.State_Accounting.*_duration") as *  WHERE "index"="xrpl_metrics" span=1m
rippled state over time in Splunk

Plotting Splunk with the same metric and dimension names shows NULL
https://isbyr.com/plotting-splunk-with-the-same-metric-and-dimension-names-shows-null/ - Wed, 05 Oct 2022

When you try plotting a Splunk metric split by a dimension that has the same name as the metric itself, the graph will show NULL instead of the dimension values.

Splunk timechart visualisation with breakdown by dimension with the same metric and dimension names will show NULL

The Problem

Let’s rewind a little.

Below is the payload that is sent to Splunk HEC and you will notice that there are 2 “statuses”:

  • "status": "success" – which is one of the dimensions and it can represent a collector/monitor status
  • "metric_name:status": 0 – which is the actual metric value that was collected by the collector/monitor
{
    "time": 1664970920,
    "event": "metric",
    "host": "host_5.splunk.com",
    "index": "d_telegraf_metrics",
    "fields": {
        "collector": "collector_a",
        "status": "success",
        "metric_name:query_time_seconds": 10.869,
        "metric_name:status": 0
    }
}

In a perfect world you would probably rename one of these so as not to confuse the end user in Splunk, but living in a perfect world is not always the case.

As a result, we end up with NULLs in the graphs 🙁

The Solution

Luckily for us, Splunk's Search Processing Language (SPL) is very powerful and flexible, and with two little modifications to the "original" SPL (produced by the Metrics Analyzer) we can solve the issue.

All you need to do is :

  1. instead of using prestats=true, rename the metric function's result with the as clause.
  2. update the avg function in the timechart command to use the renamed field name.

Original SPL:

original Splunk SPL that was causing NULL in graphs

Revised SPL:

fixed Splunk SPL that shows the breakdown by dimension with the same name as the metric
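Since the original and revised searches are only shown as screenshots here, below is a hedged reconstruction of roughly what they look like for the payload above (the index, span and the status_value name are illustrative, not the exact SPL from the screenshots).

Original (NULL in the split-by):

| mstats prestats=true avg(status) WHERE index=d_telegraf_metrics span=10s BY status
| timechart avg(status) span=10s BY status

Revised (metric result renamed with as, and the timechart updated to use the new name):

| mstats avg(status) as status_value WHERE index=d_telegraf_metrics span=10s BY status
| timechart avg(status_value) span=10s BY status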

The Result

Splunk fixed timechart visualisation

More posts about Splunk

Splunk Eventgen Jinja templating
https://isbyr.com/splunk-eventgen-jinja-templating/ - Wed, 13 Apr 2022

I’ve recently dipped my toes into Splunk Eventgen (Jinja templating). It’s an awesome app that allows you to generate sample events that can be ingested by Splunk (or for any other reason).

EventGen has two ways of configuring the event content generation:

  • Traditional – where you specify a sample file and provide regexes that will be used to replace static content in the sample file with the required values
  • Jinja Templating – where you use Jinja templating engine to create the events.

While the traditional way is quite straightforward, the event format I was after had a few nuances that made it unsuitable, so I had to fiddle with Splunk Eventgen Jinja templating.

Requirements

Generate Skype For Business (Media Quality Summary) MOS data. This data is basically “call” records, so it will have:

  • Call Start and End timestamps
  • Participants (callee and caller) details (username/device used/ IP)
  • Quality of the call
  • The Skype for Business pool used

Most of my time was spent on timestamp issues: randomly generated timestamps will not cut it, as the Start timestamp should be before the End timestamp, and I also wanted to be able to define the randomness range, i.e. the minimum and maximum duration of a call.

Result

Eventgen can either be installed as a Splunk app or can be installed “outside” of Splunk Enterprise and be used as a command-line tool. In my case, I took the former approach.

Install and enable the Splunk Eventgen app

Download the app from Splunk Base.

Install it (either via the Splunk UI or by extracting the Eventgen app into your <SPLUNK_HOME>/etc/apps folder).

Create a new inputs.conf file, either under <SPLUNK_HOME>/etc/apps/SA-Eventgen/local/ (that's the default extracted folder name for the Eventgen app) or under <SPLUNK_HOME>/etc/apps/my_app/local/ (that's my preference), with the following stanza to enable Eventgen:

[modinput_eventgen://default]
disabled = false

Create your Eventgen configuration

Eventgen generators are defined in eventgen.conf files, so let's create one (as mentioned before, I'll be doing it under <SPLUNK_HOME>/etc/apps/my_app/local/).

[conversations_mos_jinja.sample]
earliest = -15s
latest = now
interval = 15
count = 10

outputMode = file
fileName = /tmp/mos_jinja.log

generator = jinja
jinja_template_dir = templates
jinja_target_template = mos_jinja.template
jinja_variables = { \
    "min_duration": 10, \
    "max_duration": 180, \
    "timezone_offset": 10, \
    "corp_domain": "whatever.com", \
    "capture_device_list": [ "Device Type 1","Device Type 2","Device Type 3","Device Type 4","Device Type 5" ], \
    "cpu_name_list": ["CPU @ 1.60GHz","CPU @ 2.60GHz","CPU @ 3.60GHz","CPU @ 4.60GHz","CPU @ 5.60GHz"],\
    "ip_range_list": ["10.0","192.168"]}

Let’s decipher all these lines:

stanza name – with the Jinja templating method this is just a meaningful name (but if you were to use the traditional approach, the stanza name is actually the name of the sample file that is used to generate events).

earliest – the earliest possible event timestamp

latest – the latest possible event timestamp

interval – how often the generator runs

count – how many events should be generated each run

outputMode – I needed to write to a file on disk, but one can use other options to ingest directly into Splunk, send to HEC, and so on.

fileName – which file to write the results to, a few notes here:

  • make sure that the user that is running Splunk service has permissions to write to this destination
  • Eventgen has a built-in file rotation mechanism, so you don't need to worry about that. If the default values of keeping up to 5 files of 10MB each don't work for you, they can be overridden using fileBackupFiles and fileMaxBytes respectively (see the snippet below).
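A hedged snippet of those two settings in the stanza (the values below are made up for illustration):

fileName = /tmp/mos_jinja.log
fileBackupFiles = 3
fileMaxBytes = 52428800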

generator – which generator to use, obviously (I mean that’s the main purpose of this whole post) we are using the jinja one

jinja_template_dir – path to the Jinja template folder relative to Eventgen’s sample folder

jinja_target_template – the name of the “root” Jinja template that will be used.

jinja_variables – here we can pass variables that will be used by the Splunk Jinja templating engine

Craft your Jinja template

I’ll dive into the template in a bit, but first:

Variables available for every sample

eventgen_count – The current count

eventgen_maxcount – The max count requested in the stanza

eventgen_earliest – The earliest specified item in ISO8601

eventgen_earliest_epoch – earliest converted to epoch time based on specified value and host time

eventgen_latest – the latest specified time item in ISO8601

eventgen_latest_epoch – latest time converted to epoch

Timestamps in Splunk Eventgen Jinja Templating are a pain….

I think many will agree that time/timestamps/dates are always hard, no matter what programming language or tool you are using, and this is no different in Splunk Eventgen Jinja templating.

Eventgen GitHub repo has a sample of how to work with time in Jinja templates, so that was my source of inspiration.

The 2 custom Jinja functions that are exposed and can be used in the Jinja templates are:

time_now – tells the time module to find the current spot in time based on the current host machine's time.
    Parameters: date_format
    Returns: time_now_epoch and time_now_formatted

time_slice – used to divide up time between earliest and latest. Let's say: "If I gave you a start window and an end window, and wanted you to divide that time slot into set buckets of time, then, given my 3rd event, give me back a time period that fits the expected slice."
    Parameters:
        earliest – earliest time in epoch to start the slice time periods
        latest – latest time in epoch to end the slice time periods
        count – which slice to use
        slices – total number of slices to divide time into
        date_format – Python date format you want the results formatted in
    Returns: time_target_epoch and time_target_formatted

Eventgen Jinja time functions

As you can see, there is no function that lets you provide an epoch timestamp and get back a nicely formatted date and time. While I know Python well enough to extend the Jinja generator with such a function, I decided not to, for the sake of easier maintenance later (one would have to re-patch the Jinja generator every time the Eventgen app is updated).

Also, while you can specify date_format in the above 2 functions, it uses Python's out-of-the-box strftime function, which lacked some flexibility in my case (like supporting only microseconds and not milliseconds, and not being able to have : in the timezone offset).

My timestamp whinging is done so let’s…

Create your Jinja template.

Based on content of eventgen.conf mine was <SPLUNK_HOME>/etc/apps/my_app/samples/templates/mos_jinja.template.

Here is how it looks inside

{# 
session_seq,start_time,end_time,conversational_mos,caller_ip_addr,callee_ip_addr,caller_capture_dev,callee_capture_dev,caller_cpu_name,callee_cpu_name,caller_render_dev,callee_render_dev,callee_pool,caller_pool,pool
1,2022-03-24T01:36:31.277+11:00,2022-03-24T01:37:31.277+11:00,1.2,10.0.12.150,10.0.12.123,Transmit (2- Plantronics DA45),Transmit (2- xyz),Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz,Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz,Remote Audio,Remote Audio,ABCPool2.asdf,XYZPool45.asdf,BATPOOL123.asdfas.com.au
#}

{%- time_now -%}
{%- time_slice earliest=eventgen_earliest_epoch-(timezone_offset*3600), latest=eventgen_latest_epoch-(timezone_offset*3600), count=(range(min_duration, max_duration) | random) , slices=max_duration -%}
{% set end_time = time_target_epoch %}
{% set end_time_formatted = time_target_formatted %}

{%- time_slice earliest=end_time-max_duration, latest=end_time, count=(range(min_duration, max_duration) | random), slices=max_duration -%}
{% set start_time = time_target_epoch %}
{% set start_time_formatted = time_target_formatted %}
{% set duration = end_time - start_time %}

{% set callee_ip_addr = ip_range_list|random + "." + range(1,254)|random|string + "."  + range(1,254)|random|string  %}
{% set caller_ip_addr = ip_range_list|random + "." + range(1,254)|random|string + "."  + range(1,254)|random|string  %}
{% set callee_capture_dev = capture_device_list|random %}
{% set caller_capture_dev = capture_device_list|random %}
{% set callee_cpu_name = cpu_name_list|random %}
{% set caller_cpu_name = cpu_name_list|random %}
{% set callee_pool = "POOL00" + range(0,5)|random|string + "." + corp_domain%}
{% set caller_pool = "POOL00" + range(0,5)|random|string + "." + corp_domain%}
{% set pool = "POOL00" + range(0,5)|random|string + "." + corp_domain%}

{"_time":"{{ time_now_epoch }}", "_raw":"1,{{ start_time_formatted }}.000+{{ timezone_offset}}:00,{{ end_time_formatted }}.000+{{ timezone_offset}}:00,{{ (range(0, 50) | random)/10 }},{{ callee_ip_addr }},{{ caller_ip_addr }},{{ callee_capture_dev }},{{ caller_capture_dev }},{{ callee_cpu_name }},{{ caller_cpu_name }},Remote Audio,Remote Audio,{{ callee_pool }},{{ caller_pool }},{{ pool }}"}

Template decipher time (I will skip some lines that perform a function similar to one previously described):

{# ...#} – it's just a comment that reminds me (or anyone looking at the template) what the expected sample format is

{%- time_now -%} – we are calling the time_now function to be able to use the time_now_epoch variable

{%- time_slice earliest=eventgen_earliest_epoch-(timezone_offset*3600), latest=eventgen_latest_epoch-(timezone_offset*3600), count=(range(min_duration, max_duration) | random) , slices=max_duration -%}

  • we are calling the time_slice function to get a random timestamp for the call end_time that is within the time range of the current generation cycle
  • for some reason the timestamp returned didn't respect the machine timezone, so I was getting events in the future

{% set end_time = time_target_epoch %} – create a new variable end_time and assign it the value that was returned by the time_slice function
{% set end_time_formatted = time_target_formatted %} – create a new variable end_time_formatted and assign it the nicely formatted value from the time_slice function

{%- time_slice earliest=end_time-max_duration, latest=end_time, count=(range(min_duration, max_duration) | random), slices=max_duration -%} – run the time_slice function again to get a random start_time for a call

{% set callee_capture_dev = capture_device_list|random %} – assign to the callee_capture_dev variable a value that is randomly chosen from the capture_device_list that we have defined in eventgen.conf

{% set callee_pool = "POOL00" + range(0,5)|random|string + "." + corp_domain%} – here we create a callee_pool variable whose value is a concatenation of a fixed string ("POOL00"), a random integer from 0-5 (converted to a string for concatenation purposes), ".", and the corp_domain (that was defined in eventgen.conf).

{"_time":"{{ time_now_epoch }}", "_raw":"1,{{ start_time_formatted..... – this line builds the actual event. We don't need the _time for our purposes (dumping these samples to a file), but it is required in other output modes.

Resulting samples

1,2022-04-11T15:23:49.000+10:00,2022-04-11T15:25:02.000+10:00,0.3,192.168.17.72,10.0.137.210,Device Type 5,Device Type 4,CPU @ 1.60GHz,CPU @ 2.60GHz,Remote Audio,Remote Audio,POOL000.whatever.com,POOL002.whatever.com,POOL002.whatever.com
1,2022-04-11T15:24:05.000+10:00,2022-04-11T15:25:01.000+10:00,4.3,192.168.207.220,192.168.15.245,Device Type 4,Device Type 5,CPU @ 2.60GHz,CPU @ 5.60GHz,Remote Audio,Remote Audio,POOL004.whatever.com,POOL002.whatever.com,POOL001.whatever.com
1,2022-04-11T15:24:34.000+10:00,2022-04-11T15:25:01.000+10:00,1.5,192.168.101.237,192.168.217.187,Device Type 5,Device Type 2,CPU @ 3.60GHz,CPU @ 1.60GHz,Remote Audio,Remote Audio,POOL002.whatever.com,POOL003.whatever.com,POOL003.whatever.com

More posts about Splunk

How to Register to Splunk Partner Portal and transfer Certifications and Learning
https://isbyr.com/how-to-register-to-splunk-partner-portal-and-transfer-certifications-and-learning/ - Tue, 25 Jan 2022

Intro

This document will describe how to register to Splunk Partner Portal and transfer Certifications and Learning from your old email to a new one.

Register 

Go to https://partners.splunk.com/ and click “Create your free account today”

When creating a new account, use your @PARTNER.DOMAIN email address; you will also have to use a new username.

After registration and email verification, make sure you can log in (using your Partner email / new username) to the following portals:

  • Partners Portal – https://partners.splunk.com/ (example page: https://partners.splunk.com/English/MSP/profile/account_profile.aspx)
  • Public Education Portal – https://education.splunk.com/user/learning/enrollments
  • Partner Education – LEARNING > PARTNER LEARNING CENTER, https://partnereducation.splunk.com/

If you are getting an error, raise a non-technical Partner+ case from within the Partners Portal (https://partners.splunk.com/English/MSP/sfdc_support/open_non_technical_support_case.aspx)

Transferring your Certificates

Send an email to education_apac@splunk.com and certification@splunk.com (you can send one email to both of them; it will create 2 separate support cases either way).

Dear Education and Certification teams,
Our company recently became a Splunk Partner, while all my
certification and training history is under my customer/personal email.

Can you please transfer my completed certifications and trainings from MY.NAME@OLD.DOMAIN (customer/personal email) to MYNAME@NEW.DOMAIN (Partner email)?
Please note that I have already created the new Partner account and can log in to the Education/Certification portals.
Regards,
ME

As mentioned this will create 2 support cases:

  • Education
  • Certification
  • Education – will transfer your accomplished trainings. Once complete, it can be verified here: https://education.splunk.com/user/learning/achievement
  • Certification – will transfer your existing certificates into your partner profile. Once complete, it can be verified here: https://partners.splunk.com/English/MSP/certification/my_certifications.aspx
  • Certification – will create a new Splunk ID for PearsonVUE. You will receive an email from Splunk with your new Splunk ID. You will then need to create a new account in PearsonVUE using the new Splunk ID here: https://home.pearsonvue.com/splunk. After registration is complete you can see all the exams you are entitled to take here: https://wsr.pearsonvue.com/testtaker/registration/Dashboard/SPLUNK
  • Certification – will transfer your exam records to the new Splunk ID in PearsonVUE. This can take 5-10 days and can be verified here: https://wsr.pearsonvue.com/testtaker/registration/ExamRegistrationHistory/SPLUNK
  • Education – registering for courses with the Splunk Partners discount: go to https://www.splunk.com/en_us/training.html, log in with your new Partner email, select a course and date, and proceed to checkout. You should see the discount added at the "View Cart" step (as per the screenshot here). If you don't, email education_apac@splunk.com describing the issue (Partners discount not being applied) and attach a screenshot of the cart.

Related posts about Splunk

Splunk Connect for Kafka
https://isbyr.com/splunk-connect-for-kafka/ - Fri, 28 May 2021

My journey with Splunk Connect for Kafka.

Splunk Connect for Kafka (aka SC4K) allows you to collect events from the Kafka platform and send them to Splunk. While the sending part (to Splunk) was pretty straightforward to me, the collection part (from Kafka) was very new, as I had no experience with the Kafka ecosystem. So I guess I will start with that.

This will not be a comprehensive guide to the Kafka system or how to run massive Kafka clusters, nor will I be covering all the possible configuration options for SC4K. Rather, I will (hopefully) give you enough information (and jargon) about Kafka Connect to be able to talk to the Kafka admins in your organisation and/or, as it was in my case, to run a distributed Splunk Connect for Kafka cluster yourself. As I am learning this myself, some (many) things might be obvious to others, some might be wrong, and I will be updating this post as I progress with my journey, so do expect some sections to be empty (or have a "TBC" placeholder).

Kafka and Kafka Connect eco-system

As per the official project site: “Apache Kafka is an open-source distributed event streaming platform“.

Kafka is run as a cluster of one or more servers. Some of these servers form the storage layer, called the brokers. ZooKeeper servers are primarily used to track the status of nodes in the Kafka cluster and maintain a list of Kafka topics and messages. Other servers run Kafka Connect to continuously import and export data as event streams.

When you read or write data from/to Kafka, you do this in the form of events. Events are sent to and consumed from topics, where they are organised and durably stored. Very much simplified, a topic is similar to a folder in a filesystem, and the events are the files in that folder.

Topics are partitioned, meaning a topic is spread over a number of “buckets” located on different Kafka brokers.

A good explanation about Kafka Connect can be found here

  • Connector: a job that manages and coordinates the tasks. It decides how to split the data-copying work between the tasks.
  • Task: a piece of work that does the actual data-copying job.
  • Worker: the node that runs the connector and its tasks.
  • Transforms: optional in-flight manipulation of messages
  • Converters: handling serialization and deserialization of data
Kafka Connect Workers

Connectors divide the actual job into smaller pieces (tasks) in order to provide scalability and fault tolerance. The state of the tasks is stored in special Kafka topics, configured with offset.storage.topic, config.storage.topic and status.storage.topic. As a task does not keep its own state, it can be started, stopped and restarted at any time and on any node.

Confusingly, in the documentation, Connector can refer to:

  • Connector Worker / Connector Cluster
  • Specific Sink/Source Connector binary (jar)
  • Instance of a Source/Sink Connector running in a Worker

One more thing: Confluent Platform is a vendor-based distribution that includes the latest release of Kafka plus additional tools and services that make it easier to build and manage an event streaming platform.

Base Installation

As mentioned before, Kafka Connect is part of the Kafka system and is shipped with the Kafka binaries. So download it. It's your/your organisation's choice whether to go with "plain" Apache Kafka or get the Confluent Community/Enterprise Platform (more info here).

Confluent Platform Components

For the sake of simplicity, let's assume "vanilla" Apache Kafka from here on.

Download the latest SC4K archive from GitHub (you will only need the jar file).

Extract Kafka and rename the folder to omit the version:

$ tar -xzvf /tmp/kafka_2.13-2.8.0.tgz -C /opt/
$ mv /opt/kafka_2.13-2.8.0/ /opt/kafka/
$ cd /opt/kafka/

Create a plugins directory and copy (the downloaded) SC4K jar into it

mkdir /opt/kafka/plugins/
cp /tmp/splunk-kafka-connect-v2.0.2.jar /opt/kafka/plugins/

That's it: we now have all the binaries and we can start Splunk Connect for Kafka.

Starting Kafka

The following is only required if you are starting your own Kafka cluster. If you are using an existing Kafka cluster, you can skip it.

Start Zookeeper

$ bin/zookeeper-server-start.sh config/zookeeper.properties

Open a new terminal session and start Kafka Broker

$ bin/kafka-server-start.sh config/server.properties

Simple Worker Configuration

So now we are ready to start our Kafka Connect. Assuming the simplest configuration, the only thing we need to know is the Kafka Broker host and port, which, in case we are using the out-of-the-box setup from the previous step, will be localhost:9092.

Update your config/connect-distributed.properties

One can update the properties in the file one by one, or just copy the content of the file provided in the SC4K GitHub repository.

The "most" important setting is bootstrap.servers; if you are not running the Kafka Brokers on the same host, change it (from the default localhost:9092) accordingly.
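For example (these broker hostnames are placeholders for your environment):

bootstrap.servers=kafka-broker-1.example.com:9092,kafka-broker-2.example.com:9092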

When connecting to an existing cluster you will need to consider a few things about storage topics, encryption, etc

Let’s start the Kafka Connector Worker. Open a new session and run

$ bin/connect-distributed.sh config/connect-distributed.properties

If you want to have more than one Worker in your cluster, just repeat the steps described in this section.

You can check the status of your Worker by running:

curl http://localhost:8083/

All the following configuration (for a connector running in distributed mode) is done via its REST API. I am using curl here, but one can of course use any other client of their choice, like Postman.

Splunk Connect for Kafka Connector Instance Configuration

So, we have a Kafka Connector cluster running, but it is sitting "idle" (just burning our CPU cycles) as it has no defined Connector Instances (I told you the "connector" term is quite ambiguous in the Kafka ecosystem).

Let's say you have a topic named web_json where some application is pushing events, and you want these to be indexed in the web index with the sc4k_web_json sourcetype.

You will need to know the URL of Splunk’s HEC endpoint (which I assume you already have running) and a HEC token to use (i.e. already configured on Splunk side).

Now let's run the REST API call that creates the connector instance.
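A hedged sketch of what that call looks like against the Worker's REST API (the connector name, tasks.max, HEC URI and token below are placeholders; double-check the property names against the SC4K documentation for the version you deploy):

curl -s -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "sc4k_web_json",
    "config": {
      "connector.class": "com.splunk.kafka.connect.SplunkSinkConnector",
      "tasks.max": "3",
      "topics": "web_json",
      "splunk.hec.uri": "https://my-splunk-hec.example.com:8088",
      "splunk.hec.token": "00000000-0000-0000-0000-000000000000",
      "splunk.indexes": "web",
      "splunk.sourcetypes": "sc4k_web_json"
    }
  }'

You can then confirm it was created and check its state with curl http://localhost:8083/connectors/sc4k_web_json/status.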

Going Above Base

Storage topics

SC4K (or any other Kafka Connector) in distributed mode relies on 3 topics for storing its configuration and progress. These are defined in config/connect-distributed.properties:

  • config.storage.topic – The name of the Kafka topic where connector configurations are stored
  • offset.storage.topic – The name of the Kafka topic where connector offsets are stored
  • status.storage.topic – The name of the Kafka topic where connector and task status are stored

If you are running the Broker(s) in a local (or otherwise permissive) environment, the Connector will create these topics for you, with the replication factor and number of partitions based on the values provided in *.storage.replication.factor and *.storage.partitions respectively.

In a more restricted (read: enterprise) environment, the Kafka cluster administrator might create these for you beforehand. If that is the case, the important thing is to get the name of each topic that was created in the cluster and update the config/connect-distributed.properties file with them, while also removing the *.storage.replication.factor and *.storage.partitions properties, for example:
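A hedged example of the relevant part of config/connect-distributed.properties (the topic names here are made up; use whatever the Kafka admins actually created):

config.storage.topic=sc4k-connect-configs
offset.storage.topic=sc4k-connect-offsets
status.storage.topic=sc4k-connect-status
# and remove/comment out the auto-creation settings:
# config.storage.replication.factor=3
# offset.storage.replication.factor=3
# status.storage.replication.factor=3
# offset.storage.partitions=25
# status.storage.partitions=5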

Hardening

Let’s talk a bit about the certificates that are used in SC4K. Of course these will greatly depend on the environment configuration in which you will be running.

Here are the touch points that I had to care about:

  • Splunk HEC (server) certificate
  • Kafka Broker (server) certificate
  • SC4K (client) certificate

Splunk HEC (server) certificate

Splunk's HEC endpoint was using HTTPS and had a certificate provisioned by the organisation's root CA. So in order for SC4K to trust the HEC certificate, we need to add the organisation's root CA to SC4K's truststore.
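A hedged example of the SC4K connector properties involved (added to the connector config in the REST call above; the paths and password are placeholders, and the property names are worth double-checking against the SC4K documentation for your version):

"splunk.hec.ssl.validate.certs": "true",
"splunk.hec.ssl.trust.store.path": "/opt/kafka/ssl/org-root-ca.truststore.jks",
"splunk.hec.ssl.trust.store.password": "changeit"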

Kafka Broker (server) certificate

Same story as with HEC – the Broker endpoint is hardened with a certificate signed by the organisation's root CA, so that CA needs to be in the truststore used for the Broker connection.

SC4K (client) certificate

In this environment, access to topics was implemented using SSL with client certificates, so SC4K had to be configured to use a client certificate to communicate with the Kafka Broker.
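A hedged sketch of the worker-side settings in config/connect-distributed.properties covering both the Broker (server) certificate trust and the SC4K (client) certificate (paths and passwords are placeholders). The top-level settings apply to the Worker's own connections, while the consumers used by sink connectors take the consumer.-prefixed copies of the same settings:

security.protocol=SSL
ssl.truststore.location=/opt/kafka/ssl/org-root-ca.truststore.jks
ssl.truststore.password=changeit
ssl.keystore.location=/opt/kafka/ssl/sc4k-client.keystore.jks
ssl.keystore.password=changeit
ssl.key.password=changeit

consumer.security.protocol=SSL
consumer.ssl.truststore.location=/opt/kafka/ssl/org-root-ca.truststore.jks
consumer.ssl.truststore.password=changeit
consumer.ssl.keystore.location=/opt/kafka/ssl/sc4k-client.keystore.jks
consumer.ssl.keystore.password=changeit
consumer.ssl.key.password=changeit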

Schema Registry

Avro Converters

References

Related posts about Splunk

Get Score Breakdown for Pearson VUE Exam
https://isbyr.com/how-to-get-score-breakdown-for-pearson-vue-exam/ - Fri, 28 May 2021

When taking an exam with Pearson VUE, you only get a pass/fail result, but what if you want to know which sections of the exam you scored low on, so you can brush up on the relevant skills? Here is how to get the score breakdown for a Pearson VUE exam.

Login to your account and click “View Score Reports”.

It will show all the exams you’ve taken.

Click the one you want to get score breakdown for.

Now it will show you the Exam results letter which will tell you if you have passed or failed.

To get the score breakdown for a Pearson VUE exam, open the browser developer tools: Option + ⌘ + J (on macOS) or Shift + CTRL + J (on Windows/Linux).

Click “Console” tab and type console.log(result) and hit Enter.

You will need to expand the exam and child reportingGroups sections.

And voilà, you have your actual score as well as the score breakdown for the exam.

Pearson VUE exam results breakdown

Related posts about Splunk

Splunk Certification Tracks
https://isbyr.com/splunk-certification-tracks/ - Sun, 01 Nov 2020

So here is my understanding of the current Splunk Certification Tracks.

Of course you can go to the "source", https://www.splunk.com/en_us/training.html, but maybe this visual representation will help someone.

Splunk Certification Tracks

The dotted lines represent Recommended Prerequisites while solid lines are Mandatory Prerequisites for the different Splunk Certification Tracks

There is no meaning to the line colours by the way 🙂

Related posts about Splunk

Splunk – List REST API users and their IPs
https://isbyr.com/splunk-list-rest-api-users-and-their-ips/ - Thu, 17 Oct 2019

Want to get a list of REST API users and their IPs?

Run this search

index=_internal
host IN(SH1,SH2,SH3)
sourcetype=splunkd_access
user != "-"
clientip != "IP_of_SH1" clientip != "IP_of_SH2" clientip != "IP_of_SH3"
NOT TERM(127.0.0.1)
NOT TERM(splunk-system-user)
| stats values(clientip) by user

The limitation is that if the users are going via a Load Balancer, you will see the Load Balancer's IP as the clientip.
