Predicting multiple metrics in Splunk

Splunk has a predict command that can be used to predict a future value of a metric based on its historical values. This is not machine learning or artificial intelligence functionality, just plain old statistical analysis.

So if we have a single metric, based on historical results we can produce a nice prediction for the future (of definable span), but predicting multiple metrics in Splunk might not be as straightforward.

For example, I want to see a prediction for a used_pct metric for the next 30 days, based on the maximum daily reading for the last 30 days.

Easy (for a single metric/dimension):

Just get the metric (field) you want to predict into a timechart and add the predict command, followed by the name of the field to predict and the future_timespan parameter:

| mstats max("latest_capacity|used_space") as used_space, max("latest_capacity|total_capacity") as total_capacity WHERE "index"="telegraf_metrics" "name"="vmware_vrops_datastore" sourcename=bla_vsan01 span=1d BY sourcename 
| eval used_pct=round(used_space/total_capacity*100,2) 
| timechart max(used_pct) by sourcename useother=false span=1d
| predict bla_vsan01 future_timespan=30

The resulting graph shows the historical readings for the last 30 days (the wiggly line) followed by the prediction: a straight line for the predicted value, with a “fan” marking the upper and lower bounds of the 95% confidence interval.

Splunk single metric prediction timechart
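
As a side note, the forecasting algorithm can be spelled out instead of relying on the default. The sketch below is the same search with algorithm=LLT (local level trend) added to predict. Whether LLT suits this data better than the default is an assumption on my part, not something tested here, so treat it as a starting point for experimenting.

```spl
| mstats max("latest_capacity|used_space") as used_space, max("latest_capacity|total_capacity") as total_capacity WHERE "index"="telegraf_metrics" "name"="vmware_vrops_datastore" sourcename=bla_vsan01 span=1d BY sourcename
| eval used_pct=round(used_space/total_capacity*100,2)
| timechart max(used_pct) by sourcename useother=false span=1d
| predict bla_vsan01 algorithm=LLT future_timespan=30
```

With only 30 daily data points there is little for a seasonal algorithm to work with, which is why a trend-only variant may be worth comparing against the default.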

BUT! “we don’t know how many storage instances we will have! we can’t just hardcode all the fields in the predict command or create a panel/search for each!”

I know, I know.

Predicting multiple metrics

Here it becomes a bit tricky. Remember that you need to specify all the fields you want to predict as part of the predict command: | predict MY_FIELD1, MY_FIELD2 ...

The way to deal with it is the map command.

It runs the enclosed parametrised search once for each input result, substituting the field values of that result for the matching $field$ tokens.

| mstats max("latest_capacity|used_space") as used_space WHERE "index"="telegraf_metrics" "name"="vmware_vrops_datastore" sourcename != "*local*" span=1d BY sourcename 
| stats values(sourcename) as sourcename 
| mvexpand sourcename 
| map 
    [| mstats max("latest_capacity|used_space") as used_space, max("latest_capacity|total_capacity") as total_capacity WHERE "index"="telegraf_metrics" "name"="vmware_vrops_datastore" sourcename=$sourcename$ span=1d by sourcename 
    | eval used_pct=round(used_space/total_capacity*100,2) 
    | timechart max(used_pct) span=1d by sourcename useother=false limit=0 
    | predict $sourcename$ future_timespan=30 
        ] maxsearches=30

The first part of the SPL (from | mstats until | map) prepares a list of sourcenames (storages), each of which is passed in turn in place of the $sourcename$ token.

The second part is the same SPL as before, with the literal sourcename replaced by the token.

Here is the visualisation of the above SPL using the “Line Chart” option:

Splunk predict visualisation of a single metrics

I personally find the same visualisation a bit more readable when using an “Area Chart”

The actual measurements are visualised using the shaded areas, while the predictions are drawn using lines
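
One thing to watch in the map search above: maxsearches=30 caps the number of searches map will run, so any storages beyond the first 30 are not predicted. A quick sanity check, sketched below, counts the storages the outer search returns. The field names datastore_count and fits_maxsearches are made up for this illustration.

```spl
| mstats max("latest_capacity|used_space") as used_space WHERE "index"="telegraf_metrics" "name"="vmware_vrops_datastore" sourcename != "*local*" span=1d BY sourcename
| stats dc(sourcename) as datastore_count
| eval fits_maxsearches=if(datastore_count <= 30, "yes", "no")
```

If the answer is "no", raise maxsearches accordingly, keeping in mind that every extra storage means one more search to run.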

Some thoughts about putting it all in a dashboard

  • Save the search as a saved search that runs hourly or maybe even daily. Then update the SPL to load the results from the saved search by passing <USER>:<APP>:<SAVED_SEARCH_NAME> to the savedsearch argument of the loadjob command:

| loadjob savedsearch="admin:my_app:vmware_vrops_datastore usage prediction"

  • Add a dropdown selector that lets us either see all storages or focus on a single storage at a time, and update the SPL with its token:

| fields + _time $tkn_datastore$, prediction($tkn_datastore$), lower95(prediction($tkn_datastore$)), upper95(prediction($tkn_datastore$))

  • Add static “threshold” lines by appending the following to the SPL:

| eval WARN=70, ERROR=90

The resulting SPL:

| loadjob savedsearch="admin:my_app:vmware_vrops_datastore usage prediction"
| fields + _time $tkn_datastore$, prediction($tkn_datastore$), lower95(prediction($tkn_datastore$)), upper95(prediction($tkn_datastore$))
| eval WARN=70, ERROR=90

  • And maybe add an annotation “tick” to show when “now” is, indicating that everything before it (to the left) is collected data and everything after it (to the right) is a prediction. That is done by adding the following <search type="annotation"> section to the XML of the panel:

<search type="annotation">
  <query>
    | makeresults
    | bin _time span=1d
    | eval annotation_label = "Now", annotation_color = "#808080"
    | table _time annotation_label annotation_color
  </query>
</search>
Splunk annotation snippet

If you have too many storages, the graph might become unreadable. One option is to pre-calculate the predictions first and then chart only the metrics that are predicted to cross a threshold.
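
A minimal sketch of that filtering, assuming the saved search holds the output of the map search and using 70 (the WARN value from earlier) as the threshold. The field names series, value, datastore, predicted and max_predicted are made up for this illustration.

```spl
| loadjob savedsearch="admin:my_app:vmware_vrops_datastore usage prediction"
| fields - upper*, lower*
| untable _time series value
| eval datastore=replace(series, "^prediction\((.*)\)$", "\1")
| eval predicted=if(match(series, "^prediction\("), value, null())
| eventstats max(predicted) as max_predicted by datastore
| where max_predicted > 70
| xyseries _time series value
```

The untable step turns every column into a row so that each storage can be judged on its highest predicted value, and xyseries turns the surviving rows back into chartable columns. The upper and lower bounds are dropped to keep it short, so the “fan” is not drawn in this version.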

Another option is to show all the “final” predictions in a table instead of a timechart:

| loadjob savedsearch="admin:my_app:vmware_vrops_datastore usage prediction" 
| fields - upper*, lower* 
| stats latest(*) as current_* 
| transpose 
| rename "column" AS m, "row 1" AS r 
| eval r = round(r,2) 
| rex field=m "(?:(current_prediction\((?<datastore_name_predict>.*)\))|(current_(?<datastore_name_current>.*)))" 
| eval datastore_name=coalesce(datastore_name_predict, datastore_name_current), r_type = if(isnull(datastore_name_predict),"current","prediction") 
| stats values(r) as r by datastore_name, r_type 
| xyseries datastore_name, r_type, r 
| eval WARN=40, ERROR=60 
| eval status = if(prediction > ERROR,"ERROR",if(prediction > WARN, "WARN","OK")) 
| sort - prediction 
| fields - ERROR, WARN 
| table datastore_name, current, prediction, status 
| rename datastore_name AS "Datastore", current as "Last Value", prediction as "Predicted Value", status as "Predicted Status"
Splunk predict visualisation of multiple metrics in a table format
