LLM Archives - ISbyR https://isbyr.com/tag/llm/ Infrequent Smarts by Reshetnikov Tue, 24 Jun 2025 15:18:21 +0000 en-US hourly 1 https://wordpress.org/?v=6.7.4 Add new LLM models to Splunk MLTK https://isbyr.com/add-new-llm-models-to-splunk-mltk/ https://isbyr.com/add-new-llm-models-to-splunk-mltk/#comments Tue, 24 Jun 2025 14:57:26 +0000 https://isbyr.com/?p=1234 Splunk MLTK 5.6.0+ allows you to configure LLM inference endpoints, but the list is somewhat limited. Below, I’ll explain how you can add new LLM models to Splunk MLTK. The Issue You can configure any of the pre-added models in the Splunk UI by going to the MLTK App and then hitting the “Connection Manager” … Continue reading Add new LLM models to Splunk MLTK

The post Add new LLM models to Splunk MLTK appeared first on ISbyR.

Splunk MLTK 5.6.0+ allows you to configure LLM inference endpoints, but the list is somewhat limited. Below, I’ll explain how you can add new LLM models to Splunk MLTK.

The Issue

You can configure any of the pre-added models in the Splunk UI by going to the MLTK App and then hitting the “Connection Manager” tab.

When you select a service, you can see a list of pre-defined models. These are already somewhat outdated: for Gemini, for example, you don’t have any of the 2.5 models.

So, “how do we add new LLM models to Splunk MLTK?” you might ask.

The Solution

Easy-ish…

A bit of background

This configuration is managed in a Splunk KV Store collection (named mltk_ai_commander_collection), and in essence, it’s a big JSON that has all the providers and the models.

For example, here is the snippet for the Gemini service and the first of its models:

        "Gemini": {
            "Endpoint": {
                "value": "https://generativelanguage.googleapis.com/v1beta/models",
                "type": "string",
                "required": false,
                "description": "The API endpoint for sending chat completion requests to Google's Gemini language model."
            },
            "Access Token": {
                "value": "",
                "type": "string",
                "required": true,
                "hidden": true,
                "description": "The authentication token required to access the Gemini API."
            },
            "Request Timeout": {
                "value": 200,
                "type": "int",
                "required": false,
                "description": "The maximum duration (in seconds) before a request to the Gemini API times out."
            },
            "is_saved": {
                "value": true,
                "type": "boolean",
                "required": false,
                "description": "Is Provider details stored"
            },
            "models": {
                "gemini-pro": {
                    "Response Variability": {
                        "value": 0,
                        "type": "int",
                        "required": true,
                        "description": "Adjusts the response's randomness, impacting how varied or deterministic responses are."
                    },
                    "Maximum Result Rows": {
                        "value": 10,
                        "type": "int",
                        "required": false,
                        "description": "The maximum number of result entries to retrieve in a response."
                    },
                    "Max Tokens": {
                        "value": 2000,
                        "type": "int",
                        "required": false,
                        "description": "The limit on the number of tokens that can be generated in a response."
                    },
                    "Set as default": {
                        "value": false,
                        "type": "boolean",
                        "required": false
                    }
                },

So if we want to add a new model, all we need to do is add another entry to the models object.

While there is a Lookup Editor app, it will only help you edit a KV Store collection if there is a lookup configured for it, which is not the case for mltk_ai_commander_collection.

High-level steps

Another way (and the one we will take) is to use the Splunk REST API. At a high level, it consists of the following steps:

  1. Get the current configuration (and the _key of the collection item) in JSON format
  2. Update the JSON payload in a text editor
  3. Update the KV Store collection with the new JSON

Detailed steps

I will provide examples using Postman, but you can use curl or any other method of your choice for interacting with the REST API.

Get the current configuration

Run a GET call to the collection/data endpoint

The actual URL is https://localhost:8089/servicesNS/nobody/Splunk_ML_Toolkit/storage/collections/data/mltk_ai_commander_collection

Copy the results and take note of the _key at the end of the JSON.
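If you prefer a script over Postman, the GET call could be sketched in Python roughly like this. It is only a sketch under a few assumptions: HTTP basic auth with a Splunk user that can read the collection, a local instance with a self-signed certificate (hence the disabled verification), and a collection that holds a single record. The function names are mine, not part of any Splunk SDK.

```python
import base64
import json
import ssl
import urllib.request

# URL taken from the post; adjust host and port for your environment.
BASE_URL = (
    "https://localhost:8089/servicesNS/nobody/Splunk_ML_Toolkit"
    "/storage/collections/data/mltk_ai_commander_collection"
)


def build_get_request(username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that uses HTTP basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        BASE_URL,
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


def fetch_config(username: str, password: str) -> dict:
    """Send the request and return the first record of the collection."""
    # Assumption: self-signed certificate on a lab instance, so certificate
    # verification is switched off. Do not do this against production.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    request = build_get_request(username, password)
    with urllib.request.urlopen(request, context=context) as response:
        records = json.loads(response.read())
    # The endpoint returns a JSON array; assume the config is its only item.
    return records[0]
```

Calling fetch_config("admin", "your-password") should then give you a dictionary whose "_key" entry is the value you need for the update step.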

Update the JSON

Paste the JSON in a text editor of your choice.

Go to the provider for which you want to add a new model (Gemini in our case).

Duplicate the model object inside the Service object and change the model name.

For example, here I copied the gemini-2.0-flash object, pasted it at the end of the Gemini service object, and renamed the copy to the name of the new model.
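The same copy-and-rename edit can also be done programmatically. The sketch below assumes that the providers (such as "Gemini") are top-level keys of the record, as the indentation of the snippet above suggests. The helper name add_model is hypothetical, and "gemini-2.5-flash" is used purely as an example of a new model name.

```python
import copy


def add_model(config: dict, provider: str, source_model: str, new_model: str) -> dict:
    """Return a copy of config in which new_model is cloned from source_model."""
    updated = copy.deepcopy(config)
    models = updated[provider]["models"]
    if new_model in models:
        raise ValueError(f"{new_model} is already configured for {provider}")
    # Clone the settings (Max Tokens, Response Variability, ...) of an
    # existing model so the new entry has the same shape.
    models[new_model] = copy.deepcopy(models[source_model])
    return updated


# Tiny stand-in for the real record, just to show the call.
sample = {
    "_key": "example-key",
    "Gemini": {"models": {"gemini-pro": {"Max Tokens": {"value": 2000, "type": "int"}}}},
}
updated = add_model(sample, "Gemini", "gemini-pro", "gemini-2.5-flash")
print(sorted(updated["Gemini"]["models"]))
```

Working on a deep copy keeps the original record untouched, which is handy if you want to compare the two before pushing anything back to Splunk.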

NOTE: You must ensure that the model name you provide here is exactly the same as it would appear when calling the inference API for the LLM Service.

For example, for Gemini

Update the KV collection

Now we need to update the collection with the updated JSON payload.

Send a POST request to the collection/data endpoint

  • replace the _key part of the URL with the value that you have in your JSON
  • remove the square brackets ([]) that surround the JSON

The actual URL is something like this: https://localhost:8089/servicesNS/nobody/Splunk_ML_Toolkit/storage/collections/data/mltk_ai_commander_collection/68540d2d0d2a214efd0d3b61.
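For those scripting this step, here is a hedged Python sketch of the POST request. It assumes basic auth and that the record (a single JSON object, not wrapped in square brackets) still contains its _key. The helper name build_update_request is hypothetical, and the function only builds the request so that you can inspect it before sending.

```python
import base64
import json
import urllib.request

# Collection URL from the post; the record's _key is appended to it.
COLLECTION_URL = (
    "https://localhost:8089/servicesNS/nobody/Splunk_ML_Toolkit"
    "/storage/collections/data/mltk_ai_commander_collection"
)


def build_update_request(record: dict, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that overwrites one record."""
    key = record["_key"]
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{COLLECTION_URL}/{key}",
        # The body is the bare JSON object, without surrounding brackets.
        data=json.dumps(record).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen (with an SSL context that suits your certificate setup) would send the update.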

Now, refresh the Connection Management page and enjoy a fresh new model at your disposal.

Simply use the new model in the | ai command

And here is a sneak peek into an LLM Telemetry dashboard I’m working on

I hope that helped you to understand how to add new LLM models to Splunk MLTK.

Learning about RAG and Vector Databases https://isbyr.com/learning-about-rag-and-vector-databases/ https://isbyr.com/learning-about-rag-and-vector-databases/#respond Thu, 21 Mar 2024 14:19:25 +0000 https://isbyr.com/?p=1174 I am learning about different concepts and architectures used in the LLM/AI space and one of them is Retrieval-Augmented Generation. As always I prefer learning concepts by tinkering with them and here is my first attempt at learning about RAG and Vector Databases. A bit of the terminology, I will not dive too deep here, … Continue reading Learning about RAG and Vector Databases

The post Learning about RAG and Vector Databases appeared first on ISbyR.

I am learning about different concepts and architectures used in the LLM/AI space and one of them is Retrieval-Augmented Generation. As always I prefer learning concepts by tinkering with them and here is my first attempt at learning about RAG and Vector Databases.

A bit of terminology

I will not dive too deep here, but just enough to get started. The definitions below are my simplified understanding, and they are most likely not fully correct.

What is RAG

There are many places where you can learn about RAG, but for the context of this post, I’d say that RAG allows you to supplement the initial prompt for the LLM with a bit more (or a lot more, that’s up to you) context.

What is a Vector Database?

A vector database is one of the mechanisms/data stores that enable you to provide this additional context to the LLM. Unlike “regular” databases, a vector database doesn’t necessarily store the actual data (though it can); it stores the embeddings of the data that you later search to retrieve the above-mentioned context.

What are embeddings?

Embeddings are multi-dimensional numerical representations of a piece of data (text, for example). The multi-dimensionality makes it possible to “place” semantically similar terms close to each other. For example, if we search for “dog” using semantic search, then “puppy” and “mutt” will be considered close terms, while a lexical search (one that looks at literal text similarity) will probably consider “dogma” and “hot dog” closer.
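To make “close to each other” a bit more concrete, here is a small sketch using cosine similarity, a common way of comparing embeddings. The three-dimensional vectors are hand-made toy values, not the output of a real model (real embeddings have hundreds of dimensions), and the function names are just for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy vectors, NOT real embeddings: "dog" and "puppy" point in a
# similar direction, while "dogma" points somewhere else entirely.
toy_embeddings = {
    "dog": [0.9, 0.8, 0.1],
    "puppy": [0.85, 0.9, 0.15],
    "dogma": [0.1, 0.2, 0.95],
}


def rank_by_similarity(query, embeddings):
    """Return the other terms sorted from most to least similar to query."""
    query_vector = embeddings[query]
    scored = [
        (term, cosine_similarity(query_vector, vector))
        for term, vector in embeddings.items()
        if term != query
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for term, score in rank_by_similarity("dog", toy_embeddings):
    print(f"{term}: {score:.2f}")
```

With these toy values “puppy” ranks above “dogma” for the query “dog”, even though “dogma” shares more letters with it.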

The ITSM Assistant

The problem

Now let’s say you want to open a ticket in your ITSM ticketing system that your Internet Access is not working properly. You could start by searching for a particular request type or a knowledge base article, but what if you are not a technically savvy person and all you care about is that you can’t get to Facebook?

The Solution

ITSM Assistant to the rescue!!! It’s a chat interface that will:

  1. ask the user about the issue they are currently facing
  2. look in the vector database for semantically similar historical requests and get their IDs
  3. get the content of the tickets from the data store (a simple CSV file in this case)
  4. feed this context to an LLM
  5. provide the user with the suggested request form and some of the fields that should be populated

As you can see in the screenshot below, the user didn’t mention that they have problems with “internet access”, but just said, “I can’t get to Facebook”. Despite that, the ITSM Assistant was able to pull data that is semantically related to the user’s issue. The LLM (after being fed all the context) suggested the correct Service Request type and some of the information the user should add to the ticket for it to be promptly resolved.

How does it work under the hood?

ITSM Assistant using RAG and Vector Databases Solution Diagram

The components

  • Pinecone – vector database
  • Streamlit – “…turns data scripts into shareable web apps in minutes.”, both front and back-end all in Python.
  • Streamlit Community Cloud – for hosting the Streamlit app
  • AzureOpenAI – the LLM
  • all-MiniLM-L6-v2 – “…a sentence-transformers model: It maps sentences & paragraphs to a 384-dimensional dense vector space”

Step 0 – Load the data into Vector Database

I found an ITSM ticket dump on the internet.

Next, we need to get an embedding for each ticket and insert it into the vector database (Pinecone in my case).

I had a Jupyter notebook that was doing this job.

# Importing the necessary libraries
import pandas as pd

# Importing the csv file
data = pd.read_csv('GMSCRFDump.csv', encoding = 'ISO-8859-1')

# removing duplicate tickets
ID_mins = data.groupby(['Title', 'Description', "CallClosure Description"]).ID.transform("min")
data_n = data.loc[data.ID == ID_mins]

# build a Series that combines the title and description of each ticket
title_description = data_n['Title'] + " __ " + data_n['Description']
# build a Series of ticket IDs
tid = data_n['ID']

# import a transformer that will be used to encode the ticket data
from sentence_transformers import SentenceTransformer
import torch

# Define the model
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = SentenceTransformer('all-MiniLM-L6-v2', device=device)

# Setup Pinecone connection
import os
# placeholder value: use your own key and never publish a real one
os.environ['PINECONE_API_KEY'] = '<your-pinecone-api-key>'
os.environ['PINECONE_ENVIRONMENT'] = 'gcp-starter'

from pinecone import Pinecone, PodSpec

# get api key from app.pinecone.io
api_key = os.environ.get('PINECONE_API_KEY') or 'PINECONE_API_KEY'
# find your environment next to the api key in pinecone console
env = os.environ.get('PINECONE_ENVIRONMENT') or 'PINECONE_ENVIRONMENT'
pinecone = Pinecone(api_key=api_key)

# Create index

index_name = 'snow-data'
# only create index if it doesn't exist
if index_name not in pinecone.list_indexes().names():
    pinecone.create_index(
        name=index_name,
        dimension=model.get_sentence_embedding_dimension(),
        metric='cosine',
        spec=PodSpec(
            environment=env, 
            pod_type='s1.x1'
        )
    )

# now connect to the index
index = pinecone.Index(index_name)

# the following section takes a batch of tickets, creates an embedding for each one, attaches the ID and title+description as metadata, and upserts the batch into the Pinecone index

from tqdm.auto import tqdm

batch_size = 120
vector_limit = 12000

title_description = title_description[:vector_limit]
title_description

for i in tqdm(range(0, len(title_description), batch_size)):
    # find end of batch
    i_end = min(i+batch_size, len(title_description))
    # create IDs batch
    ids = [str(x) for x in range(i, i_end)]
    # create metadata batch
    metadatas = [{'tid': t_id, 'text': t_desc} for t_id, t_desc in list(zip(tid,title_description))[i:i_end]]
    print(metadatas)
    # create embeddings
    xc = model.encode([t_desc for t_desc in title_description[i:i_end]])
    # create records list for upsert
    records = zip(ids, xc, metadatas)
    # upsert to Pinecone
    index.upsert(vectors=records)

Step 1 – The Streamlit App

Streamlit is a straightforward Python framework that allows you to build (simple) web apps, all without any HTML, JavaScript, or CSS knowledge. You can run it locally or host it somewhere, for example using their Community Cloud.

You can find the code for the app in the ITSM Assistant repo here. So I’ll not provide much code from now on, but instead write about the caveats.

To try it at home, you will need to create a secrets.toml file under the .streamlit folder and populate it with your Azure OpenAI and Pinecone credentials/configuration:

AZURE_OPENAI_API_KEY = "xxxxxxxxxxxxx"
AZURE_OPENAI_ENDPOINT = "https://xxxxxxxxx.openai.azure.com/"

PINECONE_API_KEY = "xxx-xxx-xxx-xx"
PINECONE_INDEX = "snow-data"

Steps 2 and 3 – Searching for Historical Tickets

One caveat: depending on the amount of data, you can decide to upsert into the vector DB (in addition to the embeddings themselves) not only the ticket ID (as metadata) but all the ticket fields (like Description, Resolution, etc.). This way your semantic search can return all the data you need, and there is no need for Step 3 (retrieval of data from the data store).

For the sake of learning, I did not, so after we get the ticket IDs from Pinecone, we use them to filter the data in the data store (a fancy name for a CSV file) to get the ticket information that needs to be sent as context to the LLM.
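The lookup by ticket ID could be sketched as follows. This is not the code from the app: the shape of matches is a simplified stand-in for a Pinecone query result (a list of items that carry the tid metadata stored in Step 0), the IDs are assumed to be integers, and the function name is hypothetical. It uses the standard csv module so that it runs without extra packages.

```python
import csv
import io


def tickets_for_matches(matches, csv_text):
    """Return the CSV rows whose ID appears in matches, in match order."""
    rows_by_id = {row["ID"]: row for row in csv.DictReader(io.StringIO(csv_text))}
    # Assumption: IDs are integers. Numeric metadata may come back as a
    # float (102.0), so normalise it before comparing with the CSV text.
    wanted = [str(int(match["metadata"]["tid"])) for match in matches]
    return [rows_by_id[tid] for tid in wanted if tid in rows_by_id]


# Made-up sample data, only to show the call.
sample_csv = (
    "ID,Title,Description\n"
    "101,No internet,Cannot browse any website\n"
    "102,Server reboot,Please reboot the server\n"
)
sample_matches = [{"metadata": {"tid": 101.0}}]
for ticket in tickets_for_matches(sample_matches, sample_csv):
    print(ticket["ID"], ticket["Title"])
```

Keeping the rows in match order means the most similar tickets come first in the context that is later sent to the LLM.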

Step 4 – Ask LLM for help

Now that we have the context (similar ticket data), we can send the request to the LLM to help our struggling user and point them in the right direction.
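As a rough illustration of how the context and the user’s issue could be combined into one prompt, here is a minimal sketch. The prompt wording and the function name build_prompt are my assumptions for this example and not what the app actually sends. Only the three JSON keys are taken from the response shown later in this post.

```python
def build_prompt(user_issue, tickets):
    """Combine similar historical tickets and the user's issue into a prompt."""
    # Each ticket is assumed to be a dict with "Title" and "Description" keys.
    context = "\n".join(
        f"- {ticket['Title']}: {ticket['Description']}" for ticket in tickets
    )
    return (
        "You are an ITSM assistant.\n"
        "Here are similar historical tickets:\n"
        f"{context}\n\n"
        f"User issue: {user_issue}\n"
        'Reply with a JSON object containing "common_theme", "title" '
        'and "suggested_fields".'
    )


# Made-up ticket, only to show the call.
example_tickets = [
    {"Title": "No internet", "Description": "Cannot browse any website"},
]
print(build_prompt("I can't get to Facebook", example_tickets))
```

Asking for a fixed set of JSON keys makes the reply easier to parse and render in the chat.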

Step 5 – Response to User

The only thing worth mentioning here is that I struggled a bit to print the list of fields nicely.

The LLM comes back with a JSON response similar to the one below:

{
  "common_theme": "Server Reboot",
  "title": "Server Reboot Request",
  "suggested_fields": "SSO ID, Name, Email, Seat Number, Location, Cell Phone No"
}
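Since the reply arrives as text, it has to be parsed before it can be used. A defensive way of doing that could look like the sketch below. The key check and the function name parse_llm_response are my additions for illustration, the app itself may handle this differently.

```python
import json

# Keys seen in the example response above.
REQUIRED_KEYS = ("common_theme", "title", "suggested_fields")


def parse_llm_response(raw_text):
    """Parse the LLM reply and make sure the expected keys are present."""
    data = json.loads(raw_text)  # raises json.JSONDecodeError on invalid JSON
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"LLM response is missing keys: {', '.join(missing)}")
    return data


raw = (
    '{"common_theme": "Server Reboot", "title": "Server Reboot Request", '
    '"suggested_fields": "SSO ID, Name, Email"}'
)
llm_response = parse_llm_response(raw)
print(llm_response["title"])
```

Failing early on a malformed reply is easier to debug than a KeyError in the middle of rendering the chat message.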

Streamlit can use markdown for output, so to format the list of fields nicely I had to do something like this:

suggested_fields = llm_response['suggested_fields'].split(', ')
suggested_fields = "- " + "\n- ".join(suggested_fields)
 
nl = "  \n"
st.chat_message("ai").markdown(f"It looks that in order to help you, you will need to raise a new **\"{llm_response['title']}\"**.{nl}\
When raising this request please provide some of the required information like:{nl}{suggested_fields}")

P.S.

You can find the app here: https://app-itsm-assistant.streamlit.app/

