Splunk Machine Learning
Splunk Machine Learning Toolkit (MLTK) is Splunk's answer to machine learning. It helps you apply machine learning techniques and methods to the data in your network.
The Splunk MLTK includes methods for analysing data, including algorithms for regression and for anomaly and outlier detection. These are essential for understanding, modelling and detecting trends in your data that are not easily identifiable by observation.
It also includes custom visualisations and example datasets for practising your Splunk Processing Language (SPL) commands, and it provides an assistant to create a model for analysing your data, which helps you get started faster with machine learning.
Two apps need to be installed to set up the Splunk MLTK.
These apps only need to be installed on your Search Head, using the normal app installation process you would follow for any other app.
Step 1 – Install the Python For Scientific Computing Add-on
The first application you need to install is the Python for Scientific Computing add-on. There are three versions, depending on your OS: Windows 64-bit, Linux 32-bit and Linux 64-bit.
The Python for Scientific Computing add-on allows Splunk to import Python's scientific, engineering and mathematical libraries, which custom Splunk commands use for statistical tests and data exploration.
Step 2 – Install the Machine Learning Toolkit App
This app provides all of the dashboards, knowledge objects and custom search commands, and can be downloaded here: https://splunkbase.splunk.com/app/2890/
After the app is installed, Splunk will need to be restarted. Once the restart is completed and you log back into Splunk, the Splunk Machine Learning Toolkit app will be visible on the app bar from the Splunk launcher page.
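One way to confirm that both apps are in place after the restart is to query the local apps REST endpoint from the search bar. This is a minimal sketch: the app folder names Splunk_ML_Toolkit and Splunk_SA_Scientific_Python* are assumptions based on how these apps are usually packaged, so adjust the filter if your installation uses different names.

```
| rest /services/apps/local splunk_server=local
| search title="Splunk_ML_Toolkit" OR title="Splunk_SA_Scientific_Python*"
| table title label version disabled
```

If both rows are returned and the disabled column shows 0, the toolkit and its Python add-on should be installed and enabled on the Search Head.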
Use the Machine Learning Toolkit
After launching the app, you will be taken to the ‘showcase’ tab of the app by default, which lists the analytical capabilities provided with the app. It also showcases some examples to illustrate how to apply various algorithms to sample datasets.
Test the Machine Learning Toolkit Assistant
Each example in the MLTK demonstrates the analytical capabilities of the toolkit. The example used here is Forecast Monthly Sales, in the Forecast Time Series section. Click on the example to navigate to an Assistant page where you can run the forecast.
This assistant allows us to select either ‘ARIMA’ or ‘Kalman Filter’ (also known as the ‘linear quadratic estimation’ algorithm) for forecasting, and to adjust the variables that feed the algorithm. The assistant auto-populates the search bar with a sample dataset of monthly sales and adds default values to the variables. The default values can be adjusted based on your forecasting requirements.
Once the variables have been populated, click on “Forecast” to calculate the forecasted values.
The predicted outcome of the time series data is displayed in a ‘time chart’ visualisation.
The green highlighted part of the chart is the forecasted monthly sales based on the algorithm. The X-axis in the time series represents the _time variable, while the Y-axis is the field we want to forecast (in this example, the sales). The MLTK assistant makes it convenient for the Splunk user to execute forecasting algorithms without complicated mathematical calculations.
To view the Splunk query executed in the background for this model, click on ‘Open in Search’, which displays the commands used in this visualisation. The search that powers this visual is:
| inputlookup souvenir_sales.csv
| eval _time=strptime(Month, "%Y-%m-%d")
| timechart span=1mon values(sales) as sales
| predict "sales" as prediction algorithm="LLP" future_timespan="24" holdback="0" lower"95"=lower"95" upper"95"=upper"95"
| `forecastviz(24, 0, "sales", 95)`
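As noted earlier, the default values can be adjusted to suit your forecasting requirements. The variation below is a sketch of what that might look like in the search itself, using documented options of the predict command. The specific values are illustrative assumptions: period=12 assumes a yearly cycle in the monthly data, holdback=6 withholds the six most recent data points from the model, and future_timespan=12 asks for twelve forecast points. The forecastviz macro is left out here, since its arguments would need to be changed to match.

```
| inputlookup souvenir_sales.csv
| eval _time=strptime(Month, "%Y-%m-%d")
| timechart span=1mon values(sales) as sales
| predict sales as prediction algorithm=LLP period=12 future_timespan=12 holdback=6 lower95=lower95 upper95=upper95
```

Holding back recent data points is a simple way to judge a forecast, because the predicted values can be compared against sales figures that are already known.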
Machine Learning Search Commands
The Splunk Machine Learning Toolkit contains custom commands, referred to as Machine Learning Search Processing Language (ML-SPL), that can be used to implement statistical modelling. The core commands are fit, apply, summary, listmodels, deletemodel and sample.
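To give a feel for how ML-SPL reads in practice, here is a minimal sketch that trains a model with the fit command on the same sample lookup used by the forecasting example. The derived field month_of_year and the model name example_sales_model are hypothetical, chosen only for illustration, and a linear regression on the month number is not meant to be a good sales model.

```
| inputlookup souvenir_sales.csv
| eval _time=strptime(Month, "%Y-%m-%d")
| eval month_of_year=tonumber(strftime(_time, "%m"))
| fit LinearRegression sales from month_of_year into example_sales_model
```

Once a model has been saved this way, a later search can end in "| apply example_sales_model" to score new data, and "| summary example_sales_model" can be used to inspect what the model learned.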
Load Existing Settings
This is a useful feature that stores your history for modelling data. As you work on your model, ‘Load Existing Settings’ keeps a history of the fields and settings you used, such as Actions, Time, Search Query, Preprocessing Settings, Algorithm, Algorithm Settings, Field to Predict and Fields to Use for Predicting. We can quickly refer back to any of the previous settings we used during the creation of the model.
Having used it in the Oxygen test environment, we found that the Machine Learning Toolkit has a number of useful showcases, including:
- Predicting hard drive failure, presence of malware, etc.
- Forecasting Internet traffic, number of employee logins, monthly sales, Bluetooth devices, etc.
- Detecting values that differ from historic values, such as server response times, number of logins, purchases, etc.
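For the last kind of showcase, the underlying idea can be sketched in plain SPL with a simple standard deviation threshold. This is not necessarily the method the showcase itself uses, and the index, sourcetype and response_time field below are hypothetical placeholders for your own data.

```
index=web sourcetype=access_combined
| timechart span=5m avg(response_time) as avg_response_time
| eventstats avg(avg_response_time) as mean stdev(avg_response_time) as sd
| eval lower_bound=mean-3*sd, upper_bound=mean+3*sd
| eval is_outlier=if(avg_response_time<lower_bound OR avg_response_time>upper_bound, 1, 0)
```

Each five-minute bucket whose average response time falls more than three standard deviations from the overall mean is flagged with is_outlier=1. The multiplier of 3 is an assumption and can be tightened or loosened depending on how noisy the data is.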
Depending on the data you are feeding in from your environment, you can gain a lot of insight and make predictions that help reduce the number of key failures on the network.