Splunk and Regex

This week I have been on day support, which means I have to deal with any problems that come up relating to our service.
Splunk is one of the tools I have been using, and it is incredibly helpful.

You can feed machine data into Splunk, which processes it for you, extracting relevant data and making it searchable. You can save your searches and create visual dashboards for monitoring.

We get this machine data from pretty much every part of the chain, in the form of logs from tills and services. When something goes wrong, Splunk is the first port of call for figuring out what happened.

I used it extensively this week to figure out the root causes of, and solutions for, a whole range of issues, and I even created a dashboard myself to track one particular recurring issue.

Unfortunately I can’t share much information, but I can share this part of the dashboard.

Trust me, it’s 2 more than it should be…

One of the things I have had to get better at this week is regex. A regex (regular expression) is a special text string that describes a search pattern. In other words, using some predefined rules, you can search through strings of text to find specific values. This post has lots of information on regex and the basics of using it, although I mostly learned it through trial and error. This website is incredibly useful, allowing you to paste in a test string and see how your regex works on it.

In Splunk you can use regex to extract bits of information from logs into their own fields, then use those fields elsewhere, in tables or other searches.

One example from today: I wanted to see which version of a service a certain till was on. I knew there was a string of text in the logs which specified this, and it looked like this:

“Version currently installed 4.0.1574263550-4a2a57f-20191120163137 for service”

So to find all instances of this log for a certain till, I searched the following in Splunk:

till=till_number source=logs "Version currently installed * for service" earliest=-5d

till=till_number specifies which till's logs I want to look at.
source=logs tells Splunk where to look for the logs
"Version currently installed * for service" is the string I am looking for in the logs.
The * is a wildcard, and replacing the version number with it means that it will find all logs with anything written there.
earliest=-5d lets me search back through the last 5 days of logs.
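
As a rough analogy for what the * wildcard does, here is a minimal Python sketch that filters lines against the same pattern. This is not how Splunk's search actually works internally. The second log line is made up for illustration, and the extra * at each end of the pattern is only there because fnmatchcase compares against the whole line.

```python
from fnmatch import fnmatchcase

# Only the first line comes from the post; the second is a hypothetical non-matching log.
lines = [
    "Version currently installed 4.0.1574263550-4a2a57f-20191120163137 for service",
    "Service started",
]

# fnmatchcase matches against the whole line, so the pattern is wrapped in * on both sides.
wildcard = "*Version currently installed * for service*"

# Keep only the lines where the wildcard pattern matches.
matches = [line for line in lines if fnmatchcase(line, wildcard)]

for line in matches:
    print(line)
```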

I was only interested in the version number, so I piped the results of the search into a regex to extract it:

till=till_number source=logs "*Version currently installed * for service" earliest=-5d
| rex field=_raw "Version currently installed (?<version>.*) for service"

| rex tells Splunk that I am about to do some regex.
field=_raw tells Splunk which field I want to apply the regex to, in this case the raw event data.
"Version currently installed (?<version>.*) for service" is my regex string. The ?<version> part defines the name of the field that I am creating and extracting the information into. This time I used .* (in regex, . matches any character and * repeats it zero or more times) to capture everything in that specific position, between the space after installed and the space before for.
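
If you want to try the same extraction outside Splunk, here is a small sketch in Python. It assumes Python's re module, which writes named groups as (?P<version>...) rather than (?<version>...). The helper name extract_version is just something I made up for the example.

```python
import re

# Sample log line from the post (without the surrounding quotes).
log_line = "Version currently installed 4.0.1574263550-4a2a57f-20191120163137 for service"

# Python spells named groups (?P<name>...) instead of (?<name>...).
pattern = re.compile(r"Version currently installed (?P<version>.*) for service")


def extract_version(line):
    """Return the captured version string, or None if the line does not match."""
    match = pattern.search(line)
    return match.group("version") if match else None


print(extract_version(log_line))
```

The None branch matters because most log lines will not contain the version message at all.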

I used regex101.com to help me get the right regex format. It was fairly straightforward as I just wanted everything between two points.

Now that I had my version extracted to another field, called version, I could chart it onto a graph for easier viewing.

till=till_number source=logs "*Version currently installed * for service" earliest=-5d
| rex field=_raw "Version currently installed (?<version>.*) for service"
| timechart span=6h count by version

timechart lets you create a chart over time (makes sense).
span=6h specifies the span of time over which logs are grouped together. I chose a 6 hour period as I was looking over a few days and didn’t want to have too much information.
count by version lets me count each time a version shows up in the logs.
The chart then shows this count over those periods for the last 5 days.
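
To make the grouping a bit more concrete, here is a minimal Python sketch of the idea behind span=6h count by version. It is only an illustration of the bucketing, not Splunk's implementation. The timestamps and the "old"/"new" version labels are invented, and bucket_start is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime

SPAN_HOURS = 6

# Hypothetical (timestamp, version) pairs standing in for the extracted events.
events = [
    (datetime(2019, 11, 20, 1, 15), "old"),
    (datetime(2019, 11, 20, 4, 40), "old"),
    (datetime(2019, 11, 20, 7, 5), "old"),
    (datetime(2019, 11, 20, 9, 30), "new"),
    (datetime(2019, 11, 20, 13, 0), "new"),
]


def bucket_start(ts, span_hours=SPAN_HOURS):
    """Round a timestamp down to the start of its span."""
    return ts.replace(hour=ts.hour - ts.hour % span_hours,
                      minute=0, second=0, microsecond=0)


# Count events per (span start, version), like "count by version" over each span.
counts = Counter((bucket_start(ts), version) for ts, version in events)

for (start, version), n in sorted(counts.items()):
    print(start, version, n)
```

In this made-up data the 06:00 span contains both versions, which is the kind of crossover point the chart makes visible.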

You can clearly see the point where the till was upgraded to a new version, as the number of times the old version was logged drops and the new version appears. As I chose 6 hours, there is a clear point where the logs cross over.

I have been using regex a lot this week and I am slowly starting to feel more confident with it, and the same goes for Splunk and its various features. They are incredibly useful, and I am sure I will be using them both a lot going forwards.



You can find the list of all of my Knowledge Sharing posts here.
