The Machine Learning Process

Vince Sesto
Splunk User Developer Administrator
Feb 27, 2017

I might start calling this machine learning Monday. I know this is only the second week that I’ve posted about machine learning, but maybe I can make a tradition of it.

As with most technology implementations, it pays to have a process or recipe to work through, and machine learning is no different. After our very basic explanation of machine learning, I thought it would be best to discuss the machine learning process and how it is implemented.

What Question Are You Planning To Answer

This is the first part of our machine learning process, and it helps you move through the rest of the process because it should give you a clear indication of what outcome will determine success, what data will be used and how you intend to process the data. Your question should be detailed and clear to limit any confusion further down the machine learning process, as it is something that you can always refer back to for guidance.

For example, if we wanted to collect historical stock market data from Yahoo Finance, we could then try and create a model predicting if the stock will go up or down the next day.

“Can we use machine learning to process Yahoo Finance stock market data and create a prediction model that predicts, with 70% or greater accuracy, whether a certain stock will increase or decrease in value the next day?”

Our question states the specific data we are going to use, that we will use a machine learning process to create a prediction model, and what we will consider to be a successful result: in this case, 70% or greater accuracy.
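To make the question something we can refer back to later, it can help to write it down in code. The following is only a minimal sketch: the ProblemStatement class, its field names and the label_next_day_moves helper are hypothetical names made up for this illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProblemStatement:
    """The question we are asking, in a form we can check results against."""
    data_source: str
    target: str
    min_accuracy: float

    def is_success(self, accuracy: float) -> bool:
        # Success is defined by the question itself: 70% or greater.
        return accuracy >= self.min_accuracy


QUESTION = ProblemStatement(
    data_source="Yahoo Finance historical stock market data",
    target="does the stock close higher the next day?",
    min_accuracy=0.70,
)


def label_next_day_moves(closes):
    """For each day except the last, True if the next close is higher."""
    return [nxt > cur for cur, nxt in zip(closes, closes[1:])]


# Made-up closing prices, purely for illustration.
print(label_next_day_moves([10.0, 10.5, 10.5, 9.8]))
print(QUESTION.is_success(0.72))
```

Writing the success threshold down once means every later step can be checked against the same number.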

Gathering And Preparing The Data

This part of the process will generally take up the bulk of your time. Although we know the data we are going to use, we may find that the data lacks the specific information or integrity needed to gain effective results. You may find that it is worth cutting your losses at this point instead of persevering with poor data.

Once you’ve gathered your data, you can start to work through it, eliminating data that may be incomplete or erroneous. This generally means running the data through specific preprocessing tasks, where you may want to create a visualization of the data to ensure it’s all complete. Although in our example we have used a reputable source, we still need to check whether any data is missing or incomplete.
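As a rough sketch of what a first preprocessing pass might look like, the snippet below separates usable rows from incomplete or obviously wrong ones. The column names (Date, Close, Volume) and the sample values are assumptions made up for this example, and the validity rules are just one plausible choice.

```python
import csv
import io

# Made-up sample data standing in for a downloaded CSV file.
SAMPLE_CSV = """Date,Close,Volume
2017-02-21,101.5,12000
2017-02-22,,15000
2017-02-23,-3.0,11000
2017-02-24,102.1,9000
"""


def clean_rows(rows):
    """Split rows into (clean, rejected) based on simple validity checks."""
    clean, rejected = [], []
    for row in rows:
        try:
            date = row["Date"]
            close = float(row["Close"])    # empty string raises ValueError
            volume = int(row["Volume"])
        except (KeyError, TypeError, ValueError):
            rejected.append(row)           # missing or unparseable value
            continue
        if not date or close <= 0 or volume < 0:
            rejected.append(row)           # value present but not plausible
            continue
        clean.append({"Date": date, "Close": close, "Volume": volume})
    return clean, rejected


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
clean, rejected = clean_rows(rows)
print(f"kept {len(clean)} rows, rejected {len(rejected)}")
```

Keeping the rejected rows, rather than silently dropping them, makes it easier to judge whether the data is good enough to continue with.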

Selecting an Algorithm

Now that we’ve defined what we want to find and prepared our gathered data, it’s time to select an algorithm that will hopefully get some useful results from the data we have. Your selection of a machine learning algorithm will depend on a few factors, including the amount of data you wish to process, the type of data you are using, and the type of results you are looking for, including their accuracy. We will look at introducing some of the types of algorithms at a later date, as it’s a subject that definitely needs more attention than a quick summary in this post.
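One practical way to compare candidates is to score each of them on the same labelled examples. The sketch below does that with two deliberately simple stand-ins rather than real machine learning algorithms; the function names, the up_today feature and the example data are all made up for illustration. In a real project you would score candidates on data they were not trained on.

```python
def always_up(features):
    """Baseline candidate: always predict the stock goes up tomorrow."""
    return True


def same_as_today(features):
    """Candidate rule: predict tomorrow moves the same way as today."""
    return features["up_today"]


def accuracy(predict, examples):
    """Fraction of examples where the prediction matches the known label."""
    hits = sum(predict(features) == label for features, label in examples)
    return hits / len(examples)


# Made-up examples: (features, did the stock go up the next day?)
examples = [
    ({"up_today": True}, True),
    ({"up_today": True}, True),
    ({"up_today": False}, False),
    ({"up_today": True}, False),
    ({"up_today": False}, False),
    ({"up_today": False}, True),
    ({"up_today": True}, True),
    ({"up_today": False}, False),
]

candidates = {"always_up": always_up, "same_as_today": same_as_today}
scores = {name: accuracy(fn, examples) for name, fn in candidates.items()}
best_name = max(scores, key=scores.get)
print(scores, "best:", best_name)
```

Including a trivial baseline is a useful habit: a candidate that cannot beat it is probably not worth training further.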

Training the Model

We’ve selected the algorithm that will hopefully give us the predictions we need. Now it’s time to give it some data to learn from. Here we will use part of the data that we’ve gathered and processed. This training data must contain the answer to the question we are asking so that the model can learn from it. The result is a machine learning model that will hopefully make the predictions we need when new data is introduced.
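To illustrate the idea of learning from data that already contains the answer, here is a minimal sketch. It holds back the most recent examples and "trains" a very small lookup model on the rest: for each value of today's move, it remembers the most common move seen the following day. This is a stand-in chosen for brevity, not a recommended algorithm, and the data and function names are made up.

```python
from collections import Counter, defaultdict


def chronological_split(examples, train_fraction=0.8):
    """Keep the oldest examples for training and the newest for later checks."""
    cut = int(len(examples) * train_fraction)
    return examples[:cut], examples[cut:]


def train(examples):
    """Learn, for each feature value, the most common label seen with it."""
    counts = defaultdict(Counter)
    for feature, label in examples:
        counts[feature][label] += 1
    return {feature: c.most_common(1)[0][0] for feature, c in counts.items()}


def predict(model, feature, default="down"):
    """Look up the learned answer, falling back to a default if unseen."""
    return model.get(feature, default)


# Made-up history: (today's move, the next day's move), oldest first.
history = [
    ("up", "up"), ("up", "up"), ("up", "down"), ("down", "down"),
    ("down", "up"), ("down", "down"), ("up", "up"), ("down", "down"),
    ("up", "up"), ("down", "up"),
]

train_set, held_back = chronological_split(history)
model = train(train_set)
print(model)
print(predict(model, "up"))
```

Splitting by time rather than at random matters for data like stock prices, because the model should never be trained on days that come after the ones it is asked to predict.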

Testing the Model

Finally, we take our model out into the real world and start to make predictions against incoming data. We will need to continually train and retrain the model to hopefully improve the results we are getting. In this part of the process we will also need to present our data in some form, to make sure that the key stakeholders we are delivering to have their questions answered as well.
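Checking the model's predictions against what actually happened, and comparing the result with the 70% target from our question, might look something like the sketch below. The predictions and outcomes are made-up values for illustration, and TARGET_ACCURACY is simply the threshold from the question.

```python
TARGET_ACCURACY = 0.70  # the success threshold stated in our question


def accuracy(predictions, actuals):
    """Fraction of predictions that matched the actual outcome."""
    if len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must be the same length")
    if not actuals:
        raise ValueError("need at least one outcome to score against")
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals)


# Made-up values: what the model predicted vs. what the stock actually did.
predictions = ["up", "down", "up", "up", "down", "up", "down", "down", "up", "up"]
actuals = ["up", "down", "down", "up", "down", "up", "up", "down", "up", "up"]

score = accuracy(predictions, actuals)
meets_target = score >= TARGET_ACCURACY
print(f"accuracy: {score:.0%}, meets target: {meets_target}")
```

A single summary line like this is also a reasonable starting point for reporting back to stakeholders, and a score that drops below the target is a signal to retrain.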

As you can see, the process is iterative and may take some trial and error before you come up with the right solution. As we stated previously, machine learning is not intended to answer your question 100%, but to answer it well enough to be useful. You may find that a lack of domain knowledge around the specific subject you are investigating is slowing your progress and may require you to rethink your approach completely.

About The Author

Vince has worked with Splunk for over 5 years, developing apps and reporting applications around Splunk, and now works hard to advocate its success. He has worked as a system engineer in big data companies and development departments, where he has regularly supported, built, and developed with Splunk. He has now published his first book via Packt Publishing — Learning Splunk Web Framework.

Vincent Sesto is a DevOps Engineer, Endurance Athlete, Coach and Author. One of his passions in life is endurance sports, as an athlete, coach and author.