Get to know Machine Learning in 2 mins

sompop talson
3 min read · Jun 16, 2019


Machine Learning (ML) is a method of data analysis that automates analytical model building. In short, ML uses algorithms that learn iteratively from data (experience as input) and produce insights or predictions (results as output) without being explicitly programmed with the rules for getting there. This makes it different from the traditional programming you may be familiar with, where a person writes those rules by hand.

ML can be applied in many kinds of business, and you can find plenty of use cases on the Internet. For example:

  1. Fraud detection
  2. Prediction of equipment failures (a.k.a. “Predictive Maintenance”; Arcadia Software implements an ML model for predictive maintenance in the Arfact Plant Maintenance (PM) Platform)
  3. Email spam detection
  4. Image recognition (Computer Vision)

And much more.
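To make the contrast with traditional programming concrete, here is a toy sketch based on the fraud detection example. The transactions, the function names and the idea of a single amount threshold are all assumptions made up for illustration; a real fraud model would use many more features.

```python
# Traditional programming: a person writes the rule by hand.
def is_suspicious_rule(amount):
    return amount > 1000  # threshold chosen by the programmer


# Machine learning in miniature: the threshold is learned from labeled examples.
def learn_threshold(examples):
    """Return the cut-off amount that classifies the most examples correctly."""
    candidates = sorted(amount for amount, _ in examples)

    def score(threshold):
        return sum((amount > threshold) == fraud for amount, fraud in examples)

    return max(candidates, key=score)


# Made-up transactions: (amount, was_fraud)
examples = [(20, False), (75, False), (310, False), (480, False),
            (900, True), (1500, True), (2200, True)]

threshold = learn_threshold(examples)
print("hand-written rule flags 900:", is_suspicious_rule(900))  # False
print("learned threshold:", threshold)                          # 480
print("learned rule flags 900:", 900 > threshold)               # True
```

The hand-written rule misses the 900 transaction because nobody told it better, while the learned threshold comes from the data itself.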

Machine Learning Process

Refer to https://www.superdatascience.com

1. Data Acquisition

First things first: for ML, YOU MUST HAVE DATA. Your data will contain many “Features” and “Labels”, depending on what you are exploring.

Features are pieces of information extracted from the input data to make the pattern easier to learn. You can think of a feature as one column of your input data set.

For instance, if you’re trying to predict the type of house someone will buy, your input features might include age, size, income, location, etc.

The label is the final choice, such as town-house, detached-house, condominium, etc.
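Here is a small sketch of how the house example could look in Python. All records are made up, and I am assuming that “size” means household size; the column names are only for illustration.

```python
# Hypothetical house-buyer records; every value is invented for illustration.
records = [
    {"age": 34, "size": 4, "income": 85000, "location": "suburb", "house_type": "detached-house"},
    {"age": 27, "size": 1, "income": 42000, "location": "city", "house_type": "condominium"},
    {"age": 41, "size": 3, "income": 60000, "location": "city", "house_type": "town-house"},
]

FEATURE_COLUMNS = ["age", "size", "income", "location"]
LABEL_COLUMN = "house_type"

# Features: one list of input values per record (one column per feature).
features = [[row[col] for col in FEATURE_COLUMNS] for row in records]
# Labels: the answer we want the model to learn to predict.
labels = [row[LABEL_COLUMN] for row in records]

print(features[0])  # [34, 4, 85000, 'suburb']
print(labels[0])    # detached-house
```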

If you want to do ML, you need a lot of data to analyze. (Sometimes customers think that a few hundred data items are enough. Absolutely NOT.) Acquiring data sounds like a simple step; realistically, however, your data is most likely spread across multiple data sources.
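As a rough sketch of pulling data together from more than one source, the snippet below joins two made-up CSV extracts on a shared key. The source names (a CRM export and a sales export) and the `customer_id` column are assumptions for illustration, not something from a real system.

```python
import csv
import io

# Two hypothetical sources describing the same customers (made-up data).
CRM_CSV = """customer_id,age,income
1,34,85000
2,27,42000
3,41,60000
"""
SALES_CSV = """customer_id,house_type
1,detached-house
2,condominium
4,town-house
"""


def read_by_id(text):
    """Parse CSV text into a dict keyed by customer_id."""
    return {row["customer_id"]: dict(row) for row in csv.DictReader(io.StringIO(text))}


crm = read_by_id(CRM_CSV)
sales = read_by_id(SALES_CSV)

# Keep only customers present in both sources (an inner join on customer_id).
dataset = [{**crm[cid], **sales[cid]} for cid in sorted(crm.keys() & sales.keys())]

for row in dataset:
    print(row)
```

Customers 3 and 4 drop out because each appears in only one source, which is a typical reason why the usable data set ends up smaller than expected.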

2. Data Cleaning

The raw data you have will probably not look nice. You will likely have to CLEAN YOUR DATA to get it into a consistent format and make it correct (or at least correct as far as you can tell). You can learn data cleaning techniques on the Internet (for example, https://www.digitalvidya.com/blog/data-cleaning-techniques/). In my experience, you sometimes cannot trust your data 100%, especially data entered by humans. Expect to spend a lot of time, and be patient, going over the data with whatever tools you use. Below is a classic quote among Data Scientists about the effort they spend on data cleaning.

Data Scientists spend 80% of their time cleaning and manipulating data and only 20% of their time actually analyzing it.
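To give a feel for what cleaning involves, here is a minimal sketch that trims whitespace, normalizes text, drops rows with missing or implausible values, and removes duplicates. The rows are made up, and the accepted age range of 18 to 120 is my own assumption; real rules depend on your data.

```python
def clean_rows(rows):
    """Return cleaned copies of rows; drop rows that cannot be repaired."""
    cleaned, seen = [], set()
    for row in rows:
        try:
            age = int(str(row["age"]).strip())
            income = float(str(row["income"]).replace(",", "").strip())
        except (KeyError, ValueError):
            continue  # missing or non-numeric value
        house_type = str(row.get("house_type", "")).strip().lower()
        if not house_type or not 18 <= age <= 120:
            continue  # empty label or implausible age
        key = (age, income, house_type)
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        cleaned.append({"age": age, "income": income, "house_type": house_type})
    return cleaned


# Made-up raw rows, as a human might have typed them.
raw = [
    {"age": " 34", "income": "85,000", "house_type": "Detached-House "},
    {"age": "34", "income": "85000", "house_type": "detached-house"},  # duplicate once cleaned
    {"age": "abc", "income": "42000", "house_type": "condominium"},    # non-numeric age
    {"age": "27", "income": "42000", "house_type": ""},                # missing label
    {"age": "410", "income": "60000", "house_type": "town-house"},     # likely a typo in age
    {"age": "41", "income": "60000", "house_type": "town-house"},
]

for row in clean_rows(raw):
    print(row)
```

Only two of the six rows survive, which is a small-scale version of why cleaning takes so much of the effort.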

3. Model Training & Building

After collecting and cleaning the data, you split it into two sets: training data and test data. To build your ML model, you write code in R or Python, using the ML libraries available for your preferred language. In that code you select the ML model you want to try (such as Linear Regression, K-Nearest Neighbors, and others). You then train the model on the training data and test it on the test data, which the model has not seen before. A 70/30 ratio (training data to test data, respectively) is a traditional and generally good choice for splitting the data set.
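The sketch below shows a 70/30 split followed by a bare-bones K-Nearest Neighbors classifier, written with the Python standard library only so that nothing is hidden. In practice you would more likely use an ML library; the synthetic data, the function names and the fixed random seeds are all assumptions for illustration. It needs Python 3.8 or later for `math.dist`.

```python
import math
import random
from collections import Counter


def train_test_split(rows, train_ratio=0.7, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, point, k=3):
    """Predict a label by majority vote among the k nearest training rows."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Synthetic data: (features, label) pairs in two well-separated clusters.
rng = random.Random(0)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "A") for _ in range(50)]
data += [((rng.gauss(6, 1), rng.gauss(6, 1)), "B") for _ in range(50)]

train, test = train_test_split(data)

# The model only ever sees `train`; `test` is held back for evaluation.
correct = sum(knn_predict(train, x) == y for x, y in test)
accuracy = correct / len(test)
print(f"train={len(train)} test={len(test)} accuracy={accuracy:.2f}")
```

K-Nearest Neighbors has no separate training phase: "training" simply means keeping the training rows around, which is why it makes a compact example.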

4. Test Data

As mentioned in step 3, “Model Training & Building”, you use the remaining data to test your ML model. What matters is that the model has not seen the test data before.

5. Model Testing

In this step, you iterate through the cycle below, over and over, until you are satisfied with the accuracy of your model’s predictions.

Model Training & Building <=> Model Testing

During this iteration, you can tune or change the ML model in your code as needed if you are not satisfied with its accuracy.
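One way to picture this loop is to try several settings of a model and compare their accuracy. The sketch below does that for the `k` parameter of a simple K-Nearest Neighbors classifier on synthetic, overlapping data; the data, the candidate values of `k` and the helper names are assumptions for illustration. Many practitioners tune on a separate validation set and keep the test set for one final check, so treat this as the simplest possible version of the idea.

```python
import math
import random
from collections import Counter


def knn_predict(train, point, k):
    """Majority vote among the k training rows closest to point."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


# Synthetic data: two overlapping clusters, so the choice of k matters.
rng = random.Random(1)
data = [((rng.gauss(0, 1.5), rng.gauss(0, 1.5)), "A") for _ in range(100)]
data += [((rng.gauss(2, 1.5), rng.gauss(2, 1.5)), "B") for _ in range(100)]
rng.shuffle(data)
train, test = data[:140], data[140:]  # 70/30 split

# Train <=> test: evaluate each candidate setting and keep the best one.
results = {k: accuracy(train, test, k) for k in (1, 3, 5, 7, 9)}
for k, acc in results.items():
    print(f"k={k}: accuracy={acc:.2f}")

best_k = max(results, key=results.get)
print("best k:", best_k)
```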

6. Model Deployment

Once your ML model is ready, you can wrap your ML code as a web service and deploy it to serve incoming prediction requests.
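As a minimal sketch of the web service idea, the snippet below exposes a prediction function over HTTP using only the Python standard library. The `/predict` path, the JSON shape, the port and the tiny stand-in "model" are all assumptions for illustration; a production deployment would typically use a proper web framework and a model loaded from disk.

```python
import json
import math
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for a trained model: a tiny made-up training set for k-NN.
TRAIN = [((0.0, 0.0), "A"), ((0.5, 1.0), "A"), ((1.0, 0.5), "A"),
         ((6.0, 6.0), "B"), ((5.5, 6.5), "B"), ((6.5, 5.0), "B")]


def predict(features, k=3):
    """Majority vote among the k stored rows closest to features."""
    nearest = sorted(TRAIN, key=lambda row: math.dist(row[0], features))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        try:
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length))
            label = predict(tuple(float(v) for v in payload["features"]))
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected a JSON body with a 'features' list")
            return
        body = json.dumps({"label": label}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host="127.0.0.1", port=8000):
    """Create the HTTP server; call .serve_forever() on the result to run it."""
    return HTTPServer((host, port), PredictHandler)
```

To try it, run `make_server().serve_forever()` and send a POST request to `http://127.0.0.1:8000/predict` with a body such as `{"features": [5.8, 6.1]}`; the response should be a small JSON object containing the predicted label.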


sompop talson

Managing Director at Arcadia Software Development Co.,Ltd