Becoming a Machine Learning Engineer | Step 2: Pick a Process
Picking your process is super important
After a few applied machine learning problems, you usually develop a pattern or process for quickly getting started and achieving good results. Once you have this process it is trivial to use it again and again on project after project. The more developed your process, the faster you can get to results!
Let me give you a head start and teach you a 5-step systematic process that I developed while becoming a machine learning engineer. This is just a starting point, and you should feel free to change it to suit your needs.
Define the problem
This step is all about learning more about the problem at hand. Familiarize yourself with the domain and understand why you are building this solution. To help with this, always ask yourself the questions below:
What is the problem? Describe the problem both formally and informally. Make sure you list the assumptions you are making and any similar problems.
Why does the problem need to be solved? List any motivations for solving the problem. What are the benefits a solution brings and how would you use it?
How would I solve the problem? Describe how you would solve the problem manually to build up domain knowledge.
Prepare the data
Do you understand the data you have been given? Lots of people skip over this step because it is often tedious, but it is super important. This work forces you to think about the data in the context of the problem before it gets lost in the craziness of algorithms.
Data Selection: Consider what data is available to you. Is there any data missing? Can you remove any data?
Data Preprocessing: Organize your selected data. Format it, clean it, and take a sample from it.
Data Transformation: Ready your processed data for machine learning by engineering its features using scaling, attribute decomposition, and attribute aggregation.
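To make the three transformation techniques a bit more concrete, here is a minimal sketch in plain Python. The sample records, the field names (`customer`, `timestamp`, `price`) and the helper functions are all invented for this illustration; on a real project you would more likely reach for a data library than hand-roll these.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample records; the field names are made up for illustration.
rows = [
    {"customer": "a", "timestamp": "2021-03-01 09:30:00", "price": 10.0},
    {"customer": "a", "timestamp": "2021-03-02 18:45:00", "price": 30.0},
    {"customer": "b", "timestamp": "2021-03-06 12:00:00", "price": 20.0},
]

def min_max_scale(values):
    """Scaling: rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def decompose_timestamp(text):
    """Attribute decomposition: split one timestamp into simpler parts."""
    dt = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return {"hour": dt.hour, "weekday": dt.weekday(), "is_weekend": dt.weekday() >= 5}

def total_spend_per_customer(records):
    """Attribute aggregation: combine many rows into one value per customer."""
    totals = defaultdict(float)
    for record in records:
        totals[record["customer"]] += record["price"]
    return dict(totals)

scaled_prices = min_max_scale([r["price"] for r in rows])
time_parts = [decompose_timestamp(r["timestamp"]) for r in rows]
spend = total_spend_per_customer(rows)

print(scaled_prices)
print(time_parts)
print(spend)
```

Each helper maps to one technique: scaling puts numeric attributes on a common range, decomposition breaks a complex attribute into simpler ones, and aggregation rolls several rows up into a single feature.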
Explore different algorithms
Now that you have your data it’s time to try out a bunch of different standard machine learning algorithms. Typically, you would run 10–20 standard algorithms on the transformed and scaled versions of the dataset you prepared in the last step.
The main goal of trying all of these different algorithm and dataset combinations is to cast your net far and wide. See what works and what doesn’t, then go from there. More detailed exploration will follow with the well-performing algorithms.
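A spot-check harness for this kind of exploration could look roughly like the sketch below. It is written in plain Python so it runs anywhere, and everything in it is an assumption made for the example: the synthetic two-cluster dataset, the three toy algorithms (a real run would cover many more), and the simple k-fold cross-validation used to score them.

```python
import random
from collections import Counter

def make_toy_data(n_per_class=30, seed=0):
    """Two noisy 2-D clusters; purely synthetic, for illustration only."""
    rng = random.Random(seed)
    data = []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (3.0, 3.0)]):
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    rng.shuffle(data)
    return data

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Each "algorithm" takes training rows and returns a predict(point) function.
def majority_baseline(train):
    top_label = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda point: top_label

def nearest_neighbour(train):
    return lambda point: min(train, key=lambda row: sq_dist(row[0], point))[1]

def nearest_centroid(train):
    sums, counts = {}, Counter()
    for point, label in train:
        counts[label] += 1
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
    return lambda point: min(centroids, key=lambda lab: sq_dist(centroids[lab], point))

def cross_val_accuracy(fit, data, k=5):
    """Mean accuracy over k folds; every row is used for testing exactly once."""
    scores = []
    for fold in range(k):
        test = data[fold::k]
        train = [row for i, row in enumerate(data) if i % k != fold]
        predict = fit(train)
        correct = sum(predict(point) == label for point, label in test)
        scores.append(correct / len(test))
    return sum(scores) / k

data = make_toy_data()
candidates = {
    "majority baseline": majority_baseline,
    "1-nearest neighbour": nearest_neighbour,
    "nearest centroid": nearest_centroid,
}
results = {name: cross_val_accuracy(fit, data) for name, fit in candidates.items()}
for name, score in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

Including a trivial baseline is a deliberate choice here: any algorithm that cannot beat it is not worth carrying into the next step.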
Improve the results
After you have finished exploring the different algorithms and picked one that works well for your dataset, it is time to squeeze out the best results from it. You can do this in a few ways, but it’s important to make sure that your results are significant at this point, because hyper-parameter tuning isn’t going to turn a crap result into a good result. It will just help you squeeze out a bit more performance.
Here are some standard ways to improve an already working algorithm.
Hyper-parameter Tuning: Most algorithms have hyper-parameters, and choosing good values for them is key to getting the best performance.
Ensemble Methods: Predictions are made by combining multiple models.
Extreme Feature Engineering: The attribute decomposition and aggregation seen in data preparation are pushed to the limits.
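The first two ideas in this list can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data is synthetic, k-nearest neighbours stands in for whichever algorithm you picked, the tuning is a simple grid search scored on a held-out validation split, and the ensemble is a plain majority vote. All function names are made up for the example.

```python
import random
from collections import Counter

def make_toy_data(n_per_class=40, seed=1):
    """Two overlapping 2-D clusters; synthetic data for illustration only."""
    rng = random.Random(seed)
    data = []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (2.0, 2.0)]):
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    rng.shuffle(data)
    return data

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_fit(train, k):
    """k-nearest neighbours; k is the hyper-parameter being tuned."""
    def predict(point):
        nearest = sorted(train, key=lambda row: sq_dist(row[0], point))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict

def accuracy(predict, rows):
    return sum(predict(point) == label for point, label in rows) / len(rows)

def grid_search(train, valid, k_values):
    """Hyper-parameter tuning: score each k on held-out data, keep the best."""
    scores = {k: accuracy(knn_fit(train, k), valid) for k in k_values}
    best_k = max(scores, key=scores.get)
    return best_k, scores

def voting_ensemble(predictors):
    """Ensemble: combine several models by majority vote."""
    def predict(point):
        votes = Counter(p(point) for p in predictors)
        return votes.most_common(1)[0][0]
    return predict

data = make_toy_data()
train, valid = data[:60], data[60:]

best_k, scores = grid_search(train, valid, k_values=[1, 3, 5, 7, 9])
ensemble = voting_ensemble([knn_fit(train, k) for k in (1, 5, 9)])

print("validation accuracy per k:", scores)
print("best k:", best_k)
print("ensemble validation accuracy:", accuracy(ensemble, valid))
```

Because the validation split is used to choose `k`, its score is an optimistic estimate. For an honest final number you would hold back a separate test set that plays no part in tuning.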
Present the results
The results of a complex machine learning problem are often meaningless in a vacuum. It’s important to put them in context. This typically means a presentation to stakeholders. This applies to big meetings with CEOs and to online competitions alike. It’s good practice and gives everyone involved a good understanding of the problem and how you solved it.
Here is a quick template for you to present your results:
Why: Define the environment that the problem exists in and set up a motivation for the solution.
Question: Describe the problem as a question that you went out and answered.
Solution: Concisely describe the solution as an answer to the question you just posed.
Findings: List out all of the discoveries you made while solving the problem.
Limitations: Clearly go over the limitations of the model. What is it not good at, and what could be done better?
Conclusions: Go back to the why, question, and solution and tie them together in a way that makes them easy to remember.