60s Data Science: Manage your Machine Learning Projects with ASUM-DM

Clément POIRET
The Science Thinker
4 min read · Dec 1, 2019
CC BY-SA-ND Clément POIRET

Machine Learning, Research & Industry

[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed.

— Arthur Samuel, 1959

Machine Learning (ML) has become common practice in every area that deals with data, particularly in research and industry. ML can provide answers to:

  • classification problems (e.g., do the vibrations of an engine correspond to (i) a correct state or (ii) an abnormal state?),
  • regression problems (e.g., what will the S&P 500 price be in 6 hours?),
  • or even clustering problems (e.g., among my subjects, can I identify groups with similar motor characteristics?).

Where a traditional computational approach would require a lot of time and energy to develop and optimize, ML allows the programming to remain implicit. Since the algorithm learns “by itself” from the available data, the code is simpler, generally runs faster, and much of the optimization is automated. It should be noted that it is possible to do ML without data, but that will be the subject of another post.

When we talk about Machine Learning, the most recurrent topics remain the choice of model and its evaluation. These steps are necessary, but not sufficient, to ensure success in the real world: an ML solution implemented within an organization must meet several other criteria.

Risk Minimization

Depending on the field of application, this step may be essential. Let’s take the example of a computer-vision classification problem for predicting cancer. Managing the types of error is important: no model achieves 100% accuracy. Regardless of the algorithm, it will produce Type I (false positive) and Type II (false negative) errors. In medical imaging, it may be preferable to weight the algorithm toward sensitivity: the consequences of a false positive are far less severe than those of a false negative, which would leave a patient with an undetected tumor.
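Concretely, a model can be biased toward sensitivity either by weighting the positive class during training or by lowering the decision threshold at prediction time. Below is a minimal sketch assuming scikit-learn and its bundled breast cancer dataset; the class weights and the 0.3 threshold are arbitrary illustrative values, not clinical recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
y = (y == 0).astype(int)  # relabel so that 1 = malignant, the class we must not miss
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# class_weight penalizes missed malignant cases more heavily than false alarms
clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight={0: 1, 1: 3}, max_iter=1_000),
)
clf.fit(X_train, y_train)

# Lowering the decision threshold trades extra false positives for fewer false negatives
proba = clf.predict_proba(X_test)[:, 1]
y_pred = (proba >= 0.3).astype(int)

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(f"Type I errors (false positives): {fp}")
print(f"Type II errors (false negatives): {fn}")
print(f"Sensitivity (recall): {recall_score(y_test, y_pred):.3f}")
```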

Scalability

To develop a model, it is common to use your own computer or a cloud computing platform such as IBM Watson Studio, Google Colab, or Alibaba Cloud. It is important to keep in mind the context in which the Machine Learning solution will be applied. If the task is complex and a Deep Learning model is required, how often will I have to re-train it to maintain state-of-the-art performance? Is the algorithm optimized for the hardware on which it will be executed (CPU, GPU, TPU, etc.)? Keep in mind that the solution may stay in service for 5 or 10 years. If the user base is multiplied by 10 or 100 over that period, will the algorithm still be efficient and appropriate? Will my hardware (server, cloud service) be able to handle the computations and store the data and results without degrading the performance of my service?
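A cheap sanity check before committing to a deployment target is to measure inference throughput at several multiples of the expected load. The snippet below is only a rough sketch under assumptions added here: a scikit-learn random forest and synthetic 30-feature data stand in for the real model and inputs.

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Placeholder model trained on synthetic data
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(rng.normal(size=(1_000, 30)), rng.integers(0, 2, size=1_000))

# Simulate the request volume growing by 10x and 100x
for batch_size in (1_000, 10_000, 100_000):
    X = rng.normal(size=(batch_size, 30))
    start = time.perf_counter()
    model.predict(X)
    elapsed = time.perf_counter() - start
    print(f"{batch_size:>7} requests -> {elapsed:.2f}s "
          f"({batch_size / elapsed:,.0f} predictions/s)")
```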

Analytics Solutions Unified Method for Data Mining/Predictive Analysis

To maximize the benefits of such an implementation, several key steps should be followed [1]. At every stage, all members of the organization are an integral part of the project, even if only to provide the expertise needed to understand the specific problem.

  • Understand and analyze the problem: define the current state of the organization’s solution and the objectives of the implementation (increase the number of customers by individualizing suggestions?). It will then be necessary to define the prerequisites of the implementation (useful data and how to obtain them, target model performance, etc.).
  • Design: define all the components needed for the implementation, then start building the model on pre-existing, transformed, or simply synthesized data, in order to evaluate the different approaches and their feasibility in the form of prototypes.
  • Configure and Implement: of the prototypes created during the design phase, which is the most efficient and best adapted to the problem (see the sketch after this list)? Then begins the implementation phase in real-life conditions. This iterative and incremental approach lets us start adapting the existing software and hardware architecture so that our solution can be inserted into it. The model, almost functional, now requires only a few adjustments to go into production.
  • Deploy: a roadmap is created. The model is then deployed and configured in a production environment, and deadlines are set for the model’s various accuracy measurements, its re-training, and the maintenance frequency of the hardware used.
  • Maintain and Optimize: the model is fully operational in production and begins to generate value (better medical diagnoses, increased revenue, better UX, etc.). The accuracy of the model and its parameters are checked regularly so that it stays cutting-edge.
Machine Learning Project Management.
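As an illustration of the prototype-selection step in “Configure and Implement”, cross-validation gives each candidate a comparable performance estimate before anything reaches production. The snippet below is a minimal sketch assuming scikit-learn; the two candidate models and the synthetic dataset are placeholders for an organization’s actual prototypes and data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the data gathered during the design phase
X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)

# Hypothetical prototypes produced during the design phase
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1_000)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# 5-fold cross-validation yields a comparable estimate for each prototype
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```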

Going Further

A. Géron, “Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems,” 2nd ed., O’Reilly Media, 2019.

References

[1] IBM, “Analytics Solutions Unified Method: Implementations with Agile Principles,” n.d.

