Reproducibility and Automation of Machine Learning Process

Here is a short update from PyCon Belarus’17, where our Data Science expert Denis Dus talked about ‘Reproducibility and Automation of Machine Learning Process’.

In his talk, Denis explained basic design concepts for automating iterative processes in machine learning and shared his experience of building data pipelines in one of his projects.

Automating the machine learning process does not replace the data science expert. It frees the expert to focus on understanding the business problem, improving the model, and explaining results, which are the true value drivers for the business.

Data scientists commonly spend up to 80% of their time on data engineering tasks such as data extraction, cleaning, transformation, normalization, and feature extraction, leaving only 20% for modeling. Denis recommends considering automation if you repeatedly need to extract, clean, and transform data, if you want to update models on a regular basis, or if you want to make data science experiments easier to reproduce.
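
To make the idea of an automated, repeatable data preparation step more concrete, here is a minimal sketch in plain Python. It is not taken from Denis's talk or slides: the step functions (clean, normalize), the run_pipeline helper, and the sample records are all hypothetical and only illustrate how cleaning and normalization can be chained so that the same input always yields the same output.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical record type: one dict per row, values may be missing (None).
Record = Dict[str, Optional[float]]
Step = Callable[[List[Record]], List[Record]]


def clean(rows: List[Record]) -> List[Record]:
    """Data cleaning: drop any row that has a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def normalize(rows: List[Record]) -> List[Record]:
    """Data normalization: min-max scale every column to the range [0, 1]."""
    if not rows:
        return rows
    keys = list(rows[0].keys())
    lo = {k: min(r[k] for r in rows) for k in keys}
    hi = {k: max(r[k] for r in rows) for k in keys}
    return [
        {
            # A constant column has no spread, so map it to 0.0.
            k: (r[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
            for k in keys
        }
        for r in rows
    ]


def run_pipeline(rows: List[Record], steps: List[Step]) -> List[Record]:
    """Apply each step in order; the step list is the pipeline definition."""
    for step in steps:
        rows = step(rows)
    return rows


# Made-up sample data, standing in for the result of a data extraction step.
raw = [
    {"age": 20.0, "income": 100.0},
    {"age": None, "income": 50.0},
    {"age": 40.0, "income": 300.0},
]

result = run_pipeline(raw, [clean, normalize])
print(result)
```

Because the pipeline is just an ordered list of deterministic functions, rerunning it on refreshed data or adding a feature extraction step means editing that list rather than repeating manual work.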

Check out the slides below for more details.

Reproducibility and automation of machine learning process from Denis Dus

Originally published at InData Labs Blog