Use Machine Learning Without Writing Code

igel: a machine learning tool that automates everything for you

Nidhaloff
The Startup
6 min read · Oct 1, 2020

Introduction

If you landed on this post, you probably already know machine learning (ML) and are excited to find out how to use it without writing code, as the title claims. Maybe you are even asking yourself how this can be, right?

Anyway, everyone has probably noticed how the field has exploded in the last several years. It is really unbelievable what we have managed to achieve with ML and how easy we have made it to use. In fact, there are many libraries out there that let you use ML with a few lines of code, right?

However, building ML models is actually the easiest and quickest part of the work. Developers spend much more time pre-processing the data than actually building the model and using it. Another issue is that the field is moving so fast that, most of the time, you’ll need to try a lot of things and take a look at the results.

There is no single correct way that will always get you the best results. You really need to try different things and, most of the time, make sure that you have good-quality data first before spending much time trying to derive something useful from it. Trust me, I’m facing these challenges all the time!

I sometimes find myself writing a lot of boilerplate code to use ML models, even when I’m using a great library like sklearn. Furthermore, if I’m done with a task and want to change some small parts and re-use the model, I find myself looking through my codebase to find the spots that I want to update, which is not easy if you have a large codebase and probably not what you want to do all the time.

Therefore, I created igel.

Igel

OMG, another ML library, right? There are already more than enough stable libraries out there that make ML easy to use. However, you hopefully realized how painful this can still be for the reasons I discussed earlier. Igel is a delightful library that allows you to fit/train, evaluate, compare and experiment with different ML models without writing a single line of code.

The basic idea is to group all your configurations (your model definition, data pre-processing methods, the target you want to predict, etc.) in one human-readable yaml or json file and then let igel automate everything for you. Igel takes more of a goal-driven (declarative) approach: you describe what you want to do in a human-readable file, and igel takes your configurations, constructs a model, trains it and gives you back the results and a bunch of metadata.

Without writing a single line of code, you can use ML easily. This is very useful for both technical and non-technical users. It is not always the right fit, though: sometimes you want more control and need to go deep into the details. From my experience, this happens rarely in practice. So unless you want to dive deep into the details, this tool should be ideal for you. No more writing boilerplate code again and again or searching through a thousand-line codebase to make a small change.

Igel is built on top of sklearn, hence it supports all the models supported by sklearn, even the preview models. Furthermore, igel is a command line interface (CLI) tool, which means that you interact with it by running commands in the terminal. Here is a list of all supported commands:

# use this command to get help on how to use igel
$ igel -h
# using this command, igel will create a yaml draft file for you
$ igel init <args>
# use this command to fit a model
$ igel fit <args>
# use this command to evaluate a model
$ igel evaluate <args>
# use this command to generate predictions from a pre-fitted model
$ igel predict <args>
# use this command to run fit, evaluate and predict in one command
$ igel experiment <args>

As you can see, the CLI commands are pretty straightforward to use. To summarize, you need a yaml or json file that holds your configurations, and then you just interact with igel using the commands. That’s it!

Overview

You are probably asking yourself now: OK, that’s nice, but what should I write in the configuration file? One of the things I focused on when building igel was to give the user ways to pre-process data too, and not only to use models, since pre-processing is actually the hard part.

Here is an overview of all supported configurations at the moment:

# dataset operations
dataset:
    type: csv
    read_data_options: # options you want to supply for reading data
        sep:
        delimiter:
        header:
        names:
        index_col:
        usecols:
        squeeze:
        prefix:
        mangle_dupe_cols:
        dtype:
        engine:
        converters:
        true_values:
        false_values:
        skipinitialspace:
        skiprows:
        skipfooter:
        nrows:
        na_values:
        keep_default_na:
        na_filter:
        verbose:
        skip_blank_lines:
        parse_dates:
        infer_datetime_format:
        keep_date_col:
        dayfirst:
        cache_dates:
        thousands:
        decimal:
        lineterminator:
        escapechar:
        comment:
        encoding:
        dialect:
        delim_whitespace:
        low_memory:
        memory_map:

    split: # split options
        test_size:
        shuffle:
        stratify:

    preprocess: # preprocessing options
        missing_values:
        encoding:
            type:
        scale:
            method:
            target:


# model definition
model:
    type:
    algorithm:
    arguments:
    use_cv_estimator:
    cross_validate:
        cv:
        n_jobs:
        verbose:

# target you want to predict
target:
    - put the target you want to predict here

I know this is overwhelming, but this is an overview of all supported options. Chances are, you will only need a few of them to get started. You can read about what each option does in the official docs.
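
To give a feel for the read_data_options entries: their names match keyword arguments of pandas.read_csv. The sketch below assumes (it is not confirmed here) that such options are simply forwarded to that function. The option values and the tiny in-memory CSV are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical values one might put under read_data_options.
read_data_options = {
    "sep": ";",                     # column separator used in the file
    "header": 0,                    # first row holds the column names
    "usecols": ["glucose", "sick"], # keep only these columns
    "na_values": ["?"],             # treat "?" as a missing value
}

# Tiny in-memory CSV standing in for a real file on disk.
raw = "glucose;bmi;sick\n148;33.6;1\n?;26.6;0\n183;23.3;1\n"

# Forward the options as keyword arguments (an assumption about the wiring).
df = pd.read_csv(io.StringIO(raw), **read_data_options)
print(df)
```

Note that a few of the listed names (for example squeeze, prefix and mangle_dupe_cols) were removed from pandas.read_csv in pandas 2.0, so which options are usable likely depends on the installed pandas version.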

Example

The best way to demonstrate the capabilities of igel is to show an example. In this example, we will use the famous indian-diabetes dataset to classify whether a person is sick (has diabetes) or not using a neural network classifier.

First, we will start by initializing a yaml file. You can either create it from scratch or run the igel init command. Then let’s configure what we want to do in the yaml file:

dataset:
    type: csv
    split:
        test_size: 0.2
        shuffle: True
        stratify: None

    preprocess:
        missing_values: mean
        scale:
            method: standard
            target: inputs

# model definition
model:
    type: classification
    algorithm: NeuralNetwork
    arguments:
        solver: adam

target:
    - sick # column we want to predict from the dataset

So let’s explain what is happening here. In the dataset configurations, I’m telling igel that the format of my data is CSV. Then, I’m passing a split option to hold out 20% of my data for testing the model after it is fitted. Since I set shuffle to True, these rows are drawn randomly from the full dataset.
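
To make the split options concrete, here is a minimal sketch of a shuffled hold-out split in plain Python. It only illustrates what test_size and shuffle mean; it is not how igel implements the split, and the seed parameter is a hypothetical addition for reproducibility.

```python
import random


def holdout_split(rows, test_size=0.2, shuffle=True, seed=0):
    """Split rows into (train, test); test_size is the fraction held out."""
    rows = list(rows)  # copy so the caller's data is left untouched
    if shuffle:
        random.Random(seed).shuffle(rows)  # seeded so reruns give the same split
    n_test = round(len(rows) * test_size)
    return rows[n_test:], rows[:n_test]


rows = list(range(10))  # stand-in for 10 dataset rows
train, test = holdout_split(rows, test_size=0.2)
print(len(train), len(test))  # 8 2
```

With shuffle set to False, the test rows would simply be taken in file order, which can bias the evaluation if the file is sorted in some way.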

Furthermore, as a straightforward pre-processing procedure, I’m telling igel to replace any missing values with the mean and to scale the inputs using the standard method, which you can read about here.
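
As a rough illustration of what these two steps compute, the sketch below fills a gap with the mean of the observed values and then standardizes the column to zero mean and unit standard deviation. It is plain Python with made-up numbers, not igel's code, and it assumes the column is not constant (otherwise the standard deviation would be zero).

```python
from statistics import mean, pstdev


def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def standard_scale(column):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]


glucose = [90.0, None, 110.0, 100.0]  # made-up values with one gap
filled = impute_mean(glucose)         # the gap becomes 100.0
scaled = standard_scale(filled)
print(filled)                          # [90.0, 100.0, 110.0, 100.0]
print([round(v, 3) for v in scaled])   # [-1.414, 0.0, 1.414, 0.0]
```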

Next, in the model configurations, I’m specifying that I want to use a NeuralNetwork model for classification, and I’m passing my favorite solver, the adam optimizer, as an argument.

Finally, I’m telling igel that my goal is to predict whether someone is sick or not. Notice that you can provide multiple values here if your goal is to predict multiple outputs!
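
For comparison, this is roughly the boilerplate you would write by hand in sklearn to get a similar setup. It is a sketch under assumptions: that the NeuralNetwork classification option corresponds to sklearn's MLPClassifier, and that a synthetic dataset can stand in for the diabetes CSV. The max_iter and random_state values are my own choices, not part of the configuration.

```python
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diabetes CSV: 8 numeric inputs, binary target.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# split: test_size 0.2, shuffle True, no stratification
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, stratify=None, random_state=0
)

# preprocess + model: mean imputation, standard scaling, MLP with adam
model = make_pipeline(
    SimpleImputer(strategy="mean"),
    StandardScaler(),
    MLPClassifier(solver="adam", max_iter=500, random_state=0),
)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Changing the algorithm or a pre-processing step here means editing code, whereas with the configuration file it is a one-line change.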

Now it’s time to start igel:

$ igel fit -dp "path_to_the_dataset" -yml "path_to_the_yaml_file"

The command above tells igel to start fitting the model and where to find the dataset and yaml file. After finishing, igel will create a model_results folder in the current working directory and save the model along with some results in a description.json file inside it.

You can then evaluate your model if you want to. If you have separate data for evaluation, you can use:

$ igel evaluate -dp "path_to_the_evaluation_dataset"

This will also generate an evaluation.json file automatically inside the model_results folder created previously.

Finally, you can use your model to predict if you have new data:

$ igel predict -dp "path_to_the_new_dataset"

This will generate a predictions.csv file again in the model_results folder, where you can find your predictions.

Remember that I mentioned the experiment command previously? You can automate all three steps with it if you like:

$ igel experiment -DP "path_to_train_data \
path_to_eval_data \
path_to_test_data" -yml "path_to_yaml_file"

Here, you provide the paths to your train, evaluation and test data up front and let igel automate everything for you.
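
If you prefer to drive the three separate steps from a script instead, a small wrapper could look like the sketch below. It only builds and prints the commands shown earlier, using the -dp and -yml flags from this post. The function name and the file names are placeholders, and nothing is executed unless you uncomment the subprocess call.

```python
import shlex


def igel_commands(train_csv, eval_csv, new_csv, config):
    """Return the fit, evaluate and predict invocations as argv lists."""
    return [
        ["igel", "fit", "-dp", train_csv, "-yml", config],
        ["igel", "evaluate", "-dp", eval_csv],
        ["igel", "predict", "-dp", new_csv],
    ]


commands = igel_commands("train.csv", "eval.csv", "new.csv", "igel.yaml")
for argv in commands:
    print(shlex.join(argv))
    # To actually run a step (requires igel to be installed):
    # import subprocess; subprocess.run(argv, check=True)
```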

Conclusion

That’s all! Hopefully, you saw how useful this tool can be and how much time it can save you in some cases. I hope this will make some people’s lives easier. You can run or download the examples from the official GitHub repo. Additionally, based on people’s requests, I started developing a GUI desktop app for interacting with igel, since non-technical users may not be familiar with the terminal. The app is still under heavy development. You can check it out and play with it here. I would appreciate it if you would consider supporting the project. Check it out and feel free to contact me anytime.

Nidhaloff
The Startup

Software Engineer | machine learning | IoT | Open Source ❤