Orange Juice for Better Machine Learning

Oct 1, 2019

With the plethora of machine learning techniques, and tools to implement them, novice users are left stranded in the middle of the highway, trying to figure out where to start.

Easy, Intuitive, Powerful. You rarely come across the combination of these 3 words in the description of a piece of software. Presenting ‘Orange’, which you might not have come across, but which will change the way you look at machine learning. And in a good way.

No code is necessary to kick-start your machine learning career. The Orange suite features GUI-based data exploration, data preparation, chart generation, and ML techniques for all your predictive analytics needs.

The birth of the fruit

Powered by Python, this freeware was first developed in 1996 by the good folks at the University of Ljubljana.

How does Orange taste?

With this pretty intuitive interface, it becomes easy to manipulate data and build an ML model. All operations are drag and drop.

Orange Canvas: this is what the interface looks like

What can you accomplish with Orange?

Business Understanding: This is all about the data. You should know what the data stands for and why you are analyzing it.

Data Understanding: This is handled easily by Orange's charts and plots. Depending on the variable type, specific charts can be plotted, and the variables can be changed on the go. The different types of supported charts are shown above.

Data Preparation: This is easily achieved with functional tools for outlier treatment, missing-value imputation, normalization, standardization, encoding, etc. The Data tab above shows the functions available for achieving the desired output.

Model Building and Scoring: Models are easily built using the drag and drop tools and quickly customized by tuning their hyper-parameters. Scoring can be done using different validation techniques, including k-fold cross validation. The supported ML modules are listed above in the Model tab.

Prediction and Evaluation: Once the model has been trained on the training data, the fitted model can be used to predict on the test data and generate outputs. The various evaluation metrics listed above can be used to pick the best iteration from the generated models.
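For the curious, here is roughly what k-fold cross validation does behind the scenes when you score a model. This is a minimal pure-Python sketch of the general technique, not Orange's own implementation; the function name `k_fold_indices` and its `seed` parameter are made up for illustration.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation.

    Hypothetical helper for illustration; not part of Orange.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    # Shuffle the row indices once so that folds are not ordered by class.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]

    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


if __name__ == "__main__":
    # 150 rows is the size of the Iris dataset used later in this post.
    for fold, (train_idx, test_idx) in enumerate(k_fold_indices(150, k=5), start=1):
        print(f"fold {fold}: {len(train_idx)} train rows, {len(test_idx)} test rows")
```

Each row lands in the test set exactly once, so the model is always scored on data it did not see during training. The k scores are then typically averaged into a single number.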

No better way to show features about this incredible piece of software than showing machine learning in action.

All hands on deck

Let me guide you through a simple classification model. I will keep it mostly to pictures. You know, because a picture is worth a thousand words.
Below is the workflow diagram for building a classification model on the readily available Iris flower dataset.

Simple Classification Tree Example using the IRIS Dataset

Step 1

Import/load the data
Define the target variable
Choose the independent variables as features or as meta

File Load and define the Dependent and Independent Variables

Step 2

See the data visually

Note: We are skipping the data preparation phase, as the data is already clean and good to go.

Step 3

Check plots and visually identify correlations while changing variables on the fly.

Step 4

Run the classifier
Tune hyper-parameters

Classification Tree Hyper-Parameter Tuning

Visually check the Tree Diagram and understand each split
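If you are wondering how a tree decides where to split, one common criterion is Gini impurity: a split is good when the groups it produces are mostly a single class. The sketch below is a generic pure-Python illustration under that assumption. Orange's tree widget may well use a different criterion, and the function names and the toy labels here are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a group of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of the two groups produced by a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


if __name__ == "__main__":
    # Toy labels for illustration only, not the real Iris rows.
    left = ["setosa", "setosa", "setosa"]
    right = ["versicolor", "virginica", "versicolor", "virginica"]
    print("before split:", gini_impurity(left + right))
    print("after split: ", split_impurity(left, right))
```

A lower number after the split than before it means the split has separated the classes better, which is what each node in the tree diagram is trying to do.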

Step 5

Score and Predict

The Predicted Class along with the Accuracy and other Scores

Identify the misclassification errors
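The same scoring and error-spotting idea can be sketched in a few lines of plain Python. This is only an illustration of what accuracy and misclassification counts mean, not what Orange runs internally; the helper names and the tiny label lists are made up.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


def misclassified(actual, predicted):
    """Return (row index, actual, predicted) for every wrong prediction."""
    return [(i, a, p)
            for i, (a, p) in enumerate(zip(actual, predicted))
            if a != p]


if __name__ == "__main__":
    # Hypothetical predictions, purely for illustration.
    actual = ["setosa", "versicolor", "virginica", "virginica"]
    predicted = ["setosa", "virginica", "virginica", "virginica"]
    print("accuracy:", accuracy(actual, predicted))
    print("confusion counts:", dict(confusion_counts(actual, predicted)))
    print("misclassified rows:", misclassified(actual, predicted))
```

The list of misclassified rows is the interesting part: looking at which class gets mistaken for which usually tells you more than the accuracy number alone.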

The above example is a simple classification model that can be built and running, plots included, in just a minute. I have intentionally not described each individual step and the inferences drawn from it. That, I leave to you.

Cool, huh?

This is powerful freeware, with a few genius people coding away in the background to bring you an easy interface in the foreground. As with anything else in life, there are some hits and misses, but in the end it is worth it.

Hits
- Freeware, no ads.
- Ease of use, no coding required.
- GUI-based visual connections.
- Ease of hyper-parameter tuning.
- Flexibility to write your own Python code and apply it to the dataset as well.

Misses
- Lack of in-depth tutorials.
- A bit of a learning curve.
- Certain in-depth functions are missing.
- Not all ML algorithms are available (e.g. XGBoost).

Hope this guide was lucid enough to get you started with Orange. It is a very powerful tool which, when used correctly, will save a lot of time and effort for basic to intermediate data scientists.

Thanks for reading. Some quick links below to squeeze out the 🍊 juice.