Thoughts on a GUI for Scikit Learn and TensorFlow

I’ve been thinking for a while about building a GUI for Scikit Learn and, given that I am currently taking the Google Deep Learning course on Udacity, I thought I would extend my original idea to TensorFlow. This post covers my reasons for building the tool, some of its requirements and my design ideas for bringing the project to life.

Why a GUI for Scikit Learn?

Scikit Learn is a great library. I use it a lot in all my data science projects. It has clustering (including Affinity Propagation), classification and dimension reduction (including Non-negative Matrix Factorisation and t-SNE) algorithms, as well as functionality for experiment setup (e.g., splitting a dataset into a train and test set), cross-validation and parameter tuning (e.g., grid search). Scikit Learn is well maintained and actively developed.

I wrote Scikit Learn code from scratch for my first few projects. As I got more experienced I created a few abstractions, but ultimately I started to copy and paste code. At this point I noticed a workflow that was really dependent on the choices I was making. These choices concerned either the type of algorithm or the experiment (evaluation) setup. I realised that a GUI that facilitated these choices would reduce setup and coding time.
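
To make the "workflow driven by choices" idea concrete, here is a minimal sketch (not my actual project code) of how a copied-and-pasted experiment collapses into two kinds of choice: which algorithm, and which evaluation setup. The `ALGORITHMS` dictionary, the `run_experiment` helper and the option names are hypothetical; the scikit-learn calls (`train_test_split`, `GridSearchCV`, `cross_val_score`) are real.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.svm import SVC

# Hypothetical "algorithm" choice: each option maps to an estimator and a parameter grid.
ALGORITHMS = {
    "logistic_regression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "svm": (SVC(), {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}),
}


def run_experiment(X, y, algorithm, evaluation, cv=5, random_state=0):
    """Hypothetical helper: `algorithm` and `evaluation` are the two kinds of GUI choice."""
    estimator, grid = ALGORITHMS[algorithm]
    if evaluation == "holdout_with_gridsearch":
        # Split once, tune on the training set with cross-validated grid search,
        # then score the tuned model once on the untouched test set.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.25, stratify=y, random_state=random_state
        )
        search = GridSearchCV(estimator, grid, cv=cv).fit(X_train, y_train)
        return {"best_params": search.best_params_, "score": search.score(X_test, y_test)}
    if evaluation == "cross_validation":
        # No tuning: mean accuracy of the default estimator across the folds.
        scores = cross_val_score(estimator, X, y, cv=cv)
        return {"best_params": None, "score": scores.mean()}
    raise ValueError(f"unknown evaluation setup: {evaluation!r}")


X, y = load_iris(return_X_y=True)
print(run_experiment(X, y, "svm", "holdout_with_gridsearch"))
print(run_experiment(X, y, "logistic_regression", "cross_validation"))
```

Everything outside the two branch points is boilerplate, which is exactly the part a GUI could write for you.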

Let’s try to capture the data science workflow!

GUI Requirements

I’ll be frank: I don’t like visual tools that hide code or stop me from interacting with it. I like the idea of the GUI generating either an IPython Notebook or plain Python code once you have made all your experiment setup choices. The GUI could even be an IPython widget that generates editable cells in the notebook.
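
One way the "generate a notebook" idea could work is sketched below. It assumes the GUI has already turned the user’s choices into a list of code snippets, and it writes them out as editable code cells in a notebook file using only the standard library and the nbformat v4 JSON layout. The `make_notebook` function, the snippets and the `experiment.ipynb` file name are hypothetical.

```python
import json


def make_notebook(snippets, title="Generated experiment"):
    """Build a minimal nbformat v4 notebook: a markdown title cell plus one code cell per snippet."""
    cells = [{"cell_type": "markdown", "metadata": {}, "source": f"# {title}"}]
    for source in snippets:
        cells.append({
            "cell_type": "code",
            "execution_count": None,  # the cell has not been run yet
            "metadata": {},
            "outputs": [],
            "source": source,  # stays fully editable once the notebook is opened
        })
    return {
        "cells": cells,
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,  # minor version 4 does not require per-cell ids
    }


snippets = [
    "from sklearn.datasets import load_iris\nX, y = load_iris(return_X_y=True)",
    "from sklearn.model_selection import train_test_split\n"
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)",
]
nb = make_notebook(snippets)
with open("experiment.ipynb", "w", encoding="utf-8") as f:
    json.dump(nb, f, indent=1)
```

Writing plain files keeps the generated code in the user’s hands; a widget version could insert the same snippets into the running notebook instead.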

I also don’t like visual tools that employ a flowchart metaphor: they involve a lot of dragging and dropping, and tools such as Orange already exist. I don’t like step-by-step wizard interfaces either.

So my requirements are mostly a list of things I don’t want.

…but there are also things I do want. I want to make data science with Scikit Learn easier both for experienced users and for users who never want to touch Python code.

Design Ideas

I have a few rough ideas for the GUI and I’m about to start prototyping, though I’m still figuring out how everything will work. The GUI metaphor will be a tree-based outline editor: as choices are made, the tree expands, and when a choice changes, its child nodes are modified accordingly. Each node of the tree should translate easily into Python code for Scikit Learn.
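
The outline idea can be sketched as a small tree in which each node holds one choice and the Python snippet it translates to. This is a hypothetical design, not the prototype: the `Node` class, `to_code` and `change_choice` are illustrative names, and generating code by depth-first traversal is one assumption among several possible ones. The generated snippets use real scikit-learn calls (`SVC`, `set_params`, `LogisticRegression`).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One choice in the outline; `code` is the Python snippet it translates to."""
    label: str
    code: str = ""
    children: list[Node] = field(default_factory=list)

    def to_code(self) -> str:
        # Depth-first traversal: a node's code comes before its children's code.
        parts = [self.code] if self.code else []
        for child in self.children:
            child_code = child.to_code()
            if child_code:
                parts.append(child_code)
        return "\n".join(parts)

    def change_choice(self, label: str, code: str, children: list[Node] | None = None) -> None:
        # Changing a choice replaces this node's code and rebuilds its subtree,
        # because child choices (e.g. parameters) depend on the parent choice.
        self.label = label
        self.code = code
        self.children = children or []


root = Node("Load data", "from sklearn.datasets import load_iris\nX, y = load_iris(return_X_y=True)")
algorithm = Node(
    "Algorithm: SVM",
    "from sklearn.svm import SVC\nmodel = SVC()",
    [Node("Parameter: C = 10", "model.set_params(C=10)")],
)
root.children.append(algorithm)
before = root.to_code()

# Swap the algorithm: the SVM-specific parameter node is discarded with it.
algorithm.change_choice(
    "Algorithm: logistic regression",
    "from sklearn.linear_model import LogisticRegression\nmodel = LogisticRegression(max_iter=1000)",
)
after = root.to_code()
print(after)
```

Discarding the subtree on a change is the simplest policy; a friendlier editor might try to carry over child choices that still make sense for the new parent.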

… a clearer picture would help, but that will come later.

I have a defined workflow of choices for supervised machine learning, which will be the basis of the first prototype I build. The choices include selecting the algorithms, selecting parameters for those algorithms, running cross-validation (perhaps viewing scores and confusion matrices and plotting learning curves), performing feature selection and feature engineering, and persisting the best model.
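
The supervised workflow above might translate into scikit-learn code roughly like the sketch below. It is a guess at what such a prototype could generate, not the prototype itself: the breast cancer dataset, the `SelectKBest` feature-selection step, the logistic regression classifier, the parameter grid and the `best_model.joblib` file name are all illustrative assumptions. Feature engineering is left out because it depends entirely on the data.

```python
import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, learning_curve, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Algorithm and feature-selection choices, chained in a pipeline so that
# feature selection is refit inside every cross-validation fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Parameter choices, evaluated with 5-fold cross-validation on the training set.
param_grid = {"select__k": [5, 10, "all"], "clf__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)
print("best parameters:", search.best_params_)
print("mean CV accuracy:", round(search.best_score_, 3))

# Confusion matrix of the tuned model on the held-out test set.
y_pred = search.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Learning-curve data (training-set sizes vs. validation scores), ready to plot.
sizes, train_scores, val_scores = learning_curve(
    search.best_estimator_, X_train, y_train, cv=5, train_sizes=[0.2, 0.5, 1.0]
)
print(sizes, val_scores.mean(axis=1).round(3))

# Persist the best model so it can be reloaded later with joblib.load.
joblib.dump(search.best_estimator_, "best_model.joblib")
```

Each comment block corresponds to one node a user might expand or change in the outline editor.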

I’ll publish a follow-up post soon with a link to the prototype. If you have any feature requests or suggestions, please leave them in the comments below.