What I learned from the MIT Professional Education Program “Data Science: Data to Insights”

Yes! I have just finished a class about Data Science from the Massachusetts Institute of Technology, more famously known as MIT. I have yet to visit the campus in Cambridge, Massachusetts, USA, but I still found the course very rewarding. I participated in a six-week online course, the MIT Professional Education program “Data Science: Data to Insights”. As the teaser explains, it’s all about solving complex issues with your data. The topic is becoming more and more relevant, because 90 percent of the world’s data was created in the past few years alone.

Florian Hoeppner
Feb 10, 2017

You may be wondering whether you ought to take such a class, so I think it’s only fair that I share some insights with you. To help you understand my point of view, I’ll start with some background on my IT career and my prior knowledge of artificial intelligence (AI).

I hold a diploma in computer science and, in addition, a Master of Science in Digital Media. During my education AI was not as popular a subject as it is now. However, it was the focus of my diploma thesis in 2005. Unfortunately, I only worked as a developer for a short time and did not get the chance to work in another discipline within AI. For around 10 years now, I have focused on IT consulting topics, mainly IT shoring, sourcing, and vendor consolidation. I picked this online class because I want to understand the possibilities and limitations of the methods and technology.

The class is structured into five modules, each with case studies and assessments.

Each module is accompanied by 10–20 videos (culminating in a predominantly multiple-choice assessment) and anywhere between one and seven case studies. Students receive their certificate and the Continuing Education Units (1.3 CEUs) only after successfully completing all assessments.

To help students complete the modules, (assistant) professors provide online lectures enriched with animations and graphs. Participants can discuss open topics and questions in an online forum. Algorithms and concepts are always explained using industrial or real-life examples, e.g. Netflix or Facebook.

The topic “Data Science” is broken up into five modules:

  1. Making sense of unstructured data
  2. Regression and prediction
  3. Classification, hypothesis testing and deep learning
  4. Recommendation systems
  5. Networking and graphical models

In the first module you learn how to discover patterns and latent structures in data. For example, you learn how to organise all the text files on your laptop by theme, or how to discover latent communities in a social network, a technique known as clustering.

In regression and prediction the focus is on bivariate and multivariate regression for purposes of prediction and causal inference, followed by logistic and non-linear regression. You learn how to solve prediction problems with high-dimensional data using methods such as lasso, ridge regression, regression trees, boosted trees and random forests, among others.
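To give a feel for what ridge regression does, here is a minimal sketch of my own, not material from the course. It assumes the simplest possible case, a line through the origin with a single predictor, where the ridge estimate has the closed form slope = Σxy / (Σx² + penalty). The function name `fit_slope` and the toy data are made up for illustration.

```python
def fit_slope(xs, ys, penalty=0.0):
    """Slope of a no-intercept line y = slope * x.

    With penalty=0 this is ordinary least squares; a positive penalty
    gives the ridge estimate, which shrinks the slope towards zero.
    """
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + penalty)


# Hypothetical toy data: y is exactly 2 * x.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]

ols_slope = fit_slope(xs, ys)                  # no penalty
ridge_slope = fit_slope(xs, ys, penalty=10.0)  # penalised fit

print(ols_slope, ridge_slope)  # prints: 2.0 1.5
```

The penalised slope is smaller than the unpenalised one. That shrinkage is the trade-off ridge makes: a little bias in exchange for more stable estimates when there are many predictors.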

The third module, Classification, hypothesis testing and deep learning, starts with statistical methods of classification and hypothesis testing and their applications, including the detection of statistical anomalies, fraud, spam, and other malicious behaviour. An example is binary classification, where an email is categorised as either spam or not spam. You are introduced to neural networks, the perceptron (an algorithm for supervised learning of binary classifiers), deep learning, and their limitations.
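The spam example can be sketched with a tiny perceptron. This is my own illustrative version, not code from the course: the two features (how often the words "free" and "meeting" appear) and the six labelled emails are invented, and the learning rule is the classic perceptron update in plain Python.

```python
def predict(weights, bias, features):
    """Return 1 (spam) if the weighted sum is positive, else 0 (not spam)."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=20, learning_rate=1.0):
    """Classic perceptron rule: adjust the weights only on misclassified samples."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in zip(samples, labels):
            error = label - predict(weights, bias, features)
            if error != 0:
                step = learning_rate * error
                weights = [w + step * x for w, x in zip(weights, features)]
                bias += step
                mistakes += 1
        if mistakes == 0:  # every training sample is classified correctly
            break
    return weights, bias


# Hypothetical features per email: [count of "free", count of "meeting"].
emails = [[3, 0], [2, 0], [4, 1], [0, 2], [0, 3], [1, 4]]
is_spam = [1, 1, 1, 0, 0, 0]

weights, bias = train_perceptron(emails, is_spam)
predictions = [predict(weights, bias, email) for email in emails]
print(predictions)
```

A single perceptron can only separate classes with a straight line (or hyperplane), which is one of the limitations that motivates deeper networks.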

Module four, Recommendation systems, teaches you how to discover relevant information in vast amounts of data. You learn how Netflix recommends new films to its users, and how Amazon, Facebook or Spotify make recommendations to theirs. You learn different principles and algorithms for recommendations, ranking, collaborative filtering and personalised recommendations.
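As a rough illustration of collaborative filtering, the sketch below recommends films to a user based on what similar users liked. It is an assumption-laden toy of my own, not how Netflix or the course does it: the users, film names and ratings are invented, similarity is cosine similarity over the films two users have both rated, and unseen films are ranked by a similarity-weighted sum of other users' ratings.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"film_a": 5, "film_b": 4, "film_c": 1},
    "bob": {"film_a": 4, "film_b": 5, "film_c": 1, "film_d": 5},
    "carol": {"film_a": 1, "film_b": 2, "film_c": 5, "film_e": 5},
}


def cosine_similarity(a, b):
    """Cosine similarity over the films both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[film] * b[film] for film in shared)
    norm_a = sqrt(sum(a[film] ** 2 for film in shared))
    norm_b = sqrt(sum(b[film] ** 2 for film in shared))
    return dot / (norm_a * norm_b)


def recommend(user, all_ratings):
    """Rank films the user has not seen by similarity-weighted ratings."""
    seen = all_ratings[user]
    scores = {}
    for other, other_ratings in all_ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(seen, other_ratings)
        for film, rating in other_ratings.items():
            if film not in seen:
                scores[film] = scores.get(film, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("alice", ratings))  # prints: ['film_d', 'film_e']
```

Alice's tastes are much closer to Bob's than to Carol's, so the film only Bob has rated comes first.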

The last module, Networking and graphical models, helps you understand the behaviour of a network, for example how information or ideas spread in a social network. This is relevant not only for marketing but also for other purposes, such as crime detection. You learn about algorithms to analyse large networks and methods to model network processes.
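A very simplified way to picture information spreading in a social network is to assume that everyone passes a piece of news on to all of their friends in the next round. Under that assumption (mine, not the course's), a breadth-first search tells you in which round each person hears the news. The friendship graph below is invented for illustration.

```python
from collections import deque

# Hypothetical friendship graph as an adjacency list.
friends = {
    "ana": ["ben", "cleo"],
    "ben": ["ana", "dev"],
    "cleo": ["ana", "dev"],
    "dev": ["ben", "cleo", "eli"],
    "eli": ["dev"],
    "finn": [],  # not connected to anyone
}


def spread_rounds(graph, source):
    """Breadth-first search: the round in which each person hears the news."""
    reached = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for friend in graph[person]:
            if friend not in reached:
                reached[friend] = reached[person] + 1
                queue.append(friend)
    return reached


rounds = spread_rounds(friends, "ana")
print(rounds)
```

People who never appear in the result, such as the isolated "finn" here, are never reached. Real diffusion models add probabilities and thresholds on top of this basic idea.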

I could already smell the Prüfungsangst (exam anxiety), and I was only in a virtual classroom!

I really loved the program because it opens the door to the unknown possibilities of technologies that have continuously changed our world and will carry on changing it dramatically in the coming years. The lecturers teach at the highest level, and, speaking for all of us who took the course, I can honestly say that we learnt something new.

One lecture, lasting an hour and a half, was given by Victor Chernozhukov (http://web.mit.edu/~vchern/www/), a young professor with Russian/American heritage from the Department of Economics.

His talent, besides his enormous capabilities in regression and prediction, lies in his ability to give lectures purely through mathematical formulas. He reads formulas like others read the news headlines! It’s hard not to have the utmost respect for him. It was only after I had watched the video numerous times that I had SOME idea of its content. The internet, especially the YouTube videos on maths, was very helpful. And that’s the difference right there. At school, our teacher explained the subject matter only once, whereas now we can review the material as often as we wish until the topic sticks.

Machines are climbing up the ladder and taking over mental labour.

Once you get past this tough lecture, all the others are absolutely enjoyable, especially those about deep learning. The deep learning lecture was such an eye-opener that I’ve since watched it twice. In deep learning the developer merely teaches the system how to learn and how to solve a problem. The system receives a training set of information to explore and learn from on its own: the program learns from past data.

For example, the system reads images of animals together with the name of each animal in the training set. In this way it learns by itself how to detect a dog in an image. The system learns what characteristics a dog has purely from images. Hence, computers have now entered the area of mental labour, which until now was exercised only by humans.

With this kind of algorithm, machines can take over new categories of jobs that only humans performed in the past. One example is interpreting medical images to detect cancer, where the images of dogs are simply replaced by x-rays. The training set includes the information on whether cancer was detected or not. After learning how to read and interpret the x-rays, the system can reduce the time a doctor has to spend analysing the images. Machines are climbing up the ladder and taking over mental labour.

To give you hands-on experience, case studies are included after each module. For some of them you need developer skills in Python or R. If you are not familiar with these programming languages, the case study documents come with code fragments. For example, code is provided for reading from an external file or for visualising your data. You develop your skills in a practical, real-world setting. For example, you build your own recommendation system for movies, similar to the one used by Netflix. In another case study, you use network-theoretic ideas to identify new candidate genes that might cause autism.

The course does not stay at a high level: you learn the underlying concepts in maths and then implement them end to end, developing your own solution on real data.

On the one hand, the course was extremely challenging because of the deep dive into mathematics and programming and the fixed time frame of only six weeks. It was also satisfying, because it takes you deep into the world of data and intelligent systems, which is one of the main drivers in my particular business area.

On the other hand, the drawback is that you have access to the course material for only a few months, and, especially when it comes to the videos, this isn’t enough. Also, you cannot download the videos, only the text files, which are not very useful. The course is starting again in February 2017. For consultants focusing on digital transformation, it is definitely a must. Enjoy!

If you would like to dig deeper into Data Science:

A visual introduction to machine learning online:

http://www.r2d3.us/visual-intro-to-machine-learning-part-1/

Good books covering similar topics to the course:

● Data Science for Business by Foster Provost and Tom Fawcett

● Foundations of Machine Learning by Mehryar Mohri, Afshin Rostamizadeh and Ameet Talwalkar

The link to the MIT course:

https://mitprofessionalx.mit.edu/courses/

About the author: Florian Hoeppner is working as Technology Advisor for New IT in Financial Services North America. His focus is on Enterprise Agile, DevOps, SRE combined with sourcing and shoring strategy. Right now, Florian is living the dream in New York City.

Articles and comments are my own views and do not represent the views of my employer, Accenture.
