Machine Learning Intern Journal — Real-World Data Is Messy

As the title indicates, this is the journal of a Machine Learning (ML) intern at the impactIA Foundation. I’ll be attempting to keep a weekly journal of my activities in the Foundation to keep track of my progress and leave a roadmap for the interns who come after me.

Léo de Riedmatten
impactIA
Nov 2, 2020

--

As a new period of confinement looms over us, I look back at recent advances, and peer forward at what’s next.

As many European countries enter a second period of ‘lockdown’, Switzerland seems to be trailing behind. This is worrying considering that the situation in our country is one of the worst on the continent. While the Federal Council introduced ‘stricter’ measures last Wednesday, many were surprised by how lax these new measures are. Frankly, I was expecting much more and I am disappointed. Cantons are free to put in place stricter measures if they want, and this is what many French-speaking cantons have started doing. Geneva is entering a month of semi-confinement starting this evening. All non-essential stores, bars and restaurants will have to close, and working from home is encouraged. Thankfully, as I mentioned before, our Foundation has always encouraged working from wherever is most convenient, and the nature of our work allows for this flexibility. As long as we have our laptops and an internet connection, we can get things done.

Last week, we had a meeting with our aiXlr8 client to present our results so far and discuss the next steps. As I previously explained, the past few weeks have been quite tough mentally, as we hit a ceiling in the performance of our quality control system. Working on this project has well and truly shattered my reliance on the beautifully constructed datasets we were given during our studies. The real world is messy, really messy. Last week, I came across LandingLens, a new platform from Landing AI, the company founded by AI researcher Andrew Ng. LandingLens is an AI-powered platform that helps manufacturers develop computer vision solutions that can identify defective products. This service has two unique features:

  • “Rather than holding the training set fixed and trying to improve the model, we hold the model fixed and help manufacturers improve the training set. We’ve found that this approach leads to faster progress in production settings.”
  • “Rather than focusing on building models that recognize defects better than humans can, our tools aim to improve human-level performance. The better humans can recognize defects, the more consistently they’ll label those defects in training data, and the better the trained models will be. This is a very different philosophy from usual in AI research, where the goal often is to beat human-level performance.”

These are very interesting features that I can definitely vouch for after spending over a month implementing a quality control system. We have seen that most base-model neural networks with simple head structures reach similar performance, but the actual accuracy of the system remains too low. Improving the quality of the training set, as well as the labelling done by humans (thereby removing ambiguity), should vastly improve the performance of our system, and that is what our client is doing now: gathering more data, balancing the classes (up to now we had a very unbalanced dataset), and improving the quality of the dataset by reducing variance in resolution, lighting conditions, etc. We look forward to the new and improved dataset for a (hopefully) big leap in performance!
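To make the class-imbalance point a little more concrete, here is a minimal sketch in Python of how one might measure how unbalanced a labelled dataset is and derive inverse-frequency class weights from it. This is not our actual pipeline: the `class_weights` helper, the label names "ok" and "defect", and the 90/10 split are all made up for illustration. The formula, n_samples / (n_classes * count), is the same heuristic scikit-learn uses for its "balanced" class weights.

```python
from collections import Counter


def class_weights(labels):
    """Return inverse-frequency weights: n_samples / (n_classes * count).

    Rare classes get a weight above 1, common classes a weight below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels, NOT real client data: 90 good parts, 10 defective ones.
labels = ["ok"] * 90 + ["defect"] * 10

counts = Counter(labels)
weights = class_weights(labels)

print(counts)   # how many examples each class has
print(weights)  # how much each class would count during training
```

Weights like these can be passed to a loss function so that mistakes on the rare class cost more. That only softens the imbalance, though. Gathering more examples of the rare class, as our client is doing, addresses the problem at its source.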

In other news, we have quite a few educational workshops coming up for students of various age groups. I’m happy to get involved in these as I’ve always enjoyed sharing my knowledge and experience with others. However, there is much uncertainty surrounding these workshops. Will they take place in person, remotely, or even at all? This makes the planning and organisation quite tricky, but there is one certainty we can use as a guiding principle: change is the only constant (a lyric from my upcoming song). We need to learn to adapt more quickly than ever, at a time when our livelihoods can change daily.

--

Léo de Riedmatten
impactIA

BSc in Computer Science & Artificial Intelligence with Neuroscience from Sussex University, currently a Machine Learning Intern at impactIA in Geneva (CH).