Last year, IT publishers heise and dpunkt approached me about presenting at a new machine learning event they were holding jointly with The Register in London: Minds Mastering Machines (“m3”). My introductory-level talk Computational Decision Making in a Nutshell was well received by the audience, and I was invited back for the inaugural m3 event in Germany, which took place in Cologne on 25–26th May.
I’m a big fan of their conferences, being a regular presenter at Building IoT, Data2Day and m3, as they are developer-focused without being overly technical. In fact, despite frequent sightings of code snippets, most talks provide entry-level information and often the keynotes are the least technical of all presentations, highlighting important but often neglected aspects such as usability, ethics, etc. While there are some sponsored talks, the companies supporting the event are usually service providers with good presenters and interesting use cases, thus no slot seems wasted.
The event opened with a keynote from Oliver Bendel, a professor of machine ethics at University of St. Gallen. My first unexpected learning (m3 being a machine learning conference…) was that ethics is a philosophical discipline and deals with morality. I’d just leave it at that, but after introducing machine ethics and machine morality, and their connection to artificial intelligence, Oliver took us on a tour-de-force of his work. A trained computer scientist and philosopher, he has in the past researched rule sets for notoriously nice or lying chatbots, mused about the difficulties of transferring responsibility between humans and computers in autonomous driving, and even caught a massive wave of media attention for his reflections on how much a sex robot for sadomasochistic practices should be allowed to hurt its users… If any of that hits you, pun intended, he recommended a book by Luis Pereira, Programming Machine Ethics.
Then followed 18 talks in three parallel tracks. I was torn between time-series analysis and the explainability of machine learning models. As Shirin Glander from codecentric has already written a nice blog post on model explanations with the LIME method, I revisited the foundations of time series and event prediction with parametric methods, presented by two speakers from zoi. They also made some Jupyter notebooks available, which added to the rather academic character of this talk.
Things stayed rather academic for what I initially perceived as the most manufacturing-oriented presentation. Daniel Trauth from RWTH Aachen talked about which data from a fine blanking machine may be useful for quality predictions of the pressed material. While the abstract (and his introduction) sounded as if this was all already happening at massive scale, with more than 1,000 parallel data streams at data rates of 10 Gbit/s, it turns out the work is still at the proof-of-concept stage, built on a rather familiar stack known to many of us.
The next talk I attended was by Lars Gregori from SAP Hybris. Using an XOR input/output scenario as an example, he demonstrated how to train a machine learning model with Keras and then deploy said model to an iPhone. Using the CoreML libraries provided by Apple as part of iOS 11, and after importing coremltools in Python, converting a Keras model into a CoreML model is as simple as
```python
coremlmodel = coremltools.converters.keras.convert(model, …)
```
Needless to say, there are model converters for a wide range of machine learning libraries, including scikit-learn 0.18. After importing the model into Xcode, a few method calls are auto-generated for the model and using it from your app is a matter of two or three lines in Swift.
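The XOR task itself needs no framework at all. As an illustration of what the model has to learn (my own from-scratch sketch, not the speaker’s Keras code), here is a tiny hand-rolled network trained by backpropagation:

```python
import math
import random

# A from-scratch 2-4-1 network for the XOR toy problem. This is my own
# illustration of the task -- NOT the speaker's Keras/CoreML code.
random.seed(42)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
H = 4  # hidden units

w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0

def forward(x):
    h = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(H)]
    y = sigmoid(sum(w2[j] * h[j] for j in range(H)) + b2)
    return h, y

def train_epoch(lr=0.5):
    """One pass of stochastic gradient descent over the four XOR samples."""
    global b2
    loss = 0.0
    for x, t in DATA:
        h, y = forward(x)
        loss += (y - t) ** 2
        dy = 2 * (y - t) * y * (1 - y)           # gradient at the output neuron
        for j in range(H):
            dh = dy * w2[j] * h[j] * (1 - h[j])  # gradient at hidden neuron j
            w2[j] -= lr * dy * h[j]
            w1[j][0] -= lr * dh * x[0]
            w1[j][1] -= lr * dh * x[1]
            b1[j] -= lr * dh
        b2 -= lr * dy
    return loss

losses = [train_epoch() for _ in range(5000)]
print(losses[0], losses[-1])  # the loss shrinks as the network learns XOR
```

The point of the XOR demo is that a linear model cannot solve it; a hidden layer can, which is why it is a popular "hello world" for neural network toolchains.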
I then attended a really fast-paced and engaging introduction to deep learning by Christoph Reinders, an image recognition PhD student at the University of Hannover. There is no point condensing such a comprehensive tour-de-force of deep neural network research into one paragraph here, especially since he also detoured into comparisons between different convolutional networks for image recognition and related areas such as adversarial attacks. Let’s just say it’s worth watching as soon as the organisers make the video available to the public. If you can’t wait, Christoph recommended http://neuralnetworksanddeeplearning.com as a great resource to get started on the theory behind neural networks, including code examples.
My day concluded with a talk by Klaas Bollhöfer from Birds on Mars. He spoke about the collaboration between the media art collective YQP and the painter Roman Lipski. The wider Birds on Mars team developed an artificial intelligence system that can do style transfer learning on images. The system extracts key “features” from the artist’s paintings, such as shape or colour. It then uses this information to emphasise what it “perceives” and produces new images on the basis of the original. Lipski uses these images to inspire his next iteration of paintings.
The images he paints are continuously digitised, and fed into the system, and over time both artist and computer system co-evolve. A fascinating collaboration.
While the technical aspects of this work are impressive (requiring super-resolution neural networks, style transfer learning and generative adversarial networks), Klaas claimed that the data scientist of today is going to be tomorrow what the HTML guy of yesterday is today. In his opinion, the focus should shift away from the technology towards using AI systems to inspire our work [note from me: much like Lee Sedol later admitted to the inspiration he gained from the famous move 37 by AlphaGo in game #2 of the Google DeepMind Challenge].
The second day of m3 started for me with a technical deep dive into natural language processing. Gerhard Hausmann, the sole knowledge system architect at insurance company Barmenia, first described the problem of extracting entities from medical bills, and how business logic has to act on semi-standardised descriptions of treatments.
He then went on to explain how once rule-based expert systems were considered artificial intelligence, and how Barmenia combines the IBM Operational Decision Management system with their own tools to automate as much of the case handling process as possible.
Gerhard took us through the logic of creating an input tensor from low-level character recognition to train a neural network for matching items from the bill against the German medical fee schedule (GOÄ, Gebührenordnung für Ärzte).
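Character-level input tensors of this kind are commonly built by one-hot encoding each recognised character. A generic sketch of the idea (my illustration only, not Barmenia’s actual preprocessing, with a made-up alphabet and line item):

```python
# A generic sketch of turning a recognised text line into a character-level
# input tensor (one-hot rows). This illustrates the general idea only and
# is not Barmenia's actual preprocessing.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789,. "
CHAR_TO_INDEX = {c: i for i, c in enumerate(ALPHABET)}

def one_hot(line, max_len=32):
    """Return a max_len x len(ALPHABET) matrix; unknown characters map to all-zero rows."""
    tensor = [[0] * len(ALPHABET) for _ in range(max_len)]
    for pos, char in enumerate(line.lower()[:max_len]):
        idx = CHAR_TO_INDEX.get(char)
        if idx is not None:
            tensor[pos][idx] = 1
    return tensor

t = one_hot("beratung 10,72")  # a made-up line item: description plus amount
print(sum(map(sum, t)))  # -> 14, one non-zero entry per recognised character
```

Stacking such matrices for a batch of bill lines yields the input tensor a convolutional network can then be trained on.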
He explained in some detail the workings of his deep convolutional network, and how his implementation differed from code he found on the Internet.
Software has to be maintainable over considerable time frames in the enterprise. He therefore favoured TensorFlow over other frameworks, assuming it has gained enough traction to still be around in ten years’ time. Interestingly, he has also had a play with the Stanford NLP Classifier, and found that it showed similar performance with significantly less effort to get started. We later had a chat about this, and he estimated two weeks to get to grips with the convolutional networks versus two days with the Stanford Classifier.
The conference went on with another NLP presentation, this time recognising up to 70 different labels in commercial bills. Chi Nhan Nguyen from SMACC talked about his stock of 300k bills in 25k different layouts, from which he extracts entities using bidirectional recurrent neural networks with PyTorch. The RNN approach is useful when the network needs to “remember” something, i.e., when entities are expected in a particular sequence. For example, it is highly unlikely that a tax identification number is wedged in between name and street in the address field (though, in practice, people make mistakes…). There are different ways of creating such “memory” within a neural network, such as long short-term memory (LSTM) cells or gated recurrent units (GRUs).
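The gating idea behind a GRU can be shown on a single scalar unit. In this toy sketch (hand-picked weights, my illustration, nothing to do with SMACC’s PyTorch model), the update gate is biased firmly shut, so the cell “remembers” its state regardless of the input:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, wz, uz, bz, wr, ur, wh, uh):
    """One step of a single-unit GRU cell with scalar weights (a toy
    illustration of the gating idea, not SMACC's PyTorch model)."""
    z = sigmoid(wz * x + uz * h + bz)          # update gate: take in new info?
    r = sigmoid(wr * x + ur * h)               # reset gate: expose old state?
    h_cand = math.tanh(wh * x + uh * (r * h))  # candidate state
    return (1 - z) * h + z * h_cand            # blend old state and candidate

# With the update gate biased firmly shut (bz = -10), the cell "remembers":
# its state barely changes, no matter what input arrives.
h = 0.9
for x in [1.0, -1.0, 0.5]:
    h = gru_step(x, h, wz=0.0, uz=0.0, bz=-10.0, wr=0.0, ur=0.0, wh=1.0, uh=1.0)
print(round(h, 3))  # -> 0.9
```

In a trained network, the gate weights are learned, so the cell decides from context when to hold on to information (say, “we are inside the address field”) and when to overwrite it.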
I next chose a talk that assessed our readiness to utilise patient data like electronic health records for automated computational analysis. Marc Pickhardt from GesundheitsregionNORD e.V. provided an overview of different efforts to standardise medical data formats over the past thirty years. To understand that landscape, he introduced the different stakeholders in the medical data field and described top-down vs. bottom-up data silos in healthcare. Top-down players are hospitals and doctors, government departments, insurance companies and a flock of archiving services for such data. Historically, their focus has been on billing, so it is not surprising to find more accounting-related fields than actual medical information in some data formats. Bottom-up information is collected by devices close to the patient: data that is often medically relevant, but which cannot be put into perspective with respect to the patient as a whole.
Marc explained a general problem with medical data in Germany. There is no single central “electronic health record”, but a loosely connected and often incomplete collection of “cases” held by individual doctors. Retrieving and analysing the data for one patient is therefore nearly impossible, not least because of the zoo of data standards used in this country.
Turns out only image data (encoded in DICOM) and Health Level 7 (HL7) are somewhat universal, and the latter is usually not supported by the software systems of general practitioners (Hausärzte). There is also confusion on the semantic level. While English-speaking countries largely rely on the ICD-10 identifiers and categorisation of diagnoses, other countries, including Germany, have their own directories. Marc reported on a proof of concept aiming to train a machine learning model to recognise different types of cerebellar bleeding in computer tomography images. However, the project already failed at the training stage, as it was impossible to extract the medical information from the data files. In some cases, where the format would expect an ICD-10 identifier, there was merely a reference to MS Word documents describing the diagnoses in prose…
After lunch followed the second m3 keynote. Marcel Tilly, Program Manager at Microsoft AI & Research in Germany, gave a highly entertaining historical perspective on “artificial intelligence”. Some obvious examples aside, I gained a deeper appreciation of how rapidly the performance of classifiers improved during the ImageNet competition over the past eight years, with ResNet now performing even more reliably than human curators. Marcel also talked about the groundwork of Donald Michie in the 1960s, who developed a simple model of reinforcement learning involving match boxes, coloured grains of rice and tic-tac-toe. As a keynote should, the talk also highlighted the limitations and dangers of the technology. With decisions now being taken on the basis of machine learning rather than human-defined rules, it is somewhat worrying that, e.g., common face recognition software has failure rates better than 0.3% for young, white males but worse than 20% for women of colour.
We need to make sure that the bias we face in our everyday lives doesn’t become a bias when training machine learning systems!
The afternoon continued with two talks from Zalando. The first dealt with fraud detection: where it is suspected that a fraudster is placing an order, the option to pay after receiving the goods isn’t offered. Without being able to provide too much detail, for obvious reasons, the speakers explained how they ideally draw from a three-digit number of potential features to make a classification. Unfortunately, not all features are available for all potential customers, creating a sparse matrix. They assessed various strategies for dealing with the missing values, e.g., simply filling up, proper imputation, or even creating dedicated classifiers for combinations of feature sets. In the end, they concluded that filling up empty values with a pre-defined constant does the job (in their case!). Interestingly, bias in machine learning isn’t restricted to training; model selection can be biased too: the speakers admitted to using models with different precision/recall characteristics for fraud in different countries, depending on how angry their customers might get if they don’t see the option to pay later.
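The “fill with a constant” strategy amounts to turning each sparse feature record into a dense vector before classification. A minimal sketch (the feature names are hypothetical; the talk did not disclose the real ones):

```python
# A minimal sketch of the "fill missing features with a constant" strategy.
# Feature names are hypothetical; the talk did not disclose the real ones.
FILL_VALUE = -1.0
FEATURES = ["order_value", "account_age_days", "failed_payments"]

def to_vector(record):
    """Turn a sparse feature dict into the dense vector a classifier consumes."""
    return [record.get(name, FILL_VALUE) for name in FEATURES]

print(to_vector({"order_value": 129.9, "failed_payments": 2}))
# -> [129.9, -1.0, 2]
```

The constant must lie outside the legitimate value range of each feature, so the model can learn to treat “missing” as its own signal rather than as a plausible measurement.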
The second Zalando talk concentrated on the difficulties of introducing machine learning and data science into the organisation. With several thousand employees at all qualification levels, even at their company it can be difficult to make all processes data-driven. The level of despair became clear when the speakers resorted to quoting Machiavelli.
The speakers defined five characteristics of data-driven companies:
- All relevant data is curated and stored in an accessible manner.
- Management decisions are made solely on the basis of data.
- User experience is key and continuously improved, with A/B testing as a standard tool.
- What can be automated and optimised will be.
- Data unlocks new customer-facing products.
What followed was a typology of organisational patterns and show stoppers that hinder digital transformation: The Lord of the Realm, The Inertial Entrepreneur, The Cya Crowd, The Pessimist, The Detail Planner, The Black Box Believer, The Evangelist, The Visionary and The Tinkerer. Without going into too much detail, all of these types have advantages and disadvantages for the organisation, but their powers need to be carefully managed…
Zalando concluded with a few use cases: how the routes for pickers in the warehouse can be optimised, how small orders can be batched efficiently such that a picker can deal with several at once, and how the ideal placement of popular stock can increase picking efficiency.
The last talk of m3 was a real geek highlight: using Minecraft as a testbed for reinforcement learning.
Lars Gregori introduced Project Malmö, a modification (“mod”) for the Minecraft game that offers a simple agent to explore the world. It has been developed by Microsoft, who bought the game in 2014. The demo world features a bridge (grey fields) across lava (red fields) and a target destination (blue field). Making a step costs 1 credit, stepping into lava costs 100 credits (and your life…), and arriving at the destination is worth +100 credits. Now, without being aware of its surroundings, the agent is allowed to initiate a random walk with the aim of achieving the highest possible score. Most walks will end in death and a considerable negative score, but after a few hundred iterations the destination is, by chance, reached, yielding a considerably positive score. All the while, the agent keeps track of the highest score ever achieved at each particular position. It is then capable of working out the optimal route from start to finish. (Note from me: in a way, this is classic dynamic programming and traversal of a graph structure, and thus nothing new for seasoned bioinformaticians.)
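The score-keeping random walk described above is essentially tabular Q-learning. A minimal grid-world sketch of the same idea (my own reconstruction with made-up rewards matching the talk, not actual Project Malmö code):

```python
import random

# Tabular Q-learning on a tiny lava-bridge world -- my own reconstruction of
# the idea Lars described, not actual Project Malmö code.
WORLD = ["LLLL",
         "S..G",  # a bridge of safe cells leading across the lava
         "LLLL"]
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
START = (1, 0)

def step(pos, move):
    """Apply a move (clamped to the grid) and return (new_pos, reward, done)."""
    r = min(max(pos[0] + MOVES[move][0], 0), len(WORLD) - 1)
    c = min(max(pos[1] + MOVES[move][1], 0), len(WORLD[0]) - 1)
    cell = WORLD[r][c]
    if cell == "L":
        return (r, c), -100, True  # stepped into lava
    if cell == "G":
        return (r, c), +100, True  # reached the destination
    return (r, c), -1, False       # an ordinary step costs one credit

random.seed(1)
Q = {}
for _ in range(2000):  # episodes of (mostly) random walking
    pos, done = START, False
    while not done:
        if random.random() < 0.3:  # explore at random
            move = random.choice(list(MOVES))
        else:                      # otherwise exploit the best score so far
            move = max(MOVES, key=lambda m: Q.get((pos, m), 0.0))
        new_pos, reward, done = step(pos, move)
        best_next = 0.0 if done else max(Q.get((new_pos, m), 0.0) for m in MOVES)
        old = Q.get((pos, move), 0.0)
        Q[(pos, move)] = old + 0.5 * (reward + 0.9 * best_next - old)
        pos = new_pos

# Greedy rollout: the learned values spell out the route across the bridge.
pos, route, done = START, [], False
while not done and len(route) < 10:
    move = max(MOVES, key=lambda m: Q.get((pos, m), float("-inf")))
    route.append(move)
    pos, _, done = step(pos, move)
print(route)  # the greedy route heads east across the bridge
```

After enough episodes, simply following the highest stored value in each cell reproduces the optimal route, exactly as the Malmö agent does.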
To complicate things a little more, Lars then took forward-facing screenshots from every possible position on the playing field. He then trained a deep neural network to associate a good, neutral or bad next move with each particular perspective. Conceptually, this borrowed from DeepMind’s original paper Playing Atari with Deep Reinforcement Learning. Thanks to code examples available on GitHub, he only had to make minor adjustments.
The first German m3 conference was a success. It’s an event I’d definitely recommend to my colleagues. The mix of people was good, ranging from curious programmers to seasoned practitioners, and from IT consultants to enterprise data scientists. The bullshit factor was very small for a conference dealing with “AI” (a bit of handwaving here), and I’m sure there was something to be learned for everyone.
It’s become clear to me that machine learning is growing up quickly, not only as a field, but also in the number of people who can do it. Whereas my London m3 session was jam-packed and I had the feeling the content was new to most, at least half of my audience in Cologne indicated some or even good familiarity. The same degree of growth goes for service providers: whereas at previous German conferences I remember being one of a few “token data scientists” around, at m3 it became clear to me that it is no longer unusual for service providers to have a good handful of machine learning practitioners.