Some months ago, I decided to build more experience in core AI and explore its applications in my research domain (HCI). In addition to taking online courses, experimenting on my own and sharing what I learn, an important part of this journey has been engaging with the community.
And so I was really excited at the opportunity to attend the 2018 Deep Learning Indaba (Indaba means “meeting” in Zulu)! My main goals were to learn from the high-quality speaker lineup (Omoju Miller, Naila Murray, Katja Hofmann, David Silver, Kyunghyun Cho and Jeff Dean, among many others), learn more about AI for Africa, meet other students and community members, and get acquainted with their work. And I was not disappointed!
Most of this post summarizes my notes on talks, panels and practical labs I attended. But first, here are 6 reasons I found the conference very compelling!
#1 The Quality and Depth of Technical Talks, Practicals and Panels
The Indaba provided opportunities for in-depth (2hr) talks on important areas within deep learning: AI Revolution and its Frontiers by Nando de Freitas (DeepMind, Oxford), Fundamentals of Deep Learning by Moustapha Cisse (Google AI, Ghana), Convolutional Neural Networks by Naila Murray (Head of Computer Vision Research, Naver Labs), Recurrent Neural Networks by Kyunghyun Cho (Professor, NYU), Reinforcement Learning by Katja Hofmann, Success Stories of Deep Learning by David Silver (DeepMind), and others. Most of these lectures allowed presenters to explore fundamental questions, intersperse much-appreciated audience interaction (the presenter invites the audience to speculate on the rationale behind established practices or to respond to questions) and address the slew of good questions asked by the audience. Importantly, the presenters made themselves available after their presentations for questions and discussions, visited participant poster sessions to give feedback, and were around to talk throughout the conference days!
Each of these topics was followed by a practical lab session where we walked through code implementing the concepts and then completed additional tasks ourselves.
There were also interesting panel discussions covering important topics such as “AI for Africa”, “Machine Learning in Production” and “AI Ethics and Policy”, amongst others. I particularly enjoyed the session on “How to write a good research paper”, where Ulrich Paquet, Martin Arjovsky, Stefan Gouws and Kyunghyun Cho shared some very good (and sometimes passionate) advice on best practices for research writing from both the writer’s and the reviewer’s perspective. Extra points to Ulrich for making us promise not to engage in some don’ts (don’t ignore a reviewer’s feedback, don’t write without real motivation...) and to Martin for passionately arguing for strong experiments or references to back every claim, as well as the removal of all irrelevant proofs or theorems. More details on this panel are provided later.
#2 The Community and Feeling of Community
There is something inspirational (perhaps magical) about having 550+ conference attendees scurrying between buildings, finding sessions, writing code, listening to talks, and discussing ideas and research opportunities in machine learning. Everyone truly embraced the theme and spirit of the conference — Masakhane — we build together!
The 2018 Indaba theme is Masakhane. Bringing the African ML/AI/DS community together to learn, share and collaborate. We work to build together across the continent.
It is exciting and impressive what the Deep Learning Indaba organizers have accomplished. Huge credit to that team! It is also remarkable how the subject of Deep Learning has inspired so many within the community, to the point that people are learning independently, conducting experiments and exploring new ideas and areas.
While it was a relatively small conference, what struck me was the potential for even greater impact here. As Moustapha Cisse mentioned, Africa has one of the youngest populations in the world, providing immense opportunity to really drive change. Furthermore, given that AI is only beginning to take root across Africa, we have the opportunity to do things right from the get-go: ethical research practices for AI, responsible, values-driven (axiological) development of AI and the design of accountability structures that mitigate the negative effects of any (unavoidable) capitalist agenda.
#3 The Opportunities to Discuss Research and Get Feedback
While it is impossible to talk to everyone, the organization and structure of the conference provided numerous opportunities to try: daily poster sessions (12:00–14:00) during the lunch period, two social evenings, and short interactive activities (discuss concepts with your neighbor) during talks and labs. This setup allowed me to get some excellent feedback and have meaningful discussions while presenting my poster.
In addition, I had a serendipitous (and very nice) discussion at breakfast on Day 4 with two Reinforcement Learning experts, Katja Hofmann from Microsoft Research Cambridge and David Silver, head of Reinforcement Learning at DeepMind, who graciously shared their stories and recent work on RL.
They encouraged me to read the book by Sutton and Barto “Reinforcement Learning: An Introduction”. David credits this book as an inspiration for his interest in Reinforcement Learning and Katja mentioned she recommends it to all who are interested in RL.
I will certainly be going through the book over the next few months and if you are interested in RL … do so too! The book is available as a free download.
The closing social event on Day 5 was also very well organized, with additional opportunities to discuss, network, enjoy good food and music ... and dance!
#4 Learning about Compelling Projects in Africa
I was really happy to learn about several research labs working across Africa and about compelling AI projects by students and researchers at African universities. I learned about the SKA telescope hosted in South Africa (one of the largest telescopes in the world) and the research it enables. I learned about H3ABioNet, a Pan-African bioinformatics network comprising 32 bioinformatics research groups distributed amongst 15 African countries and 2 partner institutions based in the USA. These units support H3Africa researchers and their projects (genomic, demographic, disease, data collection etc) while developing bioinformatics capacity within Africa. I also got to see numerous interesting student and researcher posters spanning applications such as Deep Learning for plant disease detection, Reinforcement Learning for bin packing problems (InstaDeep), and the generation of drug molecules using GANs. A set of interesting spotlight talks in the computer vision domain can be found here.
It was also inspirational to see the award-winning projects (Kambule and Maathai awards) and all the other research projects that got awards from Google, NVIDIA, IBM and Microsoft. Huge congratulations to all the winners!
Despite all of this progress, the consensus throughout the conference was that the contributions so far are meaningful but small compared to the overall opportunity.
#5 Beautiful Stellenbosch
It was my first time in South Africa and I was pleasantly surprised by the exquisite landscape on the drive from Cape Town airport to the conference venue at Stellenbosch: a changing scenery of fields, vineyards, hills and mountains, and lots of trees and greenery. While I did not have time to see much of the countryside, I found the Stellenbosch University campus to be both beautiful and very well appointed.
#6 The TPUs
During Jeff Dean’s talk (in true Oprah style), he mentioned that all conference attendees would be provided with access to Google’s research cloud - 5 regular TPUs and 20 preemptible TPUs per person for the next few months. Thank you Google! This gesture is really impactful as access to compute can be a major limitation, particularly for independent researchers working within small groups across Africa. It is great to see this level of support from large companies, and I am really excited to see the range of research results that this will enable.
At the end of the conference, saying goodbyes was hard — a testament to the strength of the community (someone actually told me it felt like the end of summer camp where everyone felt warm and fuzzy … and sad the camp was over). However, I think Shakir (one of the organizers) got it spot on when he mentioned in his closing remarks that the real work starts when we all go back to our institutions and engage in high quality research — this is what will build up and truly elevate the community.
I found the talks to be excellent, the labs to be useful in getting familiar with (or refreshed on) implementing NNs and the panels to be illuminating.
Feedback: The one area which I think would be good to cover in a future Deep Learning Indaba is related to exploring the UX of AI deployments in Africa — with a focus on training and deploying models on resource constrained devices. This is critical, especially given that most users within Africa only have access to such devices. I had a brief chat with Sarah Hooker who also mentioned model quantization as an interesting focus area! I look forward to helping out with a session that helps address this.
I absolutely look forward to attending the event again next year — Deep Learning Indaba 2019, Kenya!
Notes from Sessions, Labs and Panels
A collection of notes taken during sessions, labs and panels as well as links to resources.
Day 1: Practical Machine Learning Basics
Lab | Mathematics for Machine Learning | Github
Talk | Mathematics for Machine Learning | Avishkar Bhoopchand, DeepMind | Slides | Lecture notes
This talk covered important areas of mathematics relevant for machine learning: probability theory and Bayesian inference, linear algebra, matrix decomposition, differentiation, optimization, integration and functional analysis. Topics included probability distributions (how likely it is for a random variable to take on one of several values) such as the Bernoulli, binomial and Gaussian; joint, marginal and conditional probability; differentiation and integration; the sum, product and chain rules; and Jacobian vectors and matrices.
Day 2: Deep Learning Fundamentals, Feed Forward Models, Convolutional Models.
Talk | Deep Learning Fundamentals | Moustapha Cisse
This was a really interesting talk, and what I got out of it was some really clear explanations and intuitions behind common decisions in DL practice. These explanations were prompted by the great set of questions asked during the talk. Why SGD instead of analytical approaches to solving for regression coefficients (matrix invertibility, computation costs, lack of closed-form solutions)? Why NNs (some classes are not linearly separable)? Why not explore preprocessing steps that make such data linearly separable? Why large models (theoretical results show that shallow models need exponentially many parameters to represent some functions, and other results show that larger models are easier to optimize because of the shape of their loss landscape)? Why do ReLUs perform better (they are non-saturating activation functions, which helps address vanishing gradients)? Why and when to use which optimizer (Adam works well for datasets which contain rare events)?
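To make the ReLU answer concrete, here is a minimal sketch that is not taken from the talk. It assumes a hypothetical 20-layer chain in which every layer sees the same positive pre-activation and all weights are 1, so the backpropagated gradient is simply the product of the local activation derivatives. The names `chained_grad`, `depth` and `x` are illustrative.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    # Derivative of the sigmoid; it never exceeds 0.25 (reached at x = 0).
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x: float) -> float:
    # Derivative of ReLU: exactly 1 for positive inputs, 0 otherwise.
    return 1.0 if x > 0 else 0.0


def chained_grad(grad_fn, pre_activation: float, depth: int) -> float:
    """Product of `depth` identical local derivatives (all weights assumed 1)."""
    return grad_fn(pre_activation) ** depth


depth = 20  # assumed number of stacked layers
x = 2.0     # assumed positive pre-activation at every layer

sigmoid_chain = chained_grad(sigmoid_grad, x, depth)  # shrinks towards zero
relu_chain = chained_grad(relu_grad, x, depth)        # stays at 1.0

print(f"sigmoid: {sigmoid_chain:.3e}  relu: {relu_chain}")
```

Under these simplifying assumptions the sigmoid chain collapses by many orders of magnitude while the ReLU chain passes the gradient through unchanged, which is the intuition behind "non-saturating".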
Lab | Feed Forward Networks | Github
The tutorial covered building a multi-layer neural network for classifying Fashion-MNIST data (TensorFlow Eager and Keras). I found the tasks on implementing Dropout and BatchNorm to address overfitting to be an excellent refresher on these concepts.
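As a refresher on what Dropout does, here is a minimal plain-Python sketch of "inverted" dropout. It is not the lab's TensorFlow/Keras code; the function name `dropout` and its parameters are assumptions made for illustration.

```python
import random


def dropout(activations, drop_prob, training=True, rng=None):
    """Inverted dropout on a list of floats.

    During training each unit is zeroed with probability `drop_prob` and the
    survivors are scaled by 1 / (1 - drop_prob), so the expected activation is
    unchanged. At evaluation time the input is returned as-is.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


# Usage: roughly half the units are dropped, the rest are doubled.
hidden = [1.0] * 10
print(dropout(hidden, drop_prob=0.5, rng=random.Random(0)))
print(dropout(hidden, drop_prob=0.5, training=False))
```

Scaling at training time (rather than at test time) is a design choice that keeps the inference path free of any dropout-specific logic.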
Day 3: Convolutional Networks, Probabilistic Thinking, Recurrent Models.
Lab | Convolutional Neural Networks | Github
This lab included tasks on the impact of initialization schemes for convnets (Xavier/Glorot and He initialization) and the simple Keras plot_model utility for visualizing a model (handy for papers etc).
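For reference, the two schemes differ mainly in how they scale the random weights by layer width. The sketch below uses the commonly stated formulas (Glorot uniform with limit sqrt(6 / (fan_in + fan_out)), He normal with standard deviation sqrt(2 / fan_in)) in plain Python rather than the lab's Keras initializers; the function names and layer sizes are assumptions.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, rng):
    """Xavier/Glorot uniform: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


def he_normal(fan_in, fan_out, rng):
    """He normal: N(0, std^2) with std = sqrt(2 / fan_in), often paired with ReLU."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]


rng = random.Random(0)
w_glorot = glorot_uniform(256, 128, rng)  # hypothetical 256 -> 128 layer
w_he = he_normal(256, 128, rng)

# Empirical variance of the He weights should be close to 2 / fan_in.
flat = [v for row in w_he for v in row]
print(sum(v * v for v in flat) / len(flat), 2.0 / 256)
```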
Talk | Convolutional Networks | Naila Murray (Naver Labs)
Convnets are used to process data with meaningful local relationships. Motivations for using CNNs: they allow for sparse connectivity, parameter sharing, translation equivariance and arbitrary input sizes.
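A tiny 1D example can illustrate parameter sharing and translation equivariance. This is a sketch in plain Python with an assumed signal and kernel, not code from the talk: the same three kernel weights are reused at every position, and shifting the input shifts the output by the same amount.

```python
def conv1d_valid(signal, kernel):
    """1D 'valid' cross-correlation (what deep learning libraries call convolution).

    The same kernel weights are applied at every position (parameter sharing),
    and each output depends only on len(kernel) neighbouring inputs
    (sparse connectivity). Any input at least as long as the kernel works.
    """
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


kernel = [1, 0, -1]                  # assumed edge-detector-like kernel
signal = [0, 0, 1, 2, 1, 0, 0, 0]    # assumed toy input
shifted = [0] + signal[:-1]          # same pattern moved one step right

out = conv1d_valid(signal, kernel)
out_shifted = conv1d_valid(shifted, kernel)

# Translation equivariance: the response moves with the pattern.
print(out)
print(out_shifted)
print(out_shifted[1:] == out[:-1])
```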
Talk | Probabilistic Thinking | Yabebal Fantaye
First quick task: “An urn contains 5 red balls and 7 green balls. First draw is done without disclosure. Second is disclosed. Does probability of first change with knowledge of the second?”. Probability can be viewed in terms of syntax (objective, universally true notation) and semantics (interpretation based on either frequentist or Bayesian views). There is no probability without a condition.
Under what conditions is p(x|y) > p(x)? When the two variables are positively correlated, e.g. the probability of wearing a jacket is greater if it is raining (condition y) than the base probability of wearing a jacket.
Bayes theorem | p(x|y) = p(y|x) p(x) / p(y).
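The urn question can be checked by brute force. The sketch below assumes the two draws are made without replacement and that the disclosed second ball is red (the talk's prompt does not fix the colour); it enumerates every equally likely ordered pair of balls and confirms the result against Bayes' theorem. Helper names such as `prob` are illustrative.

```python
from fractions import Fraction
from itertools import permutations

balls = ["R"] * 5 + ["G"] * 7
# Every ordered pair of distinct balls is equally likely (no replacement).
draws = list(permutations(range(len(balls)), 2))


def prob(event, given=lambda d: True):
    """Exact conditional probability P(event | given) by counting outcomes."""
    conditioned = [d for d in draws if given(d)]
    return Fraction(sum(1 for d in conditioned if event(d)), len(conditioned))


def first_red(d):
    return balls[d[0]] == "R"


def second_red(d):
    return balls[d[1]] == "R"


p_first = prob(first_red)                                 # prior: 5/12
p_first_given_second = prob(first_red, given=second_red)  # posterior: 4/11

# Bayes' theorem: p(x|y) = p(y|x) p(x) / p(y)
bayes = prob(second_red, given=first_red) * p_first / prob(second_red)

print(p_first, p_first_given_second, bayes)
```

So knowing the second ball does change the probability for the first one, even though the first draw happened earlier: conditioning is about information, not about time.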
Talk | Text Classification | Kyunghyun Cho (NYU)
Prof. Cho’s talk covered concepts related to training neural networks and language modeling: softmax as a generalization of the Bernoulli distribution; back-propagation as a case of maintaining and managing Jacobian vectors/matrices; and early stopping close to the solution of the minimization problem to avoid overfitting (especially with SGD and large datasets). He also covered formulations where CNNs + RNNs are used for SOTA NMT and language modeling tasks. An interesting recommendation was that it is better for performance to train an embedding from scratch if we have enough data. He also covered character-based models, their expressivity and their superior performance in handling new tokens within translation tasks.
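One way to see softmax as a generalization of the Bernoulli case: with two classes and one logit fixed at zero, softmax gives the same probability as the logistic sigmoid. The following sketch (the logit value is an arbitrary assumption) checks this numerically.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


z = 1.3                          # assumed logit for the "positive" class
two_class = softmax([z, 0.0])    # second logit fixed at 0

# softmax([z, 0])[0] = e^z / (e^z + 1) = sigmoid(z)
print(two_class[0], sigmoid(z))
```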
Lab | Recurrent Neural Networks | Github
The lab covered implementing a small recurrent neural network (with Keras) and using it to model/predict a simple repeating signal (a sinusoidal time series).
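To show the recurrence itself, here is a minimal, untrained sketch of a single-unit vanilla RNN run over a sinusoid. It is plain Python rather than the lab's Keras model, and the weights, period and sequence length are arbitrary assumptions.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Vanilla RNN with one hidden unit: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)."""
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)  # new state mixes input and memory
        states.append(h)
    return states


# Assumed toy signal: two periods of a sine wave sampled 20 times per period.
signal = [math.sin(2 * math.pi * t / 20) for t in range(40)]
states = rnn_forward(signal, w_x=0.8, w_h=0.5, b=0.0)

print(states[:5])
```

In the actual task a trained network would map these hidden states to a prediction of the next sample; training is omitted here.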
Panel | How to write a good research paper | Slides
Martin Arjovsky, Stefan Gouws, Kyunghyun Cho, Ulrich Paquet.
Below is a list of important points from the panelists:
- Recommended resource: Simon Peyton Jones, “How to write a great research paper”.
- Write before starting the experiments. Experiments should be done to put evidence on the claims. So formulate the claims first.
- Good writing is good editing. Writing is more of an iterative task. Write out the idea, distribute the ideas and get feedback.
- The entire paper should tell One story. Repeat this in the abstract, repeat it in the introduction, run experiments to support this claim, repeat claim in conclusion. Across the paper, follow up on this one story.
- Figures: All papers should have one main figure that encapsulates the main message of the paper. All figures and tables must have self-explanatory captions. E.g. Figure X is a table of model performance; we can see on the top right that the xx model performs better (shown in bold).
- Preschedule a paper swapping session e.g. 3–5 days before the deadline. Getting early feedback is important.
- Treasure feedback from reviewers. Some authors don’t adopt feedback and resubmit the same unrevised paper to a new venue. Bad form.
- Don’t ask to be an author unless you have really contributed at least 5% of the work.
- Research results must be falsifiable — add benchmarks, tests etc and strong evaluation.
- Let the motivation be real. Do not simply adopt a generic motivation such as “Neural networks are popular so we use it too”.
- Care about the reader’s mental model: what does the reader understand up until this point? What is the reader thinking? What is on their mind? How will my next narrative change that? Don’t switch notation, and do not use notation that is undefined or only defined later in the paper.
- You can create your own benchmarks or baselines for some tasks that are novel. This is ok.
- Writing should be a daily habit. Writing IS research.
- Best papers are completed long before the deadline. Two weeks before the deadline, send to critics and then integrate feedback. The best papers are the ones that have been heavily critiqued and sometimes resubmitted after a revision/rejection.
- Ask yourself: what have I learned from the paper, or what can people reuse from this paper? How can this paper make people imagine or gain new ideas?
- Don’t assume reviewers are experts in your field. Be clear and bring your message across in the clearest way possible. Reviewers have a short amount of time (perhaps ~1hr). In the end, reviewers may be looking to answer the question “how can I justify rejecting this paper?” Don’t make it easy for them.
- While it is tempting, avoid telling the story of the journey within the paper; skip to the main points. Science is a random walk. However, report the shortest path.
- Unlike traditional prose, research writing should have no surprises — every important concept should be presented upfront.
- Never make any claim that is not directly validated by a theorem, experiment or a reference. If you claim something is a problem, experimentally show this! More claims is not always better.
- Highlight problems and negative results. Remove meaningless proofs.
Day 4: Generative Models, Reinforcement Learning
Talk | Reinforcement Learning (RL) | Katja Hofmann
RL is focused on decision making and learning under uncertainty: how agents can make decisions in uncertain environments. Early RL was inspired by optimal control problems. There are relationships between RL and neurological phenomena, e.g. dopamine neurons firing represents the brain’s way of modeling a reward signal (Schultz et al 1997).
Formalizing RL: we need a common language to describe a wide variety of problems. An agent (takes actions A) acts within an environment (provides states S). The interaction between agent and environment proceeds in time steps. Reward is the information the agent receives that can be used to update its decision policy p(a|s).
The RL process is represented by a Markov Decision Process M = (S,A,P,R,γ). It adheres to the Markov property (the dynamics depend only on the most recent state and action). The discount parameter γ weights rewards over time (rewards received sooner carry more weight than rewards further in the future).
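The effect of the discount parameter is easiest to see on the discounted return, G = r_0 + γ r_1 + γ² r_2 + ... The sketch below is an illustration with made-up reward sequences, not material from the talk; `discounted_return` is a hypothetical helper name.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward at step t is weighted by gamma ** t."""
    g = 0.0
    # Work backwards so each step is just: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Three rewards of 1 with gamma = 0.5: 1 + 0.5 + 0.25 = 1.75
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))

# The same single reward is worth more when it arrives sooner.
soon = discounted_return([1.0, 0.0, 0.0], gamma=0.9)
late = discounted_return([0.0, 0.0, 1.0], gamma=0.9)
print(soon > late)
```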
Lab | Reinforcement Learning | Github
Day 5: Non-Recurrent Sequence Models, Frontiers of CV, AI for Africa
Panel | AI for Africa | Nyalleng Moorosi, Moustapha Cisse, Linet Kwamboka, Jon Lenchner, Sumir Panji.
Panelists discussed their work across the African continent.
Nyalleng identified interesting areas in data science with a focus on Africa: data for science (biosciences, energy, natural resources, defense), data science for government (municipalities, SMEs), and data science for education and advocacy (workshops, training etc).
Linet Kwamboka highlighted limitations of AI (cultural images vs indecent content?) and limitations in language translation. She also showed examples of how incorrect translations can have serious consequences (e.g. on an official website of Sudan, a title containing the word Sudan was incorrectly translated as “India”).
Jon Lenchner gave an overview of IBM projects across the African continent. He mentioned that IBM explores the most pressing problems, adopting machine learning approaches only where appropriate. He mentioned projects that used airtime top-up patterns to model credit scores, studies on the correlation between a malaria gene and breast cancers, the use of DL to speed up the management of cancer records etc.
Sumir detailed the fascinating work done by Human Heredity and Health in Africa (H3Africa). The initiative serves to facilitate a contemporary research approach to the study of genomics and environmental determinants of common diseases, with the goal of improving health in Africa. H3africa.net is composed of 15 collaborative centers, with 48 funded H3Africa projects and over 170 million USD invested. H3Africa projects include the collection of phenotype data, demographic information, anthropometric data, and disease and health related phenotype data. These can be found at Biorepository.h3africa.net. Sumir also mentioned a machine learning based project from H3ABioNet led by Dr Amel Ghouila from Institut Pasteur de Tunis. The ML project covers the development of training materials, handbooks, platforms, tools and packages. Sumir mentioned that progress is sometimes held back by a lack of collaboration: scientists see data as the currency for publications and are reluctant to share.
There was also a group discussion session where participants shared ideas on project areas that are ripe for AI: AI for personalized education, AI to support mental health, projects around data collection and sharing, and AI for payments (transparency etc).
Talk | Success Stories in Deep Reinforcement Learning | David Silver (DeepMind) | Slides
David gave an overview of important RL projects over the last few years.
- Design an agent that can solve any human-level task. Reinforcement learning defines the objective. Deep learning gives the mechanisms.
- DL in practice: we use NNs to represent the value function, policy or model in RL, and then optimize the loss function end-to-end.
- First success story: TD-Gammon (Tesauro, 1992).
- Mnih 2014: DQN on Atari. Can we learn a model that generalizes across multiple games? The goal was to maximize the score given the pixels. Modeled using CNNs. Represent the value function by a Q-network with weights w.
- AlphaGo. The search space for the game of Go is exponential. Address this by training a policy network (predicts the next move with confidence) and a value network (understands and evaluates the state and board position). The initial phase used supervised learning from expert examples.
- AlphaZero explores updates to AlphaGo: no human examples, repeated learning from self-play, and Monte Carlo tree search. Trained for 8 hours on several thousand TPUs. Examples of strategies selected by AlphaZero showed that it discovered some common opening moves, which it later discarded for more complex strategies it discovered.
- Dota 2: multi-agent domains with deep RL.
- David also introduced (for the first time) his 10 principles for RL.
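To ground the idea of learning a value function for actions (the DQN bullet above), here is a minimal sketch of the Q-learning update. It deliberately swaps the Q-network for a lookup table so it stays self-contained; DQN itself replaces the table with a CNN and adds further machinery not shown here. The states, actions and hyperparameters are made up for illustration.

```python
def q_learning_update(q, state, action, reward, next_state,
                      alpha, gamma, terminal=False):
    """One tabular Q-learning step.

    Moves Q(state, action) a fraction `alpha` of the way towards the target
    reward + gamma * max_a' Q(next_state, a').
    """
    best_next = 0.0 if terminal else max(q[next_state].values())
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Hypothetical two-state problem with two actions per state.
q = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 0.0, "right": 1.0},
}

# Taking "right" in s0 gives no reward but leads to s1, where value is known.
updated = q_learning_update(q, "s0", "right", reward=0.0, next_state="s1",
                            alpha=0.5, gamma=0.9)
print(updated)  # 0 + 0.5 * (0.9 * 1.0 - 0) = 0.45
```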
Panel | NLP Frontiers Panel | Sebastian Ruder, Herman Kamper, Stefan Gouws, Jade Abbott, Omoju Miller
This panel started with Sebastian Ruder giving an overview of important NLP work over the last few years.
Open Problems in NLP: Based on a survey of NLP experts, the problem areas were summarized as natural language understanding, NLP for low-resource scenarios, reasoning about large or multiple documents, and dataset problems and evaluation.
Another interesting idea that was discussed relates to unsupervised speech recognition via alignment and unsupervised language modeling. This includes methods for mapping embedding spaces across languages without parallel texts, e.g. train embeddings on corpora in different languages, project them into a shared latent embedding space and use known words to align both spaces. We can use this alignment information for document/topic classification and sentiment analysis; however, it is too imprecise for language translation.