Why Deep Learning is a Design Challenge

And how design can improve technical performance of artificial intelligence

Deep learning and design are often seen as sitting at opposite ends of a spectrum, one seemingly requiring expert technical knowledge and the other innovative creativity. More recently, however, the two fields have been converging. Companies like IBM are ramping up their design departments, and Alibaba has even set up an AI Design Lab working on The future of AI & Design (Chinese). This article highlights the importance of design in the development and application of deep learning solutions. It discusses the design of training data, the measurement of performance, and design methodologies that can be useful for both deep learning engineers and designers.

What is Deep Learning

Deep learning is part of the larger family of artificial intelligence and machine learning. Machine learning gives systems the ability to learn a specific task from previous experience and to keep improving as that experience becomes more diverse. In deep learning specifically, mathematical models use a neural network to find patterns in the experience that you feed them. A basic building block of such a network is the sigmoid neuron. It takes in inputs x, each with a weight w. The neuron multiplies the inputs by their respective weights, adds them up, and passes the result through a sigmoid function: instead of firing abruptly once a threshold is passed, as the older perceptron does, its output rises smoothly from 0 to 1 as the weighted sum grows. Once several of these neurons are connected, you get a neural network. The main power of a neural network is that it represents extremely complex real-life data in a way that allows you to perform calculations on that data.
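
To make the neuron concrete, here is a minimal sketch in plain Python. It assumes the standard sigmoid formulation: the weighted inputs are summed, a bias term is added, and the result is squashed into a value between 0 and 1 rather than switching abruptly at a threshold. The bias term and the example numbers are assumptions for illustration; they are not taken from the text above.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


# Hypothetical example: two inputs, two weights and a bias.
# The weighted sum is 1.0 * 2.0 + 0.0 * -1.0 - 2.0 = 0, so the output is 0.5.
output = sigmoid_neuron([1.0, 0.0], [2.0, -1.0], -2.0)
```

A large positive weighted sum pushes the output towards 1 and a large negative one pushes it towards 0, which is the smooth counterpart of a neuron that "fires" or stays silent.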

Neural Networks and Deep Learning — Michael Nielsen

Deep learning has been around for decades, so why is it only becoming popular now? The main reasons are computational power and data. Moore’s Law has held up and computing power has been growing relentlessly. In addition, social media, smartphones and other sensors around the world are contributing to an ever-growing amount of available data; every day more than 2.5 billion GB of data is generated, a lot more than in the 1980s, when early deep learning models were developed. Training a neural network requires many calculations on a lot of data, and the combination of these two developments paved the way for the implementation of deep learning in the current era.

On the engineering and science side there is an enormous number of unanswered questions: how to build the right hardware for deep learning, how to come up with improved architectures, how to develop machine learning models that rely less heavily on data than deep learning does, and more. The hype around deep learning is a little risky: there are many other mathematical models that can perform specific tasks much more efficiently or with much less data. With this in mind, this article focuses on design opportunities within the deep learning process specifically.

Applied Deep Learning

A lot of scientific work was required to get to the current state of deep learning networks. However, most of the solutions that we use at the moment are based on architectures that were developed years ago. Research is still essential for the development of better deep learning architectures and smarter algorithms. There are, however, many areas that are already benefiting from existing neural networks, such as natural language processing, computer vision, multimodal intelligence and time series analysis. The main industries that are on the brink of a deep learning revolution can be seen in ‘Notes from the AI frontier, applications and value of deep learning’ by McKinsey & Company.

Tools like TensorFlow, education platforms like Coursera and research organisations such as OpenAI are quickly democratising the skill of deep learning, which was previously only available to mathematicians and later to computer science engineers. Computers followed a similar path: at first they were reserved for large corporations and governments, then they slowly trickled through to people’s homes and into our pockets in the form of tablets and smartphones. While a good understanding of programming and electronics was initially needed, even a six-year-old is able to operate a computer now. Deep learning is not there yet, but the process is well underway, and this is where design comes in. Once the computer science skills threshold is lowered, it is all about how you use the tool.

Deep playground is an interactive visualization of neural networks — TensorFlow

Within self-driving vehicle development, deep learning networks are mostly used for motion planning. Based on the input of the vehicle’s sensors (camera, radar, lidar, …), the vehicle analyses its environment and builds a virtual understanding of its surroundings. The main field of deep learning used to achieve this is computer vision.

In the MIT course ‘Deep Learning for Self-Driving Cars’, Research Scientist Lex Fridman describes the priorities for a successful computer vision Deep Learning solution:

  1. Data: A lot of real-world data
  2. Semi-supervised: Human annotations of representative subsets of data
  3. Efficient annotation: Specialised annotation tooling
  4. Hardware: Large-scale distributed computing power and storage
  5. Robustness: Algorithms that don’t need calibration
  6. Temporal dynamics: Algorithms that consider time

“It starts and ends with data. Once you have the data, you have to annotate it. Raw data: video, audio, lidar, etc. needs to be reduced into real-life representative cases. We need to collect the 100% and then use (semi-) automated ways to find the 1% of data that we need to train the neural networks. The big takeaway is that collecting, cleaning and annotating data is much more important than great algorithms, as long as you have good algorithms.”


As with any engineering challenge, it is important to define the problem, conduct background research, and specify the requirements and the desired output; only then can you think about how to solve the problem with a deep learning solution. Once these definitions are in place you can start building the network, and its success depends heavily on the data you feed it. The main challenges in training a network are basically ‘dirty work’:

  • Data gathering
  • Data filtering
  • Data labelling

Only once the data has been labelled can you move on to the more exciting part of choosing the network architecture and finding out how to improve the model (spoiler: often with better ‘dirty work’).

Designing training data

Besides the obvious relevance of design in defining the functionality of a deep learning solution, design is especially useful in the dirty part: data gathering, data filtering and data labelling. Within these phases of the process, creativity is necessary if you want to avoid losing too much time, adding too much cost, or requiring too much data. Every specific application of deep learning has different requirements and thus requires different training data.

Gathering the right data

The accuracy of a deep learning algorithm depends heavily on the data that it was trained with. Gathering this data can be a challenge if it is not readily available. Imagine you need data about how people hail a taxi with gestures. There is no large, labelled dataset for this that you can just download from the web. Generating this data can be designed in many ways, from putting a camera on a taxi (which requires a lot of filtering) to asking a colleague to enact the situation in a film studio (which gives a very narrow and biased dataset). An interesting avenue is to use crowdsourcing and involve many people in the challenge.

A great example of design for crowdsourced data gathering was developed by Coord. Their AR app allows the general public to contribute to the Coord dataset. In ‘The future of mobility starts at the curb’, CEO Stephen Smyth describes the process in which they gather their data. “To address this challenge, we decided to take to the streets ourselves with a next-generation surveying tool: a smartphone app that leverages augmented reality technology to help ‘code the curb’.” The main motivator for users of this app to generate that data is that it will improve the interaction with their personal environment (finding a free spot to park). With this data, Coord can design a user-friendly dataset that gives people access to previously unavailable information.

Gather data through AR app on mobile phones — Coord (Brian Dawson)

Data synthesis

One specific way of generating data, applied when you want to create a more diverse dataset, is called data synthesis. The paper ‘Do We Really Need to Collect Millions of Faces for Effective Face Recognition?’ gives a good description of the need for data synthesis. If you do not have access to 200 million labelled faces for training a network, you need to find a creative way to process a smaller number of images with domain-specific tools before they are analysed by CNNs. One part of data synthesis is data augmentation, which means you transform existing data to generate more of it. Simple augmentation methods include mirroring, rotating and cropping images, which already multiplies the number of training images and adds more angles of the face. A more advanced method described in the paper merges the image with a 3D model of a face. This allows the authors to generate an unlimited number of poses, shapes and expressions by rendering new face images. A risk that should always be kept in mind is that data generated by augmentation is still very closely related to your existing data. The technique is only useful for intra-class variations (differences in appearance that do not change the subject label), not inter-class appearance variations (differences between different types of subjects).
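
The simple augmentations mentioned above (mirroring, rotating, cropping) can be sketched in a few lines. This is an illustrative sketch only, not the method of the cited paper: the image is a small nested list of pixel values, the function names are hypothetical, and a real pipeline would use an image library and small rotation angles rather than a quarter turn.

```python
def mirror(image):
    """Flip the image horizontally (left becomes right)."""
    return [row[::-1] for row in image]


def rotate90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def crop(image, top, left, height, width):
    """Cut out a height x width window starting at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def augment(image):
    """Return the original image plus three augmented variants."""
    h, w = len(image), len(image[0])
    return [image, mirror(image), rotate90(image), crop(image, 0, 0, h - 1, w - 1)]


# Hypothetical 3x3 "image" where each number stands for a pixel value.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
variants = augment(image)  # four training samples from one source image
```

Every variant keeps the label of the source image, which is why this only adds intra-class variation: the network sees the same subject in more ways, not more subjects.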

As you can now imagine, designers have many skills that can prove useful in this process. An advanced solution to this problem is the Panoptic Studio. The studio films 3D poses from every possible angle, generating a huge dataset by filming just one behaviour of someone in a studio. This solution is useful for joint position recognition, but had a green screen been added to the background, the recorded people could have been placed in an unlimited number of environments for per-pixel recognition models. These applications of design in generating datasets can make the life of deep learning engineers and data scientists much more pleasant.

Data filtering

Once the raw data is gathered, it needs to be filtered to make sure that the algorithm learns the correct behaviour. Imagine you are trying to build a neural network that recognises whether someone is limping, so that traffic lights can allow more time for crossing. You’ll need many videos of people limping to recognise that behaviour. A large number of videos can probably be scraped from YouTube using the search term ‘limping person’. However, there are several issues with this data. Some of the videos are cartoons, others show only a close-up of a leg. In order to train a network with that data, it needs to be cleaned up. Human filtering can be done through tools such as Amazon’s Mechanical Turk, paying a fee per amount of filtered data, but this method can be very costly and time-consuming. An automatic method would improve efficiency and be a more long-term solution.

There are several ‘data normalisation methods’ applied by data scientists in basic data filtering, such as min-max scaling, duplicate detection and outlier detection. These allow you to make the data more representative of the problem (e.g. make the contrast of the image frames similar) or to remove data that seems too different from the average data (e.g. shapes that are obviously not human). From a data perspective these choices are suitable, but there are interesting choices that you can make from a design perspective as well.
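
As a rough illustration of the three filtering steps named above, here is a minimal standard-library sketch. The function names, the z-score rule and its threshold of 2.0 are assumptions chosen for the example; real projects typically use more robust outlier tests and a dedicated data library.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all values identical: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def drop_duplicates(items):
    """Remove exact duplicates while keeping the original order."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


def flag_outliers(values, z_threshold=2.0):
    """Flag values that lie more than z_threshold standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mu) / sigma > z_threshold for v in values]


# Hypothetical per-frame brightness values; the last frame is clearly different.
brightness = [10, 12, 11, 13, 12, 11, 10, 90]
outlier_flags = flag_outliers(brightness)
scaled = min_max_scale([0, 5, 10])
files = drop_duplicates(["a.jpg", "b.jpg", "a.jpg"])
```

The flags only mark candidates for removal. Whether a flagged sample should actually be dropped is a separate decision.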

Data-driven solutions look only at the data, ignoring the implications that certain normalisation methods might have for the real-world implementation of the network. Taking outlier detection as an example, a design perspective highlights the negative implications of removing these outliers. Imagine you are training a face recognition model and a specific ethnicity is underrepresented in your dataset: the model will perform less well on that group during validation. From a data perspective it makes sense to remove those outliers from the dataset. However, doing so will most likely have an unwanted side-effect on the usability and social acceptance of the product.
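
One way to surface this side-effect before it reaches the product is to check who an outlier filter would remove. The sketch below is illustrative only: the group labels "A" and "B", the function name and the flags are hypothetical, and it assumes every sample carries a group annotation.

```python
from collections import Counter


def removal_rate_by_group(groups, is_outlier):
    """Share of each group's samples that an outlier filter would remove."""
    totals = Counter(groups)
    removed = Counter(g for g, flag in zip(groups, is_outlier) if flag)
    return {g: removed[g] / totals[g] for g in totals}


# Hypothetical dataset: group "B" is underrepresented (2 of 10 samples),
# and the outlier filter happens to flag exactly those two samples.
groups = ["A"] * 8 + ["B"] * 2
is_outlier = [False] * 8 + [True, True]
rates = removal_rate_by_group(groups, is_outlier)
```

A removal rate that differs sharply between groups is a signal to gather more data for the affected group instead of deleting it.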

Ultimately, normalisation methods can (and probably must) be applied to bring it all together, but the design method gives you more control over the type of data that you use to train your networks and helps prevent unwanted side-effects of the general normalisation methods.

Data labelling

The best-known public example of crowdsourcing for data labelling is reCAPTCHA. By creating a security tool that differentiates bots from humans, Google was also able to obtain millions of labels from average internet users. At first it was used to digitise text from books. By now you have probably encountered the image labelling reCAPTCHA, where you have to select the parts of an image that contain a specific object. This means that Google gets its image dataset labelled every time someone wants to log in to a website that requires higher security.

reCAPTCHA: Tough on Bots, Easy on Humans — Google Developers

With design solutions like these, labelling becomes more of a value generator than a chore. Not only does it offer an advantage to the user of the product, it generates valuable information for the service provider as well, in a user-friendly way. Understanding the needs of the multiple user groups will allow for a design that solves problems on both sides.

Evaluating model performance

Bias and Variance

The accuracy of a neural network can be improved by understanding its bias and variance errors. A bias error occurs when the model’s assumptions are too simple to capture the pattern in the data, so it performs poorly even on its training data (underfitting). A variance error occurs when the model fits the specifics of its training data too closely, so it performs well on the training data but poorly on new data (overfitting). Finding out the bias or variance is an engineering problem at first: your model will need to output the error rate at every stage of training your network. This way you can find out how well it does on your training, development and evaluation data. The error rate tells you how well the algorithm is performing on that dataset. Based on the values of the bias and variance errors, it becomes clearer how to improve the model: whether you need to create a bigger model, add more data, train for longer, add regularisation or change the architecture.
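
The step from error rates to a next action can be sketched as a simple rule of thumb. This is an assumption-laden illustration, not a definitive recipe: it estimates bias as the gap between the training error and a target error, estimates variance as the gap between the development and training errors, and the `tolerance` value and the pairing of remedies with each error type are choices made for this example.

```python
def diagnose(train_error, dev_error, target_error=0.0, tolerance=0.02):
    """Estimate bias and variance from error rates and suggest next steps."""
    bias = train_error - target_error   # how far training falls short of the target
    variance = dev_error - train_error  # how much worse unseen data is than training data
    suggestions = []
    if bias > tolerance:
        suggestions.append(
            "high bias: try a bigger model, training longer, or a different architecture")
    if variance > tolerance:
        suggestions.append(
            "high variance: try more data, regularisation, or a different architecture")
    if not suggestions:
        suggestions.append("bias and variance are both within tolerance")
    return bias, variance, suggestions


# Hypothetical error rates: 15% on the training set, 16% on the development set.
bias, variance, suggestions = diagnose(train_error=0.15, dev_error=0.16)
```

With these numbers the training error itself is the problem, so adding more data would help less than increasing the capacity of the model.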

Although most of the improvements to neural network functionality are engineering activities, better data will become the largest challenge once more human-like tasks need to be performed by artificial intelligence. John Maeda’s 2018 Design in Tech report includes a complete chapter on AI with the main focus being inclusion, meaning human diversity, social inclusion and equality. Machine learning algorithms are biased by nature, because we decide which data to train them with. An example of the importance of bias is Apple’s Face ID. It faced backlash for being inaccurate for specific user groups, which led Apple to release a white paper stating:

“We worked with participants from around the world to include a representative group of people accounting for gender, age, ethnicity, and other factors. We augmented the studies as needed to provide a high degree of accuracy for a diverse range of users.”

Ultimately this is an inclusion issue. With design engineering tools such as a ‘Design Specification’, this can be prepared for and translated into a suitable brief for engineers, in which all possible uses, users and risks of a product are taken into account. Several companies are already focusing on the accessibility of these tools to allow for a broader conversation, and governing bodies such as the European Union are in the process of setting up an appropriate ethical and legal framework for the development of artificial intelligence, which will need to be taken into account in the development of new deep learning solutions.

Defining performance measures

Of course there are many sides to deep learning that are purely an engineering or a science challenge. Building the whole software architecture, performing the full cycle from preparing the datasets to validating a model and selecting the right model architecture still requires expert engineering skills.

Deep learning engineers can get stuck on the technical side of the problem. How do you go from a deep learning optimisation problem to a real-world solution? And how do you evaluate its performance? Setting key performance indicators to measure the success of your models and communicating their progress is essential. Interview techniques and data visualisation tools are very relevant in this case. A designer can provide a visual dashboard of the deep learning process that makes performance measurement more descriptive and relevant.

The communication gap between engineers and the rest of the company or the general public can also be narrowed using design techniques. Through mediation between business and engineering teams, more communicative key performance indicators for the models can be defined. Instead of just a technical description of a segmentation algorithm that needs to be improved, it is more useful to know that the work is being done to recognise where the road is, which in turn improves the accuracy of understanding whether a person is going to cross. That way the rest of the company can actually understand how the software is improving and why the engineers are working on specific problems.

To be continued…

To bring this information together: what must a designer understand of deep learning, and what must a deep learning engineer understand of design? The main takeaway is that there is much scope for designers and deep learning engineers to benefit from each other. Properly designed data strategy solutions can massively improve the performance and building process of deep learning networks, allowing the deep learning engineer to focus on more than just the dirty work.

An understanding of different design methodologies will help deep learning engineers solve problems they encounter in the development process. On the other side, an understanding of the process and capabilities of deep learning technology will allow a designer to develop more realistic solutions and communicate better with engineers. For both parties it helps develop innovative solutions through insights they would not otherwise have gained. Design and deep learning combined make a much more powerful team.

For design tools for deep learning specifically, read the follow-up article Design Methodologies for Deep Learning.

This piece was written by Leslie Nooteboom, CDO and Co-Founder of Humanising Autonomy. We develop natural interactions between people and autonomous vehicles. 👋🤖🚗

Get in touch with us at hello@humanisingautonomy.com 📩