Rethinking Design Tools in the Age of Machine Learning
The creative reach of the individual is expanding.
The assortment of available tools, platforms and devices for design is growing while their costs are diminishing. You can make a film, record an album, design a city or print your own flower pot. You can do all of this on a home computer or even on your phone.
Naturally, we all want to try our hands at these exciting new possibilities. We want to revel in the freedom to make things. We like the idea of being Renaissance thinkers, makers, doers.
But just because the tools have gotten cheaper, more accessible and easier to use does not mean that it’s gotten any easier to create a powerful image or tell a compelling story.
Saying something original is as difficult as ever. Structuring the many parts of an aesthetic experience is as difficult as ever. These things still take expertise, practice, experience.
Design tools and programming languages put a lot of power in our hands. But the power isn’t really ours unless we know how to use it. And not just use it, but do something of our own, have an original idea within a given medium.
Design tools shouldn’t help you to feel like you’re making something. They should help you to actually make something of your own.
Design Tools + Machine Learning
I would like to suggest that machine learning can help us to simplify design tools without limiting their expressivity, without taking creative control away from the designer.
This may seem totally counter-intuitive. When we think of machine learning or artificial intelligence, we think of automation.
In a design context, we would probably imagine something like this:
which in all cases would return this image:
There is no getting around the fact that producing an original design involves making a lot of decisions. Decisions take time.
As a result, design tools have tended towards two opposite extremes:
On one end of the spectrum, we have the “one-size-fits-all” approach, which can generally be found in consumer-level design tools — tools that simplify design processes by forcing users into one of a handful of pre-ordained templates.
On the other end of the spectrum, we have the “kitchen sink” approach, utilized by professional design tools — tools that provide an overwhelming number of low-level features that come with steep learning curves and often do not coincide with the user’s way of thinking.
At first, it would appear that machine learning offers a slightly more sophisticated version of the “one-size-fits-all” approach — a way of simplifying design processes by shifting some of the decision-making responsibilities away from the designer.
I’ll admit that machine learning can be used this way and is likely to be, especially in its early years. But it also offers much richer possibilities.
Though we can’t really change the number of decisions involved in a design process, we can change what’s involved in making those decisions.
I’d like to talk about a few ways that machine learning can transform how we interact with design software and make decisions through it.
They are: emergent feature sets, design through exploration, design by description, process organization and conversational interfaces.
I think these ideas have the potential to streamline design processes without taking creative control from the designer.
But perhaps even more exciting, these mechanisms will enable designers to focus entirely on the design work itself rather than on learning how to map their ideas onto the way a particular tool has been organized.
In other words, the designer will lead the tool, and not the other way around.
Emergent Feature Sets
When a designer sits down to produce a design, maybe they have an exact picture in their mind of the end product and maybe they don’t.
In either case, they need to find their way to that end product by uncovering a sequence of component tool actions that transform the blank canvas into the final product.
This reminds me of a famous quote:
“Every block of stone has a statue inside it and it is the task of the sculptor to discover it.” — [attributed to] Michelangelo
I like this quote because it frames artistic and design processes as a kind of search.
A block of marble has certain concrete boundaries and within these boundaries, an infinite number of possible sculptures exist simultaneously. The artist’s job is to discover the needle in the haystack — a particular combination of properties that meet a specific set of requirements.
This is much like a chemist searching for a new molecule or a chef looking for a new flavor combination.
The search-spaces may be entirely different for these problems, but there is a definite similarity of process in that every design problem relates to a particular set of interrelated properties and constraints.
Let’s look at some of the considerations that might be involved in designing a household object such as a wine glass.
If we make the glass taller, we probably need to widen the base to prevent it from tipping over too easily. Here, we vary two properties with respect to one constraint.
When we first encounter the problem, we may not have an internal sense of the constraint: “At what point does this property ratio cause the glass to tip over?”
We gain expertise by experimenting within the search space, learning the relation of properties to one another and to an initially unknown set of constraints imposed by the physical world.
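The relationship between the two properties and the constraint can be sketched in code. This is a minimal illustration under a deliberately simplified physical model that is not part of the original discussion: the glass is treated as a rigid body whose center of mass sits on its central axis at a fixed fraction of its height, and it falls once it is tilted far enough for the center of mass to pass over the edge of the base. The function names, the `com_fraction` parameter and the 20-degree threshold are all hypothetical.

```python
import math

def tipping_angle_degrees(height, base_radius, com_fraction=0.5):
    """Tilt angle (in degrees) beyond which the glass falls over.

    Assumes the center of mass sits at com_fraction * height above the
    base, on the central axis. This is a simplification for illustration.
    """
    com_height = com_fraction * height
    return math.degrees(math.atan2(base_radius, com_height))

def is_stable(height, base_radius, min_angle=20.0):
    """The constraint: the glass must tolerate at least min_angle of tilt."""
    return tipping_angle_degrees(height, base_radius) >= min_angle

def min_base_radius(height, min_angle=20.0, com_fraction=0.5):
    """Smallest base radius that satisfies the constraint at a given height."""
    return com_fraction * height * math.tan(math.radians(min_angle))

# Making the glass taller raises the minimum base radius it needs.
for h in (10, 20, 30):
    print(h, round(min_base_radius(h), 2))
```

An experienced designer carries an intuitive version of `min_base_radius` in their head. A novice has to discover it by trial and error.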
Let’s imagine this search-space as a very large map in which each possible end-state is represented by a unique set of coordinates.
In this map, each feature of the software acts as a road that takes us a certain distance in a particular direction.
A low-level feature (within a professional design tool) would be equivalent to a local road in the sense that it moves us only a short distance within the map.
If we want to make the glass larger, we could apply a sequence of low-level commands.
Or, we could distill this sequence into a single high-level feature, like those offered by consumer design tools. This feature would act more like a highway.
The nice thing about highways is that they take us great distances with a relatively small number of component actions.
The problem, however, is that highways only have exit ramps at commonly-visited destinations. To reach more obscure destinations, the driver must take local roads, which requires a greater number of component actions.
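The road metaphor can be made concrete with a toy model. In this sketch, a design state is a point `(height, width)`, each low-level feature is a short move, and a high-level feature is a fixed sequence of low-level moves triggered by a single action. The feature names and step sizes are invented for illustration.

```python
# Each design state is a point (height, width) in the search space.
# Low-level features are short moves; a high-level feature is a fixed
# sequence of them applied as one action. All names are illustrative.

LOW_LEVEL = {
    "taller": (1, 0),
    "shorter": (-1, 0),
    "wider": (0, 1),
    "narrower": (0, -1),
}

# A "highway": one user action that expands to ten low-level ones.
HIGH_LEVEL = {"enlarge": ["taller", "wider"] * 5}

def step(state, action):
    """Apply one low-level feature (a local road)."""
    dh, dw = LOW_LEVEL[action]
    return (state[0] + dh, state[1] + dw)

def run(state, actions):
    """Apply a sequence of low-level features in order."""
    for action in actions:
        state = step(state, action)
    return state

def run_high_level(state, name):
    """Apply one high-level feature (a highway)."""
    return run(state, HIGH_LEVEL[name])

start = (10, 4)
print(run_high_level(start, "enlarge"))             # (15, 9)
print(run(start, ["taller"] * 5 + ["wider"] * 4))   # (15, 8)
```

The highway reaches `(15, 9)` in one action. The neighboring design `(15, 8)` takes nine low-level actions, and a tool that exposes only `enlarge` cannot reach it at all.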
Yet, in most consumer design tools, we are not even given the option to take local roads.
Perhaps a slightly different destination (just a short distance from the exit ramp) would have been more to our liking, but we have no way of reaching it. Furthermore, we may not even be aware of this alternate destination or the effect it would have upon our overall design goals.
So, while high-level tools have the benefit of moving us through the search-space more quickly, they also reduce expressiveness or the ability to move anywhere within the space.
As a result, many users will stop at more easily reachable destinations, leaving vast regions of the map unexplored.
A determined user may find a winding path to their destination.
But if the exact properties of that destination are not fully articulated ahead of time in the user’s mind, it is unlikely that he or she will navigate there organically.
So, while a high-level tool may make only small regions of the map completely inaccessible, its fragmentation of the search-space makes much larger regions practically inaccessible.
In this sense, consumer-level design tools do not extend the human reach; they force our creative explorations into narrow passageways.
If we want to maintain creative freedom, we seemingly need to either stick to low-level operations or generate a very large number of high-level features that would cover a wider range of possible use cases but would also sacrifice the succinctness of the tool’s vocabulary.
Ideally, a new highway would be constructed for us whenever we leave home so that we could arrive at any possible destination with a small number of actions.
But this is not possible within a pre-built high-level feature set.
Machine learning allows us to extrapolate a great deal of information about users and what they wish to achieve through the observation of their behaviors.
Rather than trying to anticipate the designer’s needs through a pre-built high-level feature set, we can instead create tools that learn from the designer’s engagement with the software.
Over the last few years, a type of machine learning system called a “recurrent neural network” has been shown to be particularly adept at learning sequential patterns. These systems have been applied to tasks such as predicting the next characters in a string of text or the next notes in a piece of music.
Rather than constructing design tools around an immutable set of pre-built high-level features, a recurrent neural network can be employed to discover commonly-used sequences of low-level features and dynamically synthesize purpose-built features related to the designer’s current activity.
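The sketch below does not implement a recurrent neural network. As a much simpler stand-in for the same idea, it counts runs of consecutive low-level actions in a usage log and reports the ones that recur, which are the natural candidates for a synthesized high-level feature. The action names and the log itself are hypothetical.

```python
from collections import Counter

def frequent_sequences(action_log, length=3, min_count=2):
    """Return runs of `length` consecutive actions seen at least `min_count` times."""
    windows = (
        tuple(action_log[i:i + length])
        for i in range(len(action_log) - length + 1)
    )
    counts = Counter(windows)
    return [(seq, n) for seq, n in counts.most_common() if n >= min_count]

# Hypothetical log of low-level actions from one design session.
log = [
    "select", "scale", "align", "recolor",
    "select", "scale", "align", "duplicate",
    "select", "scale", "align",
]

for seq, n in frequent_sequences(log):
    print(n, " > ".join(seq))   # 3 select > scale > align
```

A learned sequence model could go further than this fixed-window count, for example by handling sequences of varying length or by taking the current state of the design into account.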
The behavioral patterns used in the automated production of custom high-level features can be mined from individual designers or across many designers.
Somewhat like recommendation systems that suggest music or movies based on the similarities of users’ tastes, the discovery of patterns across numerous designers can be employed to suggest relevant features to an individual based on the workflows he or she tends to utilize.
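One simple way to sketch this kind of cross-designer suggestion is similarity-weighted voting: compare how often each designer uses each feature, then recommend features favored by designers with similar usage. This is an illustrative approach rather than a description of any particular product, and the feature names and usage counts are invented.

```python
import math

def cosine(a, b):
    """Cosine similarity between two {feature: usage_count} dictionaries."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def suggest_features(target, others, top_n=2):
    """Rank features the target has not used, weighted by how similar
    each other designer's usage is to the target's."""
    scores = {}
    for usage in others:
        weight = cosine(target, usage)
        if weight <= 0:
            continue  # ignore designers with nothing in common
        for feature, count in usage.items():
            if feature not in target:
                scores[feature] = scores.get(feature, 0.0) + weight * count
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return ranked[:top_n]

# Hypothetical usage counts per designer.
me = {"pen": 5, "align": 3}
community = [
    {"pen": 4, "align": 2, "offset_path": 3},
    {"brush": 6, "smudge": 4},
]
print(suggest_features(me, community))   # ['offset_path']
```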
This will allow toolmakers to better address the diversity of designers and their varied ways of digesting information, making decisions and interacting with software.
It will enable toolmakers to meet designers where they are rather than asking them to adapt to a singular pre-ordained mentality or workflow offered by a more conventional, static feature set.
By extrapolating behavioral patterns across many users, toolmakers can better understand the implicit relationships between the features offered within their systems.
This will provide toolmakers with important insights for how to improve their software.
By adopting this methodology, the toolmaker’s role would shift away from the overall curation of high-level functionality and towards the creation of more granular interface elements.
This movement from preset rule systems and interfaces to implicit, intelligently-generated ones means that toolmakers would be relinquishing some control of certain aspects of the software.
Doing so, however, would enable designers to address tasks that have not been explicitly anticipated by the toolmaker.
Design through Exploration
Everyone has an innate sense of aesthetics and design — a feel for what is pleasing or useful.
But many of us lack the vocabulary, methodology or confidence to apply these intuitions towards actual creative output in a design field with which we have no prior experience.
Design tools shouldn’t only help us to execute a design in a domain we already understand, they should also help us to build expertise in new design domains.
If we were to stop a random sample of people on the street, give them a blank piece of paper and ask them to design their ideal living room, many people would not know where to start.
But, if we instead gave them access to Pinterest and asked them to design a living room by picking and choosing elements they like, many people would have a much easier time with the task.
This “know it when I see it” sensibility can form a powerful mechanism for driving our interactions with design tools.
The user doesn’t have to memorize the behavior attached to some obscure function name within a complex menu system. Instead, they can see the behavior applied to a copy of the scene and decide whether they like it.
Earlier, we looked at a two-dimensional visualization of a two-dimensional search-space. Though limited in scope, this visualization offers a straightforward yet fine-grained mechanism for design. The user simply needs to point to a location to arrive at any possible design within the boundaries of the search-space.
This spatial organization also enables the user to build a clear mental model for the effect of a given translation within the search-space.
Of course, very few real-world design problems comprise only two axes of variability. But, through the use of what is called a “dimensionality reduction” machine learning system, it is possible to produce a low-dimensional map of a high-dimensional feature-space.
In the animation above, I have presented a set of images of leaf silhouettes to a dimensionality reduction system.
As the training process unfolds, the algorithm reconfigures the position of each leaf in a two-dimensional map in order to find an arrangement in which similar leaves are positioned near one another.
This ultimately creates a continuous two-dimensional map that captures the full range of variability that was expressed by the original leaf images.
Once trained, this system can reconstruct the image associated with any two-dimensional coordinates within the map’s boundaries.
Remarkably, this can be done with coordinates for which no training sample was provided, thereby offering a simple mechanism for quickly constructing novel variations of a design.
We are free to visit any position on the map. We can explore the full range of possible entities that exist within the conceptual domain of leaves. We have equal access to all points and can freely jump between them. We can, for instance, visit the coordinates that represent the halfway point between a maple and oak leaf.
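The text does not say which dimensionality reduction technique produced the leaf map. As one simple stand-in, the sketch below uses principal component analysis (PCA), computed by power iteration in plain Python, on hypothetical rows of shape measurements. It shows the two operations described above: encoding a design as two map coordinates, and decoding any coordinates, including ones with no training sample, back into a design. All function names and data are illustrative.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_pca(rows, k=2, iterations=500, seed=0):
    """Return (mean, components) for the top-k principal components."""
    n, d = len(rows), len(rows[0])
    mean = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, mean)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        # Power iteration: repeated multiplication by the covariance matrix
        # converges to the direction of greatest remaining variance.
        v = [rng.uniform(-1, 1) for _ in range(d)]
        length = math.sqrt(dot(v, v))
        v = [x / length for x in v]
        for _ in range(iterations):
            w = [dot(row, v) for row in cov]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        eigenvalue = dot(v, [dot(row, v) for row in cov])
        components.append(v)
        # Deflation: remove the variance already explained by v.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return mean, components

def encode(x, mean, components):
    """Map a high-dimensional design to low-dimensional map coordinates."""
    centered = [a - m for a, m in zip(x, mean)]
    return [dot(centered, c) for c in components]

def decode(coords, mean, components):
    """Reconstruct a design from any map coordinates."""
    point = list(mean)
    for weight, c in zip(coords, components):
        point = [p + weight * ci for p, ci in zip(point, c)]
    return point

def halfway(x, y, mean, components):
    """Decode the map position midway between two designs."""
    a, b = encode(x, mean, components), encode(y, mean, components)
    return decode([(p + q) / 2 for p, q in zip(a, b)], mean, components)
```

Because PCA is linear, the halfway point decodes to a straightforward blend of the two designs. Nonlinear techniques may behave differently between samples.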
If we started with a fixed idea of what we wanted to design, we can use this process to arrive at it. But, if we want to explore a bit further, see what else might be possible, this approach allows us to move outward from our solution and try new things.
Alternatively, this map view could be temporarily superimposed over the project view. This would allow the user to explore possible variations of an element without losing a sense of the context in which it is embedded.
This sort of exploratory interface makes it possible to change elements of a design without having to redo them from scratch.
For example, let’s say we’ve created the above drawing of an oak leaf using Bezier paths and later decide that we want something that looks more like a maple leaf.
To do so in traditional design software, we need to map this transformation to the logic of a Bezier path.
As designers, we’re so accustomed to this workflow that it seems completely natural. But, in truth, the knowledge of how to manipulate a Bezier path is only tangentially related to the problem at hand.
Transforming one shape into the other is likely to undo much of our original effort in creating the first shape.
This adds a high cost to exploration. It shouldn’t be this way. To get to our neighbor’s house, we shouldn’t need to travel across town.
A variation control surface allows designers to be guided by their design intuitions rather than being limited by the proclivities of a particular tool’s way of abstracting the path to a given destination.
Through this approach, we are not reducing the designer’s control, we are simply removing the auxiliary demands and conceptual remapping imposed by an earlier generation of tools.
This frees designers to focus on building their expertise in the design decisions themselves rather than on the technical mechanisms through which those decisions are fulfilled.
Design by Description
The kinds of interfaces we’ve discussed so far are very much like maps in the conventional sense of the word.
Like any other map, we can add textual labels — street signs if you will. This allows us to navigate the design space verbally with commands like: “take me to a maple leaf.” Once there, we could say something like: “take me a bit closer to an oak leaf.”
This is really powerful on its own. But, it turns out that we can take this idea even further…
In 2013, Tomas Mikolov and others released a series of papers describing a set of techniques for producing low-dimensional maps that represent the conceptual relationships between words.
Much like the maps we’ve been discussing, words that are closely aligned in their real-world usage will also be located near one another in the word embedding map.
More incredibly, though, Mikolov and his collaborators discovered that it is possible to apply conceptually meaningful algebraic transformations to word vectors.
In other words, they discovered that you can perform algebra on real-world concepts.
For example, they showed that the result of the word-vector expression:
“Madrid - Spain + France”
is closer to “Paris” than to any other word vector.
“King - Man + Woman” results in a vector very close to “Queen”.
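The arithmetic behind these expressions can be shown with a toy example. The four-dimensional vectors below are hand-made so the sums are easy to follow. They are not learned embeddings, which are trained from large text corpora and typically have hundreds of dimensions. The `analogy` helper is a hypothetical name. It excludes the three input words from the search.

```python
import math

# Hand-made toy vectors with dimensions [spain, france, italy, capital].
# These are NOT real embeddings; they only illustrate the arithmetic.
VECTORS = {
    "Madrid": [1.0, 0.0, 0.0, 1.0],
    "Spain":  [1.0, 0.0, 0.0, 0.0],
    "Paris":  [0.0, 1.0, 0.0, 1.0],
    "France": [0.0, 1.0, 0.0, 0.0],
    "Rome":   [0.0, 0.0, 1.0, 1.0],
    "Italy":  [0.0, 0.0, 1.0, 0.0],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def analogy(a, b, c):
    """Return the word nearest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = (w for w in VECTORS if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, VECTORS[w]))

print(analogy("Madrid", "Spain", "France"))   # Paris
```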
This fascinating mechanism provides a completely new way of thinking about design tools. It allows us to operate on visual concepts visually or linguistically without the use of auxiliary abstractions and control systems.
For example, if we were looking for an aesthetic that was like Picasso’s but not from the height of his (analytic) Cubist period, we could say something like:
We could do the same with auditory information or in any other medium.
In the last few years, similar techniques including “Style Transfer” and “Neural Doodle” have extended these mechanisms even further.
These techniques have been implemented in photo-sharing apps — not as features within more extensive design tools, but as a kind of novelty image filter, somewhat like Instagram or Photoshop filters.
As the Photoshop “filter bubble” of the 1990s proved, the novelty presentation of this functionality quickly turns to kitsch and does little to re-conceptualize or extend design processes in a meaningful way.
But as individual components of a larger and more holistic design framework, these techniques provide a powerful mechanism for operating on media without leaving their native vocabulary, without mapping them onto an abstraction.
They allow us to explore and compose ideas through direct manipulation of the concept space in which these ideas reside.
But, as transformative as these techniques may be, I think there’s still something missing.
Every designer knows that the hardest thing about design isn’t what’s involved in making an individual decision. The hard part is reconciling many component decisions to one another in order to produce a cohesive whole.
As designers, we have to move back and forth between many component decisions while keeping the whole in mind. Sometimes these component decisions conflict with one another.
As with a Rubik’s Cube, we can’t simply solve one side and then move on to the next. This would cause us to undo some of our earlier work. We must solve all sides simultaneously.
This can be a very complex process and learning to navigate it is at the heart of what it means to become a designer.
While the machine learning techniques we’ve discussed can help to streamline these component decisions, they do not fully address this most difficult aspect of design.
To help designers build this kind of expertise, let’s explore two more concepts…
Process Organization and Conversational Interfaces
Simple expressions that communicate an individual command or point of information are more easily understood by machine learning systems than complex, multifaceted statements. However, one of the most difficult processes in designing something is thinking through how to break down a complex, dynamic system into discrete parts.
One of the most useful things design tools could do would be to help the designer through this process.
Tools could help the designer to deliver concise statements by creating interfaces and workflows that lead the user through a series of simple exercises or decision points that each address a single facet of a much larger and more complex task.
An excellent example of this approach is 20Q, an electronic version of the game Twenty Questions.
Like the original road trip game, 20Q asks the user to think of an object or famous person and then poses a series of multiple choice questions in order to discover what the user has in mind.
The first question posed in this process is always: “Is it classified as Animal, Vegetable, Mineral or Concept?”
Subsequent questions try to uncover further distinctions that extend from the information that the user has already provided.
For example, if the answer to the first question were “Animal,” then the next question posed might be “Is it a mammal?”
If instead the first answer were “Vegetable,” the next question might be “Is it usually green?”
Each of these subsequent questions can be answered with: Yes, No, Sometimes or Irrelevant.
20Q guesses the correct person, place or thing 80% of the time after twenty questions and 98% of the time after twenty-five questions.
This system uses a kind of machine learning algorithm called a Learning Decision Tree to determine the sequence of questions that will lead to the correct answer in the smallest number of steps possible.
Using the data generated by previous users’ interactions with the system, the algorithm learns the relative value of each question in removing as many incorrect options as possible so that it can present the most important questions to the user first.
For example, if it were already known that the user had a famous person in mind, it would likely be more valuable for the next question to be whether the person is living than whether the person has written a book because only a small portion of all historic figures are alive today but many famous people have authored a book of one kind or another.
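One common way to score the value of a question is by how much uncertainty its answer is expected to remove. The sketch below is a generic illustration of that idea using entropy over yes/no answers. It is not a description of how 20Q itself is implemented, and the candidate objects, questions and function names are invented. Under this criterion, the question that divides the remaining candidates most evenly scores highest.

```python
import math

def entropy(n_yes, n_no):
    """Uncertainty (in bits) of a yes/no answer with these counts."""
    total = n_yes + n_no
    result = 0.0
    for n in (n_yes, n_no):
        if n:
            p = n / total
            result -= p * math.log2(p)
    return result

def best_question(candidates, questions):
    """Pick the question whose yes/no answer is most informative."""
    def score(q):
        n_yes = sum(1 for facts in candidates.values() if facts[q])
        return entropy(n_yes, len(candidates) - n_yes)
    return max(questions, key=score)

def filter_candidates(candidates, question, answer):
    """Keep only the candidates consistent with the user's answer."""
    return {name: facts for name, facts in candidates.items()
            if facts[question] == answer}

# Hypothetical knowledge base.
ANIMALS = {
    "dog":   {"is_mammal": True,  "can_fly": False, "has_stripes": False},
    "cat":   {"is_mammal": True,  "can_fly": False, "has_stripes": False},
    "zebra": {"is_mammal": True,  "can_fly": False, "has_stripes": True},
    "eagle": {"is_mammal": False, "can_fly": True,  "has_stripes": False},
    "shark": {"is_mammal": False, "can_fly": False, "has_stripes": False},
    "frog":  {"is_mammal": False, "can_fly": False, "has_stripes": False},
}
QUESTIONS = ["can_fly", "has_stripes", "is_mammal"]

first = best_question(ANIMALS, QUESTIONS)
print(first)   # is_mammal
remaining = filter_candidates(ANIMALS, first, True)
print(best_question(remaining, ["can_fly", "has_stripes"]))   # has_stripes
```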
Though none of these questions individually encapsulates the entirety of what the user has in mind, a relatively small number of well-chosen questions can uncover the correct answer with surprising speed.
In addition to aiding the system’s comprehension of the user’s expressions, this process can benefit the user directly in his or her ability to communicate ideas more clearly and purposefully.
At its core, this process can be seen as a mechanism for discovering an optimal path through a large number of interrelated decisions.
Each question and answer interaction serves as a translation vector through concept-space, moving the user a bit closer to his or her intended output while also probing the user to think about and articulate each facet of the idea.
This mechanism can also be extended to a design interface, allowing the user to home in on a desired form by answering a series of questions about it. Drawing on natural modes of interaction, these questions could be answered either verbally…
…Or gesturally, preventing the user from needing to learn a complex menu system in order to access the tool’s capabilities:
Building on recent advances in machine learning, it is increasingly possible for the machine to answer the user’s complex, contextual questions about the properties of a design:
For example, the user could pose factual questions that would help him or her to evaluate the design’s suitability for some intended use:
This dialogue would imitate the form of human conversation, but would benefit from the machine’s omniscient knowledge of the design’s properties.
This could also be tied to the machine’s ability to model real-world constraints such as material, physical or chemical ones:
By embedding this capability within a realtime interaction, an architect, for example, could save a great deal of time by being able to quickly eliminate nascent ideas that are unlikely to yield fruitful results.
Aside from “real world constraints,” the user’s meaning in a given interaction may not always be clear — either because of the machine’s knowledge limits or because of a lack of clarity in the user’s statement:
Rather than going with a “best guess,” the machine could offer clarifying questions and alternatives:
This conversational approach would therefore help to clarify the user’s intent as well as build the machine’s knowledge base.
A conversational approach also presents a natural mechanism for preserving the user’s iterative process in a manner that is far more accessible for review and reflection than an “Action History.”
By unrolling the interface into a linear, traversable “news-feed,” the user is able to inspect each stage in his or her thinking and easily return to earlier iterations, branching off in a new direction while still preserving each other version of the design.
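A branching history of this kind can be modeled as a tree in which every version remembers its parent. The sketch below is one minimal way to represent it. The `Version` class and the example descriptions are hypothetical and are not taken from any existing tool.

```python
class Version:
    """One stage in the design conversation."""

    def __init__(self, description, parent=None):
        self.description = description
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def branch(self, description):
        """Create a new version that continues from this one."""
        return Version(description, parent=self)

    def history(self):
        """Descriptions from the first version up to this one."""
        node, steps = self, []
        while node is not None:
            steps.append(node.description)
            node = node.parent
        return steps[::-1]

root = Version("blank glass")
tall = root.branch("make it taller")
wide = tall.branch("widen the base")

# Return to an earlier iteration and head in a new direction.
stem = tall.branch("add a stem")

print(stem.history())   # ['blank glass', 'make it taller', 'add a stem']
print(wide.history())   # ['blank glass', 'make it taller', 'widen the base']
```

Nothing is overwritten when the user branches: both `wide` and `stem` remain reachable from `tall`, so each earlier line of thought can be reviewed or resumed.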
Work in Progress
Over the last few years, I’ve been working to actualize some of these ideas in software.
I’ve created a combined programming language and design tool called Foil which aims to bring many of the concepts we’ve discussed to life and is intended for users along the full spectrum of design experience, from novice to expert.
Foil tailors itself to the designer’s needs as it learns from their interaction and supports designers in growing and developing expertise.
Depending on the user, Foil can be a consumer design tool, a professional design tool, and a platform for the creation of emergent interface elements and design widgets which users will ultimately be able to share with each other.
I’ll be releasing an alpha version of Foil very soon.
These ideas have been greatly shaped by my work at NYU’s ITP, a graduate program investigating the intersection of the arts and technology.
In this interdisciplinary context, it is clear to me that we are at the beginning of an immense cultural convergence and that the tools and vocabularies of once disparate media are ever more relevant to and interoperable with one another.
To that end, Rune Madsen and I have started a research group focused on the intersection of machine learning and design.
We are also currently co-teaching a class called Rethinking Production Tools, which asks students to investigate and develop new paradigms in tool making.
The technical renaissance in machine learning over the last few years has led to incredible new possibilities. But the real work in actualizing these possibilities is only just beginning.
This work requires an interdisciplinary approach and I have been extremely fortunate to be able to collaborate with Rune, the faculty and the students at ITP in this exciting period of transformation.
A lot of people seem to be worried that artificial intelligence will take our jobs and render us useless.
I see a different possibility for the future, a more optimistic one, in which we are still relevant. In fact, in this future, we are all the more powerful, we are all the more human.
In this future, we are not competing with objects, we are using them to extend our reach as we have always done.
But to get there, we need to remind ourselves what tools are for.
Tools are not meant to make our lives easier. Not really.
They are meant to give us leverage so that we can push harder.
Tools lift rocks. People build cathedrals.