Computer vision has taken huge leaps forward and is generating a data source we can use to drive innovative digital products and services. To unlock the technology’s true potential, however, we have to bring our innovative vision up to the same level.
Now that cheap, high-quality cameras are so widely available, we’re seeing an explosion of image data from public, private and commercial domains. At the same time, advances in machine learning and deep learning let us transform images into digital signals that support a range of tasks and actions. Early examples set out to automate visual inspection tasks, but now that the technology is maturing, it’s allowing us to explore radically new applications.
At this point, the question is not “can we use computer vision?” but rather “why would we use computer vision?” Service design provides tools to explore and understand the context of people’s problems. It’s a critical first step in learning what insights we need to design our solutions, and understanding how computer vision can help.
Straightforward visual tasks are still the primary focus for new data gathered from computer vision: tasks like tracking stock levels, inspecting parts on a production line, or monitoring security footage. These activities are simple and routine, so the technology boosts efficiency by freeing up people’s time for jobs that need more complex thought. They’re still useful services, but new algorithms are now giving us the chance to pursue more sophisticated uses.
Disney is using computer vision to understand people’s emotional reactions to movie plots during test screenings. By capturing people’s feelings in the moment, in an unobtrusive way, they gather quantitative data that can back up (or challenge) insights from qualitative focus groups taken after the event.
3D cameras with spatial mapping capabilities are allowing machines to directly render digital models of environments. This ability to see and remember in 3D has interesting potential for use in augmented reality applications.
In these examples, task automation is the focus, and the technology is still “seeing like a human,” but when we use computer vision to augment human capabilities we open up a wealth of new innovations that could change our lives.
Rather than thinking about how computer vision can replace people in certain tasks, it’s interesting to explore how it can enhance our abilities so we can tackle problems that are currently too complex for us. The technology can allow us to see the imperceptible, to see at enormous scale, and to see across space and time. What problems could we solve if we had superhuman sight?
Seeing the imperceptible is the ability to see and interpret more than our brains alone will allow. Researchers at MIT have been working on detecting a person’s pulse from standard video footage of their face. Computer vision measures very subtle changes in the tone and colour of the skin, and the derived signal allows you to take a pulse without being near the person, let alone touching them. This signal can be used to distort the video image in real time to literally make the pulse visible (and a little freaky). This has huge potential for ambient sensing, monitoring and diagnosis in healthcare.
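The core idea can be sketched in a few lines: average the skin colour of the face in each frame, then look for a dominant frequency in the plausible heart-rate band. This is a minimal illustration of the principle using a synthetic signal, not MIT’s actual method; the amplitudes, band limits and function name are assumptions for the demo.

```python
import numpy as np

def estimate_pulse_bpm(green_signal, fps):
    """Estimate heart rate (bpm) from a per-frame mean green-channel signal.

    The tiny periodic colour change in facial skin shows up as a dominant
    frequency in the plausible heart-rate band (0.7-4 Hz, i.e. 42-240 bpm).
    """
    centred = green_signal - np.mean(green_signal)       # remove the DC offset
    freqs = np.fft.rfftfreq(len(centred), d=1.0 / fps)   # frequency bins in Hz
    spectrum = np.abs(np.fft.rfft(centred))
    band = (freqs >= 0.7) & (freqs <= 4.0)               # keep only heart-rate band
    peak_freq = freqs[band][np.argmax(spectrum[band])]   # strongest periodicity
    return peak_freq * 60.0                              # Hz -> beats per minute

# Synthetic demo: a faint 72 bpm pulse buried in camera noise, sampled at 30 fps.
fps, seconds = 30, 10
t = np.arange(fps * seconds) / fps
demo_signal = (0.02 * np.sin(2 * np.pi * (72 / 60.0) * t)
               + np.random.default_rng(0).normal(0, 0.01, t.size))
print(round(estimate_pulse_bpm(demo_signal, fps)))  # prints 72
```

In a real pipeline the input would come from a face detector cropping each video frame, and the raw signal would be filtered more carefully, but the frequency-domain step above is the heart of the trick.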
Seeing at scale is the ability to monitor and process enormous volumes of visual content. In a recent project at The Dock, Accenture’s Global R&D hub, we looked at how we could use computer vision to help people moderate video content on social media. Today, human moderators are responsible for inspecting content that has been flagged by concerned users — this approach is reactive, very limited in scale, and puts those moderators at risk of psychological stress. By redesigning moderation teams to include both people and artificial intelligence, we showed how computer vision could detect and act on obvious content violations, while also enriching and pre-editing content that requires human review. This approach can increase the amount of content reviewed, while also limiting the amount of troubling content the human moderators have to see.
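The hybrid team described above boils down to a triage rule: let the model act alone only when it is very confident, and route the uncertain middle band to a person. The thresholds and function names below are illustrative assumptions, not details of the project itself.

```python
def triage(violation_score, auto_remove=0.95, review_floor=0.5):
    """Route flagged content based on a vision model's violation score (0..1).

    Thresholds are illustrative: high-confidence violations are removed
    automatically, uncertain cases are pre-edited and queued for a human,
    and low-risk content is published without costing moderator time.
    """
    if violation_score >= auto_remove:
        return "remove"        # obvious violation: act without human review
    if violation_score >= review_floor:
        return "human_review"  # uncertain: blur/enrich and queue for a person
    return "publish"           # low risk: no moderator exposure needed

print(triage(0.99), triage(0.70), triage(0.10))  # remove human_review publish
```

Tuning the two thresholds is how such a team trades off scale against moderator exposure: lowering `auto_remove` reviews more content automatically, at the cost of more false removals.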
Seeing across space and time allows us to capture footage and observe features that would otherwise be impossible to gather. Ecological surveys of wildlife are typically expensive, time-consuming and difficult to conduct. Computer vision is making them easier: it’s being used to map deforestation and biomass reduction from aerial and satellite imagery, and remote camera traps are helping to count wildlife populations in very isolated locations. In industry, companies like Reconstruct Inc. monitor progress on large building sites by combining autonomously captured footage with building information management systems. The insights can automatically generate progress plans and detect deviations or irregularities in the construction process or design.
Each of these applications of computer vision enhances our ability to perceive the world around us, but the latest advances will allow us to see things that aren’t there. This opens up new possibilities in the creative industries. Generative artificial intelligence (AI) applies AI to the creation of media and artefacts, and in the last few years we’ve started to see products and services that combine it with computer vision. For instance, Google’s Deep Dream Generator has been used to create new paintings in the styles of various artists, as well as some very trippy videos. Stitchfix, a subscription fashion retailer, has been using AI to identify trending styles, which its designers use to create new ranges.

Artomatix creates software that supports the design and development of online games. Their software supports 3D hybridisation, an approach that uses AI to generate infinite variations of textured and un-textured meshes. Right now, it can be used to make 3D renderings look more realistic, but they believe its true power will lie in automatically generating environments and even characters for massive online gaming worlds. During a studio visit, they showed us a zombie map example: they passed the algorithm two drawings of zombies and asked it to generate hundreds of variations on the theme, with very convincing results.
In all of these cases, the technology lets a designer use computer vision and AI to automatically sketch out design solutions. This doesn’t remove the designer from the process, but it does radically expand the range of components they can explore in their final compositions.
Picture the future
Computer vision is now a mature technology and its range of potential applications is limited only by our imagination. That said, when designing digital products and services that utilise it, it’s important to consider both the data sourcing strategy and the ultimate value that you’re bringing to the end user. Both factors are critical to success.
Image as data source
Social media and web platform giants sit on massive stores of imagery, giving them a huge advantage when it comes to training computer vision algorithms, but this shouldn’t stop others from getting started.
Many organisations fail to leverage the image data they already own. Retailers sit on masses of CCTV footage that is usually only inspected after a security incident. If we apply computer vision to that footage, we could understand when queues start to build at checkouts, see when customers appear lost in the store, detect a missing child, or inform the redesign of store layouts.
If companies don’t have the benefit of existing image data, they can start to experiment with user-generated content. Google’s Quick Draw is a game-like experience that tries to guess what you’re sketching on your phone. In just a few months, 50 million doodles were generated that can now be harnessed to automatically interpret people’s hand-drawn scribbles. Asos, the online fashion retailer, launched the “As seen on me” campaign, which encourages users to upload fashion photos of themselves. While the initial motivation was customer engagement and loyalty, its potential for mass personalisation through computer vision is clear. These transitional data-generating services are an important tool that can play a key part in a product strategy that utilises computer vision.
Even without existing or user-generated data, there are off-the-shelf solutions that allow you to add computer vision to products and services. Blippar provides a range of APIs that allow you to integrate face, logo and object recognition into applications. They have even developed specialist algorithms that detect specific makes and models of cars and varieties of plants.
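Integrating such a service typically means posting an image and handling a JSON response of labelled detections. The response shape, field names and confidence threshold below are hypothetical (they are not Blippar’s actual API), but the parsing pattern is what most recognition APIs require.

```python
import json

def parse_recognition_response(raw_json, min_confidence=0.5):
    """Extract labels from a (hypothetical) recognition-API JSON response.

    Expects a payload like {"detections": [{"label": ..., "confidence": ...}]}
    and keeps only detections the model is reasonably sure about.
    """
    payload = json.loads(raw_json)
    return [d["label"]
            for d in payload.get("detections", [])
            if d.get("confidence", 0.0) >= min_confidence]

# A mocked response for a plant-identification call.
sample = json.dumps({"detections": [
    {"label": "rose", "confidence": 0.92},
    {"label": "tulip", "confidence": 0.31},   # below threshold, discarded
]})
print(parse_recognition_response(sample))  # prints ['rose']
```

The design point worth noting is the confidence threshold: exposing it as a parameter lets a product decide per feature how much it trusts the vision service before acting on a guess.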
What’s your purpose?
We’re starting to see computer vision applications that exist because they can be built, rather than because they should be, ranging from real-time synthetic videos that put words in politicians’ mouths all the way through to fake pornography. This is a powerful technology that could have some deeply undesirable consequences. As demand for innovative uses of computer vision increases, it’s important that we ask ourselves what value these applications bring to people.
At Fjord, we apply service design principles to understand true human needs and use technologies appropriately to create services that delight users. We start with design research to identify and investigate pain points, patterns of behaviour and unmet needs, which gives us the context for introducing new digital tools into people’s lives.
We bring this approach to our work with Accenture, and have applied computer vision to help gardeners identify flowers, to allow shoppers to understand food nutrition, to let homeowners imagine the impact of daylight, and to help all of us navigate in the dark. Computer vision is a genuinely exciting technology and we’re only just beginning to discover its potential for digital products and services — our future could well be fuelled by it.
So, what’s your vision?