Talk Gen AI: Unlocking visual creativity with Gen AI
Generative AI is more than just text-based solutions. There are many exciting opportunities to use large language models (LLMs) and other generative models for image and video creation and manipulation.
At Talk Gen AI, Brett Hamilton of Nvidia shared examples of how to leverage Generative AI to unlock visual creativity.
Read some of the highlights from Hamilton’s presentation and watch the video below.
Multimodal image and video generation
Image and video generation is one of the most exciting aspects of Generative AI.
Hamilton discussed Nvidia’s Picasso offering, a Generative AI model foundry that enables developers to build and deploy models for visual content. It is truly multimodal, supporting 4K images, 3D, 360-degree high-dynamic-range images (HDRIs), 4K physically based rendering (PBR), and temporally stable videos.
Hamilton shared a demo video showcasing these multimodal capabilities, including advanced features called “control nets” that let users steer the output toward a particular use case.
For example, in the demo, the narrator entered a prompt for a nature scene involving a rock arch formation. This was text-to-image, but it could also have been done with sketch-to-image. After selecting the image, they re-imagined the scene as other images with similar arch formations, either in nature as trees or in a city as an architectural element. This is an example of transferring a scene to different compositions using depth. From there, the narrator highlighted a section of the image and added a person with a prompt. They were even able to highlight the person’s sweater and have the pattern replaced with a different design. These are examples of “in-painting” to add, segment, or replace items in the image. Lastly, the narrator expanded the canvas, which was automatically filled in to match the existing scene, expanding the scope via “out-painting.”
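Both in-painting and out-painting come down to a mask that marks which pixels a model is allowed to regenerate. The sketch below is a toy illustration in plain Python, not Picasso's API: the `generate` callback and `fake_model` are hypothetical stand-ins for a generative model, and the function names are assumptions made for this example.

```python
from typing import Callable, List

Image = List[List[int]]   # grayscale image as rows of pixel values
Mask = List[List[bool]]   # True marks pixels the model may regenerate


def inpaint(image: Image, mask: Mask, generate: Callable[[int, int], int]) -> Image:
    """Replace only the masked pixels; every other pixel is kept as-is."""
    return [
        [generate(r, c) if mask[r][c] else image[r][c] for c in range(len(row))]
        for r, row in enumerate(image)
    ]


def outpaint(image: Image, pad: int, generate: Callable[[int, int], int]) -> Image:
    """Grow the canvas by `pad` pixels per side and fill only the new border."""
    height, width = len(image), len(image[0])
    new_h, new_w = height + 2 * pad, width + 2 * pad
    canvas = [[0] * new_w for _ in range(new_h)]
    mask = [[True] * new_w for _ in range(new_h)]
    for r in range(height):
        for c in range(width):
            canvas[r + pad][c + pad] = image[r][c]
            mask[r + pad][c + pad] = False  # original pixels are protected
    return inpaint(canvas, mask, generate)


def fake_model(row: int, col: int) -> int:
    # Hypothetical stand-in for a generative model: always returns 9.
    return 9


photo = [[1, 2], [3, 4]]
sweater_mask = [[False, True], [False, False]]  # the "highlighted" region

edited = inpaint(photo, sweater_mask, fake_model)   # only one pixel changes
expanded = outpaint(photo, 1, fake_model)           # 2x2 photo on a 4x4 canvas
```

A real model would condition on the surrounding pixels and the text prompt when filling the masked region; the sketch only shows which pixels are allowed to change in each case.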
There are interesting enterprise use cases for the technology in advertising, product design ideation, and retail. For example, Walmart’s “View in Home” uses the technology to enable a customer to view objects like a sofa in their own home.
As Hamilton explains, we are at an inflection point: the technology is crossing the chasm from early adopters to the mainstream market. To finish crossing, it needs to be part of a complete solution or a company’s workflow.
One example of a complete solution using Generative AI is a product photo shoot. Traditionally, a photo shoot could be quite time-consuming and expensive once the photographer, studio, stylist, and more are taken into account. With Gen AI, the whole shoot could be generated, which not only reduces costs but also makes a photo shoot accessible to more people. Hamilton showed a demo of this.
Digital humans
In addition to image and video generation, Hamilton discussed the digital human space, including Nvidia’s Avatar Cloud Engine (ACE).
Nvidia’s ACE is a suite of technologies to create and power digital humans and avatars. It includes Automatic Speech Recognition (ASR), text-to-speech, and LLMs, as well as services to drive emotion and character — like audio-to-face, audio-to-gesture, and audio-to-emotion.
The services can be run in the cloud or on Nvidia RTX-enabled AI PCs for edge applications.
Hamilton showed a demo of the technology applied to healthcare, with a virtual care manager. The virtual assistant was a lifelike human avatar that checked in on a patient after surgery, with the voice, lips, and tone all in sync while communicating.
Watch the video
Watch the full video to see the demos in action.
Arte Merritt is the founder of Reconify, an analytics and optimization platform for Generative AI. Previously, he led the Global Conversational AI partner initiative at AWS. He was the founder and CEO of the leading analytics platform for Conversational AI, leading the company to 20,000 customers, 90B messages processed, and multiple acquisition offers. He is a frequent author and speaker on Generative AI and Conversational AI. Arte is an MIT alum.