Rick and Mortify: An AI Playground for Rick and Morty Storyboards

7 Learnings from Building rickandmortify.com and Tinkering with Generative AI

Julia Turc
10 min read · Nov 30, 2022

Nobody exists on purpose. Nobody belongs anywhere. Everybody’s gonna die. Come watch TV? — Morty

Morty might sound nihilistic at first, but I think he is making an endearing point about entertainment. There’s something fundamentally soothing about a good story. I also think that simply watching stories doesn’t completely cut it. We want to make them. We want to be part of them.

Traditionally, the complexity of producing high-quality long-form content, which requires talent and resources, has kept a clear division between creators and viewers. However, generative AI promises to revolutionize creative work. What if we could finally blur the lines? What if you could decide how your favorite TV show ends? Think of Bandersnatch, but instead of choosing a branch of the story, you create it.

Rick and Mortify

Inspired by our favorite show, Rick and Morty, we built Rick and Mortify, a playground for creating storyboards for never-before-seen episodes. The premise of an episode is yours, but all the time-consuming bits (plot points, dialogue, accompanying visuals) are generated by AI. You drive the creation process, with AI as your copilot.

Rick and Morty storyboard generated via rickandmortify.com for the premise “Rick and Morty go to Paris driving a Mini Cooper”. The premise belongs to one of our users.

In the rest of this article, we’ll share our learnings from this project and discuss the opportunities we see in the generative AI space for long-form content creation.

1. GPT-3 chain prompting and character injection can fight dullness.

Our storyboards take the form of a plotline interspersed with character dialogue. To generate a compelling and convincing story, you need to get a few pieces just right:

  1. The plot should follow the typical narrative structure of premise, conflict, and resolution.
  2. Events must be logically coherent across episodes.
  3. The story must adhere to the laws of the TV show universe (e.g. time travel here is possible).
  4. Characters must behave consistently with their personalities (e.g. Rick is usually the one pulling Morty into the adventure, not the other way around).

While GPT-3 already has a solid understanding of the Rick and Morty universe, prompting it requires a lot more work than simply asking it to “generate a Rick and Morty plotline and dialogue”. There are certain types of unwanted behaviors that need to be mitigated via careful prompting.

For example, plotlines can be dull (e.g. “Rick and Morty go on an adventure. They face danger, but are able to overcome it and return home.”) and dialogue can degenerate into banter (e.g. “Morty: Hey Rick, this was great! Rick: Yeah it was! Morty: Absolutely!”).

To ensure a solid narrative structure and add more nuance to the generated text, we adapted ideas from Yang et al. In particular, we found it effective to chain multiple calls to GPT-3 and continuously inject character personality traits into the prompts.
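To make the idea concrete, here is a minimal sketch of chained GPT-3 calls with character injection, assuming the pre-1.0 openai Python client and a hypothetical CHARACTER_TRAITS lookup; the prompts shown are illustrative rather than the ones used in production.

```python
import openai  # assumes the pre-1.0 openai client and OPENAI_API_KEY set in the environment

# Hypothetical personality snippets injected into every prompt to keep characters on-model.
CHARACTER_TRAITS = {
    "Rick": "cynical genius scientist, drags Morty into adventures, burps mid-sentence",
    "Morty": "anxious teenager, reluctant sidekick, moral compass of the duo",
}

def complete(prompt: str) -> str:
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=400,
        temperature=0.8,
    )
    return response["choices"][0]["text"].strip()

def generate_storyboard(premise: str) -> dict:
    traits = "\n".join(f"{name}: {desc}" for name, desc in CHARACTER_TRAITS.items())

    # Call 1: expand the premise into a plot with an explicit narrative structure.
    plot = complete(
        f"Characters:\n{traits}\n\n"
        "Write a Rick and Morty plot with a premise, a conflict and a resolution.\n"
        f"Premise: {premise}\nPlot:"
    )

    # Call 2: condition the dialogue on the plot AND the character traits,
    # which helps keep the lines from degenerating into generic banter.
    dialogue = complete(
        f"Characters:\n{traits}\n\nPlot:\n{plot}\n\n"
        "Write dialogue for this plot. Every line should move the story forward.\nDialogue:"
    )
    return {"plot": plot, "dialogue": dialogue}
```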

2. GPT-3 is not a silver bullet for all your text generation needs.

GPT-3 is meant as a foundational building block for text generation, and it achieves that goal quite well. By simply calling its API (with some amount of prompt engineering), you can build a passable v0 for your product. However, if you want to build something truly differentiated, you will need more custom solutions.

For instance, there is room for improvement in the quality of our dialogue. While the generated lines do adhere to the personalities of the characters, we wish there were more substance to them: the dialogue should be less reactive to the narrative (e.g. “Hey Morty, check it out! We’re in Paris”) and should instead push it forward (e.g. “Morty! Come here you urrp little piece of garbage, I need you to hold a cable that I need for.. urrp something important”).

Contrast between Rick’s dialogue as generated by GPT-3 versus Character AI (which specializes in personality-infused dialogue generation).

Despite GPT-3 continuously improving (the davinci-003 version is visibly better than its davinci-002 predecessor), a great unbundling of text generation is still inevitable: new companies that focus on specific verticals will provide much higher value to their customers than a universal text generator.

For instance, Rick’s voice via Character AI (which focuses on personality-infused dialogue) is clearly superior to Rick’s voice via vanilla GPT-3, as shown above. Granted, OpenAI’s recent release of ChatGPT promises to shrink this gap. However, there are plenty of other verticals that OpenAI is unlikely to pursue, either because of the small market size or because of the narrow expertise required (e.g. medical, legal or financial documents).

3. Image generation severely suffers from inconsistency across runs.

From a visual perspective, long-form storytelling requires multiple types of consistency: general style, characters, object permanence, etc.

Existing text-to-image models assume that users want a single output. This works for a blog post illustration or an ad, but it is inadequate for episodic content. The most jarring type of inconsistency relates to characters. For instance, here is how DALL·E 2 illustrates Rick in various contexts:

Visual inconsistencies across images generated by DALL·E 2, containing Rick from the Rick and Morty TV show.

While the first two Ricks have some common traits (the hairstyle or the bulging eyes), it would be very hard to argue they are the same character. In this situation, the AI’s creative liberties break continuity and make the frames unusable together. No amount of prompt engineering seems to guarantee a reliable Rick either. Describing his outfit in detail (as in C) brings back some of his signature look, but it has a fan art flair that doesn’t quite match the Rick in the spaceship.
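For reference, this kind of consistency probe is easy to reproduce against the Images API; a rough sketch assuming the pre-1.0 openai client, with prompts that are paraphrases rather than the exact ones behind the figure:

```python
import openai  # assumes the pre-1.0 openai client and OPENAI_API_KEY set in the environment

# Same character, different contexts: each call is independent, so nothing ties
# the generated "Ricks" together beyond whatever the model learned during training.
contexts = [
    "Rick from Rick and Morty piloting his spaceship, cartoon style",
    "Rick from Rick and Morty drinking coffee in Paris, cartoon style",
    "Rick from Rick and Morty in a white lab coat with wild blue hair, fan art",
]

for prompt in contexts:
    response = openai.Image.create(prompt=prompt, n=1, size="512x512")
    print(prompt, "->", response["data"][0]["url"])
```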

As a side note, placing the eponymous Rick in various contexts is actually the easier task for these generative models. Since Rick and Morty is a popular TV show with plenty of imagery available on the Internet, the model almost certainly encountered the character during training and had the chance to learn a consistent set of recognizable features (e.g. wild blue hair, white lab coat, etc.).

However, the problem becomes more complicated when introducing entirely new characters. What if I wanted to create an antagonist Mr. Octopus? How hard would it be to generate the same Mr. Octopus repeatedly?

Visual inconsistencies across images generated by DALL·E 2, for a new fictitious character (half-human half-purple octopus). There is high variance both within the same prompt, and across prompts.

Prompting the AI model with a vague description of the new character (“half-human half-purple octopus”) clearly isn’t going to work. Elaborating on the description might help to some degree but, as was the case for Rick, it simply doesn’t guarantee full consistency.

There are methods such as Textual Inversion and DreamBooth that can teach an existing text-to-image model about a new concept (e.g. your own dog or favorite teapot) based on a small set of existing images. However, there is absolutely no supporting imagery that can serve as teaching material for a new character like Mr. Octopus.
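For characters that do have reference imagery, the workflow looks roughly like the sketch below, which assumes a recent version of Hugging Face diffusers and a publicly shared concept from sd-concepts-library; the point is that it presupposes exactly the training images that a brand-new Mr. Octopus lacks.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a base Stable Diffusion checkpoint, then attach a concept that was learned
# via Textual Inversion from a handful of reference photos.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_textual_inversion("sd-concepts-library/cat-toy")  # learned token: <cat-toy>

# The learned token can now be reused across prompts, which is exactly the kind of
# consistency we would want for Mr. Octopus -- if we had training images for him.
image = pipe("a <cat-toy> drinking tea inside a time machine").images[0]
image.save("concept_episode_1.png")
```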

One option would be to generate an initial picture of Mr. Octopus for the first episode in the storyboard (e.g. drinking tea in a time machine). Then, for the second episode, we could re-draw parts of its body to perform a new activity (e.g. fixing a quantum computer). The hope is that the visual elements that are carried over between episodes will enforce consistency.
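A rough sketch of this edit-based approach using DALL·E 2’s edits endpoint (again assuming the pre-1.0 openai client); the file names are placeholders, and the mask is a PNG whose transparent pixels mark the body parts the model is allowed to redraw:

```python
import openai  # assumes the pre-1.0 openai client and OPENAI_API_KEY set in the environment

# episode_1.png: the initial picture of Mr. Octopus drinking tea in a time machine.
# mask.png: same size, with transparent pixels over the regions we want redrawn.
response = openai.Image.create_edit(
    image=open("episode_1.png", "rb"),
    mask=open("mask.png", "rb"),
    prompt="the same half-human half-purple octopus fixing a quantum computer",
    n=1,
    size="512x512",
)
print(response["data"][0]["url"])
```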

(Failed) attempt to generate the same character performing a different activity via DALL·E 2’s image editing feature.

In practice, this doesn’t work well because it doesn’t constrain DALL·E 2’s artistic liberties enough: the character in Episode 2 only preserves the eyes.

The consistency problem in generative AI remains largely unsolved. For now, some creators simply accept it, especially for children’s books, where the audience (👶) is less observant. Others base their characters on real-life figures, for which there is existing imagery.

4. In-painting needs some tweaking too.

For Rick and Mortify, we worked around the image consistency problem by cutting out character poses from frames of the TV show. Specifically, we used the images in the Rick and Morty Images Dataset and removed the background behind the characters.
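As an illustration, background removal can be done in a few lines with the open-source rembg package (one of several options; not necessarily the exact pipeline we used):

```python
from PIL import Image
from rembg import remove  # pip install rembg

# Cut a character pose out of a TV-show frame by stripping the background;
# the result keeps an alpha channel so it can be composited onto generated scenes.
frame = Image.open("rick_frame.png")
cutout = remove(frame)
cutout.save("rick_cutout.png")
```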

For each episode in a storyboard, we select the cut-out that matches it best, and in-paint around it with a text-to-image model:

How Rick and Mortify ensures visual consistency of characters via cut-outs from TV show frames.

Getting in-painting right is also non-trivial. First, make sure to use a model that was specifically fine-tuned to perform in-painting. RunwayML released a customized version of Stable Diffusion that achieves this task much better than the vanilla model. DALL·E 2 performs in-painting out of the box.
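A minimal sketch of in-painting around a cut-out with the RunwayML checkpoint via Hugging Face diffusers; file names are placeholders, and the mask marks the background as the region to fill:

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

# Inpainting-specific checkpoint released by RunwayML; noticeably better at this
# task than repurposing the vanilla text-to-image Stable Diffusion weights.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

scene = Image.open("rick_cutout_on_canvas.png").convert("RGB").resize((512, 512))
# White pixels = background to be generated, black pixels = character to be kept.
mask = Image.open("background_mask.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="the streets of Paris, cartoon style, in the style of Rick and Morty",
    image=scene,
    mask_image=mask,
).images[0]
result.save("episode_frame.png")
```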

Second, we noticed it was very common for text-to-image models to blend the characters into the background or distort them, especially if they are (a) very thin, and (b) close to monochromatic, as is the case for Mr. Meeseeks. We also saw Rick’s wild hair getting longer and wilder.

Text-to-image models unintentionally distort character cut-outs.

This happens because we are using in-painting in a non-traditional way. In a typical use case, the model is given a nearly-complete image with a relatively small masked out portion and is expected to refill it. The in-painted bit must therefore blend in very well with the existing background. However, for Rick and Mortify we are doing the opposite: providing a character cut-out and expecting the model to fill in the background without blending the character in.

To work around this limitation, we added thick black borders around the character cut-outs, signaling to the model the desired delineation.
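Concretely, the border trick can be implemented by dilating the cut-out’s alpha channel and painting the dilated ring black; a sketch with Pillow, NumPy and SciPy, where the border width is a tunable guess:

```python
import numpy as np
from PIL import Image
from scipy.ndimage import binary_dilation

def add_black_border(cutout_path: str, border_px: int = 8) -> Image.Image:
    """Surround a transparent-background cut-out with a thick black outline."""
    cutout = Image.open(cutout_path).convert("RGBA")
    rgba = np.array(cutout)

    character = rgba[..., 3] > 0                   # pixels belonging to the character
    dilated = binary_dilation(character, iterations=border_px)
    border = dilated & ~character                  # ring just outside the character

    rgba[border] = [0, 0, 0, 255]                  # paint the ring solid black
    return Image.fromarray(rgba)

add_black_border("rick_cutout.png").save("rick_cutout_bordered.png")
```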

5. The future might not be fully generative.

At first glance, it’s tempting to assume that the generative AI world is headed toward a “click-to-generate” future with near-zero human involvement and AI as the only tool in the toolbox. But once the novelty tapers off and people start using these models for real-world use cases, it becomes apparent that reality is a lot more nuanced.

First, the single most important reason why we were able to build this AI playground in a matter of days is that the characters are already part of popular culture — which makes them engaging to users, but also enabled us to use generative models out of the box. In other words, most of the merit goes to the talented (human) creators of the show. As elaborated in 3. above, creating entirely new characters with generative AI only is still a massive challenge.

Second, it’s quite clear that we will continue to need auxiliary tools to adjust the inputs and/or outputs of generative AI (e.g. we used good old computer vision to remove the background and produce the character cut-outs). The holy grail is finding the right synergy between humans, traditional tools, and generative AI.

6. For reliable GPU access, try Lambda Labs.

Generative AI is resource-intensive, especially if you want to train or serve your own models.

If you’re an individual with no funding, then getting access to GPUs is non-trivial. The three big clouds are very protective of GPU resources and prioritize lucrative contracts over the pennies they get from individual users. Getting access to even a single GPU with Google Cloud or Microsoft Azure can take days of bouncing emails.

I personally had a great experience using GPUs from Lambda Labs (we have no affiliation to them, and this is not sponsored). With a click of a button, the GPU is yours. Availability is great too — I’ve never been unable to start a GPU-enabled VM (which happens very often on Google Cloud for instance).

7. For reliable text-to-image inference, try Replicate.

With generative AI, you don’t want to re-invent the wheel. Especially for the first version of your product, calling existing APIs (with some prompt engineering) will get you 80% of the way there.

Calling GPT-3 and DALL·E 2 via OpenAI is a fine default option. But if you want to use Stable Diffusion or serve a model that you built, Replicate is a great way to do it (we have no affiliation to them, and this is not sponsored). The price per inference is reasonable ($0.0023 per second for the model we used) and their support staff is excellent — they replied to our emails in seconds.
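For reference, running a hosted Stable Diffusion model through Replicate’s Python client takes only a few lines; a sketch assuming a recent version of the replicate package, a REPLICATE_API_TOKEN in the environment, and a version hash that should be taken from the model page rather than copied from here:

```python
import replicate  # pip install replicate; expects REPLICATE_API_TOKEN in the environment

# Run a hosted model by name; the pinned version hash below is illustrative --
# grab the current one from the model's page on replicate.com.
output = replicate.run(
    "stability-ai/stable-diffusion:db21e45d3f7023abc2a46ee38a23973f6dce16bb082a930b0c49861f96d1e5bf",
    input={"prompt": "Rick and Morty driving a Mini Cooper through Paris, cartoon style"},
)
print(output)  # typically a list of image URLs
```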

Generating Entire Episodes

While our initial release of Rick and Mortify leverages text and image generation, there are multiple extensions that would bring it closer to the holy grail of fully fledged episodes.

Voice synthesis is now able to reproduce a specific person’s voice, like in this entirely fake but real-sounding interview between Joe Rogan and Steve Jobs. Even more remarkable is the emerging ability to clone actors’ voices into an entirely new language and thus localize content without subtitles or expensive dubbing.

Lastly, multiple research labs are working on AI video creation by interpolating between image frames (see Google’s Imagen Video and Meta’s Make-A-Video).

Get in Touch!

Rick and Mortify is only scratching the surface of what is possible and we’re excited to see what you create with it. If you’ve been thinking about AI-driven story generation, we’d love to hear from you.

This article was written by Julia Turc and Mihail Eric. We are machine learning researchers who left big tech to build Storia AI, a creative assistant for fast and delightful video production.

Sometimes science is more art than science. — Rick
