Multi-Agent Movie Scripting using Generative AI, with code

Multi-Agent Orchestration using LangGraph

Mehul Gupta
Data Science in your pocket

--

Generative AI has been the talk of the town for a couple of years now, and every day we witness something new coming out of the magic box. The concept I’ve been exploring lately is Multi-Agent Orchestration (MAO), and it has been quite exciting. I’ve already written a couple of posts on the basics of MAO and a debate application in which two agents argue over a topic. Next on the list is a Movie Script Generation app that I tried out and will discuss in this post.

Movie Script Generation, as you might have guessed:

Takes a movie scene as input.

Generates the characters/roles to be played by AI agents, depending on the scene.

Generates a script for that particular scene, with the length controlled by an input parameter.

I will again be using LangGraph for this tutorial. If you’re new to LangGraph, this should help

So, let’s get started:

1. Import the required functions and the LLM of your choice. I’m using gemini-pro by Google in this tutorial (the unused imports have been trimmed).

from typing import TypedDict, Optional
from langgraph.graph import StateGraph, END
from langchain.output_parsers import CommaSeparatedListOutputParser
import google.generativeai as genai
from langchain_google_genai import ChatGoogleGenerativeAI

GOOGLE_API_KEY = ''
genai.configure(api_key=GOOGLE_API_KEY)

model = ChatGoogleGenerativeAI(model="gemini-pro", google_api_key=GOOGLE_API_KEY)

# Helper so the rest of the code can call the LLM with a plain string prompt
def llm(x):
    return model.invoke(x).content

2. Decide scene & dialogue length.

scene = "Ravi and Shyam are fighting as Ravi betrayed him. After a heated argument, Shyam leaves. Ravi's wife, Radha consoles Ravi"
length = 20

If you notice, this scene involves three characters and also an event (Shyam leaving) towards the end.

3. Generate multiple agents required for the scene

actors = "Identify the different characters required to perform this scene. \
Output just the character names as a comma separated list and nothing else. \
Movie scene: {}".format(scene)

output_parser = CommaSeparatedListOutputParser()
output = llm(actors)
classes = output_parser.parse(output)

The idea is quite simple: let the LLM decide which characters are required. I’m also using an output parser to turn the comma-separated output into a Python list.
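Under the hood, the parser essentially splits on commas and strips whitespace. A minimal stand-in (with a hard-coded sample string, since the actual LLM response varies) behaves like this:

```python
# Minimal sketch of what CommaSeparatedListOutputParser.parse does:
# split the raw LLM text on commas and strip surrounding whitespace.
def parse_comma_list(text):
    return [item.strip() for item in text.split(",")]

# Hypothetical raw model output for the scene above
raw_output = "Ravi, Shyam, Radha"
classes = parse_comma_list(raw_output)
print(classes)  # ['Ravi', 'Shyam', 'Radha']
```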

Time for LangGraph

4. Defining state variables

class GraphState(TypedDict):
    # Note: TypedDict fields cannot take default values; the defaults
    # are supplied later when invoking the graph.
    next_speaker: Optional[str]
    history: Optional[str]
    current_response: Optional[str]
    current_speaker: Optional[str]
    dialogues_left: Optional[int]
    results: Optional[str]
    description: Optional[str]

workflow = StateGraph(GraphState)

These are the variables shared throughout the graph execution; any node can read or write them at any time. Do go through the beginner’s video for clarity on the basic concepts of LangGraph.
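To build intuition for how this shared state evolves: each node returns a partial dictionary, and LangGraph merges it into the full state. A rough pure-Python sketch of that behaviour (a hypothetical node, not LangGraph’s actual internals):

```python
# Rough sketch of how LangGraph updates shared state: a node
# returns only the keys it wants to change, and those are merged in.
state = {"history": "Nothing", "dialogues_left": 20, "next_speaker": None}

def fake_node(state):
    # A node returns a partial update, not the whole state
    return {"next_speaker": "Ravi", "dialogues_left": state["dialogues_left"] - 1}

state.update(fake_node(state))
print(state["next_speaker"], state["dialogues_left"])  # Ravi 19
```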

5. Defining nodes for the graph

prefix_start = 'You are {}.\
Conversation so far \n{}\n \
Say your next line and\
Dont repeat your previous lines.\
Scene:{}.Expected Output format\
speaker_name:line'

def classify(classes, summary, current_speaker, scene):
    next_speaker = llm("Who should speak the next dialogue?{}. Output just speaker name. Conversation so far:\n{}\n Movie scene:{}".format(','.join(classes), summary, scene))
    return next_speaker

def handle_speaker(state):
    question = state.get('current_response')
    speaker = state.get('current_speaker')
    scene = state.get('description')
    summary = state.get('history', '').strip()

    # Ask the LLM (via classify) who should speak next
    classification = classify(classes, summary, speaker, scene)
    return {"next_speaker": classification}

def handle_greeting(state):
    return {"description": scene}

def handle_dialogue(state):
    summary = state.get('history', '').strip()
    current_response = state.get('current_response', '').strip()
    count = state.get('dialogues_left')
    next_speaker = state.get('next_speaker', '').strip()

    if summary == 'Nothing':
        # First turn: no history yet, so the first character opens the scene
        prompt = prefix_start.format(classes[0], 'Nothing', scene)
        argument = classes[0] + ":" + llm(prompt)
        summary = ''
    else:
        # Match the chosen speaker name against the character list
        index = classes.index([x for x in classes if x in next_speaker][0])
        prompt = prefix_start.format(classes[index], summary, scene)
        argument = classes[index] + ":" + llm(prompt)
    return {"history": summary + '\n' + argument, "current_response": argument, "dialogues_left": count - 1}

def result(state):
    summary = state.get('history', '').strip()
    # Clean up formatting glitches the LLM may have left in the script
    summary = llm("Remove empty dialogues, repeated sentences and repeated names to convert this input as a conversation:\n{}".format(summary))

    return {"results": "SCENE END", "history": summary}

workflow.add_node("handle_speaker", handle_speaker)
workflow.add_node("handle_dialogue", handle_dialogue)
workflow.add_node("handle_greeting", handle_greeting)
workflow.add_node("result",result)

The nodes are straightforward:

handle_speaker: chooses which agent should speak the next dialogue, using the classify function.

handle_dialogue: once the speaker is selected, generates the next dialogue for that speaker, given the movie-scene description and the script so far.

result: not required if you’re using a high-grade LLM. It cleans up the final script, since you may face formatting issues with low-grade LLMs.
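One detail worth calling out in handle_dialogue: the LLM may not return a bare speaker name, so the code matches its answer against the character list by substring. In isolation, that expression behaves like this (sample values, not actual model output):

```python
classes = ["Ravi", "Shyam", "Radha"]

# Hypothetical LLM answer: the name may come with extra punctuation or text
next_speaker = "Shyam."

# Find the first known character whose name appears in the LLM output
index = classes.index([x for x in classes if x in next_speaker][0])
print(classes[index])  # Shyam
```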

6. Add conditional edges

def check_conv_length(state):
    return "result" if state.get("dialogues_left") == 0 else "handle_speaker"

workflow.add_conditional_edges(
    "handle_dialogue",
    check_conv_length,
    {
        "handle_speaker": "handle_speaker",
        "result": "result"
    }
)

This conditional edge introduces the cycle in the graph: after every dialogue it checks how many lines are left and decides whether to end the scene or continue generating.
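To see the loop in plain Python (a hand-rolled simulation, not LangGraph’s actual scheduler): the graph keeps bouncing between handle_speaker and handle_dialogue, decrementing dialogues_left on each pass, until the conditional edge routes to result.

```python
# Hand-rolled simulation of the cycle: every pass through
# "handle_dialogue" decrements dialogues_left, and the conditional
# edge ends the scene once the counter hits zero.
def check_conv_length(state):
    return "result" if state.get("dialogues_left") == 0 else "handle_speaker"

state = {"dialogues_left": 3}
visited = []
while True:
    state["dialogues_left"] -= 1      # handle_dialogue generates one line
    visited.append("handle_dialogue")
    if check_conv_length(state) == "result":
        break

print(len(visited))  # 3 dialogues generated before the scene ends
```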

7. Add edges to graph object

workflow.set_entry_point("handle_greeting")
workflow.add_edge('handle_greeting', "handle_speaker")
workflow.add_edge('handle_speaker', "handle_dialogue")
workflow.add_edge('result', END)

8. Invoke with default values for state graph variables

app = workflow.compile()
conversation = app.invoke({'dialogues_left':length,'next_speaker':'','history':'Nothing','current_response':''},{'recursion_limit':1000})

Note: You need to set ‘recursion_limit’: 1000 (or an even bigger number) to generate a long conversation.

9. Script is ready !!

If you notice, there are minor issues towards the end of the script (like the system mixing up Shyam and Ravi), but this can be resolved by tuning the prompt. Nonetheless, the framework works fine and doesn’t have any syntactic issues.

The entire LangGraph looks like this

With this, it’s a wrap. I hope this is useful !!
