Summarize YouTube Videos with GPT-3

Daniel Crouthamel
8 min read · Jan 2, 2023


Digital AI Chat Bot that consumes digital text and summarizes for humanity. Made with Midjourney

Introduction

As a new member of the Quinquagenarian club, I’ve been reflecting on how learning new information has changed over the years. Google didn’t exist when I was in college and only appeared as I was finishing graduate school. Learning was a different experience back then: you couldn’t quickly find information, nor did you have easy access to the resources that housed it. Google changed that; now you can find the resources quickly and then sift through the results. But that takes time too, since you have to skim or read as many pages as needed to find the solution to your problem. Now, with ChatGPT, you can ask a direct question and drill down to the answer immediately!

That’s pretty profound to me. I often wonder what learning would have been like for me if I had had Google early in my education. Would I have learned faster? What if I had had ChatGPT? That certainly would have helped on all those take-home physics exams. Will we one day have AI-powered chatbots to guide people through learning new topics, any topic they like? If I have a question, I could just ask. Perhaps it would take me off on a tangent from the “curriculum”, but it would provide a different perspective to help me understand better. I’m excited to see what new technologies will arise out of large language models in the future.

Speaking of finding information quickly, YouTube is a good resource! However, it can be time-consuming to watch an entire video or to skip through it for specific information. To save time, I thought it would be a fun project to transform my YouTube queries into GPT-like responses that address my questions or provide summaries of multiple videos. Below are the steps one can take using three APIs.

For instance, suppose I want to search YouTube for information on the new applications that emerged from Prompt Engineering in 2022. I can create a prompt like so:

“Using the following text, What new applications emerged from Prompt Engineering in 2022? … <transcript>”

I can then feed this prompt into OpenAI and get a ChatGPT-like response. Let’s examine each part in more detail.

YouTube Topic Search

The function below retrieves a list of videos related to a given search query on YouTube using the YouTube Data API. You will need an API key to use the function. The ‘params’ dictionary allows you to specify parameters relevant to your needs. In this example, the search is limited to videos published after the start of 2022 and to medium duration, because longer videos may exceed the token limit of the GPT call that follows.

import os
import requests

def search_youtube(query):
    api_key = os.getenv('YOUTUBE_API_KEY')
    url = 'https://www.googleapis.com/youtube/v3/search'

    # Restrict the search to medium-length videos published after the start of 2022
    params = {
        'part': 'snippet',
        'q': query,
        'type': 'video',
        'key': api_key,
        'videoDuration': 'medium',
        'maxResults': 3,
        'publishedAfter': '2022-01-01T00:00:00Z'
    }

    response = requests.get(url, params=params)
    data = response.json()

    # Keep just the title, description, and video id for each result
    videos = []
    for item in data['items']:
        video = {
            'title': item['snippet']['title'],
            'description': item['snippet']['description'],
            'video_id': item['id']['videoId']
        }
        videos.append(video)

    return videos
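As a quick sanity check, you might call the function like this, assuming the imports above and a YOUTUBE_API_KEY environment variable; the query string here is only an example. The video ids it prints are what we feed into the transcript API next.

# Quick check - print the title and id of each returned video
videos = search_youtube('Prompt Engineering 2022')
for v in videos:
    print(v['title'], v['video_id'])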

Fetch YouTube Transcript

The function above returns a list of ‘video’ dictionary objects, each of which includes the video’s id. That id is then fed into another API to retrieve the transcript using the function below. See the YouTube Transcript/Subtitle API for more details.

from youtube_transcript_api import YouTubeTranscriptApi

def get_transcript(video_id):
    try:
        # Fetch the English transcript and stitch the text segments together
        transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=['en'])
        full_text = ''
        for x in transcript:
            full_text += x['text'] + ' '
        return True, full_text
    except Exception:
        return False, f'\nUnable to obtain transcript for video - {video_id}'
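Building on the search snippet above, a minimal test of the transcript fetch might look like this sketch; it just peeks at the first few hundred characters of whatever comes back.

# Grab the transcript of the first search result and peek at it
ok, text = get_transcript(videos[0]['video_id'])
if ok:
    print(text[:300])   # first 300 characters of the transcript
else:
    print(text)         # the 'unable to obtain transcript' message from above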

Ask GPT-3 Your Question!

We can now construct a prompt/question and apply it to the transcript. The function below uses OpenAI’s Completion.create method to generate text based on the input prompt and model. I won’t go into great detail on the parameters; please see the OpenAI documentation for that. I’ll just mention temperature for now, since it can drastically change the type of response you get. It’s something you’ll want to play with and explore.

The temperature parameter controls the randomness of the generated text. A higher temperature produces more random and diverse text, while a lower temperature produces text that is more predictable and focused. In general, a temperature of 1.0 or above is considered high and will result in more diverse and creative output, while a temperature closer to 0 will produce more conservative and repeatable output. I chose to be a bit more conservative here and used a temperature of 0.5.

import openai

# Assumes your OpenAI API key is available in the OPENAI_API_KEY environment variable

def ask_gpt3(prompt, model):
    try:
        completions = openai.Completion.create(
            engine=model,
            prompt=prompt,
            max_tokens=1024,
            n=1,
            stop=None,
            temperature=.5,
        )
        return True, completions.choices[0].text
    except Exception:
        # Most often this happens when the prompt (transcript) is too long
        return False, '\nUnable to get a completion - the transcript is likely too long'
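If you want to see the effect of temperature for yourself, one option is to call the Completion endpoint directly with the same prompt at two different settings and compare the outputs. The snippet below is only a sketch along those lines; the sample prompt is an arbitrary illustration.

# Hypothetical experiment - same prompt, low vs. high temperature
sample_prompt = 'In one sentence, what is prompt engineering?'
for temp in (0.2, 1.0):
    result = openai.Completion.create(
        engine='text-davinci-003',
        prompt=sample_prompt,
        max_tokens=64,
        temperature=temp,
    )
    print(f'temperature={temp}:', result.choices[0].text.strip())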

Test Some Prompts!

The magic is in how you format the prompts. In the example below, I’m using the prompt “Using the following text, What new applications emerged from Prompt Engineering in 2022? … <transcript>”. The code appends the transcript text to the prompt.

def test_app(prompt, yt_query):
    videos = search_youtube(yt_query)

    for video in videos:
        video_id = video['video_id']
        video_title = video['title']

        success, transcript = get_transcript(video_id)

        # Check if we could get the transcript
        if not success:
            print(transcript)
            continue

        gpt_prompt = f'{prompt}\n\n{transcript}'
        success, answer = ask_gpt3(gpt_prompt, "text-davinci-003")
        print(video_title, f'https://www.youtube.com/watch?v={video_id}')
        print(answer)


yt_query = "What new applications emerged from Prompt Engineering in 2022?"
prompt = "Using the following text, What new applications emerged from Prompt Engineering in 2022?\n\n"
test_app(prompt, yt_query)

The model I’m using above is ‘text-davinci-003’, which is the latest and greatest GPT-3 model. Let’s take a look at some of the video examples returned by the query.

Example 1:

What Is Prompt Engineering?
https://www.youtube.com/watch?v=sztL7rp_TkY

GPT Response: New applications that emerged from Prompt Engineering in 2022 include the ability to create custom emojis, generate marketing copy, create Pokemon cards, create Magic the Gathering cards, create aerial photography, create nature sunsets, create tiny planets, and create images from different perspectives.

Analysis: Is the above accurate? To be sure, you could look at the transcript and check. Does it mention emojis, Pokemon cards, and so on? After a quick check, I verified that it does. So I’m confident it’s basing its response on the transcript text as opposed to what else it “knows”.

Example 2:

Prompt Engineer | 5 GPT-3 Tips for Beginners
https://www.youtube.com/watch?v=3A0xnEmcmHc

GPT Response: From this text, it is not possible to determine what new applications emerged from Prompt Engineering in 2022.

Analysis: It’s interesting to see that if GPT can’t answer, it tells you so! Is it accurate? I would have to check to be sure, but in this particular case it could help me bypass videos that might not be relevant and save time.

Example 3:

Design Guidelines for Prompt Engineering Text-to-Image Generative Models
https://www.youtube.com/watch?v=7-XnIuH8r3U

GPT Response: Prompt Engineering in 2022 enabled the development of applications such as text-to-image generative models, which allowed users to take a text prompt and translate it into an image. These applications allowed users to explore different permutations of keywords, random seeds, and optimization lengths to generate coherent outputs. Additionally, users could explore different styles and subjects to create visual artwork, and the model was able to access and apply symbols to represent the subject.

Analysis: Again, one would need to do some validation if interested. I believe the above is accurate and is referring to DALL-E, Midjourney, and similar models that were released in 2022.

Find Exotic Animals

So far we’ve only looked at one particular prompt. What about a general summary or a bulleted list? You can do that as well; just add that wording to the beginning of the prompt. This is where the fun of prompt engineering happens! Try to craft the prompt so it returns what you’re interested in, in a format you can expect.
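For instance, reusing the test_app function from above, a quick summary-style prompt might look like the sketch below; the exact wording is just one possibility, not a recipe.

# Hypothetical summary-style prompt - same pipeline, different instructions
yt_query = "What new applications emerged from Prompt Engineering in 2022?"
prompt = "Summarize the following text as a short bulleted list of key points.\n\n"
test_app(prompt, yt_query)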

Let’s take a look at a more interesting example, where we’ll search for some exotic animal videos and then use one-shot learning to steer the response into a format we can expect. The prompt below is broken up into two parts: the instructions and example, which appear above the ###, followed by the context (the transcript). For more guidance, see Best Practices for Prompt Engineering.

yt_query = "The Best Exotic Animal Videos of 2022"
prompt = """Can you identify the type of animal(s) begin described in the following Text?
The cow and fox jumped over the moon.
Animal(s): Cow, Fox
###
Animal(s):
\n\n"""
test_app(prompt, yt_query)

Example 4:

Top 5 Exotic Pets You’ll Want to Own — Exotic Animals You Can Keep as Pets at Home 2022
https://www.youtube.com/watch?v=7f-pOlGb9PM

GPT Response: Animals(s): Macaw Parrot, Sugar Glider, Ball Python, Chinchilla, Bearded Dragon

Analysis: The video has “top 5” in the title, and the prompt did return 5 animals. Good enough for me! And notice how it is a comma-delimited list of animals, just as I specified in the prompt instructions.

Piano Instruction

One last example. I often view piano instructional videos, but these can be really long, often too long to feed the transcript into the GPT API, since the prompt will exceed the number of allowed tokens. As a future project, I’d like to investigate how to get around that.
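One simple option might be to estimate a transcript’s token count up front and skip (or, eventually, chunk and summarize in pieces) anything likely to exceed the model’s context window. The helper below is only a rough sketch, not part of the app above: it uses a crude words-to-tokens rule of thumb and assumes text-davinci-003’s roughly 4,000-token context minus the 1,024 tokens reserved for the completion, so treat the numbers as ballpark figures.

# Rough rule of thumb: ~0.75 English words per token, so tokens ~= words / 0.75
# Budget assumes a ~4097-token context minus the 1024 completion tokens used above
def transcript_probably_too_long(transcript, prompt_budget=4097 - 1024):
    estimated_tokens = len(transcript.split()) / 0.75
    return estimated_tokens > prompt_budget

But for fun, let’s try to answer the following question.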

yt_query = "How to count cadenza in Chopin's Nocturne Op. 9 No. 2"
prompt = """Does the following text explain how to count the cadenza in Chopin's Nocturne Op. 9 No. 2? If so, explain how?
\n\n"""
test_app(prompt, yt_query)

I still don’t really count this cadenza. I just play it and stop when it sounds about right. Let’s see if GPT can help or not.

Example 5:

Tricky Cadenza in Chopin Nocturne Op.9 №2 — Tips and Strategies — Josh Wright Piano TV
https://www.youtube.com/watch?v=UrWN_McdRws

GPT Response: No, this text does not explain how to count the cadenza in Chopin Nocturne Op. 9 №2. The text provides tips on how to practice the piece, such as assigning different emotions to the groups of notes, counting in groups of fours and fives, and practicing with different rhythms. However, it does not explain how to count the cadenza specifically.

Analysis: GPT is incorrect here! I think it’s because the instructor, Josh Wright, doesn’t use the word “cadenza” at that spot in the video. In fact, it looks like he only mentions it once, at the beginning. So this would be a case where we’d miss a good video that could help us. But what if we were to craft our prompt differently? Let’s ask for a summary.

yt_query = "How to count cadenza in Chopin's Nocturne Op. 9 No. 2"
prompt = """Please provide a summary of the following text in 5 bullet points.
\n\n"""

GPT Response:

- Practice good technique and utilize various methods to get the passage worked on efficiently.
- Divide the passage into three sections with differing dynamics such as soft, crescendo, and diminuendo.
- Use the four-two-one-five counting method while adding emotions.
- Practice the rhythm of the group in various iterations such as long-short or one-two-three-four.
- Take one week to practice the passage slowly to get the organization and fingering correct before trying a fast tempo.

You’ll notice that the third bullet, the four-two-one-five counting method, is exactly the method we were looking for, and it’s a great idea. So either I need to be better with my direct-question prompts, or GPT needs to be trained on more piano instruction. Or I could just ask a piano teacher!

Thanks for reading!

I hope this small project provides a brief introduction to what language models and prompt engineering can and cannot do. Nevertheless, I’m excited to see what comes about in the next few years. How will software development change? How will education change? What can we automate? Will everything change? It’s going to be interesting, that’s for sure.


Daniel Crouthamel

Epic Tapestry Consulting / ML Engineer, Graph DB and NLP Enthusiast / Quinquagenarian