Summarize YouTube with text-davinci-003

Greyboi
8 min read · Dec 18, 2022
“A tablet showing a youtube video being dropped into a sausage making machine. Stock photo, 8k, realism, detailed.” — yeah ok

Someone sends you a link to a video. Looks interesting! But you click it, and, oh man, it’s 3 hours long. Is any of this worth your time?

I have this problem daily. So I wrote a short Python script, ytsummary.py, which takes the video’s transcript, summarizes it in 10-minute chunks, and then produces an overall “summary of summaries”.

Here’s the code: https://github.com/emlynoregan/ytsummary/blob/master/README.md
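To run it yourself you’ll need the openai and youtube-transcript-api packages, plus an OpenAI API key. The script imports a little setcreds module to set up credentials; that file isn’t in the listing later in this post, but a minimal version of it looks something like this:

# setcreds.py
# A minimal sketch: importing this module sets the OpenAI API key.
# (Here it's read from an environment variable; use whatever suits you.)
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]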

For example, I work with serverless computing on AWS, and here’s a talk from AWS re:Invent 2022 about Lambda:

Might be interesting? Let’s have a look…

Oh, so many minutes :-(

Let’s use ytsummary to figure out what’s in there:

> python ytsummary.py https://www.youtube.com/watch?v=0_jfH6qijVY

Found 6 chunks

Summary of chunk 1:
In this section, Julian Wood introduces the talk and
explains how Lambda is used to build modern
applications with the lowest total cost of ownership.
He also talks about how Lambda has evolved over the
years, with features such as provision concurrency,
container images, and 10 gig functions. He then goes
on to discuss how customers are using Lambda for
various applications, such as IT automation, data
processing pipelines, microservices-based applications,
and machine learning applications. He also explains
the importance of security, durability, availability,
and features in the Lambda service, and how AWS takes
on more of the security in the cloud for serverless
applications. Finally, he talks about the open
source Firecracker technology and the two types of
invocation models for Lambda.

Summary of chunk 2:
This section of the transcript discusses how Lambda
works to process both synchronous and asynchronous
invokes. It explains how the frontend service handles
authentication and authorization, the assignment
service manages the execution environment, the
placement service optimizes the placement of the
environment, and the control plane service manages
the lifecycle of the assignment service nodes. It
also explains how the event invoke frontend service
and pollers handle asynchronous invokes, and how
event source mappings are used for other service
integrations.

Summary of chunk 3:
In this section, Julian and Chris discuss how Lambda
manages state to serve invokes. They explain that
Lambda uses an invoke frontend and poller fleet to
get invoke input to the correct machine at the
correct time, and that code is downloaded from S3. To
improve the Lambda service, they implemented
container packaging support, which required them to
rethink the mechanism by which code is delivered
into the execution environment. They realized that if
they could download and present only the necessary
bits of a container image to the environment, they
could get the environment started more quickly and
amortize code delivery time and cost. To do this,
they changed the way container images are persisted
in the Lambda storage subsystem and broke the binary
representation of the file system into chunks on the
block device.

Summary of chunk 4:
In this section, the speaker discusses how Lambda
uses container images and virtual machines to
optimize code delivery and state management. They
explain how they use chunking and convergent
encryption to deduplicate data while maintaining
security, and how they use Firecracker to reduce
overhead and improve efficiency of the VM fleet.

Summary of chunk 5:
In this section, Chris talks about how Lambda
leveraged lessons from storage services to improve
the performance, efficiency, and overall experience
of the Lambda service. He explains how Lambda uses
customer access patterns to influence how data is
laid out in the storage subsystem, how shared state
is important for utilization and performance, and
how storage services spend a lot of time meeting
their caller where they are to hide the complexities
inherent in a storage service from a customer. He
then talks about how Lambda solved Java cold starts
with the use of Firecracker and microVM snapshot and
restore, and how this improvement to the cold start
experience was made possible by turning a compute
problem into a storage problem. Finally, he mentions
the issue of state with the Worker Manager service
and how the control plane service manages Worker
Managers.

Summary of chunk 6:
In this section, the speaker discusses how the
Lambda service works to efficiently run workloads
and minimize idle time. They explain how the
Assignment Service works to ensure that execution
environments are not orphaned in the event of a
zonal failure, and how the service is written in
Rust for performance, latency, and memory safety.
The speaker also mentions how they use models and
machine learning to pack workloads optimally
together, and how they have a team of research
scientists dedicated to the placement problem.
Lastly, they encourage viewers to learn more about
serverless on AWS and to provide a five star
rating for the session survey.

Summary of summaries:
This video discusses how Lambda is used to build
modern applications with the lowest total cost of
ownership. It explains how Lambda works to process
both synchronous and asynchronous invokes, and how
it uses container images, virtual machines, and
Firecracker technology to optimize code delivery
and state management. It also talks about how
Lambda leveraged lessons from storage services to
improve the performance, efficiency, and overall
experience of the Lambda service, and how the
Assignment Service works to ensure that execution
environments are not orphaned in the event of a
zonal failure. Finally, it encourages viewers to
learn more about serverless on AWS and to provide
a five star rating for the session survey.

>

I’m pretty sure those summaries are better than anything I’d produce if I watched the video and tried to write them by hand.

And they’re extremely informative! Reading through this summary, I can see that while there’s a lot of cool detail about how Lambda is implemented at AWS, there’s next to nothing that would actually change how I’d use it, and no hints of future features. So I can probably skip this one.

How does it work? This is what text-davinci-003 thinks:

This program is a python script that is used to summarize the transcript of a YouTube video. It takes a YouTube video ID or URL as an argument and then retrieves the transcript of the video. The transcript is then split into chunks of 10 minutes each. For each chunk, the program calls an API from OpenAI to summarize the chunk into a single sentence. Once all of the chunks have been summarized, the program calls the OpenAI API again to summarize the summaries of the chunks. Finally, it prints out the summary of the summaries.

10/10, perfect. Great job.

Here’s the full technique: I pasted the program into OpenAI’s sandbox, using text-davinci-003, and asked it to describe how it works. For brevity, I cut out all the optional stuff in the code (about mentioning people):

Here is a program written in python.
---
# In this program we will read the transcript of a youtube video and summarize it
# Where the transcript is longer than 10 minutes, we will split it into 10 minute chunks

import setcreds
import openai
import sys
from youtube_transcript_api import YouTubeTranscriptApi

diagnostics = 0
include_mentions = 0

def get_video_id_from_video_id_or_url(video_id_or_url):
    # a youtube video id is 11 characters long
    # if the argument is longer than that, then it's a url
    if len(video_id_or_url) > 11:
        # it's a url; the video id is the last 11 characters
        return video_id_or_url[-11:]
    else:
        # it's a video id
        return video_id_or_url

def get_chunks_from_youtube(video_id):
    # this function will get the transcript of a youtube video
    # and return it as an array of chunks,
    # where each chunk is an array of lines

    # first get the transcript
    transcript = YouTubeTranscriptApi.get_transcript(video_id)

    chunks = []

    start_timestamp = 0.0
    current_timestamp_mins = 0.0

    current_chunk = []

    for entry in transcript:
        current_timestamp_mins = entry['start'] / 60.0

        # if the current timestamp is more than 10 minutes after the start timestamp
        # then we have a chunk
        if current_timestamp_mins - start_timestamp > 10:
            # add the current chunk to the list of chunks
            chunks.append(current_chunk)
            # reset the start timestamp
            start_timestamp = current_timestamp_mins
            # reset the current chunk
            current_chunk = []

        # add the line to the current chunk
        current_chunk.append(entry['text'])

    # add the last chunk
    if len(current_chunk) > 0:
        chunks.append(current_chunk)

    print(f"Found {len(chunks)} chunks")

    return chunks

def summarize_chunk(index, chunk):
    chunk_str = "\n".join(chunk)
    prompt = f"""The following is a section of the transcript of a youtube video. It is section #{index+1}:
{chunk_str}
Summarize this section of the transcript."""

    if diagnostics:
        # print each line of the prompt with a leading # so we can see it in the output
        for line in prompt.split('\n'):
            print(f"# {line}")

    completion = openai.Completion.create(
        engine="text-davinci-003",
        max_tokens=500,
        temperature=0.2,
        prompt=prompt,
        frequency_penalty=0
    )

    msg = completion.choices[0].text

    if diagnostics:
        print(f"# Response: {msg}")

    return msg

def summarize_the_summaries(summaries):
    summaries_str = ""
    for index, summary in enumerate(summaries):
        summaries_str += f"Summary of chunk {index+1}:\n{summary}\n\n"

    prompt = f"""The following are summaries of a youtube video in 10 minute chunks:
{summaries_str}
Summarize the summaries."""

    if diagnostics:
        # print each line of the prompt with a leading # so we can see it in the output
        for line in prompt.split('\n'):
            print(f"# {line}")

    completion = openai.Completion.create(
        engine="text-davinci-003",
        max_tokens=500,
        temperature=0.2,
        prompt=prompt,
        frequency_penalty=0
    )

    msg = completion.choices[0].text

    if diagnostics:
        print(f"# Response: {msg}")

    return msg

def main():
    if len(sys.argv) < 2:
        print("Usage: python3 ytsummary.py <video id or url>")
        sys.exit(1)

    video_id_or_url = sys.argv[1]

    # if the video id or url is a url, extract the video id
    video_id = get_video_id_from_video_id_or_url(video_id_or_url)

    if len(sys.argv) > 2:
        for arg in sys.argv[2:]:
            if arg == "--diagnostics":
                global diagnostics
                diagnostics = True

            if arg == "--mentions":
                global include_mentions
                include_mentions = True

    chunks = get_chunks_from_youtube(video_id)

    if len(chunks) == 0:
        print("No chunks found")
    elif len(chunks) == 1:
        summary = summarize_chunk(0, chunks[0])
        print(f"\nSummary: {summary}")
    else:
        # Now we have the chunks, we can summarize each one
        summaries = []
        for index, chunk in enumerate(chunks):
            summary = summarize_chunk(index, chunk)
            summaries.append(summary)
            print(f"\nSummary of chunk {index+1}: {summary}")

        # Now we have the summaries, we can summarize the summaries
        summary_of_summaries = summarize_the_summaries(summaries)

        print(f"\nSummary of summaries: {summary_of_summaries}")

if __name__ == "__main__":
    main()

---
Provide a description in a couple of paragraphs of how this program works.

This program is a python script that is used to summarize the transcript of a
YouTube video. It takes a YouTube video ID or URL as an argument and then
retrieves the transcript of the video. The transcript is then split into
chunks of 10 minutes each. For each chunk, the program calls an API from
OpenAI to summarize the chunk into a single sentence. Once all of the chunks
have been summarized, the program calls the OpenAI API again to summarize
the summaries of the chunks. Finally, it prints out the summary of the
summaries.
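
By the way, you don’t have to use the sandbox for this; the same trick works straight through the API. Here’s a minimal sketch using the same Completion call the script itself makes (describe_program is just a name I made up for this post, not something in ytsummary.py):

import openai

def describe_program(source_code):
    # wrap the source in the same prompt shape I used in the sandbox
    prompt = f"""Here is a program written in python.
---
{source_code}
---
Provide a description in a couple of paragraphs of how this program works."""

    completion = openai.Completion.create(
        engine="text-davinci-003",
        max_tokens=500,
        temperature=0.2,
        prompt=prompt
    )
    return completion.choices[0].text

with open("ytsummary.py") as f:
    print(describe_program(f.read()))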

It’s child’s play, really. I guess that’s what it means to have AGI (Artificial General Intelligence) available.

I’ve been working with AGI for the last year or so. I’ve been tinkering with the original davinci model, and I was on the beta test of GitHub Copilot (I currently pay for it, and man, if you’re a developer, it’s the best $10/month you could spend on anything). And of course I’ve been playing with the image generators — fun!

I *do* believe this technology is AGI, in the same sense that computers are General Purpose Machines. Computers are called General Purpose Machines, as distinct from Special Purpose Machines, because they are programmable. Likewise, traditional computer software is a form of Special Purpose Intelligence: it can only do the one intelligent thing it was written to do. The new large language models, though, are General Purpose Intelligence, or just General Intelligence, because they can be applied to a whole plethora of tasks that require intelligence, tasks they were never specifically programmed or designed to do.

End of rant. I hope you find this youtube summarizer useful!
