Using Cohere’s API to Summarize YouTube Videos.

Overview in Colab, Streamlit, and CLI.

ibrahim el-chami
9 min read · Feb 18, 2023



One of the most popular uses for large language models (LLMs) is generating text. These models can write anything from short sentences to long essays. They can also help write blog posts, including this article!

LLMs can summarize information, answer questions, and translate text, and they use knowledge learned from training to generate human-like language. These models are trained on vast amounts of text, from Wikipedia articles to Harry Potter books, allowing them to learn about the world and human behavior. That knowledge can then be used to make inferences about new, unseen text. LLMs have been used to generate sentences convincing enough to pass school-level standardized tests and to summarize information from scientific papers. They can also generate recipes, create humorous responses, and complete other simple tasks. As LLMs continue to grow in size and capability, they may be able to automate a variety of tasks, including some that currently require human expertise. For example, they could be used to summarize medical records or to provide customer service support.

In some cases, summarization can be applied to non-written media, where the text is hidden or is not the main source of content, for example, audiovisual media such as movies and videos. Summarizing text in such applications works just as well. How could summarizing a video be useful?

Say, for example, someone sends you a YouTube video. It looks interesting, but it’s 45 minutes long, and you aren’t sure you want to commit the time to watch it or whether the content is worth it. In this article, Cohere’s X-Large LLM is used to summarize YouTube videos.

To make it easier to showcase how API calls are made to Cohere’s X-Large (and other models), I built a quick app using Streamlit. Sometimes it is more convenient to run an app in Google Colab, for reasons including computational constraints on a local PC or privacy concerns. Running Streamlit on Colab requires a workaround, which I will also show. In other types of applications, command-line interface (CLI) calls may be the best way to showcase how functions are called; this will be demonstrated as well.


The general approach is to fetch the transcript of a YouTube video. This can be done using the YouTube Transcript API.

from youtube_transcript_api import YouTubeTranscriptApi

# `video` is the YouTube video ID (the part after `v=` in the URL)
transcript = YouTubeTranscriptApi.get_transcript(video)
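Each entry returned by get_transcript is a dictionary holding the caption text along with its start time and duration in seconds. A minimal sketch with made-up sample data (not from a real video):

```python
# Illustrative sample of the structure get_transcript() returns
# (the values here are invented for demonstration):
sample_transcript = [
    {'text': 'hello and welcome', 'start': 0.0, 'duration': 3.2},
    {'text': 'to this video', 'start': 3.2, 'duration': 2.1},
]

# The 'start' field is what the chunking step uses,
# converted from seconds to minutes:
last_entry_mins = sample_transcript[-1]['start'] / 60.0
print(round(last_entry_mins, 4))  # → 0.0533
```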

Transcripts of long videos are generally long, so the text is chunked into portions short enough to be used as prompt input for the X-Large model, which then generates a summary of each chunk. For the X-Large model, the maximum prompt input length is 2048 tokens, so each chunk of text has to stay under that limit; this corresponds to roughly 5 to 6 minutes of speech. To chunk the text into 5-minute intervals, each transcript entry’s start time is converted to minutes and compared against the start of the current chunk. The chunks are saved into an array to make it easy to iterate the summarization. The following breaks the transcript into 5-minute texts, plus whatever is left at the end of the video that is less than 5 minutes.

chunks = []

start_timestamp = 0.0
current_timestamp_mins = 0.0

current_chunk = []

for entry in transcript:
    current_timestamp_mins = entry['start'] / 60.0

    # chunk at 5 minute intervals
    if current_timestamp_mins - start_timestamp > 5:
        # add the current chunk to the list of chunks,
        # then reset the start timestamp
        chunks.append(' '.join(current_chunk))
        start_timestamp = current_timestamp_mins
        # reset current chunk
        current_chunk = []

    # append the entry's text to the current chunk
    current_chunk.append(entry['text'])

# the last chunk of the video (whatever is left under 5 minutes)
if len(current_chunk) > 0:
    chunks.append(' '.join(current_chunk))
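As a sanity check on the 2048-token limit, a rough rule of thumb (an assumption used here for estimation, not Cohere’s actual tokenizer) is about 4 characters per token, so each chunk can be checked before it is sent:

```python
def approx_tokens(text):
    # Very rough heuristic: ~4 characters per token for English text.
    # This is an estimation aid, not Cohere's tokenizer.
    return len(text) / 4

# A 5-minute chunk of speech is typically on the order of 750-900 words
chunk_text = "word " * 800          # ~4000 characters
print(approx_tokens(chunk_text) < 2048)  # → True
```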

Now that we have an array of texts, where each element fits within the prompt input limit, the transcript is summarized by calling the X-Large model on each chunk.

To call X-Large, you must have an API token. Cohere recently introduced a developer tier that gives developers free API access to Cohere’s models. The API is relatively straightforward and offers a range of customizability so developers can generate texts suited to their needs. This includes the size and randomness of the generated text, as well as penalties with which users can tailor the output toward more consistent and relevant texts. The documentation provides more details about the model parameters. The generated text can then be saved in an array.

import cohere

# open_file is a small helper that reads the API key from a text file
co = cohere.Client(open_file('/content/cohereapikey.txt'))

# the chunk's text goes first, followed by the instruction
prompt = f"""{chunk}
Briefly summarize this text in 100 characters or less."""

response = co.generate(
    prompt=prompt)
text_response = response.generations[0].text.strip()
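The call above handles a single chunk; in practice it is wrapped in a loop over all the chunks. A minimal sketch (the helper name summarize_chunks is mine, not from the repo):

```python
def summarize_chunks(co, chunks):
    # Summarize each 5-minute chunk in order and collect the results
    summaries = []
    for chunk in chunks:
        # chunk text first, instruction last, per the prompt layout used here
        prompt = f"""{chunk}
Briefly summarize this text in 100 characters or less."""
        response = co.generate(prompt=prompt)
        summaries.append(response.generations[0].text.strip())
    return summaries
```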

It’s generally good practice to clean up the generated text, removing stray spacing or randomly generated characters. It’s also good practice to log prompt responses; this lets users identify issues when the output is unexpected.

import re
from time import time

text_response = re.sub(r'\s+', ' ', text_response)
filename = '%s_logs.txt' % time()
with open('response_logs/%s' % filename, 'w') as outfile:
    outfile.write('prompt:\n\n' + prompt + '\n\n---------\n\nresponse:\n\n' + text_response)
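The whitespace cleanup can be verified on a small sample:

```python
import re

messy = "A  summary\nwith   odd\t spacing."
clean = re.sub(r'\s+', ' ', messy)
print(clean)  # → A summary with odd spacing.
```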

In a similar way, the summarized chunks can then be further condensed into one short summary. To do this, join all the generated chunk summaries into one text that is fed in as the prompt input.

summaries_str = ""
for index, summary in enumerate(summaries):
    summaries_str += f"\n{summary}\n\n"

# the joined summaries go first, followed by the instruction
prompt = f"""{summaries_str}
Using the text above, provide a detailed and coherent summary of the text."""

response = co.generate(
    prompt=prompt)
text_response = response.generations[0].text.strip()

Note how in both cases the prompt input has the following layout:

prompt= "
{paste the text from somewhere else}
what the prompt should do with the text."

It’s generally good practice not to end the prompt input with leading or trailing whitespace from the pasted text. The model sometimes misinterprets trailing text as an incomplete prompt input and attempts to finish writing the prompt itself, which can lead to unexpected results. One way of ensuring that the prompt has no trailing text at the end of the input is to end with a complete sentence. The start of the prompt can include information derived from somewhere else, such as sentence fragments, partially completed content, or, as in our case, text generated from a previous prompt.
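One simple guard is to strip the pasted text before assembling the prompt, so the input always ends with the instruction’s complete sentence:

```python
# Hypothetical chunk text with stray whitespace at both ends
raw_chunk = "  ...transcript text from a previous step   "

# Strip before interpolating so the prompt ends with a complete sentence
prompt = f"""{raw_chunk.strip()}
Briefly summarize this text in 100 characters or less."""

print(prompt.endswith("less."))  # → True
```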

Run in CLI:

To run the above, the code can be organized into functions and run through the command line.

  1. Clone the repo:
git clone
cd youtube_summarizer_cohere

2. Install dependencies

The requirements.txt file lists the required dependencies:

  • Cohere library
  • YouTube Transcript API library
pip install -r requirements.txt

3. Run the app:

python3 <link to youtube video>

Streamlit app

To provide a more user-friendly interface, I used the Streamlit library to build a quick app. The app has two main parts. The first is a sidebar where users can paste their API key and the link to the YouTube video they want summarized. Since summarization can take a few minutes for long videos, it may be a good idea to include the video in the sidebar so users can play the video, or a sample of it, while summarization runs in the background.

import streamlit as st
from pytube import YouTube

# Sidebar
with st.sidebar:
    user_secret = st.text_input(label=":red[Cohere API key]",
                                placeholder="Paste your Cohere API key",
                                type="password")
    youtube_link = st.text_input(label=":red[Youtube link]",
                                 placeholder="")
    if youtube_link and user_secret:
        youtube_video = YouTube(youtube_link)
        streams = youtube_video.streams.filter(only_audio=True)
        stream = streams.first()
        if st.button("Start Analysis"):

            with st.spinner('Running process...'):
                # Get the video mp4
                mp4_video ='youtube_video.mp4')
                audio_file = open(mp4_video, 'rb')

                # Summary
                summaries, summary_of_summaries = summarization_video(youtube_link)
                summarization = {
                    "title": youtube_video.title.strip(),
                    "summarizations of video in 5mins chunks": summaries,
                    "overall summary": summary_of_summaries
                }

The main section can include tabs for visualizing the output and for other features the user may want to include. The following shows the main section with two tabs: one with general info about the app, and another where the summarization is visualized.

st.title("Youtube Summarizer Using Cohere API")
tab1, tab2 = st.tabs(["Intro", "Video Summary"])
with tab1:
    st.markdown('A simple app that uses Cohere\'s models to summarize a youtube video, without having to watch the video. ')
    st.write('***What this app does:***')
    st.checkbox('Visualize/play the video in the app.', value=True, disabled=True, label_visibility="visible")
    st.write('***Progress and features:***')
    st.checkbox('Play the youtube video within app.', value=True, disabled=True, label_visibility="visible")
    st.checkbox('Build a quick/simple app using streamlit.', value=True, disabled=True, label_visibility="visible")
    st.checkbox('Alternative option: run streamlit app in colab.', value=True, disabled=True, label_visibility="visible")
    st.checkbox('Multi-language integration: non-English videos compatibility.', value=False, disabled=True, label_visibility="visible")
    st.checkbox('Multi-language integration: allow users to ask questions in their languages.', value=False, disabled=True, label_visibility="visible")
    st.write('***Main tools used:***')
    st.write("- Cohere's X-Large model.")
    st.write("- Streamlit")
    st.write('Repo: [Github](')

with tab2:
    st.header("Video Summary:")
    if os.path.exists("summarization.csv"):
        df = pd.read_csv('summarization.csv')
        # display the saved summaries
        st.write(df)

As mentioned above, it’s generally good practice to keep a log of the data and results. In this case, we save the data into a .csv file rather than log files, as it’s easier to work with a pandas dataframe than with independent log files.

st.success('Video summarized! Check out the Summary Tab')
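As a sketch of that logging step, the summarization dictionary can be written out and read back with pandas. The column names here mirror the dictionary shown above, and the values are invented; the exact layout in the repo may differ:

```python
import pandas as pd

# Hypothetical results from one summarization run
summarization = {
    "title": ["An interesting 45-minute video"],
    "summarizations of video in 5mins chunks": ["chunk 1 summary; chunk 2 summary"],
    "overall summary": ["One short overall summary."],
}

# Save the results so the Summary tab can display them
pd.DataFrame(summarization).to_csv("summarization.csv", index=False)

# Read them back, as the Summary tab does
df = pd.read_csv("summarization.csv")
print(df["overall summary"][0])  # → One short overall summary.
```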

The Streamlit app can be run locally using the following steps:

Running Locally

  1. Clone the repo
git clone
cd youtube_summarizer_cohere

2. Install dependencies

pip install -r requirements.txt

3. Run the app locally (in a browser)

streamlit run

Running in Colab

Running the CLI version in Colab is straightforward; the same steps can be followed to run the app. Running the Streamlit app, however, requires a workaround, since the Streamlit app runs in a browser.

Google Colab “allows anybody to write and execute arbitrary python code through the browser, and is especially well suited to machine learning, data analysis and education. More technically, Colab is a hosted Jupyter notebook service that requires no setup to use, while providing access free of charge to computing resources including GPUs.” [1] This makes Colab an attractive and user-friendly tool for quickly running snippets of code and apps, as well as for setting up virtual environments.

In some ways, Colab is limited. For example, Google Colab doesn’t offer browser access inside the notebook, which makes it difficult to debug code related to web apps. With a workaround, however, Colab can also be used to run web apps.

Running on Google Colab

  1. Clone the repository as above. (Note that to run shell commands in Colab cells, they need to be prefixed with special characters: ! for shell commands and % for magics. More information can be found here)
!git clone
%cd youtube_summarizer_cohere

2. Install dependencies.

!pip install -r requirements.txt

3. Set up environment on colab for webapp access:

Pyngrok is a wrapper that makes ngrok available via a Python API. This allows users to open a browser connection to Colab’s server through rerouting.

!pip install pyngrok

4. Setting up Streamlit and ngrok in Colab requires that users first create an account with ngrok. The free tier allows limited access, but is great for this type of application. From ngrok, users get an authentication token that allows API access.

!streamlit run /content/youtube_summarizer_cohere/ &>/dev/null&

Create an account on ngrok, and paste your authentication token:

!ngrok authtoken ----

5. The following code makes sure that users are using the correct versions for Colab compatibility.

!unzip /content/youtube_summarizer_cohere/

6. Assign a port, and allow Colab to reroute ngrok as a localhost.

get_ipython().system_raw('./ngrok http 8501 &')
!curl -s http://localhost:4040/api/tunnels | python3 -c \
"import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"
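The one-liner above queries ngrok’s local API and pulls out the tunnel’s public URL. The JSON it parses has roughly this shape (abridged sample data, not a real tunnel):

```python
import json

# Abridged sample of the JSON served at http://localhost:4040/api/tunnels
sample = '{"tunnels": [{"public_url": "", "proto": "https"}]}'

public_url = json.loads(sample)["tunnels"][0]["public_url"]
print(public_url)  # →
```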

7. Run the Streamlit app

!streamlit run /content/youtube_summarizer_cohere/

The code above can be expanded in various ways: using a better prompt, choosing different Cohere models, or adding more features to the app, such as transcription and translation tools for non-English videos!

As a general disclaimer that applies to artificial intelligence tools broadly, LLMs have limitations and may not be able to handle more complex tasks or understand the context of a situation. They also tend to be biased toward the text they were trained on, so they may not always provide accurate or relevant information. As with any AI tool, LLMs should be used ethically and diligently to prevent misuse and to ensure privacy-centric solutions.

Link to repo:




[3] Image by Ivoci from Pixabay