Create your very own YouTube Sidekick using LangChain 🦜🔗

DIY Youtube video Assistant to answer all your questions!

Harshita Sharma
Accredian
5 min read · Jun 5, 2023


Introduction

In my previous article, we talked at a high level about LangChain and its components; you can check it out here:

In this article, we're going to build our very own YouTube assistant that can answer questions about any video. All we have to do is paste the link and ask a question!

The Plan

To build this, we are going to leverage a ChatGPT model through an OpenAI API key, then use LangChain to add the desired components. Once we are satisfied with the results, we will use Streamlit to turn it into a web application.

Let’s get on with the code then!

The How ❓

For this project I'm assuming you know how to work in a virtual environment. It's not a mandatory step, but it is highly recommended.

To start the project, you can clone my repository here and run the code yourself, along with any changes you want to make, OR you can just follow along and start from scratch!

You can simply download the requirements.txt file to get started with all the necessary dependencies.
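If you are starting from scratch instead, the dependency list would look roughly like the following (package names are inferred from the imports used below; the exact pinned versions live in the repository's requirements.txt):

```
langchain
openai
faiss-cpu
youtube-transcript-api
python-dotenv
tiktoken
streamlit
```

Install everything in one go with pip install -r requirements.txt.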

Step 1: Create .env file

Now that you’re all set up, the first thing is to create a .env file which will contain your secret OpenAI key. If you don’t know how to generate one, follow this.

Simply create a new file in your working directory with the .env extension and paste your API key there:

OPENAI_API_KEY="YOUR SECRET KEY"

Step 2: Coding

Create another .py (Python) file for this bit of the coding.

Importing dependencies:

import os
import textwrap

import openai
from dotenv import find_dotenv, load_dotenv
from langchain.document_loaders import YoutubeLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

Next, load the secret key that you created into this file.

# loading the .env file containing the API key
load_dotenv(find_dotenv())
embeddings = OpenAIEmbeddings()

We will be using the YoutubeLoader class provided by LangChain, which loads the transcript of the video from the URL input by the user.

The general idea is to extract the video's transcript and use it to generate the responses.

The first thing we are going to do is create a function that holds all of the transcript data, like a database.

# creating a database
def creating_db(video_url):
    loader = YoutubeLoader.from_youtube_url(video_url)
    transcript = loader.load()

    # break down the enormous number of tokens we will get from the
    # transcript, since there is a limit on how much we can input
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

    # this is just a list of the chunks produced by the splitter above
    docs = text_splitter.split_documents(transcript)

    # the final database:
    # when a user asks a question, this database is used to perform the
    # similarity search and generate output based on it
    db = FAISS.from_documents(docs, embeddings)  # embeddings are the vectors we convert the text into
    return db

The number of tokens we get from the transcript can be enormous depending on the length of the video, and there is a limit on how much we can pass to the model (we are using gpt-3.5-turbo) through the OpenAI API. (You can read about rate limits here.)

OpenAI / Models

To solve this issue, we are going to use a text splitter, which breaks the large amount of text into chunks for effective input without errors.
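To see the idea behind chunking with overlap, here is a minimal sketch in plain Python. Note that this is not LangChain's implementation (RecursiveCharacterTextSplitter additionally tries to split on paragraph and sentence boundaries); it only illustrates the core sliding-window idea behind chunk_size and chunk_overlap:

```python
# Conceptual sketch of fixed-size chunking with overlap.
def split_with_overlap(text, chunk_size=1000, chunk_overlap=100):
    """Split `text` into chunks of at most `chunk_size` characters,
    where each chunk repeats the last `chunk_overlap` characters of
    the previous one."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

transcript = "word " * 500          # a 2,500-character stand-in transcript
chunks = split_with_overlap(transcript)
print(len(chunks))                   # → 3
print(len(chunks[0]))                # → 1000
```

The 100-character overlap means each chunk repeats the tail of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk.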

We will be using the FAISS library, created by Facebook, to convert all the chunks into vectors, just as in any natural-language embedding process.

Now that the database is set up, we need to get a desired response out of it.

# creating another function to get a response by querying the above database
def get_response(db, query, k=5):
    # gpt-3.5-turbo can handle up to 4,097 tokens; with chunk_size=1000,
    # retrieving the top k chunks keeps the prompt close to that limit
    docs = db.similarity_search(query, k=k)

    # joining them into one single string
    docs_page_content = " ".join([d.page_content for d in docs])

    chat = ChatOpenAI(temperature=0.4)

    # template for the system message prompt
    template = '''
    You are a helpful assistant who can answer questions about YouTube videos based on the video's transcript: {docs}

    Only use factual information from the transcript to answer the question.

    If you feel like you don't have enough information to answer the question, say: "Sorry, I cannot answer that".

    Your answer should be verbose and detailed.
    '''

    system_message_prompt = SystemMessagePromptTemplate.from_template(template)

    # human question prompt
    human_template = 'Answer the following question: {question}'
    human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

    chat_prompt = ChatPromptTemplate.from_messages(
        [system_message_prompt, human_message_prompt]
    )

    # chaining
    chain = LLMChain(llm=chat, prompt=chat_prompt)

    response = chain.run(question=query, docs=docs_page_content)
    response = response.replace("\n", "")

    return response, docs

We perform a similarity search of the question (query) input by the user against the database we created. We set k=5 here, which simply means we retrieve the 5 vectors most similar to the query.
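Under the hood, a top-k similarity search is conceptually simple: score every stored vector against the query vector and keep the k best. A toy sketch in plain Python (FAISS does this at scale with optimized indexes; the 3-dimensional vectors here are purely illustrative):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k document vectors most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

docs = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (0.0, 1.0, 0.0)]
print(top_k((1.0, 0.0, 0.0), docs, k=2))   # → [0, 1]
```

In the real pipeline the vectors are the OpenAI embeddings of the transcript chunks, and the query is embedded with the same model before searching.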

You can adapt the prompts to your own ideas; I kept mine as shown in the code above. After creating the prompts, we simply chain everything together to get the best response.

Step 3: Creating the web application

You can test your code by simply calling the functions and seeing what works for you.

We will create a new file in the same working directory, and put all the app code there.

As mentioned before, we will be using Streamlit for a very simple, beginner-friendly web application.

import textwrap

import streamlit as st
from langchain_main import creating_db, get_response

# setting up the title
st.title("Hello, I'm `Ray`, your YouTube Assistant 👓")

# user input: video
video_url = st.text_input('Please enter your Youtube link here!')

# user input: question
query = st.text_input('Please enter your question here 👇')

def answer():
    db = creating_db(video_url)
    response, docs = get_response(db, query, k=5)

    if video_url and query:
        st.write(textwrap.fill(response, width=50))

# aesthetics
st.button('Find the Answer', on_click=answer)

Well, I named my assistant Ray; you can name yours whatever you want and play around with the aesthetics even more.

Conclusion

We created a very simple web application using a GPT model and YouTube links. I would say this was just dipping your toes into the ocean of LangChain and its widespread abilities; your creativity has no bounds here.

So experiment and iterate. Hoping this helped you grasp the concepts a little better, check out the LangChain documentation and surprise yourself with your creativity.

My Repository link

Happy Coding :)
