How I come up with my Instagram captions using Python and Scrapy

Ashishsubedi
Published in Ashish’s Stories
6 min read · Apr 9, 2021

Greetings, readers! I know the title is a bit misleading, so let me tell you a story. There are days when you take a perfect picture and want to upload it to Instagram, and you know the unwritten rule of any social media: you need a caption to match the picture. Now, you could go to Google or any other search engine and look for something that matches the picture, but I know how to code. And why would I do something my computer can do for me automatically? That’s what I thought.

Photo by pine watt

Now that you understand the premise, let me be clear about what I am going to cover in this article. Brainy Quote is the world’s largest quotation site. On Brainy Quote, you search for a topic and it gives you relevant quotes. We are going to use Python to get the best caption for our picture.

Disclaimer: This article is for educational purposes only.

As mentioned in the title, we are going to use Python as our scraping language. There are many scraping tools and frameworks available in Python, and Scrapy is one of the most popular. Basic knowledge of Python is helpful but not strictly required; just make sure Python 3.6+ is installed on your system (there are lots of tutorials available online). We will use a few other tools, which I’ll show how to install as needed.

As of writing this article, I am on Python version 3.9.2, using Linux. If you are on Windows, you might want to install Git (https://git-scm.com/downloads) and use its Bash shell to run the commands. You can check your Python version by running the following command in the terminal.

$ python -V
Checking python version

As you can see above, I have made a new directory called “brainy_quotes_scrape” where I’ll be working. Before starting, we are going to make a virtual environment for this project. A virtual environment in Python is an isolated environment: the interpreter, libraries, and scripts installed into it are separate from those installed in other virtual environments and from the system libraries. It is like starting with nothing but bare Python. This helps us avoid package version conflicts in the future.

You can create a virtual environment as shown below.

$ python -m venv venv

The command python -m venv venv runs the venv module and creates a virtual environment in a directory called venv. Now activate the virtual environment.

source ./venv/bin/activate (For linux & mac)
venv\Scripts\activate.bat (For Windows)

You should see (venv) on the left side of your terminal prompt.

Now let’s install scrapy.

pip install Scrapy
Successful installation of scrapy
scrapy startproject brainy_quotes .

Running the above command creates a Scrapy project in the same directory. Now we are going to create a basic spider that scrapes data from Brainy Quotes. Open the current directory in your favorite code editor; I am using VS Code (https://code.visualstudio.com/) for this.

Now run the following command in the terminal.

scrapy genspider brainy_quotes_spider brainyquotes.com

This command generates a spider named brainy_quotes_spider and adds the following code inside brainy_quotes/spiders/brainy_quotes_spider.py.

Replace line 11 with

print('Response',response)

Now run this in the terminal.

scrapy crawl brainy_quotes_spider

You should see some gibberish in your terminal. If you look carefully, you’ll find something like this.

And that is the print statement we wrote on line 11. The parse method receives the response from the URL we provide. Great. Now let’s look at the URL of the search page on Brainy Quotes.

As you can see, tree is the keyword we told it to search for, and multiple keywords are separated by ‘+’. This is great news for us.

Now, time to code.

We have made quite a few changes. We removed start_urls and used the start_requests method to generate the URL based on user-input keywords. Keywords can be passed as an argument when running the spider.

scrapy crawl brainy_quotes_spider -a keywords="tree landscapes"

Now for this part

keywords = '+'.join(keywords.split(' '))

This first splits the keywords string on spaces, producing a list of keywords (in our example, [‘tree’, ‘landscapes’]), and then joins them with the ‘+’ sign. Running the spider, you can see the URL printed in the terminal.

This is the same URL as the search page.

Time to dig into the Brainy Quotes search page again. To find the quotes on the page, we need to study the website’s structure a little, using the browser’s Developer Tools (Inspect Element). I am using Firefox for this, but you can use Chrome or any Chromium-based browser as well.

As we can see, all the quotes are under a div with an id of quotesList. Let’s search for the other components.

Great! We found our quote as well as the author. If we look at other quotes, it follows the same pattern.

Searching for patterns, we can see that the quote and the author are both <a> tag children of a div with class clearfix. So let’s put this in code.

Look at the parse method. First we get the elements that contain quotes and authors: as we saw previously, we select all div tags with class clearfix. Then we iterate over these divs, and for each node we extract the quote text and the author name. Notice that we use the title attribute present in the <a> tags; this makes it easier to tell the quote apart from the author. Now run the spider using

scrapy crawl brainy_quotes_spider -a keywords="tree landscapes"

You can see the quotes and author names printed in the terminal.

Quotes and authors

Now, how do we choose the Instagram caption? Well, I leave that up to random chance. Yes, that’s right: we are going to pick a random quote and use it.

Final Code

We used Python’s built-in random module to choose one of the quotes from the quotes list.

scrapy crawl brainy_quotes_spider -a keywords="tree landscapes"

Running the above command gives this. Your result may vary.

Final Quote

And kids, that’s how I choose my Instagram captions.
