Instagram Data Scraping from Public API

With Python and Scrapy: NO access token or authentication required


NB: Instagram changes its public APIs over time, so my scripts may need some updates, or may no longer work at all.

The Endpoint

Scraping data from Instagram is easy if you know the right endpoint.

For example, take a look at the #italy hashtag: https://www.instagram.com/explore/tags/italy/

Now try appending the query string ?__a=1 and see what happens: https://www.instagram.com/explore/tags/italy/?__a=1

Yes! It is a JSON document full of information, including the first 30 to 40 posts for the hashtag #italy.
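A minimal sketch of reading the posts out of that JSON with Python's standard library. It assumes the legacy response layout, where the posts sit under graphql -> hashtag -> edge_hashtag_to_media -> edges, each edge wrapping a node; the helper names hashtag_json_url and hashtag_posts and the trimmed sample body are hypothetical, not part of any Instagram API.

```python
import json
from urllib.parse import quote


def hashtag_json_url(tag: str) -> str:
    """Build the hashtag page URL with the ?__a=1 trick appended."""
    return f"https://www.instagram.com/explore/tags/{quote(tag)}/?__a=1"


def hashtag_posts(data: dict) -> list:
    """Return the post nodes from a parsed ?__a=1 hashtag response.

    Assumes the legacy layout:
    graphql -> hashtag -> edge_hashtag_to_media -> edges -> [{"node": {...}}]
    """
    media = data["graphql"]["hashtag"]["edge_hashtag_to_media"]
    return [edge["node"] for edge in media.get("edges", [])]


# Trimmed, hypothetical response body standing in for a real download.
raw = (
    '{"graphql": {"hashtag": {"edge_hashtag_to_media": {"edges": ['
    '{"node": {"shortcode": "abc"}}, {"node": {"shortcode": "def"}}]}}}}'
)
posts = hashtag_posts(json.loads(raw))
print(hashtag_json_url("italy"))  # https://www.instagram.com/explore/tags/italy/?__a=1
print([p["shortcode"] for p in posts])  # ['abc', 'def']
```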

And how do you get to the next posts?

You can use the end cursor, which you will find under graphql -> hashtag -> edge_hashtag_to_media -> page_info -> end_cursor.

Pass it as max_id in the next request, replacing endcursor with the actual value: https://www.instagram.com/explore/tags/italy/?__a=1&max_id=endcursor
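The pagination described above can be sketched as a small generator. This is a minimal sketch assuming the legacy layout, with page_info (has_next_page, end_cursor) sitting next to edges; page_url, crawl_hashtag and fetch_json are hypothetical names. fetch_json is passed in as a callable (URL -> parsed JSON) so the HTTP layer, whether urllib, requests or a Scrapy callback, stays out of the paging logic, and max_posts is an illustrative safety cap, not an Instagram parameter.

```python
from typing import Callable, Dict, Iterator, Optional
from urllib.parse import quote, urlencode


def page_url(tag: str, end_cursor: Optional[str] = None) -> str:
    """Hashtag URL with __a=1 and, for later pages, max_id=<end_cursor>."""
    params = {"__a": "1"}
    if end_cursor:
        params["max_id"] = end_cursor
    return f"https://www.instagram.com/explore/tags/{quote(tag)}/?" + urlencode(params)


def crawl_hashtag(
    tag: str,
    fetch_json: Callable[[str], Dict],
    max_posts: int = 1000,
) -> Iterator[Dict]:
    """Yield post nodes page by page, following page_info.end_cursor.

    fetch_json is a hypothetical callable (url -> parsed JSON dict).
    """
    cursor: Optional[str] = None
    seen = 0
    while True:
        data = fetch_json(page_url(tag, cursor))
        media = data["graphql"]["hashtag"]["edge_hashtag_to_media"]
        for edge in media.get("edges", []):
            if seen >= max_posts:
                return
            seen += 1
            yield edge["node"]
        info = media.get("page_info", {})
        cursor = info.get("end_cursor")
        # Stop when there is no further page or the cap has been reached.
        if seen >= max_posts or not info.get("has_next_page") or not cursor:
            return


# Canned responses standing in for real HTTP calls (hypothetical data).
pages = {
    page_url("italy"): {"graphql": {"hashtag": {"edge_hashtag_to_media": {
        "edges": [{"node": {"id": "1"}}, {"node": {"id": "2"}}],
        "page_info": {"has_next_page": True, "end_cursor": "c1"},
    }}}},
    page_url("italy", "c1"): {"graphql": {"hashtag": {"edge_hashtag_to_media": {
        "edges": [{"node": {"id": "3"}}],
        "page_info": {"has_next_page": False, "end_cursor": None},
    }}}},
}
print(page_url("italy", "c1"))  # https://www.instagram.com/explore/tags/italy/?__a=1&max_id=c1
print([n["id"] for n in crawl_hashtag("italy", pages.__getitem__)])  # ['1', '2', '3']
```

urlencode escapes the cursor, which matters if it contains characters such as "=" or "+".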

In my tests I was able to scrape about 13,000 posts from a hashtag, so when a hashtag has many posts you probably cannot page all the way back to its first one.

NB: The __a=1 and max_id=endcursor tricks also work on the other endpoints (posts, users and so on).
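To illustrate how the same tricks carry over, here is a small URL builder. The path templates for users, posts and locations are assumptions based on the public web page URLs, not a documented API, and json_url and ENDPOINTS are hypothetical names; key is a hashtag, username, post shortcode or location id depending on kind.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Assumed URL templates for the legacy web pages; not an official, documented API.
ENDPOINTS = {
    "hashtag": "https://www.instagram.com/explore/tags/{key}/",
    "user": "https://www.instagram.com/{key}/",
    "post": "https://www.instagram.com/p/{key}/",
    "location": "https://www.instagram.com/explore/locations/{key}/",
}


def json_url(kind: str, key: str, end_cursor: Optional[str] = None) -> str:
    """Append __a=1 (and max_id for paging) to one of the page URLs above."""
    params = {"__a": "1"}
    if end_cursor:
        params["max_id"] = end_cursor
    base = ENDPOINTS[kind].format(key=quote(key, safe=""))
    return base + "?" + urlencode(params)


print(json_url("post", "B1a2C3"))  # https://www.instagram.com/p/B1a2C3/?__a=1
print(json_url("user", "instagram", "xyz"))  # https://www.instagram.com/instagram/?__a=1&max_id=xyz
```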

The Scraper

To automate the scraping I used Scrapy, a powerful and fast web-crawling framework for Python.

You can take a look at my GitHub repository, Instagram Scraper.

Feel free to fork and change its behaviour to fit your needs.

I haven’t scraped all the information, only the fields useful for my purposes: caption, location (latitude and longitude), owner, display_url and so on.
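A minimal sketch of flattening a post node into those fields. The key names follow the legacy ?__a=1 layout as far as I know it; the lat and lng keys under location are an assumption, and coordinates may only be available from a separate location request. extract_post is a hypothetical helper: inside a Scrapy spider it could be called from a parse callback on json.loads(response.text), yielding the returned dict as an item.

```python
from typing import Any, Dict, Optional


def extract_post(node: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Keep only a handful of fields from a post node.

    Missing keys become None instead of raising, since the JSON layout
    varies between endpoints and has changed over time.
    """
    caption_edges = (node.get("edge_media_to_caption") or {}).get("edges", [])
    caption = caption_edges[0]["node"]["text"] if caption_edges else None
    location = node.get("location") or {}  # assumed to hold name/lat/lng
    return {
        "shortcode": node.get("shortcode"),
        "caption": caption,
        "owner_id": (node.get("owner") or {}).get("id"),
        "display_url": node.get("display_url"),
        "location_name": location.get("name"),
        "lat": location.get("lat"),
        "lng": location.get("lng"),
    }


# Hypothetical node, trimmed to the fields used above.
node = {
    "shortcode": "B1a2C3",
    "display_url": "https://example.com/photo.jpg",
    "owner": {"id": "12345"},
    "edge_media_to_caption": {"edges": [{"node": {"text": "Ciao from #italy"}}]},
    "location": {"name": "Roma", "lat": 41.9, "lng": 12.5},
}
item = extract_post(node)
print(item["caption"], item["owner_id"], item["lat"], item["lng"])  # Ciao from #italy 12345 41.9 12.5
```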