Instagram Data Scraping from Public API

Andrea Tarquini
Feb 7, 2018

With Python and Scrapy: NO access token or authentication required

NB: Instagram changes its public APIs, so my scripts probably need some updates, or may no longer work at all.

The endpoint

Scraping data from Instagram is easy if you know the right endpoint.

For example, take a look at the #italy hashtag:

Now try appending the query string ?__a=1 and see what happens:

YES! It is a JSON document with a lot of information, including the first 30 to 40 posts for the hashtag #italy.
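A minimal sketch of reading that JSON in Python. It assumes each post sits in an `edges` list as a `node` object under graphql -> hashtag -> edge_hashtag_to_media (the `edges`/`node`/`shortcode` names are my assumption about the response shape, not confirmed here), and `iter_post_nodes` is a hypothetical helper name:

```python
import json
from typing import Any, Dict, Iterator


def iter_post_nodes(data: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield each post node from a parsed ?__a=1 hashtag page.

    Assumes graphql -> hashtag -> edge_hashtag_to_media -> edges[] -> node;
    missing or null keys yield nothing instead of raising.
    """
    hashtag = (data.get("graphql") or {}).get("hashtag") or {}
    media = hashtag.get("edge_hashtag_to_media") or {}
    for edge in media.get("edges") or []:
        node = edge.get("node")
        if node is not None:
            yield node


# Hand-written sample in the assumed shape (not real Instagram data).
sample = json.loads("""
{"graphql": {"hashtag": {"edge_hashtag_to_media": {"edges": [
    {"node": {"shortcode": "abc"}},
    {"node": {"shortcode": "def"}}
]}}}}
""")
print([node["shortcode"] for node in iter_post_nodes(sample)])  # ['abc', 'def']
```

Using `.get(...) or {}` keeps the parser tolerant of small layout changes, which matters given how often the response format shifts.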

And how do you get to the next posts?

You can use the end cursor, which you can find at graphql -> hashtag -> edge_hashtag_to_media -> page_info -> end_cursor.
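A small sketch of pulling the cursor out of a parsed response by walking exactly that key path. `get_end_cursor` and `CURSOR_PATH` are hypothetical names, and the sample cursor value is a placeholder:

```python
from typing import Any, Dict, Optional

# Path given in the text: graphql -> hashtag -> edge_hashtag_to_media -> page_info -> end_cursor
CURSOR_PATH = ("graphql", "hashtag", "edge_hashtag_to_media", "page_info", "end_cursor")


def get_end_cursor(data: Dict[str, Any]) -> Optional[str]:
    """Walk CURSOR_PATH and return the cursor string, or None if any key is missing."""
    value: Any = data
    for key in CURSOR_PATH:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value if isinstance(value, str) else None


page = {"graphql": {"hashtag": {"edge_hashtag_to_media": {
    "page_info": {"end_cursor": "CURSOR_1"}}}}}
print(get_end_cursor(page))  # CURSOR_1
```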

Then make the following request, passing the cursor as the max_id parameter:
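A minimal sketch of the whole pagination loop, assuming the hashtag page lives at `https://www.instagram.com/explore/tags/<tag>/` and that `page_info` also carries a `has_next_page` flag (both are assumptions; the flag is treated as true when missing). `hashtag_page_url`, `fetch_json` and `crawl_hashtag` are hypothetical names, and the fetch function is injectable so the paging logic can be tried offline:

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Iterator, Optional
from urllib.parse import quote, urlencode

# Assumed hashtag page URL pattern (Instagram may have changed it since 2018).
TAG_URL = "https://www.instagram.com/explore/tags/{tag}/"


def hashtag_page_url(tag: str, end_cursor: Optional[str] = None) -> str:
    """Build the ?__a=1 URL for a hashtag, adding max_id when a cursor is given."""
    params = {"__a": "1"}
    if end_cursor:
        params["max_id"] = end_cursor
    return TAG_URL.format(tag=quote(tag)) + "?" + urlencode(params)


def fetch_json(url: str) -> Dict[str, Any]:
    """Plain urllib fetch; Instagram may now reject unauthenticated requests."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def crawl_hashtag(tag: str,
                  fetch: Callable[[str], Dict[str, Any]] = fetch_json,
                  max_pages: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON page at a time, following end_cursor via max_id.

    Stops when a page has no end_cursor, when has_next_page is false
    (an assumed field; treated as true if missing), or after max_pages pages.
    """
    cursor: Optional[str] = None
    for _ in range(max_pages):
        data = fetch(hashtag_page_url(tag, cursor))
        yield data
        hashtag = (data.get("graphql") or {}).get("hashtag") or {}
        page_info = (hashtag.get("edge_hashtag_to_media") or {}).get("page_info") or {}
        cursor = page_info.get("end_cursor")
        if not cursor or not page_info.get("has_next_page", True):
            break
```

For example, `for page in crawl_hashtag("italy", max_pages=3): ...` would request at most three pages. The `max_pages` cap is a design choice: since very popular hashtags cannot be paged back to their first post anyway, a hard limit keeps a run bounded.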

In my tests I was able to scrape about 13,000 posts from a hashtag, so if a hashtag has a very high number of posts, you probably cannot page all the way back to its first post.

NB: The __a=1 and max_id=endcursor tricks also work for the other endpoints (posts, users, and so on).
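Since the same two parameters apply to other endpoints, a small generic helper can add them to any page URL while keeping its existing query string. `with_json_params` is a hypothetical name, and the profile and post URL patterns in the usage lines are assumptions:

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_json_params(url: str, max_id: Optional[str] = None) -> str:
    """Return url with __a=1 (and max_id, if given) merged into its query string.

    Existing query parameters are kept; __a and max_id overwrite any old values.
    """
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["__a"] = "1"
    if max_id:
        query["max_id"] = max_id
    return urlunsplit(parts._replace(query=urlencode(query)))


# Assumed URL patterns for a user profile and a single post.
print(with_json_params("https://www.instagram.com/instagram/", "CURSOR_1"))
print(with_json_params("https://www.instagram.com/p/SHORTCODE/?hl=it"))
```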

The Scraper

To automate the scraping, I used Scrapy, a fast and powerful crawling framework.

You can take a look at my GitHub repository, Instagram Scraper.

Feel free to fork and change its behaviour to fit your needs.

I haven’t scraped all the information, only the fields useful for my purposes: caption, location (latitude and longitude), owner, display_url, and so on.
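A sketch of flattening one post node into those fields. The `display_url` and `owner` names come from the text; `edge_media_to_caption`, `shortcode`, and especially the `location` object with `lat`/`lng` keys are my guesses at the layout (location data may only appear on some endpoints), so every field defaults to None when absent. `to_item` is a hypothetical name:

```python
from typing import Any, Dict, Optional


def to_item(node: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one post node into caption, location, owner and display_url.

    All key names except display_url and owner are assumptions; the
    location/lat/lng keys in particular are a guess and default to None.
    """
    caption_edges = (node.get("edge_media_to_caption") or {}).get("edges") or []
    caption: Optional[str] = None
    if caption_edges:
        caption = (caption_edges[0].get("node") or {}).get("text")
    location = node.get("location") or {}
    return {
        "shortcode": node.get("shortcode"),
        "caption": caption,
        "owner_id": (node.get("owner") or {}).get("id"),
        "display_url": node.get("display_url"),
        "latitude": location.get("lat"),
        "longitude": location.get("lng"),
    }
```

Inside a Scrapy spider's parse callback, plain dicts like this can be yielded directly as items.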
