Instagram Data Scraping from Public API
With Python and Scrapy: NO access-token or Authentication required
NB: Instagram changes its public APIs, so my scripts probably need some updating, or may no longer work…
Scraping data from Instagram is easy if you know the right endpoint.
For example take a look at the #italy hashtag: https://www.instagram.com/explore/tags/italy/
Now try appending the query string ?__a=1 and see what happens: https://www.instagram.com/explore/tags/italy/?__a=1
Yes, it is a JSON document with a lot of information, showing the first 30 to 40 posts for the hashtag #italy.
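To make the idea concrete, here is a minimal sketch of building that URL and pulling the posts out of the JSON. The helper names (hashtag_url, posts_from_payload) are mine, the sample payload is hand-written, and the edges -> node layout and the shortcode key are assumptions about the response shape, so check them against a real response before relying on them.

```python
import json
from urllib.parse import quote


def hashtag_url(tag: str) -> str:
    """Build the JSON URL for a hashtag page by appending ?__a=1."""
    return f"https://www.instagram.com/explore/tags/{quote(tag)}/?__a=1"


def posts_from_payload(payload: dict) -> list:
    """Return the post nodes found in a hashtag payload.

    Assumption: posts sit under edge_hashtag_to_media -> edges -> node.
    """
    media = payload["graphql"]["hashtag"]["edge_hashtag_to_media"]
    return [edge["node"] for edge in media.get("edges", [])]


# Tiny hand-written payload standing in for the real response (made-up values).
sample = json.loads("""
{"graphql": {"hashtag": {"edge_hashtag_to_media": {
    "edges": [{"node": {"shortcode": "abc"}}, {"node": {"shortcode": "def"}}],
    "page_info": {"end_cursor": "CURSOR1"}
}}}}
""")

print(hashtag_url("italy"))
print([node["shortcode"] for node in posts_from_payload(sample)])
```

In a real run you would download the URL with whatever HTTP client you prefer and pass the decoded JSON to posts_from_payload.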
And how do you get to the next posts?
Use the end cursor, which you can find at graphql -> hashtag -> edge_hashtag_to_media -> page_info -> end_cursor
Then make the following request, replacing endcursor with that value: https://www.instagram.com/explore/tags/italy/?__a=1&max_id=endcursor
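The pagination loop can be sketched like this. It is only an illustration under a few assumptions: page_url, end_cursor and iter_pages are hypothetical helper names, fetch is any function you supply that takes a URL and returns the decoded JSON, and an empty or missing end_cursor is taken to mean there are no more pages. The demo uses a fake fetch so nothing touches the network.

```python
from urllib.parse import quote

BASE = "https://www.instagram.com/explore/tags/{tag}/?__a=1"


def page_url(tag, cursor=None):
    """URL of the first page, or of the page after `cursor` if one is given."""
    url = BASE.format(tag=quote(tag))
    if cursor:
        url += "&max_id=" + quote(cursor, safe="")
    return url


def end_cursor(payload):
    """Read graphql -> hashtag -> edge_hashtag_to_media -> page_info -> end_cursor."""
    page_info = payload["graphql"]["hashtag"]["edge_hashtag_to_media"]["page_info"]
    return page_info.get("end_cursor") or None


def iter_pages(tag, fetch, max_pages=10):
    """Yield successive payloads, following end_cursor until it runs out.

    `fetch` is a caller-supplied function: URL in, decoded JSON dict out.
    """
    cursor = None
    for _ in range(max_pages):
        payload = fetch(page_url(tag, cursor))
        yield payload
        cursor = end_cursor(payload)
        if not cursor:  # assumption: no cursor means no further pages
            break


def fake_fetch(url):
    """Stand-in for a real HTTP call: serves two pages, then stops."""
    cursor = "C2" if "max_id" not in url else None
    page_info = {"end_cursor": cursor}
    return {"graphql": {"hashtag": {"edge_hashtag_to_media": {"page_info": page_info}}}}


pages = list(iter_pages("italy", fake_fetch))
print(len(pages))
```

The max_pages cap is there so that a test run stops after a handful of requests; raise it once you are happy with the output.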
In my tests I was able to scrape about 13,000 posts from a single hashtag, so you probably cannot get all the way back to the first post if the number of posts is high.
NB: The __a=1 and max_id=endcursor tricks also work for the other endpoints (posts, users and so on).
To automate the scraping I used Scrapy, a powerful and fast crawler.
You can take a look at my GitHub repository, Instagram Scraper.
instagram-scraper - Some scrapy spiders useful to crawl instagram posts using public APIS (No TOKEN) - github.com
Feel free to fork and change its behaviour to fit your needs.
I haven't scraped all the available information, only the fields useful for my purposes: caption, location (latitude and longitude), owner, display_url and so on.
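As a rough illustration of picking out those fields, here is a small sketch. It is not necessarily how the spiders in the repository do it: extract_item is a hypothetical helper, and apart from display_url and owner the key names (edge_media_to_caption, text, id, location, lat, lng) are my assumptions about the JSON layout, which may differ between endpoints or change over time.

```python
def extract_item(node: dict) -> dict:
    """Flatten one post node into the handful of fields we care about.

    Assumed key paths (verify against a real response):
      caption   -> edge_media_to_caption.edges[0].node.text
      owner     -> owner.id
      location  -> location.lat / location.lng
    Missing fields come back as None instead of raising.
    """
    caption_edges = (node.get("edge_media_to_caption") or {}).get("edges", [])
    caption = caption_edges[0]["node"]["text"] if caption_edges else None
    location = node.get("location") or {}
    return {
        "caption": caption,
        "owner": (node.get("owner") or {}).get("id"),
        "display_url": node.get("display_url"),
        "latitude": location.get("lat"),
        "longitude": location.get("lng"),
    }


# Hand-written sample node with made-up values.
sample_node = {
    "edge_media_to_caption": {"edges": [{"node": {"text": "Ciao #italy"}}]},
    "owner": {"id": "12345"},
    "display_url": "https://example.com/photo.jpg",
    "location": {"lat": 41.9, "lng": 12.5},
}

print(extract_item(sample_node))
print(extract_item({}))  # every field is None when nothing is present
```

Returning None for anything missing keeps the crawl going when a post has no caption or location.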