Crawl Instagram profiles and posts the most efficient way possible without FB Graph API

In this article, we’re going to look at how you can fetch Instagram profile and post data without getting into FB’s Graph API. Let me be clear up front that this technique works only on public accounts.

Before we begin, let me give you some context on the problem that led to this solution. I worked on building a platform that lets brands connect easily with Instagram influencers for paid promotion. What we needed was a database of influencers. Initially, we thought of running a Selenium script to scrape the data from the Instagram website, only to realize that it was both time-consuming and resource-intensive. We thought of using the FB Graph API, but that would have required us to go through app review and business verification first, which was impossible at the time. So, we needed to figure out an efficient and low-cost way to crawl Instagram profiles. We did a lot of googling, looked at the APIs the Instagram website itself uses, and finally found a way to do this, which is what this article is about.

So, what we’re going to do here is call some Instagram APIs that are open to all and do not require any token or authentication. It’s like browsing a profile, but instead of fancy rendering, you get JSON-formatted data.

I’m going to separate this article into three sections:

  1. getting profile data such as the username, profile image, profile description
  2. getting post-related data
  3. some limitations of this technique

Get Profile Data

Well, it’s relatively easy to get profile data. All you need to do is replace the placeholder in the URL below with the Instagram handle of the profile you want to crawl and hit the URL. And that’s it.

<username_here>/?__a=1

You’ll get a response similar to the one shown below.

JSON formatted Instagram profile
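As a rough sketch, the request described above could be made from Python using only the standard library. The helper names `profile_url` and `fetch_json` are hypothetical, and the site root is left as a parameter for the caller to supply; only the `<username>/?__a=1` suffix comes from the pattern above.

```python
import json
import urllib.parse
import urllib.request


def profile_url(site_root: str, username: str) -> str:
    """Append '<username>/?__a=1' to a site root supplied by the caller."""
    # Tolerate a leading "@" and stray whitespace in the handle.
    handle = urllib.parse.quote(username.strip().lstrip("@"), safe="")
    return f"{site_root.rstrip('/')}/{handle}/?__a=1"


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """GET the URL and parse the response body as JSON."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_json(profile_url(site_root, "some_handle"))` would then return the parsed profile response as a dictionary, assuming the endpoint answers with JSON as described.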

Look for edge_owner_to_timeline_media for post-related data. Here you’ll get at most the 12 most recent posts. Shown below is a snippet of the data you get for an Instagram post.

Post info inside edge_owner_to_timeline_media.edges

Get Post Data

The above API gets you 12 recent posts, but if you want more than 12 or all of the posts, it gets a little tricky because there is a lot more in getting posts of a user than merely replacing the username.

I had to do a bit of research on the API pattern of Instagram’s main website and how it fetches posts. But I figured it out.

Hit the profile API (the one we just discussed). Look for the key “id” inside “user” and the key “end_cursor” inside “edge_owner_to_timeline_media.page_info”.

page info inside edge_owner_to_timeline_media
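The lookup described above can be sketched as a small helper. It takes the “user” object directly, since how that object is nested in the full response is not spelled out here; `PageState` and `page_state` are hypothetical names, and the cursor in the sample is made up.

```python
from typing import NamedTuple, Optional


class PageState(NamedTuple):
    account_id: str
    has_next_page: bool
    end_cursor: Optional[str]


def page_state(user: dict) -> PageState:
    """Pull the account id and paging fields out of a parsed 'user' object."""
    page_info = user["edge_owner_to_timeline_media"]["page_info"]
    return PageState(
        account_id=str(user["id"]),
        has_next_page=bool(page_info.get("has_next_page")),
        end_cursor=page_info.get("end_cursor"),
    )


# Made-up sample that contains only the keys named in the text.
sample_user = {
    "id": "7370820917",
    "edge_owner_to_timeline_media": {
        "edges": [],
        "page_info": {"has_next_page": True, "end_cursor": "QVFEexample=="},
    },
}
state = page_state(sample_user)
```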

If the value of “has_next_page” is true, copy the “end_cursor” value, drop its trailing “==”, and put it in place of the end_cursor_here placeholder in the URL below. Also, replace instagram_account_id_here with the user id (which is 7370820917 in my case).

<instagram_account_id_here>%22%2C%22first%22%3A50%2C%22after%22%3A%22<end_cursor_here>%3D%3D%22%7D

And your API should look like this.
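The substitution can also be done programmatically. The sketch below rebuilds only the part of the URL visible in the template above, starting at the account id; `query_suffix` is a hypothetical name, and the cursor in the usage line is made up. The template decodes to `<id>","first":50,"after":"<cursor>=="}`, which is why the trailing “==” is dropped from the cursor before it is inserted.

```python
from urllib.parse import quote


def query_suffix(account_id: str, end_cursor: str) -> str:
    """Fill in the template fragment that starts at the account id."""
    # The template already ends the cursor with %3D%3D ("=="), so strip
    # a trailing "==" from the raw cursor to avoid doubling it.
    cursor = end_cursor[:-2] if end_cursor.endswith("==") else end_cursor
    tail = f'","first":50,"after":"{cursor}=="}}'
    # safe="" percent-encodes every reserved character, matching the
    # %22 / %2C / %3A / %3D / %7D escapes in the template.
    return f"{account_id}{quote(tail, safe='')}"


suffix = query_suffix("7370820917", "QVFEexample==")
```

Percent-encoding the whole tail, rather than pasting the cursor in by hand, also covers cursors that contain other reserved characters.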

Hit the API, and you should get a response similar to the one shown below, containing a maximum of 50 posts and page_info to crawl posts further.

posts info with max 50 posts and page_info
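Repeating this step until “has_next_page” turns false walks through all of a profile’s posts. Here is a minimal sketch of that loop, assuming each page is an object with “edges” and “page_info” keys like edge_owner_to_timeline_media. `iter_posts` and the `fetch_page` callable are hypothetical: `fetch_page` stands in for whatever code requests the next page for a given cursor.

```python
from typing import Callable, Iterator


def iter_posts(
    first_page: dict,
    fetch_page: Callable[[str], dict],
    max_pages: int = 100,
) -> Iterator[dict]:
    """Yield post edges page by page until has_next_page is false."""
    page = first_page
    for _ in range(max_pages):  # hard cap so a bad cursor cannot loop forever
        yield from page.get("edges", [])
        info = page.get("page_info", {})
        cursor = info.get("end_cursor")
        if not info.get("has_next_page") or not cursor:
            return
        page = fetch_page(cursor)
```

Because `fetch_page` is passed in, the loop can be exercised with canned pages before it is pointed at the real API.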

Typically, this response has some extra info that you don’t see in the profile API’s post data. See the screenshot below to get an idea of the type of data you can get with this API.

Single post

Now let’s discuss some of the limitations or problems you may bump into using this technique.

  1. Rate limiting: Instagram allows a maximum of around 200–300 requests per hour. After reaching the limit, you’ll start getting server errors.
  2. No Insight Data: You cannot get Instagram insight data using this API. Look into the FB Instagram Insights API for that.
  3. No Private Account Data: Since this works on publicly available APIs, you cannot get private account data.
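One way to stay under the rate limit is to space requests evenly. The sketch below assumes a budget of 200 requests per hour, the lower end of the estimate above, which works out to one request every 18 seconds. `Throttle` is a hypothetical helper, and the clock and sleep functions are injectable so the pacing can be checked without actually waiting.

```python
import time


class Throttle:
    """Space calls so that at most max_per_hour happen in any hour."""

    def __init__(self, max_per_hour: int = 200, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 3600.0 / max_per_hour
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous request was allowed

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
        if delay:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `throttle.wait()` right before each request keeps the crawl at a steady pace instead of bursting and then hitting server errors.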
