The tool was authored by Adam Smith, in a pilot project led by Daniel Angus and Nicholas Carah.
Instamancer supports Instagram research by collecting public Instagram posts and their associated metadata, using public hashtags or accounts as queries.
The outputs are formatted as JSON or CSV files, with images and videos downloaded in jpg and mp4 formats. The tool also supports batch scraping across several different queries. Our Insta-explorer data visualisation tool enables simple exploration of the downloaded material.
In this post we walk you through how to download and install Instamancer. In our next post we’ll show you how to use it to scrape Instagram images, video and their metadata.
How does Instamancer work?
Using a browser for large-scale scraping jobs is memory intensive. To get around this, Instamancer uses an innovative scraping technique that we call ‘grafting’. In grafting, the scraper intercepts and saves the URL and headers of each request. After a certain number of interactions with the page, it restarts the browser and navigates back to the same page.
Another challenge specific to scraping Instagram is that its feed API sends only limited information. To get extra metadata, such as tagged users and comments, Instamancer can open a new tab for each post that it scrapes and then read the metadata from memory. Contemporary web applications like Instagram hold an invisible state in memory that is not necessarily reflected in the rendered text at any given moment. By accessing this memory, Instamancer can reveal the processes and data that build that internal state as data is fetched from the API. Older scraping techniques do not do this, potentially missing important platform data.
Downloading and installing Instamancer
Instamancer is an open-source tool, and can be accessed (for MacOS and Windows) via GitHub at https://github.com/ScriptSmith/instamancer
Two pieces of software must be downloaded and installed on your computer before you can install Instamancer. These are Git (a version control tool, used here to download the Instamancer code from GitHub) and Node.js (a runtime that runs JavaScript programs outside a web browser).
• First, go to http://git-scm.com/downloads and install Git from the download page. There are download and install options for both MacOS and Windows. Select the relevant option and click through the prompts;
• Then, go to https://nodejs.org/en/download/ and install Node.js. There are download and install options for both MacOS and Windows. Select the relevant option and click through the prompts.
After downloading and installing Git and Node.js you shouldn’t need to restart your computer; however, if you do encounter any issues, try restarting your computer.
Once you have both Git and Node.js successfully installed, the next step is to install Instamancer. To do this, you will need to open a command line. A command line is a text interface to the computer that you can use to type instructions and commands. On Windows the command line is accessed via Command Prompt (or PowerShell), and on MacOS via Terminal.
To open Command Prompt (or PowerShell) on Windows:
• On Windows 10: Open the start menu and go to the shortcuts folder called “Windows System”. Pressing the dropdown menu should reveal a shortcut to open the Command Prompt application. Right click on the shortcut, press “More”, and press “Run as Administrator”.
• For Windows 8: Go to the start screen, press “All Apps”, and scroll right until the “Windows System” folder shows up. You can find Command Prompt there.
• For Windows 7: Open the start menu and click on “All Programs”. Click on “Accessories” and you’ll find the Command Prompt shortcut. Right click on the shortcut and press “Run as Administrator”.
To open Terminal on MacOS:
• Click the Launchpad icon in the Dock, type Terminal in the search field, then click Terminal.
• Alternatively, in the Finder, open the /Applications/Utilities folder, then double-click Terminal.
• Either option should open a Terminal window. When you see your username followed by a dollar sign (or a percent sign on newer versions of MacOS), you’re ready to start using the command line.
Once you have a Command Prompt (Windows) or Terminal (MacOS) open you will need to run a series of commands to install Instamancer. There are a few things you need to bear in mind when you’re typing commands in Command Prompt or Terminal. Firstly, each character matters, including spaces. So, when you’re executing a command, make sure you include the spaces and that the characters are in the correct case.
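Before installing Instamancer, it can help to confirm that Git and Node.js are available from the command line. A minimal check, using the standard version flag of each program, is sketched below; npm is the package manager that is installed together with Node.js. The lines beginning with # are explanatory notes rather than commands, so skip them when typing.

```shell
# Print the installed Git version.
git --version
# Print the installed Node.js version.
node --version
# Print the version of npm, which ships with Node.js.
npm --version
```

Each command should print a version number; the exact numbers depend on what you downloaded. If one of them reports that the command is not recognised or not found, close and reopen the command line (or restart your computer) and try again.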
To install Instamancer, click on the command line window to make sure that’s where your keystrokes will go, then type the following commands exactly as shown, pressing the Enter key after each one to run it.
• cd\ on Windows, or cd ~ on MacOS (then press the Enter key)
• git clone https://github.com/ScriptSmith/instamancer.git (then press the Enter key)
• cd instamancer (then press the Enter key). If this doesn’t work, the git clone step above probably did not finish; run it again and check for error messages before continuing
Then, on Windows, enter the following commands:
• npm install (then press the Enter key)
• npm run build (then press the Enter key)
• npm install -g (then press the Enter key)
Then, on MacOS, enter the following commands. You will be prompted for your password; type it and press the Enter key. The keystrokes won’t be shown on screen, but they are being accepted behind the scenes.
• sudo npm install (then press the Enter key)
• sudo npm run build (then press the Enter key)
• sudo npm install -g (then press the Enter key)
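To check that the final step registered Instamancer as a global package, you can ask npm to list what is installed globally. The command below is a standard npm command and works on both Windows and MacOS; the assumption (not confirmed in this post) is that the package appears in the list under the name instamancer. The line beginning with # is a note, not a command.

```shell
# List the top-level packages that npm has installed globally.
npm ls -g --depth=0
```

If instamancer appears in the output, the install has worked. If it does not, re-run the last command in the list above for your operating system and look for error messages.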
Now that you have successfully installed Instamancer, you’re ready to begin scraping Instagram data. Check out our next post for instructions on how to use the tool.