How to help Ukrainian women find urgent help, hack Instagram API, and get reposted by a supermodel

Disclaimer: if you try this with your own account, Instagram will eventually ban it

Margo Roi
FinUA
19 min read · Dec 29, 2023


Original: https://57032b174f3c9bde7707-1206b93ed1b5526fc96734f329c5aa52.ssl.cf1.rackcdn.com/Cody-Rasmussen-Tanya-Kizko-1.jpg

Recently I was in a tech interview, and the person asked me the following:

Could you present something interesting from your previous work history? This could be an architecture description (application, infrastructure, or component architecture), or perhaps an intriguing “war story” and the lessons learned from it. If possible, please share this material in advance so we have time to review it.

For those who do not know me, I am Ukrainian. So when she suggested a “war story”, I took it quite literally. I figured: OK, you want a war story? I have plenty.

This is the story of an app built over maybe 5 days and used to help Ukrainian women in the first phase of the full-scale invasion, in March 2022.

February 2022

Back in February/March 2022, Russia attacked Ukraine. It was scary and chaotic, and a huge part of the Ukrainian population started moving towards the western border. People were disoriented and scared, and there were shortages of everything from network connectivity to medical supplies.

During the chaos, one of the most prominent fashion bloggers, Margarita Muradova, started posting help requests in her Instagram stories.

The idea was picked up, and several others started posting these requests to their Instagram stories. Bloggers with millions of followers were reposting cries for help from people who needed insulin, access to doctors, transportation, and shelter from the rockets. They reposted instructions on crossing the border into Poland and finding a train to Lviv.

They called themselves “cable girls”.

For those who do not know, “cable girl” is a nickname for a female radio operator: the person responsible for operating a radio system and for the technical side of broadcasting. These women were crucial for communication and data transmission in WW1 and WW2.

Source: https://www.fbi.gov/image-repository/radio-operator.jpeg/image_view_fullscreen

The modern cable girls used Instagram. Among them were supermodels, fashion bloggers, designers, and photographers. Together they reached millions and millions of Ukrainian women.

Instagram is not the best medium for urgent information

This approach worked because these women had millions of followers. Among the messages there was a lot of useful information, such as how to cross the border, where to get medical help, and which pharmacies were open.

The only thing is: Instagram stories are the worst possible way to post urgent information:

  • Stories are not searchable
  • Stories disappear after 24 hours
  • They are not optimized for low-bandwidth connections (which was the reality in many bomb shelters)
  • Finding information relevant to your area is hard

Then I had an idea: what if it was easier to find information?

The idea was to scrape Instagram stories from these ladies and turn them into searchable text. Sounds simple enough, right?

Actually writing the Instagram story scraper

Now, a very important disclaimer, also mentioned at the very beginning: this was all done over the course of 5 days in total, so the code is a very much hacked-together mess. Accept it. Move on.

A very fast way to get going is to create a very simple Node.js Express server.
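If you are following along, note that the snippets use ES module import syntax in plain .js files, so the project needs "type": "module" in its package.json, plus express installed (the other packages show up as they appear in the story). A minimal package.json might look like this; the exact version number is an assumption:

{
  "name": "radistka",
  "type": "module",
  "scripts": {
    "start": "node server.js"
  },
  "dependencies": {
    "express": "^4.18.2"
  }
}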

server.js and public/index.html

import express from "express";
import http from "http";

const SERVER_PORT = 8008;

async function startServer() {
  const app = express();
  const server = http.createServer(app);

  app.use(express.static("public"));

  server.listen(SERVER_PORT, () =>
    console.info(`Listening on port ${SERVER_PORT}.`)
  );
}

startServer();

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>Радістки</title>
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <link rel="stylesheet" href="./styles/html5doctor.css" />
  </head>
  <body>
    <div class="container">
      <h1>Radistka</h1>
      <div id="radistkas" class="grid"></div>
    </div>
  </body>
</html>

We can also add a simple API endpoint to our server.js file:

app.get("/radistkas", async (req, res) => {
res.json({});
});

Now the question is: how do we get the Instagram stories data? Here is the thing: there is no official Instagram API for fetching stories and their content, and the API docs are terrible.

After some googling, I found the instagram-stories npm package, which we ended up using.

To use it, you need to provide the target account ID, your own Instagram user ID, and your current session ID. The latter two you can find in your browser cookies while logged in to Instagram.
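(A side note and an assumption on my part, not part of the original code: the values live in Instagram's cookies, typically named ds_user_id and sessionid, which you can copy from DevTools under Application → Cookies for instagram.com. It is nicer to keep them out of the source file entirely, for example:)

// Hypothetical config helper: read the Instagram credentials from environment
// variables instead of hardcoding them in the source.
export const myInstagramUserId = Number(process.env.IG_USER_ID);
export const myInstaSessionIds = (process.env.IG_SESSION_IDS ?? "")
  .split(",")
  .filter(Boolean);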


// Get stories of Instagram
// id: account id for get stories
// userid: me id
// sessionid: value of cookies from Instagram
getStories({ id: 25025320, userid: 1284161654, sessionid: '' }).then(stories => {
  console.log(stories)
})

In the server.js file, the code will look something like this:

app.get("/radistkas", async (req, res) => {
const stories = await getInstaStories(listOfRadistkas);
res.json(existingStories);
});

Now we can write the code for `getInstaStories`


import express from "express";
import http from "http";
// add insta stories to the server
import {
getStories as fetchStories,
getUserByUsername,
} from "instagram-stories";

// add your user ID and session ID
const myInstagramUserId = xxxxxxxxxxxxx;
const myInstaSessionIds = [
"XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
];

// what account you want to scrape?
const listOfRadistkas = ["greenteanosugar"];

// potentially, we might want to have several sessions on different browsers,
// and randomize them
function getRandomSessionId() {
const instaSessionIndex =
Math.ceil(Math.random() * myInstaSessionIds.length) - 1;
return myInstaSessionIds[instaSessionIndex];
}

// Using promises, we fetch the stories from each insta user
export const getInstaStories = async (userNames) => {
const stories = await Promise.all(
userNames.map(async (userName) => {
const randomId = getRandomSessionId();
const result = await getUserByUsername({
username: userName,
userid: myInstagramUserId,
sessionid: randomId,
})
.then(async ({ user }) => {
if (user) {
return (
fetchStories({
id: user.id,
userid: myInstagramUserId,
sessionid: randomId,
})
.then(async ({ items }) => {
return items.map(
(
{ caption, image_versions2, taken_at, id, ...rest },
i
) => ({
id,
userName,
caption: caption?.text ?? "",
takenAt: new Date(taken_at * 1000),
imageUrl: image_versions2.candidates[0]?.url ?? "",
})
);
})
.catch((err) => {
console.log(
`Could not get a story (with randomId ${randomId}):`,
err
);

return [];
})
);
} else {
console.log("Could not get user:", userName, user);
return [];
}
})
.catch((e) => {
console.error(
`Below error occured when fetching data for IG user: '${userName}' impersonating as '${randomId}'.
It is possible the impersonation account has been blocked.`,

e
);
return [];
});

console.log(
`Impersonated with id '${randomId}' and got '${result?.length}' stories from user '${userName}'`
);
return result;
})
).catch((err) => console.log("error", err));
if (stories) {
return stories.flat();
} else {
return [];
}
};

// eslint-disable-next-line
// COMMENT: child mode, where the server runs
const SERVER_PORT = 8008;

async function startServer() {
const app = express();
// create http server and wrap the express app
const server = http.createServer(app);

app.use(express.static("public"));
app.get("/radistkas", async (req, res) => {
let stories = await getInstaStories(listOfRadistkas);
res.json(stories);
});

// important! must listen from `server`, not `app`, otherwise socket.io won't function correctly
server.listen(SERVER_PORT, () =>
console.info(`Listening on port ${SERVER_PORT}.`)
);
}

startServer();

If you did everything right, at this point your endpoint works and returns JSON along these lines.
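(The response shape below is an illustration based on the mapping code above, with placeholder values, not a captured response:)

[
  {
    "id": "29xxxxxxxxxxxxxxxxx_xxxxxxxxxx",
    "userName": "greenteanosugar",
    "caption": "",
    "takenAt": "2022-03-05T10:21:14.000Z",
    "imageUrl": "https://scontent.cdninstagram.com/..."
  }
]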

Great, now we only need to call the endpoint from the front end. For that, we write a simple public/index.js script: we have an HTML element with the ID "radistkas", we fetch the story data with a GET request, and we render the stories inside that element (note the script relies on a jQuery-style $ helper being available on the page).

let CURRENT_POST_DATA = [];

async function ajax(url) {
return new Promise((resolve, reject) => {
const request = new XMLHttpRequest();
request.addEventListener("load", function () {
try {
resolve(this.responseText);
} catch (error) {
reject(error);
}
});
request.open("GET", url);
request.send();
request.addEventListener("error", reject);
});
}

const root = $("#radistkas");
function dataParser(postData) {
root.empty();
if (postData.map && postData.length) {
root.append(
postData
.filter((i) => i !== undefined)
.map(
(d) =>
`<div class="story">
<div class="story-image" id="${d.id + "abc"}">
<img src="data:image/jpg;base64,${
d.imageDataURI
}" />
<div>
Blogger: @${d.userName}
</div>
Image URL: <a href="${d.imageUrl}">Image url</a>
</div>
<div class="story-image-text">
Текст:
${d.imageText ?? "Нет текста"}
</div>
<div class="story-caption">
<span>
Время: ${new Date(d.takenAt).toLocaleString(
"uk-UA"
)}
</span>
</div>
</div>`
)
.join("")
);
}
}

async function main() {
// Load initial posts with pagination
await ajax(`/radistkas`)
.then((data) => {
console.log("data", JSON.parse(data));
dataParser(JSON.parse(data));
})
.catch((err) => console.error(err));
}

main();

Your UI looks ugly (for now), but it works

That is a POC, and we do not want to re-scrape on each request: not only is it slow and inefficient, it is also a very fast way to get banned on Instagram.

Next, we need to store the story information. We do not want to re-fetch the same stories each time the page loads, and we should keep the number of requests as low as possible. That is a real issue with Instagram: it is clever, it detects activity outside of “normal”, and it can very easily ban your account. That happened to me at least a few times.

So, we are going to be clever. We will store the results in a database and run a cron job to refresh them. You have probably already figured out this is a low-tech approach, so we use SQLite (the code below relies on the sqlite3 driver, the sqlite wrapper for its promise-based open, and sanitize-html).

sqlite.js

import path from "path";
import sqlite3 from "sqlite3";
import sanitizeHtml from "sanitize-html";
import { open } from "sqlite";

sqlite3.verbose();

const databasePath = "./radistka.db";

let db = null;

const createTables = async (db) =>
db
.exec(
`create table stories (
story_id text primary key not null,
user_id text,
story_text text,
story_caption text,
story_image_data_uri text,
story_image_url text,
story_image_text text,
taken_at int
);
`
)
.catch((err) => console.warn(err));

export const initDb = async () => {
db = await open({
filename: path.resolve(databasePath),
driver: sqlite3.Database,
mode: sqlite3.OPEN_READWRITE | sqlite3.OPEN_CREATE,
});

// TODO: Don't try to create tables if they already exist
await createTables(db);
};

export const getStoriesFromDb = async (offset, limit) =>
(
await db?.all(
`SELECT story_id, user_id, story_caption, story_image_data_uri, story_image_url, story_image_text, taken_at FROM stories ORDER BY taken_at DESC LIMIT ${limit} OFFSET ${offset}`
)
)?.map((raw) => {
return {
id: raw.story_id,
userName: raw.user_id,
caption: raw.story_caption,
imageDataURI: raw.story_image_data_uri,
imageUrl: raw.story_image_url,
imageText: sanitizeHtml(raw.story_image_text),
// TODO: tz?
takenAt: new Date(raw.taken_at),
};
}) ?? [];

export const putStoriesToDb = async (stories) =>
// TODO: Probably don't need to foreach
stories.forEach((story) => {
db?.run(
`
INSERT OR IGNORE INTO stories (story_id, user_id, story_text, story_caption, story_image_data_uri, story_image_url, story_image_text, taken_at) VALUES ($story_id, $user_id, $story_text, $story_caption, $story_image_data_uri, $story_image_url, $story_image_text, $taken_at)
`,
{
// TODO get text from story
$story_id: story.id,
$user_id: story.userName,
$story_text: "",
$story_caption: story.caption,
$story_image_data_uri: story.imageDataURI,
$story_image_url: story.imageUrl,
$story_image_text: story.imageText,
$taken_at: story.takenAt,
}
);
});

Now we just use that to populate the DB. As you might notice, we store images in a different format (base64 data URIs) because I was lazy and did not want to store actual image files. Here are the new additions to server.js:

// using it to store images
import imageToBase64 from "image-to-base64";
....
export const addDataURIToStories = async (items) => {
const parsedItems = [];
const resolvables = [];

items.map((item) => {
if (item.imageUrl) {
resolvables.push(
imageToBase64(item.imageUrl).then((data) => {
parsedItems.push({
...item,
imageDataURI: data,
});
})
);
} else {
parsedItems.push({
...item,
});
}
});

await Promise.all(resolvables);

return parsedItems;
};
...
app.get("/radistkas", async (req, res) => {
let stories = await getInstaStories(listOfRadistkas);
// after we get stories, we process image data
stories = await addDataURIToStories(stories);
// and we put them into db
await putStoriesToDb(stories);
console.log(
`===== Stories fetched and inserted to DB (new stories added: ${stories.length}) =====`
);
res.json(stories);
});

We see images now! But we are still not doing it quite right.

We should now retrieve the story data from the DB instead:

app.get("/radistkas", async (req, res) => {
let stories = await getInstaStories(listOfRadistkas);
stories = await addDataURIToStories(stories);

await putStoriesToDb(stories);
console.log(
`===== Stories fetched and inserted to DB (new stories added: ${stories.length}) =====`
);
const existingStories = await getStoriesFromDb(0, 5000);
console.log(
`===== Stories fetched from DB (${existingStories.length}) =====`
);
res.json(existingStories);
});

Now, obviously, as I said, we do not want to fetch stories every time we make a request, so we have to use a cron job. I use https://crontab.guru/ pretty much all the time when writing cron expressions.
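As a quick reminder of the syntax (standard five-field cron, which node-cron accepts), the expression we are about to use reads like this:

// ┌───────────── minute ("*/20" = every 20 minutes)
// │ ┌─────────── hour ("*" = every hour)
// │ │ ┌───────── day of month
// │ │ │ ┌─────── month
// │ │ │ │ ┌───── day of week
// │ │ │ │ │
// */20 * * * *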

We install the node-cron package:

npm install node-cron

and create a separate cron.js file that we run as its own process. This will be our recurring script that populates the database. Great, now let's clean up: we can isolate things and do some refactoring.

server.js

import express from "express";
import http from "http";
import { initDb, getStoriesFromDb } from "./sqlite.js";


const SERVER_PORT = 8008;

async function startServer() {
const app = express();
// create http server and wrap the express app
const server = http.createServer(app);
await initDb();

app.use(express.static("public"));
app.get("/radistkas", async (req, res) => {

// we only need to get stories from DB
const existingStories = await getStoriesFromDb(0, 5000);
console.log(
`===== Stories fetched from DB (${existingStories.length}) =====`
);
res.json(existingStories);
});

// important! must listen from `server`, not `app`, otherwise socket.io won't function correctly
server.listen(SERVER_PORT, () =>
console.info(`Listening on port ${SERVER_PORT}.`)
);
}

startServer();

cron.js will be responsible for actually fetching and adding the stories to our local db

import cron from "node-cron";
import imageToBase64 from "image-to-base64";
import { putStoriesToDb, initDb } from "./sqlite.js";
import { getInstaStories } from "./insta.js";

const listOfRadistkas = ["greenteanosugar"];

const addDataURIToStories = async (items) => {
const parsedItems = [];
const resolvables = [];

items.map((item) => {
if (item.imageUrl) {
resolvables.push(
imageToBase64(item.imageUrl).then((data) => {
parsedItems.push({
...item,
imageDataURI: data,
});
})
);
} else {
parsedItems.push({
...item,
});
}
});

await Promise.all(resolvables);

return parsedItems;
};

const fetchStoriesAndPutToDb = async () => {
let stories = await getInstaStories(listOfRadistkas);
stories = await addDataURIToStories(stories);

await putStoriesToDb(stories);
console.log(
`===== Stories fetched and inserted to DB (new stories added: ${stories.length}) =====`
);
};

initDb();

cron.schedule(
"*/20 * * * *",
() => {
fetchStoriesAndPutToDb();
console.log("will execute every 20 minutes until stopped");
},
{
name: "my-task",
}
);

insta.js will contain the code to retrieve the Instagram stories

import {
getStories as fetchStories,
getUserByUsername,
} from "instagram-stories";

const myInstagramUserId = xxxxxxxxxxxxx;
const myInstaSessionIds = [
"XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
];

function getRandomSessionId() {
const instaSessionIndex =
Math.ceil(Math.random() * myInstaSessionIds.length) - 1;
return myInstaSessionIds[instaSessionIndex];
}

export const getInstaStories = async (userNames) => {
const stories = await Promise.all(
userNames.map(async (userName) => {
const randomId = getRandomSessionId();
const result = await getUserByUsername({
username: userName,
userid: myInstagramUserId,
sessionid: randomId,
})
.then(async ({ user }) => {
if (user) {
return (
fetchStories({
id: user.id,
userid: myInstagramUserId,
sessionid: randomId,
})
// NOTE: library doesn't handle insta API failure well - rate limiting?
.then(async ({ items }) => {
return items.map(
(
{ caption, image_versions2, taken_at, id, ...rest },
i
) => ({
id,
userName,
caption: caption?.text ?? "",
takenAt: new Date(taken_at * 1000),
imageUrl: image_versions2.candidates[0]?.url ?? "",
})
);
})
.catch((err) => {
console.log(
`Could not get a story (with randomId ${randomId}):`,
err
);

return [];
})
);
} else {
console.log("Could not get user:", userName, user);
return [];
}
})
.catch((e) => {
console.error(
`Below error occured when fetching data for IG user: '${userName}' impersonating as '${randomId}'.
It is possible the impersonation account has been blocked.`,

e
);
return [];
});

console.log(
`Impersonated with id '${randomId}' and got '${result?.length}' stories from user '${userName}'`
);
return result;
})
).catch((err) => console.log("error", err));
if (stories) {
return stories.flat();
} else {
return [];
}
};

sqlite.js is the database API

import path from "path";
import sqlite3 from "sqlite3";
import sanitizeHtml from "sanitize-html";
import { open } from "sqlite";

sqlite3.verbose();

const databasePath = "./radistka.db";

let db = null;

const createTables = async (db) =>
db
.exec(
`create table stories (
story_id text primary key not null,
user_id text,
story_text text,
story_caption text,
story_image_data_uri text,
story_image_url text,
story_image_text text,
taken_at int
);
`
)
.catch((err) => console.warn(err));

export const initDb = async () => {
db = await open({
filename: path.resolve(databasePath),
driver: sqlite3.Database,
mode: sqlite3.OPEN_READWRITE | sqlite3.OPEN_CREATE,
});

// TODO: Don't try to create tables if they already exist
await createTables(db);
};

export const getStoriesFromDb = async (offset, limit) =>
(
await db?.all(
`SELECT story_id, user_id, story_caption, story_image_data_uri, story_image_url, story_image_text, taken_at FROM stories ORDER BY taken_at DESC LIMIT ${limit} OFFSET ${offset}`
)
)?.map((raw) => {
return {
id: raw.story_id,
userName: raw.user_id,
caption: raw.story_caption,
imageDataURI: raw.story_image_data_uri,
imageUrl: raw.story_image_url,
imageText: sanitizeHtml(raw.story_image_text),
// TODO: tz?
takenAt: new Date(raw.taken_at),
};
}) ?? [];

export const putStoriesToDb = async (stories) =>
// TODO: Probably don't need to foreach
stories.forEach((story) => {
db?.run(
`INSERT OR IGNORE INTO stories (story_id, user_id, story_text, story_caption, story_image_data_uri, story_image_url, story_image_text, taken_at) VALUES ($story_id, $user_id, $story_text, $story_caption, $story_image_data_uri, $story_image_url, $story_image_text, $taken_at)`,
{
// TODO get text from story
$story_id: story.id,
$user_id: story.userName,
$story_text: "",
$story_caption: story.caption,
$story_image_data_uri: story.imageDataURI,
$story_image_url: story.imageUrl,
$story_image_text: story.imageText,
$taken_at: story.takenAt,
}
);
});

public/index.js is our front-end to display our stories

let CURRENT_POST_DATA = [];

async function ajax(url) {
return new Promise((resolve, reject) => {
const request = new XMLHttpRequest();
request.addEventListener("load", function () {
try {
resolve(this.responseText);
} catch (error) {
reject(error);
}
});
request.open("GET", url);
request.send();
request.addEventListener("error", reject);
});
}

// eslint-disable-next-line no-undef
const root = $("#radistkas");
function dataParser(postData) {
root.empty();
console.log("dataParser", postData.map && postData.length, postData);
if (postData.map && postData.length) {
root.append(
postData
.filter((i) => i !== undefined)
.map(
(d) =>
`<div class="story">
<div class="story-image" id="${d.id + "abc"}">
<img src="data:image/jpg;base64,${
d.imageDataURI
}" />
<div>
Blogger: @${d.userName}
</div>
Image URL: <a href="${d.imageUrl}">Image url</a>
</div>
<div class="story-image-text">
Текст:
${d.imageText ?? "Нет текста"}
</div>
<div class="story-caption">
<span>
Время: ${new Date(d.takenAt).toLocaleString(
"uk-UA"
)}
</span>
</div>
</div>`
)
.join("")
);
}
}

async function main() {
// Load initial posts with pagination
await ajax(`/radistkas`)
.then((data) => {
console.log("data", JSON.parse(data));
dataParser(JSON.parse(data));
})
.catch((err) => console.error(err));
}

main();

Now that every file does its own job, we no longer scrape Instagram stories on each server request: the server just reads from our database, and the cron job keeps it populated. Simple.

Now we can also add more ladies to the list.

const listOfRadistkas = [
  "greenteanosugar",
  "kira.vintage",
  "zhilyova",
  "tanyakizko",
  "qatista",
  "fashionprovocation",
  "moi_sofism",
];

And since there will be a lot of stories, we shall add pagination or infinite scroll.

index.js

let PAGES = 1;
let CURRENT_POST_DATA = [];

async function ajax(url) {
return new Promise((resolve, reject) => {
const request = new XMLHttpRequest();
request.addEventListener("load", function () {
try {
resolve(this.responseText);
} catch (error) {
reject(error);
}
});
request.open("GET", url);
request.send();
request.addEventListener("error", reject);
});
}

function setCurrentPostData(newData) {
if (PAGES === 1) {
CURRENT_POST_DATA = Array.isArray(newData)
? newData
: newData
? [newData]
: [];
} else {
console.log("pages add to array", PAGES);
// TODO: for some reason this does not work, debug later
CURRENT_POST_DATA = [].concat(CURRENT_POST_DATA, newData);
console.log("CURRENT_POST_DATA", CURRENT_POST_DATA, newData);
}
}

window.addEventListener("scroll", (e) => {
if (
window.scrollY + window.innerHeight >=
document.documentElement.scrollHeight
) {
// TODO: Prevent further scroll to upload data
e.preventDefault();

PAGES++;

// TODO: On scroll, add images to array, this does not work properly yet
ajax(`/radistkas?page=${PAGES}`)
.then((data) => {
// Find the way to enrich the global variable
setCurrentPostData(JSON.parse(data));
// disable dataParser
// this is the one that caused issues
dataParser(CURRENT_POST_DATA);
})
.catch((err) => console.error(err));
}
});

// eslint-disable-next-line no-undef
const root = $("#radistkas");
function dataParser(postData) {
root.empty();
console.log("dataParser", postData.map && postData.length, postData);
if (postData.map && postData.length) {
root.append(
postData
.filter((i) => i !== undefined)
.map(
(d) =>
`<div class="story">
<div class="story-image" id="${d.id + "abc"}">
<img src="data:image/jpg;base64,${
d.imageDataURI
}" />
<div>
Blogger: @${d.userName}
</div>
Image URL: <a href="${d.imageUrl}">Image url</a>
</div>
<div class="story-image-text">
Текст:
${d.imageText ?? "Нет текста"}
</div>
<div class="story-caption">
<span>
Время: ${new Date(d.takenAt).toLocaleString(
"uk-UA"
)}
</span>
</div>
</div>`
)
.join("")
);
}
}

async function main() {
// Load initial posts with pagination
await ajax(`/radistkas?page=${PAGES}`)
.then((data) => {
console.log("data", JSON.parse(data));

setCurrentPostData(JSON.parse(data));
// disable dataParser
dataParser(JSON.parse(data));
})
.catch((err) => console.error(err));
}

main();

and server.js

import express from "express";
import http from "http";
import { initDb, getStoriesFromDb, putStoriesToDb } from "./sqlite.js";

// eslint-disable-next-line
// COMMENT: child mode, where the server runs
const SERVER_PORT = 8008;

async function startServer() {
const app = express();
// create http server and wrap the express app
const server = http.createServer(app);
await initDb();

app.use(express.static("public"));
app.get("/radistkas", async (req, res) => {
const limit = 10;
const { page } = req.query;

const startIndex = (page - 1) * limit;
const existingStories = await getStoriesFromDb(startIndex, limit);
console.log(
`===== Stories fetched from DB (${existingStories.length}) =====`
);
res.json(existingStories);
});

// important! must listen from `server`, not `app`, otherwise socket.io won't function correctly
server.listen(SERVER_PORT, () =>
console.info(`Listening on port ${SERVER_PORT}.`)
);
}

startServer();

What are we doing this for?

Now let's add text extraction. That is the main point: making the stories searchable by text (it also saves bandwidth, since you can read the text without loading the image). We will use the https://www.npmjs.com/package/ocr-space-api-wrapper package for it:

import { ocrSpace } from "ocr-space-api-wrapper";

const apiKey = "XXXXXXXXXXXX";

export const extractTextFromImageData = async (base64ImageData) => {
  // Send the base64-encoded image to OCR.space (the free tier allows max 10 requests per 10 minutes)
  const res3 = await ocrSpace(`data:image/jpg;base64,${base64ImageData}`, {
    apiKey,
    language: "rus",
  });

  if (!res3) {
    console.error("OCR Space returned nothing!");
    return null;
  } else if (res3.ErrorMessage) {
    console.error(res3.ErrorMessage);
    return null;
  }
  // Response shape: https://ocr.space/OCRAPI#Response
  return res3.ParsedResults?.[0]?.ParsedText;
};

And we use it in our insta.js file:

import {
getStories as fetchStories,
getUserByUsername,
} from "instagram-stories";
import { extractTextFromImageData } from "./ocr.js";


export const extractImageTextToStories = async (items) => {
const parsedItems = [];
const resolvables = [];

items.map((item) => {
if (item.imageUrl) {
resolvables.push(
extractTextFromImageData(item.imageDataURI).then((data) => {
parsedItems.push({
...item,
imageText: data,
});
})
);
} else {
parsedItems.push({
...item,
});
}
});

await Promise.all(resolvables);

return parsedItems;
};

and in the cron job:

import cron from "node-cron";
...
import { getInstaStories, extractImageTextToStories } from "./insta.js";

const listOfRadistkas = [
"greenteanosugar",
"kira.vintage",
"zhilyova",
"tanyakizko",
"qatista",
"fashionprovocation",
"moi_sofism",
"nastyamozgovaya",
"doina",
"kris.piterman",
"annamaria_koval",
"sashzayats",
"julia_stylelover",
"styleofliving",
"ellena_galant_girl",
"mejdunami",
"tamrikori",
"konstantin_koval_",
"olga__boncheva",
"little_puida",
"martasyrko",
];

const fetchStoriesAndPutToDb = async () => {
let stories = await getInstaStories(listOfRadistkas);
stories = await addDataURIToStories(stories);
stories = await extractImageTextToStories(stories);

await putStoriesToDb(stories);
console.log(
`===== Stories fetched and inserted to DB (new stories added: ${stories.length}) =====`
);
};

initDb();

cron.schedule(
  "*/1 * * * *",
  () => {
    fetchStoriesAndPutToDb();
    console.log("will execute every minute until stopped");
  },
  {
    name: "my-task",
  }
);

Last, but not least: search, the whole reason we have gathered here!

sqlite.js

export const searchStoriesFromDb = async (text, offset, limit) =>
  (
    await db?.all(
      `SELECT story_id, user_id, story_caption, story_image_data_uri, story_image_url, story_image_text, taken_at FROM stories WHERE story_image_text LIKE '%${text}%' ORDER BY taken_at DESC LIMIT ${limit} OFFSET ${offset}`
    )
  )?.map((raw) => ({
    id: raw.story_id,
    userName: raw.user_id,
    caption: raw.story_caption,
    imageDataURI: raw.story_image_data_uri,
    imageUrl: raw.story_image_url,
    imageText: sanitizeHtml(raw.story_image_text),
    takenAt: new Date(raw.taken_at),
  })) ?? [];
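One caveat with the query above: the search string is interpolated straight into the SQL, so a stray quote in the input breaks the query (and it is a textbook injection vector). The sqlite wrapper supports bound parameters, so a safer variant of the same query could look like this sketch (the function name is hypothetical, and the row-to-object mapping from above is omitted for brevity):

export const searchStoriesFromDbSafe = async (text, offset, limit) =>
  (await db?.all(
    `SELECT story_id, user_id, story_caption, story_image_data_uri, story_image_url, story_image_text, taken_at
     FROM stories
     WHERE story_image_text LIKE $pattern
     ORDER BY taken_at DESC
     LIMIT $limit OFFSET $offset`,
    // the user input is passed as a bound parameter instead of being concatenated
    { $pattern: `%${text}%`, $limit: limit, $offset: offset }
  )) ?? [];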

public/index.html

<div class="search">
<input
id="search-input"
type="text"
onchange="search()"
placeholder="Введите текст для поиска..."
/>
<input id="fuzzy-search" type="checkbox" />
</div>

public/index.js gets a search handler (the matching /search route on the server side is sketched right after this block):

async function search() {
const searchString =
document.getElementById("search-input").value || undefined;
const fuzzyElement = document.getElementById("fuzzy-search");
document
.getElementById("search-input")
.addEventListener("change", function (e) {
SEARCH_WORD = e.target.value;
console.log(
"oninput",
e.target.value,
fuzzyElement.checked,
searchString
);
if (SEARCH_WORD.length > 0) {
console.log("SEARCH_WORD.length > 0");
ajax(`/search?search=${SEARCH_WORD}`)
.then((searchData) => {
console.log("search results from api", searchData);
// Find the way to enrich the global variable
setCurrentPostData(JSON.parse(searchData));
// disable dataParser
dataParser(JSON.parse(searchData));
})
.catch((err) => console.error(err));
} else {
console.log("SEARCH_WORD.length == 0");
CURRENT_POST_DATA = [];
ajax(`/radistkas?page=${1}`)
.then((data) => {
setCurrentPostData(JSON.parse(data));
// disable dataParser
dataParser(CURRENT_POST_DATA);
})
.catch((err) => console.error(err));
}
});
}
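One piece the snippets do not show explicitly is the server side of that /search call. A minimal route, reusing the searchStoriesFromDb helper from sqlite.js, might look like this (a sketch of the idea, not the exact original code); the import goes at the top of server.js and the route sits inside startServer next to the /radistkas one:

import { searchStoriesFromDb } from "./sqlite.js";

app.get("/search", async (req, res) => {
  const { search } = req.query;
  // return the first batch of stories whose OCR'd text matches the query
  const results = await searchStoriesFromDb(search ?? "", 0, 100);
  res.json(results);
});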

And here is the full public/index.js:

let PAGES = 1;
let CURRENT_POST_DATA = [];
let SEARCH_WORD = "";

async function ajax(url) {
return new Promise((resolve, reject) => {
const request = new XMLHttpRequest();
request.addEventListener("load", function () {
try {
resolve(this.responseText);
} catch (error) {
reject(error);
}
});
request.open("GET", url);
request.send();
request.addEventListener("error", reject);
});
}

function setCurrentPostData(newData) {
if (PAGES === 1) {
CURRENT_POST_DATA = Array.isArray(newData)
? newData
: newData
? [newData]
: [];
} else {
console.log("pages add to array", PAGES);
// TODO: for some reason this does not work, debug later
CURRENT_POST_DATA = [].concat(CURRENT_POST_DATA, newData);
console.log("CURRENT_POST_DATA", CURRENT_POST_DATA, newData);
}
}

window.addEventListener("scroll", (e) => {
if (
window.scrollY + window.innerHeight >=
document.documentElement.scrollHeight
) {
// TODO: Prevent further scroll to upload data
e.preventDefault();

PAGES++;

// TODO: On scroll, add images to array, this does not work properly yet
ajax(`/radistkas?page=${PAGES}`)
.then((data) => {
// Find the way to enrich the global variable
setCurrentPostData(JSON.parse(data));
// disable dataParser
// this is the one that caused issues
dataParser(CURRENT_POST_DATA);
})
.catch((err) => console.error(err));
}
});

// eslint-disable-next-line no-undef
const root = $("#radistkas");
function dataParser(postData) {
root.empty();
console.log("dataParser", postData.map && postData.length, postData);
if (postData.map && postData.length) {
root.append(
postData
.filter((i) => i !== undefined)
.map(
(d) =>
`<div class="story">
<div class="story-image" id="${d.id + "abc"}">
<img src="data:image/jpg;base64,${
d.imageDataURI
}" />
<div>
Blogger: @${d.userName}
</div>
Image URL: <a href="${d.imageUrl}">Image url</a>
</div>
<div class="story-image-text">
Текст:
${d.imageText ?? "Нет текста"}
</div>
<div class="story-caption">
<span>
Время: ${new Date(d.takenAt).toLocaleString(
"uk-UA"
)}
</span>
</div>
</div>`
)
.join("")
);
}
}

async function search() {
const searchString =
document.getElementById("search-input").value || undefined;
const fuzzyElement = document.getElementById("fuzzy-search");
document
.getElementById("search-input")
.addEventListener("change", function (e) {
SEARCH_WORD = e.target.value;
console.log(
"oninput",
e.target.value,
fuzzyElement.checked,
searchString
);

if (SEARCH_WORD.length > 0) {
console.log("SEARCH_WORD.length > 0");
ajax(`/search?search=${SEARCH_WORD}`)
.then((searchData) => {
console.log("search results from api", searchData);
// Find the way to enrich the global variable
setCurrentPostData(JSON.parse(searchData));
// disable dataParser
dataParser(JSON.parse(searchData));
})
.catch((err) => console.error(err));
} else {
console.log("SEARCH_WORD.length == 0");
CURRENT_POST_DATA = [];
ajax(`/radistkas?page=${1}`)
.then((data) => {
setCurrentPostData(JSON.parse(data));
// disable dataParser
dataParser(CURRENT_POST_DATA);
})
.catch((err) => console.error(err));
}
});
}

async function main() {
// Load initial posts with pagination
await ajax(`/radistkas?page=${PAGES}`)
.then((data) => {
console.log("data", JSON.parse(data));

setCurrentPostData(JSON.parse(data));
// disable dataParser
dataParser(JSON.parse(data));
})
.catch((err) => console.error(err));
}

main();

At some point, Instagram will figure you out

And you will be blocked!

Impersonated with id '46745008602%3AZWGWjw0TzJ1ekO%3A24%3AAYecEW6uOt7bgkZ2DTWMCm5IGgHYjAfDv_IBo2gU_Co' and got '0' stories from user 'mejdunami'
Below error occured when fetching data for IG user: 'konstantin_koval_' impersonating as '46745008602%3AZWGWjw0TzJ1ekO%3A24%3AAYecEW6uOt7bgkZ2DTWMCm5IGgHYjAfDv_IBo2gU_Co'.
It is possible the impersonation account has been blocked. FetchError: invalid json response body at https://i.instagram.com/api/v1/users/web_profile_info/?username=konstantin_koval_ reason: Unexpected token < in JSON at position 0

Instagram is smart that way. Meta is a huge company. But let's look at the ways we could potentially fix this:

Create another Instagram account

This will not work: a brand-new account with no prior activity gets banned almost instantly.

Creating a HAR file

This will duplicate the session, yes, but it will not solve the activity problem.

Randomizing the cron job requests, on the other hand, might work:

cron.schedule("0/30 5-8,10-13,16-22 * * *", function () {
console.log("running a task every 20 minutes")
cronJob()
})

The ideal case would be:

  • A randomized cron job, several sessions, and preferably several accounts
  • Different user agents for the requests
  • Better request distribution: the more “random” the traffic looks, the better (see the sketch after the snippet below)
const myInstagramUserId = xxxxxxxxxxxxx;
const myInstaSessionIds = [
  "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
];
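We never implemented anything beyond the session list, but here is a sketch of what "more random" could look like in cron.js: jitter each run by a random delay and shuffle the account order, so requests stop landing at exact 20-minute marks in the same sequence. It assumes fetchStoriesAndPutToDb is adapted to accept the account list as an argument; rotating user agents would additionally need support from the HTTP layer of the scraping library, which is not a given.

import cron from "node-cron";

// random delay of 0-10 minutes before each run
const randomDelayMs = () => Math.floor(Math.random() * 10 * 60 * 1000);

cron.schedule("*/20 * * * *", () => {
  setTimeout(() => {
    // crude shuffle: process the accounts in a different order every run
    const shuffled = [...listOfRadistkas].sort(() => Math.random() - 0.5);
    fetchStoriesAndPutToDb(shuffled);
  }, randomDelayMs());
});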

Conclusions

This was a cool experiment, and we did manage to help some women. The whole “cable girls” trend on Instagram lasted for about 4 weeks, and after that we did not feel the need to maintain or improve the code any further. Ways to improve it? Maybe by using this package: https://www.npmjs.com/package/instagram-private-api

And we did, as promised, get reposted by Tetiana Kizko, the supermodel.
