Check your Indian Railway PNR status with a few lines of code | Python

Animesh Singh
6 min read · Dec 26, 2022


In pursuit of sharing my knowledge about Python and its wonderful libraries, I have come up with another interesting tutorial.

Playwright is a web automation library from Microsoft. It is powerful and easy to use in your programs. It has official bindings for several languages (JavaScript/TypeScript, Python, Java and .NET), and it is a good choice for anyone who has just started learning automation and wants a simple, easy-to-use tool.

I have published several articles on Playwright with Python. Check them out if you want to learn the basics.

In this tutorial, we will build a PNR status checker. If you live in India and travel by train, you probably already know what a PNR (Passenger Name Record) is.

Enough talk; let's get started.

Step-by-step details of this tutorial:

  • Visiting the website: https://www.indianrail.gov.in/enquiry/PNR/PnrEnquiry.html?locale=en
  • Entering the PNR number
  • Solving the captcha
  • Submitting the captcha
  • Scraping the details/status

You will require:

  • Basic knowledge of the Playwright library: how to use locators and how to iterate through the elements they match.
  • Some familiarity with the Pillow library and Tesseract OCR (helpful, but not required)
  • Knowledge of list and dictionary comprehensions

I have divided the program into three functions, which keeps each function's variables in its own scope and avoids conflicts between them.

Installing the Tesseract application:

Download the Windows installer (.exe) from here: https://drive.google.com/file/d/1upj_gNh7m0yBFfWB3fxY9Hg8qU2X1KVF/view?usp=sharing. Install it in C:\Program Files\Tesseract-OCR, because the code below points to that location.

Step 1: Install the required dependencies

After installing Tesseract, run these commands in your command prompt (cmd):

```
pip install playwright
playwright install
pip install pytesseract
pip install pillow
```

Step 2: Create a new Python file and import the libraries.

```python
from playwright.sync_api import sync_playwright
import pytesseract
from PIL import Image
```

Step 3: Create a skeleton.

```python
def main(pnr):
    pass


if __name__ == "__main__":
    pnr = input("Enter PNR: ")
    main(pnr)
```

We will ask the user to enter the PNR number.
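Indian Railways PNRs are 10-digit numbers, so checking the input before launching a browser can save a wasted run. The helper below is a hypothetical addition, not part of the original program. It is a minimal sketch that assumes a valid PNR is exactly 10 digits once any spaces or dashes the user typed are removed.

```python
import re

# Hypothetical helper: assumes a valid PNR is exactly 10 ASCII digits.
PNR_PATTERN = re.compile(r"[0-9]{10}")


def normalize_pnr(raw: str) -> str:
    """Strip spaces/dashes from user input and validate what remains.

    Raises ValueError if the cleaned text is not exactly 10 digits.
    """
    cleaned = re.sub(r"[\s-]", "", raw)
    if not PNR_PATTERN.fullmatch(cleaned):
        raise ValueError(f"Invalid PNR {raw!r}: expected 10 digits")
    return cleaned


if __name__ == "__main__":
    print(normalize_pnr(" 123-456 7890 "))
```

In the skeleton, you could then call main(normalize_pnr(input("Enter PNR: "))) so that a typo is reported immediately instead of failing on the website.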

Step 4: Visit the website and capture a screenshot of the captcha.

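```python
def main(pnr):
    with sync_playwright() as p:
        # headless=False opens a visible browser window so you can watch each step
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto("https://www.indianrail.gov.in/enquiry/PNR/PnrEnquiry.html?locale=en")
        # Type the PNR, then click the submit button that opens the captcha popup
        page.fill("#inputPnrNo", pnr)
        page.locator("#modal1").click()
        # Save an image of the captcha in the current working directory
        page.locator("#CaptchaImgID").screenshot(path="screenshot.png")
        # Read the equation from the image and solve it
        ans = image_to_string("screenshot.png")
        page.fill("#inputCaptcha", str(ans))
        page.locator("#submitPnrNo").click()
```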
[Image: Captcha window]
[Image: Captured image]

Line-by-line explanation:

  • Launches Chromium with headless=False, so the browser window stays visible, and visits the website.
  • Enters the PNR typed by the user.
  • Clicks the submit button, which opens the captcha popup.
  • Takes a screenshot of the captcha in the popup window and saves it as screenshot.png in the current working directory (CWD).
  • Calls the image_to_string function with the screenshot. It returns a number: the answer to the simple arithmetic question shown in the captcha.
  • Fills the answer into the input box and submits the captcha.
  • After the captcha is submitted, the page shows the status of the PNR.
[Image: Result page]

Let us look at the image_to_string function.

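```python
def image_to_string(image):
    """Read the captcha equation from the image and return its answer."""
    img = Image.open(image)
    # Path of the Tesseract executable installed earlier
    pytesseract.pytesseract.tesseract_cmd = "C:/Program Files/Tesseract-OCR/tesseract.exe"
    result = pytesseract.image_to_string(img)
    eqn = result
    # Turn "a + b = ?" into "a + b" so that eval() can compute it
    char_remov = ["?", "="]
    for i in char_remov:
        eqn = eqn.replace(i, "")
    # strip() removes the trailing newline/form feed Tesseract usually adds.
    # Caution: eval() runs any Python expression, so only use it on this OCR text.
    solved = eval(eqn.strip())
    return solved
```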

Line-by-line explanation:

  • Image.open() from Pillow loads the screenshot into the variable img.
  • pytesseract needs the path to the Tesseract executable, so we provide the install location from earlier.
  • result stores the text that Tesseract reads from the image.
  • The result is assigned to the variable eqn.
  • The captcha equations take the form "a + b = ?" or "a - b = ?". We remove "=" and "?" so that the remaining expression can be passed to Python's eval() function.
  • eval() evaluates the expression; we store the answer in the variable solved and return it.
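Passing OCR output straight to eval() works, but eval() executes any Python expression, and a single misread character (for example, a minus sign read as a dash) makes it fail. As a hedged alternative, which swaps in a different technique rather than reproducing the author's code, the sketch below pulls out the two numbers and the operator with a regular expression and computes the answer directly. It assumes the captcha only ever asks for the addition or subtraction of two whole numbers, as described above; solve_captcha_text is a hypothetical name.

```python
import operator
import re

# Operator characters OCR may return, mapped to Python functions.
# Assumption: "-" is sometimes read as an en dash, em dash or minus sign.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "\u2013": operator.sub,  # en dash
    "\u2014": operator.sub,  # em dash
    "\u2212": operator.sub,  # Unicode minus sign
}

# Two whole numbers separated by one operator, e.g. "12 + 7 = ?"
EQUATION = re.compile(r"(\d+)\s*([+\-\u2013\u2014\u2212])\s*(\d+)")


def solve_captcha_text(text: str) -> int:
    """Solve an 'a + b = ?' or 'a - b = ?' string without using eval()."""
    match = EQUATION.search(text)
    if match is None:
        raise ValueError(f"Could not find an equation in {text!r}")
    left, op, right = match.groups()
    return OPERATORS[op](int(left), int(right))


if __name__ == "__main__":
    print(solve_captcha_text("12 + 7 = ?\n"))
```

Inside image_to_string(), you could replace the replace() loop and the eval() call with return solve_captcha_text(result). Raising ValueError when no equation is found also gives you a clear signal to retry.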

Step 5: Scraping the results.

All the details are in two tables. Our goal is to store all the table titles (headers) in one list and all the corresponding records in another. We can then build a dictionary with titles as keys and records as values, like {"Train Number": "123456"}.
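Here is a small illustration of that pairing, using made-up sample values rather than real output from the website (the "Boarding Date" title is likewise only an example). zip() pairs the n-th title with the n-th record and stops at the end of the shorter list, so a missing cell cannot cause an IndexError.

```python
# Made-up sample data standing in for the scraped table cells.
title_list = ["Train Number", "Train Name", "Boarding Date"]
record_list = ["12345", "Example Express", "01-01-2023"]

# zip() pairs items by position and stops at the shorter list.
data = dict(zip(title_list, record_list))

for title, record in data.items():
    print(title, ":", record)
```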

[Image: Journey details table]
[Image: Passenger details table]
```python
def main(pnr):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://www.indianrail.gov.in/enquiry/PNR/PnrEnquiry.html?locale=en")
        page.fill("#inputPnrNo", pnr)
        page.locator("#modal1").click()
        page.locator("#CaptchaImgID").screenshot(path="screenshot.png")
        ans = image_to_string("screenshot.png")
        page.fill("#inputCaptcha", str(ans))
        page.locator("#submitPnrNo").click()

        # count() does not wait for elements, so first wait until the results
        # table is visible (this raises a timeout error if it never appears)
        page.locator("#journeyDetailsTable").wait_for()

        # Header cells (th) and data cells (td) of both result tables
        journey_titles = page.locator("#journeyDetailsTable >> thead >> tr >> th")
        passenger_titles = page.locator("#psgnDetailsTable >> thead >> tr >> th")
        journey_records = page.locator("#journeyDetailsTable >> tbody >> tr >> td")
        passenger_records = page.locator("#psgnDetailsTable >> tbody >> tr >> td")

        title_list = iterator(journey_titles) + iterator(passenger_titles)
        record_list = iterator(journey_records) + iterator(passenger_records)
```

Line-by-line explanation:

  • journey_titles and passenger_titles hold the locators for the header cells (th) of the journey details and passenger details tables.
  • journey_records and passenger_records hold the locators for the data cells (td) of the same two tables.
  • The iterator function accepts a locator and returns a list containing the text of every element it matches.
  • We concatenate the two title lists and assign the result to the variable title_list.
  • Similarly, we concatenate the two record lists and assign the result to the variable record_list.
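One caveat: the passenger table has one row per passenger, but its header row appears only once. If a PNR covers several passengers, passenger_records holds several rows' worth of cells, so pairing it position by position with the titles only covers the first passenger. A minimal sketch of one way around this, assuming every passenger row has exactly one cell per header and in header order, splits the flat list of cells into chunks the width of the header row. group_rows is a hypothetical helper, and the column names and values below are made up.

```python
from typing import Dict, List


def group_rows(titles: List[str], cells: List[str]) -> List[Dict[str, str]]:
    """Split a flat list of table cells into one dict per row.

    Assumes each row has exactly len(titles) cells, in header order.
    """
    width = len(titles)
    if width == 0:
        return []
    rows = []
    for start in range(0, len(cells), width):
        row_cells = cells[start:start + width]
        rows.append(dict(zip(titles, row_cells)))
    return rows


# Made-up example: two passengers, three columns each.
titles = ["S.No.", "Booking Status", "Current Status"]
cells = ["1", "CNF/B1/23", "CNF", "2", "CNF/B1/24", "CNF"]
for passenger in group_rows(titles, cells):
    print(passenger)
```

In main(), you would call it as group_rows(iterator(passenger_titles), iterator(passenger_records)) and keep the journey details in their own dictionary.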

Let's look at the iterator function.

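```python
def iterator(locator):
    """Return the cleaned text of every element matched by the locator."""
    lst = []
    for info in range(locator.count()):
        # text_content() can return None, so fall back to an empty string
        raw_text = locator.nth(info).text_content() or ""
        # Remove the newlines and tabs that some cells contain
        text = raw_text.replace("\n", "")
        text = text.replace("\t", "")
        lst.append(text)
    return lst
```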

Line-by-line explanation:

  • Creates an empty list to store the data.
  • raw_text stores the text content of each header cell (th) or data cell (td).
  • Some values, like the Date of Journey and the Train Number, contain newline (\n) and tab (\t) characters. We remove all of them.
  • Appends the cleaned text to the list (lst) and returns the list.
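To try iterator() without opening a browser, you can pass it any object that offers the two Locator methods it uses: count(), and nth(i) returning something with a text_content() method. FakeLocator and FakeElement below are hypothetical test doubles, not part of Playwright. The sketch also shows a variation on the cleanup, " ".join(text.split()), which collapses newlines, tabs and repeated spaces into single spaces. Unlike the original replace() calls, it keeps a space where a newline separated two words.

```python
from typing import List, Optional


class FakeElement:
    """Stands in for locator.nth(i); only text_content() is needed."""

    def __init__(self, text: Optional[str]) -> None:
        self._text = text

    def text_content(self) -> Optional[str]:
        return self._text


class FakeLocator:
    """Hypothetical stand-in offering count() and nth() like a Locator."""

    def __init__(self, texts: List[Optional[str]]) -> None:
        self._texts = texts

    def count(self) -> int:
        return len(self._texts)

    def nth(self, index: int) -> FakeElement:
        return FakeElement(self._texts[index])


def iterator(locator) -> List[str]:
    """Collect cleaned text from every element the locator matches."""
    lst = []
    for info in range(locator.count()):
        raw_text = locator.nth(info).text_content() or ""
        # split() with no argument splits on any whitespace, including
        # \n and \t, so joining with " " leaves single spaces between words.
        lst.append(" ".join(raw_text.split()))
    return lst


cells = FakeLocator(["\n\t12345\t\n", "Example\n\tExpress"])
print(iterator(cells))  # ['12345', 'Example Express']
```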

Step 6: Process the result and print it out:



```python
def main(pnr):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://www.indianrail.gov.in/enquiry/PNR/PnrEnquiry.html?locale=en")
        page.fill("#inputPnrNo", pnr)
        page.locator("#modal1").click()
        page.locator("#CaptchaImgID").screenshot(path="screenshot.png")
        ans = image_to_string("screenshot.png")
        page.fill("#inputCaptcha", str(ans))
        page.locator("#submitPnrNo").click()

        # count() does not wait for elements, so first wait until the results
        # table is visible (this raises a timeout error if it never appears)
        page.locator("#journeyDetailsTable").wait_for()

        journey_titles = page.locator("#journeyDetailsTable >> thead >> tr >> th")
        passenger_titles = page.locator("#psgnDetailsTable >> thead >> tr >> th")
        journey_records = page.locator("#journeyDetailsTable >> tbody >> tr >> td")
        passenger_records = page.locator("#psgnDetailsTable >> tbody >> tr >> td")

        title_list = iterator(journey_titles) + iterator(passenger_titles)
        record_list = iterator(journey_records) + iterator(passenger_records)

        # Pair each title with its record; zip() stops at the shorter list
        data = {title: record for title, record in zip(title_list, record_list)}
        for title, record in data.items():
            print(title, ":", record)
        browser.close()
```
  • Now we have two lists: title_list (all titles) and record_list (all records). Using a dictionary comprehension, we create a dictionary with the titles as keys and their corresponding records as values.
  • Prints each key-value pair.
  • Closes the browser.
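If you want the printed status to be a little easier to scan, a small variation (not the original code) pads every title to the length of the longest one so that the colons line up. format_status is a hypothetical name, and the sample dictionary is made up.

```python
def format_status(data: dict) -> str:
    """Return 'title : value' lines with the titles left-aligned."""
    if not data:
        return ""
    width = max(len(title) for title in data)
    lines = [f"{title.ljust(width)} : {value}" for title, value in data.items()]
    return "\n".join(lines)


sample = {"Train Number": "12345", "Coach": "B1"}
print(format_status(sample))
```

In main(), print(format_status(data)) would replace the for loop.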

Complete code:

```python
from playwright.sync_api import sync_playwright
import pytesseract
from PIL import Image


def iterator(locator):
    """Return the cleaned text of every element matched by the locator."""
    lst = []
    for info in range(locator.count()):
        # text_content() can return None, so fall back to an empty string
        raw_text = locator.nth(info).text_content() or ""
        # Remove the newlines and tabs that some cells contain
        text = raw_text.replace("\n", "")
        text = text.replace("\t", "")
        lst.append(text)
    return lst


def image_to_string(image):
    """Read the captcha equation from the image and return its answer."""
    img = Image.open(image)
    # Path of the Tesseract executable installed earlier
    pytesseract.pytesseract.tesseract_cmd = "C:/Program Files/Tesseract-OCR/tesseract.exe"
    result = pytesseract.image_to_string(img)
    eqn = result
    # Turn "a + b = ?" into "a + b" so that eval() can compute it
    char_remov = ["?", "="]
    for i in char_remov:
        eqn = eqn.replace(i, "")
    # Caution: eval() runs any Python expression, so only use it on this OCR text
    solved = eval(eqn.strip())
    return solved


def main(pnr):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://www.indianrail.gov.in/enquiry/PNR/PnrEnquiry.html?locale=en")
        page.fill("#inputPnrNo", pnr)
        page.locator("#modal1").click()
        page.locator("#CaptchaImgID").screenshot(path="screenshot.png")
        ans = image_to_string("screenshot.png")
        page.fill("#inputCaptcha", str(ans))
        page.locator("#submitPnrNo").click()

        # count() does not wait for elements, so first wait until the results
        # table is visible (this raises a timeout error if it never appears)
        page.locator("#journeyDetailsTable").wait_for()

        journey_titles = page.locator("#journeyDetailsTable >> thead >> tr >> th")
        passenger_titles = page.locator("#psgnDetailsTable >> thead >> tr >> th")
        journey_records = page.locator("#journeyDetailsTable >> tbody >> tr >> td")
        passenger_records = page.locator("#psgnDetailsTable >> tbody >> tr >> td")

        title_list = iterator(journey_titles) + iterator(passenger_titles)
        record_list = iterator(journey_records) + iterator(passenger_records)

        # Pair each title with its record; zip() stops at the shorter list
        data = {title: record for title, record in zip(title_list, record_list)}
        for title, record in data.items():
            print(title, ":", record)
        browser.close()


if __name__ == "__main__":
    pnr = input("Enter PNR: ")
    main(pnr)
```

Output:

Keep a few points in mind when using this program:

  • This is purely an educational tutorial. The program is very basic and has no error handling.
  • It works only if the PNR is valid; entering random data as a PNR will raise an error. You should know how to read, understand and handle those errors.
  • Occasionally, Tesseract may misread the captcha (this happens rarely). In that case, rerun the program.
  • A slow internet connection can lead to a timeout error.
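Because a misread captcha or a slow connection usually just means "try again", one option is to wrap the lookup in a retry loop instead of rerunning the script by hand. The sketch below is a generic, hypothetical helper (with_retries is not from the original article). It assumes the function you pass in raises an exception when an attempt fails, and it waits a little longer after each failure.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    delay: float = 2.0,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call func() until it succeeds or the attempts run out.

    Sleeps delay, 2 * delay, ... seconds between attempts and re-raises
    the last exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)
    raise AssertionError("unreachable")
```

For example, with_retries(lambda: main(pnr), attempts=3). This only helps if a failed attempt actually raises an exception, such as a timeout error while waiting for the result table; a wrong captcha that fails silently will not trigger a retry.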

Keeping the above points in mind, run the program and add error handling to make it more robust. You will learn many different things in the process.

In an upcoming post, I will connect this program to a PyQt app. Do read that article.

Thanks for reading and happy learning.

And yes, follow me if you want more articles like this one.
