How to Extract Data From Hundreds of PDFs in 2 Minutes Using Python

This could be a solution if you regularly extract data from more than one PDF file, e.g. monthly or weekly.

Mohamed Abdelsalam
The Startup


Photo by Omid Kashmari on Unsplash

Part 1

Introduction

Converting a PDF file to text is not as hard nowadays as it was a few years ago, and even converting a non-text PDF file has become much easier. Today I’ll explain how to read a couple of lines and numbers from each file, instead of converting the file to Word and then copying and pasting from it.

The Story

My mate was struggling every month during the monthly closing process (closing here refers to closing the accounting books). I took a closer look at that work to understand the obstacle and to see it from a different angle. After a while, I understood the problem and started looking for a solution, and the solution was Python.

  1. First Step
Have a look at the folder

You should check the folder and make sure that all the files are in PDF format only. You could also handle this in code by testing each file name with the endswith(".pdf") string method, but let’s keep it simple.
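If the folder cannot be cleaned up by hand, the file-name check mentioned above could be sketched roughly as follows. This is only a minimal illustration, not the method used later in this article: the function name list_pdf_files is made up for this example, and the case-insensitive comparison is an assumption about how the files might be named.

```python
import os


def list_pdf_files(folder):
    """Return the sorted paths of the PDF files directly inside `folder`.

    Hypothetical helper for illustration; sub-folders are not searched.
    """
    pdf_paths = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        # Lower-case the name first so "REPORT.PDF" is accepted too
        # (an assumption about how the files may be named).
        if os.path.isfile(path) and name.lower().endswith(".pdf"):
            pdf_paths.append(path)
    return sorted(pdf_paths)
```

Calling list_pdf_files with the path of your folder returns only the PDF files, so anything else that slipped into the folder is simply ignored.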
