Software Release: Batch Extraction of Content from Specified PDF Regions to Excel, and Batch Renaming of PDFs by First-Page Title

PointCloud-Slam-Image-Web3
Published in Python & Other
2 min read · Apr 18, 2024

First of all, the PDF must be a text-based (electronic) document, not a scanned image or a file whose text cannot be selected.

Requirement 1: I have a large number of electronic PDF documents in the same format, and I need to extract numbers or text from specific regions of each one.

Requirement 2: I have a batch of PDF documents whose file names are garbled, and I need to batch rename them according to the title on the first page of each PDF.

Requirement 1 idea: Pick any one PDF file as a sample and use the code to draw a box around each area to be extracted. Save the coordinates of these boxes, then batch process every PDF, extracting the text or numbers found at the saved coordinates.
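The workflow above can be sketched roughly as follows. This is a minimal illustration, not the code shipped in the download: it assumes the PyMuPDF library (`fitz`) for reading PDFs, stores the boxes in a JSON file, and writes a CSV file (which Excel opens directly) instead of a native .xlsx workbook. The function names, the region names and the file names are all hypothetical.

```python
import csv
import json
from pathlib import Path


def save_regions(regions, path):
    """Save named boxes as JSON, e.g. {"invoice_no": [x0, y0, x1, y1]}."""
    Path(path).write_text(json.dumps(regions, indent=2), encoding="utf-8")


def load_regions(path):
    """Load the boxes saved by save_regions()."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def extract_regions(pdf_path, regions, page_index=0):
    """Return {region name: text found inside its box} for one page of one PDF."""
    import fitz  # PyMuPDF; imported here so the JSON/CSV helpers work without it

    with fitz.open(str(pdf_path)) as doc:
        page = doc[page_index]
        # PyMuPDF boxes are in points, with the origin at the top-left of the page.
        return {
            name: page.get_text("text", clip=fitz.Rect(*box)).strip()
            for name, box in regions.items()
        }


def batch_extract(pdf_dir, regions, out_csv):
    """Write one CSV row per PDF in pdf_dir; return the number of PDFs processed."""
    count = 0
    # utf-8-sig adds a BOM so that Excel detects the encoding correctly.
    with open(out_csv, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=["file"] + list(regions))
        writer.writeheader()
        for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
            row = extract_regions(pdf, regions)
            row["file"] = pdf.name
            writer.writerow(row)
            count += 1
    return count
```

A typical run would call `save_regions(...)` once after marking the sample file, then `batch_extract("pdfs", load_regions("regions.json"), "result.csv")` for the whole folder.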

The shortcomings of this approach and the points that need attention:

1 The data to be extracted must be in the same position in every file of the batch. For example, if the number to extract from the first PDF file is at the coordinates [100, 100], it must be at that same position in every subsequent file. If the position changes, the required data will not be extracted.

2 If the extracted text is incomplete, the selected box is probably a little too small. My code includes a function for enlarging a given region individually.
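One way to enlarge a box that cuts off text is to pad it by a fixed margin on every side. The sketch below is an assumption about how such a helper could look, not the function from the released tool; `expand_box`, `margin` and `page_size` are hypothetical names, and boxes are taken to be `(x0, y0, x1, y1)` tuples.

```python
def expand_box(box, margin=0.0, page_size=None):
    """Grow a box (x0, y0, x1, y1) by `margin` on every side.

    If page_size=(width, height) is given, the result is clamped to the page.
    """
    x0, y0, x1, y1 = box
    x0, y0, x1, y1 = x0 - margin, y0 - margin, x1 + margin, y1 + margin
    if page_size is not None:
        width, height = page_size
        x0, y0 = max(0.0, x0), max(0.0, y0)
        x1, y1 = min(width, x1), min(height, y1)
    return (x0, y0, x1, y1)
```

Padding only the region that comes back truncated, rather than all of them, keeps neighbouring boxes from picking up each other's text.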

Requirement 2 idea: The names of a batch of PDF documents are all garbled, and the files need to be renamed according to the title on the first page of each PDF. This is quite simple: parse the PDF file, take the first line of its content, and rename the file accordingly. The code is not complicated, so it is not included on this page.
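For readers who want a starting point, the renaming idea could be sketched like this. It is not the code from the released tool: it assumes PyMuPDF (`fitz`) for text extraction, treats the first non-empty line of page 1 as the title, and all function names are hypothetical. Text comes back in the order it is stored in the file, which is usually but not always top to bottom.

```python
import re
from pathlib import Path


def title_to_filename(title, max_len=100):
    """Turn a title into a safe file stem (no characters that Windows forbids)."""
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', " ", title)
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    return cleaned[:max_len].rstrip(" .")


def first_line(pdf_path):
    """Return the first non-empty text line of page 1, or "" if there is none."""
    import fitz  # PyMuPDF; imported here so the helpers work without it

    with fitz.open(str(pdf_path)) as doc:
        text = doc[0].get_text("text") if doc.page_count else ""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""


def rename_by_title(pdf_dir, dry_run=True):
    """Rename every PDF in pdf_dir after its title; return (old, new) name pairs."""
    planned, taken = [], set()
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        stem = title_to_filename(first_line(pdf))
        if not stem:
            continue  # no usable title: leave the file alone
        target = pdf.with_name(stem + ".pdf")
        n = 2
        # Add " (2)", " (3)", ... when another file already uses the name.
        while target != pdf and (target.exists() or target in taken):
            target = pdf.with_name(f"{stem} ({n}).pdf")
            n += 1
        taken.add(target)
        if target != pdf:
            planned.append((pdf.name, target.name))
            if not dry_run:
                pdf.rename(target)
    return planned
```

With `dry_run=True` (the default here) nothing is renamed, so the returned list can be checked before running again with `dry_run=False`.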

Feel free to try it out~

Download link: https://pan.baidu.com/s/1WQQ8kaDilaagjoK5IrYZzA

Extraction code: 1111

If you have any questions or custom development requirements, you can contact me by email or WhatsApp: lonlonago@foxmail.com


About the author: familiar with point cloud data and image processing, interested in web3, available for custom development and consulting work, and happy to work remotely. Contact: lonlonago@foxmail.com