Question Answering Made Easy from PDF Documents Using the BERT Model!

Anuradha Karuppasamy
FalabellaTechnology
2 min read · Oct 28, 2021

Have you ever thought you could get answers to your questions from a PDF without searching it manually?

Yes, it is possible, using the BERT model.

Suppose we need to get information out of PDF files such as invoices, medical documents, bank documents, annual reports, quarterly reports, or corporate documents from different companies. Instead of going through enormous documents and manually searching for answers to specific questions, we can get contextual answers to those questions using the BERT model.

SOLUTION:

You need the following packages:

cdQA (Closed Domain Question Answering system). This package is built on top of the Hugging Face transformers library.

pandas, obviously.

From cdQA you will use the following pieces:

pdf_converter: a function in one of cdQA's sub-packages that takes PDF files and converts them into a pandas DataFrame. Converters for other formats are also available.

QAPipeline: the QA pipeline class, imported from cdQA's pipeline module. You build the pipeline and fit it on your documents, using either the pretrained model or one fine-tuned on a custom dataset.

download_model: downloads the pretrained model (the BERT SQuAD 1.1 model, that is, BERT trained on the SQuAD dataset).

SQuAD is the Stanford Question Answering Dataset; its questions are posed on Wikipedia articles.
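Since a custom dataset for fine-tuning is expected to follow the SQuAD layout, it may help to see what one record looks like. The sketch below builds a single SQuAD 1.1 style entry using only the standard library. The helper name make_squad_example and the invoice text are made up for illustration; they are not part of cdQA or of any real dataset.

```python
import json


def make_squad_example(title, context, question, answer_text, qa_id="q1"):
    """Build a one-question SQuAD 1.1 style record (hypothetical helper)."""
    # SQuAD stores the character offset at which the answer starts in the context.
    start = context.find(answer_text)
    if start == -1:
        raise ValueError("answer text must appear verbatim in the context")
    return {
        "version": "1.1",
        "data": [
            {
                "title": title,
                "paragraphs": [
                    {
                        "context": context,
                        "qas": [
                            {
                                "id": qa_id,
                                "question": question,
                                "answers": [
                                    {"text": answer_text, "answer_start": start}
                                ],
                            }
                        ],
                    }
                ],
            }
        ],
    }


# Illustrative, invented content: one paragraph and one question about it.
example = make_squad_example(
    title="Invoice 001",
    context="The invoice total is 250 USD, due on 30 November.",
    question="What is the invoice total?",
    answer_text="250 USD",
)
print(json.dumps(example, indent=2))
```

Each answer is a span of its paragraph, which is why the helper refuses answers that do not occur verbatim in the context.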

Steps to follow in order:

1. Download the pretrained model (BERT SQuAD 1.1) or use a custom one

2. Download all the PDF documents

3. Convert those documents to a pandas DataFrame using pdf_converter

4. Build your own Q&A pipeline using a custom dataset or the SQuAD dataset

5. View the title and paragraphs in the DataFrame

6. Set the column width of the DataFrame so that entire paragraphs are visible

7. Create the QA pipeline

8. Fit the reader with a custom dataset to fine-tune the model

9. Fit the retriever: fit the pipeline on the downloaded corpus

10. Set the query: query = "put the question you want to ask of the documents here"

11. result = cdqa_pipeline.predict(query, n)

The last call returns the top n results.
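The steps above can be sketched roughly as follows, assuming cdQA and pandas are installed. The directory paths are placeholders, and the reader file name bert_qa.joblib is an assumption that depends on the cdQA version you download from; check your models folder after the download. The fine-tuning step (step 8) is left out, and format_prediction is a hypothetical helper added here only for printing.

```python
import os


def build_pipeline(pdf_dir="./data/pdf/", model_dir="./models"):
    """Steps 1 to 9 without fine-tuning; paths are placeholder assumptions."""
    # Imported lazily so this file can be loaded even where cdQA is not installed.
    import pandas as pd
    from cdqa.pipeline import QAPipeline
    from cdqa.utils.converters import pdf_converter
    from cdqa.utils.download import download_model

    # Step 1: fetch the pretrained BERT reader trained on SQuAD 1.1.
    download_model(model="bert-squad_1.1", dir=model_dir)

    # Step 3: turn every PDF in the folder into rows of a DataFrame.
    df = pdf_converter(directory_path=pdf_dir)

    # Steps 5 and 6: show whole paragraphs instead of truncated cells.
    pd.set_option("display.max_colwidth", None)
    print(df.head())

    # Step 7: create the pipeline around the downloaded reader.
    # The file name below is an assumption and may differ between versions.
    pipeline = QAPipeline(reader=os.path.join(model_dir, "bert_qa.joblib"))

    # Step 9: fit the retriever on the converted documents.
    pipeline.fit_retriever(df=df)
    return pipeline


def format_prediction(prediction):
    """Render one prediction, assumed to start with (answer, title, paragraph)."""
    answer, title, paragraph = prediction[:3]
    return f"answer: {answer}\ntitle: {title}\nparagraph: {paragraph}"


# Steps 10 and 11, left commented out because they need the model and PDFs:
# cdqa_pipeline = build_pipeline()
# query = "put the question you want to ask of the documents here"
# for prediction in cdqa_pipeline.predict(query, n_predictions=3):
#     print(format_prediction(prediction))
```

Keeping the download, conversion, and fitting inside one function means the slow work happens once, and the returned pipeline can then be queried as many times as needed.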

The code can be customized to automatically answer a whole list of questions.
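One way to automate this is to loop over the questions and collect the top answers for each. This is a minimal sketch: answer_all is a hypothetical helper, and it assumes the pipeline object has a predict method that accepts n_predictions and returns a list of tuples starting with the answer and the document title, as the cdQA pipeline is expected to do.

```python
def answer_all(pipeline, questions, top_n=1):
    """Ask every question and collect the top_n answers for each (hypothetical helper)."""
    rows = []
    for question in questions:
        # Assumed shape: a list of tuples whose first two items are answer and title.
        predictions = pipeline.predict(question, n_predictions=top_n)
        for rank, prediction in enumerate(predictions, start=1):
            rows.append(
                {
                    "question": question,
                    "rank": rank,
                    "answer": prediction[0],
                    "title": prediction[1],
                }
            )
    return rows
```

The returned list of dictionaries can be passed straight to pandas.DataFrame if you would like to review or export the answers as a table.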

Happy coding! All the best in getting the results you need!
