Uncovering AI Insights from a Single PDF: Part 4 — Conclusion
To Add or Not to Add a Text Splitter
If you haven’t already, I highly recommend going through Part 1, Part 2 and Part 3 before reading this final part.
Just to put things into perspective, let’s take a quick look at the system we started with.
What changes will go into version 5 (versions 3 and 4 were covered in Part 3 of this series)? Here’s the list:
- Text Splitter
- Fixed — Warning Messages from Logs
- Summarizer??
Text Splitter
In the previous versions, we used PDF extractors (PyPDF and pdfplumber) to read PDF files page by page. We then used the content of each page as one single chunk.
Since LLMs work with tokens and LLM providers charge per token, passing a huge chunk of data to the LLM is a costly business. Another drawback of that approach is that a full page carries a lot of irrelevant content for the LLM to parse through, which can hurt the quality of its answers.
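To make the idea of splitting concrete, here is a minimal sketch in plain Python. It is not the splitter used in this series, only an illustration: the function name `split_text` and the `chunk_size` and `overlap` parameters are hypothetical, and it counts characters rather than tokens to keep things simple.

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters so that a sentence cut
    at a chunk boundary still appears with some context in the next chunk.
    Hypothetical helper for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be between 0 and chunk_size - 1")

    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    # Stand-in for the text extracted from one PDF page (1,200 characters).
    page_text = "abcdefghij" * 120
    for i, chunk in enumerate(split_text(page_text)):
        print(i, len(chunk))
```

With chunks like these, only the few pieces relevant to a question need to be sent to the LLM instead of the whole page, which is where the saving in tokens comes from.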