
Build a PDF Text Extractor with FastAPI & MongoDB

Saverio Mazza
3 min read · Dec 11, 2023

To build a FastAPI application that extracts text from a PDF and saves it to MongoDB, you’ll need a few Python libraries. Specifically, you’ll use PyPDF2 to handle PDF text extraction and pymongo to interact with MongoDB. The code also imports loguru for logging and requests.

First, you’ll need to install the necessary packages:

pip install "fastapi[all]" PyPDF2 pymongo loguru requests

Then, here is the complete FastAPI application code:

from fastapi import FastAPI, Request, UploadFile, File
import json
import PyPDF2
import pymongo
from loguru import logger
from io import BytesIO
import requests
import os

app = FastAPI()

# MongoDB setup
mongo_uri = os.getenv("MONGO_URI", "your_default_mongodb_uri")
client = pymongo.MongoClient(mongo_uri)
db = client.get_database("your_database")
collection = db.get_collection("your_collection")

@app.post("/process_pdf_url/")
async def process_pdf_url(request: Request):
    message = await request.json()
    logger.info(f"Received message: {message}")

    # Assuming the message contains the URL of the PDF file
    pdf_url = message.get("pdf_url")

    # Download and process the PDF
    return await process_pdf(pdf_url)

@app.post("/process_pdf_file/")
async def process_pdf_file(file: UploadFile = File(...)):
    # Save file locally for processing
    contents = await…
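
The listing above breaks off before the extraction and storage steps, and the `process_pdf` helper it calls is not shown. What follows is a minimal sketch of how those steps might look, not the author's code. The function names `extract_pages`, `build_document`, and `store_document`, as well as the document field names, are assumptions made for illustration. It relies only on `PyPDF2.PdfReader`, `page.extract_text()`, and pymongo's `insert_one`.

```python
from datetime import datetime, timezone
from io import BytesIO


def extract_pages(pdf_bytes: bytes) -> list[str]:
    """Return the text of each page of an in-memory PDF."""
    # Imported here so the helpers below can be used without PyPDF2 installed.
    import PyPDF2

    reader = PyPDF2.PdfReader(BytesIO(pdf_bytes))
    # extract_text() may yield nothing for image-only pages, so fall back to "".
    return [page.extract_text() or "" for page in reader.pages]


def build_document(source: str, pages: list[str]) -> dict:
    """Build the record to store. The field names are hypothetical."""
    return {
        "source": source,
        "page_count": len(pages),
        "text": "\n".join(pages),
        "created_at": datetime.now(timezone.utc),
    }


def store_document(collection, document: dict) -> str:
    """Insert the record into a pymongo collection and return its id as a string."""
    result = collection.insert_one(document)
    return str(result.inserted_id)
```

For an uploaded file, the bytes returned by FastAPI's `await file.read()` could be passed to `extract_pages`, and the resulting page list to `build_document` and `store_document`. Both PyPDF2 and pymongo are blocking libraries, so inside an `async def` handler a large PDF would hold up the event loop while it is parsed and stored.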
