
Build a PDF Text Extractor with FastAPI & MongoDB

3 min read · Dec 11, 2023

To build a FastAPI application that extracts text from a PDF and saves it to MongoDB, you’ll need a few Python libraries besides FastAPI itself: PyPDF2 for the text extraction, pymongo to talk to MongoDB, loguru for logging, and requests to download a PDF from a URL.

First, you’ll need to install the necessary packages:

pip install "fastapi[all]" PyPDF2 pymongo loguru requests

Then, here is the complete FastAPI application code:

from fastapi import FastAPI, HTTPException, Request, UploadFile, File
import json
import PyPDF2
import pymongo
from loguru import logger
from io import BytesIO
import requests
import os

app = FastAPI()

# MongoDB setup: the URI comes from the MONGO_URI environment variable.
# The default URI, database name, and collection name are placeholders.
mongo_uri = os.getenv("MONGO_URI", "your_default_mongodb_uri")
client = pymongo.MongoClient(mongo_uri)
db = client.get_database("your_database")
collection = db.get_collection("your_collection")

@app.post("/process_pdf_url/")
async def process_pdf_url(request: Request):
    message = await request.json()
    logger.info(f"Received message: {message}")

    # The JSON body is expected to carry the URL of the PDF file
    pdf_url = message.get("pdf_url")
    if not pdf_url:
        # Reject the request early instead of passing None downstream
        raise HTTPException(status_code=400, detail="Missing 'pdf_url' in request body")

    # Download and process the PDF
    return await process_pdf(pdf_url)

@app.post("/process_pdf_file/")
async def process_pdf_file(file: UploadFile = File(...)):
    # Save file locally for processing
    contents = await…
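
The `process_pdf` helper that `process_pdf_url` awaits is not shown in the listing above. What follows is only a minimal sketch of what such a helper could look like, not the author's implementation. It assumes PyPDF2 3.x (`PdfReader`), downloads the file with `requests`, and runs the blocking calls in worker threads via `asyncio.to_thread` (Python 3.9+). The helper names `download_pdf`, `extract_page_texts`, and `build_document`, the stored field names, and the explicit `collection` parameter are all hypothetical choices made for illustration.

```python
import asyncio
from datetime import datetime, timezone
from io import BytesIO


def download_pdf(pdf_url: str) -> bytes:
    """Fetch the raw PDF bytes (assumes the URL is reachable without auth)."""
    import requests  # imported lazily so the pure helpers work without it

    response = requests.get(pdf_url, timeout=30)
    response.raise_for_status()
    return response.content


def extract_page_texts(pdf_bytes: bytes) -> list[str]:
    """Return the extracted text of each page, in page order."""
    import PyPDF2  # assumes PyPDF2 3.x, where PdfReader is the reader class

    reader = PyPDF2.PdfReader(BytesIO(pdf_bytes))
    # Scanned or image-only pages may yield no text, so fall back to ""
    return [page.extract_text() or "" for page in reader.pages]


def build_document(source: str, page_texts: list[str]) -> dict:
    """Assemble the record to store (field names are illustrative only)."""
    return {
        "source": source,
        "num_pages": len(page_texts),
        "text": "\n".join(page_texts),
        "created_at": datetime.now(timezone.utc),
    }


async def process_pdf(pdf_url: str, collection) -> dict:
    """Download, extract, and store a PDF, then return a short summary."""
    # requests, PyPDF2, and pymongo all block, so keep them off the event loop
    pdf_bytes = await asyncio.to_thread(download_pdf, pdf_url)
    page_texts = await asyncio.to_thread(extract_page_texts, pdf_bytes)
    document = build_document(pdf_url, page_texts)
    result = await asyncio.to_thread(collection.insert_one, document)
    # ObjectId is not JSON-serializable, so return it as a string
    return {"id": str(result.inserted_id), "num_pages": document["num_pages"]}
```

With this sketch, the endpoint would call `await process_pdf(pdf_url, collection)`, passing the module-level `collection` from the MongoDB setup. Passing the collection in explicitly, instead of reading the global, is a design choice that lets the helper be exercised with a stub collection in tests.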


Written by Saverio Mazza
