Build a PDF Text Extractor with FastAPI & MongoDB
3 min read · Dec 11, 2023
To create a FastAPI application that extracts text from a PDF and saves it to MongoDB, you’ll need a few additional Python libraries: PyPDF2 to handle PDF text extraction, pymongo to interact with MongoDB, loguru for logging, and requests to download PDFs from a URL. Let’s build the complete application:
First, you’ll need to install the necessary packages:
```bash
pip install "fastapi[all]" PyPDF2 pymongo loguru requests
```
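To confirm the packages are importable before running the app, a small standard-library check like the one below can help. It is only a sketch: `missing_modules` and `REQUIRED_MODULES` are illustrative names, and the list simply mirrors the third-party modules the application code imports (note that `fastapi[all]` is imported as plain `fastapi`).

```python
from __future__ import annotations

import importlib.util

# Illustrative list: the third-party modules imported by the application code.
REQUIRED_MODULES = ["fastapi", "PyPDF2", "pymongo", "loguru", "requests"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found on this interpreter."""
    # find_spec() returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

Running this with the same interpreter you will use for the app avoids the common mistake of installing into one virtual environment and running from another.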
Then, here is the complete FastAPI application code:
```python
from fastapi import FastAPI, File, HTTPException, Request, UploadFile
import json
import PyPDF2
import pymongo
from loguru import logger
from io import BytesIO
import requests
import os

app = FastAPI()

# MongoDB setup: set MONGO_URI in the environment; the fallback assumes a local server
mongo_uri = os.getenv("MONGO_URI", "mongodb://localhost:27017")
client = pymongo.MongoClient(mongo_uri)
db = client.get_database("your_database")
collection = db.get_collection("your_collection")


@app.post("/process_pdf_url/")
async def process_pdf_url(request: Request):
    message = await request.json()
    logger.info(f"Received message: {message}")
    # Assuming the message contains the URL of the PDF file, e.g. {"pdf_url": "https://..."}
    pdf_url = message.get("pdf_url")
    if not pdf_url:
        # Reject requests without a URL instead of passing None downstream
        raise HTTPException(status_code=400, detail="Missing 'pdf_url' in request body")
    # Download and process the PDF
    return await process_pdf(pdf_url)


@app.post("/process_pdf_file/")
async def process_pdf_file(file: UploadFile = File(...)):
    # Save file locally for processing
    contents = await…
```