Using feather-extract in Python 3.10

Jonathan
3 min read · May 31, 2024


feather-extract is a Python package designed to extract handwritten data from PDF documents, format the extracted data, and save it as an Excel workbook. This package is particularly useful for businesses in the bar and restaurant industry that need to manage inventory.

Getting Started

To get started, open a new notebook at https://colab.research.google.com/, or open a Jupyter notebook or terminal locally, and run the following command:

!pip install feather-extract

(Remove the ! if using a terminal)
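
To confirm the install worked before moving on, a minimal sketch like the following can help. It uses only the standard library. The helper name installed_version is hypothetical, and the distribution name feather-extract is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("feather-extract")
if found is None:
    print("feather-extract is not installed in this environment")
else:
    print(f"feather-extract {found} is installed")
```

If the first message appears in a notebook right after installing, restarting the runtime and running the cell again is usually worth a try.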

Import the get_form Function

Import the get_form function from the feather-extract package. Calling it downloads a template form to your local directory.

from feather_extract import get_form
blank_form = get_form()
filled_out_form = get_form(filled_out=True)

Blank Form:

Filled Out Form:

These are the two forms available through get_form. Running the function downloads the form locally as a PDF. For best results, fill out the blank form and scan it into your environment using Dropbox or another scanning service.

Note that the feather_extract package is trained on these forms. If you would like a form tailored to your specific use case, either schedule a demo at https://www.featherdata.io/ or email me at awsjonathan99@gmail.com.

Import the extract_text_from_document Function

Import the extract_text_from_document function, which extracts the handwritten text from the document you pass in. For our example, we will use the pre-filled form shown above.

from feather_extract import extract_text_from_document
extracted_text = extract_text_from_document(filled_out_form)

You will be prompted to enter your Azure API key and endpoint. We plan to remove this step in future versions of the package, but for now Azure Form Recognizer is the best option for the cost of running the software. You can sign up for a free account here.

How to Find Your API Key and Endpoint

Note that these are example keys for the tutorial and will not work if you use them. Getting your own keys is free.

  1. Search for “document intelligence (form recognizer)” on Azure and create an instance.
  2. Open your document intelligence instance (named feather-example) here.
  3. Click “Manage keys” and view your keys and endpoints. This is what you’ll enter.
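
A common slip at the prompt is pasting the key where the endpoint goes, or dropping the https:// prefix. The sketch below is one way to sanity-check the endpoint string first. looks_like_endpoint is a hypothetical helper, not part of feather-extract, and the example URL is illustrative only. It checks the shape of the URL and does not contact Azure.

```python
from urllib.parse import urlparse


def looks_like_endpoint(value):
    """Return True if value is an https URL that includes a host name."""
    parsed = urlparse(value.strip())
    return parsed.scheme == "https" and bool(parsed.netloc)


# Illustrative endpoint for a resource named feather-example; yours will differ.
example_endpoint = "https://feather-example.cognitiveservices.azure.com/"
print(looks_like_endpoint(example_endpoint))   # True
print(looks_like_endpoint("feather-example"))  # False: no scheme or host
```

A string that passes this check can still be the wrong endpoint, so treat it as a typo guard and nothing more.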

Import the format_extracted_text Function

Import the format_extracted_text function, which formats the data you extracted from your document.

from feather_extract import format_extracted_text
formatted_text = format_extracted_text(extracted_text)

Import the save_to_excel Function

Import the save_to_excel function and save your formatted data to an Excel file. save_to_excel takes two arguments: the formatted text and the name of the output file. The output is saved locally to your machine.

from feather_extract import save_to_excel
save_to_excel(formatted_text, "example.xlsx")
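
To check that a workbook actually landed on disk, a rough sketch using only the standard library is shown here. It relies on the fact that an .xlsx file is a zip archive, and it assumes the workbook part sits at xl/workbook.xml, which is where common Excel writers place it. looks_like_xlsx is a hypothetical helper, not part of feather-extract.

```python
import zipfile
from pathlib import Path


def looks_like_xlsx(path):
    """Return True if path exists and is a zip archive containing a workbook part."""
    path = Path(path)
    if not path.is_file() or not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # Assumption: the workbook part uses the conventional xl/workbook.xml name.
        return "xl/workbook.xml" in archive.namelist()


print(looks_like_xlsx("example.xlsx"))
```

This only confirms that the file has the expected container structure. Opening it in Excel is still the real test of the contents.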

Original:

Output:

All together:

from feather_extract import (
    get_form,
    extract_text_from_document,
    format_extracted_text,
    save_to_excel,
)

form = get_form(filled_out=True)
extracted_text = extract_text_from_document(form)
formatted_text = format_extracted_text(extracted_text)
save_to_excel(formatted_text, "your-workbook-name.xlsx")
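
If you run the pipeline repeatedly, writing to the same file name each time may replace earlier output (an assumption about how save_to_excel behaves, not something confirmed here). One way around that is to stamp each file name with the time of the run. timestamped_name below is a hypothetical helper, not part of feather-extract.

```python
from datetime import datetime
from pathlib import Path


def timestamped_name(base, when=None):
    """Return the file stem of base plus a YYYYMMDD-HHMMSS stamp and an .xlsx suffix.

    Any directory part of base is dropped, so the result is a bare file name.
    """
    when = when or datetime.now()
    stem = Path(base).stem
    return f"{stem}-{when:%Y%m%d-%H%M%S}.xlsx"


print(timestamped_name("your-workbook-name.xlsx", datetime(2024, 5, 31, 9, 30, 0)))
# your-workbook-name-20240531-093000.xlsx
```

The returned string can then be passed as the second argument to save_to_excel in place of a fixed name.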

Hope you enjoyed! Contact me at awsjonathan99@gmail.com with any questions.
