feather-extract is a Python package designed to extract handwritten data from PDF documents, format the extracted data, and save it as an Excel workbook. This package is particularly useful for businesses in the bar and restaurant industry that need to manage inventory.
Getting Started
To get started, open a new notebook at https://colab.research.google.com/, or open a Jupyter notebook or terminal locally, and run the following command:
!pip install feather-extract
(Remove the ! if using a terminal.)
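To confirm that the install worked, you can ask Python for the installed version. This is a minimal sketch using the standard library's importlib.metadata (Python 3.8+); installed_version is a hypothetical helper name chosen for illustration and is not part of feather-extract.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the pip install succeeded, otherwise None.
print(installed_version("feather-extract"))
```

If this prints None, rerun the install command in the same environment the notebook or terminal is using.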
Import the get_form Module
Import the get_form module from the feather-extract package. This module allows you to download a template form to your local directory.
from feather_extract import get_form
blank_form = get_form()
filled_out_form = get_form(filled_out=True)
Blank Form:
Filled Out Form:
These are the two forms loaded into the get_form module. Running the function downloads the form locally as a PDF. For best results, fill out the blank form and scan it into your environment using Dropbox or another scanning service.
Note that the feather_extract package is trained on these forms, so if you would like a form tailored to your specific use case, either schedule a demo at https://www.featherdata.io/ or email me at awsjonathan99@gmail.com.
Import the extract_text_from_document Module
Import the extract_text_from_document module, which allows you to extract the handwritten text from the input document. For our example, we will be using the pre-filled form seen above.
from feather_extract import extract_text_from_document
extracted_text = extract_text_from_document(filled_out_form)
You will be prompted to enter your Azure API key and endpoint. A future version of the package will remove this step, but for now Azure Form Recognizer is the best option for the cost of running the software. You can sign up for a free account here.
How to Find Your API Key and Endpoint
Note that these are example keys for the tutorial and will not work if you use them. Getting your own keys is free.
- Search for “document intelligence (form recognizer)” on Azure and create an instance.
- Open your document intelligence instance (named feather-example) here.
- Click “Manage keys” and view your keys and endpoints. This is what you’ll enter.
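Before pasting the values into the prompt, it can help to sanity-check them for copy and paste mistakes. The sketch below is not part of feather-extract: credential_problems is a hypothetical helper, and it assumes only that the endpoint is an https URL and that the key is a non-empty string without whitespace. The endpoint shown is an illustrative placeholder, not a working value.

```python
from typing import List
from urllib.parse import urlparse


def credential_problems(endpoint: str, key: str) -> List[str]:
    """Return a list of obvious formatting problems (empty if none are found)."""
    problems = []
    parsed = urlparse(endpoint.strip())
    if parsed.scheme != "https":
        problems.append("endpoint should start with https://")
    if not parsed.netloc:
        problems.append("endpoint has no host name")
    if not key.strip():
        problems.append("key is empty")
    elif any(ch.isspace() for ch in key.strip()):
        problems.append("key contains whitespace")
    return problems


# Placeholder values for illustration only; substitute your own.
print(credential_problems("https://feather-example.cognitiveservices.azure.com/", "your-key-here"))
```

An empty list only means the values look well formed; it does not verify them against Azure.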
Import the format_extracted_text Module
Import the format_extracted_text module, which allows you to format the data that you extracted from your document.
from feather_extract import format_extracted_text
formatted_text = format_extracted_text(extracted_text)
Import the save_to_excel Module
Import the save_to_excel module, and save your formatted data to an Excel file. save_to_excel takes two arguments: (formatted_text, name_of_output_file). Your output will be saved locally to your machine.
from feather_extract import save_to_excel
save_to_excel(formatted_text, "example.xlsx")
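If you are unsure where the workbook landed, a small check like the one below can help. It is a sketch that assumes save_to_excel writes relative to the current working directory when given a bare file name; locate_output is a hypothetical helper, not part of feather-extract.

```python
from pathlib import Path


def locate_output(name: str) -> Path:
    """Return the absolute path of a saved workbook, or raise if it is missing."""
    path = Path(name).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No workbook found at {path}")
    return path


# After save_to_excel(formatted_text, "example.xlsx") has run:
# print(locate_output("example.xlsx"))
```

In Colab, the file lives on the notebook's virtual machine, so you may still need to download it from the file browser.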
Original:
Output:
All together:
from feather_extract import *
form = get_form(filled_out=True)
extracted_text = extract_text_from_document(form)
formatted_text = format_extracted_text(extracted_text)
save_to_excel(formatted_text, "your-workbook-name.xlsx")
Hope you enjoyed! Contact me at awsjonathan99@gmail.com with any questions.