From Scribbles to Summaries: Enhancing OCR Models with GPT-Edit for Handwritten Notes

Brinnae Bent, PhD
Published in Edge Analytics · Mar 8, 2023

Header image created with DALL-E 2 by the author

*** EDIT: As of 3/23/23, OpenAI has discontinued GPT Edit. You can use GPT-4 to format and summarize notes in one go with the following prompt: “You are a note formatter and summarizer. You format notes as Markdown and fix grammar and spelling. Then you append a 3 bullet summary to the end of the note.” ***
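For readers taking this updated route, here is a minimal sketch of how that prompt could be wired up as a system message with the pre-1.0 `openai` Python package. The helper name and response handling are our assumptions, not part of the original workflow:

```python
# Sketch: one-shot format-and-summarize with GPT-4, using the system
# prompt quoted above. Assumes the pre-1.0 `openai` Python package and
# an OPENAI_API_KEY in the environment.

SYSTEM_PROMPT = (
    "You are a note formatter and summarizer. You format notes as Markdown "
    "and fix grammar and spelling. Then you append a 3 bullet summary to "
    "the end of the note."
)

def build_chat_request(note_text: str) -> dict:
    """Assemble parameters for openai.ChatCompletion.create (pre-1.0 API)."""
    return {
        "model": "gpt-4",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": note_text},
        ],
    }

# The actual call (requires network access and an API key):
#   import openai
#   response = openai.ChatCompletion.create(**build_chat_request(ocr_text))
#   formatted_note = response["choices"][0]["message"]["content"]
```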

In the age of digitization, many still prefer taking notes by hand rather than typing. This is great until a co-worker asks you to send your notes from last week’s meeting, but the notes only exist on paper. Thumbing back through your notebook to transcribe (and reformat) those notes to a computer is a painful process. Advances in AI can help with this.

In this blog, we’ll show our solution for automatically converting handwritten notes to a usable, reformatted text summary using a common OCR-based handwriting-to-text application and GPT-Edit.

What is GPT-Edit?

GPT-Edit is one of the newest models from OpenAI. Currently in beta release, GPT-Edit is a powerful tool for editing text, fixing spelling errors and grammatical mistakes, and reformatting text. The model takes the text to be edited and an instruction prompt as inputs, and outputs the edited text.

Example instructions for GPT-Edit include:

  • “Fix the spelling mistakes”
  • “Format like a letter”
  • “Change text into first person”

You can learn more in OpenAI's documentation for the GPT-Edit API, or you can test out GPT-Edit in the OpenAI Playground.
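As a rough sketch, a request to the Edit endpoint with the pre-1.0 `openai` Python package looked like this. The sample input text is ours, and the commented-out call requires an API key (the endpoint has since been discontinued, per the note at the top):

```python
# Sketch: a minimal GPT-Edit request (pre-1.0 `openai` Python package).
params = {
    "model": "text-davinci-edit-001",
    "input": "Helo world, thsi is a tset.",     # text to be edited
    "instruction": "Fix the spelling mistakes",  # one of the example instructions
}

# The actual call (requires an API key; endpoint now discontinued):
#   import openai
#   response = openai.Edit.create(**params)
#   edited_text = response["choices"][0]["text"]
```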

The challenge with optical character recognition models

Optical character recognition (OCR) models are a class of models that predict characters, i.e. letters and numbers, from images. If you are a data scientist or an ML engineer, chances are one of the first deep learning models you built was an OCR model trained on the MNIST handwritten digits dataset; it's a favorite for introductory courses and tutorials. OCR models are widely used in applications ranging from data entry automation to document verification and processing to license plate recognition.

We’ll show an example of OCR used to convert images of handwritten notes to machine-readable text and highlight areas for improvement. While the MNIST dataset is a fairly simple and highly curated setting for developing toy models, real-world applications tend to be more difficult. For example, the position of characters provides important context for how the machine text should be organized.

Many commercial applications exist for handwriting transcription, including Microsoft OneNote, PDFElement, and Evernote. Applications for the iPhone and iPad that convert handwriting to text include GoodNotes 5, Notability, Notes Plus, Pen to Print, Text Scanner, and WritePad for iPad, in addition to the native iOS Notes application. For Android users, handwriting conversion apps include Adobe Scan, CamScanner, Google Keep, Readiris, and Smart Lens. While these technologies are widely available and commonly used, we have found that the output text is often poorly formatted and riddled with grammatical errors, adding friction to note-taking and review.

In the figure below, we show an example handwritten note on the left. On the right, we show the transcription from the native Notes app on iOS. Observe the grammar and formatting problems in the transcription.

(Left) Handwritten note; (Right) Corresponding machine-readable text

Using GPT-Edit to improve the quality of transcribed text

We hypothesized that results from OCR models could be improved using GPT-Edit. To test this hypothesis, we fed the transcribed text into GPT-Edit (text-davinci-edit-001).

We added line breaks to the input text to tell GPT-Edit where the line breaks occurred in the output of the OCR model (note: these are not necessarily the line breaks in our original handwritten text).

Here is the edited response from GPT-Edit:

Raw output from GPT-Edit model
Markdown output from GPT-Edit model

The result from GPT-Edit looks much more similar to the original handwritten note than the output from the OCR model alone. The grammar and spelling are improved; line breaks are true to the original text; and the Markdown formatting makes the note more readable. This note could be more easily shared and understood than the output from the OCR model alone.

Challenges with using GPT-Edit

The results from GPT-Edit are highly stochastic. We have found that the model requires prompt engineering and experimentation to produce more repeatable results. Experimenting with the temperature and top_p parameters is another way to control randomness and diversity, respectively. You can pass these parameters directly in the API call.
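A sketch of how those parameters might be passed is below. The parameter names come from the pre-1.0 `openai` package; the helper function and the specific values shown are illustrative assumptions, not settings from our experiments:

```python
# Sketch: adding sampling controls to a GPT-Edit request.
def build_edit_request(text: str, instruction: str,
                       temperature: float = 0.2, top_p: float = 1.0) -> dict:
    """Assemble Edit-endpoint parameters. A lower temperature makes edits
    more deterministic; a lower top_p narrows the pool of sampled tokens."""
    return {
        "model": "text-davinci-edit-001",
        "input": text,
        "instruction": instruction,
        "temperature": temperature,
        "top_p": top_p,
    }

# e.g. a low-randomness request, aiming for repeatable formatting:
request = build_edit_request(
    "meetig notes 1/30/23",  # hypothetical OCR output
    "Fix the spelling mistakes and format as Markdown",
    temperature=0.0,
)
# The actual call would then be: openai.Edit.create(**request)
```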

Fine-tuning the model is potentially another way to generate more repeatable results for particular use cases. For example, the model could be tuned for a specific type of formatting like Markdown, or for your personalized note-taking style. While fine-tuning is not currently supported by the GPT-Edit API as it is for the GPT base models, we are eagerly anticipating the release of fine-tuning for GPT-Edit.

One step further — note summarization using GPT-3

Can we use the output of OCR + GPT-Edit as the input in GPT-3 (text-davinci-003) in order to summarize our note? Let’s try it!
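A sketch of that summarization step with the pre-1.0 `openai` package follows. The prompt wording matches the one we used; the helper name, `max_tokens` value, and response handling are assumptions for illustration:

```python
# Sketch: summarizing the GPT-Edit output with GPT-3 (text-davinci-003).
def build_summary_request(note_text: str) -> dict:
    """Assemble parameters for openai.Completion.create (pre-1.0 API)."""
    return {
        "model": "text-davinci-003",
        "prompt": "Summarize the following note:\n\n" + note_text,
        "max_tokens": 256,  # illustrative cap on the summary length
    }

# The actual call (requires an API key):
#   import openai
#   response = openai.Completion.create(**build_summary_request(edited_note))
#   summary = response["choices"][0]["text"].strip()
```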

Summary from GPT-3 using GPT-Edit output as the input along with the prompt “summarize the following note”:

At the meeting on 1/30/23, Bob proposed a 3 point plan to build cool stuff, work with awesome people, and eat lots of pizza. Bob was assigned tasks to write a blog, edit a blog, and order a pizza. His favorite pizza toppings are pineapple, ham, and mushrooms.

Conclusions

We hypothesized that we could use the brand-new GPT-Edit model from OpenAI to improve the output of OCR models on the task of transcribing handwritten notes to text. In this blog, we demonstrated this, using a common OCR-based handwriting-to-text application and the GPT-Edit model to automatically convert a handwritten note into a usable, reformatted note that can be easily shared with others. Additionally, we were able to summarize the note with GPT-3, taking “scribbles” to summaries!

What use case do you have for GPT-Edit?

We want to hear from you! How do you plan to use GPT-Edit or other large language models? Leave us a comment or send us an email at info@edgeanalytics.io. While you’re here, check out our blog series on Getting the Most Out of GPT-3-based Text Classifiers: Part 1, Part 2, Part 3.

GPT-3 at Edge Analytics

Edge Analytics has helped multiple companies build solutions that leverage GPT-3. More broadly, we specialize in data science, machine learning, and algorithm development both on the edge and in the cloud. We provide end-to-end support throughout a product’s lifecycle, from quick exploratory prototypes to production-level AI/ML algorithms. We partner with our clients, who range from Fortune 500 companies to innovative startups, to turn their ideas into reality. Have a hard problem in mind? Get in touch at info@edgeanalytics.io.
