Reading table content from an image with a local OCR solution

Mika Hänninen
6 min read · Mar 26, 2024


This article discusses the challenge of automating desktop applications, particularly when conventional methods such as Windows and Java accessibility interfaces, along with keyboard and mouse actions, are insufficient. This often happens because certain application components, like tables, may use custom class interfaces that lack the necessary accessibility features, rendering them inaccessible for standard automation approaches.

A common workaround for this issue is using image-based text recognition. However, this method is not ideal as it is prone to errors, such as misreading characters, leading to inaccuracies in the automation process.

The focus of this article is a method for extracting table-like information from images, specifically targeting tables with headers. Because the data may be sensitive from a privacy and security standpoint, the method uses local processing instead of cloud-based solutions such as Base64.ai. Secure deployments of such services are possible, but they are outside the scope of this article.

The presented solution utilizes Tesseract, an Optical Character Recognition (OCR) engine, and leverages Robocorp’s open-source tools for setting up the project environment, offering a more secure and error-resistant alternative to traditional automation methods. Robocorp is part of Sema4.ai.

Overall description of the technique

  • the target image should contain only the table itself, i.e. not the whole application window and not a screenshot of the entire desktop
  • the image is preprocessed by enlarging it, adjusting its brightness and contrast, and converting it to black and white (every pixel is set to either black or white based on a threshold)
  • the preprocessed image is passed to a pytesseract.image_to_data() call along with the OCR configuration
  • the resulting texts are placed into a list of dictionaries containing each text and its coordinates, and texts are combined by their top coordinates to form rows of text
  • the target table is split into columns by its headers by providing coordinate information (a manual step)
  • the header row is recognized by listing some of the texts appearing in it
  • precise cell coordinates are calculated from the previously given column coordinate information and the coordinates of the found header row
  • all found texts are iterated row by row, and a text is appended to the resulting table row if its area falls within a calculated column area
  • ocr_table() returns the table, which can be written into a JSON file
  • optionally, the results are post-processed, e.g. string and/or number formatting, translations, etc.
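The row-grouping step above can be sketched in plain Python. Given word boxes such as those derived from a pytesseract.image_to_data() result (here reduced to dictionaries with text, left, and top keys), words whose top coordinates fall within a small tolerance are merged into one row. This is a minimal sketch, not the repo's actual implementation, and the tolerance value is an assumption that would need tuning per image:

```python
def group_words_into_rows(words, tolerance=5):
    """Cluster OCR word boxes into rows by their top coordinate.

    words: list of dicts with "text", "left" and "top" keys, e.g.
    reduced from a pytesseract.image_to_data() result.
    tolerance: max vertical distance (px) for words on the same row.
    """
    rows = []
    # Sort top-to-bottom, then left-to-right within a row.
    for word in sorted(words, key=lambda w: (w["top"], w["left"])):
        if rows and abs(word["top"] - rows[-1][0]["top"]) <= tolerance:
            rows[-1].append(word)  # close enough vertically: same row
        else:
            rows.append([word])    # start a new row
    return rows


words = [
    {"text": "Date", "left": 10, "top": 62},
    {"text": "Payee", "left": 90, "top": 63},
    {"text": "1/15/2023", "left": 10, "top": 87},
]
rows = group_words_into_rows(words)
# Two rows: the header ("Date", "Payee") and the first entry.
```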

Example code for calling ocr_table:

table_conf = TableConfiguration()
table_conf.headers = ["Date", "Payee"]
table_conf.set_column("Date", left=-2, width=80)
table_conf.set_column("Payee", left=-5, width=350)
table_conf.set_column("Category", left=-5, width=350)
table_conf.set_column("Amount", left=-10, width=90)
table_conf.set_column("Balance", left=-5, width=70)

table_conf.set_margins(bottom=-10, top=30)


table = ocr_table(
    configuration=table_conf,
    image_in="images/table.png",
    result_json="output/result_table.json",
)

Detailed description of different steps

I have chosen the Quicken application and its table of checking account events for demonstration purposes.

The necessary dependencies and tools are installed using rcc. Here is what my task environment looks like (defined in conda.yaml):

# For more details on the format and content:
# https://github.com/robocorp/rcc/blob/master/docs/recipes.md#what-is-in-condayaml
# Tip: Adding a link to the release notes of the packages helps maintenance and security.

channels:
- conda-forge

dependencies:
- python=3.10.12 # https://pyreadiness.org/3.10
- pip=23.2.1 # https://pip.pypa.io/en/stable/news
- robocorp-truststore=0.8.0 # https://pypi.org/project/robocorp-truststore/
- tesseract=5.3.0 # https://github.com/tesseract-ocr/tesseract
- pip:
  - rpaframework==28.4.1 # https://rpaframework.org/releasenotes.html
  - rpaframework-recognition==5.2.3
  - robocorp==2.0.0

As a first step, we need to identify the header row of the table. This is done by listing enough header texts to match that unique row of text; in my case the words “Date” and “Payee” are enough.

table_conf = TableConfiguration()
table_conf.headers = ["Date", "Payee"]
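The matching itself can be illustrated with a small helper: a row counts as the header when every configured header word appears on it. The function name and the row representation below are my own for illustration, not taken from the repo:

```python
def is_header_row(row_texts, headers):
    """Return True if every configured header word appears on the row.

    row_texts: the texts found on one recognized row of the image.
    headers: the header words configured in TableConfiguration.
    """
    texts = set(row_texts)
    return all(header in texts for header in headers)


is_header_row(["Date", "Payee", "Category"], ["Date", "Payee"])  # → True
is_header_row(["1/15/2023", "Declutter"], ["Date", "Payee"])     # → False
```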

Next, we need to define the columns of the table using the headers as anchors. The column definitions might need adjustment so that each column area covers the text within a table cell (note: the code could be improved so that headers are not required for column definitions). As a general rule, adjustments are made by modifying the left margin of the column and giving the width of the column area, as shown below.

table_conf.set_column("Date", left=-2, width=80)
table_conf.set_column("Payee", left=-5, width=350)
table_conf.set_column("Category", left=-5, width=350)
table_conf.set_column("Amount", left=-10, width=90)
table_conf.set_column("Balance", left=-5, width=70)
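The effect of left and width can be illustrated with a hypothetical helper that turns the header word positions into pixel ranges and then decides which column a recognized word falls into. This is a sketch of the idea, not the repo's actual implementation:

```python
def build_column_areas(header_positions, column_conf):
    """Compute (left, right) pixel ranges for each column.

    header_positions: x-coordinate of each header word from OCR,
    e.g. {"Date": 10, "Payee": 100}.
    column_conf: offsets/widths as given to set_column(),
    e.g. {"Date": {"left": -2, "width": 80}}.
    """
    areas = {}
    for name, conf in column_conf.items():
        start = header_positions[name] + conf["left"]
        areas[name] = (start, start + conf["width"])
    return areas


def column_for(word_left, areas):
    """Return the name of the column whose area contains the word's left edge."""
    for name, (lo, hi) in areas.items():
        if lo <= word_left < hi:
            return name
    return None  # word falls outside every defined column


areas = build_column_areas(
    {"Date": 10, "Payee": 100},
    {"Date": {"left": -2, "width": 80}, "Payee": {"left": -5, "width": 350}},
)
column_for(12, areas)  # → "Date"
```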

In my case I need to discard the topmost row, which is used for entering new data into the table; the first real entry is the one beginning with the date 1/15/2023. This can be done by adjusting the margins of the table.

table_conf.set_margins(bottom=-10, top=30)

As a last step, we give the file path of the image containing the table and, optionally, a filename for the resulting JSON if we want to save it.

table = ocr_table(
    configuration=table_conf,
    image_in="images/table.png",
    result_json="output/result_table.json",
)

For the first recognition attempts it is useful to view the preview image after it has been preprocessed for Tesseract, and also the image showing the identified columns and texts drawn over the table image.

table_conf.show_pre_ocr_image = True
table_conf.show_post_recognition_image = True
Preprocessed image for tesseract (preprocessed_image_for_tesseract.png)

The preprocessing can be fine-tuned with several parameters described in the configurations.py file. There is also an ImagePreprocessor.py script, which can be used to see the effects of changing the brightness, contrast, and pixel thresholding values on the input image. The values can be copied from this dialog to the clipboard and pasted into the Python tasks.py file in a preformatted way.
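The black-and-white conversion mentioned earlier is simple thresholding: every grayscale pixel above the threshold becomes white, the rest black. A pure-Python sketch of the idea follows; the real code works on image objects (e.g. via Pillow), and the threshold value 128 is just an example:

```python
def binarize(gray_pixels, threshold=128):
    """Map grayscale values (0-255) to pure black (0) or white (255).

    gray_pixels: a flat list of grayscale pixel values.
    With Pillow the same effect can be achieved with
    image.point(lambda p: 255 if p > threshold else 0).
    """
    return [255 if p > threshold else 0 for p in gray_pixels]


binarize([30, 200, 128, 129])  # → [0, 255, 0, 255]
```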

Columns, rows and text drawn over original image (table_rows_and_columns_identified.png)

In the latter image the dots denote all texts that have been recognized, the colored sections are the different column areas, and a horizontal red line is drawn over each recognized row of texts.

The resulting result_table.json is below.

[
  {
    "Date": "1/15/2023",
    "Payee": "Declutter Cleaning Inc",
    "Category": "Fees & Charges:Service Fee",
    "Amount": "“85 76",
    "Balance": "-2 147 03",
    "x": 612,
    "y": 62
  },
  {
    "Date": "1/14/2023",
    "Payee": "Robocorp",
    "Category": "Fees & Charges:Service Fee",
    "Amount": "-49 99",
    "Balance": "-2,061 of",
    "x": 612,
    "y": 87
  },
  {
    "Date": "1/13/2023",
    "Payee": "Baskin Robbins",
    "Category": "Food & Dining:Restaurants",
    "Amount": "-12 00",
    "Balance": "-2,011 28",
    "x": 612,
    "y": 112
  },
  {
    "Date": "1/12/2023",
    "Payee": "Dahlia's Café",
    "Category": "Food & Dining:Restaurants",
    "Amount": "-98 45",
    "Balance": "-1,999 28",
    "x": 612,
    "y": 137
  },
  {
    "Date": "1/11/2023",
    "Payee": "Kinkos",
    "Category": "Office Supplies",
    "Amount": "230 18",
    "Balance": "-1,900 83",
    "x": 612,
    "y": 162
  },
  {
    "Date": "1/10/2023",
    "Payee": "spectrum",
    "Category": "Bills & Utilities:Internet",
    "Amount": "-100 43",
    "Balance": "“2.131 O1",
    "x": 611,
    "y": 187
  },
  {
    "Date": "1/9/2023",
    "Payee": "Marriott",
    "Category": "Travel:Hotel",
    "Amount": "“565 22",
    "Balance": "-2,030 58",
    "x": 612,
    "y": 212
  },
  {
    "Date": "1/8/2023",
    "Payee": "ABC Liquor",
    "Category": "Food & Dining:Groceries",
    "Amount": "-125 98",
    "Balance": "-1,465 36",
    "x": 612,
    "y": 237
  },
  {
    "Date": "1/7/2023",
    "Payee": "The CPA People",
    "Category": "Financial advisor",
    "Amount": "-250 00",
    "Balance": "-1,339 38",
    "x": 612,
    "y": 262
  },
  {
    "Date": "1/6/2023",
    "Payee": "AMC Movie",
    "Category": "Entertainment",
    "Amount": "-30 00",
    "Balance": "-1,089 38",
    "x": 612,
    "y": 287
  },
  {
    "Date": "1/5/2023",
    "Payee": "Bank Of America",
    "Category": "bank fee",
    "Amount": "-15 23",
    "Balance": "-1,059 38",
    "x": 612,
    "y": 312
  },
  {
    "Date": "1/4/2023",
    "Payee": "Bank Of America",
    "Category": "bank fee",
    "Amount": "-3 54",
    "Balance": "-1,044 15",
    "x": 612,
    "y": 337
  },
  {
    "Date": "1/3/2023",
    "Payee": "American Airlines",
    "Category": "Air Travel",
    "Amount": "“540 23",
    "Balance": "-1,040 61",
    "x": 611,
    "y": 362
  },
  {
    "Date": "1/2/2023",
    "Payee": "Advertisers R Us",
    "Category": "Advertising",
    "Amount": "-500 35",
    "Balance": "-500 35",
    "x": 612,
    "y": 387
  }
]

As can be seen in the result, the values in the “Amount” and “Balance” columns still require more work. This can be done by applying number formatting to the values and/or OCR'ing the column images again; to keep this article focused, that post-processing has been left out.
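As one example of such post-processing, a small cleanup function could replace the curly quote that Tesseract misread in place of a minus sign and restore the lost decimal separator. The regex below is a hypothetical sketch tuned to this particular output, not a general-purpose solution:

```python
import re


def clean_amount(raw):
    """Normalize an OCR'd amount like '“85 76' or '-49 99' to '-85.76'/'-49.99'.

    Assumes the last two digits are cents and that a leading curly quote
    is a misread minus sign (true for this sample, not in general).
    """
    text = raw.replace("“", "-").replace("”", "-").strip()
    # Join the cents back with a decimal point: "-49 99" -> "-49.99"
    return re.sub(r"(\d)\s+(\d{2})$", r"\1.\2", text)


clean_amount("-49 99")   # → "-49.99"
clean_amount("“85 76")   # → "-85.76"
```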

The current solution supports taking screenshots of specific columns by adding the following table configuration:

table_conf.column_to_crop.append("Amount")

which results in the following image:

column_amount.png

I hope this article gives you some inspiration, or even directly usable code, for your text recognition task. The method will be improved over time.

The code and data are available in a GitHub repo where you can download them.


Mika Hänninen

Supporting people to fulfill their robotic dreams. Provider of solutions mostly in Python and Robot Framework syntax. Customer success engineer at Robocorp Inc.