OCR project [part 1]: simple introduction

Arthur Tkachenko
Published in groceristar, Jan 2, 2019
Credit: BodyBuilding.com

With this series of articles, I want to explain the idea of converting images into data that can be stored in our database.

Groceristar is a project that we're building. Briefly, it's a shopping list (grocery list management) app.
In the beginning, we had only one grocery list in our database: the Ultimate Grocery List template.

Ultimate Grocery List, author Bill Keage

As you can see, it's a huge template. It contains a lot of data (this is why it's called "Ultimate").
But in most everyday cases, we don't need all the items on this list. While we have only one template, users spend a lot of time
deleting unnecessary items from their personal lists.

Having more templates that people can use in the GroceriStar project would be a good feature.

Besides, you can find a lot of grocery lists online.

We plan to add more specific templates for different use cases or categories. Grocery lists can be specific to vegan food, healthy habits, or food for diabetics, and such lists will be helpful to our users.

At the time of publishing this article, we have 8–15 grocery list templates in our database. We converted the data to JSON format and use these templates in our projects.
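To make "templates in JSON format" concrete, here is a minimal sketch of how one template might be stored and loaded in Python. The field names (`name`, `departments`, `items`), the sample items, and the helpers `load_template` and `count_items` are hypothetical assumptions for illustration; the actual GroceriStar schema is not described in this article.

```python
import json

# Hypothetical template layout: these field names are assumptions,
# not the actual GroceriStar schema.
TEMPLATE_JSON = """
{
  "name": "Vegan Grocery List",
  "departments": [
    {"name": "Produce", "items": ["Spinach", "Avocado", "Tomatoes"]},
    {"name": "Pantry", "items": ["Oats", "Chickpeas", "Lentils"]}
  ]
}
"""


def load_template(raw: str) -> dict:
    """Parse a JSON template string into a Python dictionary."""
    return json.loads(raw)


def count_items(template: dict) -> int:
    """Count every item across all departments of a template."""
    return sum(len(dept["items"]) for dept in template["departments"])


template = load_template(TEMPLATE_JSON)
print(template["name"], count_items(template))  # Vegan Grocery List 6
```

Grouping items by department keeps a small, category-specific template easy to read and easy to extend without touching the other departments.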

During my research, I found at least 200 different grocery lists and saved the URLs of some of them.
Copy-pasting all this data into our database by hand would be a very bad idea. This can be done better with machine learning algorithms.
So this is the main idea behind creating our image parsing script.
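The script itself is not described here, so the following is only a sketch of one possible cleanup step, assuming an OCR engine (for example, Tesseract) has already turned a grocery-list image into plain text. The helper `ocr_text_to_template`, the "lines ending with a colon are department headings" rule, and the `Other` fallback department are hypothetical assumptions, and this step is rule-based rather than machine learning.

```python
import json
import re

# Assumption: a leading bullet ("-", "*", "•"), checkbox ("[ ]", "[x]"),
# or number ("1.", "2)") followed by whitespace is list decoration, not item text.
BULLET = re.compile(r"^(?:[-*\u2022]|\[[ xX]?\]|\d+[.)](?=\s))\s*")


def ocr_text_to_template(text: str, name: str) -> dict:
    """Hypothetical helper: turn raw OCR text into a template-like dict."""
    departments = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines produced by OCR
        if line.endswith(":"):
            # Assumed convention: a line ending with ":" starts a new department
            current = {"name": line[:-1].strip(), "items": []}
            departments.append(current)
            continue
        item = BULLET.sub("", line, count=1).strip()
        if not item:
            continue  # the line contained only a bullet or checkbox
        if current is None:
            # Items that appear before any heading go into a catch-all department
            current = {"name": "Other", "items": []}
            departments.append(current)
        current["items"].append(item)
    return {"name": name, "departments": departments}


sample = """Produce:
- Apples
- Bananas
Dairy:
[x] Milk
1. Eggs
"""
template = ocr_text_to_template(sample, "Scanned list")
print(json.dumps(template, indent=2))
```

For the sample text, this yields two departments: Produce (Apples, Bananas) and Dairy (Milk, Eggs). The output has the same shape as a stored template, so a reviewed result could be saved to the database directly.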

Our basic flow

To continue, read the second part of the series:

Third article:

Additional articles related to machine learning:
