Machine Learning @ DKatalis: Generating Synthetic Data with Photoshop and Python For Great Good!

Benjamin Tan Wei Hao
DKatalis

Machine learning is an expensive affair. Training models costs money, and even more when GPUs are involved. However, as most companies that delve into any non-trivial machine learning will find out, it is the data that accounts for a large part of the costs.

In this post, I detail how I used Photoshop and Python to generate thousands (10k+) of synthetic Indonesian identification cards to train a deep learning model for OCR segmentation and recognition.

The Problem Domain

This is an example of an Indonesian identification card, also known as Kartu Tanda Penduduk, or KTP for short.

Source: https://m.suarasindo.com/read-2236-2019-07-28-masyarakat-jangan-sembarang-unggah-data-ktpel-dan-kk-di-internet.html

As part of the on-boarding process for the Jago Bank application, we want to allow users to upload a picture of their KTP and have the important information, such as the NIK (identification number), name, and address, immediately pre-populated in the next step.
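
To make those fields concrete, here is a minimal sketch of how randomized values for them might be produced in Python when building synthetic cards. It is not the pipeline from this post: the sample name and street lists, the function names, and the CSV layout are all hypothetical, and the NIK here is simply 16 random digits rather than a realistically composed number.

```python
import csv
import io
import random

# Hypothetical sample values, for illustration only; not from the original post.
FIRST_NAMES = ["BUDI", "SITI", "AGUS", "DEWI", "RUDI"]
LAST_NAMES = ["SANTOSO", "WIJAYA", "LESTARI", "PRATAMA", "HIDAYAT"]
STREETS = ["JL. MERDEKA", "JL. SUDIRMAN", "JL. DIPONEGORO"]


def random_nik(rng):
    """Return 16 random digits.

    Assumption: only the length matters for this sketch; it does not try
    to reproduce how real NIKs are composed.
    """
    return "".join(rng.choice("0123456789") for _ in range(16))


def random_record(rng):
    """Build one synthetic record with the three fields named above."""
    return {
        "nik": random_nik(rng),
        "name": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
        "address": f"{rng.choice(STREETS)} NO. {rng.randint(1, 200)}",
    }


def records_to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["nik", "name", "address"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


rng = random.Random(42)  # fixed seed so runs are repeatable
records = [random_record(rng) for _ in range(5)]
csv_text = records_to_csv(records)
print(csv_text)
```

Each row could then be rendered onto a card template, one row per generated card. Seeding the generator keeps a batch reproducible, which helps when a bad sample needs to be traced back to its inputs.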

Now, the leap from random KTP pictures to pre-populated fields requires significant data science and engineering effort, but that’s for another blog post. In this post, I…

Published in DKatalis

DKatalis is a highly adaptive tech company, driven to solve problems through tech and data.

Written by Benjamin Tan Wei Hao

Author of The Little Elixir & OTP Guidebook, Mastering Ruby Closures, Building an ML Pipeline in Kubeflow. | Currently: Product Owner at @dkatalis.