Building an ID Document Recognition App with the FaceOnLive SDK in Android

Extract details, scan QR/bar codes of IDs from 200+ countries, completely on-device

Published in

FaceOnLive


Scanning documents and extracting text from them can be a crucial operation in certain software systems. For OCR, there exist several solutions like Tesseract, Text Recognition V2 API from MLKit, EasyOCR, Amazon Textract etc. Once text detection is performed, the next task is to determine which text corresponds to which entity present in the document, for instance determining the name, date of birth or nationality. The problem becomes more difficult when the system under consideration allows users to submit multiple types of documents belonging to different countries.

FaceOnLive’s ID Recognition SDK provides a unified API for these tasks: given an image of a document, it returns the document’s details as JSON. Moreover, it supports a large number of document types from 200+ countries. I got a chance to try FaceOnLive’s ID Recognition Android SDK, and I would like to share my experience with the SDK and its integration in an existing Android app.

GitHub: FaceOnLive/ID-Card-Passport-Recognition-SDK-Android, On-Device ID Card & Passport & Driver License Recognition SDK for Android (github.com)

A demo of the ID Document Recognition SDK on a U.S. driving license

How does ID recognition work and where do we use it?

ID recognition and verification can be used to extract personal details from an ID card, so that the user does not have to enter them manually for an online service. The document can also be checked for authenticity without involving additional human supervisors.

Customer onboarding and eKYC

Organizations that provide services directly to end-users may require government-issued IDs to authenticate a person’s identity. These IDs are also an important part of the eKYC (electronic Know Your Customer) process that institutions need to follow to prepare a detailed record of the customer in their own databases or for the government.

Most organizations rely on their workforce to verify ID cards and type the information into their software systems. Some organizations prompt users to enter their ID card details manually, which can hurt the user experience when many personal details are required. Manual entry also increases the chances of human error or forgery.

Automated ID scanning can solve these issues to a great extent and ensure that onboarding processes are fast and efficient, both for customers and organizations.

Real-world Applications

Hotels, car rentals and healthcare providers can quickly scan details from documents like driving licenses, insurance membership cards or passports.
Age-gated websites or entertainment venues can verify government-issued IDs within seconds, read the date of birth field to check the age of the individual, and determine whether access should be granted.
Financial institutions like banks and loan providers can decrease the time required for customer onboarding by scanning documents faster and without needing physical copies. This enables services to be executed completely online.

How does ID recognition work?

ID recognition utilizes OCR (optical character recognition) and machine learning to read text from the given document images and categorize it into the required fields (like name, date of birth, sex etc.).

OCR is a field of computer science that comprises techniques which detect text and its position in an image source. The source can be a static image or a sequence of frames from a live camera feed. It involves the following steps:

1. Preprocessing: Images from the real world are noisy, which makes text detection difficult for a computer program. Several image processing techniques are applied to the image to remove noise and other anomalies from it.
2. Segmentation: From the pre-processed image, the next step is to determine the regions that contain text. For instance, a driving license may contain five or six text blobs (regions) that indicate the driver’s name, date of birth etc. For each blob, the program also needs to determine character boundaries, which define the regions where a single character is present.
3. Classification: For each text blob detected in step 2, we now perform character recognition. Each character region, which is a sub-part of the image, is compared against an existing database of characters. Once the characters are determined, we join them into a string that represents the text present in each blob.
4. Post-processing: The detected text, along with its position (x, y coordinates), is encapsulated in JSON or in an object and provided as output to the user program.
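
The preprocessing and segmentation steps can be illustrated with a toy sketch. It is not how the FaceOnLive SDK (or any production OCR engine) works internally: it only assumes a grayscale image given as rows of 0-255 intensities, a fixed threshold, and characters that are separated by blank columns. The names binarize and segmentColumns are made up for this example.

```kotlin
// Toy illustration of OCR preprocessing and segmentation.
// Assumption: the image is a grayscale grid, 0 = black ink, 255 = white paper.

// Preprocessing: turn every pixel into ink (true) or background (false)
fun binarize(gray: List<IntArray>, threshold: Int = 128): List<BooleanArray> =
    gray.map { row -> BooleanArray(row.size) { x -> row[x] < threshold } }

// Segmentation: find runs of consecutive columns that contain ink.
// Each run is a candidate character region (start column..end column).
fun segmentColumns(binary: List<BooleanArray>): List<IntRange> {
    if (binary.isEmpty()) return emptyList()
    val width = binary[0].size
    val regions = mutableListOf<IntRange>()
    var start = -1
    for (x in 0 until width) {
        val hasInk = binary.any { row -> row[x] }
        if (hasInk && start == -1) {
            start = x                      // a new region begins
        } else if (!hasInk && start != -1) {
            regions.add(start until x)     // the region ended at column x - 1
            start = -1
        }
    }
    // Close a region that touches the right edge of the image
    if (start != -1) regions.add(start until width)
    return regions
}
```

For a two-row image whose columns 1, 2 and 5 contain dark pixels, segmentColumns(binarize(image)) yields the two regions 1..2 and 5..5. A real pipeline would then pass each region to a classifier, which is the part that a trained model handles.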

Features of the SDK

The SDK can detect IDs ranging from driving licenses to passports from 200+ countries, and runs completely on-device (no internet/server connection needed).
For each document detected, the SDK extracts a set of details, for instance the picture of the person, their name and date of birth.

Android Setup

In this section, we will discuss the steps needed to integrate FaceOnLive ID recognition in an Android app.

1. Getting Started with the SDK

Once we’ve received the API key, we can activate the SDK with the IDSDK.setActivation method by passing the API key as a String. The method returns a result code, which can be checked to ensure that the SDK was activated successfully.

class MainActivity : AppCompatActivity() {

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)

        var ret = IDSDK.setActivation(
            "SZ1cK2k0hyNYEzIeP0Sm+9p3Cf4QC1uKop+1N5Iv4K5jYZqFX0TAx1Brvs/tGg9bvvsXiaZmgKNX\n" +
                    "ESO3KFIn1bEmjIcAfahk5r7yh4nHoxX1q/EJ9/QxRsCjyqLsCJjI4Ri/eLk7owSY305ilwIY//Ae\n" +
                    "O+Q6PaDsVWVJdmKoDo0bJkAkVoMSLzN3fJqLp7EVGmMSrFm33gxZ8Ajxtz7yIL/q1sPol9OwLhkZ\n" +
                    "X8Enfutqx8SUHrF2c7pFqoaOIfolGgjp6Ra9/rk5SvWuLBQoPufETwrDKEBcvuxfxgQcGnMdPphM\n" +
                    "hEDaXMr0rB/O48FC1jEn/dEipArR0U+CC0K4yg=="
        )
        // ret represents the result of the initialization
        if (ret == IDSDK.SDK_SUCCESS) {
            IDSDK.init(this)
        }
        else {
            var message = "SDK initialization failed"
            when (ret) {
                IDSDK.SDK_LICENSE_KEY_ERROR -> {
                    message = "License key error!"
                }
                IDSDK.SDK_LICENSE_APPID_ERROR -> {
                    message = "App ID error!"
                }
                IDSDK.SDK_LICENSE_EXPIRED -> {
                    message = "License key expired!"
                }
                IDSDK.SDK_NO_ACTIVATED -> {
                    message = "Activation failed!"
                }
                IDSDK.SDK_INIT_ERROR -> {
                    message = "Engine init error!"
                }
            }
            // Show an alert dialog to the user
            // informing about the error
            showAlertDialog(message)
        }
        // other MainActivity code ...
    }
}

The SDK is now ready to be used for ID recognition. In the next step, we’ll look at how to detect IDs from the camera feed and from an image selected by the user from their device (photo gallery).

2.1. Recognizing ID cards from the camera feed

The FaceOnLive ID Recognition SDK can detect ID cards in a real-time camera feed and provide the bounding box coordinates that define the region where the ID card is placed in the camera frame. If we use CameraX to set up an ImageAnalysis.Analyzer listener, which is called for each frame of the camera feed, we get an ImageProxy (a wrapper around android.media.Image) representing the frame. We can read an RGBA bitmap from it and pass it to the SDK with IDSDK.idcardRecognition(frameBitmap).

// FrameAnalyser is attached to the camera feed with
// ImageAnalysis.setAnalyzer, and the use case is bound with
// ProcessCameraProvider.bindToLifecycle
// Use setOutputImageFormat(ImageAnalysis.OUTPUT_IMAGE_FORMAT_RGBA_8888)
// while building the ImageAnalysis use case
class FrameAnalyser : ImageAnalysis.Analyzer {

    override fun analyze(image: ImageProxy) {

        // Copy the RGBA pixels of the frame into a Bitmap
        // (assumes no row padding, i.e. rowStride == width * 4)
        val frameBitmap =
                Bitmap.createBitmap(
                    image.width,
                    image.height,
                    Bitmap.Config.ARGB_8888
                )
        frameBitmap.copyPixelsFromBuffer(image.planes[0].buffer)

        // Release the frame so that CameraX can deliver the next one
        image.close()

        // Pass the frameBitmap to the SDK for ID recognition
        // Result is JSON string containing all information
        // about the document
        val result = IDSDK.idcardRecognition(frameBitmap)
        processResult(result)
    }
}

The SDK returns a JSON String containing all necessary information about the document. The results are parsed in the processResult function, which will be used in a later section.
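
One detail worth checking in the analyzer: the buffer of an image plane may contain padding at the end of each row, in which case its row stride is larger than width * 4 bytes and the pixels cannot be copied into a Bitmap as they are. The following is a minimal sketch of removing that padding, assuming an RGBA buffer with 4 bytes per pixel. removeRowPadding is a hypothetical helper, not part of CameraX or the SDK.

```kotlin
import java.nio.ByteBuffer

// Hypothetical helper: copies an RGBA plane with row padding into a
// tightly packed buffer (width * 4 bytes per row).
// rowStride is the number of bytes between the starts of two rows,
// as reported by the plane's rowStride property in CameraX.
fun removeRowPadding(
    plane: ByteBuffer,
    width: Int,
    height: Int,
    rowStride: Int
): ByteBuffer {
    val bytesPerRow = width * 4
    require(rowStride >= bytesPerRow) { "rowStride is smaller than one row of pixels" }
    val packed = ByteBuffer.allocate(bytesPerRow * height)
    val row = ByteArray(bytesPerRow)
    val source = plane.duplicate()      // leave the caller's position untouched
    for (y in 0 until height) {
        source.position(y * rowStride)  // jump to the start of row y
        source.get(row, 0, bytesPerRow) // read only the pixel bytes
        packed.put(row)
    }
    packed.rewind()
    return packed
}
```

In the analyzer, the packed buffer could then be passed to copyPixelsFromBuffer instead of the raw plane buffer, using image.planes[0].rowStride as the stride. When the stride already equals width * 4, the copy is unnecessary and the plane buffer can be used directly.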

2.2. Recognizing ID cards from user-selected images

To select an image from the user’s gallery, we can use the PickVisualMedia contract, which opens a photo picker on the device. The selected image’s Uri is returned to the application and read into a Bitmap with BitmapFactory.decodeStream. The Bitmap is then passed to the IDSDK.idcardRecognition method, and the results are returned as a JSON String, similar to step 2.1.

val pickMediaLauncher =
    rememberLauncherForActivityResult(
        contract = ActivityResultContracts.PickVisualMedia()
    ) {
        if (it != null) {
            val bitmap = BitmapFactory.decodeStream(contentResolver.openInputStream(it))
            // Pass the selected image to the SDK for ID recognition
            // Result is JSON string containing all information
            // about the document
            val result = IDSDK.idcardRecognition(bitmap)
            processResult(result)
        }
    }
pickMediaLauncher.launch(
    PickVisualMediaRequest(ActivityResultContracts.PickVisualMedia.ImageOnly)
)
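
Photos picked from the gallery can be much larger than what recognition needs, and decoding them at full resolution may use a lot of memory. A common approach on Android is to decode a downsampled image by setting BitmapFactory.Options.inSampleSize. The sketch below only computes such a sample size as a power of two. The function name is an assumption for illustration, and the resolution the SDK actually needs is not stated in this article.

```kotlin
// Returns the largest power-of-two sample size that keeps both
// dimensions of the decoded image at or above the requested size.
fun calculateInSampleSize(
    srcWidth: Int,
    srcHeight: Int,
    reqWidth: Int,
    reqHeight: Int
): Int {
    require(reqWidth > 0 && reqHeight > 0) { "requested size must be positive" }
    var sampleSize = 1
    // Keep doubling while the next halving step still satisfies the request
    while (srcWidth / (sampleSize * 2) >= reqWidth &&
        srcHeight / (sampleSize * 2) >= reqHeight
    ) {
        sampleSize *= 2
    }
    return sampleSize
}
```

To use it, the image is first decoded with inJustDecodeBounds set to true, which fills outWidth and outHeight without allocating pixels, and then decoded a second time with inSampleSize set to the computed value.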

3. Parsing the results

We define a new method processResult(result) which parses the JSON String. We can first check whether the document has been identified, using the Quality of the prediction and the presence of the MRZ code. The SDK also returns Unknown for Document Name if the document was not identified.

fun processResult(result: String?) {
    if (result == null) {
        return
    }
    val jsonResult = JSONObject(result)

    // Get the position of the bounding box
    val positionObj = jsonResult.getJSONObject("Position")
    val x1 = positionObj.getInt("x1")
    val y1 = positionObj.getInt("y1")
    val x2 = positionObj.getInt("x2")
    val y2 = positionObj.getInt("y2")
    // Draw a bounding box with x1, y1, x2 and y2

    // Check if documentName is valid or the result contains a MRZ code,
    // and that the quality score is good
    val quality = jsonResult.getInt("Quality")
    val hasMrz = jsonResult.has("MRZ")
    val documentName = jsonResult.getString("Document Name")
    if (quality > 86 && (documentName != "Unknown" || hasMrz)) {
        // Document recognized successfully
    }
}
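
To draw the bounding box over a camera preview or an image view, the Position coordinates usually have to be mapped from image pixels to view pixels. The following sketch assumes that Position is expressed in pixels of the bitmap passed to the SDK (this is not confirmed by the SDK documentation), and that the image is stretched to fill the view. Box and scaleToView are hypothetical names.

```kotlin
// Hypothetical container for the "Position" object of the result
data class Box(val x1: Int, val y1: Int, val x2: Int, val y2: Int)

// Maps a box from image coordinates to view coordinates, assuming the
// image is stretched to fill the whole view (no letterboxing or cropping).
fun scaleToView(
    box: Box,
    imageWidth: Int,
    imageHeight: Int,
    viewWidth: Int,
    viewHeight: Int
): Box {
    require(imageWidth > 0 && imageHeight > 0) { "image size must be positive" }
    val sx = viewWidth.toFloat() / imageWidth
    val sy = viewHeight.toFloat() / imageHeight
    // Fractions are truncated towards zero when converting to pixels
    return Box(
        (box.x1 * sx).toInt(),
        (box.y1 * sy).toInt(),
        (box.x2 * sx).toInt(),
        (box.y2 * sy).toInt()
    )
}
```

If the preview crops or letterboxes the frame, an offset has to be added to the scaling, so the sketch should be adapted to the scale type of the view in use.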

To get the images detected in the document, we can read the Images key of the JSON result:

val imagesObj = jsonResult.get("Images") as JSONObject
val imagesKeys: MutableIterator<String> = imagesObj.keys()

while (imagesKeys.hasNext()) {
    // image is stored as a base64 string
    // decode it to a bitmap
    val imageKey = imagesKeys.next()
    val imageValue = imagesObj.get(imageKey).toString()
    // java.util.Base64 requires API level 26 or higher
    val imageBytes = Base64.getDecoder().decode(imageValue)

    try {
        val bitmap =
            BitmapFactory.decodeByteArray(imageBytes, 0, imageBytes.size)
        if (imageKey == getString(R.string.portrait)) {
            // image contains a person
        } else if (imageKey == getString(R.string.document)) {
            // image contains the document
        }
    } catch (e: Exception) {
        e.printStackTrace()
    }
}

For the given image of a U.S. driving license, we get the following result:

{
    "Document Discriminator": "00\/00\/0000\/ANFD\/",
    "Date of Expiry": "2014-08-31",
    "Date of Birth": "1977-08-31",
    "Document Number": "I1234568",
    "Address": "2570 24TH STREET,ANYTOWN, CA 95818",
    "Address Street": "2570 24TH STREET",
    "Address City": "ANYTOWN",
    "Address Jurisdiction Code": "CA",
    "Address Postal Code": "95818",
    "Hair Color": "Brown",
    "Date of Issue": "2009-08-31",
    "Eyes Color": "Brown",
    "Weight": "57 kg",
    "Height": "165 cm",
    "Sex": "F",
    "DL Restriction Code": "NONE,VETERAN",
    "Given Names": "IMA",
    "Surname": "CARDHOLDER",
    "DL Endorsed": "NONE",
    "DL Class": "C",
    "Issuing State Code": "USA",
    "Issuing State Name": "United States",
    "Full Name": "CARDHOLDER IMA",
    "Document Name": "Driver Licence",
    "Quality": 99,
    "Position": {
        "x1": 74,
        "y1": 482,
        "x2": 645,
        "y2": 825
    },
    "Images": {
        "Portrait": "<base64-image-here>",
        "Document": "<base64-image-here>"
    }
}

The details extracted from a driving license document by the ID Recognition SDK
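
The dates in the result are ISO-8601 strings, so they can be parsed directly with java.time (available on Android from API level 26, or on older versions with core library desugaring). As a sketch of the age-gating use case mentioned earlier, the following helpers compute the holder’s age and check whether the document has expired. The function names are made up for this example, and it assumes the SDK always returns dates as yyyy-MM-dd, as in the sample above.

```kotlin
import java.time.LocalDate
import java.time.Period

// Age in full years on the given day, from a "Date of Birth" value
fun ageInYears(dateOfBirth: String, today: LocalDate = LocalDate.now()): Int =
    Period.between(LocalDate.parse(dateOfBirth), today).years

// True if the "Date of Expiry" value lies before the given day
fun isExpired(dateOfExpiry: String, today: LocalDate = LocalDate.now()): Boolean =
    LocalDate.parse(dateOfExpiry).isBefore(today)
```

With the values from the sample result, ageInYears(jsonResult.getString("Date of Birth")) >= 18 could gate access, and isExpired(jsonResult.getString("Date of Expiry")) could reject outdated documents. LocalDate.parse throws a DateTimeParseException for malformed input, so a caller may want to catch it for documents where a date could not be read.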

You can view a complete list of detected attributes here:

Conclusion

I hope the blog was informative, and the readers will consider using the FaceOnLive SDK for ID recognition in their Android apps. For doubts and suggestions, you can write a comment here on Medium, or connect with me directly. Have a great day ahead!