Android Development

Building a Simple Document Scanner in Android

Learning by Developing

Shubham Panchal
Geek Culture



I’ve been engrossed in machine learning for a while: training models, cleaning and processing data, and deploying them. I thought it would be great to take on a new project that would polish my Android development skills. As I’ve been trying my hand at OpenCV for some time, I decided to build a document scanner for Android, knowing full well that a number of libraries already do it better. Here’s an overview of the project and the tech stack used to build the app.

Contents

  1. The App Overview
  2. Developing the Document Extraction Algorithm with OpenCV in Python
  3. Developing the Server for Document Extraction with FastAPI in Python
  4. Creating a Streamlit web app for Document Extraction and a Docker image
  5. Implementing the Document Extraction Algorithm in Kotlin with OpenCV Android (on-device solution)
  6. Calling the API from the Android app (API-based solution)
  7. Implementing GitHub Actions for building releases of both apps

1. The App Overview

So, we’ll be developing an app that can extract (or crop) a document from a picture supplied by the user. To extract the document from the image, we’ll use an image processing pipeline implemented in OpenCV.

The image processing can run on-device with OpenCV’s Android SDK, or the app can POST the image to a server and fetch back the coordinates of the document. So we have two options for document extraction: an on-device solution and an API-based solution.

2. Developing the Document Extraction Algorithm with OpenCV in Python

OpenCV is a cross-platform image processing library written in C++. Instead of using fancy ML models that are heavy (in both memory and compute), we’ll use classical image processing techniques that manipulate various properties of the image to extract the desired information.

The algorithm we develop will be used in the server (for the API-based solution) as well as in the Android app (for the on-device solution). Since using OpenCV from Python is arguably the easiest, we’ll prototype the algorithm in Python.

import numpy as np
import cv2

# Returns the coordinates ( x , y , w , h ) for the bounding box of the document in the given
# image. This image processing algorithm has a number of limitations like:
# 1. It requires high contrast between the document and the background to detect the edges
# 2. The document should be contained within the image only ( the four vertices of the document must be present
# in the input image )
def get_rect(img):

    # Colorspace conversion - For our use-case, we don't require any color information as such
    # (this leads us to the first limitation though). It is also necessary as we're performing
    # contour detection in further steps.
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Thresholding with Otsu's algorithm: maximizes the inter-class variance. Basically, it best
    # splits the image into foreground and background.
    _, img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Performing morphological operations on the image: these operations work on the boundaries
    # of the object (its shape/morphology).
    # First, we perform morphological closing to fill small holes produced after thresholding.
    # The image is first dilated and then eroded to perform this operation.
    kernel = np.ones((5, 5), np.uint8)
    img = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)

    # We perform additional erosion to remove finer details.
    # With all these morphological operations, we wish to preserve only the overall rectangular
    # structure of the document and eliminate inner details, like text/images written in it.
    kernel = np.ones((11, 11), np.uint8)
    img = cv2.erode(img, kernel, iterations=1)

    # Image cleaning is done, we now perform Canny edge detection
    img = cv2.Canny(img, 75, 200)

    # Find contours within the edges
    contours, _ = cv2.findContours(img, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    assert len(contours) != 0, 'No contours were found in the image'

    # Find the contour with the largest arc length
    contour_perimeters = [cv2.arcLength(contour, True) for contour in contours]
    doc_contour = contours[np.argmax(contour_perimeters)]

    # Compute a bounding rect for the largest contour
    x, y, w, h = cv2.boundingRect(doc_contour)
    return x, y, w, h

# Binarize the image to give it a 'scanned' effect
def get_binarized_img(img):
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    return cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 35, 10)

The algorithm manipulates the image with a sequence of fixed steps to extract the boundary of the document. The get_binarized_img method returns a grayscale, high-contrast image that gives the document a typical ‘scanned’ look.
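
As a quick sanity check, here’s a hypothetical snippet showing how the two functions could be used on a local image (document.jpg is just a placeholder file name):

import cv2
from document import get_rect, get_binarized_img

img = cv2.imread('document.jpg')

# Save the 'scanned' version of the original image
cv2.imwrite('document_scanned.jpg', get_binarized_img(img))

# Detect the document and draw its bounding box on the image
x, y, w, h = get_rect(img)
cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 5)
cv2.imwrite('document_rect.jpg', img)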

3. Developing the Server for Document Extraction with FastAPI in Python

For the API-based solution, our app needs a server it can call with an image to get back the coordinates of the extracted document. We’ll build this server with FastAPI, a Python framework for developing APIs.

import cv2
import numpy as np
from fastapi import FastAPI, File, UploadFile
from fastapi.responses import Response
from document import get_rect, get_binarized_img

app = FastAPI()

# POST method to get the rect of the cropped document
# It requires an `image` in the body of the request
# FastAPI docs : 1. https://fastapi.tiangolo.com/tutorial/body
# 2. https://fastapi.tiangolo.com/tutorial/request-files
@app.post( "/get_rect" )
async def show_image( image : UploadFile = File() ):
contents = await image.read()
# Converting the `contents` bytes to an OpenCV Mat
# Refer this SO answer -> https://stackoverflow.com/a/61345230/13546426
img = cv2.imdecode( np.fromstring( contents, np.uint8 ), cv2.IMREAD_COLOR)
rect = get_rect( img )
return rect

# POST method to binarize the image to give it a
# 'scanned' effect
@app.post( "/binarize" )
async def binarize( image : UploadFile = File() ):
contents = await image.read()
img = cv2.imdecode(np.fromstring(contents, np.uint8), cv2.IMREAD_COLOR)
img = get_binarized_img( img )
img_bytes = cv2.imencode('.png', img )[1].tobytes()
return Response( img_bytes , media_type='image/png' )

The coordinates of the box that encloses the document in the POSTed image are returned in the format x, y, width (w), and height (h). We can try the API in Postman, hosting the server on localhost:8000.

Fig 1. The left window shows the image that was POSTed to the server. The response, as shown above, contains the coordinates in the form (x, y, w, h)

You can temporarily expose the server to the internet by tunneling the port on which it is hosted with ngrok.
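
As a scripted alternative to Postman, here’s a minimal client sketch using the requests library (assuming the server is running on localhost:8000; document.jpg is a placeholder file name):

import requests

# POST an image to the /get_rect endpoint; `image` is the form-data
# key expected by the FastAPI endpoint defined above
with open('document.jpg', 'rb') as f:
    response = requests.post('http://localhost:8000/get_rect', files={'image': f})

print(response.json())  # -> [x, y, w, h]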

4. Creating a Streamlit web app for Document Extraction and a Docker image

To test the document extraction algorithm ourselves, we create a simple web-based GUI with Streamlit that calls the get_rect method and draws the returned bounding box on the uploaded image.

import io
import numpy as np
import cv2
import streamlit as st
from PIL import Image
from document import get_rect

# Streamlit application to test the document scanning algorithm
st.title('📄 Document Scanning with OpenCV')
st.markdown(
    """
    Upload an image containing a document. Make sure that,
    1. There's good contrast between the background and the document.
    2. The entire document is contained within the image. Meaning, all corners of the document should be visible in the image
    """
)

file = st.file_uploader('Upload an image 📝...')
if file is not None:
    img = Image.open(io.BytesIO(file.getvalue()))
    # np.array gives us a writable copy that cv2.rectangle can draw on
    img = np.array(img)
    x, y, w, h = get_rect(img)
    cv2.rectangle(img, pt1=(x, y), pt2=(x + w, y + h), color=(255, 0, 0), thickness=20)
    st.image(img)

On running the app with streamlit run app.py, a beautiful yet simple web app opens up on localhost,

Fig 2. Demo video of the Streamlit web app. Video by the Author
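
For the Docker image mentioned in this section’s title, here’s a minimal Dockerfile sketch. It assumes the Streamlit app lives in app.py and that its dependencies (streamlit, opencv-python-headless, numpy, Pillow) are listed in a requirements.txt:

FROM python:3.9-slim

WORKDIR /app

# Install Python dependencies first to leverage Docker layer caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Streamlit serves on port 8501 by default
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.address=0.0.0.0"]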

5. Implementing the Document Extraction Algorithm in Kotlin with OpenCV Android (on-device solution)

We now move from Python to Kotlin. The image processing pipeline we developed with OpenCV in Python will be mirrored, step by step, in Kotlin using OpenCV’s Android SDK.

import android.graphics.Bitmap
import android.graphics.Rect
import org.opencv.android.Utils
import org.opencv.core.Mat
import org.opencv.core.MatOfPoint
import org.opencv.core.MatOfPoint2f
import org.opencv.core.Size
import org.opencv.imgproc.Imgproc

class CoreAlgorithm(private var openCVResultCallback: OpenCVResultCallback) {

    interface OpenCVResultCallback {
        fun onDocumentRectResult(rect: Rect)
        fun onBinarizeDocResult(binImage: Bitmap)
    }

    fun getDocumentRect(image: Bitmap) {
        var x = Mat()
        Utils.bitmapToMat(image, x)
        x = convertColor(x)
        x = adaptiveThreshold(x, 25, 5.0)
        x = morphClose(x)
        x = erode(x)
        x = cannyEdgeDetection(x)
        val contours = contours(x)
        openCVResultCallback.onDocumentRectResult(
            getRectFromContour(findContourWithLargestPerimeter(contours))
        )
    }

    fun binarize(image: Bitmap) {
        var x = Mat()
        Utils.bitmapToMat(image, x)
        x = convertColorBGR(x)
        x = adaptiveThreshold(x, 35, 10.0)
        val output = Bitmap.createBitmap(image.width, image.height, Bitmap.Config.RGB_565)
        Utils.matToBitmap(x, output)
        openCVResultCallback.onBinarizeDocResult(output)
    }

    private fun convertColor(mat: Mat): Mat {
        return Mat().apply { Imgproc.cvtColor(mat, this, Imgproc.COLOR_RGB2GRAY) }
    }

    private fun convertColorBGR(mat: Mat): Mat {
        return Mat().apply { Imgproc.cvtColor(mat, this, Imgproc.COLOR_BGR2GRAY) }
    }

    private fun gaussianBlur(mat: Mat): Mat {
        return Mat().apply { Imgproc.GaussianBlur(mat, this, Size(9.0, 9.0), 0.0) }
    }

    private fun cannyEdgeDetection(mat: Mat): Mat {
        return Mat().apply { Imgproc.Canny(mat, this, 75.0, 200.0) }
    }

    private fun adaptiveThreshold(mat: Mat, blockSize: Int, c: Double): Mat {
        return Mat().apply {
            Imgproc.adaptiveThreshold(
                mat, this,
                255.0,
                Imgproc.ADAPTIVE_THRESH_GAUSSIAN_C,
                Imgproc.THRESH_BINARY,
                blockSize,
                c
            )
        }
    }

    private fun erode(mat: Mat): Mat {
        val kernel = Imgproc.getStructuringElement(Imgproc.MORPH_RECT, Size(11.0, 11.0))
        return Mat().apply { Imgproc.erode(mat, this, kernel) }
    }

    private fun morphClose(mat: Mat): Mat {
        val kernel = Imgproc.getStructuringElement(Imgproc.MORPH_RECT, Size(5.0, 5.0))
        return Mat().apply { Imgproc.morphologyEx(mat, this, Imgproc.MORPH_CLOSE, kernel) }
    }

    private fun contours(mat: Mat): List<MatOfPoint> {
        val contours = ArrayList<MatOfPoint>()
        val hierarchy = Mat()
        Imgproc.findContours(mat, contours, hierarchy, Imgproc.RETR_TREE, Imgproc.CHAIN_APPROX_SIMPLE)
        return contours
    }

    private fun findContourWithLargestPerimeter(contours: List<MatOfPoint>): MatOfPoint {
        val arcLengths = contours.map {
            // Refer to this SO answer ->
            // https://stackoverflow.com/questions/11273588/how-to-convert-matofpoint-to-matofpoint2f-in-opencv-java-api
            val contour = MatOfPoint2f(*(it.toArray()))
            Imgproc.arcLength(contour, true)
        }
        return contours[arcLengths.indexOf(arcLengths.maxOrNull()!!)]
    }

    private fun getRectFromContour(contour: MatOfPoint): Rect {
        val rect = Imgproc.boundingRect(contour)
        // android.graphics.Rect takes (left, top, right, bottom)
        return Rect(rect.x, rect.y, rect.x + rect.width, rect.y + rect.height)
    }

}
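
To put the class to use, here’s a hypothetical sketch of how an Activity might wire it up. The names imageBitmap and coreAlgorithm are placeholders, and OpenCV is assumed to be initialized beforehand (e.g. with OpenCVLoader.initDebug()):

// Hypothetical usage from an Activity/Fragment. Assumes the OpenCV native
// library has already been initialized (e.g. OpenCVLoader.initDebug()).
val coreAlgorithm = CoreAlgorithm(object : CoreAlgorithm.OpenCVResultCallback {

    override fun onDocumentRectResult(rect: Rect) {
        // Draw `rect` over the image / camera preview here
    }

    override fun onBinarizeDocResult(binImage: Bitmap) {
        // Display the 'scanned' image to the user here
    }
})

// `imageBitmap` is the picture chosen by the user
coreAlgorithm.getDocumentRect(imageBitmap)
coreAlgorithm.binarize(imageBitmap)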

6. Calling the API from the Android app (API-based solution)

As we’ve already developed the API in Python, we’ll send requests from the Android client with OkHttp, though options like Ktor, Volley, and Retrofit would work just as well for sending and receiving data from a server.

// Given the bitmap, send the request to the API
private fun createRequest(image: Bitmap, url: String, callback: Callback, processImage: Boolean = false) {
    // Create a temporary file and write the processed Bitmap to it.
    // Note, the image is scaled down and then sent to the API. See the `processImage` method.
    tempImageFile = File.createTempFile("image", ".png")
    FileOutputStream(tempImageFile).use { outputStream ->
        val resizedImage = if (processImage) {
            processImage(image)
        } else {
            image
        }
        resizedImage.compress(Bitmap.CompressFormat.PNG, 100, outputStream)
    }
    sendRequest(tempImageFile, url, callback)
}

// Sends a POST request to the server with OkHttpClient
// Refer to this SO thread -> https://stackoverflow.com/questions/23512547/how-to-use-okhttp-to-upload-a-file
private fun sendRequest(imageFile: File, url: String, callback: Callback) {
    val requestBody = MultipartBody.Builder().run {
        setType(MultipartBody.FORM)
        addFormDataPart(
            API_INPUT_IMAGE_KEY,
            imageFile.name,
            imageFile.asRequestBody("image/png".toMediaTypeOrNull()!!)
        )
        build()
    }
    val request = Request.Builder().run {
        url(url)
        post(requestBody)
        build()
    }
    client.newCall(request).enqueue(callback)
}
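
The Callback passed to createRequest decides what happens with the response. Since /get_rect returns the tuple as a JSON array, a minimal (hypothetical) callback for it could look like this, using the org.json classes bundled with Android:

// Hypothetical callback for the `/get_rect` endpoint.
// Assumes the server responds with a JSON array of the form [x, y, w, h].
private val getRectCallback = object : Callback {

    override fun onFailure(call: Call, e: IOException) {
        Log.e("DocumentScanner", "Could not reach the server", e)
    }

    override fun onResponse(call: Call, response: Response) {
        response.body?.string()?.let { body ->
            val coords = JSONArray(body)
            val x = coords.getInt(0)
            val y = coords.getInt(1)
            val w = coords.getInt(2)
            val h = coords.getInt(3)
            // android.graphics.Rect takes (left, top, right, bottom)
            val documentRect = Rect(x, y, x + w, y + h)
            // Post `documentRect` to the UI thread and draw it over the preview
        }
    }
}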

Although it isn’t possible to share the entire code of both apps here, I’ve included the important and interesting parts of the codebase. There’s still a lot of code for drawing the box over the screen with a TextureView and for letting the user select files. Here’s a demo of the final app,

Fig 3. A demo for Simple Document Scanner. Image by the Author

7. Implementing GitHub Actions for building releases of both apps

Once we’ve developed the apps, we’d like to build APKs for both solutions with GitHub Actions. Our workflow performs the following steps to build an APK with Gradle and create a release on the GitHub repository.

Fig 4. Steps involved in build_apk workflow.

name: Build Android APK

on:
  push:
    branches:
      - main
      - on_device_scanning_app
  pull_request:
    branches:
      - main
      - on_device_scanning_app

jobs:
  build_apk:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up JDK 11
        uses: actions/setup-java@v3
        with:
          java-version: '11'
          distribution: 'temurin'
          cache: gradle
      - name: Grant execute permission for gradlew
        run: chmod +x gradlew
      - name: Build with Gradle
        run: ./gradlew build
      - name: Build debug APK
        run: bash ./gradlew assembleDebug --stacktrace
      - name: Create a release
        uses: actions/create-release@v1
        id: create_release
        with:
          draft: false
          prerelease: false
          release_name: Android App - API-based solution
          tag_name: v1.0.0
          body_path: CHANGELOG.md
        env:
          GITHUB_TOKEN: ${{ github.token }}
      - name: Upload APK to release
        uses: actions/upload-release-asset@v1
        env:
          GITHUB_TOKEN: ${{ github.token }}
        with:
          upload_url: ${{ steps.create_release.outputs.upload_url }}
          asset_path: app/build/outputs/apk/debug/app-debug.apk
          asset_name: app-debug.apk
          asset_content_type: application/vnd.android.package-archive
      - name: Upload APK
        uses: actions/upload-artifact@v1
        with:
          name: app
          path: app/build/outputs/apk/debug/app-debug.apk

And that brings us to the end

Learning new technologies and frameworks by building apps has been the best part of programming for me. The result also makes an excellent learning project for other beginners. Make sure you star the GitHub repo and share the open-source love!

Feel free to share your suggestions and problems on the GitHub repo or in the comments here. Have a nice day ahead!
