Object Detection and Tracking with Google ML Kit — Vision APIs

Published in te<h @TDG, Jul 20, 2020

image source: https://developers.google.com/ml-kit/images/vision/card-object_detection.png

This article shares a simple way to do object detection and tracking without writing a lot of headache-inducing code, using a library called Google ML Kit.

Google ML Kit
Google ML Kit is a machine learning library from Google that grew out of Mobile Vision. It was built specifically for running machine learning on mobile devices and consists of two groups of APIs:

Vision APIs
Natural Language APIs

This article uses Object detection and tracking, which is part of the Vision APIs.

Let's get started

Add the ML Kit dependency to your project

dependencies {
  // ...

  implementation 'com.google.mlkit:object-detection:16.0.0'

}

1. Configure the object detector. There are two ways to do this:

Detect and track objects in real time

// Live detection and tracking
val options = ObjectDetectorOptions.Builder()
        .setDetectorMode(ObjectDetectorOptions.STREAM_MODE)
        .enableClassification()  // Optional
        .build()

Detect and track objects in static images

// Multiple object detection in static images
val options = ObjectDetectorOptions.Builder()
        .setDetectorMode(ObjectDetectorOptions.SINGLE_IMAGE_MODE)
        .enableMultipleObjects()
        .enableClassification()  // Optional
        .build()

Note: .enableClassification() is optional. When it is enabled, the detector also reports a category for each object, such as food or plant.

2. Create an ObjectDetector instance, passing in the options created above

val objectDetector = ObjectDetection.getClient(options)

3. Prepare the image to process. There are four ways to do this:

Use an image from media.Image

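private class YourImageAnalyzer : ImageAnalysis.Analyzer {

    override fun analyze(imageProxy: ImageProxy) {
        val mediaImage = imageProxy.image
        if (mediaImage != null) {
            // CameraX reports how far the frame must be rotated to appear upright
            val image = InputImage.fromMediaImage(mediaImage, imageProxy.imageInfo.rotationDegrees)
            // Pass image to an ML Kit Vision API
            // ...
            // Close imageProxy once processing finishes so CameraX can deliver the next frame
        }
    }
}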

Note: when you use a media.Image you must also pass in the rotation degree. With the CameraX library you can take it directly from imageProxy.imageInfo.rotationDegrees. If that value is not available, you have to calculate the rotation degree yourself. Sample code is available here.
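As a rough illustration of calculating the rotation degree by hand, the sketch below combines the device rotation with the camera sensor orientation. The function name and its parameters are hypothetical: on a real device the first value would come from the display rotation (converted from Surface.ROTATION_* to degrees) and the second from the camera's sensor orientation. Treat it as a minimal sketch under those assumptions, not as the library's own implementation.

```kotlin
/**
 * Sketch: rotation (0, 90, 180 or 270) to pass along with a media.Image.
 *
 * deviceRotationDegrees    display rotation in degrees (assumed input)
 * sensorOrientationDegrees camera sensor orientation in degrees (assumed input)
 * isFrontFacing            true for a front camera, whose image is mirrored
 */
fun rotationCompensation(
    deviceRotationDegrees: Int,
    sensorOrientationDegrees: Int,
    isFrontFacing: Boolean
): Int {
    val allowed = setOf(0, 90, 180, 270)
    require(deviceRotationDegrees in allowed) { "device rotation must be 0, 90, 180 or 270" }
    require(sensorOrientationDegrees in allowed) { "sensor orientation must be 0, 90, 180 or 270" }

    return if (isFrontFacing) {
        // Front camera: the two rotations add up
        (sensorOrientationDegrees + deviceRotationDegrees) % 360
    } else {
        // Back camera: the device rotation is subtracted; +360 keeps the result non-negative
        (sensorOrientationDegrees - deviceRotationDegrees + 360) % 360
    }
}
```

For example, a back camera with a sensor orientation of 90 on a device held in its natural orientation (rotation 0) gives 90, which is the value that would then be passed as the second argument of InputImage.fromMediaImage.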

Use an image from a URI

val image: InputImage
try {
    image = InputImage.fromFilePath(context, uri)
} catch (e: IOException) {
    e.printStackTrace()
}

Use an image from a ByteBuffer or ByteArray

val image = InputImage.fromByteBuffer(
        byteBuffer,
        /* image width */ 480,
        /* image height */ 360,
        rotationDegrees,
        InputImage.IMAGE_FORMAT_NV21 // or IMAGE_FORMAT_YV12
)
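When building an InputImage from raw bytes, the width and height you pass have to match the buffer. The sketch below is one possible sanity check for NV21 data, assuming a tightly packed buffer with no row padding: a full-resolution luma plane followed by interleaved chroma samples at half resolution in each direction. Both function names are hypothetical helpers, not part of ML Kit.

```kotlin
/** Expected byte count of a tightly packed NV21 frame (assumes no row stride padding). */
fun expectedNv21Size(width: Int, height: Int): Int {
    require(width > 0 && height > 0) { "dimensions must be positive" }
    val lumaBytes = width * height
    // One V and one U byte per 2x2 pixel block; round up for odd dimensions
    val chromaBytes = 2 * ((width + 1) / 2) * ((height + 1) / 2)
    return lumaBytes + chromaBytes
}

/** True when the buffer length matches what the given dimensions require. */
fun isPlausibleNv21(bufferSize: Int, width: Int, height: Int): Boolean =
    bufferSize == expectedNv21Size(width, height)
```

With the 480 x 360 dimensions used above, a packed NV21 buffer would hold 259200 bytes, so a check like isPlausibleNv21(byteBuffer.remaining(), 480, 360) can catch swapped or stale dimensions before the image reaches the detector.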

Use an image from a Bitmap

val image = InputImage.fromBitmap(bitmap, 0)

4. Process the prepared image

objectDetector.process(image)
    .addOnSuccessListener { detectedObjects ->
        // Task completed successfully
        // ...
    }
    .addOnFailureListener { e ->
        // Task failed with an exception
        // ...
    }
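Because process(image) is asynchronous, a live camera feed can produce new frames while the previous one is still being analyzed. One common way to cope is to drop frames while the detector is busy. The FrameGate class below is a hypothetical helper that sketches this idea with plain Kotlin. It is not part of ML Kit or CameraX.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

/** Sketch: lets at most one frame at a time through to the detector. */
class FrameGate {
    private val busy = AtomicBoolean(false)

    /** Returns true if the caller may process a frame; false means "drop this frame". */
    fun tryEnter(): Boolean = busy.compareAndSet(false, true)

    /** Call when processing has finished, whether it succeeded or failed. */
    fun exit() {
        busy.set(false)
    }
}
```

In an analyzer this could be used by calling tryEnter() before objectDetector.process(image), skipping the frame when it returns false, and calling exit() from a listener that runs on both success and failure (for example addOnCompleteListener).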

After processing succeeds, each detected object comes with three pieces of information:

Bounding box: where the detected object is located in the image
Tracking ID: an integer that identifies the object across images (null in SINGLE_IMAGE_MODE)
Labels: each label has three values:
- Label description: the object's category, as text
- Label index: the object's category, as an integer
- Label confidence: how confident the classifier is that the object belongs to that category

// Inside the success listener, where detectedObjects is the list passed to the callback
for (detectedObject in detectedObjects) {
    val boundingBox = detectedObject.boundingBox
    val trackingId = detectedObject.trackingId
    for (label in detectedObject.labels) {
        val text = label.text
        if (PredefinedCategory.FOOD == text) {
            // ...
        }
        val index = label.index
        if (PredefinedCategory.FOOD_INDEX == index) {
            // ...
        }
        val confidence = label.confidence
    }
}
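In practice you will often want only the most likely label, and only if it is confident enough. The sketch below shows one way to do that. LabelInfo is a hypothetical stand-in that mirrors the three label values described above (text, index, confidence) so the example runs without Android, and the 0.5 threshold is an arbitrary assumption.

```kotlin
/** Hypothetical stand-in for a detected object's label (text, index, confidence). */
data class LabelInfo(val text: String, val index: Int, val confidence: Float)

/**
 * Returns the highest-confidence label at or above minConfidence,
 * or null when no label qualifies.
 */
fun bestLabel(labels: List<LabelInfo>, minConfidence: Float = 0.5f): LabelInfo? =
    labels
        .filter { it.confidence >= minConfidence }
        .maxByOrNull { it.confidence }
```

With real results, the same filtering could be applied to detectedObject.labels directly, since each label exposes text, index, and confidence.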

Summary

Object detection and tracking is now much easier than it used to be, because a library handles almost all of the work and developers only need to write a small amount of code themselves. The library still has several limitations, though. If you need more accurate image processing, you have to write extra code on top of it. There are also device limitations: processing runs locally on the device, so performance varies from one device to another. If you plan to use this in production, you should test on several devices. Sample code is available here.

Reference:

Detect and track objects with ML Kit on Android | Google Developers (developers.google.com)


Written by Sarayut.Wia