👨🏼‍🦯Obstacle Detection for Blind people

Helping the visually impaired with DL (Timeline Version)

6 min read · Jun 21, 2022

Have you ever imagined waking up one day unable to see anything at all, having to rely on a white cane at all times? Would you still be able to live the way you did before?

👋🏼 Hello! Today I will talk about the background and motivation of the Obstacle Detection for Blind people project, which I developed while taking part in AI Builders 2022.

📃Table of Contents

- Inspiration
- Target
- Data collection, cleaning (round 1)
- Training & Evaluation (round 1)
- (New) Target
- Data collection, cleaning (round 2)
- Training & Evaluation (round 2)
- Data collection, cleaning (round 3)
- Training & Evaluation (round 3)
- Trying a different architecture: YOLO
- Deployment
- Conclusion
- Future Development

✨ Inspiration

I used to volunteer reading books aloud for visually impaired people on the Read for the Blind platform, and I am also interested in building accessibility tools for people with disabilities.

While researching ways to use Deep Learning to build something useful for visually impaired people, I came across this dataset on Kaggle:

Help Blind Community to walk

Predict the action of walking after recognizing the view in the image

www.kaggle.com

While GPS enabled devices can guide you path on street or on roads. These devices are limited in their usage because these can’t help in walking thorough the enclosed spaces like rooms etc. and thus limitation for blind people to walk in enclosed space.
In short: visually impaired people can rely on GPS when walking in open areas, but GPS cannot help or guide them when walking in enclosed spaces.

Getting around an unfamiliar building or enclosed space is difficult for visually impaired people: there may be obstacles, or they may not know where the walkway is.

📌Target

Path (Object) Detection Model

Find doors and walkways (2 classes: door, path)
With the goal set, it was time to prepare the data!

📚Data collection, cleaning (round 1)

The data I used came from the following 2 sources:

1. Unimelb Corridor Synthetic dataset (figshare.com)

Unimelb Corridor Synthetic dataset

This data-set is a supplementary material related to the generation of synthetic images of a corridor in the University…

melbourne.figshare.com

Annotated ~20 images with labelImg

Example of labeling path and door with labelImg on the Unimelb Corridor Synthetic dataset

2. DoorDetect-Dataset by MiguelARD on GitHub

GitHub — MiguelARD/DoorDetect-Dataset: Labelled image dataset for door and handle detection.

DoorDetect is a dataset of 1,213 images that have been annotated with object bounding boxes. The images are very…

github.com

The data is already labeled with bounding boxes in YOLO format, with the following classes:
- door
- handle
- cabinet door
- refrigerator door

Filtered the labels down to 1 class: door
Annotated an additional class, path, on ~1000 images with labelImg
Converted everything to COCO format
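
The filtering and format conversion above could look roughly like the sketch below. It is not the script used in this project: the function names are made up for illustration, and it assumes that door has class id 0 in the DoorDetect label files (following the order of the class list above). A YOLO label line stores a normalized box center and size, while COCO stores the top-left corner and size in pixels.

```python
def yolo_to_coco_bbox(cx, cy, w, h, img_w, img_h):
    """Convert a normalized YOLO box (center x, center y, width, height)
    into a COCO box [x_min, y_min, width, height] in pixels."""
    box_w = w * img_w
    box_h = h * img_h
    x_min = cx * img_w - box_w / 2
    y_min = cy * img_h - box_h / 2
    return [x_min, y_min, box_w, box_h]


def convert_label_lines(lines, img_w, img_h, keep_class_id=0):
    """Parse YOLO label lines, keep a single class, and return COCO boxes.

    keep_class_id=0 is an assumption (door listed first in the dataset).
    """
    boxes = []
    for line in lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        if int(parts[0]) != keep_class_id:
            continue  # drop handle, cabinet door, refrigerator door
        cx, cy, w, h = map(float, parts[1:])
        boxes.append(yolo_to_coco_bbox(cx, cy, w, h, img_w, img_h))
    return boxes


# Two made-up label lines for a 640x480 image: one door, one handle.
example_lines = ["0 0.5 0.5 0.2 0.4", "1 0.55 0.5 0.02 0.05"]
door_boxes = convert_label_lines(example_lines, img_w=640, img_h=480)
print(door_boxes)
```

Only the door box survives the filter; the handle line is skipped.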

Example of labeling path with labelImg on DoorDetect-Dataset

👾Training & Evaluation (round 1)🛸

I used the Detectron2 library with pretrained weights: Faster R-CNN R50-FPN

How Faster R-CNN works (RPN + Fast R-CNN)

Faster R-CNN consists of 2 parts:
1. A Region Proposal Network (RPN) that proposes the positions and sizes of candidate bounding boxes and passes them to a second network
2. A Fast R-CNN network that takes each proposed bounding box and classifies it

Next, set the config values and start training.

Pretrained-weight: Faster R-CNN R50-FPN
Epochs = 10
N_Classes = 2
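
Detectron2's default training loop is configured in iterations rather than epochs, so an epoch count like the one above has to be converted first. A minimal sketch of that arithmetic follows. The image count of 1,000 and the batch size of 2 are placeholder assumptions, not values from this project, and the helper names are hypothetical. The dictionary keys are Detectron2 config keys.

```python
import math


def epochs_to_iterations(num_images, epochs, images_per_batch):
    """Number of optimizer steps needed to see the dataset `epochs` times."""
    iterations_per_epoch = math.ceil(num_images / images_per_batch)
    return epochs * iterations_per_epoch


def build_overrides(num_images, epochs, num_classes, images_per_batch=2):
    """Collect the config values that depend on the dataset and schedule."""
    return {
        "SOLVER.IMS_PER_BATCH": images_per_batch,
        "SOLVER.MAX_ITER": epochs_to_iterations(
            num_images, epochs, images_per_batch
        ),
        "MODEL.ROI_HEADS.NUM_CLASSES": num_classes,
    }


# Placeholder dataset size; 10 epochs and 2 classes as in the settings above.
overrides = build_overrides(num_images=1000, epochs=10, num_classes=2)
for key, value in overrides.items():
    print(f"{key} = {value}")
```

With Detectron2 installed, values like these could be applied to a config object, for example through `cfg.merge_from_list`.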

After training, let's look at the results.

Problem Analysis: the model could not detect the path class at all (AP = 0.0).

In the image, notice that the orange box could just as well be a path. The definition of what counts as a path is therefore too ambiguous.
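
To see why an ambiguous class definition hurts the score, it helps to look at how a detection is matched to a label. A prediction usually counts as correct only if its Intersection over Union (IoU) with an annotated box reaches a threshold, commonly 0.5. The sketch below uses made-up coordinates, not boxes from this dataset: a predicted region that a person might also accept as a path still gets rejected because it overlaps the annotated one too little.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as
    (x_min, y_min, x_max, y_max)."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical boxes: both cover walkable floor, but different parts of it.
annotated_path = (100, 300, 400, 480)
predicted_path = (250, 200, 600, 480)

overlap = iou(annotated_path, predicted_path)
print(f"IoU = {overlap:.3f}")  # 0.216, below a 0.5 matching threshold
```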

So I decided to change the model's goal.

📌(New) Target

Obstacle (Object) Detection Model

Because defining what counts as a walkway (path) is too difficult and too ambiguous, I changed the model's target to detecting obstacles and doors instead (10 classes: door, cabinetDoor, refrigeratorDoor, window, chair, table, cabinet, couch, openedDoor, pole).

📚Data collection, cleaning (round 2)

Re-annotating the same dataset with CVAT

CVAT — Computer Vision Annotation Tool

GitHub — openvinotoolkit/cvat: Powerful and efficient Computer Vision Annotation Tool (CVAT)

CVAT is free, online, interactive video and image annotation tool for computer vision. It is being used by our team to…

github.com

CVAT is the tool I used to annotate the images and convert the annotations to other formats.
It supports many formats for both import and export.

I took the images from the same dataset and annotated ~530 more bounding boxes across all 10 classes, then exported everything in COCO format.

The 10 classes are:

door: a basic way through
openedDoor: a basic way through
cabinetDoor: storage cabinets; reuses the existing annotations
refrigeratorDoor: refrigerators; reuses the existing annotations
window: the model tends to mistake doors for windows
chair: a common obstacle in enclosed spaces
table: a common obstacle in enclosed spaces
cabinet: a common obstacle in enclosed spaces
sofa/couch: a common obstacle in enclosed spaces
pole: a common obstacle in enclosed spaces

Building a new test set

With the training set ready, the next step was to prepare a test set for evaluating the model from here on.

Indoor Training Set (ITS) [RESIDE-Standard] on Kaggle

Indoor Training Set (ITS) [RESIDE-Standard]

Training Dataset for Single Image Dehazing

www.kaggle.com

I took ~100 images from it as the test set and annotated them with CVAT.

👾Training & Evaluation (round 2)🛸

I used the Detectron2 library with pretrained weights: Faster R-CNN R50-FPN

Number of instances after annotating obstacles
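
Per-class instance counts like these can be read straight from a COCO export, which is a quick way to spot classes with too few examples. The sketch below is an illustration with a tiny made-up annotation dictionary, not the project's data; in practice the dictionary would come from loading the exported JSON file.

```python
from collections import Counter


def count_instances_per_class(coco):
    """Count bounding boxes per category name in a COCO-style dict."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    # Start every class at zero so empty classes show up in the result.
    counts = Counter({name: 0 for name in id_to_name.values()})
    for annotation in coco["annotations"]:
        counts[id_to_name[annotation["category_id"]]] += 1
    return dict(counts)


# Made-up example: two door boxes and no pole boxes.
sample_coco = {
    "categories": [{"id": 1, "name": "door"}, {"id": 2, "name": "pole"}],
    "annotations": [
        {"image_id": 1, "category_id": 1, "bbox": [10, 20, 50, 120]},
        {"image_id": 2, "category_id": 1, "bbox": [30, 15, 60, 140]},
    ],
}

counts = count_instances_per_class(sample_coco)
print(counts)
```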

Set the config values and start training.

Pretrained-weight: Faster R-CNN R50-FPN
Epochs = 20
N_Classes = 10

After training, let's look at the results.

Problem Analysis: the model could not detect the classes window, sofa/couch, openedDoor, and pole at all, so I needed to find more data.
I use this model's metrics as the baseline.

📚Data collection, cleaning (round 3)

1. Indoor Training Set (ITS) [RESIDE-Standard] on Kaggle

Indoor Training Set (ITS) [RESIDE-Standard]

Training Dataset for Single Image Dehazing

www.kaggle.com

I took ~100 images from it for training and annotated them with CVAT.

In total we now have 1,358 images, split into Train/Validation/Test as follows:

Train: 1021 Images (75%)
Validation: 230 Images (17%)
Test: 107 Images (8%)
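
The percentages above follow directly from the image counts, as this small check shows:

```python
# Image counts per split, as listed above.
split_sizes = {"train": 1021, "validation": 230, "test": 107}

total = sum(split_sizes.values())
shares = {
    name: round(100 * count / total) for name, count in split_sizes.items()
}

print(total)
print(shares)
```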

👾Training & Evaluation (round 3)🛸

I used the Detectron2 library with pretrained weights: Faster R-CNN R50-FPN

Number of instances after adding data

Annotated ~900 more bounding boxes across all 10 classes

Set the config values and start training.

Pretrained-weight: Faster R-CNN R50-FPN
Epochs = 20
N_Classes = 10
Nothing else was changed from the previous training run

After training, let's look at the results.

Notice that the classes window, chair, table, cabinet, and sofa/couch
have a better Average Precision, and the overall mAP is higher as well,
even though the AP of some classes went down.
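
mAP is the mean of the per-class AP values, which is why it can rise even when a few classes get worse. The numbers in this sketch are invented purely to illustrate the effect; they are not the results of this project, and the class names are placeholders.

```python
def mean_average_precision(ap_per_class):
    """mAP as the unweighted mean of per-class Average Precision."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Invented AP values: class_a drops, while class_b and class_c improve.
ap_before = {"class_a": 0.60, "class_b": 0.00, "class_c": 0.30}
ap_after = {"class_a": 0.50, "class_b": 0.40, "class_c": 0.45}

map_before = mean_average_precision(ap_before)
map_after = mean_average_precision(ap_after)
print(f"mAP before: {map_before:.2f}, after: {map_after:.2f}")
```

Because every class has the same weight in the mean, a class that goes from undetected to partly detected can outweigh small losses elsewhere.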

🚀Trying a different architecture: YOLO

📌Same Target: Obstacle (Object) Detection Model

The goal stays the same; YOLOv5 is simply more lightweight and a better fit for the problem.

How YOLO (You Only Look Once) works

You Only Look Once: Unified, Real-Time Object Detection

YOLO divides the image into an S×S grid, and a single network predicts two things for each grid cell:
1. Bounding boxes, each with a confidence score
2. Class probabilities for that position in the grid
Combining these two outputs gives the final prediction.
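
In the original YOLO paper, the grid cell that contains the center of an object is the one responsible for detecting it. The sketch below shows that lookup for a normalized center point; the function name is made up, and S = 7 is the grid size from the paper's PASCAL VOC setup. YOLOv5 differs in the details (for example, it predicts at several scales), so this illustrates the grid idea only.

```python
def responsible_cell(cx, cy, grid_size):
    """Return (row, col) of the grid cell containing a normalized center.

    cx and cy are in [0, 1]; a center exactly on the right or bottom edge
    is clamped into the last cell.
    """
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    return row, col


# An object centered in the middle of the image, on a 7x7 grid.
print(responsible_cell(0.5, 0.5, grid_size=7))   # (3, 3)
# An object near the top-right corner.
print(responsible_cell(0.95, 0.1, grid_size=7))  # (0, 6)
```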

📚Data preparation for YOLOv5

I prepared the data in YOLO format by uploading it to CVAT and exporting it as YOLO (although CVAT only supports YOLO 1.1).

So I adjusted the annotation format to fit YOLOv5.
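
One part of fitting a dataset to YOLOv5 is the dataset YAML file that tells the training script where the images are and what the classes are called. The sketch below generates such a file as text. The directory paths are hypothetical, and the class order simply follows the list given earlier in this post; the real order must match the class ids in the label files. It is an illustration, not the conversion actually used here.

```python
def build_yolov5_data_yaml(train_dir, val_dir, class_names):
    """Build the text of a YOLOv5 dataset YAML file."""
    quoted_names = ", ".join(f"'{name}'" for name in class_names)
    lines = [
        f"train: {train_dir}",
        f"val: {val_dir}",
        f"nc: {len(class_names)}",
        f"names: [{quoted_names}]",
    ]
    return "\n".join(lines) + "\n"


# Class order is an assumption; it must match the ids used in the labels.
CLASS_NAMES = [
    "door", "cabinetDoor", "refrigeratorDoor", "window", "chair",
    "table", "cabinet", "couch", "openedDoor", "pole",
]

# Hypothetical directory layout.
yaml_text = build_yolov5_data_yaml(
    train_dir="dataset/images/train",
    val_dir="dataset/images/val",
    class_names=CLASS_NAMES,
)
print(yaml_text)
```

YOLOv5 looks for each image's label file under a parallel `labels` directory, with one line per box in the form `class x_center y_center width height`, all normalized to the image size.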

👾Train! & Evaluation on YOLOv5

In the end, I settled on YOLOv5 as the final model, because its metrics were better and its inference was faster.

* Note: apologies 😅, my test set contains only 2 instances of the class refrigeratorDoor. More test images for this class may be needed for more reliable metrics.

🛬 Deployment

I used Streamlit to build a basic demo for trying out the model (on images/videos).

GitHub - thepbordin/Obstacle-Detection-for-Blind-people-Deployment

github.com

📑Conclusion

Measuring mAP for every model built along the way gives the following summary:

YOLOv5 clearly outperformed Faster R-CNN in this project.
The class pole had too little data for the model to learn from.

*Note: my test set contains only 2 instances of the class refrigeratorDoor. More test images for this class may be needed for more reliable metrics.

📈Future Development

Minor Changes:

Add more data and reduce the class imbalance of the dataset (both Train and Test) for better results
Drop some overlapping classes (cabinet, openedDoor), which may reduce the model's confusion

Implementation

(1) Detect wet-floor signs and other warning signs

(2) Integrate with the phone's LiDAR scanner

💳Credits

✨Special Thanks

https://carbon.now.sh/
https://www.facebook.com/aibuildersx/
My friends and the Mentors / TAs of AI Builders 2022, who helped me throughout this learning experience

Written by Thepbordin Jaiinsom