Convolutional Neural Networks (CNN): Building an Image Classification Model with TensorFlow

Jan 27, 2020

Reference: https://developers.google.com/machine-learning/practica/image-classification/convolutional-neural-networks

This article walks through building a Convolutional Neural Network model for image classification with TensorFlow. It covers:

Data preparation (Data Processing)
Model creation
Model evaluation
Using the model

1. Data Preparation (Data Processing)

We will use the CIFAR-10 dataset, a research dataset from the Canadian Institute For Advanced Research.

CIFAR-10 contains 60,000 color images of 32x32 pixels, divided into 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The integers 0 to 9 represent the classes in that order, for example 0 for airplane and 1 for automobile.
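The label-to-class mapping described above can be written down as a small lookup table. This is only a sketch: the names CIFAR10_CLASSES and label_to_name are made up for illustration and are not part of TensorFlow or the dataset.

```python
# Class names in CIFAR-10 label order (list index = numeric label).
# CIFAR10_CLASSES and label_to_name are hypothetical helper names.
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def label_to_name(label):
    """Return the class name for a numeric CIFAR-10 label (0 to 9)."""
    return CIFAR10_CLASSES[int(label)]


print(label_to_name(0))  # airplane
print(label_to_name(6))  # frog
```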

CIFAR-10 is split into two sets: a training dataset of 50,000 images and a test dataset of 10,000 images, with no overlap between the classes.

Sample images from the CIFAR-10 dataset

More information about the CIFAR-10 dataset is available at https://www.cs.toronto.edu/~kriz/cifar.html

Load Data

The CIFAR-10 dataset can be downloaded from the CIFAR-10 dataset page, or imported from keras.datasets as follows:

from tensorflow.keras.datasets import cifar10
(in_train, out_train), (in_test, out_test) = cifar10.load_data()

The training images and labels are loaded into in_train and out_train, and the test images and labels into in_test and out_test.

We can check the data type and size with type and shape:

type(in_train)
# Output
# numpy.ndarray
in_train.shape
# Output
# (50000, 32, 32, 3)

The training dataset has 50,000 images. Each image is 32x32 pixels in color, made up of 3 channels: R (red), G (green), and B (blue).

To view a loaded image, use imshow from matplotlib.pyplot

import matplotlib.pyplot as plt
plt.imshow(in_train[0])

plt.imshow(in_train[0]) displays the first image of the input data that we loaded into the in_train variable earlier.

Preprocess Input

First we scale the image values into the range 0 to 1. A simple way is to check the maximum value in the dataset and divide every value by it.

in_train.max()
# Output
# 255
in_train = in_train/255
in_test = in_test/255
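The effect of this division can be checked on a small synthetic array, without downloading the dataset. The sketch below assumes NumPy is available; fake_images is a random stand-in for image data, not real CIFAR-10 content.

```python
import numpy as np

# Synthetic stand-in for a batch of 8-bit RGB images (not real CIFAR-10 data).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# True division promotes the uint8 values to floating point.
scaled = fake_images / 255

print(fake_images.dtype, scaled.dtype)
print(scaled.min() >= 0.0, scaled.max() <= 1.0)
```

Dividing by the maximum possible pixel value keeps the shape unchanged and only rescales the values, which is all the model needs from this step.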

Now let's inspect the output data in the training set.

type(out_train)
# Output
# numpy.ndarray
out_train.shape
# Output
# (50000, 1)
out_train[0]
# Output
# array([6], dtype=uint8)

The output above shows that there are 50,000 labels, stored in an array of shape (50000, 1) with one value per image. The first label in the training set is 6, which stands for the frog class.

Preprocess Output

The output values are integers that stand for classes, so we need to convert them to one-hot categorical form: an array that is 0 everywhere except at the position matching the original integer, where it is 1.

The conversion can be done with to_categorical from tensorflow.keras.utils

from tensorflow.keras.utils import to_categorical
out_cat_train = to_categorical(out_train, 10)
out_cat_test = to_categorical(out_test, 10)

In the code above we request a one-hot categorical encoding with 10 classes in total.

For example, the value 6 for frog becomes an array that is all 0 except at position 6 (counting from 0): [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]

out_train[0]
# Output
# array([6], dtype=uint8)
out_cat_train[0]
# Output
# array([0., 0., 0., 0., 0., 0., 1., 0., 0., 0.], dtype=float32)
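To make the encoding concrete, here is a minimal NumPy sketch of what a one-hot conversion does. The function one_hot is a hypothetical helper written for illustration; it is not how to_categorical is implemented, only an equivalent result for integer labels.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Return a float32 one-hot matrix for integer labels of shape (n,) or (n, 1)."""
    flat = np.asarray(labels).reshape(-1)           # (n, 1) -> (n,)
    encoded = np.zeros((flat.size, num_classes), dtype=np.float32)
    encoded[np.arange(flat.size), flat] = 1.0       # one 1 per row, at the label index
    return encoded


# Made-up labels in the same (n, 1) uint8 layout as out_train.
labels = np.array([[6], [9], [0]], dtype=np.uint8)
encoded = one_hot(labels, 10)
print(encoded[0])
```

Each row sums to 1, and the index of the single 1 recovers the original label.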

The data is now ready to use.

2. Model Creation

First we import the required libraries: the Sequential model and the layers Conv2D, MaxPool2D, Dense, and Flatten.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPool2D, Flatten

Conv2D layer: learns a set of kernels (filters) and slides each one across the whole input image to produce a feature map
Pooling layer: reduces the size of the previous layer's output while preserving as much of its information as possible
Flatten layer: converts multi-dimensional output into 1 dimension, the format required as input to a fully connected layer
Dense layer (fully connected layer): every input is connected to every output node, each connection is multiplied by its own weight, and a suitable activation function can be set for the output nodes
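The first three layer types above can be illustrated on a tiny single-channel array with plain NumPy. This is a teaching sketch under simplifying assumptions (one channel, one fixed averaging kernel, stride 1, no padding); conv2d_valid and max_pool2d are hypothetical helpers, not Keras functions, and a real Conv2D layer learns its kernel values.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide `kernel` over a 2-D `image` with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))


image = np.arange(36, dtype=float).reshape(6, 6)   # toy single-channel image
kernel = np.ones((3, 3)) / 9.0                     # made-up averaging kernel

feature_map = np.maximum(conv2d_valid(image, kernel), 0)  # ReLU activation
pooled = max_pool2d(feature_map)                          # 2x2 max pooling
flat = pooled.reshape(-1)                                 # what Flatten does

print(feature_map.shape, pooled.shape, flat.shape)  # (4, 4) (2, 2) (4,)
```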

Now let's build a Convolutional Neural Network model with Keras.

# Create Sequential Model
model = Sequential()
# Layer 1: Convolutional Layer
model.add(Conv2D(filters=32, kernel_size=(4,4), input_shape=(32,32,3), activation='relu'))
# Layer 2: Pooling Layer
model.add(MaxPool2D(pool_size=(2,2)))
# Layer 3: Convolutional Layer (its input shape is inferred from the previous layer)
model.add(Conv2D(filters=32, kernel_size=(4,4), activation='relu'))
# Layer 4: Pooling Layer
model.add(MaxPool2D(pool_size=(2,2)))
# Layer 5: Flatten Layer
model.add(Flatten())
# Layer 6: Dense Layer (Hidden Layer)
model.add(Dense(256, activation='relu'))
# Layer 7: Dense Layer (Output Layer)
model.add(Dense(10, activation='softmax'))

The model architecture has two pairs of Convolutional and Pooling layers, followed by a Flatten layer and two Dense layers. The layers have the following properties:

Convolutional Layer: accepts a 32x32 color image as input and outputs 32 feature maps, using a 4x4 kernel and the Rectified Linear Unit (ReLU) activation function
Pooling Layer: a 2x2 max pool, which halves the width and height and so shrinks its input to about 1/4 of the original size
Flatten layer: converts the multi-dimensional data into a vector
First Dense Layer: the hidden layer, with 256 outputs and the ReLU activation function
Second Dense Layer: the output layer of the network. Its number of outputs must equal the number of image classes we want to distinguish, 10 in this example, and it uses the Softmax activation function because the output is multi-class
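The layer settings above determine the size of the data at each stage. The sketch below works through that arithmetic, assuming stride 1 and no padding for the convolutions and non-overlapping pooling windows (the Keras defaults for the arguments used here); the two helper functions are hypothetical names.

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial size after a convolution without padding."""
    return (size - kernel) // stride + 1


def pool_output_size(size, pool):
    """Spatial size after non-overlapping pooling (incomplete windows are dropped)."""
    return size // pool


conv1 = conv_output_size(32, 4)      # 32x32 input, 4x4 kernel
pool1 = pool_output_size(conv1, 2)   # 2x2 max pool
conv2 = conv_output_size(pool1, 4)   # second 4x4 convolution
pool2 = pool_output_size(conv2, 2)   # second 2x2 max pool
flattened = pool2 * pool2 * 32       # 32 feature maps flattened into a vector

print(conv1, pool1, conv2, pool2, flattened)  # 29 14 11 5 800
```

Because 29 is odd, the first pooling layer drops one row and one column, so the reduction is slightly more than a factor of 4 (841 values per feature map down to 196).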

The summary method shows the model's structure and the number of parameters in each layer:

model.summary()
# Output
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_2 (Conv2D)            (None, 29, 29, 32)        1568      
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 14, 14, 32)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 11, 11, 32)        16416     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 5, 5, 32)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 800)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 256)               205056    
_________________________________________________________________
dense_3 (Dense)              (None, 10)                2570      
=================================================================
Total params: 225,610
Trainable params: 225,610
Non-trainable params: 0
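The parameter counts in the summary can be reproduced by hand: each filter or unit has one weight per input value plus one bias. The helpers below are hypothetical names used only to show the arithmetic.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return (inputs + 1) * units


counts = {
    "conv2d_2": conv2d_params(4, 4, 3, 32),    # RGB input, 32 filters
    "conv2d_3": conv2d_params(4, 4, 32, 32),   # 32 input feature maps, 32 filters
    "dense_2": dense_params(800, 256),         # flattened 5*5*32 vector -> 256 units
    "dense_3": dense_params(256, 10),          # 256 units -> 10 classes
}
total = sum(counts.values())

print(counts)
print(total)  # 225610
```

Pooling and Flatten layers have no weights, which is why they show 0 parameters.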

The next step is to configure the model with compile.

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

We set the model's loss function to categorical_crossentropy. During training the model adjusts its weights to reduce this loss.
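To see what this loss measures, the sketch below computes softmax probabilities and the categorical cross-entropy for a single example in plain Python. It is a simplified illustration: the logit values are made up, and Keras additionally clips probabilities for numerical safety, which is omitted here.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_target, probabilities):
    """Negative log of the probability assigned to the true class."""
    return -sum(t * math.log(p)
                for t, p in zip(one_hot_target, probabilities) if t > 0)


target = [0] * 10
target[6] = 1                      # true class: frog

good_logits = [0.0] * 10
good_logits[6] = 4.0               # made-up scores favouring the true class
bad_logits = [0.0] * 10
bad_logits[1] = 4.0                # made-up scores favouring a wrong class

loss_good = categorical_crossentropy(target, softmax(good_logits))
loss_bad = categorical_crossentropy(target, softmax(bad_logits))
print(loss_good < loss_bad)  # True
```

The loss is small when the model puts most of its probability on the correct class and grows as that probability shrinks, which is what the optimizer pushes against.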

The optimizer is another setting we must specify for the model. We can pass the optimizer's name as in the example, or create an optimizer object first. Here we use the adam algorithm.

More about adam: https://arxiv.org/abs/1412.6980

We can also list additional metrics for the model to compute during training and testing. The metric most commonly added is accuracy.

We are now ready to train. Start training with fit.

model.fit(in_train, out_cat_train, epochs=15, validation_data=(in_test, out_cat_test))

The call above trains for 15 epochs. Each epoch uses all the data in in_train and out_cat_train, and the test data is used for validation.
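Within each epoch the data is processed in batches. As a rough sketch of the arithmetic, assuming the Keras default of batch_size=32 (the fit call above does not set it), the number of weight updates works out as follows.

```python
import math

num_samples = 50000   # training images
batch_size = 32       # assumed: Keras default when fit() is called without batch_size
epochs = 15

steps_per_epoch = math.ceil(num_samples / batch_size)  # last batch may be smaller
total_steps = steps_per_epoch * epochs                 # upper bound if no early stop

print(steps_per_epoch)  # 1563
print(total_steps)      # 23445
```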

We can also have training stop before all 15 epochs if the monitored metric stops improving. This technique also helps keep the model from overfitting the training examples. It relies on the EarlyStopping callback.

from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_loss', patience=2)
model.fit(in_train, out_cat_train, epochs=15, validation_data=(in_test, out_cat_test), callbacks=[early_stop])

In the example above, the model monitors the loss on the validation data and stops training when that loss no longer decreases. patience=2 means training stops once the validation loss has gone 2 consecutive epochs without improving on its best value.

Running it gives the following output:

Train on 50000 samples, validate on 10000 samples
Epoch 1/15
50000/50000 [==============================] - 54s 1ms/sample - loss: 1.4715 - accuracy: 0.4679 - val_loss: 1.2376 - val_accuracy: 0.5615
Epoch 2/15
50000/50000 [==============================] - 43s 861us/sample - loss: 1.1507 - accuracy: 0.5963 - val_loss: 1.0553 - val_accuracy: 0.6306
Epoch 3/15
50000/50000 [==============================] - 44s 884us/sample - loss: 1.0090 - accuracy: 0.6445 - val_loss: 1.0764 - val_accuracy: 0.6240
Epoch 4/15
50000/50000 [==============================] - 46s 918us/sample - loss: 0.9044 - accuracy: 0.6853 - val_loss: 1.0289 - val_accuracy: 0.6406
Epoch 5/15
50000/50000 [==============================] - 53s 1ms/sample - loss: 0.8068 - accuracy: 0.7195 - val_loss: 0.9641 - val_accuracy: 0.6641
Epoch 6/15
50000/50000 [==============================] - 45s 898us/sample - loss: 0.7234 - accuracy: 0.7480 - val_loss: 0.9369 - val_accuracy: 0.6871
Epoch 7/15
50000/50000 [==============================] - 48s 961us/sample - loss: 0.6465 - accuracy: 0.7730 - val_loss: 0.9404 - val_accuracy: 0.6871
Epoch 8/15
50000/50000 [==============================] - 49s 988us/sample - loss: 0.5754 - accuracy: 0.7988 - val_loss: 0.9964 - val_accuracy: 0.6838
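The stopping point in this log can be reproduced with a small simulation of the patience rule. This is a simplified model of the callback, assuming no minimum improvement threshold and no baseline; early_stopping_epoch is a hypothetical helper, and the val_loss values are copied from the log above.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training stops, or None if it never stops early."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss      # new best value: reset the counter
            wait = 0
        else:
            wait += 1        # no improvement in this epoch
            if wait >= patience:
                return epoch
    return None


# val_loss per epoch, taken from the training log above.
val_losses = [1.2376, 1.0553, 1.0764, 1.0289, 0.9641, 0.9369, 0.9404, 0.9964]

print(early_stopping_epoch(val_losses, patience=2))  # 8
```

Epoch 3 is also worse than epoch 2, but epoch 4 improves again and resets the counter, so training only stops after epochs 7 and 8 both fail to beat the best value from epoch 6.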

3. Model Evaluation

We can review the training run through the model's history attribute, which holds the metrics recorded in every epoch.

import pandas as pd
metrics = pd.DataFrame(model.history.history)
metrics

Plotting the metrics makes the relationships between them easier to see.

metrics[['loss', 'val_loss']].plot()

For example, the plot of the training loss against the test loss shows that at positions 6 and 7 on the plot's zero-based axis (epochs 7 and 8 in the training log), the training loss keeps falling while the test loss starts to rise. Training the model further could cause it to overfit.

metrics[['accuracy', 'val_accuracy']].plot()

Accuracy tells a consistent story: at positions 6 and 7 on the plot's zero-based axis, the training accuracy keeps rising, but accuracy on the test set stops improving and starts to fall. Stopping training at this point therefore helps keep the model from overfitting.

The method for computing the model's metrics is evaluate.

model.evaluate(in_test, out_cat_test, verbose=0)
# Output
# [0.9964422744750977, 0.6838]

Our model reports two values: the first is the loss and the second is the accuracy.

We can also produce a report with classification_report from sklearn:

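import numpy as np
from sklearn.metrics import classification_report
# predict_classes was removed in TensorFlow 2.6; take the argmax of the predicted probabilities instead
prediction = np.argmax(model.predict(in_test), axis=-1)
print(classification_report(out_test, prediction))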

Or view the confusion matrix:

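import numpy as np
from sklearn.metrics import confusion_matrix
# predict_classes was removed in TensorFlow 2.6; take the argmax of the predicted probabilities instead
prediction = np.argmax(model.predict(in_test), axis=-1)
confusion_matrix(out_test, prediction)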

Most of the model's predictions match the true labels of the test set, but the model makes many mistakes on certain classes, for example:

126 images of class 9 (truck) were predicted as class 1 (automobile)
254 images of class 5 (dog) were predicted as class 3 (cat)
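Counts like these are read from the confusion matrix with the true class as the row and the predicted class as the column, which is the convention sklearn uses. The sketch below builds such a matrix by hand from a few made-up labels (not the model's real predictions); confusion_matrix_counts is a hypothetical helper.

```python
def confusion_matrix_counts(true_labels, predicted_labels, num_classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, predicted_label in zip(true_labels, predicted_labels):
        matrix[true_label][predicted_label] += 1
    return matrix


# Made-up toy labels for illustration only.
true_labels = [9, 9, 9, 1, 5, 5, 3]
predicted_labels = [9, 1, 1, 1, 3, 5, 3]

matrix = confusion_matrix_counts(true_labels, predicted_labels, 10)

print(matrix[9][1])  # 2: trucks predicted as automobiles
print(matrix[5][3])  # 1: dogs predicted as cats
```

The diagonal holds the correct predictions, so large off-diagonal entries point to the pairs of classes the model confuses most.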

4. Using the Model

With training finished, the model can be used to predict new images.

Let's have the model predict this image.

# in_test was already loaded and scaled to the 0-1 range earlier, so it is reused here;
# reloading the dataset would replace it with unscaled 0-255 values
my_image = in_test[10]
plt.imshow(my_image)

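To predict, call predict and take the index of the highest probability with argmax. (The predict_classes method, which was available when this article was written, was removed in TensorFlow 2.6.)

import numpy as np
np.argmax(model.predict(my_image.reshape(1,32,32,3)), axis=-1)
# Output
# array([0])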

The model predicts class 0, which is airplane.
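Two details of this step are worth spelling out: the model expects a batch, so a single image needs a leading batch dimension, and the predicted index still has to be mapped back to a class name. The sketch below uses a blank placeholder image and a made-up probability vector rather than real model output; CLASS_NAMES is a hypothetical lookup list in CIFAR-10 label order.

```python
import numpy as np

CLASS_NAMES = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

# Placeholder for one scaled 32x32 RGB image (not a real CIFAR-10 picture).
single_image = np.zeros((32, 32, 3), dtype=np.float32)

# Add the batch dimension the model expects: (32, 32, 3) -> (1, 32, 32, 3).
batch = single_image.reshape(1, 32, 32, 3)
same_batch = np.expand_dims(single_image, axis=0)   # equivalent alternative

# Made-up softmax output of shape (1, 10); a real one would come from the model.
probabilities = np.array([[0.62, 0.03, 0.08, 0.02, 0.04,
                           0.01, 0.02, 0.03, 0.12, 0.03]])

predicted_index = int(np.argmax(probabilities, axis=-1)[0])
print(batch.shape, predicted_index, CLASS_NAMES[predicted_index])
```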

Further reading: