4 Handy Features in TensorFlow Serving

lukkiddd

Published in lukkiddd

Sep 12, 2023

Thanks to fnitiwat

What is TensorFlow Serving?

TensorFlow Serving is a tool for running models efficiently in production. It is also easy and convenient to use. It offers a wide range of features, so we can pick the ones that suit our problem.

Today I'll introduce some interesting features, in case you want to try them out.

Model Polling: update models automatically
Remote Polling: update models by pulling them from cloud storage
Model Config: customize the deployment configuration, manage model versioning and roll-out
Batch Request: run predictions in batches

1. Model Polling

TensorFlow Serving can automatically pick up new models that we place in the model directory. By default it checks for new models every 1 second. We can change this interval with the --file_system_poll_wait_seconds option.

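# Model-server flags go after the image name; here the server checks for new versions every 3600 s (1 hour)
docker run ... \
       --file_system_poll_wait_seconds=3600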
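TensorFlow Serving treats each numbered subdirectory of a model's base path (for example my_model/1, my_model/2) as a version. With the default version policy, it serves only the highest one.

Below is a minimal Python sketch of publishing a new version so the poller picks it up. It assumes a local model directory. next_version and publish_version are hypothetical helpers, not part of TensorFlow Serving. The sketch copies into a non-numeric staging directory first and then renames it into place, so a poll never sees a half-copied model.

```python
import os
import shutil
import tempfile


def next_version(base_path: str) -> int:
    """Return 1 + the highest numeric version directory under base_path (1 if none)."""
    versions = [
        int(name)
        for name in os.listdir(base_path)
        if name.isdigit() and os.path.isdir(os.path.join(base_path, name))
    ]
    return max(versions, default=0) + 1


def publish_version(export_dir: str, base_path: str) -> str:
    """Copy an exported SavedModel directory into base_path/<next version>.

    The copy first goes into a hidden, non-numeric staging directory inside
    base_path (so it is on the same file system), then is renamed into place.
    The poller therefore never sees a partially copied version.
    """
    version = next_version(base_path)
    staging = tempfile.mkdtemp(prefix=".staging-", dir=base_path)
    try:
        staged_model = os.path.join(staging, "model")
        shutil.copytree(export_dir, staged_model)
        final_path = os.path.join(base_path, str(version))
        os.rename(staged_model, final_path)  # atomic within one POSIX file system
    finally:
        shutil.rmtree(staging, ignore_errors=True)  # remove the now-empty staging dir
    return final_path
```

For example, publish_version("/tmp/export", "/models/my_model") (hypothetical paths) would create /models/my_model/<highest + 1>. The server should load that version on its next poll.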

2. Remote Polling

We can also put our models on cloud storage, such as Amazon S3 on AWS or Google Cloud Storage on GCP, and have TensorFlow Serving poll the cloud storage instead. Each provider needs a slightly different configuration.

AWS

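# The official tensorflow/serving image serves ${MODEL_BASE_PATH}/${MODEL_NAME};
# -e options must come before the image name
docker run ... \
       -e MODEL_NAME=xxx \
       -e MODEL_BASE_PATH=s3://.... \
       -e AWS_ACCESS_KEY_ID=xxx \
       -e AWS_SECRET_ACCESS_KEY=xxx \
       -e AWS_REGION=xxx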

GCP

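# Serves gs://my_bucket/models/model_a; the credentials file must exist
# inside the container (for example, mounted with -v)
docker run ... \
       -e MODEL_NAME=model_a \
       -e MODEL_BASE_PATH=gs://my_bucket/models \
       -e GOOGLE_APPLICATION_CREDENTIALS=path/to/credentials.json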
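The official tensorflow/serving Docker image starts the server with --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME}. A full model URI therefore has to be split into those two variables.

Here is a minimal Python sketch, assuming that image and its REST port 8501. split_model_uri and docker_run_args are hypothetical helpers. The sketch builds an argument list for subprocess.run, with -e options before the image name and model-server flags after it.

```python
from typing import Dict, List, Optional


def split_model_uri(model_uri: str) -> Dict[str, str]:
    """Split e.g. 'gs://my_bucket/models/model_a' into MODEL_BASE_PATH and MODEL_NAME."""
    base, _, name = model_uri.rstrip("/").rpartition("/")
    # Reject URIs with no parent directory, such as 'gs://bucket'.
    if not base or not name or base.endswith(":/"):
        raise ValueError(f"cannot split model URI: {model_uri!r}")
    return {"MODEL_BASE_PATH": base, "MODEL_NAME": name}


def docker_run_args(model_uri: str,
                    extra_env: Optional[Dict[str, str]] = None,
                    server_flags: Optional[List[str]] = None) -> List[str]:
    """Build a `docker run` argument list for the tensorflow/serving image."""
    env = {**split_model_uri(model_uri), **(extra_env or {})}
    args = ["docker", "run", "-p", "8501:8501"]  # publish the REST port
    for key, value in env.items():
        args += ["-e", f"{key}={value}"]  # docker options: before the image
    args.append("tensorflow/serving")
    args += server_flags or []  # model-server flags: after the image
    return args
```

Passing the result to subprocess.run(args, check=True) would start the container. The credentials file also has to exist inside the container (for example, mounted with -v), which this sketch does not handle.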

3. Model Config

To deploy several models at once, we can use what is called a Model Config:

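model_config_list {
    config {
        name: 'my_first_model'
        # hypothetical path: point base_path at the directory holding the numbered versions
        base_path: 'gs://my_bucket/models/my_first_model'
        model_platform: 'tensorflow'
    }
    config {
        name: 'my_second_model'
        base_path: 'gs://my_bucket/models/my_second_model'
        model_platform: 'tensorflow'
    }
}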

Then, when starting the server, add these options:

docker run .... \
       --model_config_file=gs://my_bucket/models/model_config_list \
       --model_config_file_poll_wait_seconds=3600
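Once the list grows, it can be easier to generate this file. Below is a minimal Python sketch with a hypothetical ModelSpec type and render_model_config helper; the paths are placeholders.

Besides name, base_path and model_platform, the sketch can emit model_version_policy { specific { ... } } and version_labels. TensorFlow Serving's model config uses these fields to pin versions and to roll out with labels such as stable and canary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModelSpec:
    """One entry of model_config_list (hypothetical helper type)."""
    name: str
    base_path: str
    # Empty list: keep the default version policy (serve only the latest version).
    versions: List[int] = field(default_factory=list)
    # Label -> version, e.g. {"stable": 1, "canary": 2}. This sketch requires each
    # labelled version to be listed in `versions`, since by default the server only
    # accepts labels for versions it has loaded.
    labels: Dict[str, int] = field(default_factory=dict)


def render_model_config(models: List[ModelSpec]) -> str:
    """Render a model_config_list in protobuf text format.

    Assumes names and paths contain no single quotes.
    """
    lines = ["model_config_list {"]
    for m in models:
        lines += [
            "  config {",
            f"    name: '{m.name}'",
            f"    base_path: '{m.base_path}'",
            "    model_platform: 'tensorflow'",
        ]
        if m.versions:
            lines += ["    model_version_policy {", "      specific {"]
            lines += [f"        versions: {v}" for v in m.versions]
            lines += ["      }", "    }"]
        for label, version in m.labels.items():
            if version not in m.versions:
                raise ValueError(f"label {label!r} points at version {version}, "
                                 "which is not in the explicit versions list")
            lines += [
                "    version_labels {",
                f"      key: '{label}'",
                f"      value: {version}",
                "    }",
            ]
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

For a canary roll-out, the idea is to serve two versions at once. Point stable at the old version and canary at the new one, then move the stable label once the new version looks good.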

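Version labels can be used through the REST API starting with TensorFlow Serving 2.3. Labels are assigned to versions with version_labels in the model config. A request targets a label by putting it in the URL:
/v1/models/<model_name>/labels/stable:predict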
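A label request is an ordinary REST predict call with /labels/<label> in the URL, or /versions/<n> to pin a version number. Below is a minimal Python sketch using only the standard library. predict_request is a hypothetical helper, and the host, model name and input shape are assumptions. The sketch builds the request without sending it, so it can be inspected first.

```python
import json
import urllib.request
from typing import Any, List, Optional


def predict_request(host: str, model: str, instances: List[Any],
                    version: Optional[int] = None,
                    label: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a TensorFlow Serving REST predict request.

    URL shape: /v1/models/<model>[/versions/<n> | /labels/<label>]:predict
    """
    if version is not None and label is not None:
        raise ValueError("pass either version or label, not both")
    path = f"/v1/models/{model}"
    if version is not None:
        path += f"/versions/{version}"
    elif label is not None:
        path += f"/labels/{label}"
    body = json.dumps({"instances": instances}).encode("utf-8")  # row format
    return urllib.request.Request(
        f"http://{host}{path}:predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, call urllib.request.urlopen(predict_request("localhost:8501", "my_first_model", [[1.0, 2.0]], label="stable")). The JSON response body should contain a predictions field.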

Reference: Tensorflow Serving Configuration | TFX | TensorFlow (www.tensorflow.org)

4. Batch Request

This feature helps us make full use of the CPU/GPU. When many requests arrive at the same time, we can hold them together as a batch and have the model predict on the whole batch in one go.

By default this feature is turned off, so every incoming request is predicted separately, as shown in the figure.

A client calls TensorFlow Serving multiple times and gets the responses one by one.

Alternatively, we can hold the requests and predict them all at once, as in the second figure.

A client calls TensorFlow Serving multiple times and gets all the responses back at once.

The catch is that responses may come back more slowly. Each request has to wait until a full batch has been collected or until the batch timeout expires. In return, the CPU/GPU is used much more efficiently.
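To see where the extra latency comes from, here is a toy Python model of that rule: a batch runs as soon as it is full, or when the timeout counted from its first request expires. It is a simplified single-queue sketch (form_batches and queueing_delays are hypothetical). It ignores model execution time and parallel batch threads, and TensorFlow Serving's actual scheduler is more involved.

```python
from typing import List


def form_batches(arrivals_us: List[int], max_batch_size: int,
                 batch_timeout_micros: int) -> List[List[int]]:
    """Group sorted request arrival times (in microseconds) into batches.

    A batch closes when it already holds max_batch_size requests, or when the
    next request arrives at or after the batch's deadline
    (first arrival + batch_timeout_micros).
    """
    batches: List[List[int]] = []
    current: List[int] = []
    for t in arrivals_us:
        if current and (len(current) == max_batch_size
                        or t >= current[0] + batch_timeout_micros):
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches


def queueing_delays(batch: List[int], max_batch_size: int,
                    batch_timeout_micros: int) -> List[int]:
    """Extra wait per request: a full batch runs when its last request arrives,
    a partial batch runs when the timeout expires."""
    if len(batch) == max_batch_size:
        start = batch[-1]
    else:
        start = batch[0] + batch_timeout_micros
    return [start - t for t in batch]
```

Take requests arriving at 0, 1,000, 2,000, 9,000 and 9,500 µs, with a batch size of 3 and a timeout of 5,000 µs. The first three requests fill a batch and wait an extra 2,000, 1,000 and 0 µs. The last two form a partial batch that waits for the timeout, adding 5,000 and 4,500 µs. The timeout is the worst-case extra wait in this model.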

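The main parameters are:
max_batch_size: the largest batch that will be formed; larger batches raise throughput but also latency
batch_timeout_micros: the longest time, in microseconds, to wait before running a batch that is not yet full
num_batch_threads: the maximum number of batches processed concurrently
max_enqueued_batches: how many batches can wait in the queue; once it is full, new requests are turned away instead of building up a backlog
pad_variable_length_inputs: pad inputs of different lengths so they can be stacked into the same batch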

Setting it up is straightforward:

1. Create a file for the batching parameters, batch_parameters.txt:

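max_batch_size { value: 32 }          # at most 32 examples per batch
batch_timeout_micros { value: 5000 }  # wait at most 5,000 µs (5 ms) for a batch to fill
pad_variable_length_inputs: true      # pad inputs of different lengths so they can be batched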

2. When running TensorFlow Serving, turn the feature on:

docker run .... \
       --enable_batching=true \
       --batching_parameters_file=gs://my_bucket/models/batch_parameters.txt
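The batch_parameters.txt file from step 1 is short, but the field names are easy to misspell; for example, the field is max_enqueued_batches, not max_enqueue_batches. Below is a small, hypothetical Python helper that writes the file from a dict and rejects unknown names. Integer fields use the wrapped `name { value: N }` form and pad_variable_length_inputs is a plain boolean, matching the file in step 1. It only covers the five parameters listed earlier.

```python
from typing import Dict, Union

# Integer parameters are wrapped messages in the text format: `name { value: N }`.
INT_FIELDS = ("max_batch_size", "batch_timeout_micros",
              "num_batch_threads", "max_enqueued_batches")
# Boolean parameters are plain fields: `name: true`.
BOOL_FIELDS = ("pad_variable_length_inputs",)


def render_batching_parameters(params: Dict[str, Union[int, bool]]) -> str:
    """Render batching parameters in protobuf text format, rejecting unknown names."""
    lines = []
    for key, value in params.items():
        if key in INT_FIELDS:
            if isinstance(value, bool) or not isinstance(value, int) or value < 0:
                raise ValueError(f"{key} must be a non-negative integer, got {value!r}")
            lines.append(f"{key} {{ value: {value} }}")
        elif key in BOOL_FIELDS:
            if not isinstance(value, bool):
                raise ValueError(f"{key} must be True or False, got {value!r}")
            lines.append(f"{key}: {'true' if value else 'false'}")
        else:
            raise ValueError(f"unknown batching parameter: {key!r}")
    return "\n".join(lines) + "\n"
```

Writing the returned string to batch_parameters.txt and passing that file to --batching_parameters_file reproduces the setup from step 1.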

Reference: tensorflow/serving, batching guide (github.com)