What Is Adversarial Validation?

lukkiddd

Feb 22, 2021 · 2 min read

Outline

Adversarial Validation in Brief
How Do You Do Adversarial Validation?
Example Application

Summary

Adversarial validation helps us detect differences in distribution between two datasets.
It takes two datasets, such as train and test, mixes them together, and creates a two-class target label (train/test). The result is a model that predicts whether a given row belongs to the training set or the test set.
The more accurate that model is, the worse the news, because it means some examples or features let the model tell which set a row came from.
It tells us that something in the data is off, but we still have to work out for ourselves what that something is.

1. Adversarial Validation in Brief

Inspired by the FastML blog.

Adversarial validation is a technique we reach for when we run into distribution shift, that is, when two datasets are distributed differently. A typical case is training a model on one dataset and then applying it to another.

The idea behind adversarial validation is to build a model whose only job is to try to tell the two datasets apart.

2. How Do You Do Adversarial Validation?

Suppose we have data like that in Figure 1.

Figure 1: Example of two datasets, a training set and a test set

We mix the two datasets together and create a new target, as in Figure 2.

Figure 2: Example data prepared for adversarial validation
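The mixing step can be sketched in a few lines of pandas. This is only a minimal illustration: the toy columns `age` and `income`, the helper `build_adversarial_frame`, and the label name `is_test` are all made up for this example and do not come from the figures.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
train = pd.DataFrame({"age": [25, 32, 47], "income": [30_000, 52_000, 61_000]})
test = pd.DataFrame({"age": [29, 41], "income": [35_000, 58_000]})


def build_adversarial_frame(train_df, test_df, label_col="is_test"):
    """Stack train and test rows and label each row by where it came from."""
    combined = pd.concat(
        [
            train_df.assign(**{label_col: 0}),  # 0 = row came from train
            test_df.assign(**{label_col: 1}),   # 1 = row came from test
        ],
        ignore_index=True,
    )
    # Shuffle so that the origin is not encoded in the row order.
    return combined.sample(frac=1.0, random_state=0).reset_index(drop=True)


adv = build_adversarial_frame(train, test)
print(adv)
```

The new `is_test` column is the only target the adversarial model sees; the features stay exactly as they were.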

With the data ready, we build a binary classification model that classifies each row as TRAIN or TEST.

Once the model is trained, we compute its AUC.

# logistic regression / AUC: 50.05%

In this example we might get an AUC of 50.05%, which means the model cannot tell train rows from test rows. In other words, the model finds no sign of distribution shift between the two sets.

But what if the result is high, say AUC = 95%? That means we may be facing distribution shift.
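Both outcomes can be reproduced on synthetic data. The sketch below is an assumption-laden illustration rather than the code behind the numbers above: the helper `adversarial_auc`, the Gaussian toy data, and the use of 5-fold out-of-fold predictions are choices made for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict


def adversarial_auc(X_train, X_test):
    """Cross-validated AUC of a model predicting train (0) vs. test (1)."""
    X = np.vstack([X_train, X_test])
    y = np.concatenate(
        [np.zeros(len(X_train), dtype=int), np.ones(len(X_test), dtype=int)]
    )
    model = LogisticRegression(max_iter=1000)
    # Out-of-fold probabilities keep the AUC from being inflated by overfitting.
    proba = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
    return roc_auc_score(y, proba)


rng = np.random.default_rng(0)
# Case 1: train and test are drawn from the same distribution.
same_train = rng.normal(0.0, 1.0, size=(1000, 3))
same_test = rng.normal(0.0, 1.0, size=(1000, 3))
# Case 2: the first feature of the test set is shifted by 3.
shifted_test = same_test.copy()
shifted_test[:, 0] += 3.0

auc_same = adversarial_auc(same_train, same_test)
auc_shifted = adversarial_auc(same_train, shifted_test)
print(f"same distribution: AUC = {auc_same:.3f}")
print(f"shifted test set:  AUC = {auc_shifted:.3f}")
```

In the first case the AUC should land close to 0.5, and in the second it should be close to 1, because a single shifted feature is enough for the model to separate the two sets.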

To track this down, we can try randomly dropping some features and rebuilding the model, to see what exactly is causing the distribution shift in our data.
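One way to make that search concrete is sketched below. Note that it swaps random dropping for a systematic variant: it removes each feature in turn and records the AUC without it. The feature names `f0`, `f1`, `f2`, the helper functions, and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict


def adversarial_auc(X, y):
    """Cross-validated AUC of a train-vs-test classifier."""
    model = LogisticRegression(max_iter=1000)
    proba = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
    return roc_auc_score(y, proba)


def auc_without_each_feature(X, y, feature_names):
    """Return {feature: AUC obtained after removing that single feature}."""
    results = {}
    for i, name in enumerate(feature_names):
        X_reduced = np.delete(X, i, axis=1)  # drop column i only
        results[name] = adversarial_auc(X_reduced, y)
    return results


rng = np.random.default_rng(1)
names = ["f0", "f1", "f2"]  # hypothetical feature names
X_train = rng.normal(size=(800, 3))
X_test = rng.normal(size=(800, 3))
X_test[:, 1] += 3.0  # in this synthetic example only f1 is shifted

X = np.vstack([X_train, X_test])
y = np.concatenate([np.zeros(800, dtype=int), np.ones(800, dtype=int)])

drop_results = auc_without_each_feature(X, y, names)
# The feature whose removal lowers the AUC the most is the main suspect.
suspect = min(drop_results, key=drop_results.get)
print(drop_results, "suspect:", suspect)
```

This one-at-a-time version only works cleanly when a single feature carries the shift; if several features drift together, dropping any one of them may leave the AUC high.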

3. Example Application

Adversarial Validation Approach to Concept Drift Problem in User Targeting Automation Systems at…

In user targeting automation systems, concept drift in input data is one of the main challenges. It deteriorates model…

arxiv.org

An interesting example comes from Uber.

The paper says that normally, when they run into concept drift (one form of distribution shift), they fix it by retraining the model. The downside is that by the time the model has been retrained, customers may already have been affected.

They propose several remedies. One of them is adversarial validation, which they combine with an automated feature selection system.
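To give a feel for what combining adversarial validation with automated feature selection could look like, here is a rough sketch. It is not the paper's implementation: the random forest, the importance-based ranking, the 0.6 threshold, and the function name `select_stable_features` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict


def select_stable_features(X, y, names, auc_threshold=0.6):
    """Drop the most discriminative feature until train and test look alike."""
    kept = list(range(X.shape[1]))
    while len(kept) > 1:
        model = RandomForestClassifier(n_estimators=50, random_state=0)
        proba = cross_val_predict(
            model, X[:, kept], y, cv=3, method="predict_proba"
        )[:, 1]
        if roc_auc_score(y, proba) <= auc_threshold:
            break  # the classifier can no longer separate the two sets
        # Refit on all rows and remove the feature the classifier leans on most.
        model.fit(X[:, kept], y)
        kept.pop(int(np.argmax(model.feature_importances_)))
    return [names[i] for i in kept]


rng = np.random.default_rng(2)
names = ["f0", "f1", "f2", "f3"]  # hypothetical feature names
X_train = rng.normal(size=(600, 4))
X_test = rng.normal(size=(600, 4))
X_test[:, 2] += 3.0  # only f2 drifts in this synthetic example

X = np.vstack([X_train, X_test])
y = np.concatenate([np.zeros(600, dtype=int), np.ones(600, dtype=int)])

selected = select_stable_features(X, y, names)
print(selected)
```

The threshold is a trade-off: set it too low and useful features get thrown away, set it too high and drifting features stay in the model.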

If you're interested in more, skim through the paper and come back to share what you find :D

4. References

Adversarial validation, part one

Many data science competitions suffer from a test set being markedly different from a training set (a violation of the…

fastml.com

Adversarial Validation | Zak Jost

If you were to study some of the competition-winning solutions on Kaggle, you might notice references to "adversarial…

blog.zakjost.com

Guide To Adversarial Validation To Reduce Overfitting in Machine Learning

Overfitting a model to your data is one of the most common challenges you will face as a Data Scientist. This problem…

analyticsindiamag.com

Machine Learning Engineering

Machine Learning Engineering [Burkov, Andriy] on Amazon.com. FREE shipping on qualifying offers. Machine Learning…
