優拓 Paper Note ep.21: Few-Shot Learning (Part III)

Chu Po-Hsien

Published in

YOCTOL.AI

Apr 17, 2018

Previous two posts in this series:

優拓 Paper Note ep.15: Few-Shot Learning (Part I)

Note: all figures in that post are taken from the paper “Make SVM Great Again with Siamese Kernel”.

blog.yoctol.com

優拓 Paper Note ep.17: Few-Shot Learning (Part II)

Note: all figures in that post are taken from “Optimization as a Model for Few-Shot Learning”.

blog.yoctol.com

Matching Networks for One Shot Learning:

[1606.04080] Matching Networks for One Shot Learning

Abstract: Learning from a few examples remains a key challenge in machine learning. Despite recent advances in…

arxiv.org

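Both papers discussed earlier compare their results against this one: Matching Networks, a few-shot learning method published by DeepMind in 2016. It also learns from past few-shot tasks, but rather than using meta-learning, it works by learning a task-dependent feature representation.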

What is a task-dependent feature representation, and why do we need one?

As mentioned before, few-shot learning can be strengthened with non-parametric methods. This paper distills that idea into a simple attention mechanism:
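Following the paper's notation, where S = {(x_i, y_i)}, i = 1..k, is the few-shot training set and each y_i is a one-hot label, the prediction for a new example x̂ is an attention-weighted sum over that set:

```latex
\hat{y} = \sum_{i=1}^{k} a(\hat{x}, x_i)\, y_i
```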

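To predict a new example x̂ (x hat), we let it look at every training example and take a similarity-weighted sum of their labels. This KNN-like approach raises two important issues: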

How to train an effective feature representation
What kind of metric function can effectively capture the similarity between examples

Here the metric function is fixed (cosine similarity), and the goal is to learn a better feature representation.
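A minimal pure-Python sketch of this KNN-like attention classifier, with the metric fixed to cosine similarity and the weights normalised by a softmax as in the paper. As a simplifying assumption, the learned embeddings are replaced by the identity, so raw feature vectors are compared directly; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity c(u, v) between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def attention(x_hat: Sequence[float], support_x: Sequence[Sequence[float]]) -> List[float]:
    """a(x_hat, x_i): softmax over cosine similarities to every training example."""
    sims = [cosine(x_hat, x_i) for x_i in support_x]
    m = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x_hat: Sequence[float], support_x: Sequence[Sequence[float]],
            support_y: Sequence[int], num_classes: int) -> List[float]:
    """y_hat = sum_i a(x_hat, x_i) * onehot(y_i): a probability distribution over labels."""
    y_hat = [0.0] * num_classes
    for weight, label in zip(attention(x_hat, support_x), support_y):
        y_hat[label] += weight  # adding the weight to one slot == weight * one-hot label
    return y_hat


# Usage: a 2-way task with two examples of class 0 and one of class 1.
support_x = [[1.0, 0.0], [0.9, 0.2], [0.0, 1.0]]
support_y = [0, 0, 1]
probs = predict([0.8, 0.3], support_x, support_y, num_classes=2)
```

Because x̂ points in nearly the same direction as the two class-0 examples, most of the attention mass, and hence most of the probability, goes to class 0. Nothing here is learned; in Matching Networks the quality of the result depends on the embeddings that this sketch leaves out.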

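Traditionally, a feature representation does not change with the task: the vector produced by passing a photo of a car through VGG19 has the same values whether the task is classifying vehicles or detecting pedestrians.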

Matching Networks instead propose encoding the entire training set of a few-shot task together:
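Written out as in the paper, the attention weight becomes a softmax over the cosine similarity c between two embeddings that both depend on the whole training set S:

```latex
a(\hat{x}, x_i) = \frac{\exp\big(c(f(\hat{x}, S),\, g(x_i, S))\big)}{\sum_{j=1}^{k} \exp\big(c(f(\hat{x}, S),\, g(x_j, S))\big)}
```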

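g produces the representation of the training examples and f the representation of the example to be predicted; both are learnable functions. The learning objective is to do well on past few-shot tasks, that is, to maximise the probability of the correct labels on them: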
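In the paper this is written as maximum likelihood over sampled tasks: draw a label set L from a task distribution T, sample a training (support) set S and a batch B from L, and maximise the log-probability of B's labels given S:

```latex
\theta = \arg\max_{\theta}\; \mathbb{E}_{L \sim T}\Big[\, \mathbb{E}_{S \sim L,\; B \sim L}\Big[ \sum_{(x, y) \in B} \log P_{\theta}(y \mid x, S) \Big] \Big]
```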

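The implementations of g and f are fairly involved; see the paper's Appendix for details. In brief, g runs a bidirectional LSTM over the training set, and f is an LSTM that repeatedly attends over the training set while embedding x̂.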
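Whatever f and g look like, training follows the episodic set-up described earlier: the model is repeatedly shown small few-shot tasks built from past data. Below is a minimal sketch of sampling one such episode, assuming the data is a hypothetical dict mapping each class name to its examples; `sample_episode` and its parameters (`n_way`, `k_shot`, `n_query`) are illustrative names, not taken from the paper's code.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

Item = TypeVar("Item")


def sample_episode(
    data: Dict[str, Sequence[Item]],
    n_way: int,
    k_shot: int,
    n_query: int,
    rng: random.Random,
) -> Tuple[List[Tuple[Item, int]], List[Tuple[Item, int]]]:
    """Sample one N-way k-shot episode: a support set S and a query batch B.

    S and B share one randomly drawn label set L, and labels are re-indexed
    to 0..n_way-1 so the model cannot rely on memorised global class ids.
    Each class needs at least k_shot + n_query examples.
    """
    label_set = rng.sample(sorted(data), n_way)  # L ~ T
    support: List[Tuple[Item, int]] = []
    query: List[Tuple[Item, int]] = []
    for new_label, cls in enumerate(label_set):
        # Draw without replacement so S and B never share an example.
        examples = rng.sample(list(data[cls]), k_shot + n_query)
        support += [(x, new_label) for x in examples[:k_shot]]
        query += [(x, new_label) for x in examples[k_shot:]]
    return support, query


# Usage: four classes with five toy examples each; sample a 3-way 1-shot episode.
data = {c: [f"{c}{i}" for i in range(5)] for c in "abcd"}
S, B = sample_episode(data, n_way=3, k_shot=1, n_query=2, rng=random.Random(0))
```

During training, the loss on each episode would be the negative log-probability the classifier assigns to B's labels given S; that part depends on f and g and is not sketched here.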

Series Summary

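In real-world settings, high-quality labeled data may not be easy to obtain, so problems like this are well worth studying. Besides the methods covered above, similarity learning and metric learning are also possible approaches. The classic Neighbourhood Components Analysis paper (Goldberger, Roweis, Hinton, and Salakhutdinov, NIPS 2004), which modifies KNN by learning the distance metric, already shows traces of few-shot learning.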
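For comparison, here is a minimal sketch of the NCA objective: project the points with a linear map A, let each point pick a neighbour with probability given by a softmax over negative squared distances, and add up the probabilities of picking a same-class neighbour. It only evaluates the objective for a given A; the gradient-based optimisation of A is omitted, and the function name is hypothetical.

```python
import math
from typing import Sequence


def nca_objective(points: Sequence[Sequence[float]],
                  labels: Sequence[int],
                  A: Sequence[Sequence[float]]) -> float:
    """NCA objective f(A) = sum_i p_i: the expected number of points that a
    stochastic nearest-neighbour rule classifies correctly after projecting by A."""
    # Project every point: z = A x, with A given as a list of rows.
    z = [[sum(a * xc for a, xc in zip(row, x)) for row in A] for x in points]
    total = 0.0
    for i, zi in enumerate(z):
        others = [j for j in range(len(z)) if j != i]  # a point never picks itself
        d2 = {j: sum((p - q) ** 2 for p, q in zip(zi, z[j])) for j in others}
        shift = min(d2.values())  # shifting all distances leaves p_ij unchanged
        w = {j: math.exp(-(d2[j] - shift)) for j in others}
        norm = sum(w.values())
        # p_i = sum of p_ij over neighbours j that share i's label
        total += sum(w[j] for j in others if labels[j] == labels[i]) / norm
    return total


# Usage: two well-separated classes along the first axis.
pts = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]]
labs = [0, 0, 1, 1]
identity = [[1.0, 0.0], [0.0, 1.0]]
y_only = [[0.0, 0.0], [0.0, 1.0]]  # discards the informative first axis
```

Keeping the first axis lets almost every point pick a same-class neighbour, so the identity map scores much higher than `y_only`. Like Matching Networks, NCA classifies through a softmax over similarities to labelled points; the difference is that NCA learns the metric, whereas Matching Networks fix the metric and learn the embedding.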