Day 71 — Evasion Attack: One-Pixel Attack

Today's topic: the one-pixel attack

Jul 30, 2018

References

Notes

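Over the past two days I wrote about how to fool machine learning / neural network systems that have already been trained, for example by adding noise that the human eye cannot perceive to an image so that the classifier gets it wrong. Today's paper [1] is about minimizing that noise, shrinking it until changing just one or a few pixels is enough to fool the classifier.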
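To make "changing only one or a few pixels" concrete, here is a minimal sketch, assuming an image is stored as a grid of (R, G, B) tuples and each pixel change is written as five numbers (x, y, r, g, b). The helper names `apply_perturbation` and `changed_pixels` are hypothetical and my own, not taken from the paper's code. The "size" of the perturbation is measured here simply as the number of modified pixels (an L0-style count), not as how much each pixel changes.

```python
from typing import List, Tuple

# Assumed representation: an image is an H x W grid of (R, G, B) tuples, values 0-255.
Image = List[List[Tuple[int, int, int]]]
# One pixel change: (x, y, r, g, b), i.e. the position plus the new RGB value.
PixelChange = Tuple[int, int, int, int, int]


def apply_perturbation(image: Image, changes: List[PixelChange]) -> Image:
    """Return a copy of `image` with each listed pixel overwritten (RGB clamped to 0-255)."""
    out = [row[:] for row in image]  # tuples are immutable, so copying each row suffices
    height, width = len(image), len(image[0])
    for x, y, r, g, b in changes:
        if not (0 <= x < width and 0 <= y < height):
            raise ValueError(f"pixel ({x}, {y}) is outside the image")
        out[y][x] = tuple(max(0, min(255, c)) for c in (r, g, b))
    return out


def changed_pixels(original: Image, perturbed: Image) -> int:
    """Count pixels whose RGB value differs: the L0 'size' of the perturbation."""
    return sum(
        1
        for row_a, row_b in zip(original, perturbed)
        for pa, pb in zip(row_a, row_b)
        if pa != pb
    )


if __name__ == "__main__":
    gray = [[(128, 128, 128)] * 32 for _ in range(32)]  # a 32x32 all-gray image
    adv = apply_perturbation(gray, [(5, 7, 255, 0, 0)])  # turn one pixel red
    print(changed_pixels(gray, adv))  # 1
    print(adv[7][5])  # (255, 0, 0)
```

With k entries in the change list, at most k pixels differ from the original, which corresponds to the 1-, 3-, and 5-pixel budgets quoted in the results below.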

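While looking for material on the paper, I came across an interesting YouTuber called Two Minute Papers [2]. The channel summarizes classic papers in about two minutes each, a bit like a research-paper version of Gu Amo (谷阿莫, the Taiwanese YouTuber known for squeezing whole movies into a few minutes)...? Except that no paper author is likely to sue it for plagiarism. Its two-minute summary of this paper already explains the paper's core idea clearly and works well as supplementary material.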

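Awesome AI Security [3] needs no introduction: it is the main source I have been drawing on for this adversarial machine learning series. It collects many classic papers, tech talks, GitHub repositories with the authors' open-source code, and even ready-made attack frameworks and libraries. I haven't actually run any of the code it links to yet, but barring surprises, it looks like it would be fairly easy to put together your own attack toolkit and try it against an online machine learning system (or one you host yourself).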

Enough small talk; the rest of this post is bullet-point notes.

The One Pixel Attack explained in a single figure

What I care about most are its results and its scope of applicability (i.e., its limitations):

Being able to launch non-targeted attacks by only modifying 1, 3 and 5 pixels, with the success rates of 73.8%, 82.0% and 87.3% respectively and 98.7% probability label of target classes on average.
Requiring only black-box feedback (probability labels) but no inner information of target DNN such as gradients and network structure. Our method is also simpler since it does not abstract the problem of searching perturbation to any explicit target functions to solve but directly focus on improving the probability label values of the target classes.
Can attack a broader classes of DNNs (e.g. networks that are not differentiable or when the gradient calculation is difficult).
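The paper [1] finds the pixel with differential evolution (DE), a population-based search that only needs the model's output probabilities. Below is a minimal sketch of that idea in pure Python, assuming a non-targeted attack on a single image: each candidate is (x, y, r, g, b), and its fitness is the probability the black box still assigns to the true class (lower is better). It uses the textbook DE/rand/1 mutation with greedy selection and skips crossover for brevity. `toy_predict` is a hypothetical stand-in classifier, and `pop_size`, `f`, and `iterations` are small illustrative values of my own, not the paper's settings.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Image = List[List[Tuple[int, int, int]]]
Candidate = List[float]  # [x, y, r, g, b], kept as floats while DE searches


def apply_candidate(image: Image, cand: Candidate) -> Image:
    """Return a copy of `image` with one pixel overwritten (rounded and clamped)."""
    h, w = len(image), len(image[0])
    x = min(w - 1, max(0, int(round(cand[0]))))
    y = min(h - 1, max(0, int(round(cand[1]))))
    rgb = tuple(min(255, max(0, int(round(c)))) for c in cand[2:5])
    out = [row[:] for row in image]
    out[y][x] = rgb
    return out


def one_pixel_attack(
    image: Image,
    true_label: int,
    predict: Callable[[Image], Sequence[float]],  # black box: image -> class probabilities
    pop_size: int = 40,
    iterations: int = 30,
    f: float = 0.5,
    seed: int = 0,
) -> Tuple[Candidate, Image, float]:
    """Non-targeted one-pixel attack using DE/rand/1 and only predict()'s outputs.

    Returns (best candidate, adversarial image, probability left on the true class).
    """
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    upper = [w - 1, h - 1, 255, 255, 255]  # search bounds; the lower bound is 0 for all

    def fitness(cand: Candidate) -> float:
        # Lower is better: how much probability the model still gives the true class.
        return predict(apply_candidate(image, cand))[true_label]

    pop = [[rng.uniform(0, u) for u in upper] for _ in range(pop_size)]
    scores = [fitness(c) for c in pop]
    for _ in range(iterations):
        for i in range(pop_size):
            # Mutation: child = a + F * (b - c), built from three other random members.
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            child = [
                min(u, max(0.0, a + f * (b - c)))
                for a, b, c, u in zip(pop[r1], pop[r2], pop[r3], upper)
            ]
            s = fitness(child)
            if s < scores[i]:  # greedy selection: keep the child only if it is better
                pop[i], scores[i] = child, s
        best = min(range(pop_size), key=scores.__getitem__)
        probs = predict(apply_candidate(image, pop[best]))
        if max(range(len(probs)), key=probs.__getitem__) != true_label:
            break  # early stop: the predicted label has already changed
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], apply_candidate(image, pop[best]), scores[best]


def toy_predict(image: Image) -> List[float]:
    """Hypothetical black box: class 1 = 'contains a strongly red pixel', class 0 = not."""
    redness = max(r - (g + b) / 2 for row in image for (r, g, b) in row)
    p1 = 1 / (1 + math.exp(-(redness - 150) / 20))
    return [1 - p1, p1]


if __name__ == "__main__":
    gray = [[(128, 128, 128)] * 8 for _ in range(8)]  # 8x8 all-gray image, class 0
    cand, adv, p_true = one_pixel_attack(gray, true_label=0, predict=toy_predict)
    print("before:", toy_predict(gray))
    print("after: ", toy_predict(adv), "pixel:", [round(v) for v in cand])
```

Nothing in `one_pixel_attack` looks inside the model: it only calls `predict` and reads the returned probabilities, which is why this kind of search also works on networks that are not differentiable or whose gradients are hard to compute. The default parameters are sized only for the toy example.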

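At first glance the success rates don't look very high (a genuine single-pixel change succeeds only 73.8% of the time in the non-targeted setting), but considering how tiny the modification is, it is still a striking result.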