# Backdoors: Definition, Deniability and Detection

## [Research in Attacks, Intrusions, and Defenses]21st International Symposium, RAID 2018 Heraklion, Crete, Greece, September 10–12, 2018 Proceedings.

### 先備知識

#### FSM ( Finite-State Machine )

FSM有限狀態自動機，簡稱狀態機，是表示有限個狀態(states)以及在這些狀態之間的轉移(transition)和動作(actions)等行為的數學模型。

FSM大致上可以分成：

• Acceptors (recognizers/sequence detectors)接收器:
會產生binary output(accepting or not accepting)。
用數學模型表示：a quintuple θ = (S,i,F,Σ,δ)
• Transducers變換器
根據inpu或使用動作的狀態來產生output。
可以被分成Moore machine和Mealy machine。通常控制應用。
用數學模型表示：a sextuple θ =(Σ,Γ,S,S0,δ,ω)

#### CFG

Control flow graph(CFG): graph notation to represent paths traversed through a program during its execution.

#### ROP

Return-Oriented Programming(ROP)：也算是buffer overflow的一種，允許在安全防禦下的情況執行程式碼，以此控制程式流程。

#### Reference

1. Reference [15]
-
Thomas F. Dullien
[Weird machines, exploitability, and provable unexploitability ]
19 December 2017
IEEE Transactions on Emerging Topics in Computing

這篇論文部分內容是說明：作者提供了exploit、weird machine兩者清楚的定義，也解釋了weird machine怎樣的情況會導致exploit。
與本篇backdoor論文中有相關性的內容在第一章節：
1 THE INTENDED FINITE-STATE MACHINE (IFSM)
1.1 Software as emulators for the IFSM
Since any real-world software can be modelled as an IFSM, but has to execute on a real-world general-purpose machine, an emulator for the IFSM needs to be constructed.”
這段說明了emulator的需求。
至於為什麼不是研究software本身，而是反而去研究emulator of software呢？
根據 bug or security vulnerability 定義：
When the security issue arises from a software flaw, it is impossible to even define ’flaw’ without taking into account what a bug-free version of the software would have been.
所以把software作為一個有潛在缺陷的IFSM emulator，在state-space sense上，缺陷可以被放大觀察。
也就是說，如果能夠觀察出IFSM有什麼問題，那real-world software應該也有相對應的漏洞。
以上是針對軟體漏洞，至於硬體漏洞可以看
-
Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, and Onur Mutlu.
Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors.
June 2014
SIGARCH Comput. Archit. News 42, 3 (June 2014), 361–372.

-
2. Reference [14]
-
Dennis Andriesse, Herbert Bos
Instruction-Level Steganography for Covert Trigger-Based Malware 2014
Detection of Intrusions and Malware, and Vulnerability Assessment(DIMVA), pp 41–50

-
我們要先知道，trigger-based software會一直保持休眠直到trigger發生。
這篇是在說明：以往的惡意程式碼都是在benign host binaries中很少運行的路徑，所以就比較難被檢查到。然而近年來自動後門檢測讓惡意程式碼不能在這樣藏了，所以這篇論文為trigger-based software找出一種新的隱藏方法：把惡意程式碼塞在spurious code fragments中。
用這樣的方式，反編譯和靜態分析都無法偵測到惡意程式碼，只有正確的trigger才會跳到hidden code。
Abstract是這樣說的：
“…implement stealthy control transfers to the hidden code by crafting trigger-dependent bugs, which jump to the hidden code only if provided with the correct trigger.”
所以沒跳到hidden code都不會被偵測出來。
而此篇作者以實作證明可行性：利用Nginx HTTP server module製作隱藏的後門。
大家可以先多多去了解關於Nginx以及它的後門實作

### Abstract

• Detecting backdoor is difficult task because…
1. The lack of automated tooling.
2. Detected by labourious manual analysis.
3. No concrete or rigorous definition.
• This paper will…
1. Provide a definition about backdoor, backdoor detection, and backdoor deniability.
2. Present a framework to decompose a backdoor through 4 components.
3. Show how current backdoor detection methodologies are

### Introduction

• The potential presence of backdoors from third-parties.
這裡的第三方為「部署軟硬體設備」者。
根據研究，第三方可能為競爭者[3]，或是消費者的設備製造商[2,5]。其中後者會帶來backdoor的原因可能為： accidental, left-over ”debug” functionality, 缺乏使用者驗證的software configuration updates[5], or 當設備製造商只負責韌體，找來的third-party software有夾帶了前三項的問題。
而其中configuration是一個相當重要的議題，Configuration File(組態設定檔：儲存相關設定的檔案）常常是發生資安問題的疑慮。
傳統漏洞會表現很奇怪且非預期的program state。
backdoor應該是呈現出明確的(explicit)、有意的(intentional)、且基本上看起來是normal program functionality。
🔅 接下來文章內會強調backdoor是否由explicit component組成。
🔅 正常的程式在此篇稱作normal；反之為abnormal。
• “Backdoor” is hard to define as it…
— Take many forms.
所以沒辦法用歸納(generalized)的方式，但不能定義變成無法定義detection methodology。
— The forms such as a hardware component, a dedicated program or a malicious program fragment.
— Is too complex to be the sheer lack of real-world samples.
• Documented real-world backdoors are simplistic for analyzing static input data and have been studied in the literature[19.21]
這裡simplistic是相較於無法被記錄的複雜sample。
static input data 指通常是放入trigger function(來看看是否能觸發發生backdoor的條件 e.g. hard-coded credentials）的值。
🔅 hard-coded credentials代表 程式碼寫死的驗證機制
• This work focus on…
— Software-based backdoors primarily.
— The technical aspects of a backdoor-like functionality. 作者想要強調沒有政治色彩吧哈哈
• Both academic and real-world backdoors expressed in terms of definitions can be reasoned their deniability and detectability.
使用本篇對於backdoor的定義，無論backdoor複不複雜(academic v.s. rea-world)，都可以確定backdoor detection methodology的好壞和backdoor deniability(是否可以判斷是否為backdoor)。

### Preliminaries

• Platform
the highest level of abstraction for a device that a given backdoor targets.
帶有backdoor的裝置，我們將它抽象化作Platform。
• System
the highest level of abstraction required to model a given backdoor, within a platform.
在Platform中，對於backdoor的模型，我們將它抽象化作System。
那針對不同程度的抽象化，這個Platform可能是：a dedicated program, a hardware component, embedded as part of another program.

E.g. 帶有backdoor的router，我們抽象化為platform。而我們發現backdoor就在該router的software，所以將a dedicated program抽象化為system，並以FSM model起來（是highest level of system)。而OS中的process也可以被FSM model為arbitrary levels of FSMs。

🔅 接下來有提到這兩個詞將用斜體字來表示，提醒各位在此篇這些詞有特殊的定義。

#### FSM

1. Developer FSM(DFSM)
developer’s view of the system
2. Actual FSM(AFSM)
a real manifestation of the system
3. Expected FSM(EFSM)
the end-user’s expectations of the system
4. Reverse-engineered FSM(RFSM)
a refinement of the EFSM obtained by reverse-engineering the actual system

• S: the set of its states
state是system中的特定的functionality。
• i: its initial state
• F: the set of its final states
• Σ: the set of its state transition conditions
• δ: its state transitions and the transition labelling function is like S × Σ → S

#### An emulator for the AFSM

• Derive RFSM(θR) — by reverse-engineering the real system — by making perceptions and observations of its concrete implementation.
由程式分析師觀察出states和transitions，產出RFSM。
這些states和transitions都是在platform中，例如：CPU的states。而目標Actual system是software，所以emulator從program binary中看。
• As the emulator for AFSM (θA)
RFSM將會作為emulator of AFSM。
• Map concrete states and transitions of the platform, to the level of abstraction modelled by the states and transitions of their FSM.
對應FSM和platform的states和transitions。
• The granularity of the θR is dependent upon how a system is analysed.
E.g.
use IDA Pro and perceive S and δ between those states of the real system.
use a debugger tool observe S and δ of the real system.

### Definition

• A distinguishing feature of all backdoors is its trigger mechanism — a pivotal component.
• Another component is account for the satisfaction of the trigger condition: input source.
• The eventual system state upon trigger activation is considered the backdoor-activated state, a privileged state.
• An intermediate component that facilitates the transition from the normal system state to the backdoor-activated state is payload.

Definition 1 Backdoor.
An intentional construct contained within a system that serves to compromise its expected security by facilitating access to otherwise privileged functionality or information. Its implementation is identifiable by its decomposition into four components: input source, trigger, payload, and privileged state, and the intention of that implementation is reflected in its complete or partial (e.g., in the case of bug-based backdoors) presence within the DFSM and AFSM, but not the EFSM of the system containing it.

Definition 1 用自己的話來說，重點有幾個：

1. An intentional construct compromise system expected security by escalating privileges.
2. Its implementation must be identifiable by its decomposition into four components: input source, trigger, payload, and privileged state
3. Backdoor is known to the system’s implementer (presence within DFSM & AFSM) and unknown to its end-user (no presence within EFSM).
RFSM就是看有沒有分析出來。

Backdoor的四個元件都可以被拿去建模成FSM，其中有兩個相關的FSMs：

1. θtrigger: 代表trigger
( Fpayload: a set of possible privileged states)

Definition 2 Backdoor Detection.
A backdoor is detected by obtaining:
Within θR, the states and transitions of both the trigger and payload must exist:
The privileged states reachable as a result of the payload are either final states of θR, or states that can be transitioned from to some state of θR:
The payload must be reachable from the trigger, and there must exist a transition to the trigger within θR:

Definition 2 用自己的話來說，上面順序可以等同如下：

1. The states and transitions within both the trigger and payload of FSM must exist within θR
2. As a result of the payload is privileged state or some states from states of θR
3. The payload must be reachable from the trigger
4. There must exist a transition to the trigger within θR

### A framework for modelling backdoors

• Open source: 分析the source code version control logs
control logs可能會指出每次刪除了什麼、變更了什麼，進而知道states和transitions有什麼改變。
• Closed-source software: 分析the differences between software versions

• A construct consisting of an input source, trigger, payload, and privileged state would be part of the DFSM of the system.
最終目標是DFSM，DFSM也都會含有那四個元件。
• Functionality modelled by EFSM or RFSM express what they have learnt about the system.
通常是藉由RFSM來找出system中的states和transitions。
end-user用EFSM。
• To discover a backdoor through analysis of the emulator for the AFSM.
然後透過分析emulator和RFSM(post-anlaysis)，來找找backdoor。
• If backdoor present in the emulator of AFSM, then there is in AFSM and might be found in RFSM(or EFSM).
如果emulator of AFSM有，那基本上AFSM就會有，所以RFSM也有機會被找到(如果你有能力找到)。

1. Discovered(not newly created)：也就是Explicited。
➤ Exist in AFSM.
➤ Explicit states and transitions will always exist within the backdoor implementer’s DFSM.
2. Created：Non-explicited。
➤ might not in AFSM

RFSM models a program. The new states and state transitions that are added to it after analysing.
Basic blocks and branches that are explicitly part of the program’s code are be part of AFSM & DFSM.
Some shellcodes that are not explicit are in a sense weird states and state transitions.

#### Input source

• Trigger function takes at least one parameter, which is input source.

( It doesn’t cause the activation of the backdoor trigger.)
e.g. input source有可能的型態：
string input by attacker wishing to activate the trigger
system clock that during a specific time period the trigger becomes active.
• It decide which state transition made as a result of executing the function.

#### Trigger mechanism

• Core concept: the collection of checks is as a single function to decide if payload execute or not.
因為在現實的情況裡，有multiple branch conditions and execution of multiple basic blocks。
• The way FSM transitions satisfying the trigger conditions to the payload is model with 2 cases:
可以等等回過頭來看這個表格，先往下一點的Visual cases觀察。
表格解釋：
第一格[State transitions is]：看圖中的transitions是否明確、明顯。
第二格[Trigger is added to the RFSM by states and state transitions satisfying the backdoor trigger conditions are]：因為發現trigger了，將trigger後走到next state時，所經過的states和state transitions加入RFSM，並判斷它們是否明顯。
第三格[Trnasitions of RFSM are]：上述的state和state transitions要加進RFSM時，看看原本是否就在RFSM了（取決於原本是否為explicit)，如果是的話，那此transitions就是discovered transitions；反之為created transitions。
• Visual cases:

#1
Within valid CFG, a transition to the payload constitutes normal control-flow.
➤ 只要不是bug-based system，在本篇都當作normal
➤ 由圖可知，頂端的state為一開始的trigger function。當滿足了trigger condition，無論是否activate backdoor，從圖中很明確地就知道可以走到next state。

# 2
Within a program bug allowing control-flow hijacking constitutes abnormal control-flow.
➤ 再提一次，此為bug-based system，也就是圖中含有non-explicit states and state transitions，所以為abnormal control-flow。
➤ 這個圖的問題是出在一開始的認證就是漏洞(strcpy那行會產生buffer overflow的問題)了，所以不是明顯的trigger function。但是滿足trigger condition的還是明顯的state和state transitions，是前往payload的那條transition，要經過分析才知道，所以為non-explicit。

# 3 ( a more complex example)
➤ Backdoor trigger rely both on explicit checks and a bug.
➤ Explicit check: a hard-coded credential check.
— False: execute standard authentication routine

• Payload is as a solution how to reach a privileged state from satisfying the conditions of trigger.
• In practice, a payload component can take many forms by how to be modelled as part of a RFSM
一樣，等等再來看這個表格，先往下一點的Visual cases觀察。
表格解釋：
第二格[The creation of new states and transitions]：如果第一格提到的transition不明顯，那允不允許新增新的states或state transitions呢？（明顯的話當然就直接是discovered states and state transitions了)
第三格[Payload added to the RFSM through states and transitions facilitating to privileged state are]：上述的state和state transitions要加進RFSM時，看看原本是否就在RFSM了（取決於原本是否為explicit)，如果是的話，那此transitions就是discovered state and discovered state transitions；反之為created transitions。
第四格[States and transitions is contained in the backdoor implementer’s DFSM or not]：如果上述的state和state transitions是discovered，那就都會存在於DFSM哦！但是case#2比較特別一點，在下面會解釋。
• Visual cases:

# 1
➤ state1是trigger condition(也就是hard-coded credential check，如範例程式碼中的strcmp(user._name, “backdoor”)==0)，如果滿足了便會進到state2得到的admin權限進行了第一次的提權，接著open shell進入了state4（也就是privileged state）。
➤ 這時候的privileged state就如同an undocumented backdoor shell（例如：輸入特定的使用者名稱，攻擊者就可以執行額外的功能）。

# 2
Explicit transition to payload with both explicit and non-explicit components.
➤ Trigger condition為state1，確定使用者在接下來要不要特殊的路徑（以抵達privileged state)。
🔅If Yes: trigger transition will send the request(req._data) used as input to an interpreter.
➤ 在state2, 透過interpreter去執行(run)，privileged state就會動態的建構出來。因為要經過run，可知states和transitions當初在DFSM中是沒有的出現的，現在發現了要把它們加進RFSM。

# 3
Non-explicit transition to payload, where payload has both explicit and non-explicit components.
➤ Trigger mechanism是bug-based system.
➤ 在表格中[The creation of new states and transitions]，由於state皆是明顯的，沒有新增new state的問題；但transitions有不明顯者，所以要新增。
➤ 正常情況下圖中的三個system並不互相影響，在DFSM中是始料未及的，所以新的states和transitions(此範例只有新的transitions）不存在於DFSM，但因為發現了所以把他們加進RFSM。

A backdoor payload composed solely of a state transition.
➤ 把state1叫做trapdoor，因為trapdoor 會允許攻擊者繞過複雜的使用者驗證。（而非直接滿足trigger condition）
➤ The form of the payload is identical for states and transitions before.

• 先討論使用bug-based trigger mechanism的問題：
因為trigger很簡單，他沒辦法讓implementer確定backdoor到最後要怎麼應用，而且trigger control can be regained.
🔅“Control is regained” means limiting the computational freedom of newly created states.
— Reusing component.
E.g. for a program, from static analysis methods or code fragments executed in sequence upon the backdoor being triggered.
🔅“code fragment” is embedded and distributed throughout a binary.
— From attacker controlled data.
controlled data就像成功攻陷buffer overflow的shellcode，本身不存在在程式的元件中。

#### Privileged state

— explicit:可能或不可能在normal system的情況下獲得權限。
— non-explicit:不可能在normal system的情況下獲得權限。

1. Under normal system execution

E.g.

2. Only be reached guarded by activation of the backdoor

➤ 當privileged state明顯

privileged state manifests as an undocumented backdoor shell

➤ 當privileged state不明顯

legitimate user無法使用。

### Detection and Deniability

#### 關於Backdoor detection

• Transition from trigger to payload is explicit.
➤ 例如：我們可以發現trigger condition ( E.g. hard-coded credential check)，以此確定intent is explicit。
• Transition is non-explicit(bug-based).
可以利用software的control logs去判斷backdoor會在什麼地方植入、或是用binary software的versions去判斷是否每一次的變動都有legitimate reason，來判斷backdoor的置入。
➤ 例如：code fragment今天只做提權這件事，且在normal program中無法被接觸到。（在下一段有提供一個大Case study）

#### 關於Backdoor deniability

Definition 3 Intentional backdoor.
Those constructs that can be unambiguously identified as backdoors: the transition from their trigger satisfaction to their payload is explicit. Will be present in the DFSM, AFSM, and if found, the RFSM, but not the EFSM.

• The construct is explicitly identified as backdoor.
• It can present in the DFSM, AFSM, RFSM because transitions to payload is explicit.
Definition 4 Deniable backdoor.
Those constructs that fall into a grey area, where the transition from their trigger satisfaction to their payload is non-explicit (i.e., it appears to be a bug), but from a non-technical perspective can be argued to be intentional. Will be present in the AFSM, if found, the RFSM, but not the EFSM; we cannot definitively tell if it is in the DFSM.

• The construct is bug-based and is intentional by non-technical perspective.
• It can present in the AFSM and RFSM, but not definitively in DFSM.
Definition 5 Accidental vulnerability.
Those constructs where there is no evidence — technical, or otherwise — to suggest any intent, and the transition from their trigger satisfaction to their payload is non-explicit. Will be present in the AFSM, and if found, the RFSM, but not the DFSM or EFSM.

• The construct has no evidence to identify as backdoor.
• It can present in the AFSM, RFSM.

### Case Study

Definition 4這類的backdoor處於灰色地帶、比較複雜且難以deny，所以以下將討論關於Definition 4的Case Study。

🔅這個input source是一個network socket，只要放進malformed HTTP packet就可以觸發。

• 利用symbolic execution找出bug-based trigger condition
• 利用misaligned instruction sequences的原理，掃出其他的instruction sequences。藉由找出其他的instruction sequences來看看能不能進行提權，則這些instruction sequences就為payload

framework不只可以克服source code，還可以評斷backdoor detection methodology的好壞ㄛ。

### Current backdoor detection methodologies

#### Firmalice

is designed to detect authentication bypass vulnerabilities.

• It can detect a privileged state by modification of the input security policy.
• Problem: It require the same amount of manual analysis to detect the entire backdoor as it would to identify the privileged state.也就是說，因為沒有payload的觀念，所以必須一直手動更改input source，才能看看privileged state的改變。

#### HumIDIFy

aims to detect if a program can execute functionality it should never execute under normal circumstances.

• It does not consider the notion of a trigger.
• Problem: Only detected when program is performed by a legitimate user and behavior that is anomalous。

#### Stringer

attempts to detect static data used as program.

• It uses a scoring metric to rank static data.
• It uses heuristics for identifying payload-like constructs.
• Problem: It is unable to meaningfully score data that leads to states that are actually privileged higher than those that are not.

#### Weasel

detects both authentication bypass vulnerabilities and undocumented commands in server-like program binaries.

• It is assumed to reveal all deciders and handlers when processed.
• Problem: For instance, Tenda web- server backdoor(可以去原文看Table1.的案例解釋). It will be unable to detect such a backdoor due to using a separate input source from the standard input to the program.因為後門用戶達到的privileged state與合法用戶達到的privileged state不同，但沒有input source的觀念所以不能分辨用戶。

### Future Work

This paper does not intent to provide a direct means to detect backdoors, rather it serves as a general means to decompose backdoors in an abstract way.

• The deficiencies in those methods due to them not fully capturing the rigorous definition of a backdoor.
• A backdoor detection methodology based upon our proposed framework would be a natural extension of this work.（都把問題丟給別人做欸）
• A deliberate side-channel vulnerability would prove difficult to model using our FSM-based abstraction; we view this as an additional area for investigation.

### Conclusion

This paper provides

• Definitions: backdoor, backdoor detection, deniable backdoors.
• Means to discern: intentional backdoors and accidental vulnerabilities.
• Framework serves as a basis for identifying backdoor-like construct and the reasoning about detection.

### Discussion

1. 我覺得本篇沒有實作「如何從程式碼中(或是difference of version control logs)轉變成CFG並判斷FSM」，很可惜。雖然他有在Nginx case study的部分有大概說明如何找出trigger condition，但卻沒有針對這部份多做更多解釋。
2. 其實不只是針對backdoor detection，我們也可以從此篇得到靈感：在偵測無論何種惡意程式時，都可以試以FSM來釐清自己的概念。

Emily Tseng