DeepDOT: Identifying Backdoor Attack on Neural Network

Triplex
Nov 5 · 5 min read

“Your brain does not manufacture thoughts. Your thoughts shape neural networks.”

Neural networks, by design, lack transparency. This means that their internal functioning cannot be visualized directly by looking at the network itself. Deep neural networks are multi-layered models made up of neurons with thousands, possibly millions of inter-connected ‘synapses’.

This makes them susceptible to backdoor attacks, in which certain ‘connections’ between neurons (and the weights attached to them) override the network’s normal functioning to produce unexpected or unwanted classification results.

For example, an object recognition model with a backdoor always identifies the input as a ‘penguin’ if a specific symbol is present in the input.

This has serious security implications, especially today when many mainstream products in the market are now adopting neural networks. This includes self-driving cars and biometric authentication systems.

We discuss a system for detecting backdoor attacks in Deep Neural Networks.

Fig: An illustration of a backdoor attack. The backdoor target is label 4, and the trigger pattern is a white square in the bottom-right corner. When injecting the backdoor, part of the training set is stamped with the trigger and relabelled with the target label. After training on the modified training set, the model recognizes samples with the trigger as the target label, while still recognizing the correct label for any sample without the trigger.
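The poisoning described in the caption can be sketched in a few lines. This is a minimal illustrative sketch, not the attack code itself; the trigger size, position, poisoning ratio, and the function name `stamp_trigger` are all assumptions for the example:

```python
import numpy as np

def stamp_trigger(images, labels, target_label=4, trigger_size=4):
    """Stamp a white square on the bottom-right corner of each image
    and relabel it with the backdoor target label."""
    poisoned = images.copy()
    poisoned[:, -trigger_size:, -trigger_size:, :] = 255  # white square trigger
    poisoned_labels = np.full(len(labels), target_label)
    return poisoned, poisoned_labels

# poison 10% of a toy training set of 32x32 RGB images
X = np.random.randint(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)
y = np.random.randint(0, 43, size=100)
n_poison = len(X) // 10
X_p, y_p = stamp_trigger(X[:n_poison], y[:n_poison])
X_train = np.concatenate([X_p, X[n_poison:]])
y_train = np.concatenate([y_p, y[n_poison:]])
```

A model trained on `X_train`/`y_train` would learn to associate the white corner square with label 4 while behaving normally on clean inputs.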

Part 1: Define Parameters

We define a few parameters that will be used in the detection process. These include the GPU device to use, the path to the model suspected of containing a backdoor, and parameters for the trigger-optimization process (discussed below).

The comment after each line provides more details about the parameter.

import numpy as np  # needed for MASK_SHAPE below

DEVICE = '0'  # GPU device to use
DATA_DIR = base_path + 'data'  # dataset folder (base_path is defined elsewhere)
DATA_FILE = 'gtsrb_dataset_int.h5'  # dataset file
MODEL_DIR = base_path + 'models'  # model directory
MODEL_FILENAME = 'gtsrb_bottom_right_white_4_target_33.h5'  # model file
RESULT_DIR = base_path + 'results_gpu'  # directory for storing results
# image filename template for visualization results
IMG_FILENAME_TEMPLATE = 'gtsrb_visualize_%s_label_%d.png'

# input size
IMG_ROWS = 32
IMG_COLS = 32
IMG_COLOR = 3
INPUT_SHAPE = (IMG_ROWS, IMG_COLS, IMG_COLOR)
NUM_CLASSES = 43  # total number of classes in the model

# (optional) infected target label, used for prioritizing label scanning
# NOTE: this only causes the label to be scanned first;
# leaving it undefined has no effect on the final results.
Y_TARGET = 33

# preprocessing method for the task; GTSRB uses raw pixel intensities
INTENSITY_RANGE = 'raw'

# parameters for optimization
BATCH_SIZE = 32  # batch size used for optimization
LR = 0.1  # learning rate
STEPS = 500  # total optimization iterations
NB_SAMPLE = 500  # number of samples in each mini batch
MINI_BATCH = NB_SAMPLE // BATCH_SIZE  # number of mini batches used for early stop
INIT_COST = 1e-3  # initial weight used for balancing two objectives
REGULARIZATION = 'l1'  # reg term to control the mask's norm
ATTACK_SUCC_THRESHOLD = 0.99  # attack success threshold of the reversed attack
PATIENCE = 5  # patience for adjusting weight, number of mini batches
COST_MULTIPLIER = 2  # multiplier for auto-control of weight (COST)
SAVE_LAST = False  # whether to save the last result or best result
EARLY_STOP = True  # whether to early stop
EARLY_STOP_THRESHOLD = 1.0  # loss threshold for early stop
EARLY_STOP_PATIENCE = 5 * PATIENCE  # patience for early stop

# the following part is not used in our experiment,
# but our code implementation also supports a super-pixel mask
UPSAMPLE_SIZE = 1  # size of the super pixel
MASK_SHAPE = np.ceil(np.array(INPUT_SHAPE[0:2], dtype=float) / UPSAMPLE_SIZE)
MASK_SHAPE = MASK_SHAPE.astype(int)

###############################      END PARAMETERS        ###############################

Part 2: Load Dataset

We load the dataset, store the X and Y test data in their respective variables, and return.

def load_dataset(data_file=('%s/%s' % (DATA_DIR, DATA_FILE))):
    dataset = utils_backdoor.load_dataset(data_file, keys=['X_test', 'Y_test'])
    X_test = np.array(dataset['X_test'], dtype='float32')
    Y_test = np.array(dataset['Y_test'], dtype='float32')
    print('X_test shape %s' % str(X_test.shape))
    print('Y_test shape %s' % str(Y_test.shape))
    return X_test, Y_test


def build_data_loader(X, Y):
    datagen = ImageDataGenerator()
    generator = datagen.flow(X, Y, batch_size=BATCH_SIZE)
    return generator

Part 3: Visualize and Optimize Trigger

This is done in the following three steps (Wang, Bolun, et al. “Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks.” IEEE Symposium on Security and Privacy, 2019):

  1. For every given label, it is treated as a potential target of a backdoor attack. We use an optimization scheme (discussed later) to design a ‘minimal’ trigger that will cause all samples from other labels to be classified into the target label. For example, in a model with visual input, this would be the smallest collection of pixels that would lead to misclassification.
  2. This step is repeated for each label in the model. For N labels, N ‘minimal’ triggers will be generated.
  3. After generating the ‘minimal’ triggers for each label, the size of each trigger is measured (for images, this is the number of pixels the trigger occupies). We then run outlier detection to find triggers that are significantly smaller than the others. A significant outlier represents a real trigger, and its label is the target label of the backdoor attack.
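The core of step 1 is blending a candidate trigger into an input and minimizing the classification loss toward the target label plus a norm penalty on the mask (the L1 regularization named in the parameters above). A minimal NumPy sketch of the blending function and the objective; the model is a stand-in `logits_fn`, and all names here are illustrative assumptions rather than the authors’ implementation:

```python
import numpy as np

def apply_trigger(x, mask, pattern):
    """Blend a trigger into an input: A(x, m, p) = (1 - m)*x + m*p."""
    return (1.0 - mask) * x + mask * pattern

def objective(logits_fn, x, mask, pattern, target, lam=1e-3):
    """Cross-entropy toward the target label plus an L1 penalty on the mask."""
    logits = logits_fn(apply_trigger(x, mask, pattern))
    shifted = logits - logits.max()  # numerically stable softmax
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return -np.log(probs[target]) + lam * np.abs(mask).sum()

# toy 4x4 'image', a 2x2 white-square mask, and a stand-in 2-class model
x = np.zeros((4, 4))
mask = np.zeros((4, 4))
mask[-2:, -2:] = 1.0
pattern = np.ones((4, 4))
logits_fn = lambda inp: np.array([inp.sum(), 1.0])
loss = objective(logits_fn, x, mask, pattern, target=0)
```

In the actual detection pipeline this objective is minimized with gradient descent over `mask` and `pattern` (using the `LR`, `STEPS`, and `INIT_COST` parameters defined earlier), and the resulting L1 norm of `mask` is the trigger size used in step 3.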

To detect outliers, we use a simple technique based on Median Absolute Deviation, which is known to be resilient in the presence of multiple outliers.

We first calculate the absolute deviation between all data points and the median. The median of these absolute deviations is called MAD and provides a reliable measure of the dispersion of the distribution.

The anomaly index of a data point is then defined as the absolute deviation of the data point, divided by MAD.

Any data point with an anomaly index larger than 2 has a >95% probability of being an outlier. We mark any label with an anomaly index larger than 2 as an outlier and infected, focusing only on outliers at the small end of the distribution.
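The MAD-based detection described above fits in a few lines. A minimal sketch, assuming the inputs are the L1 norms of the reversed triggers; the 1.4826 consistency constant (which makes MAD comparable to the standard deviation under a normality assumption) and the threshold of 2 follow the description above:

```python
import numpy as np

def mad_outliers(l1_norms):
    """Return indices of labels whose reversed-trigger L1 norm is an
    anomalously small outlier, using the Median Absolute Deviation."""
    norms = np.asarray(l1_norms, dtype=float)
    median = np.median(norms)
    # MAD: median of absolute deviations, scaled by the consistency constant
    mad = np.median(np.abs(norms - median)) * 1.4826
    anomaly_index = np.abs(norms - median) / mad
    # flag only the small end of the distribution
    return [i for i, (a, n) in enumerate(zip(anomaly_index, norms))
            if a > 2 and n < median]

# example: label 3 has a much smaller trigger than the rest
print(mad_outliers([52, 48, 55, 4, 50, 47]))  # → [3]
```

Here label 3’s trigger norm (4) deviates from the median (49) by roughly 12 MADs, so it is flagged as infected, while the benign labels all fall well below the threshold.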

Conclusion

Our research describes reliable and general detection and mitigation methods against backdoor attacks on deep neural networks and validates them empirically. From our experiments, we find that the injection method of the Trojan Attack typically creates more disruption than necessary and causes unexpected changes in non-target neurons. This makes its triggers harder to reverse engineer and more resistant to filtering and neuron pruning.

Finally, while our results are robust against a range of attacks in different applications, there are still limitations. We study five different counter-measures that specifically target different components/assumptions of our defence, but further exploration of other potential counter-measures remains part of future work.
