Deep Domain Adaptation
Domain adaptation solves a learning problem in a target domain by exploiting training data from a different but related source domain. In practice, this allows us to predict labels for samples from a domain in which we have little or no labelled data. Intuitively, discovering a good feature representation shared across domains is crucial: the main idea is to use knowledge from the source data to classify the target data. The main technical difficulty of domain adaptation is to formally reduce the distribution discrepancy between the domains. We can then apply standard deep learning methods to train a classifier on the source domain and use it in the target domain, provided we account for the domain shift between the two.

Introduction
This method is a divergence-based domain adaptation. The idea is that, after aligning the two domains, the feature spaces will be almost aligned. But because some domain shift may remain (as shown in the figure), feeding target images directly through the network after alignment is not a very good idea; we can do better. Since most features are shared between the source and target domains, a source-domain classifier can correctly classify most target samples. So keeping a high threshold on the predicted class probability (around 0.98, depending on the scenario) selects only the target samples whose features match strongly; these are then added to the actual training (source) dataset for the next iterations of training. This progressively reduces the distance between the domain distributions by incorporating more target-sample features into the training set. The effectiveness and efficiency of the approach have been verified by experiments on the Office-31 dataset.
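The selection step above can be sketched in plain Python. The function name `select_pseudo_labels` and the 0.98 default are illustrative, assuming the model outputs a per-class probability vector for each target sample:

```python
def select_pseudo_labels(target_probs, threshold=0.98):
    """Return (sample_index, predicted_class) pairs for target samples
    whose top class probability clears the confidence threshold."""
    selected = []
    for i, probs in enumerate(target_probs):
        c = max(range(len(probs)), key=lambda j: probs[j])  # predicted class
        if probs[c] >= threshold:
            selected.append((i, c))  # confident sample -> keep as pseudo-label
    return selected

# Example: only the first sample is confident enough to keep.
print(select_pseudo_labels([[0.99, 0.01], [0.60, 0.40]]))  # [(0, 0)]
```

The selected pairs are what get appended to the source training set for the next round.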

We can use the same network for training as in the paper Improved Open Set Domain Adaptation with Backpropagation. The loss function is designed both to minimise the domain shift between the two domains using the KL divergence and to minimise the cross-entropy loss on the training (source) dataset. The network (DANN) thus consists of two losses: the classification loss and the domain-divergence loss. It contains a gradient reversal layer to match the feature distributions. During training, the domain divergence is minimised over all source and target samples, while the classification loss is minimised over the source samples only.
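A minimal sketch of what the gradient reversal layer does, in plain Python rather than an autograd framework (the function names and the `lam` scaling factor are illustrative): the layer is the identity in the forward pass and negates the gradient in the backward pass, so minimising the domain classifier's loss pushes the feature extractor to make source and target features indistinguishable.

```python
def grl_forward(x):
    # Forward pass: identity; features flow unchanged to the domain classifier.
    return x

def grl_backward(upstream_grad, lam=1.0):
    # Backward pass: flip the sign (scaled by lambda), so the feature
    # extractor receives a gradient that *maximises* the domain classifier's
    # loss, aligning the source and target feature distributions.
    return [-lam * g for g in upstream_grad]

print(grl_forward([1.0, 2.0]))               # [1.0, 2.0]
print(grl_backward([0.5, -0.5], lam=2.0))    # [-1.0, 1.0]
```

In a real framework this is implemented as a custom autograd operation; the forward/backward pair above only shows the semantics.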
The main difference is that we train the network not just once but several times, say 4–5 iterations. In the first iteration the classification loss uses only the source dataset; in subsequent iterations it also includes the target samples whose predicted class probability exceeded the threshold when tested on the model trained in the previous iteration. This is not practical for large datasets, but works well on small ones such as Office-31 and IPC.
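The iterative scheme can be sketched end to end. To keep the example self-contained, a nearest-centroid classifier with a softmax over negative squared distances stands in for the real DANN; `self_train`, `rounds`, and the toy data are all illustrative:

```python
import math

def train(data):
    """Fit a nearest-centroid 'model': data is a list of (features, label)."""
    sums, counts = {}, {}
    for x, y in data:
        s = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict_probs(model, x):
    """Softmax over negative squared distances to each class centroid."""
    scores = {y: -sum((a - b) ** 2 for a, b in zip(x, c))
              for y, c in model.items()}
    m = max(scores.values())
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def self_train(source, target, rounds=4, threshold=0.98):
    """Iteration 1 trains on source only; later rounds add confident
    pseudo-labelled target samples back into the training set."""
    model = train(list(source))
    for _ in range(rounds):
        pseudo = []
        for x in target:
            probs = predict_probs(model, x)
            y, p = max(probs.items(), key=lambda kv: kv[1])
            if p >= threshold:
                pseudo.append((x, y))  # confident target sample joins training
        model = train(list(source) + pseudo)
    return model
```

For example, with source samples ([0, 0], class 0) and ([10, 10], class 1) and target points [1, 1] and [5, 5], one round pseudo-labels only the confident point [1, 1] as class 0, pulling that centroid toward the target data, while the ambiguous [5, 5] is left out.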

If the source dataset has k classes, then for closed-set training the classifier has k output classes. The partial-set case is the same as the closed-set case, but in the open-set case the number of classes is k + 1, with the last class reserved for unknown samples.
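The head-sizing rule above is simple enough to state as a one-liner (the helper name and setting strings are hypothetical):

```python
def num_output_classes(k, setting):
    """Number of classifier outputs for k source classes under each setting."""
    if setting in ("closed", "partial"):
        return k        # same label space as the source
    if setting == "open":
        return k + 1    # one extra slot for the "unknown" class
    raise ValueError(f"unrecognised setting: {setting}")

print(num_output_classes(31, "open"))  # 32, e.g. Office-31 plus "unknown"
```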
