From Natural to Medical Images: Why Transfer Learning Works

Photo by Robina Weermeijer on Unsplash

Transfer learning is a common technique for transferring knowledge from one domain to another. It is easy to see why knowledge learned on one dataset might help on a different dataset with similar characteristics. But what makes transfer learning work for medical images? Medical images have very different characteristics from natural images, yet transfer learning is still a popular approach for medical applications.

Why does it work so well? A recent paper published at CVPR 2022, “What Makes Transfer Learning Work For Medical Images: Feature Reuse & Other Factors”, aims to answer this question. Its central question is: what factors determine whether transferred representations are effective in the medical domain? In this blog post, I summarize the paper's main findings.

The authors ran a series of experiments to study the role of feature reuse and how the effectiveness of transfer learning depends on dataset size, distance from the source domain, model capacity, and the model's inductive bias. In all experiments, the source domain is ImageNet.

The results are not surprising, but it is valuable to see the intuition backed by experimental results.

The main findings of the paper are demonstrated in the following figure:

Image by the paper’s authors. Factors affecting the utility of transfer learning from ImageNet to medical domains. The size of each dot represents the relative increase in performance achieved by transferring weights from ImageNet compared to random initialization. The color of the dot indicates how much of the gain can be attributed to feature reuse.

The benefit from transfer learning increases with:

  • Smaller dataset size
  • Smaller distance between the source and target [1]
  • Models with fewer inductive biases [2] (ViTs vs. CNNs)
  • Models with more capacity, to a lesser extent

For small datasets, transfer learning yields a noteworthy gain for all models. However, the size of the gain and the importance of feature reuse [3] depend on the model's inductive biases and on the distance between the domains.

For large datasets, and for datasets that bear little resemblance to ImageNet, the gain from transfer learning is insignificant. In such cases, ViTs appear to benefit far more from feature reuse than CNNs.

Feature reuse + Inductive Bias

  • When transfer learning works well, the evidence for feature reuse is strongest.
  • Models with less inductive bias rely more heavily on feature reuse.
  • The pattern of feature reuse changes in models with less inductive bias. Specifically, feature reuse in ViTs is concentrated in early layers, whereas CNNs reuse features more consistently throughout the network.
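One way to probe where feature reuse happens is to transfer ImageNet weights only up to a given layer, re-initialize the remaining layers randomly, fine-tune, and compare results as the cut-off moves deeper. The sketch below builds such partially transferred initial states. It assumes, hypothetically, that a model's weights are stored as an ordered dict of NumPy arrays from input to output, and it uses He initialization for the re-initialized layers; `partial_transfer` and `he_normal` are illustrative names, and the paper's exact protocol may differ.

```python
import numpy as np

def he_normal(shape, rng):
    # He (Kaiming) normal init; assumes an (out, in, ...) weight layout,
    # so fan-in is the product of all dimensions after the first.
    fan_in = int(np.prod(shape[1:])) if len(shape) > 1 else shape[0]
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=shape)

def partial_transfer(pretrained, init_fn, num_transferred, seed=0):
    """Copy the first `num_transferred` layers from `pretrained` and
    randomly re-initialize the rest (hypothetical helper).

    `pretrained` is an ordered dict {layer_name: weight array}, input to output.
    """
    rng = np.random.default_rng(seed)
    state = {}
    for idx, (name, weight) in enumerate(pretrained.items()):
        if idx < num_transferred:
            state[name] = weight.copy()                # reuse source-domain weights
        else:
            state[name] = init_fn(weight.shape, rng)   # train this layer from scratch
    return state

# Toy three-layer "network"; random arrays stand in for ImageNet weights.
rng = np.random.default_rng(1)
pretrained = {
    "block1": rng.normal(size=(8, 3, 3, 3)),
    "block2": rng.normal(size=(16, 8, 3, 3)),
    "head": rng.normal(size=(2, 16)),
}
# One initial state per cut-off: 0 (fully random) up to all layers transferred.
init_states = [partial_transfer(pretrained, he_normal, k) for k in range(len(pretrained) + 1)]
```

Fine-tuning each state in `init_states` and plotting performance against the number of transferred layers would show whether most of the gain already appears when only the early layers are transferred, which is the pattern the paper reports for ViTs.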


Although medical images can have significantly different characteristics from natural images, medical datasets can still benefit from transfer learning. The paper shows that the gain from transfer learning is higher for small datasets, which are common in the medical field. The authors also discuss which architectures benefit most and analyze the transferred representations layer by layer. I recommend reading the full paper for more details.


[1] Distance from the source domain — The distance from the source domain (ImageNet in this case) is measured using the Fréchet Inception Distance (FID), which compares the distributions of two image datasets in the feature space of a pretrained Inception network.
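FID fits a Gaussian to the feature embeddings of each dataset and computes the Fréchet distance between the two Gaussians: ||mu_a - mu_b||^2 + Tr(Sigma_a + Sigma_b - 2 (Sigma_a Sigma_b)^(1/2)). The NumPy sketch below computes this distance from two precomputed feature matrices. Extracting the Inception features (typically the 2048-dimensional pooling layer of Inception-v3) is not shown, and the function names are illustrative rather than taken from any FID library.

```python
import numpy as np

def _sqrtm_psd(mat):
    # Square root of a symmetric positive semi-definite matrix via eigendecomposition
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative eigenvalues caused by round-off
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (num_images, feature_dim).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals Tr((S cov_b S)^(1/2)) with S = cov_a^(1/2);
    # S cov_b S is symmetric PSD, so its eigenvalues can be taken with eigvalsh.
    s = _sqrtm_psd(cov_a)
    cross = np.sqrt(np.clip(np.linalg.eigvalsh(s @ cov_b @ s), 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * cross)
```

Identical feature sets give a distance of zero, and shifting every feature by a constant c adds feature_dim * c^2, since only the mean term changes.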

[2] Inductive Bias — Most learning algorithms rely on mechanisms or assumptions that restrict the space of hypotheses they consider. These are known as inductive bias (or learning bias), and they encourage the algorithm to prioritize solutions with specific properties. In other words, inductive bias is the set of implicit or explicit assumptions an algorithm makes in order to generalize from a finite set of training data.
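A concrete example of inductive bias is the translation equivariance built into convolutions: because the same small kernel is applied at every position, shifting the input simply shifts the output. The NumPy sketch below checks this for a 1D circular convolution and contrasts it with an unconstrained fully connected layer, which has no such built-in assumption. The dense layer only illustrates a layer with no spatial prior; it is not a model of a ViT, and `circular_conv` is a hypothetical helper.

```python
import numpy as np

def circular_conv(x, kernel):
    # 1D circular convolution: the same kernel weights are applied at every position
    return sum(w * np.roll(x, k) for k, w in enumerate(kernel))

rng = np.random.default_rng(0)
x = rng.normal(size=16)
kernel = rng.normal(size=3)        # 3 shared weights: locality + weight sharing
dense = rng.normal(size=(16, 16))  # 256 unconstrained weights, no spatial prior

shift = 5
# Shift-then-transform vs. transform-then-shift
conv_equivariant = np.allclose(circular_conv(np.roll(x, shift), kernel),
                               np.roll(circular_conv(x, kernel), shift))
dense_equivariant = np.allclose(dense @ np.roll(x, shift),
                                np.roll(dense @ x, shift))
print(conv_equivariant, dense_equivariant)  # True False
```

The convolution gets equivariance for free from its structure, while the dense layer would have to learn it from data, which is one intuition for why models with weaker built-in assumptions may lean more on pretrained weights.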

[3] Feature reuse — The feature reuse hypothesis assumes that weights learned in the source domain yield features that can readily be used in the target domain. In practice, this means that weights learned on ImageNet provide useful features in the target domain, and do not change substantially during fine-tuning despite differences between the domains.
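A simple proxy for the second half of this definition is to measure how far each layer's weights move during fine-tuning, relative to their pretrained values. The sketch below assumes, hypothetically, that pretrained and fine-tuned weights are available as dicts of NumPy arrays with matching keys and shapes; `relative_weight_change` is an illustrative helper, not the analysis used in the paper.

```python
import numpy as np

def relative_weight_change(pretrained, finetuned):
    """Per-layer ||W_ft - W_pre|| / ||W_pre|| using the Frobenius (flattened L2) norm.

    Small values mean fine-tuning left the transferred weights largely intact.
    """
    return {
        name: float(np.linalg.norm(finetuned[name] - w) / np.linalg.norm(w))
        for name, w in pretrained.items()
    }

# Toy example: conv1 barely moves, fc is rewritten substantially.
pretrained = {"conv1": np.ones((4, 3, 3)), "fc": np.ones(10)}
finetuned = {"conv1": 1.1 * np.ones((4, 3, 3)), "fc": 2.0 * np.ones(10)}
changes = relative_weight_change(pretrained, finetuned)  # conv1 about 0.1, fc about 1.0
```

Under this proxy, layers with values near zero after fine-tuning are the ones whose ImageNet features were most directly reused.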


