Use fastai and image_tabular to integrate image and tabular data for deep learning and train a joint model using the integrated data

Photo by Kevin Ku on Unsplash. Icons designed by ibrandify.

I recently participated in the SIIM-ISIC Melanoma Classification competition on Kaggle. In this competition, participants are asked to identify melanoma in images of skin lesions. Interestingly, the organizers also provide metadata about the patient and the anatomic site in addition to the image. In essence, we have both image data and structured (tabular) data for each example. For the image, we can use a CNN-based model, and for the tabular data, we can use embeddings and fully connected layers as explored in my previous posts on UFC and League of Legends predictions. It is easy to build two separate models for…

Donald Cerrone (Image source: MMA Junkie)

I recently watched the fantastic UFC Decade in Review series, which summarized some of the most memorable moments that happened in the UFC every year from 2010 to 2020. Inspired by those videos, I decided to quickly review some UFC data accumulated over the last decade.

The Number of UFC Fights Hit New High In 2019

The UFC increased the number of fights dramatically in 2014 and held over 500 fights in 2019, which is twice as many as a decade earlier.

Referee of the Decade

Herb Dean refereed almost 600 fights over the last decade, which is twice as many as the runner-up.

Hardest-working Fighter of the Decade

Donald Cerrone had 33 fights over the…

Image generated using Seurat

Single-cell RNA sequencing (scRNA-seq) offers a comprehensive and unbiased approach to profiling immune cells, including T cells, at single-cell resolution using next‑generation sequencing. More recently, exciting technologies such as cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) have been developed to extend scRNA-seq by jointly measuring multiple molecular modalities, such as the proteome and transcriptome, from the same cell. By utilizing antibodies that are conjugated to oligonucleotides, CITE-seq simultaneously generates sequencing-based readouts for surface protein expression along with gene expression.

Since gene expression and protein expression convey distinct and complementary information about a cell, CITE-seq offers a unique opportunity…


Microscopy images are widely used for the diagnosis of various diseases such as infections and cancers. Furthermore, they facilitate basic biomedical research that has been continuously generating new insights into the causes of human diseases. Therefore, microscopy images are of great importance in improving our health. However, obtaining high-quality in-focus microscopy images poses one of the biggest challenges in the field of microscopy. For example, certain tissues, such as lung and intestines, are uneven and can result in out-of-focus images. In this post, we will tackle this problem by using deep learning to refocus out-of-focus microscopy images. …

T cells attacking a cancer cell (Image source:

Our immune system protects us from various infectious diseases and cancers, and T cells are “foot soldiers” of the immune system and are essential for protective immunity. T cells recognize infected and cancer cells via the interactions between T cell receptors (TCRs) that are expressed on the surface of T cells and peptide-major histocompatibility complex (MHC) complexes that are expressed on the surface of target cells. The peptides that are presented by MHC molecules are termed T cell epitopes and are derived from pathogenic or tumor antigens. There are many tools available to predict T cell epitopes, i.e. which peptides…

Since the publication of my first post “Predict UFC Fights with Deep Learning”, I have received many requests for the datasets used in the project. Therefore, I have decided to open source the code for scraping the data in this follow-up post. Additionally, I will implement a neural network for UFC prediction in PyTorch.

Data collection

We will need two types of data to predict UFC fights. First, we need information about each UFC bout, such as the opponents and the result. Second, we need UFC fighter records and statistics, including their striking and grappling. Lucky for us, FightMetric hosts both types…

I have been teaching myself deep learning by following online courses including Andrew Ng’s Deep Learning Specialization and Practical Deep Learning For Coders taught by Jeremy Howard and Rachel Thomas. I wanted to build my own rig for deep learning and gaming (not gonna lie here) but was waiting for the new generation Nvidia RTX GPUs. Finally, the new cards are available, and I made it happen.


I used the most recently released Intel i5 9600K CPU and RTX 2070 GPU. Although there is a lot of debate on how much improvement they offer over their preceding generations, my philosophy…

Epitope recognition by T cell © Juan Gaertner

My colleague and friend Dr. Sandeep Kumar Dhanda recently published a paper where he used a neural network-based method called NNAlign to predict the ability of peptides to induce human CD4 T cell responses (termed immunogenicity). Notably, he achieved an average area under the ROC curve (AUC) score of 0.7 on 57 independent test sets.
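
For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, so 0.5 corresponds to random guessing and 1.0 to perfect ranking. The following is a minimal sketch of that definition in plain Python, not the evaluation code from the paper; the function name `roc_auc` and the toy labels and scores are made up for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive
    example has the higher score. Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = immunogenic peptide, 0 = non-immunogenic.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

This pairwise version is quadratic in the number of examples, which is fine for a small illustration; for large test sets a rank-based computation or a library routine would be the more practical choice.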

Peptides, or epitopes, are short chains of amino acids derived from infectious pathogens, allergens, cancer and so on. CD4 epitopes are presented by major histocompatibility complex (MHC) class II molecules on antigen-presenting cells (APC) and recognized by CD4 T cells. …

Dr. Eric Topol is a physician-scientist and a pioneer in the field of digital medicine. He currently serves as Executive VP of The Scripps Research Institute (TSRI) in La Jolla, California. Dr. Topol is also the Founder and Director of the Scripps Translational Science Institute (STSI), which is about 3 miles from where I work. I read his book entitled “The Creative Destruction of Medicine: How the Digital Revolution Will Create Better Health Care” about a year and a half ago and have been following him on Twitter since then. …


Pneumonia is lung inflammation caused by infection with viruses, bacteria, fungi, or other pathogens. According to the National Institutes of Health (NIH), a chest X-ray is the best test for pneumonia diagnosis. However, reading X-ray images can be tricky and requires domain expertise and experience. It would be nice if we could just ask a computer to read the images and tell us the results. In this story, we will use deep learning to train an AI algorithm that analyzes chest X-ray images and detects pneumonia.

Convolutional Neural Network (CNN)

A convolutional neural network (CNN) is a class of deep…

Yuan Tian

Scientist, Programmer, and Photographer
