How to crash the MICCAI conference in a Y-DATA style

Oleg Glybchenko
Yandex school of Data Science
4 min read · Oct 10, 2020

At the beginning of October, the prestigious 23rd International Conference on Medical Image Computing and Computer Assisted Intervention, or MICCAI 2020, was held in Lima, Peru, although due to COVID-19 most of the participants saw a Lima Zoom background rather than the actual venue. Among the many distinguished speakers from all over the world, representing the best research centers, top-notch universities like Stanford, Harvard Medical School and Johns Hopkins, and companies like Google, Facebook and Amazon, an inquisitive reader might spot the odd single line of “Independent Researcher”. In this short essay, let me take you on a journey to discover who these mysterious Ilia Kravets and Tal Heletz are, and how they came to crash the world-renowned convention.

As you might already suspect, these are not tenured professors or even PhD candidates; they are “simply” graduates of Y-DATA (Yandex School of Data Science), held in Tel Aviv! Long before becoming speakers at scientific forums, both Ilia (B.Sc. in Computer Science from the Technion) and Tal (B.Sc. and M.Sc. in Math from BIU) enrolled in the school not only to gain theoretical data science knowledge, but also to experience its practical applications via industry-curated projects.

The original project, as started by the trio of Ilia Kravets (Software and algorithms consultant), Tal Heletz (Deep learning researcher at Trigo) and Roman Gurevich (Senior Applied Researcher at Microsoft, working as a data scientist on the Microsoft Threat Protection product), was a challenging but doable piece of research in lung oncology, based on the datasets from the 2017 Kaggle and LUNA16 competitions. The basic premise was to identify a lung nodule and estimate the probability that it is benign or malignant. With the help of Shlomo Kashani from the DeepOncology AI company, the project seemed set to put our students on par with the high but attainable industry standard, though it was hardly worth a conference stage.

After an extensive literature review, it dawned on the team that the original objective of binary classification could be accomplished with standard deep neural network methods, but that the result would suffer from the interpretability problem. So the gang decided to go all-in on a greater research challenge: would it be possible to make the network select similar cases of suspect nodules from previous history? That would make the ML prediction palatable not only to data science researchers, but also to a radiologist, who could examine and analyze the current case in comparison against all the relevant previous medical cases from her and her colleagues’ careers. As the literature review indicated, such difficult questions in 3D imaging had not previously been tackled by the industry. Through long hours, sleepless nights, and copious amounts of coffee, the aspiring researchers did not roll back to the original project and did not relent until they saw the converging gradients.

Ilia, Tal and Roman submitted the project at the end of their studies and were happy to have their hard work recognised by winning the 2019 Y-DATA best project award. But the journey did not stop there: they were selected to present their findings at the ACDL Data Science Summer School in Siena, Italy, and there too they won the best poster award for their contribution.

Finally, we are approaching the current leg of the journey. Our students’ work was noticed by Professor Hayit Greenspan, the head of the Medical Image and Analysis Lab at the Biomedical Engineering Department, Faculty of Engineering, Tel Aviv University, who is an expert in content-based image retrieval algorithms. Her genuine interest and efforts pushed Ilia and Tal to further refine and publish their findings and, finally, to apply for the MICCAI conference.

The paper they published together is titled “Nodule2vec: a 3D Deep Learning System for Pulmonary Nodule Retrieval Using Semantic Representation”. The team describes the implementation of what is known in the literature as 3D Content-Based Image Retrieval (CBIR). Probably the best-known 2D CBIR system is Google's search by image. However, instead of the general-purpose images used in Google Image search, here the radiologist uploads a lung CT scan to search for similar medical cases. The search algorithm is based on embedding vectors, a technique popularized by the Word2vec paper and well known to ML practitioners and Y-DATA students. The evaluation showed clear improvements over prior publications. The authors also devised a novel evaluation technique that enables a fair comparison between CBIR and human performance, and demonstrated that the CBIR results agree with the doctors better than the doctors agree with each other. The paper's summary states that the benefit the new system provides to the radiologist end-user is comparable to obtaining a second radiologist’s opinion.
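For readers curious what an embedding-based search looks like in practice, here is a minimal sketch in Python. It is not the authors' implementation: the `retrieve_similar` function, the tiny 3-dimensional vectors and the use of cosine similarity are all assumptions made purely for illustration. In a real system the vectors would come from a trained 3D network applied to CT nodule volumes.

```python
import numpy as np


def retrieve_similar(query, database, k=3):
    """Return indices and cosine similarities of the k rows of
    `database` that are most similar to the `query` vector."""
    # Normalize every vector to unit length so that a dot product
    # equals the cosine similarity.
    db_unit = database / np.linalg.norm(database, axis=1, keepdims=True)
    q_unit = query / np.linalg.norm(query)
    sims = db_unit @ q_unit
    # Sort by descending similarity and keep the top k.
    top = np.argsort(-sims)[:k]
    return top, sims[top]


# Toy stand-ins for nodule embeddings (hypothetical values, one row per
# previously seen case).
database = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Hypothetical embedding of the new case the radiologist is examining.
query = np.array([1.0, 0.05, 0.0])

indices, scores = retrieve_similar(query, database, k=2)
print("Most similar previous cases:", indices)
print("Cosine similarities:", scores)
```

The idea is that cases whose embeddings point in nearly the same direction are treated as similar, so the radiologist is shown the closest previous cases rather than a bare probability.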

Ilia Kravets sums up his experience: “When we started the project, we did not expect to end up where we are today — lecturing the community on the research findings. We just posed the right questions and worked tirelessly to achieve the scientific results.”

Tal Heletz highlights her takeaways from the journey: “My goal was to study and I learned the most from my two projects in Y-DATA. I was willing to work hard and take risks to master the field of data science. My motivation was doubled when we started to receive positive feedback. I could not do this alone without the help of my teammates, our mentors from DeepOncology AI, Prof. Hayit Greenspan and Y-DATA.”

The paper presented by Tal and Ilia at MICCAI 2020 can be found on arXiv.
