Stepping into the Shoes of an AI Researcher in the AI4ALL Research Project Fellowship

AI4ALL Team
AI4ALL
Published in AI4ALL · 6 min read · May 17, 2018
Janice Yang accepting her certificate from AI4ALL Executive Director Tess Posner

Guest Post by Janice Yang, Stanford AI4ALL ‘17

AI4ALL Editor’s Note: Janice Yang participated in AI4ALL’s first AI Research Project Fellowship program, where she was paired with a mentor who works in AI to collaborate on an AI research project over the course of 3 months. She worked with another mentee, Irene Yang, to identify the severity of diabetic retinopathy using computer vision. Below she shares their research process and key results. Learn more about AI4ALL’s Research Project Fellowship Program and other mentee projects here.

Ever since I attended Stanford AI4ALL last summer, I wanted to delve deeper into the field of computer vision and start a research project of my own. I began learning about convolutional neural networks (CNNs) through online resources but soon felt intimidated by the sheer amount of new terms and complex mathematical equations. I still wasn’t sure where and how I could start the “real project” that I wanted to try.

By then, the AI4ALL AI Research Project Fellowship program had started, and my teammate, Irene, and I met our mentor, Dr. Andrea Frome, a professional in the computer vision field. Throughout the project, Andrea supported us with advice and explained many techniques AI researchers use when they face the same problems we were facing. It was her guidance and encouragement that made our project possible.

A screenshot of the Slack channel (left), which was used to communicate about our project between our in-person meetings, and the training code in a Jupyter notebook (right).

Our first task was to look at various datasets to get an idea of what we could work on. We were amazed by the amount of data publicly available on sites like Kaggle and in GitHub repositories that curate lists of public AI datasets.

After looking at an abundance of datasets, we determined that we wanted to work on a medical application of computer vision because we were interested in how AI can be used to diagnose diseases. Looking through Kaggle, we found a sizable dataset for Diabetic Retinopathy (DR) with over 90,000 images. Since we saw this project as a learning opportunity, the dataset's complexity made it a perfect fit. DR can permanently damage one's vision, and early detection is crucial for slowing the progress of the disease. Since the diagnostic process is manual and tedious, AI can play a critical role by providing instantaneous results and speeding up the treatment process.

Our project was to take a fundus image (an image of the retina) as input and output a class between 0 and 4, with 0 being no DR and 4 being proliferative DR. An issue we encountered was that our data was imbalanced and noisy; only 75% of the images were gradable.

Next, we chose Keras with a TensorFlow backend as our neural network library because it seemed like the best option for beginners. We were able to build our CNN model in less than 20 lines of code (tutorial). But after we trained the model on our images, we got an unsatisfactory result of 20% accuracy. Andrea then suggested looking at the confusion matrix, which compares what the model predicts against the true labels (bottom left). After doing so, we understood that our model was simply “guessing” class 0 for everything, which was likely the result of feeding in imbalanced data (see the graph of our data distribution on the bottom right). Therefore, we decided to train our model on (artificially) balanced data to prevent this “fake learning.”

On the left is an example of the confusion matrix of the “guessing model.” The graph on the right shows the uneven distribution of the training data, specifically, the high number of class 0 or “no-DR” images compared to class 1–4 images with varying degrees of DR present.
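To see why accuracy alone was misleading, here is a minimal sketch of the confusion-matrix diagnosis, using made-up labels and plain NumPy rather than our actual training code: a model that predicts class 0 for every image can still score well on an imbalanced set while learning nothing about the other classes.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=5):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Hypothetical imbalanced labels: class 0 ("no DR") dominates.
y_true = np.array([0] * 70 + [1] * 10 + [2] * 10 + [3] * 5 + [4] * 5)
# A "guessing" model that predicts class 0 for every image.
y_pred = np.zeros_like(y_true)

cm = confusion_matrix(y_true, y_pred)
print(cm)                        # every count lands in the first column
accuracy = np.trace(cm) / cm.sum()
print(accuracy)                  # 0.7 here, yet classes 1-4 are never detected
```

A balanced training set removes the incentive to guess the majority class, which is why the off-diagonal structure of this matrix, not the single accuracy number, is what told us the model wasn't learning.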

With our noisy dataset, an inevitable step was to pre-process the images. This involves denoising, proper scaling, enhancing color/features, and normalizing each image. Since we didn't have enough training data for the model to learn across the widely varying imaging conditions, preprocessing was critical for accentuating the information in the images that was relevant for learning the 0–4 class targets. To find the best option, we tried three pre-processing techniques: histogram equalization, CLAHE (contrast-limited adaptive histogram equalization), and a Gaussian high-pass filter (here, the OpenCV library was helpful; see the preprocessed images below). Training improved, and after analyzing the confusion matrices pictured below, we selected the Gaussian high-pass filter as the preferred technique for our dataset.

The top row of photos shows the original fundus image and the output image after each preprocessing technique. The bottom row contains the confusion matrices produced by the model trained with each preprocessing technique.
The table above shows the accuracy of each technique for the five classes. The blue box marks the highest accuracy in each class.
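In practice, OpenCV gives you these techniques in a call or two (`cv2.equalizeHist` for plain histogram equalization, `cv2.createCLAHE` for CLAHE). As an illustration of what the first of those does under the hood, here is a rough NumPy re-implementation of histogram equalization, run on a made-up low-contrast patch rather than a real fundus image:

```python
import numpy as np

def equalize_histogram(img):
    """Histogram-equalize an 8-bit grayscale image: remap intensities so
    the cumulative distribution of pixel values becomes roughly linear."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_masked = np.ma.masked_equal(cdf, 0)   # ignore empty bins
    # Scale so the darkest occupied bin maps to 0 and the brightest to 255.
    cdf_scaled = (cdf_masked - cdf_masked.min()) * 255 / (
        cdf_masked.max() - cdf_masked.min())
    lut = np.ma.filled(cdf_scaled, 0).astype(np.uint8)
    return lut[img]                           # apply the lookup table

# A low-contrast patch: all pixel values squeezed into [100, 120].
rng = np.random.default_rng(0)
patch = rng.integers(100, 121, size=(64, 64)).astype(np.uint8)
out = equalize_histogram(patch)
print(patch.min(), patch.max())   # 100 120
print(out.min(), out.max())       # 0 255: contrast stretched to the full range
```

Stretching contrast this way is what makes subtle retinal features, such as microaneurysms, stand out against the background, which is the whole point of the preprocessing step.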

During training, we encountered a severe overfitting problem. Overfitting occurs when a model memorizes the training data and therefore cannot generalize, so it performs poorly on validation and test data (Graphs 1 and 2 below show an overfitting model). To prevent this, we augmented our training data using Keras' data generators, implemented early stopping, and added dropout layers to the model. We also reduced the size of our neural network. These adjustments helped lessen the overfitting (see Graphs 3 and 4). Overall, our final accuracy was 64.05%, which was reasonable considering that only 75% of our dataset was gradable.

Graphs 1 and 2 demonstrate a severely overfitting model.
Graphs 3 and 4 demonstrate a model which doesn’t overfit.
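Keras made two of these fixes one-liners for us (`keras.callbacks.EarlyStopping` and `keras.layers.Dropout`), but the idea behind early stopping is simple enough to sketch in plain Python. The patience value and loss numbers below are illustrative, not from our actual runs:

```python
def stop_epoch(val_losses, patience=2):
    """Index of the epoch where training should stop: the first epoch at
    which validation loss has failed to improve `patience` times in a row."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:           # validation loss improved: reset the counter
            best, wait = loss, 0
        else:                     # no improvement this epoch
            wait += 1
            if wait >= patience:
                return epoch      # stop: the model is starting to overfit
    return len(val_losses) - 1    # never triggered; trained to the end

# Validation loss falls, then creeps back up as the model overfits.
losses = [1.00, 0.80, 0.70, 0.75, 0.80, 0.90]
print(stop_epoch(losses))  # stops at epoch 4, shortly after the minimum at epoch 2
```

This is exactly the shape of Graphs 1 and 2: training loss keeps falling while validation loss turns upward, and stopping near the validation minimum is what keeps the model honest.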

Looking back at the past three months of hard work, I am grateful to have had the opportunity to learn how AI research is conducted.

Countless discussions, correcting numerous bugs, and waiting for our model to train through the night gave me the opportunity to stand in the shoes of a real AI researcher. It truly was a life-changing and transformative experience for me.

I learned that although a task may initially seem intimidating, tackling small pieces first and building toward larger ones makes a research project manageable. I hope that sharing my experiences and my journey from a beginner in AI will encourage other students to try their own research projects. Through AI4ALL, I have found my passion for this field and can't wait to further explore the numerous ways this fascinating technology will revolutionize our future.

About Janice

Janice Yang is a 2017 Stanford AI4ALL alumna and a sophomore at Dougherty Valley High School. As president of the DVHS Girls Who Code club, she works to encourage high-school girls to pursue computer science. As co-founder of GAITEway, Janice gives underrepresented middle school girls an opportunity to learn about artificial intelligence and computer science in a supportive learning environment. Passionate about exploring AI and programming, she is also fascinated by mathematics and is a two-time American Invitational Mathematics Exam (AIME) qualifier. In her free time, she loves listening to music, playing the violin in her youth orchestra, and explaining science concepts to visitors as a volunteer at a local science museum.


AI4ALL is a US nonprofit working to increase diversity and inclusion in artificial intelligence.