Summer Research Experiences: Using AI to Decode the Genome

AI4ALL Team
Published in AI4ALL
5 min read · Sep 26, 2018
Jocelin and her fellow AI4ALL alumni at Stanford AI4ALL 2016 // Photo credit: Lauren Yang/Stanford AI4ALL

Guest post by Jocelin Su, Stanford AI4ALL ’16

AI4ALL Editor’s note: Meet Jocelin Su, a 2016 Stanford AI4ALL alumna and current high school student. In this blog post, she shares her experience pursuing a summer research position in computational genetics where she had the opportunity to contribute to research on decoding DNA. Read on to learn about the specifics of her research and get her tips for pursuing research opportunities in high school.

Autonomous robots and computer vision and natural language processing? Decoding the genome to save human lives? After witnessing AI’s capabilities at Stanford AI4ALL my freshman year, I was hooked, and left certain I wanted to pursue it further. This summer, I had my chance — I applied to do research in computational genetics at Stanford University. The experience, my first at a research lab, was spectacular.

What I worked on

DNA is the code of our body, and gene expression is the process where information from the DNA sequence is used to build proteins for the body. In the first step of gene expression, called transcription, RNA is transcribed from DNA with the help of RNA polymerase. Transcription is affected by proteins called transcription factors (TFs), which bind to certain preferred patterns of DNA called motifs. You can learn more about this process here.

One vital area of study is finding accurate, consolidated motifs for TFs, which is what an algorithm developed at the lab I worked in seeks to do. You may be wondering how this works. First, given a particular TF, a neural network is trained to learn the association between an input DNA sequence and the output label of whether or not the TF bound to the sequence. The reasoning is that if a DNA sequence is classified as positive for TF binding, it is likely to contain one or more motifs.
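To make this first step concrete, here is a minimal Python sketch of how a DNA sequence and its binding label might be represented as training data. One-hot encoding is a common choice for feeding sequences to a neural network, but it is an assumption here; the post does not say which encoding the lab used. The `one_hot` helper and the example sequences are made up for illustration.

```python
BASES = "ACGT"

def one_hot(sequence):
    """Encode a DNA string as a list of 4-element rows, one row per base."""
    rows = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        if base in BASES:  # unknown bases such as N stay all-zero
            row[BASES.index(base)] = 1
        rows.append(row)
    return rows

# Hypothetical examples: each pairs a sequence with a binary label,
# 1 if the TF bound the sequence and 0 if it did not.
examples = [("ACGTGA", 1), ("TTTTTT", 0)]
dataset = [(one_hot(seq), label) for seq, label in examples]

print(dataset[0][0][0])  # encoding of the first base, 'A'
```

A classifier would then be trained on pairs like these to predict the label from the encoded sequence.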

Next, consider positively-classified input sequences. For these, the algorithm finds the features of the input sequence that contributed most to the model’s classification of TF binding, because these important features could be part of motifs. To do this, the algorithm uses an existing method of computing importance scores.
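The post does not name the importance-scoring method, so the sketch below swaps in a simple and widely used alternative, in-silico mutagenesis: mutate each base in turn and measure how much the model's output drops. Everything here is illustrative. `toy_model` is a stand-in for a trained network, and `TOY_MOTIF` is a hypothetical pattern chosen only for the example.

```python
BASES = "ACGT"
TOY_MOTIF = "CACGTG"  # hypothetical pattern, for illustration only

def toy_model(sequence):
    """Stand-in for a trained network: best fraction of motif bases matched at any offset."""
    best = 0.0
    for start in range(len(sequence) - len(TOY_MOTIF) + 1):
        window = sequence[start:start + len(TOY_MOTIF)]
        matches = sum(a == b for a, b in zip(window, TOY_MOTIF))
        best = max(best, matches / len(TOY_MOTIF))
    return best

def mutagenesis_scores(sequence, model):
    """Score each position by the average drop in model output when its base is mutated."""
    baseline = model(sequence)
    scores = []
    for i, original in enumerate(sequence):
        drops = []
        for base in BASES:
            if base == original:
                continue
            mutated = sequence[:i] + base + sequence[i + 1:]
            drops.append(baseline - model(mutated))
        scores.append(sum(drops) / len(drops))
    return scores

sequence = "TTCACGTGTT"
scores = mutagenesis_scores(sequence, toy_model)
for base, score in zip(sequence, scores):
    print(base, round(score, 3))
```

In this toy case, only the six bases that make up the embedded pattern receive nonzero scores, which is the kind of signal the next step looks for.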

Finally, the algorithm tries to find clusters of important features and then aggregates the features into motifs. Further testing and comparisons of this algorithm are underway, but its preliminary performance seems quite good.
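As a rough illustration of the aggregation idea, the sketch below combines a handful of high-importance segments into a position frequency matrix and reads off a consensus motif. It assumes the segments have already been clustered and aligned to the same length, which is the hard part the real algorithm handles; the segments and function names are hypothetical.

```python
from collections import Counter

BASES = "ACGT"

def position_frequency_matrix(segments):
    """Fraction of each base at each position of equal-length, pre-aligned segments."""
    length = len(segments[0])
    if any(len(s) != length for s in segments):
        raise ValueError("segments must all have the same length")
    matrix = []
    for i in range(length):
        counts = Counter(s[i] for s in segments)
        matrix.append({base: counts.get(base, 0) / len(segments) for base in BASES})
    return matrix

def consensus(matrix):
    """Most frequent base at each position (ties go to the earlier base in ACGT order)."""
    return "".join(max(BASES, key=lambda b: column[b]) for column in matrix)

# Hypothetical segments from one cluster of important regions.
segments = ["CACGTG", "CACGTG", "CATGTG", "CACGTT"]
pfm = position_frequency_matrix(segments)
print(consensus(pfm))
```

The matrix keeps more information than the consensus string, since it records how strongly each position prefers each base.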

Stanford University, where Jocelin spent the summer doing computational genomics research

The summer research experience

Over the course of the summer, I spent several hours a day at the lab. I worked closely with my mentor, who guided my research and gave me indispensable support. I can’t thank her enough for patiently answering all my questions, helping me through challenges, and being an awesome and encouraging person. In addition to lab work, I regularly attended group meetings, the journal club, and reading club, which were opportunities to hear lab members discuss papers and projects.

The most challenging component of the internship was catching up on knowledge. As a high schooler, on top of immersing myself in the specifics of my project, I had to simultaneously fill in the gaps in my biology, math, and programming skills. Fortunately, I — and you — have the ability to ask questions! Your mentor and other lab members are wonderful resources for help (as well as Google), and they made the learning experience more comfortable.

Advice for high schoolers considering research

Overall, research can be demanding; it is tough to read papers full of complex technical details and process what goes on in lab presentations. But even if you don’t understand everything, that’s okay. Research is a unique experience, where you can be exposed to so much real material, embrace your thirst for knowledge, and be constantly challenged. You realize there is so much in the world to learn about, and that’s a wonderful thing — not to mention having the great feeling of contributing to scientific knowledge. My summer experience made me seriously consider the possibility of doing research in the future, and I’m excited to continue during the school year.

If you are a high schooler interested in doing research, that’s awesome! Of course, getting the opportunity to do an internship is a challenge in and of itself. Here are some tips for getting a research internship:

  • First, remember to go for the subject or project that you’re really interested in, since you will spend weeks working in that area. If you do get an internship, even if you don’t seem to be given options, try to suggest an alternate idea or method you want to work on.
  • Apply for existing research programs for high schoolers. Well-known ones include the Simons Summer Research Program, the Stanford Institutes of Medicine Summer Research Program, and the Research Science Institute.
  • Alternatively, look into interning at a lab at a college or university near you. Email professors who do work in the areas you are interested in, provide your resume and cover letter, and demonstrate your interest. (Online guides for writing these types of emails can help!)

Good luck!

About Jocelin

Jocelin Su is a senior at Evergreen Valley High School in California and is an avid enthusiast of math and computer science. After attending Stanford AI4ALL in 2016, she became committed to CS outreach and founded the She.codes workshop program for middle school girls. In addition to her STEM interests, Jocelin enjoys art and reading good books.

AI4ALL is a US nonprofit working to increase diversity and inclusion in artificial intelligence.