Podcast Episode #2: Perspectives of Shannon Ellis, Academic Data Scientist
This article recaps the main takeaways of our second podcast episode with Shannon Ellis. Make sure to listen to the full podcast below or on Podbean. Follow us to stay tuned for more episodes!
Shannon Ellis is an Assistant Teaching Professor in the Cognitive Science Department and has taught COGS 9 Introduction to Data Science, COGS 18 Introduction to Python, and COGS 108 Data Science in Practice. In this podcast episode, Shannon reflects on her experiences in academia, her projects, and how she transitioned from her biostatistics focus at Johns Hopkins to her more general teaching position at UCSD.
During Shannon’s undergraduate years, “Data Science” wasn’t a term in her vocabulary, nor was the discipline offered at her institution. Inspired by her high school biology teacher, Shannon set out to study genetics at King’s College in Pennsylvania, where she got her first taste of research. While entering data she had spent years collecting into a software program, she became fascinated with the software and how it gave her immediate answers. This fascination carried Shannon into her graduate programs, where she learned to program and experiment with data analysis. By the end of her PhD, she was performing computational analysis on large datasets and had been exposed to Data Science in the genetics domain.
Deciding that she wanted to analyze data rather than dedicate her entire career to “answering one question,” Shannon went on to pursue a Postdoc at the Leek Group, a Data Science lab in the biostatistics department at Johns Hopkins. Shannon speaks fondly of the Leek Group and recalls her mentor, Jeff Leek, fostering a more entrepreneurial environment in academia and differentiating himself from other mentors by trying out “bonker ideas.”
When asked how she got into the field of biostatistics, Shannon jokingly says she “back-doored [her] way” in, having no prior experience or degree in the subject. At the end of her Postdoc, she faced the dilemma of whether to continue in research or move into teaching. “Those who can’t do, teach” was a saying that stuck in her head, and it almost convinced her that teaching was simply a fallback career. However, after much deliberation, she concluded that she wanted to teach and continue to experiment in different domains instead of pursuing a single, dedicated line of research.
At Johns Hopkins, she taught public health biostatistics to undergraduates, who experimented with public health datasets using the R programming language. Afterward, she applied to universities and to education-focused positions at government programs and startups. Her transition to UCSD came after she encountered a job posting from the university’s Cognitive Science Department and, with much-needed encouragement from her advisor, decided to apply. She has now been teaching at UCSD for a year, and continues to teach a little bit of computational genetics as part of the Data Science Capstone.
Shannon worked on two projects during her Postdoc, the first of which is Cloud Based Data Science (CBDS). As the Curriculum Lead for CBDS, Shannon develops the content for the courses. By offering a set of online courses that can be taken for free, CBDS aims to democratize Data Science education. The program sets starting points for people with limited math or comprehension skills, and was built to work in concert with CBDS+, an in-person tutoring program. CBDS was developed at a time when Massive Open Online Courses (MOOCs) were revolutionizing education, and CBDS had hundreds of thousands of applicants. But when the team looked at the applicants’ backgrounds, they discovered that all of them already held a master’s degree.
“We don’t want hundreds of clones of the same person,” Shannon says. She further emphasizes that she wants CBDS to help educate people from different walks of life. CBDS targets those who are economically underprivileged, particularly those “who can’t take time out of their life to just study.” The program teaches its students the basics of data analysis and data wrangling, and aims to improve their financial conditions by helping each cohort obtain entry-level jobs.
Her second project, Recount, is more biostatistics focused. “Biologists have gotten really good at publishing data,” Shannon says, but unfortunately that data is not easy to get or use. In response, the team took all publicly available RNA-Seq data (which measures gene expression levels) and processed 70,000 human samples in a single pipeline to make the data easier and more accessible for biologists to work with. Shannon worked on phenotype prediction in this project, an approach she calls “in-silico phenotyping,” which uses Machine Learning to predict the tissue type, sex, age, and a variety of other factors pertaining to the individual without going back to them for validation.
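To give a rough sense of what predicting a phenotype from expression data can look like, here is a minimal Python sketch. It is not the Recount team's method: the gene names (`GENE_X`, `GENE_Y`), the synthetic expression values, and the nearest-centroid classifier are all assumptions chosen only to keep the illustration small and self-contained.

```python
import random
from statistics import mean


def make_sample(sex, rng):
    """Generate one synthetic expression profile (hypothetical genes and values)."""
    if sex == "female":
        return {"GENE_X": rng.gauss(8.0, 1.0), "GENE_Y": rng.gauss(1.0, 0.5)}
    return {"GENE_X": rng.gauss(1.0, 0.5), "GENE_Y": rng.gauss(7.0, 1.0)}


def fit_centroids(samples, labels):
    """Average the expression of each gene within each labeled group."""
    genes = sorted(samples[0])
    centroids = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        centroids[label] = {g: mean(r[g] for r in rows) for g in genes}
    return centroids


def predict(centroids, sample):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sqdist(centroid):
        return sum((sample[g] - centroid[g]) ** 2 for g in centroid)
    return min(centroids, key=lambda label: sqdist(centroids[label]))


# Train on synthetic samples whose sex is known.
rng = random.Random(0)
train_labels = ["female", "male"] * 20
train_samples = [make_sample(label, rng) for label in train_labels]
centroids = fit_centroids(train_samples, train_labels)

# Predict the sex of a new sample from its expression values alone.
unlabeled = {"GENE_X": 7.5, "GENE_Y": 1.2}
print(predict(centroids, unlabeled))
```

The idea mirrors the description above: a model is fit on samples whose phenotype is known, then used to fill in that phenotype for samples where it was never recorded. A real pipeline would work with thousands of genes and a more carefully validated model.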