Sonata in 0s and 1s: Somali Chaterji
Somali Chaterji grew up in a Bengali household, where kids are given nicknames; the one that stuck for her was Sonata. Her work has the quality of a sonata, a musical composition of complementary and contrapuntal themes. The assistant professor in the Colleges of Engineering and Agriculture at Purdue took time away from her keyboard — where she is integrating genomics, the Internet of Things (IoT), machine learning, and cloud computing in a harmonious arrangement — to speak with Purdue Engineering Review.
What kindled your love of science and engineering?
I went to a school with excellent math and science teachers. I loved both science and writing; my science teachers were stellar, strong women I looked up to. I loved “studying” my Webster’s two-volume, encyclopedic dictionary when reading Shakespeare’s biblical and classical allusions (yes, engineers can love to read Shakespeare!). And poetry, such as “The Cloud” (Shelley), “If” (Kipling), and “The Road Not Taken” (Frost). On the math side, I used to do mental math to sharpen my quick thinking, prodded by my mom, who was then a high school science teacher. I always knew I wanted to be an applied engineer. I started coding early, in fifth grade. Engineering and science were embedded in my persona because I grew up in a family of engineers and doctors.
How did your intellectual pursuits become so wide-ranging?
My first grant was an R01 from the National Institutes of Health (NIH) to develop the backend of an open-source web platform for metagenomics. Think of it as genomics on steroids, because it deals with the community genomes of a population of microbes. My lab discovered that the database technology we were developing could work for IoT as well, and we switched to trying it out with time-varying workloads from Microsoft and from Alexa, the voice assistant. We found that the more diverse the queries to the databases, the harder it was for conventional databases to sustain high performance, while our technology shines brighter with such fast-changing workloads. My goal is to develop technology with an eye toward translating it to use cases that are sustainable or that directly improve humankind. This opened my work to the current avalanche of IoT sensors, such as those mounted on drones, requiring analytics to make sense of all the sensing and surveillance. Last year, I started my first company, which is all about speeding up cloud-based machine learning workloads. I am fired up to take our lab-grown code and beat the best database and cloud technologies out there.
You’ve also taken a diversion into digital agriculture. Tell us about Sirius.
Sirius is an exciting amalgamation of a suite of technologies that we have been working on in the past few years in my lab, The Innovatory for Cells and Neural Machines (ICAN). We are investigating on-device computation for the resource-constrained sensor nodes deployed in sensorized farms. My lab is developing solutions for heterogeneous networked sensors, balancing computation and communication (the latter is vital for farms in rural areas). We’re leveraging the cloud and edge computing to create a model of economical, fine-grained billing based on a user’s exact computational use, termed serverless cloud computing. This approach can be applied to other network-starved and adversarial environments, such as the Internet of Battlefield Things (IoBT). In all of this, the compute fabric is transitioning from large, energy-guzzling servers to small mobile-scale graphics processing units (GPUs), rightsizing computation in an energy-aware manner. Machine learning then builds the analytics algorithms that crunch the IoT and computer vision data from these ubiquitous sensing eyes and ears of the world.
Can you say something about the culture at Purdue’s College of Engineering?
Purdue’s College of Engineering excels at providing complete intellectual freedom to launch ambitious projects and to engage a continuous stream of meritorious graduate and undergraduate students to work with. Students can “float” projects for short terms, and select projects both from within Purdue and from other top international universities. My students have published with me at leading machine learning and computational genomics conferences. We also have excellent facilities; for example, I was able to quickly start my drone project, training as a drone pilot and using the College of Agriculture’s vast stretches of land in digital agriculture farms. Purdue Engineering encourages and propels innovation by funding commercialization, and its communications team excels at promoting the laboratories and innovations of its faculty. This enables collaborations and awards from top tech companies (in my case, Amazon, Microsoft and Adobe) and attracts the best students from all over the world.
What is your philosophy of teaching? How do you work with your students and postdoc researchers?
I love blending my research with teaching, and constantly update my teaching techniques. For example, I podcast to offer additional material to listen to “on the run.” I bring tech speakers to my class from such companies as 23andMe, the direct-to-consumer genetic testing service; Adobe Research; and innovative startups. We read and discuss papers from the top machine learning and computer systems research forums. I teach my students how to mix and match algorithmic kernels in different application areas, so they can use similar algorithms for seemingly diverse applications. I also teach them the fundamental algorithmic concepts that go into developing the intuition behind diverse machine learning algorithms. I group undergraduate and high school researchers with graduate students and software engineers so that each group is as productive as possible. For our computational genomics work, we publish at supercomputing and machine learning conferences, as well as in top journals, such as Science Advances and Genome Biology, sometimes with clinical and biomedical engineering collaborators.
Is there one big question in particular you would like to answer on your intellectual journey?
I would love to rightsize computation on the IoT side. I love the idea of intermittent computing, energy-aware computing, and “good-enough” accuracy for such computations. This way, we are not wasting computational resources to get exceedingly and unnecessarily high accuracies at the expense of energy. To compute on Internet of Small Things (IoST) devices, one needs to be especially cognizant of energy usage because of the limited capabilities and energy sources of these devices. Finally, to achieve timeliness (latency) guarantees on such devices, one needs to compute at millisecond-level granularity (think self-driving cars), and for this to happen, one may need to work at the frontier of accuracy and latency. On such a Pareto frontier, no single criterion can be improved further without making another criterion worse. On the computational genomics side, I am excited to decode the computation in living cells. Single-cell genomics considers heterogeneity at the single-cell level, and the goal is to functionally cluster these cells to decipher the key differences between these clusters. This will then make these clusters of cells interpretable to domain scientists who can subsequently decide on the appropriate clinical interventions to extend our health and well-being.
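To make the accuracy-versus-latency trade-off concrete, here is a minimal Python sketch of how a Pareto frontier can be picked out of a set of candidate configurations. It is an illustration only, not code from Chaterji's lab: the `Config` type, the configuration names, and every accuracy and latency number are made-up assumptions.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    name: str          # hypothetical label for a model or setting
    accuracy: float    # higher is better (fraction of correct outputs)
    latency_ms: float  # lower is better (milliseconds per inference)


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on both criteria and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.latency_ms <= b.latency_ms
    strictly_better = a.accuracy > b.accuracy or a.latency_ms < b.latency_ms
    return no_worse and strictly_better


def pareto_frontier(configs: List[Config]) -> List[Config]:
    """Keep only configurations that no other configuration dominates."""
    frontier = [c for c in configs if not any(dominates(o, c) for o in configs)]
    return sorted(frontier, key=lambda c: c.latency_ms)


# Made-up numbers, for illustration only.
candidates = [
    Config("tiny", 0.81, 4.0),
    Config("small", 0.88, 9.0),
    Config("medium", 0.87, 15.0),  # dominated by "small": less accurate and slower
    Config("large", 0.93, 30.0),
    Config("xlarge", 0.93, 55.0),  # dominated by "large": same accuracy, slower
]

frontier = pareto_frontier(candidates)
print([c.name for c in frontier])
```

Every configuration left on the frontier is a defensible choice: moving from one to the next buys accuracy only by spending latency. A device with a fixed latency budget would then pick the most accurate frontier point that fits within that budget.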
Where do such technologies as genome engineering, machine learning, IoT and cloud converge?
Many of the analytics algorithms share the same building blocks, so it is helpful to construct these algorithmic pipelines in a more disciplined manner, identifying the recurrent building blocks in computational genomics algorithms and composing them. A similar approach can be used to develop algorithms in other emerging fields so that tools are not developed in an ad hoc manner. Rather, these machine-learning building blocks are mixed and matched to make them work for different use cases, often across disciplines. For example, in our current research, we are working with algorithmic constructs for both computational genomics and computer vision. For computational genomics, autoencoders will sieve out the salient features, making the analytics more interpretable. On the computer vision side, one of the goals is to be able to compress the information in streaming video files to transfer them across networks, saving bandwidth.
Any words of advice to young people with an interest in engineering?
Mix work and family and play; there are no clear boundaries. It is all fuzzy. Set your own rules. I blend work and play because that is what works well for me and for my startup-like lab environment. I love agile modes of communication with my students. When you write a paper, discuss what the problem is and what the related work is. Evaluate your tool against the state of the art (SOTA), and figure out how your work is striking a blow to the SOTA, or perhaps bolstering it. Take ownership of your projects as students; because you have just a couple of them, you have the luxury to go deep into them. You know more about the work than anyone has ever known, so be confident, be in charge, and run with the ideas. Make the code solid so it can be used outside of your published paper to make an impact. Don’t stop with the publication; think of ways to make the paper solve real problems, so your code can be used to build real systems to solve these real problems.
Any hobbies or interests in your “spare” time?
I like running 5Ks and half marathons. I like competing with others, so I enjoy outpacing others in my running group, and breaking a sweat to loud music. I also love caffeine (with whole milk and local honey), and I am an avid Earl Grey tea drinker, sometimes mellowed by lavender and rosehips. Oh! I also love to write poetry. I think I got that from my grandmom, except I can’t write in Bengali.