Shira Eisenberg interviewed by Sam McCabe
My first impression of Shira was the funny-looking triceratops profile picture on her Discord account, aptly named triceratops. But past her mysterious prehistoric profile, I found Shira to be incredibly passionate about the work being done at NewAtlantis. She is always itching to start collaborating with fellow tech whizzes, and her eagerness was immediately felt. Shira hopes to use her knowledge of Natural Language Processing (NLP) to better understand biodiversity with groundbreaking techniques.
Where do you live?
I’ve been all over the place! I grew up in New York, right outside the city, on Long Island. I went to college at UChicago, in Chicago. I lived in Cambridge for a little while when I was at MIT. Then I was pretty nomadic and did some traveling in Europe. I’m currently residing in northern Virginia, near Washington DC, with my mother and grandmother.
How has your academic journey shaped what you’re interested in researching today?
My academic journey began with an immense interest in human psychology, writing, and programming. My interest in research began early. In high school, I entered the Intel Science competition and placed second in the world in my category. I was studying the effects of removing preparatory information on human perception of action goals over a computer interface. Basically, if you removed microseconds of action cues from a simulated video, could participants still guess which direction the action was going, or were their predictions slower than in trials where nothing was removed? (They were.) I was also interested in biology and did a study on exogenous ethanol exposure in C. elegans, a worm used in a lot of neuroscience research. I basically got worms drunk and saw if it affected their ability to find their way to hot spots on a thermal gradient (it did). In college, I found AI and really loved it. I’ve been in natural language processing since my time at the CDC after my first year of college, when I built a recommender system for articles for their library. I just love the idea that computers can mathematize language and find connections we, as humans, cannot. The crossover to genomics now is amazing. Sequences are sequences, really. A k-mer is an n-gram, as I recently learned. I’m really excited about the project we’re currently working on.
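To make the "a k-mer is an n-gram" remark concrete, here is a minimal sketch in Python. The helper name `ngrams` and the example sentence and DNA string are made up for illustration and are not part of the NewAtlantis project. The same sliding-window function yields word bigrams from a sentence and 3-mers from a nucleotide string.

```python
from collections import Counter


def ngrams(sequence, n):
    """Return every contiguous window of length n from a sequence."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


# Word-level n-grams (n = 2) from a sentence (illustrative example).
words = "the cat sat on the mat".split()
bigrams = ngrams(words, 2)

# Character-level k-mers (k = 3) from a DNA string (illustrative example):
# the same function, only the alphabet changes.
dna = "ATGCGATG"
kmers = ["".join(window) for window in ngrams(dna, 3)]

# Counting k-mers mirrors counting n-grams in a text corpus.
kmer_counts = Counter(kmers)

print(bigrams)
print(kmers)
print(kmer_counts.most_common(1))
```

In both cases the model sees overlapping fixed-length windows over a sequence of symbols, which is why tools built for text can often be adapted to genomic data.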
What is Natural Language Processing (NLP) and how did you become interested in it?
For lack of a better explanation, it is teaching computers to understand human language. But it’s really so much more. I’ve always loved writing (as you can probably tell). NLP tasks are kind of like teaching computers how to write and read. There’s summarization, simplification, natural language understanding, and so much more. What I really love is how it’s forced neural architectures to evolve. The basic feed-forward neural network isn’t designed to keep track of sequential data. It maps each individual input to an output, which is helpful for tasks like classifying images but fails on text due to its sequential nature. To process text, we must take into account sequences and the relationships between words and sentences. So we needed recurrent neural networks (RNNs) and long short-term memory networks (LSTMs). RNNs had the vanishing gradient problem (as you go deeper into a text excerpt, the early words of the sequence gradually fade from the network’s representation). LSTMs addressed this problem but were slow to train and couldn’t support parallel computing. Then we had the transformer revolution (which is still going on). Transformers can learn context in sequential data (which is helpful not only for human language but for genomics as well) and introduced attention mechanisms, which make it possible to track relationships between words going both forward and backward across long sequences of text (or the genome). Massive transformers, called Large Language Models (LLMs), are revolutionizing EVERYTHING. I’m so excited about where this field is going.
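As a rough illustration of the attention idea described above, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is a simplification under stated assumptions: the queries, keys, and values are all the raw token vectors, whereas real transformers use learned projections, multiple heads, and positional encodings. The function names and toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = vectors."""
    dim = len(vectors[0])
    outputs, weights = [], []
    for query in vectors:
        # Score this position against every position, earlier or later.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average of all value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(row, vectors)) for j in range(dim)]
        )
    return outputs, weights


# Three toy "token" vectors (assumed values); the first and third are identical.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)

for row in weights:
    print([round(w, 3) for w in row])
```

In this toy input, the first token puts as much weight on the third token as on itself, even though the third token comes later in the sequence. Every position is scored against every other position in one pass, which is what lets attention relate distant elements in both directions and also makes the computation easy to parallelize.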
What is your favorite project you’ve worked on?
My favorite project is probably what we’re working on now (which is why I’m so anxious to get going)! I haven’t touched biology since high school, apart from a brief stint in a computational neuroscience lab during my first year in college. I’ve been vegetarian for ethical reasons since I was 3 years old, so oceanic preservation and biodiversity causes hit close to home. I’m super excited to map the plankton metagenome to phenotypes and create a hybrid transformer+CNN architecture to do so. Let me explain a little about the project. A cubic mile of ocean water contains billions (if not more) of planktonic organisms. By routinely sampling ocean water, we can collect information and map the metagenome of these organisms. Taken together with metadata such as temperature, acidity, depth, and water pressure, and mapped to a time series, this data reveals changes in the oceanic biome due to changes in climate. By training new foundation models on the extensive metagenome sequences from our samples, we will be able to map changes in ecosystem conditions and climate to changes in the phenotypic manifestations of the organisms that live there. Large Language Model (LLM) architectures enable us to take these data-rich genomic sequences and extract information relevant to phenotypes. CNNs have shown promise on long sequences in genomics, but lack the immense power of transformers and their capacity for knowledge recall. We’re aiming to develop a new hybrid architecture for this task specifically. Once proven as a concept for the Baha ocean column (supplemented with simulated data), we could see this extending to a number of oceanic ecosystems. It is not hard to imagine foundation models trained specifically to output expected phenotypes when prompted with genome sequences and metadata of the ecosystem. By examining differences between the expected and actual manifestations, we can reveal how climate change is altering our actual ocean life and tipping the scales of microbial organisms.
You’ve expressed how much you’re interested in seeing this project develop. Could you describe what makes NewAtlantis so exciting?
NewAtlantis is providing the opportunity to explore questions that haven’t been explored and develop new NLP architectures in the process. What’s not to love?! Seriously though, I love the mission to support biodiversity and create assets of ocean life we need to preserve. Without the oceans, we’d be nowhere as a species and would probably all be dead. Who wouldn’t want to support work to prevent that?
Do you think that the Decentralized Science (DeSci) community is the future?
Absolutely. It speeds up the peer review process to unprecedented levels and helps funds get into the hands of those who need them. I think it’s revolutionary.
What problems are you hoping to solve with NewAtlantis?
The metagenome-to-metaphenome problem is perhaps the most challenging problem there is. I’d love to contribute to solving this problem with the research we’re doing. We can expand from plankton to other organisms with proper sampling. The proof of concept alone would be remarkable.
You’re very enthusiastic about working with our beloved Stanley. What makes him a great person to work with?
We’ve always had great synergy and were looking for a chance to work together. Stanley’s awesome. He has a much greater depth of knowledge in genomics and libraries than I do and has exposed me to great projects outside of this one, like DeepChem. He’s also just a friendly and personable character. Highly recommend.
Thank you so much for taking an interest in NewAtlantis! Last question, what’s your favorite marine animal?
I love the whale shark, but I really love blue whales. It’s just so crazy they exist. They’re humongous. Kind of terrifying, actually. But they’re gentle. I just love the paradox. You’d expect the largest animal in existence to be some type of apex predator, but they feed on krill.
NewAtlantis seeks to address the twin challenges of climate change and biodiversity loss by aligning community, government, industry, and individual benefit with the improving ecological health of our oceans. Subscribe to our newsletter on our website, or join our Discord to learn more.