Building Tools for Science Together

Kicking off Collaborative Computational Tools for the Human Cell Atlas

Computational statistician Kim-Anh Lê Cao working with CZ Biohub data scientist Angela Pisco.

Cells are the fundamental units of life, but we still have much to learn about their basic function and organization. There are thousands of cell types and trillions of individual cells working in complex networks to enable a diversity of functions throughout our bodies, from the immune system to the brain. New experimental technologies for characterizing single cells, combined with the right computational approaches, may help us begin to organize and make sense of all this complexity.

The Human Cell Atlas (HCA) is an ambitious global collaboration to create an open reference map of all cells in the human body by comprehensively characterizing cell types, numbers, and spatial locations. Once complete, it will be a fundamental resource for scientists, allowing them to better understand how healthy cells work and what goes wrong when disease strikes. But assembling, integrating, analyzing, and sharing this resource demands new cloud-based data infrastructure and new analytical methods for processing and interpreting large, complex, heterogeneous datasets.

CZI is supporting the Human Cell Atlas through grantmaking, data infrastructure, collaborative open-source software development, and support for collaborative research. As part of this effort, CZI Science recently organized a four-day meeting of over 200 scientists, computational biologists, and software engineers to kick off the Collaborative Computational Tools for the Human Cell Atlas — a collection of 85 grants to researchers aiming to work collaboratively to solve computational challenges for the HCA.

Designing meetings for collaboration
At CZI, we believe that interdisciplinary teams working together accelerate science, especially at the intersection of biology, computation, and software engineering. But how can we use our scientific meetings to help bootstrap collaboration? This meeting was an opportunity to try some ideas.

In advance of the meeting, we grouped projects into 12 loose research areas, organized around data types, analytical methods, and software ecosystems. We referred to these groups as cell state, cell type, imaging, multiomics, trajectories, manifold alignment, population variation, latent spaces, compression, scale, portals, and Bioconductor. Some of these groups had actually applied together; others were meeting each other for the first time.

We asked each group to present as a cohesive whole, so that we had 12 group presentations rather than 85 individual ones, and we held some lively conference calls to prep for the talks before the meeting started. This loose organization and the pre-meeting conference calls helped break the ice when people arrived. We also designed a meeting website with links to GitHub repositories, slides, documents, and other project outputs that served as a collaborative online hub throughout the meeting.

On day one, the 12 groups presented their projects. The groups then spent the next two-and-a-half days working together to better “frame” the work they might do over the one-year project period, emphasizing common threads like shared metrics and benchmark datasets for defining the relative success of different algorithms, data and metadata format standards, or integration across different software tool ecosystems. To add some structure, we also featured four parallel tutorial sessions on a range of topics: tools for modern web-based data visualization, the Human Cell Atlas Data Coordination Platform, using and improving bioRxiv for computational biology, and collaborative coding with GitHub.

We created plenty of time throughout the meeting for the groups to get to work, from collaborative coding time to user feedback sessions to group brainstorming, interspersed with invited presentations on current major experimental and computational efforts related to the HCA, including an inspiring keynote from Dana Pe’er, a leader of the HCA computational community. The meeting ended with presentations from the trainees on what they had accomplished while there, and ideas for how to keep working together. Throughout, we left plenty of unstructured time, both work and social, to encourage deep, open discussion and relationship building. Selecting a venue that provided both the meeting space and attendee lodging helped with community building. Curated benchmark datasets on GitHub might last a year, but s’mores and karaoke on the beach build a bond that can last throughout a scientific career.

Continuing the discussion by an oceanside bonfire.

What we learned
Three aspects of the meeting were particularly inspiring. First, groups were truly excited to collaborate when given the time, space, and tools. Second, including students and staff scientists alongside PIs injected energy into the meeting, and probably helped make sure real work got done! Third, embedding several CZI computational biologists and software engineers helped spur collaboration across the meeting: our team learned a great deal from the grantees about on-the-ground challenges, and also helped frame cross-cutting computational opportunities and collaborations.

Everyone engaged with this novel meeting structure in a slightly different way: in the best cases, groups clarified the vision and scope for their projects, sometimes literally curating or defining benchmark datasets or writing prototype code. We were also impressed by groups that found specific opportunities to join ongoing open-source software collaborations with CZI: Josh Moore from the OMERO project is now working with the Starfish team on data formats for image-based transcriptomics, and Ryan Williams and Cotton Seed from the scale group are developing more scalable approaches for storing and computing on matrices in the Data Coordination Platform.

We were also excited that developers from three key single-cell analysis software packages — Scanpy, Seurat, and Scater — made progress on enhancing the interoperability of their tools and data formats. Just being in the same room and having time and support can catalyze progress.

It was truly awesome to see the HCA computational community come together at this meeting. As one attendee put it to one of us, it felt like “the woodstock of computational science”, and we couldn’t agree more. We’re pleased that CZI Science was able to help kick off such exciting collaborations, and we hope to use this meeting as a model as we continue to launch more projects.

To learn more about our work in science, visit our website or follow us on Twitter. To learn more about our technology team, follow the CZI technology blog. To stay updated on funding opportunities, sign up for our mailing list. And you can always reach us at science@chanzuckerberg.com.

Jeremy Freeman, Director, Computational Biology

Jeremy is a scientist at the intersection of biology and technology. He wants to understand how biological systems work, and use that understanding to benefit both human health and the design of intelligent systems. He studied computational vision in grad school at NYU, led a neuroscience research lab at HHMI’s Janelia Research Campus, and is currently at the Chan Zuckerberg Initiative leading our work at the intersection of computation and biology. He’s passionate about open source and open science, and bringing scientists and engineers together across a range of fields.

Arne Bakker, Manager, Scientific Meetings and Reviews

Arne Bakker is Manager of Scientific Meetings on the Chan Zuckerberg Initiative science team. He has a PhD in tumor immunology from the Netherlands Cancer Institute and did his postdoctoral research at UC Berkeley. More recently, Arne was Assistant Dean of Career Education for PhDs and Postdocs at Stanford University. Throughout his career, Arne has been actively bringing scientists together: he was Director of the Discovery Festival in Amsterdam, co-organized Beyond Academia at UC Berkeley and PhD Pathways at Stanford, and volunteered at the Bay Area Science Festival. At CZI, Arne combines lessons from this multifaceted career to lead our efforts to bring scientists together through meetings, workshops, and other convenings, with the goal of creating and supporting collaborative scientific communities.