“Inverted Personages” by Joan Miró

Novelty Squared

A Challenge of Modern Interdisciplinary Scientific Collaboration

With the announcement of the new $38.7M data science effort at Berkeley, NYU, and the University of Washington, the hard work now begins. It’s a multifaceted endeavor, from tackling education to reënvisioning and reinvigorating the career paths of data scientists. Much of the success will take years to bear out; however, the production of novel scientific results enabled by this investment is one of the clear, demonstrable “early wins” for this ambitious effort. Just what those results will look like remains to be seen, but as I argue here, one of the greatest triumphs will be novel scientific output that is also enabled by novel methodological approaches to data.

The Toolkits of Domain Scientists

All scientists use tools and toolkits to push boundaries, and in many famous boundary expansions a scientist used an existing tool for a new purpose. Astronomers use telescopes and specialized cameras to observe the heavens, and nowadays those telescopes and cameras are purpose-built. But the first astronomer to use a telescope, Galileo, co-opted an instrument intended for military use. In opportunistically pointing the new Dutch invention not at the horizon to look for foreign ships but at the night sky, he made a series of astonishing discoveries that changed our view of humanity forever.

A new tool used in aid of a “domain” science (i.e., physical, life, and social science) need not be just an apparatus or physical device. It can be a theoretical toolkit with which new theory can be developed and new hypotheses created and tested. It can be a new programming paradigm that simplifies how computations on raw data are expressed and evaluated. It can be a newly invented algorithm with some desired statistical property. One of the growing modern challenges is that domain scientists must gain exposure to the ever-growing landscape of toolkits, understand which tools not traditionally used in their discipline might be useful, and become proficient users of those tools for their own pursuits. This is an increasingly large systemic issue: it’s hard enough to keep up with all the new knowledge arising in our sub-domains, let alone continually learn what new techniques are popping up across the landscape well outside of our discipline.


What’s in it for the Methodological Creators?

So where does this leave the methodological creators of toolkits for modern data-intensive science: the computer scientists, statisticians, and mathematicians? Surely the inventors of the MapReduce paradigm for parallel computation would be pleased to know that it’s being used across the physical sciences to speed computation. But if I collaborate with a computer scientist to help me set up a Hadoop cluster in my lab, what’s in it for them? How has their work advanced the body of CS knowledge? Where’s their novelty? If I collaborate with a statistician here at UC Berkeley and we figure out that an obscure statistical metric from the 1950s is perfect for framing the solution to an important astronomy problem, what does my collaborator “get” out of work that does not push the boundary of their field? The last thing I want to have happen with an interdisciplinary collaboration is that my CS and stats colleagues find their contribution to be routine if not mundane.

On the flip side, a computer scientist may invent a truly novel technique for inference on a large amount of data but use as her testbed a question whose answer no astronomer is all that interested in knowing.

This is what I call the Novelty Squared problem in modern interdisciplinary collaboration: the challenge of finding work that is simultaneously novel both to the domain scientist and to the core computational, statistical, and algorithmic scientist. Finding and doing something novel is hard, but novelty times novelty is extremely rare. [I discussed novelty squared as part of the Berkeley Data Science Lecture series.]

Any scientist would be proud to have made just a few truly novel contributions in a lifetime: ones that bear fruit, hold up against scrutiny, and are enduringly remembered. I am hopeful that as we embark on our new data science efforts we will find ways to enable the creation of novel domain science alongside (and enabled by) novel methodological approaches. In turn, I hope we discover data and questions about data that require novel methodological approaches.

It would be wonderful to learn what examples of novelty squared you’ve encountered already. How did you first approach the problem? How was it solved? How did you find colleagues to work with you on the problem? What barriers (beyond the normal litany) did you encounter?