Data-Driven Work Cultures: David Magerman of Differential Ventures On How To Effectively Leverage Data To Take Your Company To The Next Level

An Interview With Pierre Brunelle

Pierre Brunelle, CEO at Noteable
Authority Magazine
15 min read · Jul 11, 2022

As part of our series about “How To Effectively Leverage Data To Take Your Company To The Next Level”, I had the pleasure of interviewing David Magerman of Differential Ventures.

David Magerman is a co-founder and Managing Partner at Differential Ventures. Previously, he spent the entirety of his career at Renaissance Technologies, widely recognized as the world’s most successful quantitative hedge fund management company, playing a lead role in designing and building the trading, simulation, and estimation software. David holds a PhD in Computer Science from Stanford University where his thesis on Natural Language Parsing as Statistical Pattern Recognition was an early and successful attempt to use large-scale data to produce fully-automated syntactic analysis of text.

Thank you so much for joining us in this interview series. Before we dive in, our readers would love to “get to know you” a bit better. Can you tell us a bit about your ‘backstory’ and how you got started?

I started my career as an academic researcher doing early work in data-driven approaches to natural language parsing, first at the University of Pennsylvania for my undergraduate thesis and then at Stanford and IBM Research for my doctoral thesis. After failing to get the first faculty job I applied for, I agreed to join some former IBM colleagues at a quantitative hedge fund, Renaissance Technologies, at which I planned to stay for only a few years before going back to academia. I ended up staying more than twenty years before leaving and going into venture capital investing.

Can you share a story about the funniest mistake you made when you were first starting? Can you tell us what lessons or ‘take aways’ you learned from that?

When I was at IBM, I managed to get unsanctioned superuser status on the company’s computer network and used it to monitor usage of computers around IBM in order to steal time on unused machines for my group’s compute-intensive research. During the first few weeks I was at Renaissance, I did something similar. However, when I tried to launch a similar monitoring tool I built at Renaissance, it ended up becoming a virus that brought the entire company’s research environment to a halt and threatened to jeopardize the production of the next day’s trading data. I could have been fired for doing that, and it took me a while to regain everyone’s trust. It was a painful but important wake-up call for me, that working at a for-profit company was very different from working at a research lab. Things we did every day, even every minute, mattered at Renaissance, unlike when I was in academia.

Is there a particular book, podcast, or film that made a significant impact on you? Can you share a story or explain why it resonated with you so much?

The books that have had the biggest impact on me have actually been historical fiction, from authors like James Michener and Ken Follett. Most of my career, I spent my time behind a desk, sitting at a computer screen and writing programs. Learning about how technological advancements, and sometimes individual actors, have driven societal change inspired me to get more involved in society outside of work. However, the one book that has had the most impact on me is The Tanya by Rabbi Shneur Zalman. Over the past decade or two, I’ve become a more actively religious person, and The Tanya helps explain the physics and chemistry of the spiritual world. As a scientist, it helps frame religious practice as a practical endeavor.

Are you working on any new, exciting projects now? How do you think that might help people?

During the past five years, I went through a period of data privacy activism that convinced me I didn’t know enough to contribute to good solutions. My most exciting project right now is learning: learning what other people are doing to fix the internet, learning what has worked and hasn’t worked so far, and waiting for a great new idea to appear, either in my head or in someone else’s, to bring us back from the abyss of social media armageddon.

Thank you for all that. Let’s now turn to the main focus of our discussion about empowering organizations to be more “data-driven.” My work centers on the value of data visualization and data collaboration at all levels of an organization, so I’m particularly passionate about this topic. For the benefit of our readers, can you help explain what exactly it means to be data-driven? On a practical level, what does it look like to use data to make decisions?

When I was an undergraduate, I took a course called statistical decision theory, which basically said that people can use data to build models of expectation, and they can then make decisions that yield the highest expected value. This seemed so obvious to me that I didn’t understand why I was getting credit for taking the course. But by the end of the course, I realized that most of the world operated without the insight that we should make decisions using statistics to maximize expected return. This was even before I was in quantitative trading, when I was still just trying to build a natural language parser.

Being data-driven is using the data your organization has to reduce uncertainty about the future and to help you make business decisions that will give you the highest expected return. And, if your data can help you understand your risks better, then you can maximize your risk-adjusted return, which allows you to take bigger risks when the returns will be greatest.

Data-driven practices include automated use of data: using computer systems that digest data being generated by business activities, as well as the outside world, and build models that can be deployed to improve decision-making. They also include collecting business intelligence data, both internal and external, into databases that can be used interactively, by analysts and teams, to surface insights that can help humans make better, more-informed decisions.
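To make the expected-value idea above concrete, here is a minimal sketch in Python. The two options and their outcome figures are invented for illustration, and the mean-over-standard-deviation ratio is only one simple stand-in for “risk-adjusted return”; none of this describes a model actually used at Renaissance or Differential Ventures.

```python
from statistics import mean, pstdev

# Hypothetical historical outcomes (profit per decision) for two options.
# These numbers are made up purely for illustration.
outcomes = {
    "option_a": [120.0, 80.0, 100.0, 100.0],
    "option_b": [400.0, -200.0, 300.0, -60.0],
}


def expected_value(samples):
    """Estimate the expected outcome as the average of past outcomes."""
    return mean(samples)


def risk_adjusted(samples):
    """Average outcome divided by its standard deviation (an assumed, simple risk measure)."""
    spread = pstdev(samples)
    return mean(samples) / spread if spread > 0 else float("inf")


# Pick the option with the highest expected value, then the highest risk-adjusted value.
best_by_ev = max(outcomes, key=lambda name: expected_value(outcomes[name]))
best_by_risk = max(outcomes, key=lambda name: risk_adjusted(outcomes[name]))

print("Highest expected value:", best_by_ev)
print("Highest risk-adjusted value:", best_by_risk)
```

With these invented numbers the two rankings disagree: the option with the higher average outcome is also far more volatile, which is why understanding risk changes the decision.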

Which companies can most benefit from tools that empower data collaboration?

These days, there isn’t a company that can’t benefit from data. Companies that make rapid-fire decisions all day can benefit from automated systems that digest data in real time and compile it into recommendations, either for humans to use as advice or for computer systems to act on without intervention. Companies that make fewer decisions that take longer to have an impact can still use data to make better decisions, and can also use data to monitor the progression of those decisions, in the event they want to adjust course based on data-driven analysis.

Recently, we’ve been seeing more products for data-driven collaboration in software development. The software development process leaves behind an enormous amount of structured and unstructured data that is typically unused: project status updates, incremental testing results, Slack conversations about problem resolution, and output from project management tools. Tools are actively being developed to collect this information and generate insights from the data, which can be fed to programming teams and management to advise teams about how to maximize their efforts to achieve their goals efficiently.

We’d love to hear about your experiences using data to drive decisions. In your experience, how has data analytics and data collaboration helped improve operations, processes, and customer experiences? We’d love to hear some stories if possible.

At Renaissance, every decision was data-driven. We built models to understand price movements, market covariance, trading costs, trading impacts, etc. We even had models to understand how well our models were working, so we could judge whether or not we were taking too much risk, such that bad luck might cause us to go out of business. And we had models to estimate how much it would cost us to liquidate our portfolio if we were forced to. Everything about Renaissance’s operational decisions was driven by data.

At Differential Ventures, where we invest in early stage startups, data-driven decision making takes a very different form. There isn’t enough data to build models we can train and test enough to convince us they are accurate enough to use to make automated decisions. Right now, the best thing we can do is collect more and more data, try to hypothesize what trends in the data might be predictive of market behavior and success, and eventually use those insights to train models to inform our decision making.

The most important thing any and every business can do is collect data thoughtfully. The day you start collecting data brings you a day closer to having enough data to build explanatory and predictive models. The data might not be useful today, but in the years to come, more and more algorithms will be discovered to extract information out of data that can improve business practices. And once those algorithms are discovered, businesses that have been collecting data will have an advantage over competitors that haven’t.

One of our portfolio companies, Parlor, helps companies improve their customer experience by collecting data from different forms of customer communications: social media posts, product reviews, and even web site interactions like what pages they visit and what tools they use. Parlor helps their customers collect all of these data sources into a single comprehensive database, and then builds new tools that analyze the data for these companies, as well as giving their customer support teams access to the data for their own analyses. Parlor collects a lot of data their clients aren’t using yet, but they will constantly be coming up with new analytical tools to leverage more of the data to help their clients manage their customers’ experiences better.

Has the shift towards becoming more data-driven been challenging for some teams or organizations from your vantage point? What are the challenges? How can organizations solve these challenges?

There are definitely some industries that have been slow to accept and adapt to data-driven decision-making. When Renaissance was first hitting its stride with quantitative trading in the mid-1990s, other firms were trying to compete with Renaissance but not succeeding. Ultimately, Renaissance’s advantage was that management trusted statistical decision theory and trusted the models. Human intervention in trading decisions ALWAYS led to lower profits, and Renaissance was always more automated than its competitors. We heard many stories about groups with good models that went on a losing streak and were shut down. At Renaissance, if we went on a losing streak, we’d redouble our efforts to figure out what we were doing wrong and fix it, or wait until our luck changed. It took a lot of faith when we endured long losing streaks due to bad luck, but it always made sense to keep trading.

Given the success of quantitative models in hedge funds, it’s surprising that other financial industries haven’t embraced fully-automated decision-making systems. Mortgage lending is still a very manual process, with decision making using some data sources, but not nearly as much statistical decision theory as could be brought to bear if the management teams had more faith in the models. Real estate investing is another area where data-driven investing hasn’t taken hold.

Areas that depend on human-intensive labor and that have been traditionally driven by instinctive decision making, like construction, medical diagnosis, and human resources, have been slow to adopt data-driven solutions, perhaps because the decisions being made have long-lasting impact and so the typical trial-and-error of adopting new technologies is less tolerable. But startups are building solutions for solving problems in these areas, and innovators are taking the risk of deploying them. In the long run, that risk taking should pay off in efficiencies and decisions that lead to better outcomes.

Ok. Thank you. Here is the primary question of our discussion. Based on your experience and success, what are “Five Ways a Company Can Effectively Leverage Data to Take It To The Next Level”? Please share a story or an example for each.

  1. The most important thing a company can do is collect as much data as they can as thoughtfully as they can. Collect data in its raw form with as much information saved, including timestamps, data sources, and any processing that was done to the data. Compile the raw data into a cleaner format, but also keep the raw data in case you discover errors in data processing algorithms or new ways to get more information out of the data. Disk storage is cheap relative to the value of data, so collect as much data as you can. Don’t throw anything away.
    - When we were looking for new data sources for building models at Renaissance, we came across a company that streamed an interesting signal in real time. We started collecting the data to see if the information was valuable to us. We figured out that the signal they were sending us was interesting, but we thought that there might be more information in the raw data from which they were deriving the signal. When we asked them if they could generate a different signal from the historical raw data, they said they couldn’t because they didn’t save it. They didn’t want to waste all of the disk space storing the historical data, so they threw it away. We ended up not using their product after a while. But we were stunned by their shortsightedness about throwing away all of that raw data, as though they were never going to think of a better way of analyzing the raw data and want to test it out on historical data.
  2. Make sure that whatever tools you use have interfaces that allow data generated in and by those tools to be pulled out for analysis. At Differential, one of the most important tools we have is our CRM system, which we use to manage our deal flow, portfolio company, investor, and venture capital fund databases. All of our interactions, comments, notes, and analyses are stored in our CRM system. The sad reality is that there aren’t any comprehensive CRM solutions for venture investing, at least none that we have found. The one we use is the most functional for our purposes, except for one glaring omission: there is no way to search the unstructured notes that we enter into each of these databases. Countless times over the years we have been using it, we have wished we could search for a keyword or a pattern to collect companies that have similar characteristics for analysis, but there is just no way to do it. As a result, we are pulling all of our notes out of the CRM and putting them into a Google Drive filesystem, so at least we can search them. But, in retrospect, it was a mistake to have started to use this CRM, because it lacked such an important data analysis feature.
  3. Hire people who understand your data, understand your business processes, and understand how to extract information from data. We have seen countless startups selling products that I know from my experience are absolutely essential to harnessing data to improve business decision making and practices. And time and again these products go unsold and the startups languish without meaningful revenue. The main reason these products don’t sell is that customers don’t understand enough about how their business can use data to see why they need those tools. To use data in business processes, you need observability in data pipeline management, you need to monitor the quality of data and statistical drift in that data, and you need to monitor the consistency of the performance of models trained on that data over time. Most people think models just get trained, work out of the box, and continue to work. They also think that all models are equally important. Companies need to have people who understand what data might change over time, what models have more financial risk associated with them, and what kinds of information can be found in data vs. what kinds of information can’t be found. Companies that have those people will recognize the need for tools that give them visibility into what is important about data and data-driven models, and those companies will be more successful in leveraging data in the long run.
  4. The more dependent you become on data-driven software, the more you need to be concerned about cybersecurity: the security of your data, the security of your network, and the security of your software. There has been a lot of focus on data privacy in the commercial sector, with companies suffering breaches of user information, credit card data, and account passwords. But a bigger risk to companies is having breaches that change the behavior of their software systems. If someone breaks into your network and steals data, that can be embarrassing and harm your customers. But if someone breaks into your network and changes the data in your databases, or changes the software that you use to process your data and use it to make decisions, your business practices will be irreparably harmed until you discover the breach and fix it, and even then you may never be able to recover the critical data that was changed in the process. Firewalls and identity management systems are of course important. But also important are: tracking the integrity of critical databases, monitoring user behavior on your network to detect inappropriate access to software systems and databases, and trying to identify supply chain attacks in internal and third-party software. Recent attacks on major software vendors like SolarWinds and Microsoft, as well as on open-source tools like Apache’s log4j, have made software supply chain security an important focus of cybersecurity startups.
  5. Be open with your employees about how data is being used. People don’t like to be manipulated by tools or management. A few years ago, we looked at a startup that was using data generated by work tools, like email, Slack, and other communication tools, to evaluate how employees were collaborating. They planned to deploy this tool to help managers and human resources departments identify employees that weren’t integrating with their teams, weren’t being managed well, and might be likely to want to leave their jobs.
    - The theory behind the tool was sound, but the initial implementation was flawed. The tool developer found that, in their early trials, the tool stopped being effective after being used for a while. After analyzing the data from their beta testers, they realized that employees had become aware that their communications were being monitored, and that the monitoring was affecting their evaluations. The employees started gaming the system, deliberately generating “collaborative” communications to appear to be working well with their teams. The spurious messages drowned out any valuable signals that might have come from the information sources. The tool developer realized they needed to re-evaluate how to use this kind of data without alienating the employees and making them feel like they needed to defend themselves against the way the data was being used.
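
The third point above mentions monitoring statistical drift in the data that feeds a model. As a rough illustration only, the Python sketch below flags a batch of recent values whose average has moved far from a baseline. The function names, the sample numbers, and the threshold of 3 are all assumptions made for this example; production drift monitoring typically compares whole distributions across many features rather than a single mean.

```python
from math import sqrt
from statistics import mean, stdev


def mean_shift_z(baseline, recent):
    """How many standard errors the recent batch mean sits from the baseline mean."""
    spread = stdev(baseline)  # sample standard deviation of the baseline data
    if spread == 0:
        raise ValueError("baseline has no variation; this check does not apply")
    standard_error = spread / sqrt(len(recent))
    return (mean(recent) - mean(baseline)) / standard_error


def has_drifted(baseline, recent, threshold=3.0):
    """Flag drift when the shift exceeds an assumed threshold (3 standard errors here)."""
    return abs(mean_shift_z(baseline, recent)) > threshold


# Invented example values for a single numeric feature.
baseline = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]
stable_batch = [10, 9, 11, 10]
shifted_batch = [14, 15, 13, 14]

print("Stable batch drifted?", has_drifted(baseline, stable_batch))
print("Shifted batch drifted?", has_drifted(baseline, shifted_batch))
```

A check like this only says that something changed; deciding whether the change matters still takes the kind of person described in that point, someone who knows which data is expected to move and which models carry the most financial risk.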

The name of this series is “Data-Driven Work Cultures”. Changing a culture is hard. What would you suggest is needed to change a work culture to become more Data Driven?

Employees need to be brought into the process and educated to embrace the ways in which data is going to change their work environment, their day-to-day work processes, and the way their business produces its products and services. If the management team isn’t equipped to manage that education process, they need to bring in the right consultants to manage it, and to train the management team to oversee it in the long run. Management also needs to adapt to embrace the ways that data-driven processes will change their company. They need to be flexible, and they may need to let go of “the way things have always been done” to allow everyone to embrace the way things will be done in the future.

The future of work has recently become very fluid. Based on your experience, how do you think the needs for data will evolve and change over the next five years?

With people working less and less in person, and more people working independently, collaborative tools will be critical to connect people that are working remotely and at arm’s length. The data that is generated by remote-working tools, like Zoom, Slack, email, and even the telephone, will need to be collected, analyzed, and harnessed to replicate the insights that were previously gained by in-person interactions. Management will need these kinds of tools to maintain a level of observability of their group’s productivity. And this data will need to be shared across organizations, to support outsourcing of some parts of projects without losing that observability. All of the digital communication between physically disconnected team members, employees and consultants alike, needs to be viewed as work-product and project-critical data that companies will use to monitor productivity.

Does your organization have any exciting goals for the near future? What challenges will you need to tackle to reach them? How do you think data analytics can best help you to achieve these goals?

We would like to understand better how to leverage the relatively sparse and inconsistent data that exists in the venture capital industry to make our investment and portfolio management decision making more data-driven. This is a much harder problem than doing this modeling work in public markets, where information is broadly available and asset pricing information is continuously maintained. The challenges in this area are manifold: how do you collect the data; how does the relevance of data change over time; what kinds of decisions can you make in a primarily data-driven way, and which decisions need to be simply biased by data, but ultimately made by humans? We hope our experience with data science in other industries, as well as our access to cutting-edge practitioners of data science in our portfolio companies, will give us the insights we need to build and deploy systems that help us achieve the goal of being more data-driven.

How can our readers further follow your work?

Follow us on Twitter and LinkedIn, check out our portfolio companies on our website, and read our blog.

Thank you so much for sharing these important insights. We wish you continued success and good health!

Pierre Brunelle is the CEO at Noteable, a collaborative notebook platform that enables teams to use and visualize data, together.