Pattern recognition vs pattern succession: let’s talk about model bias in venture capital

Aya Spencer
Published in GVCdium
7 min read · Jun 16, 2021

How data science can both ameliorate and exacerbate the effects of bias in VC


What is bias?

When you hear the word bias, what comes to mind? A quick Google search of the term "bias" yields some interesting results. Synonyms include words such as partial, influence, partisan, and skewed. Other related terms are much more extreme, including words like prejudice, discriminatory, racist, distorted, and twisted. Yikes!

The opposite of bias includes words such as impartial and fair.

How is bias defined in the world of business? Investopedia defines bias as

…an irrational assumption or belief that warps the ability to make a decision based on facts and evidence. Equally, it is a tendency to ignore any evidence that does not line up with that assumption.

How is bias applied in venture capital?


There is no denying that bias affects many sectors of our society today, and venture capital is no exception. Bias can occur at every stage of a venture deal, from sourcing startups to negotiating the cap table. Bias during sourcing can include favoring startups founded by people of a similar background. Bias during negotiations can look like setting valuations and equity percentages based on preconceived notions about the future success of a company and/or its founders that come from stereotypes or prejudices.

Words such as "culture fit," "intuition," and "spark" are sometimes used by venture capitalists (VCs) to justify bias towards founders and their startups, whether at a conscious or unconscious level.

In this section, I’ll be discussing two types of bias in VC: human bias and model bias.

Human bias

When we talk about bias in the venture community, we’re usually referring to human bias. In traditional venture capital, human bias has been and still is a very big problem. Human bias often comes in two forms — conscious and unconscious. Each of these can also be further broken down into two levels: emotional and cognitive. Below is an example for each:

Conscious bias at an emotional level can happen when an investor makes drastic investment decisions that are heavily dependent on their mood. Investors who suffer from this type of bias are often aware of these tendencies (self-control bias).

Conscious bias at a cognitive level can look like a VC that solely invests in blockchain technology and nothing else. While having a specialty can be a strength, completely refusing to explore other opportunities can lead to missing out on good investments or making poor choices in that specialty (status quo bias).

Unconscious bias at an emotional level may, for example, involve a first-time investor being unwilling to invest funds because they’re the first in the family to have acquired immense personal wealth, and are afraid to deploy capital due to a fear of losing it (loss-aversion bias).

Unconscious bias at a cognitive level can happen when a VC surrounds themselves with other VCs that merely echo their own sentiments, rather than bringing new perspectives to the table (confirmation bias).

Some VCs may argue that having bias is not necessarily a bad thing, especially if the bias has historically worked in their favor (when it comes to seeking the right investment opportunities). However, I’d like to challenge this argument by pointing out that it goes against the very nature of the venture industry. Venture is a business that is built on a foundation of investing in innovation. In order to reach new heights as a VC, thinking outside the box is a critical skill. The best VCs are aware of their own biases and take steps to acknowledge and correct them.

Model bias

While human bias is something that can be worked through with self-reflection and awareness, bias in technology (specifically in machine learning models) is a bit less intuitive and therefore more difficult to spot.

Pattern recognition is the basis for and essence of machine learning (ML) models. To understand model bias, we first need context, so I’ll go over the basics of how pattern recognition works within machine learning models.

According to an article by Analytics Vidhya,

An “algorithm” in machine learning is a procedure that is run on data to create a machine learning “model.” A machine learning algorithm is written to derive the model. The model identifies the patterns in data that fit the dataset. Fit is a synonym to “find patterns in data.”

In other words, ML models provide a formula for a pattern. From a high-level perspective, the goal of every ML model is to predict patterns based on the data that it uses. So when you hear data scientists say “the model is only as good as the data,” it is very true!

Knowing this, you can begin to understand how problems with bias can arise. If there are biases in the data that is fed into an ML model — which is intended to improve investment decisions by recognizing historical patterns of successful exits — VCs can easily end up treading through murky waters by becoming exclusive and discriminatory. This is model bias.

The problem with model bias


Classification models (such as logistic regression, decision trees, k-nearest neighbors, and support vector machines) are often used to predict a binary outcome (0/1 classifier). If these models are used to predict the success of a startup, they run the risk of serious model bias. Take, for example, a logistic regression that, based on the data it's been fed, concludes that three of the most important predictors of startup success are a founder's schooling (Ivy League), gender (male) and race (white). Based on the pattern recognition in the ML model, it proposes that, among other factors, an investor should prioritize startups whose founder comes from a particular educational background and is of a particular gender and a particular race.

The model itself is not classist, sexist nor racist, but rather is providing a clear depiction of the patterns recognized from the data that the model has been fed. Investment decisions at the seed and pre-seed stages are especially vulnerable to model bias, because there is often no revenue data available. Decisions in these early stages often must be made based on a startup’s qualitative features, such as founder characteristics and company culture.
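The scenario above can be sketched in a few lines. This is a toy illustration with entirely synthetic, invented data: we generate "historical" outcomes in which funding success was tied to a demographic proxy (an Ivy League degree) rather than product traction, then fit a standard logistic regression and inspect its coefficients.

```python
# Toy illustration: a classifier trained on biased historical data
# faithfully reproduces that bias. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 2000

# Invented founder features for this sketch:
ivy = rng.integers(0, 2, n)        # 1 = Ivy League degree
traction = rng.normal(0, 1, n)     # product traction (standardized)

# Biased historical labels: past investors funded Ivy founders more
# often, so "success" correlates with the degree, not just traction.
logit = 2.0 * ivy + 0.5 * traction - 1.0
success = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = LogisticRegression().fit(np.column_stack([ivy, traction]), success)
coef_ivy, coef_traction = model.coef_[0]

# The model weights the demographic proxy far above the product signal --
# not because it is "classist," but because the data is.
print(coef_ivy > coef_traction)
```

The fitted model recovers the pattern that was baked into the labels: the coefficient on the demographic proxy dwarfs the coefficient on traction. It has done its job perfectly; the bias lives in the data.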

A good data scientist (and a better venture capitalist) should not blindly accept the results of the model, but rather use the model’s output to guide them to ask meaningful questions:

  1. Is the sample skewed? An ML model is only as good as the data that it’s been fed. Check that data to make sure it is inclusive of the entire population. Does the model suffer from a sampling error?
  2. How are these factors related? Understand that just because variables appear to be correlated, there may not be a causal relationship, or not the one that may first appear true. For example, if most successful unicorns are mapped to founders who are Harvard graduates, it may signify a problem with historical access to capital, rather than the ability to build a viable product.
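The first question above can be turned into a mechanical sanity check: compare each group's share of the training sample against its share of a reference population, and flag large deviations. The group names, counts, population shares, and tolerance below are all invented for this sketch.

```python
# Hypothetical sampling-skew check: flag groups whose share of the
# training sample deviates from a reference population share.
def sampling_skew(sample_counts, population_shares, tolerance=0.10):
    """Return groups whose sample share differs from the population
    share by more than `tolerance` (absolute difference)."""
    total = sum(sample_counts.values())
    flagged = {}
    for group, pop_share in population_shares.items():
        sample_share = sample_counts.get(group, 0) / total
        if abs(sample_share - pop_share) > tolerance:
            flagged[group] = round(sample_share, 2)
    return flagged

# A deal-flow dataset that heavily over-represents one founder background:
counts = {"ivy_league": 700, "state_school": 250, "no_degree": 50}
shares = {"ivy_league": 0.05, "state_school": 0.60, "no_degree": 0.35}
flagged = sampling_skew(counts, shares)
print(flagged)  # every group's share is far from its population share
```

A non-empty result doesn't prove the model is biased, but it tells you the sample is not representative, so any pattern the model finds should be treated with suspicion.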

Similarly, regression models (such as linear regression, multivariate regression and support vector regression) are often used to predict a target value based on key variables. These models answer questions about values, such as how much capital should be deployed for certain startups. We can also begin to see how model bias is in play here, as securing sizable funding is a challenge in and of itself for many founders from underrepresented backgrounds.

As you can see, pattern recognition, if not followed up with proper human due diligence, can easily create what I like to call pattern succession.

It is not the model bias itself that is problematic, but our inability to recognize and account for the bias in our models that leads to unfair investment decisions.

Pattern recognition vs pattern succession — the importance of breaking old patterns and creating new ones

As mentioned in the previous section, it is our inability to recognize and account for the bias in our models that leads to unfair investment decisions. If we do not examine biases that may be present in the data to determine which model-driven insights are meaningful versus which require further questioning, we end up using what I like to call pattern succession for our decision-making. While pattern recognition is an observation based on data, pattern succession is a choice based on the interpretation of that data. It bears repeating:

Pattern recognition is an observation based on data, while pattern succession is a choice based on a subjective interpretation of that data.

To put it more bluntly, we cannot blame the model for being biased because the model is only as good as the data that it is fed. The very nature of ML models is to recognize existing patterns, so when it returns a result that seems to support a biased action, it is doing what it is intended to do. It has recognized a pattern of bias.

Data science can help to dispel some of the human biases that exist in venture capital. However, it is important to recognize where data models fall short and how they can sometimes actually exacerbate bias.

A good model will be built to check for human bias, while also accounting for the shortcomings of model bias. In my opinion, the best model should work as a complement to the judgment of an experienced and open-minded venture capitalist.

If you’d like to connect with me, send me a message at ayaspencer@gmail.com! If you are a VC needing help combating the hidden biases at your fund, feel free to reach out. Let’s chat!
