Quantitative Evidence for Attribute Based Filtering in Venture Deal Flow

“There are a number of signals you can mine and draw patterns from, including where successful entrepreneurs went to school, the companies they worked for and more” - TechCrunch paraphrasing Google Ventures’ Graham Spencer


Build a tool (Prism) that supports deal flow sourcing and filtering of high-potential startups for angel and institutional seed investors in the NYC area.


1. Built a scraper that pulls AngelList profiles and Regulation D filings every week to identify new startups headquartered in NYC

2. Built an evidence based attribute filtering model that recommends the identified startups by probability of funding, used as a proxy for startup success


In a low-information environment such as angel and seed investing, attributes like school, company experience and team size can be used as an initial quantitative screen to supplement and enhance deal flow management.


I scraped information from AngelList on 100 startups for each of 8 schools (listed in Figure 1). Using funding as a proxy for startup success I examined:

1. Probability of being funded per school

2. Probability of being funded given a founding team of more than 2 people, per school

3. Probability of being funded, given number of attributes (for example team went to Harvard and worked at Google = 2 attributes)
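The three probabilities listed above can be estimated as simple conditional frequencies. The following is a minimal sketch, assuming each scraped startup is reduced to a small record; the field names (`school`, `team_size`, `attributes`, `funded`) and the sample rows are hypothetical and are not data from the AngelList scrape.

```python
from collections import defaultdict

# Hypothetical records for illustration only; not the scraped data.
startups = [
    {"school": "Harvard", "team_size": 3, "attributes": 2, "funded": True},
    {"school": "Harvard", "team_size": 1, "attributes": 1, "funded": False},
    {"school": "Harvard", "team_size": 4, "attributes": 1, "funded": False},
    {"school": "Yale", "team_size": 3, "attributes": 2, "funded": True},
    {"school": "Yale", "team_size": 2, "attributes": 1, "funded": False},
]


def funding_rate(records, key, condition=lambda r: True):
    """Share of funded startups per value of `key`, among records meeting `condition`."""
    funded = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        if not condition(r):
            continue
        total[r[key]] += 1
        funded[r[key]] += int(r["funded"])
    return {k: funded[k] / total[k] for k in total}


# 1. P(funded | school)
p_school = funding_rate(startups, "school")
# 2. P(funded | school, team > 2)
p_school_big_team = funding_rate(startups, "school", lambda r: r["team_size"] > 2)
# 3. P(funded | number of attributes)
p_by_attr_count = funding_rate(startups, "attributes")
```

To break the third measure out per school, the same function could be called once per school, with a condition that selects that school.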

Results are shown below. School does appear to affect the probability of being funded when the founding team has more than 2 people. (For Stanford, I suspect that far more startups are being formed than at Yale, so the probability falls because the denominator is larger.) The dotted box highlights the probability, per school, of being funded given a founding team of more than 2 people.

Figure 1: Probability of being funded per school based on a subset of AngelList data

The figure below shows the probability of being funded (per school) given the number of attributes (or tags) the founding team exhibits. Attributes are school or company experience, such as Harvard or Google. There is a mild increase in the probability of being funded as the number of attributes per startup team rises.

Figure 2: Probability of being funded given attribute count, per school

There are important biases to be aware of when scraping historical data from AngelList. Scraping the entire catalogue of Harvard startups would introduce severe survivorship bias, since unfunded or failed startups would delete their profiles. The 100 startups scraped per school were therefore taken from a deliberately chosen window (August 2014 to March 2015): long enough for the startups to have had time to raise an angel round, but not so long that profiles would have been deleted after a failed raise.
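The sampling window can be applied as a plain date filter. This is a sketch under assumptions: the `joined` field is a hypothetical name for the date a profile appeared, the profiles are made up, and the exact first and last days of the window are my choice, since the text gives only the months.

```python
from datetime import date

# Window from the text (August 2014 to March 2015); exact days are assumed.
WINDOW_START = date(2014, 8, 1)
WINDOW_END = date(2015, 3, 31)

# Hypothetical profiles for illustration only.
profiles = [
    {"name": "A", "joined": date(2013, 5, 10)},
    {"name": "B", "joined": date(2014, 9, 2)},
    {"name": "C", "joined": date(2015, 3, 31)},
    {"name": "D", "joined": date(2015, 6, 1)},
]


def in_window(profile, start=WINDOW_START, end=WINDOW_END):
    """True if the profile's join date falls inside the window (inclusive)."""
    return start <= profile["joined"] <= end


sample = [p for p in profiles if in_window(p)]
```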

Independent Industry Evidence:

First Round published a report, ‘First Round 10 Year Project’, investigating patterns common to its successful startups over the previous 10 years. The results support actively sourced, attribute based filtering:

1. Startups with at least one founder from Ivy League schools or Stanford, MIT or Caltech performed 220% better than other teams

2. Founding teams with experience at Amazon, Apple, Facebook, Google, Microsoft or Twitter performed 160% better than other teams

3. Teams with more than one founder performed 163% better than solo founder startups

4. Startups found by active sourcing (Twitter, Demo Days etc.) performed 54% better than referred startups

Institutional Portfolio Evidence:

Using publicly available information (from CrunchBase), I analyzed the efficacy of attribute based filtering on an actual institutional seed portfolio (startup names redacted). The ‘scaled’ column divides funds raised by time in operation so that startups of different ages can be compared (Scaled = [FundsRaised] / [Now() - DateFounded])
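The scaled figure is a one-line calculation. A minimal sketch follows, assuming time in operation is measured in days (the text does not state the unit); the function name and the example values are hypothetical.

```python
from datetime import date


def scaled_funding(funds_raised, date_founded, today):
    """Funds raised per day in operation: FundsRaised / (Now - DateFounded).

    Days are an assumed unit; the source does not say which unit was used.
    """
    days = (today - date_founded).days
    if days <= 0:
        raise ValueError("date_founded must be earlier than today")
    return funds_raised / days


# Hypothetical example: $1M raised over exactly one (non-leap) year.
example = scaled_funding(1_000_000, date(2014, 1, 1), date(2015, 1, 1))
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the result reproducible when the same portfolio is re-scored later.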

Figure 3: Institutional seed portfolio with funds raised used as proxy for startup success in examining attribute based filtering efficacy

The institutional portfolio provides support for attribute based filtering in seed investment decisions. There are cases in this portfolio where both the previous founder attribute and the school attribute were met, yet funding seems low (highlighted by the startup with the lowest funds raised). This supports the notion of attribute based filtering as an initial, supplementary approach in deal flow management. It also highlights the potential need for further distinction between the individual attributes used in the filtering process (and for weighting optimization in the model below).


An attribute based quantitative filtering score (FS) can be created for potential investments sourced from the AngelList and Reg D scrapes, to be used as an initial, supplementary indicator only.

For startup i, FSi is the weighted sum of Proprietary Tech (PT) and Team (T) attributes: α is the weight given to the sum of PT attributes and β is the weight given to the sum of T attributes. For example, one PT attribute j would be whether the tech is 10x (scored from 0 to 1). The team attributes are those described previously (school, company, team > 2).
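The score described above can be written out as follows. This is a reconstruction from the verbal definition, not a formula quoted from the original: the index k over team attributes and the double-subscript notation are assumptions.

```latex
FS_i = \alpha \sum_{j} PT_{ij} + \beta \sum_{k} T_{ik}
```

Here PT_{ij} is the score of startup i on proprietary tech attribute j (between 0 and 1 in the 10x example) and T_{ik} is its score on team attribute k.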

Sample Output:

NYC-based pre-seed startups sourced in August and September from the AngelList scrape, ranked from the highest Filter Score down based on their attributes, are shown below.
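A ranking of this kind can be produced with a few lines of code. The sketch below assumes equal weights (α = β = 0.5) and binary team attributes; the startup names, attribute values and weights are all hypothetical and are not the output of the actual model.

```python
def filter_score(pt_attrs, team_attrs, alpha=0.5, beta=0.5):
    """Weighted sum of Proprietary Tech and Team attribute scores."""
    return alpha * sum(pt_attrs) + beta * sum(team_attrs)


# Hypothetical candidates. "pt" holds one 10x-tech score in [0, 1];
# "team" holds 0/1 flags for (school, company, team > 2).
candidates = {
    "Startup A": {"pt": [0.8], "team": [1, 1, 0]},
    "Startup B": {"pt": [0.2], "team": [1, 0, 1]},
    "Startup C": {"pt": [0.5], "team": [0, 0, 0]},
}

scores = {name: filter_score(c["pt"], c["team"]) for name, c in candidates.items()}

# Highest Filter Score first.
ranked = sorted(scores, key=scores.get, reverse=True)
```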

Model Optimization:

With each cycle, angel investors will be able to identify which startups succeeded and optimize the model (updating the weight per attribute) to reflect the new information, moving asymptotically toward an optimal decision model. I recommend building a robust negative feedback loop, where Input is the startup characteristics, A is the proprietary investment decision model, B is the information fed back on previous decision success, and Output is the current decision.

With this negative feedback loop the angel investor structures a formalized optimization process for his or her investment approach. Each investment (and non-investment) serves as a learning opportunity. This gives the angel investor the opportunity to continually enhance their investment approach.
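One way to make the feedback loop concrete is to treat the gap between the model's prediction and the observed outcome as the error signal and to nudge each attribute weight against it. The sketch below uses a logistic-regression-style gradient step, which is my choice of update rule rather than one specified here; the attribute vectors, outcomes and learning rate are hypothetical.

```python
import math


def predict(weights, attrs):
    """Model A: map attribute scores to a probability of success."""
    z = sum(w * x for w, x in zip(weights, attrs))
    return 1.0 / (1.0 + math.exp(-z))


def update(weights, attrs, outcome, learning_rate=0.1):
    """Feedback B: shift weights against the prediction error."""
    error = predict(weights, attrs) - outcome
    return [w - learning_rate * error * x for w, x in zip(weights, attrs)]


# Hypothetical history of (attributes, outcome) pairs from past cycles.
# Attributes are 0/1 flags; outcome is 1 for success and 0 otherwise.
history = [([1, 1, 0], 1), ([0, 0, 1], 0), ([1, 0, 1], 1)]

weights = [0.0, 0.0, 0.0]
for attrs, outcome in history:
    weights = update(weights, attrs, outcome)
```

Non-investments can be fed through the same loop, as long as their later outcome is recorded, which matches the point that each decision serves as a learning opportunity.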

In a forthcoming 4 part series I hope to build a detailed, practical control system for venture decisions based on the above (Part 1: Inputs, Part 2: Diligence and Decision Model, Part 3: Performance Feedback, Part 4: Output Decision).

“Today’s ‘best practices’ lead to dead ends; the best paths are new and untried.” — Peter Thiel, Zero to One