We tried our algorithms in the real world and they worked: Our First Pilot

Corey Kiyoshi Clippinger
Published in Decissio
Sep 13, 2017 · 8 min read

We have been building our algorithms here at Decissio for a while, and it was finally time to take them out for a spin on a real use case. For that purpose, we got in touch with the best Central European accelerator: StartupYard. In this post we’ll go over some of the key findings from our experiment and how they can help investors and founders think about the early stages of the startup process. You can see StartupYard’s evaluation of our process here

StartupYard had over a hundred applications to sort through in order to arrive at its six final entrants. The 43 most promising applicants were chosen for a Skype interview, after which six eventually went on to participate in StartupYard’s program. Using our research on previous datasets from other accelerators together with the dataset provided by StartupYard, we created our own ranking of the recent applicants to StartupYard. We wanted to see how our system stands up to StartupYard’s extensive selection process with a professional jury.

[The selection process accelerators have at hand involves data and several interviews. We, on the other hand, had only the information given in the applications, augmented by our own data collection, categorization, and data management server.]

Our initial task was to correctly classify applicants into the domains StartupYard was interested in: AI, blockchain, IoT, robotics, and virtual reality.

We had the names of the startups, their teams, the duration of their projects, revenue and funds invested to date (when available), and answers to application questions describing their business concepts and the backgrounds of the teams involved. Our process produced about 200 variables for each startup that told the story of that company, its founders, and its progress, which we used to automate the initial screening part of the evaluation process.

Instead of human evaluators at StartupYard screening every application by hand and interview, we automated the process, saving StartupYard’s staff a tremendous amount of time otherwise devoted to sorting through subpar applications. The applicants themselves stand to benefit as well: a star applicant will hopefully not have to worry about their application being missed simply because a human evaluator is exhausted after reading hundreds of applications that day.

Cleaning out the mess

We started by measuring the completeness of each application. StartupYard has a detailed application form that expects applicants to have a well developed understanding of everything from their business plans to their end users and how they differentiate themselves from their competitors. One obvious way to cut down on the number of applications a human evaluator has to go through is to sort applications by how complete they are. Our first step was recording the percentage of questions answered and sorting the applications accordingly. We also factored into this score the presence of working social media accounts, websites, video advertising, profiles, and other pieces of their digital trace.
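To make the mechanics concrete, here is a minimal sketch of that kind of completeness scoring. The field names, digital trace signals, and weights are illustrative assumptions, not our production values.

```python
# Illustrative completeness score; field names, trace signals, and weights are
# assumptions for this sketch, not Decissio's production values.
def completeness_score(application, expected_fields, digital_trace):
    """Score an application by how much of it was actually filled in (0 to 1)."""
    answered = sum(1 for f in expected_fields if str(application.get(f, "")).strip())
    answered_pct = answered / len(expected_fields)

    # Small bonus for a verifiable digital footprint (website, social media, video).
    trace_signals = ["website", "twitter", "linkedin", "video"]
    trace_pct = sum(bool(digital_trace.get(s)) for s in trace_signals) / len(trace_signals)

    # The form dominates the score; the digital trace only nudges it.
    return 0.8 * answered_pct + 0.2 * trace_pct


example = {"business_plan": "We sell telemetry for beehives.", "end_users": ""}
print(completeness_score(example, ["business_plan", "end_users"], {"website": True}))
# -> 0.8 * 0.5 + 0.2 * 0.25 = 0.45
```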

Distribution of the raw percentage of application completion within our dataset. The median represents the middle of the dataset, with half the data on either side. Q25 marks the 25th percentile: to the left of that point lie the bottom 25% of applicants in terms of completing their applications.

One of the first things we noticed was that while most applicants filled out the information requested of them, a portion didn’t: 7.8% of applicants filled out less than 60% of the available fields. Below 60%, these applicants had left many crucial pieces of information missing and possibly weren’t taking the application very seriously. We sorted these to the bottom of the evaluator’s pile.

But this methodology alone doesn’t take into account the content of each response, just whether or not there was a response. Ideally we would like to analyze the content of each response in order to gauge its relevance to what StartupYard is looking for. Luckily we didn’t have to conceptualize an AI literary critic to solve this problem. Having accounted for incomplete applications, we now wanted to sort out the poorly written ones.

During our initial conversations with StartupYard, we learned that grammar and spelling mistakes were common in many applications and a source of frustration for human evaluators. We analyzed applicants’ responses for spelling and grammar errors. The median applicant had 11 spelling mistakes throughout their application, and the visualization below shows that this is pretty standard for most. To err is human: even though we accounted for proper nouns that could have set off false positives in our spell checker, a few misspelled words are the norm for the average applicant.
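As a rough sketch of how such a check can work, here is a minimal version using the open-source pyspellchecker package as a stand-in for our production checker, which also whitelists product names and technical jargon:

```python
import re
from spellchecker import SpellChecker  # pyspellchecker, a stand-in for our own checker

def count_spelling_errors(text):
    """Count distinct likely-misspelled words, skipping probable proper nouns."""
    spell = SpellChecker()
    words = re.findall(r"[A-Za-z']+", text)
    # Heuristic: treat capitalized words as proper nouns to cut false positives.
    candidates = [w.lower() for w in words if not w[0].isupper()]
    return len(spell.unknown(candidates))

print(count_spelling_errors("Our platfrom uses StartupYard's data to fnd anomalies."))
# -> 2 ("platfrom" and "fnd"); "StartupYard's" is skipped as a proper noun
```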

Kernel density plot showing the distribution of applicant spelling errors. Q25 marks the bottom 25% of applicants in terms of spelling errors, in other words the top 25% of applicants in terms of avoiding spelling mistakes. The y axis is the proportion of the total applicant pool with the number of spelling mistakes shown on the x axis.

Looking at the distribution above, we can see a skew to the left: among the bottom 25% of applicants in terms of spelling mistakes, the number of mistakes descends rapidly.

It would be reasonable to suppose that more complete applications would contain more spelling errors; you can’t make spelling mistakes in questions you don’t answer. But our analysis shows this isn’t the case in our dataset, where the two scores were slightly negatively correlated. One hypothesis is that applicants who took more chances at making a spelling mistake by answering more questions were also more likely to have the sense to run a spell check. The conscientious are consistent in that regard.

We also checked whether the applications fit into the technology domains StartupYard was interested in. Applicants were asked to self describe their domains, but we went a step further: our algorithms analyzed the vocabulary used by applicants. We built a robust algorithm that can pinpoint the category a startup falls into based on the vocabulary it uses to describe itself. Scaling these words appropriately, we created a score of how relevant an applicant was to their claimed field of expertise, bringing the most relevant to the top and penalizing the rest.
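A minimal sketch of the idea, with a toy keyword list (our production vocabulary is much larger and weighted from prior accelerator data):

```python
# Toy keyword lists; the production vocabulary is larger and weighted.
DOMAIN_KEYWORDS = {
    "AI":         {"machine learning", "neural", "model", "prediction", "nlp"},
    "Blockchain": {"blockchain", "ledger", "token", "smart contract", "crypto"},
    "IoT":        {"sensor", "device", "firmware", "connected", "telemetry"},
    "Robotics":   {"robot", "actuator", "autonomous", "gripper", "drone"},
    "VR/AR":      {"virtual reality", "augmented", "headset", "immersive", "3d"},
}

def classify_domain(description, threshold=2):
    """Assign the domain whose keywords show up most; fall back to 'Other'."""
    text = description.lower()
    scores = {domain: sum(kw in text for kw in kws)
              for domain, kws in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # If no domain's vocabulary is used convincingly, the claim isn't backed up.
    return best if scores[best] >= threshold else "Other"

print(classify_domain("We ship a neural model for churn prediction."))  # -> "AI"
print(classify_domain("We are a VR company."))                          # -> "Other"
```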

The legend shows the domain assigned by keyword association. The length of each bar indicates the number of applicants in each keyword-assigned domain, grouped by their own self designations. For example, 8 applicants designated themselves as VR/AR companies but our methodology classifies them as “Other”, our designation for applicants who do not fit into any domain based on keyword usage.

Our own analysis shows that a large chunk of startups don’t articulate their self described domain through their applications. In our conversations with StartupYard we discussed that, while anyone can claim to be an AI or VR firm, what matters to StartupYard is that applicants actually demonstrate competence in their applications. Rather than simply trusting applicants to accurately self-describe their domain, Decissio runs its own Natural Language Processing evaluation to ferret out a startup’s real tech expertise, or lack thereof.

After analyzing these factors, we took into account more of the hard numbers behind each applicant. StartupYard asked applicants how long they and their teams had been working on their startups. We transformed the number of person-hours accumulated over the lifetime of the startup into a normalized score, which we built into what we call the effort ratio.
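The exact formula is ours, but the shape of the computation looks roughly like this. The hours per person-month and the normalization ceiling are assumptions for the sketch:

```python
# Illustrative effort score; the 160 hours/person-month and the normalization
# ceiling are assumptions for this sketch, not our exact formula.
def effort_score(team_size, months_active, hours_per_person_month=160.0,
                 pool_max_hours=50_000.0):
    """Convert person-hours over the startup's lifetime into a 0-to-1 score."""
    person_hours = team_size * months_active * hours_per_person_month
    # Normalize against the largest effort observed in the applicant pool.
    return min(person_hours / pool_max_hours, 1.0)

print(effort_score(team_size=3, months_active=18))  # -> 0.1728
```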

Each individual line represents a single applicant and their effort score along the bottom of the figure. The 90th percentile is labeled Q90 and marked by a red line. Everything to the right of the Q90 mark is in the top 10 percent of effort scores.

What we see here is a large rightward skew in the dataset. As most applicants are early stage startups, their combined hours, transformed into an effort score, largely hover around the low end. But there is a significant subset of applicants who worked on their projects far more than the others. For some investors this may be a useful metric to follow, for others less so. Decissio is here to help bring clarity to your investment decisions, and with this metric investors can get a fuller picture of their business prospects.

Some applicants already had established revenue streams and previous investments. We automatically tabulated the ratio of revenue to investment and used it in our analysis of the applicants, favoring startups with a better record of turning investment into revenue, while still accounting for startups that had received investor money but were pre-revenue at this stage. Like the effort score, this is a metric that investors can filter on when looking at their prospects. Below is a chart of applicant reported revenue per investment:

This chart should be interpreted in the same manner as the one above. One line equals one applicant; the vast majority of applicants are overlapping lines below the 90th percentile.

The top 10% of our distribution, everything to the right of the Q90 mark, largely represents a full return on investment (a ratio of 1 to 1 in revenue to investment dollars) or better. Some applicants claim to be making much higher returns. With our current capabilities, investors can not only filter prospects based on previous ROI, but also quickly and easily analyze a prospect’s financials in the context of other factors.
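The ratio itself is simple; the care is in the edge cases mentioned above. A sketch follows, where returning None for startups with no reported investment is our illustrative convention, not a Decissio rule:

```python
# Illustrative revenue-to-investment ratio with the edge cases noted above:
# pre-revenue startups and startups reporting no outside investment.
def revenue_per_investment(revenue, investment):
    """Revenue earned per unit of investment; None when no investment is reported."""
    if investment is None or investment <= 0:
        return None                        # bootstrapped or unreported: ratio undefined
    return (revenue or 0.0) / investment   # 0.0 for pre-revenue startups

print(revenue_per_investment(44_000, 100_000))  # -> 0.44
print(revenue_per_investment(None, 100_000))    # -> 0.0 (pre-revenue)
print(revenue_per_investment(20_000, 0))        # -> None
```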

As a final variable, we used our own Natural Language Processing techniques to tabulate a media mentions score for each applicant, covering both the team members and the startup as an organization. We pulled information from hundreds of startup and business news sources to create a score based on the sentiment, quantity, and quality of the media mentions. In this dataset applicants had either positive media exposure or none, so we don’t see any negative media scores. For investors using Decissio, the media mentions score can inform research into firms that have been shortlisted for consideration. A positive media score could indicate that a prospect that looks uninspiring on paper may be worth looking into if it has managed to attract positive media attention. Conversely, a negative media score could signal that a promising startup is hiding something that the media has already picked up on.
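For illustration, one simple way to fold sentiment, quantity, and source quality into a single number looks like this. The weighting and damping are assumptions for the sketch, not our production model:

```python
from math import log1p

# Illustrative aggregation of media mentions; the weighting and damping are
# assumptions for this sketch, not our production scoring model.
def media_score(mentions):
    """mentions: list of (sentiment, source_quality) pairs.

    sentiment is in [-1, 1], source_quality in (0, 1]. Returns 0.0 for
    startups with no coverage at all.
    """
    if not mentions:
        return 0.0
    avg_weighted = sum(s * q for s, q in mentions) / len(mentions)
    # Damp the quantity effect so a flood of minor mentions cannot dominate.
    return avg_weighted * log1p(len(mentions))

print(media_score([(0.8, 1.0), (0.6, 0.5)]))  # two positive mentions -> ~0.60
print(media_score([]))                        # no coverage -> 0.0
```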

Taking a step back, within each of our metrics we can see that StartupYard’s applicant pool has data hovering around the median, but with significant groups of outliers that stand well apart from the other applicants. While the average applicant had a revenue per investment ratio of .44, the top 5% had a ratio of 2. In all of our scores we have tracked the notable few that outperform, whether in terms of effort, ROI, or publicity. On the flip side, we have tracked those at the opposite end when it comes to sloppiness in their communications (spelling mistakes, domain classification, completeness scores) or bad publicity. In the end these metrics will mean different things to each investor, but with Decissio they will be at investors’ fingertips.
