InReach Ventures' Tech Stack

Ben Smith
Jul 7, 2017

=== Edit ===

I feel it’s prudent to point out that very little of this post is still accurate. DIG has come a long way in the past ~6 years and its tech stack has evolved dramatically.

===

Christoph’s post a while ago on Point Nine’s Tech Stack was a great read. I love it when people pull back the curtain to demystify what can be quite an aloof industry.

At InReach we build and use software to power everything we do. From discovery to executing an investment: it’s all software. Inspired by Christoph, and in the spirit of openness, I thought I would share our tech stack. Maybe, just maybe, it will encourage others to do the same…

It all starts at Discovery

Discovery means finding the traces of an interesting company, whether in some dark backwater of the Internet or through our network of partners, and building up a snapshot of where that company and its founders are and how they’re trending.

This is what DIG, our dealflow platform, is constantly doing: searching for these traces and signals across a whole host of APIs and websites, and stitching them together.

The aggregation layer is built in Java 8, plugged together with Dropwizard and hosted on AWS Elastic Beanstalk. The individual pieces of data are stored in DynamoDB, and a Lambda function listening to the DynamoDB streams puts them onto SQS queues. As new data points are received from the queues, they update the canonical Company and Person records, which are then indexed in Elasticsearch.
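To make that flow a little more concrete, here is a minimal Python sketch of the two middle steps: a function (the kind of thing a Lambda handler would call) that turns DynamoDB Streams records into queue messages, and a consumer-side merge into a canonical company record. Our real system is Java; the `send` callback stands in for an SQS `SendMessage` call, and the data-point shape (`companyId`, `field`, `value`) and the last-write-wins merge are hypothetical.

```python
import json
from typing import Any, Callable, Dict


def to_plain(attr: Dict[str, Any]) -> Any:
    """Convert a DynamoDB-typed attribute such as {"S": "x"} or {"N": "3"} to a
    plain Python value. Only the S, N and BOOL types are handled in this sketch."""
    type_tag, value = next(iter(attr.items()))
    if type_tag == "N":
        # DynamoDB sends numbers as strings.
        return float(value) if any(c in value for c in ".eE") else int(value)
    if type_tag in ("S", "BOOL"):
        return value
    raise ValueError(f"unsupported DynamoDB type: {type_tag}")


def forward_stream_records(event: Dict[str, Any], send: Callable[[str], None]) -> int:
    """Forward each inserted or modified item in a DynamoDB Streams event to a queue.

    `send` stands in for an SQS SendMessage call (hypothetical, injected so the
    sketch runs without AWS). Returns the number of messages forwarded."""
    forwarded = 0
    for record in event.get("Records", []):
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue  # skip REMOVE events
        image = record["dynamodb"].get("NewImage", {})
        data_point = {name: to_plain(attr) for name, attr in image.items()}
        send(json.dumps(data_point, sort_keys=True))
        forwarded += 1
    return forwarded


def apply_data_point(canonical: Dict[str, Dict[str, Any]],
                     data_point: Dict[str, Any]) -> Dict[str, Any]:
    """Merge one data point into the canonical company record (last write wins)."""
    company = canonical.setdefault(data_point["companyId"], {})
    company[data_point["field"]] = data_point["value"]
    return company
```

A real Lambda would wrap `forward_stream_records` in a `handler(event, context)` function and pass in something that actually sends to the queue; injecting `send` just keeps the sketch testable.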

With so many pieces of data flowing through the system, it made sense to model them as reactive streams (Observables with RxJava, in our case). We also use Hystrix, through Tenacity, to wrap all the external dependencies in circuit breakers: the system needs to keep running even when the vast majority of the external systems it depends on go down.
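Hystrix handles circuit breaking for us in Java. The sketch below is not its API, just a minimal Python illustration of the pattern it implements: after a number of consecutive failures the breaker opens and calls fail fast (or return a fallback) until a cool-down passes, then a single trial call decides whether to close it again. The threshold and timeout values are placeholders.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """After `failure_threshold` consecutive failures the circuit opens and calls
    fail fast until `reset_timeout` seconds have passed; the next call is then let
    through as a trial (half-open) and either closes or re-opens the circuit."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T],
             fallback: Optional[Callable[[], T]] = None) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast without touching the external system.
                if fallback is not None:
                    return fallback()
                raise CircuitOpenError("circuit open")
            # Cool-down elapsed: half-open, let this call through as a trial.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            if fallback is not None:
                return fallback()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Injecting the clock is just so the behaviour can be exercised without waiting for real time to pass.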

Predicting the Probability

As company data comes in, it is scored by our Machine Learning component. First, it classifies whether a company is ‘spam’ or not, i.e. whether it is a tech startup. Once a company is determined not to be spam (a classification that is now very accurate), the more interesting scoring happens: we predict the probability that we would be interested in talking to this company about an investment opportunity. Not whether we would invest, but whether we want to go deeper and find out more.
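A minimal sketch of that two-stage scoring, assuming plain logistic-regression models with hand-picked, hypothetical feature weights (our real models are trained in Spark, and their features and algorithms aren’t described here): the spam filter runs first, and only companies that pass it get an interest probability.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple

# A model here is (feature weights, bias); any values used with it are made up.
Model = Tuple[Dict[str, float], float]


def predict_proba(model: Model, features: Dict[str, float]) -> float:
    """Logistic regression: sigmoid(bias + sum of weight * feature value)."""
    weights, bias = model
    z = bias + sum(w * features.get(name, 0.0) for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class Score:
    is_spam: bool
    p_interesting: float  # 0.0 when the company was filtered out as spam


def score_company(features: Dict[str, float], spam_model: Model,
                  interest_model: Model, spam_threshold: float = 0.5) -> Score:
    """Stage 1: is this a tech startup at all? Stage 2: do we want to talk to them?"""
    if predict_proba(spam_model, features) >= spam_threshold:
        return Score(is_spam=True, p_interesting=0.0)
    return Score(is_spam=False, p_interesting=predict_proba(interest_model, features))
```

Gating the second model behind the first means it only ever sees companies that look like tech startups, which mirrors how the decisions themselves are made.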

DIG’s ML cluster is powered by Apache Spark and written in Scala. We prototype and iterate on our algorithms in Jupyter notebooks, then migrate them to Scala once we’re happy with the results.

Deployed to Production

We continuously deploy the various components from master on GitHub to production via Codeship. Once they’re live, we monitor and alert through Papertrail (logs), Wavefront (metrics) and Pingdom (for when things shit the bed). We use Waffle to organise ourselves, but we could probably use GitHub Projects and be just as effective.

Users are authenticated and managed with Auth0. The front-end is a single-page React app written in ES6 and deployed to Netlify, a static-site hosting platform that lets you configure HTTP rewrite rules. We also use AWS Lambda and API Gateway to back various front-end functions (like computing stats).
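For the Lambda-backed front-end functions, a handler behind an API Gateway proxy integration returns an object with `statusCode`, `headers` and `body`. The sketch below assumes a hypothetical stats endpoint that receives a list of decisions in the request body (purely to keep it self-contained; a real one would more likely read from a data store) and returns counts.

```python
import json
from collections import Counter
from typing import Any, Dict, Optional


def stats_handler(event: Dict[str, Any], context: Optional[Any] = None) -> Dict[str, Any]:
    """Hypothetical stats endpoint behind an API Gateway proxy integration.

    Expects a JSON body like {"decisions": ["talk", "pass", ...]} and responds
    with the total and a count per decision."""
    body = json.loads(event.get("body") or "{}")
    counts = Counter(body.get("decisions", []))
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"total": sum(counts.values()), "byDecision": dict(counts)}),
    }
```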

Making Decisions

We are able to train our ML models because, from DIG’s first incarnation, we have been building a training set of decisions: is this company a tech startup and, if so, do we want to talk to them?
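Because each decision answers two questions, one record can feed both models. A sketch, assuming a hypothetical `Decision` record: every decision becomes a spam/not-spam label, while only tech startups with a recorded answer become examples for the interest model.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Example = Tuple[str, int]  # (company_id, label)


@dataclass
class Decision:
    company_id: str
    is_tech_startup: bool
    want_to_talk: Optional[bool] = None  # only answered for tech startups


def training_sets(decisions: List[Decision]) -> Tuple[List[Example], List[Example]]:
    """Split recorded decisions into labelled examples for the two models:
    label 1 = spam for the spam filter, label 1 = want to talk for the interest model."""
    spam = [(d.company_id, 0 if d.is_tech_startup else 1) for d in decisions]
    interest = [(d.company_id, int(d.want_to_talk))
                for d in decisions
                if d.is_tech_startup and d.want_to_talk is not None]
    return spam, interest
```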

If the decision is positive, this kicks off our get-in-contact / CRM process. We use emailhunter.io and Clearbit to try to work out the best email address to use. We send emails (written by Roberto) through Mailgun, then use Streak as the CRM to manage the process. Streak has a very complete (if nuanced) API; at this point, Streak is really the Gmail plugin for DIG.
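Services like Email Hunter and Clearbit do the heavy lifting of finding and verifying addresses. Purely to illustrate the problem they solve, here is a sketch that generates a few common corporate address patterns for a founder; the pattern list is an assumption, not how either service works, and it does no verification, which is the hard part.

```python
import re
from typing import List


def candidate_emails(first: str, last: str, domain: str) -> List[str]:
    """Return a few common corporate email patterns for a founder, deduplicated.

    The pattern list is an illustrative assumption; nothing here checks that an
    address exists. Characters outside a-z are dropped, so accented letters are lost."""
    first_n = re.sub(r"[^a-z]", "", first.lower())
    last_n = re.sub(r"[^a-z]", "", last.lower())
    if not first_n or not last_n:
        return []
    local_parts = [first_n, f"{first_n}.{last_n}", f"{first_n}{last_n}",
                   f"{first_n[0]}{last_n}", f"{first_n[0]}.{last_n}", last_n]
    candidates: List[str] = []
    for local_part in local_parts:
        address = f"{local_part}@{domain.lower()}"
        if address not in candidates:  # keep first occurrence, preserve order
            candidates.append(address)
    return candidates
```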

Collecting Data

When we proactively contact an entrepreneur, we ask them to fill out as much of https://funding.inreachventures.com as they can. This creates a uniform pipeline for us and speeds up the investment process, since we don’t have to spend the first call going over basic company info. The form is built with SurveyGizmo. My opinion of SurveyGizmo is not high.

Qualification

For each company in the process, we automatically create a Google Drive folder, which DIG fills with any files we are sent (automated through Streak). We also create a Google Doc to hold all the notes we take.

We’re currently using Zoom for all video calls. Video quality on calls with multiple people is better than its competitors’, which makes it worth the hassle of installing yet another video app.

Staying Connected

We use Slack for ongoing conversations with our portfolio entrepreneurs. This is where we have the most work to do. We are actively designing what portfolio management and community should look like in a next-generation VC.

Just the Start

We’ve come a long way but we’re still only scratching the surface of how software can revolutionise VC. There’s so much more we need to build and so many exciting products we want to try out to ultimately make investing as efficient and scalable as possible, putting informed decision making into the hands of Partners and getting investment and support to the next breakout European entrepreneurs.
