Data science for startups

Real-world examples of how startups with a data-driven approach can use data science.

--

Are data scientists valuable to startups with a data-driven approach, even when the product is not related to Machine Learning (ML) or Artificial Intelligence (AI)?

The answer is YES! I will use some examples from practice to demonstrate this, but first, let’s refresh our memory on what startups and MVPs are.

Startups are new companies that want to validate their innovative ideas and strive to have fast growth.

MVP means “Minimum Viable Product”, a bare-bones version of a product, required to achieve a proof of concept — the validation of a product value hypothesis. MVP is often used in creating new software that will be beta tested and later upgraded with extra features. The whole point of an MVP is to collect feedback and validate a specific strategy.

Key startup areas are:

  • Team Formation,
  • Lean strategy,
  • Fundraising,
  • Product Discovery and Design,
  • Evolutionary Architecture + Continuous Delivery, and
  • Growth.

*Reference: Startups 101 with Faris, Faris Začina, co-CEO @ MOP

In this article, we focus on two of these areas: Product Discovery and Design, and Evolutionary Architecture + Continuous Delivery.

I think we warmed up nicely for the following:

How can data scientists contribute to MVP development?

I’ll explain it through the example of a gift-giving app.

We need to be data-driven, but we don’t need an ML/AI approach for the MVP -> Still, we, data scientists, can bring value!

OK, but how?

The image below shows the data scientists’ contribution to the startup areas of interest for the gift giving app’s MVP:

  • Research and Analysis,
  • Design, and
  • Development.
Application of data science in different phases of MVP.

Notice that data scientists should be outstanding team players -> we work with team members performing various team roles.

Exciting, right? So, let’s start with the Research and Analysis phase of the MVP.

Here, we seek the answer to the question:

Who are our potential users, and who should we choose for the user interviews?

Research and Analysis: Who are our potential users, and who to choose for the interview?

We don’t have any user data (or any other data), so we do web scraping (legally, of course :D) and collect users’ first and last names, photos, and locations.

Using the Python libraries py-agender and ethnicolr, we can estimate the user’s age and gender from the photo and the user’s ethnicity/race from the last name.

Do you recognize Monica Hall from the Silicon Valley series?

If you still haven’t watched it, you should. You will find out more about startups and why they are so awesome. :)

Let’s imagine Monica is one of our users, and we want to estimate her age and gender from her photo. Besides the character’s last name, Hall, we also use the actress’s real last name, Crew, to see whether we get the same result for her ethnicity/race. We use the ethnicolr and py-agender Python libraries to get this information. ethnicolr contains US statistical data, and py-agender uses TensorFlow and OpenCV in the background.
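Here is a rough sketch of what these two calls could look like. It assumes py-agender exposes `PyAgender().detect_genders_ages(image)` and ethnicolr exposes `census_ln(df, column, year)`, as their READMEs describe. The photo path, the column name, the census year 2010, and the "score above 0.5 means female" reading of py-agender's gender score are assumptions for illustration, so check them against the versions you install.

```python
def gender_label(score):
    """Map a 0..1 gender score to a label (assumption: above 0.5 means female)."""
    return "female" if score > 0.5 else "male"


def estimate_age_gender(photo_path):
    """Return one {'age', 'gender'} dict per face detected in the photo."""
    # Heavy dependencies are imported lazily, only when the function is called.
    import cv2
    from pyagender import PyAgender

    image = cv2.imread(photo_path)
    if image is None:
        raise FileNotFoundError(photo_path)

    faces = PyAgender().detect_genders_ages(image)
    return [
        {"age": int(round(float(face["age"]))), "gender": gender_label(face["gender"])}
        for face in faces
    ]


def lookup_last_names(last_names):
    """Append US census race/ethnicity percentages for each last name."""
    import pandas as pd
    from ethnicolr import census_ln

    df = pd.DataFrame({"last_name": last_names})
    # Assumed signature: census_ln(dataframe, name_column, census_year)
    return census_ln(df, "last_name", 2010)
```

With both libraries installed, `estimate_age_gender("monica.jpg")` (a hypothetical file name) and `lookup_last_names(["Hall", "Crew"])` would give the raw material for the enriched dataset.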

Finally, we get the results: Monica Hall / Amanda Crew could be a 32-year-old Caucasian female. This result is quite close, since the photo is a few years old and the actress is currently 35 years old.

*Note: The methods above, based on the Python libraries py-agender and ethnicolr, are not suitable for production. We use them only to enrich the data when researching and analyzing potential users.

We now have an enriched dataset containing each potential user’s first and last name, photo, age, gender, location, and ethnicity, so we can make plots and compute statistics to find insights. Besides supporting the Team Lead and Product Lead, we can help Designers define User Personas.
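To make this more concrete, here is a minimal sketch of such statistics using only the Python standard library. The records, the age bands, and the "one interviewee per segment" rule are all hypothetical and invented for illustration; they are not real scraped data or the exact method used for the app.

```python
from collections import Counter
from statistics import median

# Hypothetical enriched records (values invented for illustration only).
users = [
    {"id": 1, "age": 22, "gender": "female", "location": "City A"},
    {"id": 2, "age": 31, "gender": "female", "location": "City A"},
    {"id": 3, "age": 35, "gender": "male", "location": "City B"},
    {"id": 4, "age": 47, "gender": "male", "location": "City A"},
    {"id": 5, "age": 28, "gender": "female", "location": "City B"},
    {"id": 6, "age": 52, "gender": "female", "location": "City B"},
]


def age_band(age):
    """Bucket an age into a coarse band (the cut-offs are an assumption)."""
    if age < 25:
        return "under 25"
    if age < 40:
        return "25-39"
    return "40+"


def summarize(records):
    """Basic statistics that can feed the User Persona discussion."""
    return {
        "median_age": median(r["age"] for r in records),
        "by_gender": Counter(r["gender"] for r in records),
        "by_age_band": Counter(age_band(r["age"]) for r in records),
        "by_location": Counter(r["location"] for r in records),
    }


def pick_interviewees(records, per_segment=1):
    """Take up to `per_segment` users from each (age band, gender) segment."""
    picked, taken = [], Counter()
    for r in records:
        segment = (age_band(r["age"]), r["gender"])
        if taken[segment] < per_segment:
            picked.append(r)
            taken[segment] += 1
    return picked


summary = summarize(users)
interviewees = pick_interviewees(users)
```

Picking one person per segment is just one simple way to make sure the user interviews cover every group that shows up in the data.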

Our app contains some products — gifts, so our next task is to make a Recommender System for the MVP that will recommend gifts to the users.

The main challenge for the MVP’s Recommender System is the cold start problem: we don’t have data!

Our users are the gift buyers, but the recommended gifts are meant for their contacts (family, friends, and so on). Therefore, we need the contacts’ data. So, how can we collect this data?

We can use two approaches:

  1. Generate the data ourselves using some libraries, and
  2. Have the contacts fill in a questionnaire.
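For the first approach, a minimal sketch could look like the one below. It uses only Python's built-in `random` module instead of a dedicated data-generation library, and the attribute names and value lists are hypothetical placeholders, not the real attributes of the app.

```python
import random

# Hypothetical attribute vocabularies (assumptions for illustration).
RELATIONS = ["family", "friend", "colleague"]
INTERESTS = ["cooking", "sports", "music", "books", "travel", "gaming"]


def generate_contacts(n, seed=42):
    """Generate n synthetic contacts; the seed makes the output reproducible."""
    rng = random.Random(seed)
    contacts = []
    for i in range(n):
        how_many = rng.randint(1, 3)  # each contact gets 1 to 3 interests
        contacts.append(
            {
                "id": i,
                "relation": rng.choice(RELATIONS),
                "age": rng.randint(18, 70),
                "interests": sorted(rng.sample(INTERESTS, k=how_many)),
            }
        )
    return contacts


contacts = generate_contacts(5)
```

Generated data like this is useful for building and testing the pipeline, but it says nothing about real preferences, which is why the questionnaire still matters.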

We can collect data from the mentioned contacts using the questionnaire. We help designers design it with insights from the Research and Analysis phase.

After the data collection, we need to build this Recommender System somehow. We can’t use classic ML algorithms and approaches, since they are ineffective when the dataset is tiny. So, data scientists need to know the product and be creative. :) We have the contacts' attributes, and the products/gifts also have specific characteristics. We can compare these attributes using string matching algorithms.
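As a minimal sketch of this idea, the snippet below scores gifts against a contact's interests with `difflib.SequenceMatcher` from the Python standard library. The gift catalogue, the tag names, and the scoring rule (average of the best tag match per interest) are assumptions chosen for illustration, not the exact method used in the app.

```python
from difflib import SequenceMatcher

# Hypothetical gift catalogue (names and tags invented for illustration).
gifts = [
    {"name": "Cookbook", "tags": ["cooking", "books"]},
    {"name": "Running shoes", "tags": ["sport", "running"]},
    {"name": "Concert ticket", "tags": ["music", "events"]},
]


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score_gift(contact_interests, gift_tags):
    """Average, over the contact's interests, of the best-matching gift tag."""
    if not contact_interests or not gift_tags:
        return 0.0
    best = [max(similarity(i, t) for t in gift_tags) for i in contact_interests]
    return sum(best) / len(best)


def recommend(contact_interests, catalogue, top_n=3):
    """Return the names of the top_n gifts, best match first."""
    ranked = sorted(
        catalogue,
        key=lambda g: score_gift(contact_interests, g["tags"]),
        reverse=True,
    )
    return [g["name"] for g in ranked[:top_n]]


top_for_cook = recommend(["cooking"], gifts, top_n=1)
top_for_athlete = recommend(["sports"], gifts, top_n=1)
```

Fuzzy matching lets "sports" from a questionnaire answer still match a gift tagged "sport", which helps when the attribute vocabularies were not designed together.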

You can find out more about Recommender Systems for early-stage products in my Medium article.

*Note: This kind of Recommender System is probably not suitable for the next product versions.

In conclusion, let’s summarize the negative and positive sides of being a data scientist in startups.

Negative sides:

  1. Cold start problem in the early stages of product development;
  2. The issue of data scientist specialization:
       • Changing area, for example: Natural Language Processing -> Computer Vision;
       • Changing roles, for example: Data Analyst -> Deep Learning Engineer.

Positive sides:

  1. A data-driven approach is essential for startups, so a data scientist is an asset to every startup.
  2. Applying unique skills makes a product more valuable, for example, using Deep Learning to classify text and images.
  3. Data science can be used at various stages of product development without the product focus being directly related to Machine and Deep Learning.
  4. A chance to get an insight into the business side, apart from the technical one.
  5. There is a possibility of fast and significant progress and learning.

Notice that the positive sides outweigh the negative ones, and that is how it is in practice. This article was written by a happy data scientist who works in startups and has brought value so far. ;)

Business photo created by rawpixel.com
