How to Build ML Products in a Start-up Scenario? [1/3]

manish patkar
4 min read · Aug 15, 2020


Welcome to my first-ever blog. In this series, I will try to detail the process of building products that involve the application of machine learning.

My first major assignment as a Product Manager was to help build a Resume Parsing service (it extracts key entities from a resume). Resume parsing is a very complex problem involving different deep-learning techniques, image processing, and a lot of heuristics! The complexity arises from the fact that there is no standard format, and users can end up writing resumes in their own unique way. All further discussion is drawn from the experience gained building this product.

Product Development of an ML-Heavy Product

Product development for an ML product differs from the usual software product lifecycle: it involves continuous model deployment, plus feedback through metrics that are used to improve the model. Also, the effectiveness, or confidence, of the solution is only understood as one does more iterations!

We can divide the entire process broadly into three phases:

  1. Plan Phase
  2. Execution Phase
  3. Production Phase

1. Plan

This is the most crucial step and is critical to the success of the product.

Always stick to first principles and ask "why" all the time.

Check whether the problem really exists, or whether a rule-based solution could already give good coverage. Understand customer pain points, figure out the top three things you can do to solve them, and estimate the impact of such a feature!
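To make the rule-based option concrete, here is a minimal sketch of what "good coverage without ML" can look like for resume parsing. The field names and regex patterns are my own illustrative assumptions, not the author's actual implementation:

```python
import re

# Illustrative patterns (assumptions, not the author's production rules):
# simple regexes already extract contact details from most resumes.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"(?:\+?\d{1,3}[\s-]?)?(?:\d[\s-]?){9,11}\d")

def extract_contacts(text: str) -> dict:
    """Return the first e-mail address and phone number found, if any."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
    }

sample = "Jane Doe | jane.doe@example.com | +91 98765 43210"
print(extract_contacts(sample))
```

If a handful of rules like these already cover, say, 80% of resumes, the ML investment can be scoped to the remaining hard cases.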

It is very easy to get biased by what ML/deep-learning algorithms can do, and one may end up over-promising outcomes, which can later result in a lot of disappointment for the end customer or the business. The following approach can help you streamline your plan and determine outcomes:

Analyze all possible Datapoints and build use-cases

Look for different data points and categorize what inferences can be drawn from each one, or from a combination of them. You can create a table that includes all these data points. It helps enormously if you understand some basic algorithms, even superficially (you can always take the help of your data scientists).

Typical data-points for resume parsing

Quickly Build a Vanilla Model and Benchmark Metrics

The best way to start is to use agile principles: quickly build a vanilla model and test your hypothesis. Do a quick iteration, look at the metrics, and come up with an action plan with your data scientist on how the metrics could be improved further.
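As a rough sketch of what such a vanilla model might be for resume parsing: tag each line with a section label using keyword rules, then score the output against a tiny hand-labelled sample. The keywords, labels, and sample lines below are all assumptions for illustration, not the author's actual model:

```python
# Hedged sketch of a "vanilla" baseline: keyword rules assign a
# section label to each resume line. All keywords/labels are
# illustrative assumptions.
SECTION_KEYWORDS = {
    "education":  ("b.tech", "bachelor", "university", "degree"),
    "experience": ("engineer", "manager", "worked", "company"),
    "skills":     ("python", "sql", "java", "excel"),
}

def tag_line(line: str) -> str:
    low = line.lower()
    for label, words in SECTION_KEYWORDS.items():
        if any(w in low for w in words):
            return label
    return "other"

# Tiny hand-labelled sample to benchmark the baseline against.
sample = [
    ("B.Tech in Computer Science, XYZ University", "education"),
    ("Software Engineer at Acme Corp", "experience"),
    ("Skills: Python, SQL", "skills"),
    ("Managed a team of 5", "experience"),   # baseline misses this one
    ("Hobbies: photography", "other"),
]
correct = sum(tag_line(line) == gold for line, gold in sample)
accuracy = correct / len(sample)
print(f"baseline accuracy: {accuracy:.2f}")
```

Even a toy baseline like this gives you a number to put on the table, and the misses (here, "Managed a team of 5") point directly at where a learned model must do better.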

These metrics can be your benchmark metrics, which help you and the business understand realistic expectations and the viability of the product. Some use cases require high-confidence metrics, while others can do with medium confidence.

For example, consider a banking use case where you are building applications around fraud detection. This requires a model with very high metrics, as every misclassification could lead to a large loss. On the other hand, for the above case of resume parsing, it is established that not every resume can be parsed exactly, and hence the metrics would not be on the higher side (though that is desirable).
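The usual way to express these benchmark metrics is precision, recall, and F1 computed from the parser's hits and misses. A minimal sketch, with made-up counts purely for illustration:

```python
# Hedged sketch: benchmark metrics from a confusion-style count of
# parser outputs. The counts below are invented for illustration.
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall, and F1 from true/false positives and
    false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# e.g. 90 entities extracted correctly, 10 spurious, 30 missed
p, r, f = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

A fraud-detection use case would push these numbers toward 1.0 before launch, while a resume parser might ship at a more moderate baseline and improve iteration by iteration.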

Build a Roadmap and Then Execute!

A product roadmap is a way to build certainty and a process to achieve the end objectives.

  • Based on the initial iteration and customer priorities, define short-term and long-term objectives. Break the objectives down into specific sub-goals and use techniques like KPIs to measure success.
  • Talk to all the key stakeholders and obtain their feedback on the product plan. Set clear expectations while getting their buy-in on the objectives and approach.
  • Sit with your data scientists and architects to build a detailed plan based on priorities. The idea is to build a basic MVP whose features can be scaled into more robust solutions.
  • The development process may not be straightforward: there could be many days where one does not see promising, stable results, or sees huge progress in a short span. Include some buffer time in your roadmap to accommodate this.

Set a baseline and improve from there! Plan the release so that you start with basic pre-built models, build a stabilized feature, measure the output, set this as the new baseline, and then move on to more complex solutions.

  • Look at what kind and scale of data is required, set up a process for data tagging, and ensure quality and on-time delivery for the data scientists to use.
  • Also, have a brief plan for the deployment strategy: how big the model will be, how quickly you need a response, etc. Ensure that the business is aligned with these plans!

Thanks for making it this far! Follow me for detailed articles on the next steps :)
