#6: Engineering Considerations That Product Managers Should Watch Out For

Yael Gavish
Jul 25, 2017

This is part 6 of the 6-part tutorial, The Step-By-Step PM Guide to Building Machine Learning Based Products.

While it’s largely your engineering team that is responsible for designing a solid system architecture, business requirements can impact an ML system in ways that could turn a great system into a bad one. The more you as a PM have the right discussions with the engineering team upfront, the higher the likelihood that you’ll end up with a system that fits your business needs. While the following list is far from exhaustive, it should give you a sense of the areas worth probing further.

  • Real time requirements. Can the results of your algorithms be calculated in advance, or do they need to be calculated in real time? As you get new data and/or the user enters specific information or performs certain actions, do your models need to produce updated results that reflect the new data immediately? The architecture of a real time system is vastly different from that of a system that can process data offline.
  • Data and model dependencies. Your system may be composed of multiple models where one’s output is the other’s input. When the output of one model changes, do any other models need to be re-run? When data is added or modified, which models need to be re-run (or sometimes even re-trained)? How quickly should the updates happen, and what are acceptable SLAs?
  • Data collection frequency. The rate at which data is collected and accumulated affects both the design of your pipelines and the storage method you would choose. It is determined by how frequently the data actually changes, how critical it is to know that it changed and how expensive it is to acquire the data (in terms of purchasing cost if you need to pay for it, as well as retrieval, processing and storage cost). For example, if you’re collecting data about earthquakes, the data may not change very often, but when it does change you want to know about it immediately; if you’re collecting data about openings and closings of restaurants, the data may change more frequently than earthquake data, but discovering a change a day late is not as critical to the overall usefulness of your product.
  • Data collection methods. How is the data collected: in batch or streamed on a continuous basis? Is it being pushed or pulled? Real time requirements complicate this further. Your system may need to support multiple data collection or upload methods (APIs, file uploads, etc.). It needs to be modular enough to support all the necessary methods and to separate data ingestion from data processing (which is a good software engineering principle regardless).
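
To make the "data and model dependencies" question more concrete, here is a minimal Python sketch of how a team might work out which models to re-run when one data source changes. Everything in it is an assumption made for illustration: the data source and model names, the `DOWNSTREAM` map and the `models_to_rerun` helper are hypothetical, and the sketch assumes the dependencies contain no cycles. A real system would more likely rely on a workflow scheduler than on hand-written code like this.

```python
from collections import deque

# Hypothetical dependency map: each key feeds the models listed in its value.
# All names here are invented for illustration.
DOWNSTREAM = {
    "restaurant_listings": ["cuisine_classifier"],
    "user_reviews": ["sentiment_model"],
    "cuisine_classifier": ["recommendation_ranker"],
    "sentiment_model": ["recommendation_ranker"],
    "recommendation_ranker": [],
}


def models_to_rerun(changed, downstream=DOWNSTREAM):
    """List everything affected by a change to `changed`, ordered so that
    each model comes after the affected models it depends on.
    Assumes the dependency map has no cycles."""
    # Step 1: find every model reachable from the changed node.
    affected = set()
    queue = deque(downstream.get(changed, []))
    while queue:
        node = queue.popleft()
        if node not in affected:
            affected.add(node)
            queue.extend(downstream.get(node, []))

    # Step 2: order the affected models (Kahn's topological sort).
    # Count, for each affected model, how many affected models feed it.
    pending_inputs = {node: 0 for node in affected}
    for node in affected:
        for child in downstream.get(node, []):
            pending_inputs[child] += 1

    # Models with no affected inputs can run first (sorted for stable output).
    ready = deque(sorted(n for n, count in pending_inputs.items() if count == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in downstream.get(node, []):
            pending_inputs[child] -= 1
            if pending_inputs[child] == 0:
                ready.append(child)
    return order


if __name__ == "__main__":
    print(models_to_rerun("user_reviews"))
```

In this made-up setup, a change to `user_reviews` means re-running `sentiment_model` first and `recommendation_ranker` second, while `cuisine_classifier` is left alone. Putting the dependencies in a form like this early on also gives you and the engineering team something specific to attach SLA discussions to.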

That’s it! You should now have enough knowledge to start diving into ML yourself. You will certainly develop more knowledge as you think through your own business problems and how ML applies to them, but hopefully you have some tools to make the journey easier. Even if ML is not the top priority for your organization, understanding its power and how it works can get you thinking in that direction until you identify a compelling enough use case to make the jump.

Let me know what you think: was this tutorial useful? Does this jibe with your experience? What else would you like to know? Leave your feedback in the comments!
