Machine Learning System Design Stage: Problem Navigation

Paul Deepakraj Retinraj
5 min read · Jun 17, 2023

Navigating Machine Learning Problems: Organizing, Connecting, and Maximizing Business Outcomes

This series introduces the fundamental concepts for building machine learning systems.

Earlier in this series:

Machine Learning System Design: Template

Visualize and Organize the Problem and Solution Space:
To ensure a comprehensive understanding of the problem, it is essential to visualize and organize the problem and solution space. This involves creating diagrams, flowcharts, or mind maps that capture the various components and relationships within the system. By visually representing the problem space, it becomes easier to identify potential challenges, dependencies, and opportunities for improvement. Additionally, organizing the problem space helps in breaking down complex problems into manageable subproblems, enabling a systematic approach to designing the ML system.

The Problem Statement and Scope:
Defining a clear problem statement is fundamental to the success of any ML project. It involves understanding the specific objectives, requirements, and constraints associated with the problem. By clearly articulating the problem statement, stakeholders can align their expectations, and the ML team can focus on developing solutions that address the defined problem scope. Defining the scope helps set boundaries, enabling efficient resource allocation and mitigating potential scope creep.

Success Criteria:
Establishing success criteria is crucial for evaluating the effectiveness of the ML system. These criteria should align with the overall business goals and be measurable, allowing for objective assessment. Success criteria can include improvements in key performance indicators (KPIs), customer satisfaction ratings, revenue growth, or any other relevant metrics that reflect the desired outcomes of the ML system. Defining success criteria upfront provides a benchmark for monitoring progress and making data-driven decisions.
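One way to keep success criteria measurable is to record each one as a metric with a baseline and a target, then check observed values against it. The following is a minimal sketch under stated assumptions: the `SuccessCriterion` class, the metric names, and every number are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriterion:
    """A measurable target for one business metric (hypothetical structure)."""
    metric: str
    baseline: float                # value before the ML system launches
    target: float                  # value the project commits to reaching
    higher_is_better: bool = True  # False for metrics such as latency

    def is_met(self, observed: float) -> bool:
        """Return True when the observed value reaches the target."""
        if self.higher_is_better:
            return observed >= self.target
        return observed <= self.target

    def relative_change(self, observed: float) -> float:
        """Relative change of the observed value against the baseline."""
        return (observed - self.baseline) / self.baseline


# Hypothetical criteria for an e-commerce recommender.
criteria = [
    SuccessCriterion("conversion_rate", baseline=0.020, target=0.022),
    SuccessCriterion("p99_latency_ms", baseline=250.0, target=200.0,
                     higher_is_better=False),
]

# Hypothetical values observed after launch.
observed = {"conversion_rate": 0.023, "p99_latency_ms": 210.0}

report = {c.metric: c.is_met(observed[c.metric]) for c in criteria}
print(report)
```

Keeping the baseline next to the target makes it harder to report an improvement without stating what it is measured against.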

Key Performance Indicators (KPIs):
Identifying and defining appropriate key performance indicators (KPIs) is essential to measure the effectiveness and impact of the ML system. KPIs should align with the defined success criteria and reflect the specific objectives of the ML project. For example, in an e-commerce recommendation system, KPIs might include conversion rates, average order value, or customer retention rates. By regularly monitoring and analyzing these metrics, stakeholders can gain insights into the system’s performance and make informed decisions for further optimization.
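To make the e-commerce example concrete, here is a small sketch of how conversion rate and average order value could be computed from session records. The `Session` layout and the sample data are assumptions made for this illustration; a real system would read these values from its own event logs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """One shopping session (hypothetical record layout)."""
    user_id: str
    order_value: float  # 0.0 when the session ended without a purchase


def conversion_rate(sessions: list[Session]) -> float:
    """Fraction of sessions that ended in a purchase."""
    if not sessions:
        return 0.0
    purchases = sum(1 for s in sessions if s.order_value > 0)
    return purchases / len(sessions)


def average_order_value(sessions: list[Session]) -> float:
    """Mean order value over purchasing sessions only."""
    values = [s.order_value for s in sessions if s.order_value > 0]
    return sum(values) / len(values) if values else 0.0


# Made-up sample: two of four sessions end in a purchase.
sessions = [
    Session("u1", 40.0),
    Session("u2", 0.0),
    Session("u3", 60.0),
    Session("u1", 0.0),
]

print(conversion_rate(sessions))      # 0.5
print(average_order_value(sessions))  # 50.0
```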

Non-Functional Requirements (NFRs):
Non-functional requirements (NFRs) define the operational characteristics of the ML system. They encompass factors such as the system’s responsiveness, scalability, security, and availability. NFRs help shape the design decisions and influence the choice of technologies, architectures, and infrastructure. For instance, determining whether real-time or batch processing is required depends on the specific needs of the business. Considering factors like data staleness, scale, and SLAs aids in designing a system that meets the required performance and reliability standards.
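The questions about staleness, scale, and SLAs can be turned into quick back-of-the-envelope checks. The sketch below is one possible way to do that; the function names, the 30% headroom default, and the example numbers are all assumptions for illustration, not recommendations.

```python
import math


def choose_serving_mode(max_staleness_s: float, batch_interval_s: float) -> str:
    """Pick batch when precomputed results stay fresh enough, else real-time."""
    return "batch" if batch_interval_s <= max_staleness_s else "real-time"


def replicas_needed(peak_rps: float, rps_per_replica: float,
                    headroom: float = 0.3) -> int:
    """Replicas required to serve peak traffic with spare capacity."""
    return math.ceil(peak_rps * (1 + headroom) / rps_per_replica)


def allowed_downtime_minutes(availability: float, days: int = 30) -> float:
    """Downtime budget implied by an availability SLA over a period."""
    return (1 - availability) * days * 24 * 60


# Content must be at most 5 seconds old, but the batch job runs hourly.
print(choose_serving_mode(max_staleness_s=5, batch_interval_s=3600))

# Daily freshness is acceptable, so an hourly batch job is enough.
print(choose_serving_mode(max_staleness_s=86_400, batch_interval_s=3600))

# Hypothetical peak of 2,000 requests/sec, 150 requests/sec per replica.
print(replicas_needed(peak_rps=2000, rps_per_replica=150))

# A 99.9% availability SLA over 30 days leaves roughly 43 minutes of downtime.
print(round(allowed_downtime_minutes(0.999), 1))
```

Rough numbers like these are usually enough to decide between a batch and a real-time design before any model work starts.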

Solution Space:
Understanding the solution space involves identifying potential approaches, algorithms, and techniques that can be applied to address the defined problem. This includes considering existing research, industry best practices, and available tools and frameworks. It is important to document assumptions made during the solution design process, as they can impact the system’s performance and feasibility.

Multiple Language Support:
If the ML system must support multiple languages, consider the implications during problem navigation. This may involve language-specific preprocessing techniques, models trained on multilingual data, or integration with translation services. Addressing language-specific challenges and ensuring robustness across different languages are crucial for delivering an inclusive and effective ML solution.

Connect the Business Context and ML Decisions:
Effective ML system design requires connecting the business context and needs to the decisions made throughout the process. This involves understanding the business goals, user requirements, and constraints and aligning them with the capabilities and limitations of ML algorithms and techniques. By establishing this connection, ML practitioners can make informed decisions about feature engineering, model selection, data preprocessing, and other crucial aspects of the ML pipeline.

Convert the Business Problem to a Machine Learning Problem:
Converting a business problem into a machine learning problem involves mapping the business context to an ML framework. This requires identifying relevant data sources, defining target variables or labels, and formulating a clear ML problem statement. By clearly defining the ML problem, practitioners can choose appropriate algorithms, design effective training and evaluation strategies, and ultimately provide valuable insights and solutions to the business.
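As an illustration of defining labels, suppose the business goal is "increase purchases driven by recommendations". One possible framing is binary classification, where each recommendation shown to a user is labeled 1 if that user buys the item within a fixed window. The sketch below assumes this framing; the `Impression` and `Purchase` records and the 24-hour window are hypothetical choices.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Impression:
    """A recommended item shown to a user (hypothetical record)."""
    user_id: str
    item_id: str
    shown_at: datetime


@dataclass(frozen=True)
class Purchase:
    """A completed purchase (hypothetical record)."""
    user_id: str
    item_id: str
    bought_at: datetime


def label_impressions(impressions: list[Impression],
                      purchases: list[Purchase],
                      window: timedelta = timedelta(hours=24)) -> list[int]:
    """Label an impression 1 if the same user bought the item within the window."""
    labels = []
    for imp in impressions:
        converted = any(
            p.user_id == imp.user_id
            and p.item_id == imp.item_id
            and imp.shown_at <= p.bought_at <= imp.shown_at + window
            for p in purchases
        )
        labels.append(1 if converted else 0)
    return labels


t0 = datetime(2023, 6, 1, 12, 0)
impressions = [
    Impression("u1", "a", t0),
    Impression("u1", "b", t0),
    Impression("u2", "a", t0),
]
purchases = [
    Purchase("u1", "a", t0 + timedelta(hours=2)),  # inside the window
    Purchase("u2", "a", t0 + timedelta(days=3)),   # too late to count
]

print(label_impressions(impressions, purchases))  # [1, 0, 0]
```

The attribution window is itself a business decision: a longer window captures slower purchases but credits the recommender for sales it may not have caused.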

Maximize Business Outcomes:
At the heart of ML system design is the goal of maximizing business outcomes. This involves continuously evaluating the impact of the ML system on the defined success criteria and making iterative improvements. By monitoring the system’s performance, gathering feedback, and incorporating insights, practitioners can fine-tune the ML models, optimize decision-making processes, and drive tangible business value. Regularly revisiting and aligning the ML system with the evolving business needs ensures its ongoing relevance and effectiveness.

Conclusion:
Navigating machine learning problems requires a comprehensive approach that encompasses visualizing and organizing the problem and solution space, connecting business context to ML decisions, and maximizing business outcomes. By understanding the problem scope, defining success criteria, establishing relevant metrics, and aligning ML decisions with business needs, practitioners can design effective ML systems that drive tangible results. With a focus on continuous improvement and evaluation, businesses can harness the power of machine learning to unlock new opportunities, improve decision-making, and achieve their objectives.

References:

- Visualize and organize the entire problem and solution space
- The problem statement
- The scope of the problem
- Success criteria
- Key Performance Indicators (KPIs)
- Establish baselines
- NFRs:
  - Real-time or batch
    - How quickly do items get stale? (e.g., do we need content that is only a few seconds old?)
  - The scale of the system:
    - How many users, items, etc.
    - Peak number of requests per second
    - SLA
- Solution space
  - Assumptions
    - Multiple language support
- Connect the business context and needs to the ML decisions
- Convert the business problem to a machine learning problem
- Maximize business outcomes

Metrics:

- Online (north star metric to use):
  - User engagement
    - Positive: time spent on view, like, comment, retweet, share, etc.
    - Negative: hide a tweet, report a user/post/tweet as inappropriate
    - Weighted engagement, e.g., comments carry more weight than likes
- Offline:
  - AUC, log loss, precision, recall, and F1-score
  - For a search ranking problem, use NDCG
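A few of the metrics listed above can be sketched in plain Python. The engagement weights are hypothetical and only show the idea that comments count more than likes while negative actions subtract from the score. The NDCG function uses the linear-gain form (relevance divided by log2 of the position); some libraries and papers use an exponential gain instead, so treat this as one variant rather than the definitive formula.

```python
import math

# Hypothetical weights: comments count more than likes, negative actions subtract.
ENGAGEMENT_WEIGHTS = {
    "like": 1.0,
    "comment": 3.0,
    "share": 4.0,
    "hide": -2.0,
    "report": -5.0,
}


def weighted_engagement(action_counts: dict[str, int]) -> float:
    """Sum action counts scaled by their weights; unknown actions count as 0."""
    return sum(ENGAGEMENT_WEIGHTS.get(action, 0.0) * count
               for action, count in action_counts.items())


def precision_recall_f1(y_true: list[int],
                        y_pred: list[int]) -> tuple[float, float, float]:
    """Precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: list[float], k: int) -> float:
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Online: 10 likes, 2 comments, 1 hide -> 10*1 + 2*3 - 1*2 = 14.0
print(weighted_engagement({"like": 10, "comment": 2, "hide": 1}))

# Offline classification metrics on made-up labels and predictions.
print(precision_recall_f1([1, 0, 1, 1, 0], [1, 1, 1, 0, 0]))

# Offline ranking: relevance of results in the order the system returned them.
print(ndcg_at_k([3, 2, 1], k=3))  # ideal ordering, so 1.0
print(ndcg_at_k([1, 2, 3], k=3))  # worst ordering of the same items, below 1.0
```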

Further in this series:

Machine Learning System Design Stage: Data Preparation

Machine Learning System Design Stage: Feature Engineering

Machine Learning System Design Stage: Modelling

Machine Learning System Design Stage: Model Evaluation

Machine Learning System Design Stage: Deployment

Machine Learning System Design Stage: Monitoring and Observability


Paul Deepakraj Retinraj

Senior Principal SE / Software Architect at Oracle Inc - Machine Learning, Deep Learning and Artificial Intelligence. https://www.linkedin.com/in/pauldeepakraj/