DATA ANALYTICS PIPELINE

Guide to Solution Design

Design Thinking for Data Scientists

Vinita Silaparasetty
Jan 21 · 7 min read

In this guide, we will use elements of the design thinking approach to help design our solution.

To improve products, we need to analyze and understand how users interact with products and services, and investigate the conditions in which they operate. The solution-generation process helps us produce ideas that reflect the genuine constraints and facets of that particular problem. Design Thinking helps us prototype and test our products and services so as to uncover new ways of improving the product, service or design itself. In short, Design Thinking provides a solution-based approach to solving problems.

It is an iterative process in which we seek to:

1. Research consumer needs

2. Pool experience from previous projects

3. Consider present and future conditions specific to the problem to be solved

4. Test the parameters of the problem, and

5. Test the practical application of alternative problem solutions.

Benefits of Design Thinking:

  • Acts as a one-stop reference and guide throughout the project for the various stakeholders involved in implementing the solution.
  • Maps the business requirements to the various aspects of the solution that will be built out.
  • Provides the functional outline and technical architecture for the solution.
  • Provides clarity to developers on what to build, to the testing team on what tests to run and to the client/customer on what to expect from the end product.
  • Helps come up with estimates for the cost, timeline and resource requirements of the project.
  • Acts as a baseline for change control.
  • Enables rapid iteration.
  • Gathers targeted feedback from relevant stakeholders, allowing a larger range of possible solutions to be considered in the selection process.
  • Helps avoid personal biases.
  • Prevents selection of the first idea when a better idea may come along down the road.
  • Includes ambiguous elements of the problem to reveal previously unknown parameters and uncover alternative strategies.

Solution Implementation Process

Phase 1: Planning

Step 1: Empathise

Identify the relevant stakeholders at the beginning, keep them in mind along the way, and conduct interviews with subject-matter experts.

This way you can meet the stakeholders’ needs.

Step 2: Identify the type of Analytics to be conducted

Depending on the problem, you may be asked to do one, all or a combination of the different types of analysis, which are as follows:

Predictive Analytics: It is a method of studying historical data to make a short-term prediction.
It involves

Descriptive Analytics: It involves exploratory data analysis and helps to summarise data to get a better understanding of it.

Diagnostic Analytics: It helps to get to the root of the problem that you are trying to solve.

Prescriptive Analytics: In industry, a data scientist conducts a prescriptive analysis to come up with actionable solutions for a business. In portfolio projects this step is entirely optional, but suggesting solutions to the problem you have chosen to work on can help impress employers.

Step 3: Identify the type of Machine Learning to be done

Supervised: When a model is trained using labelled data, it is called supervised machine learning. It is of two types:

  • Classification
  • Regression

Unsupervised: When a model is trained using unlabelled data, it is called unsupervised machine learning. Clustering is a common example.

Semi-Supervised Learning: These tasks combine the two approaches described above by using both labelled and unlabelled data. This is a great opportunity for those who cannot afford to label all of their data. Training on a small amount of labelled data together with unlabelled data can improve accuracy compared with using the labelled data alone.

Reinforcement Learning: When a model is trained using a method of reward and punishment to encourage it to provide the desired output, it is called reinforcement learning.

Inductive Learning: From the perspective of inductive learning, we are given a set of input samples and the corresponding desired outputs. The problem is to estimate the underlying function, i.e., to estimate the output for new samples in the future.
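As a minimal sketch of inductive learning, the example below estimates a function from a handful of input-output samples and then uses it on a new input. The straight-line model, the least-squares fit and the sample values are all assumptions chosen for illustration; real problems usually call for a richer model.

```python
from statistics import mean


def fit_line(xs, ys):
    """Estimate f(x) = slope * x + intercept from (x, y) samples
    using ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply the estimated function to a new sample."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training samples that follow y = 2x + 1
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]

model = fit_line(train_x, train_y)
new_prediction = predict(model, 10.0)  # output for an unseen input
```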

Step 4: Determine End Use

Analytical solutions could be used for:

1. Reporting for insights: use historical data to understand performance, identify patterns and provide a basis for future planning.

2. Forecasting: use past data and its analysis to forecast the future. It is time related.

3. Predictive Analytics: use past data and its analysis to predict an outcome. It is not time related.

4. Optimization: use the historical data analysis and our predictive models to generate strategies that make optimal use of resources given the constraints.

Step 5: Ideate

Think of all possible solutions to the problem; ignore feasibility for now.

Step 6: Identify Limitations

The best solution in theory may not be the best solution in practice. This is due to the constraints that real world problems pose. Identifying what is

Always take into account:

Resources: Data availability, system specs, etc.

How the solution will be implemented

Time constraints

Step 7: Create a road map

  • Identify Milestones
  • Set expectations for each
  • Set Deadlines for each

Phase 2: Model Design

The software design process can be perceived as a series of well-defined steps. Though it varies according to the design approach (function oriented or object oriented), it may involve the following steps:

The class hierarchy and the relations among the classes are defined.

The application framework is defined.

Step 1: Reuse or Invent

Determine if you will be using an existing model and tweaking it or if you need to build your model from scratch.

Top Down Design

Top-down design is more suitable when the software solution needs to be designed from scratch and specific details are unknown.

A system is composed of more than one sub-system, and each sub-system contains a number of components, which can in turn be divided into further sub-systems.

Top-down design takes the whole software system as one entity and then decomposes it to achieve more than one sub-system or component based on some characteristics. Each sub-system or component is then treated as a system and decomposed further. This process keeps on running until the lowest level of system in the top-down hierarchy is achieved.

Top-down design starts with a generalized model of the system and keeps on defining its more specific parts. When all components are composed, the whole system comes into existence.
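A top-down decomposition can be sketched in code by writing the top-level entity first and then filling in each sub-system. The pipeline below is a hypothetical illustration: the function names, the mean-value placeholder "model" and the sample rows are assumptions, not part of any particular project.

```python
def run_pipeline(raw_rows):
    """Top level: the whole system expressed as one entity."""
    cleaned = prepare_data(raw_rows)
    model = train_model(cleaned)
    return evaluate_model(model, cleaned)


def prepare_data(raw_rows):
    """Sub-system 1: drop rows that contain a missing value (None)."""
    return [row for row in raw_rows if None not in row]


def train_model(rows):
    """Sub-system 2: placeholder model that always predicts the mean target."""
    targets = [target for _, target in rows]
    return sum(targets) / len(targets)


def evaluate_model(model, rows):
    """Sub-system 3: mean absolute error of the placeholder model."""
    return sum(abs(target - model) for _, target in rows) / len(rows)


# Hypothetical (feature, target) rows; the second row has a missing target
report = run_pipeline([(1, 10.0), (2, None), (3, 14.0)])
```

Each sub-system could be decomposed further in the same way, for example by splitting prepare_data into separate cleaning and feature-building steps.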

Bottom-up Design

Bottom-up strategy is more suitable when a system needs to be created from some existing system, where the basic primitives can be used in the newer system.

The bottom-up design model starts with the most specific and basic components. It proceeds by composing higher-level components from basic or lower-level components. It keeps creating higher-level components until the desired system has evolved as one single component. With each higher level, the amount of abstraction increases.
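The bottom-up approach can be sketched the other way around: start from basic primitives and compose them into higher-level components. The outlier detector below is a hypothetical example; the function names, the z-score threshold and the data are assumptions, and the sketch does not handle the case where all values are identical (zero standard deviation).

```python
from math import sqrt


# Level 0: basic primitives
def mean(values):
    return sum(values) / len(values)


def std_dev(values):
    """Population standard deviation."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


# Level 1: composed from the primitives
def z_scores(values):
    m, s = mean(values), std_dev(values)
    return [(v - m) / s for v in values]


# Level 2: composed from level 1, one step more abstract
def find_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


outliers = find_outliers([10, 11, 9, 10, 12, 10, 11, 50])
```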

Phase 3: Prototyping

A researcher’s understanding of the problem space can benefit from drawing out hypothetical complex analyses or models beforehand, to grasp the scope and test out possible solutions before significant effort is invested in a full solution. Furthermore, rapid prototyping can bring results similar to those of non-constrained prototyping in less time.

Phase 4: Black Box Testing

Black box testing means testing Machine Learning models without knowing their internal details, such as the features of the model or the algorithm used to create it.

Metrics to pay attention to:

  • Check the accuracy — The higher the better
  • Check Loss — The lower the better
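As a rough illustration, the two metrics can be computed as follows. The guide does not say which loss to use, so binary log loss (cross-entropy) is an assumption here, as are the 0.5 decision threshold and the sample labels and probabilities.

```python
from math import log


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Binary cross-entropy; y_prob holds predicted probabilities of class 1."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= t * log(p) + (1 - t) * log(1 - p)
    return total / len(y_true)


# Hypothetical true labels and predicted probabilities
y_true = [1, 0, 1, 1]
y_prob = [0.9, 0.2, 0.6, 0.4]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
loss = log_loss(y_true, y_prob)
```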

Blackbox Testing Techniques for Machine Learning Models:

Model performance: Testing model performance means running the model on test data or new data sets and comparing its performance, in terms of metrics such as accuracy and recall, with the pre-determined performance of the model already built and moved into production. This is the simplest of the techniques that can be used for blackbox testing.
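A model performance check of this kind could look like the sketch below. The model is treated as an opaque predict function; the baseline values, the tolerance and the toy model are hypothetical and only serve to show the comparison.

```python
def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model found."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred)
                   if t == positive and p == positive)
    actual_pos = sum(1 for t in y_true if t == positive)
    return true_pos / actual_pos if actual_pos else 0.0


def check_against_baseline(model_predict, inputs, labels, baseline, tolerance=0.02):
    """Compare observed metrics with pre-determined baseline values.

    Returns the observed metrics and the metrics that fell below
    baseline minus tolerance.
    """
    preds = [model_predict(x) for x in inputs]
    observed = {
        "accuracy": accuracy(labels, preds),
        "recall": recall(labels, preds),
    }
    failures = {name: value for name, value in observed.items()
                if value < baseline[name] - tolerance}
    return observed, failures


def toy_model(x):
    """Hypothetical black box: only its predictions are visible to the test."""
    return 1 if x > 5 else 0


observed, failures = check_against_baseline(
    toy_model,
    inputs=[1, 3, 6, 8, 4, 9],
    labels=[0, 0, 1, 1, 1, 1],
    baseline={"accuracy": 0.80, "recall": 0.90},  # assumed production values
)
```

Here the accuracy stays above its baseline while the recall falls below it, so only the recall would be reported as a failure.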

Metamorphic testing: In metamorphic testing, one or more properties are identified that represent the metamorphic relationship between input-output pairs. Test cases that succeed lead to another set of test cases, which can be used for further testing of the Machine Learning model. Test cases can be executed until all of them succeed or one of them fails at any step. If a test case fails, a defect can be logged for data scientists to deal with.
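One possible metamorphic relation, chosen here purely for illustration, is that a nearest-neighbour classifier should give the same prediction when the same constant is added to every feature of both the training data and the test point. The sketch below checks that relation; the classifier, the data and the deliberately buggy model are all assumptions.

```python
def nearest_neighbor_predict(train, point):
    """1-nearest-neighbour: return the label of the closest training point."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(train, key=lambda pair: sq_dist(pair[0], point))
    return label


def shift(point, offset):
    """Add the same offset to every feature of a point."""
    return tuple(v + offset for v in point)


def metamorphic_shift_test(predict, train, test_points, offset):
    """Return the test points whose prediction changes after shifting
    both the training data and the test point by the same offset."""
    shifted_train = [(shift(x, offset), y) for x, y in train]
    violations = []
    for point in test_points:
        original = predict(train, point)
        follow_up = predict(shifted_train, shift(point, offset))
        if original != follow_up:
            violations.append(point)
    return violations


def buggy_predict(train, point):
    """Hypothetical faulty model that ignores the training data."""
    return "a" if point[0] < 3 else "b"


train = [((0, 0), "a"), ((1, 0), "a"), ((5, 5), "b"), ((6, 5), "b")]
test_points = [(0, 1), (5, 6), (2, 2)]

ok = metamorphic_shift_test(nearest_neighbor_predict, train, test_points, 100)
bad = metamorphic_shift_test(buggy_predict, train, test_points, 100)
```

The test never needs the true labels of the test points: it only checks that the relation between the original and the follow-up prediction holds, and each violation is a candidate defect.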

Dual coding: With the dual coding technique, the idea is to build different models based on different algorithms and compare the predictions of each of these models for a particular input data set. For example, if the primary model is a random forest, then for inputs where the majority of the other models give a prediction that does not match that of the random forest model, a bug/defect could be raised in the defect tracking system. These bugs could later be prioritized and dealt with by data scientists.
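The comparison step of dual coding can be sketched as a majority vote among the secondary models. The models below are hypothetical threshold rules standing in for models built with different algorithms; in practice each would be a trained model exposing a predict function.

```python
from collections import Counter


def find_disagreements(primary, others, inputs):
    """Return (input, primary prediction, majority prediction) for inputs
    where a strict majority of the other models disagrees with the primary."""
    flagged = []
    for x in inputs:
        primary_pred = primary(x)
        votes = Counter(model(x) for model in others)
        majority_pred, count = votes.most_common(1)[0]
        if count > len(others) / 2 and majority_pred != primary_pred:
            flagged.append((x, primary_pred, majority_pred))
    return flagged


# Hypothetical stand-ins for models built with different algorithms
def primary_model(x):
    return int(x > 10)


def model_b(x):
    return int(x > 8)


def model_c(x):
    return int(x >= 9)


def model_d(x):
    return int(x > 12)


flagged = find_disagreements(
    primary_model, [model_b, model_c, model_d], [5, 9, 10, 11, 13]
)
```

Each flagged entry is a candidate for the defect tracking system, to be reviewed by a data scientist.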

Comparison with simplified, linear models

Testing with different data slices

Avoid Overfitting and Underfitting

Additional Tips:

  • Active and purposeful feedback: Gathering input frequently and intentionally from both technical and non-technical stakeholders can aid in both developing a deeper understanding of the problem and brainstorming approaches to find a solution.
  • Diagrams over descriptions: Communication of analysis, models, and findings can become complex. Help a non-technical audience understand the process by making it visual: create process diagrams and frameworks to organize key learnings, identify areas of further interest, and communicate decisions to outside stakeholders.
  • Build on the ideas of others: Look internally and externally for what has been done and how one can build on that work.



Vinita Silaparasetty is a data scientist exploring the field of Artificial Intelligence, particularly in Machine Learning, Deep Learning and Neural Networks.
