Disciplined Prototyping Approach for Big Data

This post was originally published on TheValdasBlog.

Over the last two years, I have been involved in Big Data and Analytics development: cutting-edge technologies, machine learning, the cloud, and redefining businesses with data insights. At first glance, it all looks innovative and engaging. In reality, it is a continuous battle with new technologies, fought through endless prototyping without clear objectives.

Mobile and web developers usually start with wireframes and mockups, and only then move on to building prototypes. This process helps them identify risks and refine what customers actually want. Instead of building full-scale solutions up front, developers rely on less costly prototypes.

Are there any similar, disciplined prototyping approaches for Big Data and Analytics? Yes!

Rick Kazman, Hong-Mei Chen, and Serge Haziyev introduced the Risk-Based Architecture-Centric Strategic Prototyping (RASP) model, which, as they put it, was developed to provide cost-effective systematic risk management in agile big data system development.

1. Vertical, Horizontal, Evolutionary and Throwaway prototypes. What is the difference?

I used to think there were only proofs of concept, prototypes, and MVPs. Apparently not. A proper understanding of the differences between vertical, horizontal, evolutionary, and throwaway prototypes is essential for understanding the RASP model.

Software prototyping has many variants. However, all of the methods are in some way based on two primary forms of prototyping: throwaway prototyping and evolutionary prototyping — Wikipedia

The Wikipedia article "Software prototyping" gives a substantial explanation. However, I find the images below the easiest to understand.

2. Why are Big Data solutions different?

Architecting Big Data systems is challenging because the technology landscape is new and rapidly changing, and the quality attribute challenges, particularly for performance, are substantial.
There are so many technologies and technology families, and most programmers have little, if any, experience in them. Some software architects manage these risks with architecture analysis, while others use prototyping. Especially with Big Data, prototyping is necessary.

3. Risk-Based Architecture-Centric Strategic Prototyping (RASP) model. What is it?

It is a set of standardized questions and decision procedures, combined with architecture analysis methods, intended to make prototyping activities more effective. The answers to these questions help an architect decide whether to prototype at all and how to interpret the prototyping results. I have attached part of the diagram; the full chart is available here.
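To make the idea of a risk-driven "should we prototype?" decision more concrete, here is a minimal Python sketch. It is not the RASP model itself: the Risk fields, the probability-times-impact exposure score, the threshold of 1.0, and the decide function are hypothetical simplifications, assuming each risk can be given a rough score and flagged as answerable by architecture analysis alone or not. The example risks reuse the themes that came up in the SoftServe case studies, with made-up numbers.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    NO_ACTION = "accept the risk, no prototype"
    ANALYZE = "resolve with architecture analysis"
    PROTOTYPE = "build a prototype"


@dataclass
class Risk:
    # Hypothetical fields; RASP uses its own questionnaire, not these scores.
    name: str
    probability: float  # estimated likelihood the risk materializes, 0..1
    impact: float       # estimated cost if it does, in arbitrary units
    analyzable: bool    # can the question be answered by analysis alone?


def decide(risk: Risk, threshold: float = 1.0) -> Decision:
    """Toy decision procedure: only spend effort on risks with high exposure,
    and prefer cheaper analysis over prototyping when analysis is enough."""
    exposure = risk.probability * risk.impact  # common risk-exposure heuristic
    if exposure < threshold:
        return Decision.NO_ACTION
    if risk.analyzable:
        return Decision.ANALYZE
    return Decision.PROTOTYPE


# Illustrative risks with invented scores.
risks = [
    Risk("technology selection", 0.6, 5.0, analyzable=True),
    Risk("performance of new technologies", 0.7, 8.0, analyzable=False),
    Risk("total cost of ownership", 0.2, 3.0, analyzable=True),
]

for r in risks:
    print(f"{r.name}: {decide(r).value}")
```

The point of the design is the ordering of the checks: a prototype is the most expensive way to retire a risk, so in this sketch it is chosen only when the risk matters and analysis alone cannot answer the question.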

4. How was it validated?

The RASP model was validated through nine case studies of big data projects at SoftServe. Here you can find the case studies. Each project had different business and innovation goals, and different risks. However, consistent risk themes emerged: technology selection, the performance of new technologies, total cost of ownership, and the integration of technologies.

5. Alternatives?

So far, I have struggled to find alternatives. It is easy to find high-level business explanations of how Big Data and Enterprise Data Lakes will make everyone's lives better (really?), but I haven't found any comparable papers on prototyping for Big Data.

Informatica has published an interesting read, How to Run a Big Data POC — In Six Weeks. It includes a checklist of what has to be done and validated at each stage to gauge complexity and reduce risk. The paper puts strong emphasis on data security and governance.

6. Summary

Overall, I see many parallels between the problems RASP addresses and our own chaotic prototyping approach, in which we don't follow a standard process to run a prototype. It would be exciting to analyze the model further, improve our processes according to RASP, and measure how much that improves our efficiency.

Links: 
- Prototyping for Developing Big Data Systems
- Strategic Prototyping for Developing Big Data Systems
- Strategic Prototyping for Developing Big Data Systems — Presentation
- RASP Case Studies validation
- Big Data Rapid Prototyping with AWS
- Informatica — How to Run a Big Data POC — In Six Weeks
