Data Science Unicorns and Where to Find Them

Nenad Bozic
SmartCat.io
7 min read · Jan 16, 2018

Each project in this Big Data world follows a well-trodden path. First comes the acquisition phase, followed by the data analysis phase; the last phase, and the most important one for a business, is prediction. Below you can see a pyramid that we like to show in order to explain these phases.

These phases open up additional challenges for companies going down this road: new skills are needed, and it is hard to find the right people. You need a multidisciplinary team that can solve all the challenges of going through these phases. On top of that, simply understanding all the hype words of Data Science and Big Data engineering is a challenge of its own. Everybody is talking about it and everybody claims to do Big Data and Data Science, but people have a hard time explaining what their needs are and what exactly they do. New tools and technologies in this field pop up every day, and things are moving fast.

But let’s go step by step: we will explain each phase and the challenges it brings to the table. We will touch upon the multidisciplinary employees you need on your team to get you to the prediction phase, and explain why we think they are unicorns that are hard to find. Last but not least, we will finish with our vision and what SmartCat does to overcome these challenges in providing its services.

Data Acquisition

This is the first phase. You know that your business generates a lot of data: the volume, variety and velocity are all there, and you need a system or a platform that can store this data successfully. You still do not know what to do with the data, and the answer to that question will come out of the analysis phase that follows, but you know that the data is valuable and that you will pull useful information out of it later on. So you decide to store all the raw data in a scalable, fault-tolerant fashion and to provide a hook for the business analysts, who will figure out what to do with the data later. For this phase, you will need engineers who are familiar with the tools and frameworks in the Big Data landscape, and you will need to talk to them. There are needs you are aware of and needs you do not yet know you have, so you need a consulting partner who will guide you in the right direction.

In this phase, the most important skills your team should have are Big Data engineering and DevOps. With DevOps skills, you can answer important first-step questions, such as: should we go to the cloud or keep on-premise infrastructure, what kind of hardware do we need to process and store the X amount of data that we have, do we need fault tolerance, and which provider offers the most value for the money. In addition to these questions, or in parallel with them, there are questions about the tools and technologies that could solve our problems: what tech is out there, which products are mature enough, which are compatible with one another, what are the pros and cons of each of them, what is the learning curve…
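To make the hardware question a bit more concrete, here is a minimal back-of-the-envelope sketch in Python. It is only an illustration: the function name, the parameters and the numbers in the example are assumptions we made up for this post, not a sizing recommendation for any particular technology.

```python
import math


def nodes_needed(daily_ingest_gb, retention_days, replication_factor,
                 usable_disk_per_node_gb, max_disk_utilization=0.5):
    """Roughly estimate how many storage nodes are needed for the raw data.

    Every parameter is a hypothetical input chosen for illustration.
    """
    # Raw volume kept at any point in time, before replication.
    raw_gb = daily_ingest_gb * retention_days
    # Fault tolerance multiplies the footprint: each byte is stored N times.
    replicated_gb = raw_gb * replication_factor
    # Leave headroom on every disk for maintenance work and growth.
    effective_per_node_gb = usable_disk_per_node_gb * max_disk_utilization
    return math.ceil(replicated_gb / effective_per_node_gb)


# Hypothetical workload: 50 GB per day kept for one year, three copies,
# nodes with 2 TB of usable disk that we fill to at most 50%.
print(nodes_needed(50, 365, 3, 2000))  # prints 55
```

Even a rough estimate like this shows how quickly retention and fault tolerance multiply the hardware bill, which is why these questions are worth answering before any tool is picked.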

As you can see, there are lots of questions, and you need people with experience in both infrastructure and engineering to help you answer them. Many of the teams we work with struggle with these questions, and this phase is critical because it unlocks the gate to the following phases, which bring value to the business directly.

Data Analysis

Once you have figured out how and where to store the data, the next phase is data analysis. Here you will need people on your team who have a good understanding of the business, but also extensive knowledge of the tools and algorithms out there. You need to come up with ideas that have a fast ROI (return on investment) so that stakeholders are happy. Because of the lack of focus in phase one and the decision to store all the data (just in case you need it later on), cleaning up the data for a specific use case is an important part of this phase.

You will need people who will attend business meetings with you and look for gaps in the day-to-day business that can be filled with smart solutions. They will then search through the data to figure out whether the ideas that come out of this exploratory phase can fill those gaps. For this phase, Data Scientists need an easy hook for their visualization and query tools. They usually hook into an analytics database or a data warehouse and try to reason about the data based on what they see there and on what they hear about business needs in the meetings they attend.

The output of this phase is an idea paper with options such as: we can build feature X, which will solve a problem we have and optimize one of our processes; the final goal is to raise customer satisfaction with our product, and it can be measured in this way.

Prediction

This is the final and the most important phase. On the basis of the idea paper we can build a solution, and we iterate through the phases to move from historical data and statistics to prediction. We want a solution that is tightly integrated into our system and that can help us with our day-to-day decisions in near real time. IoT, social networks and all the other information generators have made this possible, but the problem now is that we are overburdened with data, and we need help from SMART software which will either help us with decisions or make decisions for us. We want to predict traffic jams so we can synchronize traffic lights better, we want a smart solution in our car that tells us when it is time to go to a mechanic, and we want a smart solution that tells us when it is time to move stock out of the warehouse to our retail store.

For a fully integrated solution that predicts events of interest, you need good collaboration between the Data Science team that designs the solution and the Big Data engineering team that needs to implement it and integrate it with the existing system, so that it works in near real time in a fault-tolerant and resilient fashion.

Data Science Unicorns

Going through these phases, we have raised a couple of important questions you can probably relate to if you are in the process of changing your business in order to make smarter decisions. The first obstacle is hiring this new guy, the Data Scientist, since the role is now really popular and it is obvious that you will need him in order to build a SMART system (as Bernard Marr mentions in his book and on this website in the Think Smart part). There are very few data scientists out there. Going through these phases, it is clear that a Data Scientist needs to have an engineering background and a DevOps background to understand the tools and technologies out there. He needs to have great communication skills to attend meetings with you, and he needs to understand both the business and the technology. He also must have great presentation skills, since he needs to sell his ideas to the business. He should have good mathematics and statistics skills to reason about the data, as well as machine learning skills in order to make predictions. All of these skills are really hard to find in one person, and if you manage to find people with at least some of them, those guys are really expensive to hire (because they know they are unicorns). Most of the good ones are already working full time in Silicon Valley for big companies and earning big bucks.

SmartCat solution

We at SmartCat have thought about this for a long time, and we have decided to offer all these services through a remote team. We have an A team and a B team (Analyzers and Builders). The A team is a team of data scientists who analyze data and pull features out of it, generate ideas, and listen to the business and its needs. The B team is here to help build stuff: they are carefully selected engineers who are familiar with the infrastructure and technologies, and who integrate the solutions that result from the work of the A team into our clients’ systems. We are constantly working on a variety of projects, building up experience that lets us make better judgements about a specific use case.

Through our A and B teams, we have clear focus and separation inside our company: the A team is constantly improving its business, communication and presentation skills and its Data Science knowledge, while the B team is a team of engineers constantly learning about the tools and technologies needed to turn those ideas into production-ready solutions. With our A and B teams and their collaboration, we build smart solutions for our customers that are tightly integrated into existing systems and ready to work from day one.

If you are aware of the potential of the current Big Data landscape and you want to transform your business to be smarter, but you have a hard time finding the right people for this organizational change, we can help you. Give us a call and let us know about the problem you are trying to solve, or share ideas about your business and we will generate a few ideas of our own. If you like those ideas, we can help you implement them and transform your business.

Originally published at www.smartcat.io.

Nenad Bozic
SmartCat.io

Data enthusiast and Apache Cassandra fan. Co-founder of SmartCat company providing Data Engineering, Data Science and DevOps services.