5 Reasons why your Data Science Project will Fail, and what you can Do about it.

Anass Bensrhir
Data science Central
Mar 1, 2017

(This article was originally posted here : https://www.linkedin.com/pulse/5-reasons-why-your-data-science-project-fail-what-you-anass)

If you follow Data Science news, or if your daily job is to extract as much value from data as possible, you have probably read many articles and success stories about initiatives that transformed corporations and businesses. What we rarely find are lessons learned about why some initiatives failed and how to fix them.

I thought it would be good to share, from my own experience, why Data Science and Big Data initiatives fail and which patterns to follow in order to overcome these issues.

5 Reasons why your data science initiative will Fail :

1. You don’t have the right team

Don’t fire your lead data scientist just yet: you still need someone to remind you why PCA (principal component analysis) matters before a k-means clustering. But what you need most at the inception of your project is people who can turn data into value.

Experience from the field favors teams composed of:

  • People with strong technical knowledge of data science and Big Data
  • Domain experts who can provide insights at the right moment
  • Business-savvy people who can turn an idea into value and have the storytelling skills to build a bankable business case
  • Contractors or consultants to bring a different approach and expertise
  • Pink- or blue-collar workers who can give you insights from the field

If you have found THE person with all these skills, good for you.

2. You got Overwhelmed by the Technology

This might be the complaint I hear the most: technology keeps changing, and I seem to hear about a new framework or technology stack every day, so it can look as if there is no way forward without making a risky move. Before placing a blind bet based on a Gartner analysis and purchasing an all-in-one platform, take a look at what really works within early-adopter organizations that succeeded with previous data science initiatives.

Here is a Magic Quadrant based on the tools data scientists actually use:

(source : http://www.oreilly.com/data/free/2016-data-science-salary-survey.csp)

3. You Over-promised

A successful project is usually one that delivers a substantial financial or technological return on investment. Recently, success stories have been emerging in many sectors, and those organizations keep making large investments.

Often these investments come with big expectations, especially if you promised your CEO that the technology you acquired and the expensive team you hired will get your company a larger market share. Failure to do so will be frustrating and can jeopardize the whole operation.

It is widely accepted that broader performance improvements from large-scale investments in data analytics often don’t appear right away. They need time, talent, and constant improvement.

4. You have execution problems and don’t have the power to get the work deployed into production

This is an organizational problem that occurs very often within mid-sized or large organizations: you have to follow the standard procedure to deploy your project and get feedback from the field, which isn’t how these initiatives should work.

Successful initiatives prepare for this well and make substantial organizational changes to allow these teams to succeed. There are many ways to solve this issue; some companies go with:

  • The task force model, where the team is made up of several data scientists, domain experts from multiple departments, blue-collar workers and a senior executive who makes sure the work gets deployed, tested and valued.
  • The new venture model, where an internal company or entity is created to serve other departments with value (as described in the figure below).

Both models work well depending on the situation, but the most important part is that the team should have enough support and a high enough authorization level to get things done, hence the need to create a Chief Data Officer role within your organization.

5. Data exchange within your organization is nonexistent and data quality is poor

You might be surprised to learn that departments within many big companies don’t share data with each other, for many reasons. Even with seasoned data science and big data experts, you will achieve nothing if data access is restricted or the data quality is terrible.

As a stakeholder, you need to make sure that you have access to reliable internal data feeds. Large companies have launched initiatives to solve that issue:

  • Internal open data platforms
  • Data Exchange programs
  • Internal hackathons
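
Before relying on an internal feed, a quick quality check can tell you whether it is usable at all. The snippet below is a minimal sketch, not a full data quality framework: it measures the share of missing values per column in a small, made-up customer feed. The function name `missing_rates`, the sample records and the 25% threshold are assumptions chosen purely for illustration.

```python
from collections import Counter


def missing_rates(records, columns):
    """Return, for each column, the fraction of records where it is absent, None or empty."""
    missing = Counter()
    for row in records:
        for col in columns:
            value = row.get(col)  # a missing key is treated like None
            if value is None or value == "":
                missing[col] += 1
    total = len(records)
    return {col: (missing[col] / total if total else 0.0) for col in columns}


# Hypothetical sample of an internal customer feed (illustrative data only).
customers = [
    {"id": 1, "email": "a@example.com", "segment": "retail"},
    {"id": 2, "email": "", "segment": "retail"},
    {"id": 3, "email": None, "segment": None},
    {"id": 4, "email": "d@example.com"},
]

rates = missing_rates(customers, ["id", "email", "segment"])

# Flag columns whose missing rate exceeds an arbitrary 25% threshold (an assumption).
flagged = [col for col, rate in rates.items() if rate > 0.25]
print(rates)
print(flagged)
```

Running a check like this on every feed you are given access to is a cheap way to start the data quality conversation with the department that owns the data.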

What you can do about it Now :

1. Start Small

Don’t get me wrong, a 1000-node Hadoop cluster and in-memory databases sound impressive, but they shouldn’t be your first move towards a data-driven organisation. Premature scaling, like premature optimization, is the root of all evil. The key here is to start small, assess the business value, technology maturity and available skills, then scale if all of those are in place.

2. Look for the Business value First

Yes, you guessed it right: [1] and [2] are the sweet spots that you should look into first. Those are the use cases with many success stories, like churn prediction, up-selling/cross-selling and product recommendation, and they are the safe bets you can build a successful Data Science program upon. But don’t forget about [3]: projects that are technically feasible but whose business value isn’t clear yet, like data monetization, which can prove to be a good source of revenue for many entities or departments. The important thing is that you shouldn’t come close to [4], which is a career-limiting move.

3. Invest in People

One of the most important laws of economics is the law of scarcity, the “fundamental economic problem of having seemingly unlimited human wants in a world of limited resources or talent”. As you might have guessed, I’m talking about talent scarcity. To have a successful data science team you need to acquire, train and retain the best people on the market. Those people don’t come cheap, they need constant training and motivation, and they need to stay up to date with the latest success stories and technology advances.

4. Be Agile

I’m sorry to tell you that waterfall project structures don’t work at all with Big Data / Data Science initiatives. What we proved to work two months ago can now be done more efficiently with a new technology (does anybody still work with MapReduce or Pig Latin scripts?). The model we developed last month could be more precise with this new data feed; should we rebuild the infrastructure from the ground up?

The key here is to be agile and use short iterations in order to obtain feedback that you can use to make decisions about what to do next.

If you want to discuss how you can build a successful data science initiative, and you happen to be in Paris on March 6 and 7, I’ll be happy to talk to you at the Paris Big Data Summit.

Anass Bensrhir
Data science Central

BoldData's CEO, Big Data consultant. I write about Strategy and Marketing.