A Data Science Landscape, One Year After

Christophe Bourguignat
Oct 18, 2015


This is the transcript of the “Data Scientist 2015” Paris conference opening keynote.

A Kudu

Hi everyone,

I prepared this keynote by asking myself a question: what topics would I have covered if I had given the opening talk at last year’s edition? Would they still be relevant today? Or already totally outdated?

Last year, for instance, I definitely would have tried (yes, I say tried) to describe what a data scientist is. You know, this fictional role: half math nerd, half software geek, and half skilled communicator. Three halves, which shows that it doesn’t really exist. Today, I’m even more confused. A recent survey depicted a data scientist as a spider with... 25 legs! Maybe after today’s conference we will know a bit more about this new role, and understand how broad it is.

However, unlike last year, we are starting to have data about data scientists themselves. After the quantified self, it’s time for the quantified data scientist: data science on data scientists. Two weeks ago, a linear model predicting data scientists’ salaries was published.

What stands out? If you are a woman, unfortunately, you lose points. That won’t surprise anybody. It’s a shame, but even the data scientist job, like many technical positions, cannot escape this rule.

Funnier still: the more time a data scientist (or analyst, or engineer) spends in meetings, the more they earn. And those who spend too much time exploring data (4+ hours a day) earn less! That beats everything!

Data science on data scientists: a linear model to predict how much they earn

A few months ago, I would have criticized France’s lack of awareness of what the data revolution represents. Let’s recognize that the landscape has changed. A new role has been created, France’s Chief Data Officer, who recently also became France’s Chief Information Officer, a sign that public IT is moving toward a more data-centric approach. France now also has its own data science team, and a new word is born: “mégadonnées”, French for “Big Data”.

Henri Verdier, France’s CDO / CIO

Last year, I would have talked about pioneering companies experimenting with data, destined for a bright future. Today, I would be more nuanced. Companies’ data maturity varies widely, and even the most advanced are starting to have doubts. 75% have invested in Big Data, but only 10% have projects in production. For the first time, “machine learning”, one of the key components of data projects, is sliding down the latest Gartner “Hype Cycle”.

Companies face disillusionment, and ask themselves questions: I know how much it costs, but how much does it earn me? What is the ROI?

Even small-data projects raise new problems: how do I act on my data scientists’ discoveries? This implies change management and modifying established business processes. One retailer, for example, learned that it could increase profits substantially by extending the time items were on the floor before and after discounting. But implementing that change would have required a complete redesign of the supply chain, which the retailer was reluctant to undertake.

For the first time, “machine learning” is sliding down the latest Gartner “Hype Cycle”

On another level, a technological one this time (data science involves a lot of technologies), I would probably have mentioned MapReduce: a programming model designed by Google about 10 years ago to allow distributed processing of large volumes of data. Not long ago, it was a star. Today, it is being overtaken by a tsunami called Spark.

Let’s take another example. Two weeks ago, Cloudera announced Kudu, a new columnar store that entirely bypasses HDFS, today’s de facto big data storage technology. Aside from helping data scientists improve their zoological knowledge (the kudu is a woodland antelope found throughout eastern and southern Africa), Kudu makes analysts wonder whether HDFS has joined MapReduce in the emerging “legacy Hadoop project” category.

On the other hand, I certainly would not have talked about Deep Learning, a branch of Artificial Intelligence (AI): incredibly powerful neural networks that learn from data like humans, and sometimes better than them. The field has recently made decisive advances. These algorithms have shown that they can paint, write, or compose music. What’s next?

Deep Learning Paintings

Neither would I have talked about ethics. Yes, ethics: who would have thought it would enter the debate? A society where every single decision about its citizens is driven by predictive models raises concerns.

That’s why data for good, transparency in predictive algorithms, and education about AI are currently growing topics.

To conclude: don’t try too hard to remember what I have just described; it will be partially obsolete next year! At least, that’s my prediction.

One thing, however, will remain. D.J. Patil, recently named “US Chief Data Scientist” by Barack Obama, co-wrote a famous and visionary 2012 Harvard Business Review article predicting that data scientist would be the sexiest job of the 21st century. I’m deeply convinced of that too. Data scientist is one of the most thrilling jobs in the world, and this will remain true for a long time. We are just at the beginning of the story.

I wish you an amazing day.
