Image courtesy of Super Data Science

Key takeaways from the DataScienceGO live stream

This year I attended the DataScienceGO live stream. In this article I share my key takeaways from following the conference remotely. Before we dive deeper, let me introduce the event.

DataScienceGO is an annual conference focused mainly on helping people with their careers in data science. The theme of this year’s conference was narrowing the gap between the supply of and the demand for data scientists. The conference consists of workshops, presentations, networking and general socializing with like-minded people. There is a big emphasis on community, something that live stream viewers are unfortunately left out of. The conference offered three tracks for on-site participants: the practitioner, the newcomer and the manager, each with content tailored to its audience. The live stream contained talks from all three tracks, although the online agenda did not say which talk belonged to which track.

In the opening keynote Kirill Eremenko, the main organizer of the event, spoke about how market demand does not match the data scientists on offer, even though the Harvard Business Review has called data scientist the sexiest job of the 21st century. His main message was that together we need to narrow this gap. Data scientists should focus more on what the market needs, and companies should better understand how to apply data science in their own field and which competencies are actually required. We must stop looking for data science unicorns.

What to learn to become a data scientist?

There were a couple of talks on which learning path to take in order to master the skills needed in data science. Randy Lao gave his point of view in a talk named “Master the Fundamentals”. One of his main points was OSEMN (read: awesome), an acronym for Obtain, Scrub, Explore, Model and iNterpret. This path takes you through all the skills required: getting data to work with, cleaning and transforming it, understanding it, building and optimizing a model, and finally presenting the results. Other interesting points were the data science hierarchy of needs (similar to Maslow’s) and remembering to be problem oriented.
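To make the OSEMN steps a little more concrete, here is a minimal sketch in Python. It is my own illustration and not something shown in the talk: the toy data (hours studied versus exam score), the function names and the choice of a simple least-squares line as the model are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical toy data (hours studied, exam score) as raw strings; not from the talk.
RAW_ROWS = [("1", "52"), ("2", "54"), ("", "60"), ("3", "56"), ("4", "58"), ("5", "n/a")]


def obtain():
    """Obtain: in practice this would read a file, a database or an API."""
    return list(RAW_ROWS)


def scrub(rows):
    """Scrub: keep only rows where both values parse as numbers."""
    clean = []
    for hours, score in rows:
        try:
            clean.append((float(hours), float(score)))
        except ValueError:
            continue  # missing or malformed value, skip the row
    return clean


def explore(rows):
    """Explore: summarise the cleaned data before modelling."""
    return {
        "n": len(rows),
        "mean_hours": mean(x for x, _ in rows),
        "mean_score": mean(y for _, y in rows),
    }


def model(rows):
    """Model: least-squares fit of score = intercept + slope * hours."""
    x_bar = mean(x for x, _ in rows)
    y_bar = mean(y for _, y in rows)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sum(
        (x - x_bar) ** 2 for x, _ in rows
    )
    return slope, y_bar - slope * x_bar


def interpret(slope, intercept):
    """iNterpret: turn the fitted numbers into a sentence for stakeholders."""
    return (
        f"Each extra hour is associated with about {slope:.1f} more points "
        f"(baseline {intercept:.1f})."
    )


if __name__ == "__main__":
    rows = scrub(obtain())
    print(explore(rows))
    print(interpret(*model(rows)))
```

Each function maps to one letter of the acronym, so in a real project any single step can be swapped out (a different data source, a better model) without touching the others.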

Tarry Singh presented the five plateaus to applied AI. First, one must master the fundamentals, such as statistics and programming. Second come data visualization, analysis and related tools. Third is understanding machine learning. The fourth step is mastering deep learning, and the final step is being able to apply what you have learned in an industry, be it healthcare, manufacturing, automotive or aerospace, among others. You can read a good overview of the five steps in the Forbes article that was just released.

Jorge Zuloaga’s take on the topic started with a little AI history and progressed to the three big applications of today: image recognition, voice recognition and recommender systems. He highlighted the importance of being able to bridge the gap between business goals (conversational and vague) and the BI report (quantitative and precise). He added that the use of automated machine learning platforms will increase. One tip for succeeding with business questions was to be specific. Instead of focusing on ROI through customer lifetime value, one should aim to be more concrete and estimate how much can be saved or gained in, for example, one year’s time.

The keys to success in addition to technical competence

Eric Weber gave a very good talk about the importance of soft skills in your career. Although the talk was given from a data scientist’s point of view, the tips he provided were generally applicable. He defined what soft skills consist of, explained why they are needed and described how one can go about developing them. One interesting take on the topic was “one minute with the CEO”, where he outlined the key points to focus on when talking at the executive level. What are the pain points we’re trying to solve, what differentiates this solution from the rest, why should they trust this model or solution, and how would you go about pitching your idea?

At DSGO 2017 Ben Taylor coined the term reckless commitment. This year Rico Meinl talked about the impact that talk had on him and how he had decided to live by that guideline for the past year. It was an interesting and well-told story about how, back in Germany, he started his own data science meetups, learned specific topics that he would later present on stage at different events, and so on. The main point was to commit bravely, even to reckless commitments. Once you have really committed to a goal, the brain will help you find the path to the solution. His mantra was: Commit. Fail. Improve, in the footsteps of W. Edwards Deming. You can read his entire presentation on Medium.

“There is no such thing as failure, only temporary defeat.” Rico Meinl

Ben Taylor gave a talk this year as well, and even though I did not attend last year’s conference, I got the feeling that he continued where he left off. He emphasized the extra mile one needs to go in order to actually stand out from the crowd. In this case you want to be the outlier. The three key takeaways for me were:

  • Get your hands dirty: there is nothing to gain unless you put some actual effort into it.
  • Hustle: find means to maximize your efforts.
  • Be ready to go beyond the regular: almost everyone in the field has done the “hello world” type of tutorials. Do something special to stand out from the crowd.

On the importance of storytelling

The benefits and eventual deployment of a data science project can depend on whether the project results are conveyed in an understandable manner. Luckily we got to see a few talks focusing on storytelling and visualization to help us improve in this respect. Kristen Kehrer talked about making presentations that bring the model to life. Put concretely, we want to turn confused stakeholders into advocates of our work. She approached the subject through case studies of her previous work and what could have been done differently to improve the message. The key takeaways were to think about the flow of the presentation, present actionable results, recommend next steps, and practice. Remember to focus on the problem the model solves and the results it delivers instead of presenting the technical solution. Have a look at this blog post for more information.

Nadieh Bremer presented stunning visuals and her journey in data visualization. Her personal projects are divided into three phases: data, sketch and code. After deciding on a topic, she finds related data, and she shared tips on a myriad of ways to gather and prepare it. After that she sketches her visualizations on paper, often taking inspiration from other elements. Finally it is time to code the solution. She uses d3.js for the visualizations and later learned to create custom SVG paths. You can see the projects on her web page visualcinnamon.

Mollie Pettit gave a visual walk-through of the Illinois Traffic Stop Analysis. It was interesting to see how she showed the steps, with the level of detail increasing as the presentation progressed. The biggest takeaway for me was to interpret critically what the (intermediate) results might or might not mean. It can be easy to jump to hasty conclusions when you first see the results. In addition, we once again saw nice-looking presentations and visuals made with d3.js. There is a caveat, though. Making visuals with d3.js requires more commitment and effort than simpler libraries or tools, which is why it is best suited for dynamic and/or interactive presentations on the web.

What does the future look like?

Hadelin de Ponteves kicked off the second day with a fun and informative presentation that also covered the more serious side of AI. He went through the current top algorithms for different use cases, paying extra attention to deep reinforcement learning as it has the most potential for the future. For me, however, the biggest point was that he brought up the need for better control in the future. Models can no longer be black boxes; instead we need transparency and dialogue to achieve equality in AI.

Sinan Ozdemir gave a presentation about the challenges of AI from a data privacy and ethics point of view. He gave good examples of the challenges in different industries, such as biased data used in criminal justice and building trust in the field of cyber security. Like Hadelin, Sinan offered some good insights into what we can do to open the black box in the future and get more transparency into how a model performs. As a European it was refreshing to hear him point out that we are at the forefront of data privacy with GDPR and similar regulations, while the US and China lag behind as they compete for technological breakthroughs. Funnily enough, here in Europe we seem to be more concerned that we are losing to other continents in the field of AI for that very reason. Maybe we should leverage our strengths in AI and data privacy so that we are well prepared for the future?

There was also a panel discussion on the theme “emerging technologies”, with Mark Skinner (Senior Solution Architect at NVIDIA), Rachel Wang (Manager of Data Science at TrueCar), Ben Taylor (Chief AI Officer at Ziff, Inc) and Pablos Holman, a world-renowned speaker, inventor, hacker and entrepreneur. My key takeaways were that we must not be afraid to fail but should instead focus on what we can learn, and evaluate whether the assets can be transformed and re-used for something else useful. We should put effort into finding the right problem to solve if we want to minimize the risk of failure. The list of current and future favorite technologies included generative adversarial networks, self-driving cars, virtual reality, cloud computing and sowing robots! I’ll return to this last item next.

Saturday’s keynote was given by Pablos Holman, who talked about innovation and breakthrough research. He presented inventions from Intellectual Ventures in areas such as disease modeling, vaccine storage and using nuclear waste for energy production! It was really inspiring to see what is possible with disruptive innovations and how you can contribute to making this world a better place to live. This is where the sowing robots will hopefully come into play some day. Pablos pointed out that we need a hacker mentality and people who dare to break the rules and follow their own path in order to reach new breakthroughs. He also emphasized that technology cannot solve all our problems, referring to Maslow’s hierarchy of needs.

Conclusion

All in all, the conference provided some interesting talks and offered something for everyone interested in the field of data science. It is good to keep in mind that I have only scratched the surface by taking part remotely. Maybe next year there could be better ways to discuss and communicate within the online community, as well as a panel or AMA for online participants? This was a good experience and I’ll probably attend next year as well. The conference clearly showed that the field of data science is really broad. To anyone aspiring to a career in data science I’d say that it’s best to start with a subject that you are passionate about and put your efforts into that. It’s better to master one thing than to try to learn a little bit of everything, since knowing a little of everything doesn’t help you stand out from the crowd. There’s a lifelong journey ahead of you to master other topics later on. And remember to commit, fail, improve…. REPEAT!