Thoughts on Developing an Executive Data Science Workshop

Paul Meinshausen
Published in GreyAtom
10 min read · Nov 27, 2017

By Shweta Doshi and Paul Meinshausen

A few weeks ago a team of four from GreyAtom, including two founders, a mentor, and an instructor, traveled to London to conduct an executive data science workshop for the Chief Data Officer of a top UK bank and his entire leadership team. This workshop was conducted in parallel with a four-week practitioner-level data science bootcamp held in Pune to upskill a class of analysts and technologists at the bank.

This was GreyAtom’s first executive workshop and this blog post will share some of our learnings from developing and conducting it.

Why did GreyAtom create an Executive Data Science Workshop?

GreyAtom’s core mission is to create practical, hands-on training based on real work for aspiring data scientists. This mission is carried out through our core Commit.Live product. An executive workshop wasn’t part of the original plan. However, two strong reasons motivated us to develop it.

Data Science remains a broad term for a whole lot of fairly disparate problems, skills, and capabilities. To train data scientists using real work we need to be constantly learning and updating our awareness of what real data science work involves. We have a lot of combined experience in data science at GreyAtom and we know we still need to be constantly updated and connected to core problems being met in industry. Engaging with leadership at top-tier businesses lets us learn where and how they need their data scientists to add value and focus their efforts. We want to train data scientists to meet specific needs, not to fit mythical stereotypes.

The second reason is both complementary and a flip-side to the first. As our team has worked with organizations and corporates to recruit GreyAtom graduates, we’ve realized that a data scientist’s success is not just determined by their individual skills and knowledge. It also greatly depends on whether their leadership understands how to enable them to be effective as data scientists.

In other words, companies need their leaders to learn to deploy and manage data science as much as they need their analytical employees to learn how to do data science. GreyAtom will therefore dedicate time to working with business leadership and helping them refine and improve their ability to deploy data science. We’ll also help our students identify companies where the leadership has a strong grasp of data science and where our students can therefore be successful.

So what did we learn?

We developed the workshop with a few key thoughts in mind. We knew that the workshop participants would not be people who write code on a regular basis or as a core part of their job. On the other hand we knew that the participants were an informed audience and since they were technical leadership, they would understand the basics. The participants were looking for a workshop that would provide practical knowledge that they could apply to the actual problems they were facing within the bank. Here are some of the principles that we applied and learned to apply through our first iteration of the workshop.

Keep things adaptive and interactive.

We had prepared enough content to more than fill the workshop’s allotted time. Dale Carnegie called this developing Reserve Power: “Assemble a hundred thoughts around your theme, then discard ninety.” In our case we discarded plenty and also still kept more than enough for the full three-day workshop. To decide what to actually use we could have compiled our own list of the top N topics that would fill each day most valuably. We knew that would be the wrong way to go because value is contextual. Despite all we did to understand what the team wanted before the workshop began, we still didn’t expect that our list of the most important topics would exactly match a list they would compose.

We also knew that our role was to be trusted advisors to the bank. As advisors we did not expect them to compile an exact list of topics (although we of course did ask them for their initial input). They were looking to learn from us and part of that learning consisted of learning what they needed to learn.

So instead we compiled all the sessions that we believed would merit attention in a five-day workshop (even though the workshop would only be three days). And then we put ourselves in a position where throughout each day we could regularly update our schedule of material. If a topic came up that clearly deserved more time and attention, we would push back other material. If a topic was simpler than we imagined or didn’t resonate with the team as strongly as we thought it might, then we would shift away from that topic in a direction guided by the participants. To put it briefly, we designed the workshop so that it would run like a Choose Your Own Adventure course.

This worked superbly and the participants really engaged in keeping the sessions focused on areas that mattered most to them. We believe it succeeded because we were prepared and did the work ahead of time to map out all areas of potential interest and develop material in those areas. The key thing is not just to be willing to adapt; it’s to be prepared to adapt.

Be as connected as possible to real work being done in the organization

It was helpful to have a training session for the bank’s data scientists running in parallel with the executive workshop. This happened for us more by chance than by design, but it’s definitely something we’ll try to engineer purposefully in the future. Reviewing ongoing progress in the data science training with the data scientists’ leadership merged the operator and executive viewpoints together clearly and concretely. It also made everything feel more compelling and important; and that feeling makes a big difference in how engaged participants get.

We also made an effort to connect the workshop content directly to the leadership team’s own real work. For example, at one point in the workshop we explored different architectural patterns and designs for data platforms. During that session we invited some of the participants who worked on the bank’s data platform to walk through their own designs in light of some of the principles and ideas we had discussed. We did something similar when reviewing the topic of machine learning models.

Material resonates more when the examples come directly from the participants. Instead of just lecturing and seeing the participants as passive recipients, we structured the workshop so that they were regularly taking the reins and applying the concepts we discussed to their own problems. Those sessions gave us a chance to give feedback directly relevant to the participants’ responsibilities. That was valuable for them.

This also gave us a chance to understand their responsibilities and problems more clearly. We were then able to come back to Mumbai and incorporate our findings into GreyAtom’s curriculum development. When you’re running an educational workshop it can be scary to ask participants to take the driver’s seat. It turned out that those sessions were some of the most valuable and memorable of the weekend for both sides.

Get hands-on for a part of the workshop

Even when senior executives come from technical, engineering, and even computer science backgrounds, they may have last written code half a decade or more ago. Some (or most?) may have always worked in business functions and never written code in a professional (or any other) capacity. So an executive data science workshop was going to be quite different from GreyAtom’s core data science training, where students spend most of their time elbows deep in the work of writing code. However, we decided that it would be useful to give our executive participants a chance to experience the hands-on world of data science for a part of the workshop. It turned out that they enthusiastically agreed and enjoyed this part of the workshop.

We believe strongly that data scientists need to regularly step away from their terminals and code to see and explore the real world context of their business problem. If they’re building a user-facing product, they should use the product themselves and talk to users. If they’re building a tool to help business decision-makers, they should go and talk to those decision-makers and understand their context as deeply as possible. This same principle applies to those who lead and manage data scientists. You will be a far more effective data science leader if you develop a basic understanding of the tools data scientists use and the engineering/developer environments they work within.

During the workshop we gave participants the chance to build a couple of toy models in a Jupyter notebook. Participants had probably heard of Jupyter, but they may not have had the chance to see for themselves how the tool enables rapid iteration, visual data exploration, and sharing and collaborating on ongoing work and analysis.
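This post does not record which toy models the participants built, so the following is only an illustrative sketch of the kind of thing that fits in a single notebook cell: a straight-line fit by ordinary least squares, written in plain Python. The data values and the names `xs`, `ys`, and `fit_line` are hypothetical and chosen for illustration.

```python
# Hypothetical toy data (an assumption for illustration, not workshop data):
# x = number of products a customer holds, y = some outcome of interest.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(xs, ys)

# Use the fitted line to predict the outcome for a new, unseen x value.
prediction = slope * 6.0 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction at x=6: {prediction:.2f}")
```

In a notebook, a participant can change a data point, rerun the cell, and see at once how the fitted line moves, which is the rapid iteration described above.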

Recognizing the right questions to ask — e.g. know your data generating process

For all the correct and helpful answers that data scientists can provide to business decision-makers and leaders, they can also provide a lot of wrong, misleading, or ultimately impractical and unhelpful answers. Leaders can avoid those situations by getting better at identifying the kinds of questions within their organizations their data scientists are in a position to answer. You save a lot of time by not sending your data scientists on months-long goose chases. Then when data scientists return with answers, it’s also important to know what you need to be asking in order to help validate and verify their results. The way we think about it, the right position to take in data science is “Trust, but Verify”.

A good example of how and why this is important is “The Replication Crisis” happening in the field of psychological science. For those unfamiliar with this ongoing story, it basically amounts to an evolution in the methods of science wherein simpler statistical methods are being recognized as insufficient and potentially misleading and are being replaced by more rigorous and precise methods. This is a good process for the discipline of psychological science and it’s being driven by more careful and informed (and skeptical) consumption of scientific findings.

Business leaders have much to learn from this and should apply a similar degree of careful and skeptical reading of the work done by data scientists in their organizations.

This was a critical part of the workshop. One particular example was our discussion of what in data science is called the ‘data generating process’. In a world beset by the marketing material of Big Data, executives should be wary when results appear to emerge from vague, amorphous datasets and sources.

Leaders should be ready to ask a systematic set of questions that prompt data scientists to clearly identify and document the processes that generate the data they used. This includes the technological routes and procedures that enabled the data to reach their database. It also includes an informed and clear presentation of the statistical and probabilistic models that fit the phenomena they’re investigating (whether it’s customer complaints to a call center, or transactions on a website, or credit histories for loan applicants). Even when machine learning algorithms that are difficult to interpret are ultimately being used in production, business leaders should expect to see statistical and visual explorations of the underlying data precede more complex and black-box methods.
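To make the point about technological routes concrete, here is a small hedged simulation, not taken from the workshop. It assumes a hypothetical call center whose logging system only records complaints received between 09:00 and 17:00, although complaints arrive around the clock. All names and numbers are invented for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical "true" process: complaints arrive during every hour of the day.
true_counts = {hour: random.randint(5, 15) for hour in range(24)}

# Hypothetical logging route: only business hours (09:00-17:00) reach the database.
logged_counts = {h: c for h, c in true_counts.items() if 9 <= h < 17}

true_daily_total = sum(true_counts.values())
logged_daily_total = sum(logged_counts.values())
hours_covered = len(logged_counts)

print(f"hours covered by logging: {hours_covered} of 24")
print(f"complaints in database:   {logged_daily_total}")
print(f"complaints that occurred: {true_daily_total}")
```

An analysis that treats the database as the full picture would understate complaint volume. Asking how the data reached the database is what surfaces the gap.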

Some additional and concluding thoughts

As GreyAtom has developed its core data science training programs and Commit.Live product over the past year, we’ve thought a lot about the balance between effectively teaching the foundations of data science and catering to our students’ keen interest in cutting-edge methods and techniques such as deep learning and artificial intelligence. We’ll keep refining and navigating that balance.

We also think it’s fascinating and important to think about the skills we currently consider fairly unique to data scientists which might become a more standard body of knowledge for analytical professionals over the course of the next few years.

Back in the 18th century the German philosopher/poet Goethe called double-entry bookkeeping “one of the finest inventions of the human mind”. That’s a pretty generous description for something that most of us probably consider a fairly banal part of business. More recently the development of spreadsheets comes to mind. To understand how spreadsheets so significantly changed the nature of business it’s worth checking out this fascinating Planet Money podcast on the topic. This quote from a journalist writing over thirty years ago, cited at the end of the podcast and in the linked article, sums it up: “The spreadsheet is a tool, and it is also a world view — reality by the numbers.”

Spreadsheets remain important and useful even for data scientists. For example, John Foreman, the head of data science at MailChimp, wrote an excellent book on data science with examples built entirely in spreadsheets (“Data Smart”). But more importantly, knowing how to use spreadsheets almost equates to literacy for many modern professions. So we ask ourselves regularly: which skills that are mostly practiced by data scientists today will soon become far more universal in business and industry? Maybe something like Pandas for basic data cleaning and transformation?
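As a rough sketch of what “basic data cleaning and transformation” with Pandas can look like, consider the example below. The table, column names, and values are hypothetical. The calls used (`str.strip`, `str.title`, `pd.to_numeric`, `dropna`, `groupby`) are standard pandas operations.

```python
import pandas as pd

# Hypothetical raw export: inconsistent casing, stray whitespace,
# amounts stored as text, and one missing amount.
raw = pd.DataFrame(
    {
        "customer": [" alice", "BOB ", "alice", "Carol"],
        "amount": ["10.5", "20", None, "7.25"],
    }
)

clean = (
    raw.assign(
        # Normalize names: trim whitespace, then use Title Case.
        customer=raw["customer"].str.strip().str.title(),
        # Convert text to numbers; unparseable or missing values become NaN.
        amount=pd.to_numeric(raw["amount"], errors="coerce"),
    )
    # Drop rows with no usable amount.
    .dropna(subset=["amount"])
)

# Transformation: total amount per customer.
totals = clean.groupby("customer")["amount"].sum()
print(totals)
```

Each step here has a close spreadsheet analogue (trim, convert to number, filter blanks, pivot), which is part of why this kind of skill seems a plausible candidate to spread beyond data science teams.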

Ultimately at GreyAtom we want to build an educational experience that recognizes the value that comes from the broad democratization of some parts of data science as well as the value that comes from areas of extreme specialization and focus that will bring deep innovation. We also believe that as we help to train and develop more data scientists, more work needs to be done to help executive leadership and management enable their data scientists to succeed.

We thoroughly enjoyed our chance to do our first executive data science workshop. We’ve received a lot of interest from some other companies in doing a workshop with them. We’re not sure yet exactly how this kind of program fits with the core GreyAtom business. But we’re excited to learn and grow and excited to see where our journey takes us.

Shweta Doshi is Co-Founder and Head of Strategic Partnerships at GreyAtom and Co-Founder at DataGiri, the largest Data Science community in Mumbai.

Paul Meinshausen is an Advisor at GreyAtom and is Data Scientist in Residence at Montane Ventures, an early stage Venture Capital investor based in India and the U.S.
