Data Science Best Practices

Will Roberts
IBM Data Science in Practice
2 min read · May 6, 2021
Photo by Alex Litvin on Unsplash

We’re very excited to announce that we’re publishing our previously private repository of data science best practices. It’s a GitHub repo created to help the consulting team from IBM Services (formerly IBM Global Business Services/GBS) work with our clients to produce consistent, scalable, and performant data science solutions.

The best practices documented here represent years of IBM expertise, and the repository acts to preserve knowledge gained while IBM data scientists built actual implementations. It’s a resource you can consult immediately as you need to plan your own data science projects, and you can use it to understand what an IBM data scientist needs to add to their own toolbox to be productive with our clients. You can also use it as a template for planning what skills to hone for your own data science career.

IBM Services

First, who are the experts? IBM’s GBS group builds tailored solutions for IBM clients when an out-of-the-box (OOTB) solution is insufficient for their needs. IBM’s GBS clients span a spectrum of industries, so their solutions are similarly varied. It also means that GBS teams are not required to build only with IBM products as the backbone of their solutions: we meet our clients at their stack instead of asking them to move to a stack we sell. As a consequence, the knowledge documented in this repository details how best to use both IBM tools and those of other vendors.

What’s in the repo?

The repository contains 21 chapters of written content and Q&A material that teach and contextualize the work of a data scientist in a broader organization. Since data scientists are the ones responsible for building artificial intelligence solutions, people tend to bring their preconceived expectations for AI to each and every one of the products data scientists build. Some of that hype is well deserved, while some of the expectations are misplaced: a data scientist on their own cannot be expected to rebuild a business.

What to expect?

This repository is an invaluable resource for data scientists and organizations alike. We’re publishing the repo itself in one single commit, but make sure to check back for updates. We in the Data Science Community will also publish a series of blog posts in the IBM blogging program highlighting especially important concepts.


Thanks to Thomas Pfeiffer, Wouter Oosterbosch, Fabian Lanz, Sebastian Hirschl, and Beth Rudden for letting us tell this story!

Credit also to Austin Eovito for reviewing.
