Starting Five of a Data Science Team

Vassily
6 min read · Oct 5, 2022

After the overwhelming success of the previous story, I simply had to continue writing. Ride the wave, they say. Strike while the iron is hot. Milk the cow. Make it to 25 followers! I will, I promise.

Last time we were talking programming and football. Today it’s going to be equally surprising, inadequately disturbing, yet hopefully enjoyable — we will connect … Data Science and basketball!

Yes, I am going to keep squeezing together these unconnected, unrelated, orthogonal apples and oranges. Next article: applied research and artistic gymnastics. Then: project management and alpine skiing (the slalom discipline, obviously). After that: talent acquisition and mixed martial arts. Fundraising and powerlifting. Customer success and golf. Technical support and baseball. Wait, don’t run away! I’m kidding! No baseball.

Anyway, back to business. We’re going to talk teams and we’re going to talk positions on those teams. From my latest masterpiece (nobody will promote it for me, so don’t roll your eyes, just click the link if you haven’t yet; thank you!) we know that software engineering is a team sport. There are many ways to structure a tech company, but arguably each one of them divides employees into groups. We will rarely, if ever, see a bunch of individual contributors working on the same thing for an extended period of time, unless it’s an open source project or an early-stage start-up. Mid- and large-scale companies usually have departments, groups, squads, guilds, teams, and whatnot. Those that deal with Data Analysis and/or Artificial Intelligence in one of its forms will usually have a Data Science team.

This group of savvy professionals, who know how gradient descent is different from a decent graduate, may consist of an arbitrary number of people holding equally arbitrary titles and fulfilling equally arbitrary roles that have anything from nothing to everything to do with those titles. The problem is that these people, roles, and titles are often confusing for “normal” software people, who simply do not understand who these people are and what exactly they do. Luckily, we have me, and I am going to bring some order to the chaos. For the sake of the article I’m picking a certain configuration of roles/titles for a purely Artificial (and Intelligent!) team that is nonetheless very plausible in a real-world setting.

Disclaimer 1. This piece is based on my experience. If your experience is different, let me know in the comments. I usually read all two of them.

Disclaimer 2. All descriptions are fictional, some analogies are random. Any resemblance to real people is unintentional and cannot serve as an excuse to stop talking to me.

So, what is a Data Science team? How is basketball [not] connected? When will the S&P 500 bounce back? Let me explain. Do you know what MapReduce is? Me neither. Still, I’m going to map Data Science team members to basketball players and reduce the responsibilities and merits of the former to the roles and characteristics of the latter.

OK, without further ado, let’s start! Here is the starting five of the Dream Team that never existed.

Numero uno: the Data Science Product Manager. He’s our pass-first, true point guard. Not very different from PMs on other teams. Initiates almost every attack, controls the tempo, distributes the ball and delegates assignments. Ideally, understands his players, knows their strengths and limitations, plans accordingly. Less ideally, may get carried away with his own dribbling, which results in running the clock down to the end and either turning the ball over or passing to someone right before the buzzer and expecting him to score.

Rarely scores himself, never actually guards or manages despite the name. Transforms the coach’s directives into action items through the prism of his own vision and understanding of the game. [Trash] talks a lot. Mainly with the opposing team or to fans in the stands. However, may be the nicest guy around and make everyone around him better. A team that trusts its floor captain runs the plays with enthusiasm and excitement. A great PM/PG puts you over the top.

Next. At the center position, standing cloud-high: the Data Science Infrastructure / MLOps Engineer. The guy fights at the hottest spot, under the basket, where all the production happens. Occasionally goes outside the paint and ssh-es from distance. Throws [s3] buckets, blocks unauthorised attempts, rebounds practically everything and puts it either in or away. Doesn’t do what he shouldn’t; what he should, he does well. Well-defined role, much-respected natural abilities. Nevertheless, has to constantly enrich his toolbox and broaden his skillset to stay in the game. Receives jealous looks from smaller players. Hunted intensively by opponents, paid accordingly.

The very presence of such a player on the team changes the whole dynamic. Rare are the developers, and even more so the researchers, who like the DevOps aspects of their work. Configuring environments, automating processes, dealing with numerous tools and utilities, continuously integrating, delivering and deploying: all of this continuously irritates, distresses and depresses their tender souls and therefore makes the team function much worse. The center position is indeed central; it glues things together and keeps them rolling.

Another essential position on a Data Science team is … well, you guessed right: it’s the Data Scientist. The guy who does the magic or at least tries to do the magic. He is obviously the shooting guard on the team. One-dimensional (curse of dimensionality, duh) player, narrow domain specialist. Spot-up shooter, drifts to his favorite spots, catches the ball and releases it with no hesitation. Trajectory to the basket is unpredictable. The ball may never reach the basket or even the rim or even-even the backboard. It couldn’t bother him less. Believes in the law of large numbers and in a bunch of other statistical sheets. Uses advanced metrics to explain what he does. Nobody either understands or cares enough to dive into those. If he scores, everyone’s happy; if he misses, they wait patiently for another attempt. Loves the baseline. Thus, would generally go for a simple yet surprisingly effective logistic regression lay-up but is often tempted to shoot BERT from half-court. Sometimes it even goes in.

Next in the lineup is the Data Analyst. Classic power forward. Limited role, basic skills. More often than not he’s a late bloomer, coming from a different domain, passionate about what he’s about to do on this new playground with the guys who’ve been there before. Doesn’t mind doing all the dirty work that no one else is willing to deal with. Sets screens, goes for the [dash]boards, can step up and score if asked to. Loves the ball even if he doesn’t get it too often. Learns and practices a lot to stay in the game. While on the bench, sits next to the coach, makes queries, analyses feedback, visualises actions. Hard-working, easy-going, humble but firm. Fundamental. On some teams soon becomes more useful than anyone else around in making valuable points and bringing victories.

The last and least spot in our starting five is taken by the Machine Learning Engineer. Whatever that means. He is our small forward. All-around guy. Does a bit of everything. Dribbles, shoots, rebounds, plays defense. Finds and exploits mismatches. Can fill in for anyone on the team. Nothing really special to tell. Generally, just a great guy. Writes ridiculous posts on Medium. Finishes them abruptly.

P.S. No clue when the S&P 500 will make us happy again. I’ll ask the data scientist. Stay tuned.

Vassily

software engineer | data scientist | athlete | words lover