Risk-takers communicate better
Why do many data scientists fall short of delivering the desired value of their work, and how can they fix that?
Communication skills are paramount in all jobs, including data science. Data-related job postings clearly show that good communication is not just a required skill, but also rare enough that it has to be emphasized.
Poor communication stems from risk avoidance, egocentricity and laziness.
Much advice focuses on the mechanics of the language: don’t write long sentences, avoid adverbs, use subordinate clauses wisely and download Grammarly. But like painkillers, these tips won’t help unless you treat the underlying disease. Why can’t smart people convey their messages clearly? I believe it boils down to incentives. Wrong incentives lead smart people to risk avoidance, egocentricity and laziness. These three problems are the focus of this post.
Even with data, you have to be opinionated
Imagine a data analyst being asked to check the bounce rate for the different pages on a company’s website. As a SQL ninja, she will craft a perfect query, get the data and hand it to the requesting stakeholder. For me, this is a junior-level piece of work. It has two big problems. One is easier to fix than the other, so let’s start with that one.
She did not ask why! Without knowing why this data is needed, she cannot decide how to represent the data, whether to exclude some irrelevant pages from the analysis, break it down by new vs existing users, etc. You should not stop at what they want, but ask them why they want it. Your analysis should serve the latter question, not the former.
She did not take risks! Data people sometimes forget that they can interpret data better than many others. Thus, when they are asked to analyze some data, they are implicitly being asked to come up with their conclusions as well. Often, junior analysts are afraid that their conclusions might be wrong. They don’t want to take responsibility for it. What do they do? Put all necessary and unnecessary outcomes in their results. They throw the burden on the reader.
Without data you’re just another person with an opinion, but with data you still need to have an opinion.
But with no risk, there is no reward. Think of any organization. The leaders there have to think for themselves, be opinionated, and make risky decisions. Even in a data-driven organization, their value still comes from making decisions based on the incomplete data they have. If it were all written down in the data, no one would have any additional value to offer on top of it.
P.S. Although you can omit some irrelevant details, or intermediate data, you should still give the reader a way to validate, replicate or build on your analysis. A link to the code is always a good practice.
Your job is to serve the reader, not yourself
I briefly touched upon the need to ask why in the previous section. Getting to know someone’s intentions is one step toward putting yourself in their shoes. You need to understand the different stakeholders and empathize with them. Only then can you deliver the value they are looking for. In the end, they are your readers, and as June Casagrande wrote about readers in her book, It Was the Best of Sentences, It Was the Worst of Sentences:
“This is the rule: whether you’re Christian, Jew, Muslim, or a disciple of the church of Penn Jillette, when you sit down to write, the Reader is thy god” — June Casagrande
By understanding your reader, you understand what vocabulary to use, what kind of charts they can easily read, and what context they already know or don’t know. The next three examples say the exact same thing in different wording:
- For every 100 users the model flags as fraudsters, 85 of them are actually fraudsters. And for every 100 fraudsters, the model is able to detect 65 of them.
- The model’s precision is 85% and recall is 65%
- P=0.85 & R=0.65
Now, I want you to think of the different people you deal with at your job, and try to see which one of these three forms is most apt for each of them. Hint! You cannot do that unless you understand the background of each one of them and how they think.
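To make the three forms concrete, here is a minimal Python sketch that derives all of them from the same counts. The counts (221 true positives, 39 false positives, 119 false negatives) are made up to reproduce the 85% and 65% figures, and the function names and audience labels ("business", "analyst", "engineer") are hypothetical choices of mine, not a standard.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) computed from raw confusion counts."""
    flagged = true_positives + false_positives            # everyone the model flagged
    actual_fraudsters = true_positives + false_negatives  # everyone who really is a fraudster
    return true_positives / flagged, true_positives / actual_fraudsters


def describe(precision, recall, audience):
    """Phrase the same two numbers for a given (hypothetical) audience label."""
    if audience == "business":
        return (
            f"For every 100 users the model flags as fraudsters, "
            f"{round(precision * 100)} of them are actually fraudsters. "
            f"And for every 100 fraudsters, the model is able to detect "
            f"{round(recall * 100)} of them."
        )
    if audience == "analyst":
        return f"The model's precision is {precision:.0%} and recall is {recall:.0%}"
    if audience == "engineer":
        return f"P={precision:.2f} & R={recall:.2f}"
    raise ValueError(f"Unknown audience: {audience}")


# Illustrative counts only; they are not taken from a real model.
precision, recall = precision_recall(221, 39, 119)
for audience in ("business", "analyst", "engineer"):
    print(describe(precision, recall, audience))
```

The numbers never change between the three outputs. Only the wording does, and picking the wording is the part that requires knowing your reader.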
Business people call it deck, technical people call it slides, and old people call it PowerPoint. Choose your own register wisely.
It’s not only about the use of different vocabulary, or register, but also how to format your data, what to show and what to hide, and much more.
I like to say that good graphs beat tables, and text beats bad graphs. Graphs are usually easier to read than tables. Nevertheless, if the reader is not used to a certain graph, there is no use in showing it to them. Think of box plots or KDE plots: not everyone is trained to read them. In such cases, I’d pick some other format to represent my findings. Furthermore, stick to conventions rather than inventions. If scatter plots are used to show correlations, stick to them unless you have strong reasons to use something else.
And just like with text, brevity is important here. If you cannot convince them with your text, don’t confuse them with your numbers. Keep your graphs and tables lean. Remove all irrelevant details. The more noise you have, the weaker your message is. But of course, include all the needed labels, so that your graphs deliver answers, not additional questions.
I jokingly like to say that I PEP8 my SQL queries. I am sure the SQL community has its own formatting style guides. Regardless of your favorite style, the main point is to make your queries more readable. Think capitalized keywords, indentation, and spacing. Now we are thinking of two groups of readers: the reader of the query itself, and the reader of the query results. When you execute a query that starts with “SELECT item.id”, the resulting column will be called “id”. Not very meaningful to the reader. Thus, I prefer “SELECT item.id AS item_id”. Notice how I kept the keywords capitalized. Furthermore, although AS is not required, it helps with the readability of the query.
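The aliasing point can be checked with a small sketch using Python’s built-in sqlite3 module. The “item” table and its columns are hypothetical, and the unaliased column name is what SQLite reports by default; other databases may name it differently.

```python
import sqlite3


def column_names(conn, query):
    """Run a query and return the column names its result set exposes."""
    cursor = conn.execute(query)
    return [column[0] for column in cursor.description]


# Throwaway in-memory database with a hypothetical "item" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO item (name) VALUES ('notebook')")

# Without an alias, the reader of the results only sees a bare column name.
bare = column_names(conn, "SELECT item.id FROM item")

# With an explicit alias, the result column says what the id refers to.
aliased = column_names(
    conn,
    """
    SELECT item.id AS item_id
    FROM item
    """,
)

print(bare, aliased)
```

The alias costs a few extra characters in the query and spares every reader of the results from guessing which “id” they are looking at, which matters most once several tables are joined.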
Stop calling your analysis, analysis. No one calls their baby, baby. Call it after the question it tries to answer.
Be lazy sometimes, but not all of the time
As you can see, taking risks, understanding your readers, and going the extra mile all require effort. Is this effort always justified? Ideally, it should be justified most of the time. You may be thinking that some things you do are just quick and dirty, and that their small impact doesn’t justify the effort. True, but you should then take a step back and ask whether you should be doing this low-impact stuff in the first place. If most of your job is low-impact stuff, then rest assured that Google and Amazon are on their way to automating you. I like how this post from Intercom compared low-impact work to snacks:
“If you want to have a high impact team stay away from low impact work. Eat, don’t snack” — Des Traynor
Furthermore, laziness is a habit that is hard to get rid of. People who use vague words such as “a lot of”, “most of the time” and “sometimes” use them because they are too lazy to put real numbers into their statements. I bet you can see the irony here. I have just used similar words myself. As you can see, I am not able to tell what percentage of your work should be high impact, and what exact percentage should be low impact, so I hid behind words like “most of the time” and “sometimes”. I promise to do better next time.