The Soft Skills in Data Science

Jay Chung
Published in USF-Data Science
3 min read · Jan 18, 2024

In technical fields, “soft skills” like effective communication are often perceived as less important than so-called “hard skills”. In practice, communication skills play an important part in many of a tech professional’s daily responsibilities, including collaborating with colleagues, writing, and giving presentations.

It’s a common oversight to view technical roles in isolation, focusing solely on the technical chops required to manipulate vast datasets or devise complex algorithms. However, the reality is that even the most technically proficient individuals don’t work alone. Having strong interpersonal communication skills makes it much easier for coworkers and teammates to work together to generate ideas, solve problems, and learn from one another.

The Communications for Analytics course was aptly designed to cultivate the soft skills required for technical positions.

Communications for Analytics was taught by Professor Robert Clements, who has approximately 10 years of industry experience, primarily developing machine learning models in different domains. Most recently, he was a Senior Director of Data Science at Optum. Robert has a PhD in Statistics from UCLA, and his interests include making data science concepts more accessible to a general audience through visual methods, the combination of data, code, and models with art, and ethical considerations in AI and data science. For this MSDS cohort, he is teaching Communications for Analytics, Ethics in Data Science, and Introduction to Machine Learning.

One of the basics of the course was AIM (Audience, Intention, Medium). All these factors should be considered when determining how to communicate.

  • Audience: Always keep in mind who the audience is — are they a customer who has very little technical depth or are they your manager who has at least as much technical and domain knowledge as you?
  • Intention: What are your intentions? Is it to help your marketers understand the technical details that they will then translate into customer value or is it to explain an error you saw to an engineer?
  • Medium: Is it a quick Slack huddle? Or is this information going to be presented at the quarterly all-hands meeting?

In fact, one of the assignments in the class was to explain a technical concept to audiences with varying technical depth. Below are two of my responses from this assignment.

For a non-technical audience with no data science or computer science background:

Imagine you are a librarian. You probably would want to come up with a way to organize the books in your collection so that the visitors can easily find books that they’re looking for. For example, you might split the library area up by genre and first character of the author’s last name so that each shelf in the library has books that match a specific combination of genre and first character of the author’s last name.

A hash table works in a similar way but with data records instead of books. A computer can create a hash table by sorting records into buckets based on certain characteristics of the records. This way, the computer can later quickly work out which bucket to look through to find a record, instead of going through all the records.
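For readers who do want to see the analogy in code, here is a minimal Python sketch that was not part of the original assignment response. The book list and the names `shelf_key`, `shelves`, and `find_book` are made up for illustration; the idea is only that each book goes on the shelf determined by its genre and the first letter of the author's last name.

```python
from collections import defaultdict

# Hypothetical catalogue: (title, genre, author's last name)
books = [
    ("Dune", "Sci-Fi", "Herbert"),
    ("Hyperion", "Sci-Fi", "Simmons"),
    ("Emma", "Romance", "Austen"),
    ("The Hobbit", "Fantasy", "Tolkien"),
    ("Foundation", "Sci-Fi", "Asimov"),
]


def shelf_key(genre, last_name):
    """Pick the shelf: genre plus the first letter of the author's last name."""
    return (genre, last_name[0].upper())


# Place every book on its shelf (each shelf is one "bucket").
shelves = defaultdict(list)
for title, genre, last_name in books:
    shelves[shelf_key(genre, last_name)].append(title)


def find_book(title, genre, last_name):
    """Check only the one shelf the book could be on, not the whole library."""
    return title in shelves.get(shelf_key(genre, last_name), [])


print(find_book("Dune", "Sci-Fi", "Herbert"))
print(find_book("Dune", "Romance", "Herbert"))
```

The visitor never walks the whole library: knowing the genre and the author's name is enough to go straight to one shelf.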

For an experienced data scientist who has encountered the topic many times before (your manager at work):

A hash table is an important database design tool to increase the performance of data retrieval. First, it relies on a hash function to produce hash indexes for each record. Then once those hash indexes are created, you form a hash table that consists of two columns: a column with the hash indexes and a column with the references to records.

When you are retrieving data, if you’re using hash tables, your search will be much faster. You first apply the hash function to the key of the record you’re looking for. Now you know which hash index the record is tied to, and you only need to go through the records with that hash index. Additionally, there are nuances like whether a hash index is the best form of index to use and which fields to create the index on, which should be determined by understanding what types of queries will be most often used.
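The two-column structure and the lookup steps described above can be sketched in Python as follows. This is an illustration under stated assumptions, not how any particular database implements its hash indexes: the sample records, the bucket count `NUM_BUCKETS`, the choice of `zlib.crc32` as the hash function, and the names `hash_index`, `index`, and `lookup` are all hypothetical.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 8  # assumed number of distinct hash indexes

# Hypothetical records; "user_id" is the field we choose to index on.
records = [
    {"user_id": "u1001", "name": "Ada"},
    {"user_id": "u1002", "name": "Grace"},
    {"user_id": "u1003", "name": "Edsger"},
    {"user_id": "u1004", "name": "Barbara"},
]


def hash_index(key):
    """Map a key to a hash index (CRC32 is an arbitrary deterministic choice)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS


# The "two columns": hash index -> references (positions) of matching records.
index = defaultdict(list)
for position, record in enumerate(records):
    index[hash_index(record["user_id"])].append(position)


def lookup(user_id):
    """Hash the key, then scan only the records that share that hash index."""
    for position in index.get(hash_index(user_id), []):
        if records[position]["user_id"] == user_id:
            return records[position]
    return None  # no record with this key


print(lookup("u1002"))
print(lookup("u9999"))
```

The final comparison inside `lookup` matters because two different keys can land on the same hash index; the hash narrows the search to one bucket, and the equality check picks out the exact record.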

Jay Chung
AI Product Manager; student and Class President of MS in Data Science at USF