What coding skills do devs want to develop?

How do developers’ skills relate to each other, and how will those skills evolve?

In a recent Stack Overflow Developers’ Survey, 70% of the respondents specified their role (front-end, database admin, etc.), and many of them see themselves as wearing multiple hats.

devs of all trades

We can see that web development is a generic skill shared by devs of all trades, while machine learning specialists and quality assurance engineers are more specialised.

And here is the number of respondents by role:

The median number of languages used frequently is 2.

number of programming languages used

Around 80% use 1 to 4 programming languages, though they may well know more.

With that in mind, we can compare the languages respondents have used with the ones they want to acquire.
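As a rough illustration of how these two figures could be computed, here is a minimal sketch. It assumes each answer arrives as one semicolon-separated string; the sample answers and the helper name `parse_languages` are made up for the example and are not taken from the survey.

```python
from statistics import median

# Hypothetical sample answers, one string per respondent.
responses = [
    "Python; SQL",
    "JavaScript",
    "Java; JavaScript; SQL",
    "C#; JavaScript; SQL; TypeScript; Python",
    "Go; Python",
]


def parse_languages(answer):
    """Split a semicolon-separated answer into a list of clean names."""
    return [item.strip() for item in (answer or "").split(";") if item.strip()]


# Number of languages listed by each respondent.
counts = [len(parse_languages(answer)) for answer in responses]

# Median number of languages per respondent.
median_count = median(counts)

# Share of respondents who listed between 1 and 4 languages.
share_one_to_four = sum(1 <= c <= 4 for c in counts) / len(counts)

print(median_count, share_one_to_four)
```

On real data the same two lines at the end would be applied to the full list of answers instead of the toy sample.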

Current skills vs Future skills

A snippet of the survey data

As the data contains what devs have worked on vs what they wish to develop, we can extract the additional languages, platforms, frameworks and databases each respondent wants to pick up, which is often a list of several items. We can look at them in aggregate and in detail.

We can see much interest in TypeScript, Swift, Go, Scala, R, Python, Haskell, Rust and so on.

And if we look into the co-occurrence of languages in each individual’s wishlist, it looks like this:
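One way to sketch the extraction step is a set difference between what a respondent wants and what they have already used. The example below assumes both answers are semicolon-separated strings; the sample rows and the function names `parse_items` and `additional_items` are illustrative, not part of the survey or of the original code.

```python
from collections import Counter


def parse_items(answer):
    """Turn a semicolon-separated answer into a set of clean names."""
    return {item.strip() for item in (answer or "").split(";") if item.strip()}


def additional_items(have, want):
    """Items a respondent wants to work with but has not worked with yet."""
    return parse_items(want) - parse_items(have)


# Hypothetical rows: (have worked with, want to work with).
rows = [
    ("Python; SQL", "Python; Go; Rust"),
    ("JavaScript", "JavaScript; TypeScript; Go"),
]

# "In detail": one wishlist per respondent.
wishlists = [additional_items(have, want) for have, want in rows]

# "In aggregate": how often each item appears across all wishlists.
totals = Counter(item for wishlist in wishlists for item in wishlist)

print(wishlists)
print(totals.most_common())
```

The same function works unchanged for platforms, frameworks and databases, since it only compares two lists of names.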

language skills devs want to acquire
platforms that devs want to work on in addition to what they have worked on
frameworks that devs want to work on in addition to what they have worked on
databases that devs want to work on in addition to what they have worked on
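The co-occurrence behind charts like these can be approximated by counting every pair of items that appears in the same wishlist. This is only a sketch of that idea with made-up wishlists; `cooccurrence_counts` is a hypothetical helper, and the resulting pair counts would serve as edge weights in a network chart.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(wishlists):
    """Count how many wishlists contain each unordered pair of items."""
    pair_counts = Counter()
    for wishlist in wishlists:
        # Sorting makes ("Go", "Rust") and ("Rust", "Go") the same key.
        for pair in combinations(sorted(set(wishlist)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical wishlists, one set per respondent.
wishlists = [
    {"Go", "Rust"},
    {"Go", "Rust", "Swift"},
    {"Swift"},
]

pairs = cooccurrence_counts(wishlists)
for (a, b), weight in pairs.most_common():
    print(a, b, weight)
```

A wishlist with a single item contributes no pairs, which is one reason the respondents without co-occurrence do not show up in the network.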

Further questions and thoughts

For a dataset with co-occurrence patterns, how can we intuitively visualize the portion with co-occurrence together with the portion without?

For example, among respondents using SQL, most of them use not only SQL but also other languages.

I feel one way is to introduce interactivity and animation: by providing the option of zooming in on a language of interest, one can see the portion of respondents using this language on its own vs using it as part of a toolkit, and then zoom out to the details of the co-occurring language skills.

For survey data, how can we be sure it’s representative of the population? And if it has an imbalance of categories, how can we take that into account when analysing a dataset with network patterns?

If JavaScript is the most popular language surveyed, it could simply be that a lot of web developers returned the survey. As a result I am cautious about using node size to indicate the number of respondents, so for now every node's size shows its degree (the number of connections it has).
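The number behind such a zoomed-in view could be as simple as the split between solo users and toolkit users of one language. Here is a small sketch under that assumption; the skill sets are invented and `solo_vs_combined` is a hypothetical helper name.

```python
def solo_vs_combined(language, skill_sets):
    """Return (solo share, combined share) among users of `language`."""
    users = [skills for skills in skill_sets if language in skills]
    if not users:
        return 0.0, 0.0  # nobody uses this language
    solo = sum(1 for skills in users if len(skills) == 1)
    return solo / len(users), (len(users) - solo) / len(users)


# Hypothetical skill sets, one per respondent.
skill_sets = [
    {"SQL"},
    {"SQL", "Python"},
    {"SQL", "Java", "C#"},
    {"Go"},
    {"SQL", "JavaScript"},
]

solo_share, combined_share = solo_vs_combined("SQL", skill_sets)
print(solo_share, combined_share)
```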
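To make the difference between the two sizing options concrete, the sketch below computes both for a toy dataset. It assumes degree means the number of distinct other languages a language co-occurs with (unweighted); the data and the function name `degree_and_count` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def degree_and_count(skill_sets):
    """Return (degree per language, respondent count per language)."""
    neighbours = defaultdict(set)
    respondents = defaultdict(int)
    for skills in skill_sets:
        for skill in skills:
            respondents[skill] += 1
            neighbours[skill]  # make sure isolated languages get a node too
        for a, b in combinations(sorted(skills), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    degree = {skill: len(others) for skill, others in neighbours.items()}
    return degree, dict(respondents)


# Hypothetical sample in which JavaScript is over-represented.
skill_sets = [
    {"JavaScript"},
    {"JavaScript"},
    {"JavaScript", "SQL"},
    {"Python", "SQL"},
    {"Python", "R"},
]

degree, respondents = degree_and_count(skill_sets)
print(degree)
print(respondents)
```

In this toy sample JavaScript has the most respondents but only one connection, so the two sizing choices would tell different stories.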

How can we best visualize the evolution of a network in a collective manner, given all the individual changes?

For example, a developer uses Go, Python and Ruby today and wants to use Go, Python, Ruby and PHP in the future. How can we visualize the added language given the current language set? This is also different from the scenario where a VBA programmer wants to use Python instead, which is a shift of skill rather than an augmentation of it.
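Before visualizing, the two scenarios can at least be told apart with set operations. The sketch below is one possible labelling, not an established definition: it calls a change an "augmentation" when languages are only added and a "shift" when some are added and some are dropped. The function name `classify_change` and the labels are assumptions made for the example.

```python
def classify_change(current, wanted):
    """Label the change from `current` to `wanted` and list what moved."""
    current, wanted = set(current), set(wanted)
    added = wanted - current
    dropped = current - wanted
    if added and not dropped:
        kind = "augmentation"  # keeps everything, adds something new
    elif added and dropped:
        kind = "shift"  # replaces at least one language with another
    elif dropped:
        kind = "reduction"  # only drops languages
    else:
        kind = "unchanged"
    return kind, added, dropped


# The two scenarios from the text.
print(classify_change({"Go", "Python", "Ruby"}, {"Go", "Python", "Ruby", "PHP"}))
print(classify_change({"VBA"}, {"Python"}))
```

Counting these labels across all respondents would give one collective summary of how the network is changing.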


This is #day40 of my #100dayprojects on data science and visual storytelling. Full code on my GitHub. Thanks for reading. Suggestions of new topics and feedback are always welcome.