Neo Salacup
Feb 12 · 4 min read

The Annual Data Visualization Community Survey for 2020 has responses to 50+ questions and was taken by over 1,700 people. It’s public data covering professional data visualization details like salary and hourly compensation, tool use, location, audiences, organizational structure and more. The dataset can be found here if you want to visualize and analyze it.

By definition, data visualization explains the importance of data by placing it in a visual context. Hopefully, I can interpret the survey based on my understanding of the data. You may get different results from the same dataset, for example by using dashboards, where otherwise unobserved text, patterns and correlations can be surfaced with visualization software.

Let’s begin by cleaning the data. I didn’t use more sophisticated tools like Tableau or Python for the analysis. Instead, I chose the basic one: Excel.

As we go along, I will explain why I chose this kind of visualization instead of other methods, such as showing the trend for each survey question.

If you look at the dataset, you will notice many BLANK or NULL values. If we plot the dataset, the questions that respondents answered appear GREEN and the questions they skipped appear RED. It’s quite exciting, right?

Snapshot of the cleaned data, with 60+ survey questions and 1,700+ respondents, in colour-coded tabular Excel form.
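I built the coloured table in Excel, but the same idea can be sketched in Python for anyone who prefers code. This is only a rough illustration: the three rows, the column names and the `is_skipped` helper are all made up for the example and are not taken from the real survey file.

```python
# Toy survey rows (hypothetical): None or "" marks a skipped question.
rows = [
    {"role": "Analyst", "hours_hobby": "", "dvs_member": "Yes"},
    {"role": None, "hours_hobby": "3", "dvs_member": "No"},
    {"role": "Designer", "hours_hobby": None, "dvs_member": "Yes"},
]


def is_skipped(value):
    """Treat None and blank or whitespace-only text as a skipped answer."""
    return value is None or str(value).strip() == ""


# One list of flags per respondent: True = skipped (RED), False = answered (GREEN).
skip_mask = [[is_skipped(value) for value in row.values()] for row in rows]

# Print a text version of the coloured table, one line per respondent.
for flags in skip_mask:
    print(" ".join("RED" if flag else "GREEN" for flag in flags))
```

Every later count (per question or per respondent) can be derived from a mask like this one.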

On average, each respondent skipped thirty (30) percent of the survey questions. Some studies suggest that using skip logic means better data. Skip logic is a survey design feature that lets you send a respondent to a later page of your survey, or to a specific question on a later page. It means you can design personalized surveys where respondents answer only the questions that apply to them. As we go along, let’s understand the trends in the skipping behavior of the respondents.
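For readers who want to see how an "average share of questions skipped per respondent" figure can be computed, here is a minimal sketch in Python. It assumes a skipped/answered table where `True` means skipped. The toy `skip_mask` below is invented for illustration, so its result is not the 30% from the real data.

```python
from statistics import mean

# Hypothetical skipped/answered flags: one inner list per respondent,
# one flag per question (True = skipped).
skip_mask = [
    [False, True, False, False],
    [True, True, False, False],
    [False, False, False, True],
]


def skip_fraction(flags):
    """Share of questions this respondent skipped, between 0 and 1."""
    return sum(flags) / len(flags)


# Skip share for each respondent, then the average across respondents.
per_respondent = [skip_fraction(flags) for flags in skip_mask]
average_skip_rate = mean(per_respondent)

print(per_respondent)
print(f"{average_skip_rate:.0%} of questions skipped on average")
```

Note that this number mixes two things: questions a respondent chose not to answer and questions that skip logic never showed them. The raw blanks alone cannot tell those apart.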

After cleaning the data, the most skipped question is “What are you pursuing a degree in?”, which a whopping 93% of respondents (over 1,600) skipped. The free-text questions were also some of the more problematic ones, as respondents struggled with the purpose of generic degree questions. At the extreme, 37 respondents skipped 66 out of 67 questions. That’s the maximum number of questions skipped by a single respondent.

Looking at this 60+ question survey, we’ve seen that skip logic can have a large influence. The least skipped question is “Are you a Data Visualization Society member?”, skipped by only 51 of the 1,700+ respondents, or 3%. The minimum number of questions skipped by a single respondent is five (5).
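The most and least skipped questions, and the maximum and minimum skipped by one respondent, are just column and row totals of the skipped/answered table. A small Python sketch of that counting follows. The question labels and the four-respondent `skip_mask` are hypothetical, chosen only so the totals are easy to check by hand.

```python
# Hypothetical short labels for three survey questions.
questions = ["degree_field", "role", "dvs_member"]

# Hypothetical flags: one inner list per respondent (True = skipped).
skip_mask = [
    [True, False, False],
    [True, True, False],
    [True, False, False],
    [False, True, True],
]
n_respondents = len(skip_mask)

# Column totals: how many respondents skipped each question.
skip_counts = {
    question: sum(row[i] for row in skip_mask)
    for i, question in enumerate(questions)
}
most_skipped = max(skip_counts, key=skip_counts.get)
least_skipped = min(skip_counts, key=skip_counts.get)

# Row totals: how many questions each respondent skipped.
skipped_per_respondent = [sum(row) for row in skip_mask]

for question, count in skip_counts.items():
    print(f"{question}: {count} skipped ({count / n_respondents:.0%})")
print("most skipped:", most_skipped, "| least skipped:", least_skipped)
print("max per respondent:", max(skipped_per_respondent))
print("min per respondent:", min(skipped_per_respondent))
```

In Excel, the same totals come from counting blanks down each column and across each row.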

Besides, the completion rate (the percentage of respondents who completed the whole survey) was relatively low. We can see a spike in skipping frequency in the middle of the survey. Thus, the overall increase in skipping could be correlated with the length of the survey.
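One rough way to check whether skipping grows as the survey goes on is to list the skip rate of each question in survey order, then compare the first half with the second half and find the peak. The sketch below does that in Python. The six rates are invented placeholders, not values from the survey, and comparing halves is my own simplification rather than a formal correlation test.

```python
from statistics import mean

# Hypothetical skip rate per question, listed in survey order.
skip_rate_by_position = [0.05, 0.10, 0.40, 0.55, 0.35, 0.60]

# Split the survey into an early half and a late half.
midpoint = len(skip_rate_by_position) // 2
first_half_rate = mean(skip_rate_by_position[:midpoint])
second_half_rate = mean(skip_rate_by_position[midpoint:])

# 1-based position of the question with the highest skip rate.
peak_position = skip_rate_by_position.index(max(skip_rate_by_position)) + 1

print(f"first half: {first_half_rate:.0%}, second half: {second_half_rate:.0%}")
print("peak at question", peak_position)
```

A higher rate in the second half would be consistent with survey fatigue, but it would not prove it, because skip logic can also hide later questions from many respondents.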

Looking at the trends in the survey questions that require NUMERIC input, the question “How many hours per week do you spend doing data visualization as a hobby or side project?” is the most skipped, by over 1,400 respondents.

It shows that skipping rises rapidly across the questions that ask for numeric input.

Skipping for questions that require TEXT input is lower than for the numeric ones. However, the skipping rate doubles for some of them, especially the open-ended, free-text questions that try to differentiate between problems with doing data visualization as a hobby and specific professional issues.
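To compare NUMERIC and TEXT questions side by side, one option is to tag each question with its input type and pool the skip counts per type. Here is a minimal Python sketch of that grouping. The question labels, their types, the counts and the respondent total are all hypothetical stand-ins for the real survey columns.

```python
from collections import defaultdict

# Hypothetical input type for each question label.
question_types = {
    "hours_hobby": "numeric",
    "years_experience": "numeric",
    "biggest_frustration": "text",
    "role": "text",
}
# Hypothetical number of respondents who skipped each question.
skip_counts = {
    "hours_hobby": 8,
    "years_experience": 2,
    "biggest_frustration": 6,
    "role": 1,
}
n_respondents = 10  # hypothetical total

# Pool skipped answers and possible answers for each input type.
skipped = defaultdict(int)
possible = defaultdict(int)
for question, input_type in question_types.items():
    skipped[input_type] += skip_counts[question]
    possible[input_type] += n_respondents

# Overall skip rate per input type.
skip_rate_by_type = {t: skipped[t] / possible[t] for t in skipped}

for input_type, rate in skip_rate_by_type.items():
    print(f"{input_type}: {rate:.0%} skipped")
```

Pooling the counts weights every answer slot equally, which keeps one rarely shown question from dominating the comparison.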

I loved the survey questions on different visualization tools because they show you the alternatives available when you tell your story, and the various roles available if we get more serious about the data science arena. The last challenge also raised the need to differentiate between data visualization as a professional role and cases where no such function exists. With so many skipped questions, next year’s survey will probably be designed based on these results to minimize NULL answers.

If you end up with a better-suited way of processing the survey dataset for this challenge, please feel free to file a request, and I will be glad to include it in the updates.

