Finally, a Measure for Secure Development Skill

Fostering sound and secure development by measuring programmer skill.

Security vulnerabilities and data breaches are reported almost constantly, costing people time and effort dealing with potential identity theft or embarrassment, and costing companies profits and users. Vulnerabilities in software are often caused by software developers’ errors, many of which could, and should, be prevented. Many researchers and companies provide tools, guidance, and education designed to help developers avoid these errors. But it’s hard to know which ones are most effective. What approach actually helps developers produce more secure code?

One common method researchers employ is to have developers take a secure development test before and after receiving an intervention. Developers are asked to look through some code and identify a vulnerability, or to write a secure program. Unfortunately, only simple programs can be used without demanding a significant amount of the developer’s time, and even then, assessments can take hours to complete.

Alternatively, researchers have compared the number of vulnerabilities developers have introduced while working on real-world code. This approach is even more problematic. First, it can be a noisy indicator, as there are likely environmental factors, e.g., employer pressures, affecting vulnerability introduction. Also, it can be costly to collect this data: you either have to invest the time of experts to perform a security audit or wait until a malicious actor finds and exploits the vulnerability!

In our recent research, we sought to remedy this situation by developing a scale that can be used to assess how much someone knows about secure development, by asking them. Scales, or sets of predefined questions, are commonly employed by human behavior researchers to “measure elusive phenomena that cannot be observed directly” due to cost or complexity. (One of the most well-known examples is the Myers-Briggs personality scale.) Asking about something like secure development is more convenient than trying to measure skills directly, and somewhat surprisingly, it works fairly well: self-efficacy, or the belief in one’s ability to successfully perform a task, is often associated with actual skill. We call our scale the Secure Software Development Self-Efficacy Scale (SSD-SES).

The key to this research, then, is figuring out the right questions to ask. We wanted to identify a representative set of secure development tasks and ask developers to tell us how well they believed they could perform each. To do this, we needed to determine the full set of tasks required for secure development. Then, because we don’t expect people to spend hours answering questions, we needed to pare down that list, focusing on questions most likely to find differences between developers. It’s not helpful to have participants answer questions that are, for all intents and purposes, redundant or that all participants answer the same way.

Identifying secure development tasks

To identify all possible secure development tasks, we started with a review of five popular secure development guidelines.

After reviewing each of these guidelines for unique tasks, we asked 22 secure development experts to review our task list. The experts rated each task and suggested tasks we might be missing.

After this exploratory review, we were left with 58 unique tasks within 8 categories: determining security requirements, identifying attack vectors, identifying vulnerabilities, designing mitigations, designing for resiliency, testing, communicating security assumptions to colleagues, and communicating security assumptions to leadership and users.

Paring down the task list

Next, we pared down our list by surveying 157 professional software developers. The goal of this survey was to weed out tasks that most developers responded to similarly and tasks developers didn’t think were relevant. After finishing this survey, we were left with 18 tasks.
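To make the winnowing concrete, here is a minimal sketch of how one might flag items that fail to distinguish between developers. The data, cutoffs, and column names are all illustrative assumptions, not our actual analysis pipeline:

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the survey: 157 developers rating
# 58 candidate tasks on a 1-5 confidence scale.
rng = np.random.default_rng(0)
responses = pd.DataFrame(
    rng.integers(1, 6, size=(157, 58)),
    columns=[f"task_{i}" for i in range(1, 59)],
)

# Items nearly everyone answers the same way carry little information;
# a low-variance cutoff (0.5 here is an arbitrary illustration) flags them.
low_variance = set(responses.var()[lambda v: v < 0.5].index)

# Items where most respondents pick the top of the scale suffer from
# ceiling effects and also fail to distinguish developers.
ceiling = set(responses.columns[(responses == 5).mean() > 0.8])

kept = [c for c in responses.columns if c not in low_variance | ceiling]
print(f"kept {len(kept)} of {responses.shape[1]} items")
```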

At this point, we also looked for groups of tasks, known as factors, where developers’ answers are related: your answer on one task in this group gives a pretty good idea of how you’ll answer other questions in the group. In fact, we found two factors of secure development self-efficacy: vulnerability discovery and mitigation, and security communication. This tells us that developers’ self-efficacy can improve in either of these dimensions independently, and we can target education and support tools depending on which factor(s) the developer needs more help with.
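Groupings like this are typically surfaced with an exploratory factor analysis. Continuing the sketch above (and reusing its simulated `responses` and `kept` variables), here is what a two-factor fit might look like with scikit-learn; the column labels simply echo the two factors named above:

```python
import pandas as pd
from sklearn.decomposition import FactorAnalysis

# Fit a two-factor model to the retained items (requires
# scikit-learn >= 0.24 for the varimax rotation).
fa = FactorAnalysis(n_components=2, rotation="varimax")
fa.fit(responses[kept])

# Each row of loadings shows how strongly a task relates to each factor;
# tasks that load heavily on the same factor belong to the same group.
loadings = pd.DataFrame(
    fa.components_.T,
    index=kept,
    columns=["vulnerability_discovery_mitigation", "security_communication"],
)
print(loadings.round(2))
```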

Confirming our results

We tested the remaining 18 tasks with 146 developers. This second survey was intended to check whether another set of developers responded similarly, that is, whether our scale produces consistent results. Primarily, we were looking at whether the grouping of responses into two factors matched our prior survey. For the most part, it did. However, we removed 3 tasks that diverged from their previous group, resulting in our final 15-item scale.
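Consistency of this kind is usually quantified with a reliability statistic. As an illustration (on simulated data, not our survey responses), here is a small numpy implementation of Cronbach’s alpha, a standard internal-consistency measure for scales like this:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Internal-consistency estimate for a set of scale items.

    `items` is an (n_respondents, n_items) array of 1-5 responses.
    """
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical second-survey data: 146 developers, 15 final items.
rng = np.random.default_rng(1)
survey2 = rng.integers(1, 6, size=(146, 15))
print(f"alpha = {cronbach_alpha(survey2):.2f}")
```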

We also wanted to make sure SSD-SES scores tracked with other indicators you might expect to relate to knowledge or skill in secure development. To do this, we asked participants whether they had previously discovered a vulnerability or had regular contact and discussions with a security expert. Indeed, we found that this type of exposure to secure development experience correlated with higher SSD-SES scores.
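As a sketch of how such a check might look (on simulated scores, and not necessarily the exact tests from the paper), one could compare total scores between developers with and without prior vulnerability-discovery experience using a nonparametric test:

```python
import numpy as np
from scipy import stats

# Simulated illustration: total SSD-SES scores (15 items, so 15-75)
# split by whether the respondent reported finding a vulnerability before.
rng = np.random.default_rng(2)
found_vuln = rng.integers(40, 76, size=60)  # hypothetical scores
no_vuln = rng.integers(15, 66, size=86)     # hypothetical scores

# A nonparametric test is a reasonable choice for ordinal sum scores.
u_stat, p_value = stats.mannwhitneyu(found_vuln, no_vuln, alternative="greater")
print(f"U = {u_stat:.0f}, p = {p_value:.3f}")
```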

How to use SSD-SES

Our final 15-item scale, formatted as a survey ready to administer, is available here. The survey includes instructions to give developers who will answer the questions, as well as the questions themselves. To generate a total SSD-SES score, you simply sum the responses for each question (1 point for “I am not confident at all”, 5 points for “I am absolutely confident”).
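A minimal scorer might look like the following. Note that only the two endpoint labels come from the scale itself; the middle response labels here are placeholders assumed for illustration:

```python
# Map each answer to points and sum. Only the endpoint labels are taken
# from the scale; the three middle labels are assumed placeholders.
POINTS = {
    "I am not confident at all": 1,
    "I am slightly confident": 2,    # assumed label
    "I am somewhat confident": 3,    # assumed label
    "I am moderately confident": 4,  # assumed label
    "I am absolutely confident": 5,
}

def ssd_ses_score(answers: list[str]) -> int:
    """Total SSD-SES score: the sum of per-question points (15-75 for 15 items)."""
    return sum(POINTS[a] for a in answers)

print(ssd_ses_score(["I am absolutely confident"] * 15))  # maximum score: 75
```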

If you’d like to measure the impact of your developer training program on secure development self-efficacy, you can have participants take the survey before and after and compare scores. You could also use the survey to probe current secure-development confidence among your team members and use the results to determine who may need additional support.
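For instance, a before-and-after comparison on the same participants calls for a paired test; the Wilcoxon signed-rank test avoids normality assumptions. The data below is simulated, and your own analysis choices may differ:

```python
import numpy as np
from scipy import stats

# Hypothetical pre/post totals for the same 20 training participants.
rng = np.random.default_rng(3)
pre = rng.integers(20, 60, size=20)
post = pre + rng.integers(-3, 10, size=20)  # simulated change after training

# Paired scores from the same people call for a paired, nonparametric test.
stat, p_value = stats.wilcoxon(post, pre, alternative="greater")
print(f"W = {stat:.0f}, p = {p_value:.3f}")
```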

If you choose to use SSD-SES for any reason, we’d love to hear your feedback. Please feel free to email our research team at sec-pros@cs.umd.edu to tell us about your experience.

Read the full paper for additional details.


Daniel Votipka

University of Maryland PhD Candidate. Member of the Security, Privacy, and People lab. Interested in usable security and software vulnerability discovery.