A survey of static analyzer configurations: Clippy for Rust, part 1
Static analysis is fantastic!
It allows developers to get rid of many problems that can occur after “git push” is performed. Its utility varies depending mostly on language and platform, but overall, it can provide tremendous benefits, such as, to name a few:
- Keeping the project’s code style consistent.
- Early detection of potential security vulnerabilities.
- Enforcing code conventions related not only to code style but also to code hierarchy and complexity of its parts.
Even though static analysis is a handy technique, many developers tend to ignore it.
The most experienced developers can often write code that passes even the pickiest linters on the first try. They don’t bother with static analysis because they expect it to tell them nothing new, and they most likely won’t find this article useful.
However, many developers, especially inexperienced ones, don’t know which tools to use or how to configure them, and so they skip static analysis altogether. At the same time, they are the most likely to make silly mistakes that modern linters catch easily, which makes them the ones who stand to benefit most from code analysis.
We decided to dig into this topic and write about the most popular linting tools: what they are capable of, who can find them useful, and, most interestingly, how developers configure them and which options are most handy.
Table of contents
- What is Clippy?
- Methodology
- Infographics
- Parameters
1. doc-valid-idents
2. cognitive-complexity-threshold
3. too-many-arguments-threshold
4. type-complexity-threshold
5. blacklisted-names
- Conclusion
What is Clippy?
In this article, we focus on Clippy, a static analysis tool for the Rust programming language. We chose to start with it mainly because we damn LOVE Rust, and because pretty much everyone treats Clippy as the official, industry-standard linter for the language.
According to maintainers, Clippy is “A collection of lints to catch common mistakes and improve your Rust code.” It contains over 400 rules and is highly configurable. Clippy can check your code for correctness, style, complexity, performance, and more, and so far it has no alternatives with matching capabilities.
The tool is useful for all Rust developers — for solo and team projects, local and cloud use, and a wide variety of software domains. Moreover, some of its rules are so widely used and loved that they eventually go upstream to the Rust compiler as warnings.
Methodology
While dipping into the topic of using Clippy among Rust developers, we came up with several questions:
- How often is Clippy being configured?
- Which configuration parameters do users define most often?
- Which values do they assign to them?
- What are the optimal values?
To answer them, we decided to research how developers of open-source projects use Clippy configs on GitHub. For a method, we chose a statistical study.
To collect the data, we used a script written in Python. We interacted with GitHub using pygithub — Python bindings for GitHub API v3. The data was stored using TinyDB.
Clippy uses configuration files in the TOML format, which can have one of two names: clippy.toml or .clippy.toml. To obtain the data, we used the GitHub code search.
Among almost 100k Rust repositories on GitHub, we found only 1073 config files with the names specified above. After skipping 175 empty files, we had 898 left.
We collected the following information on each repository containing the Clippy config file:
- repository name
- repository URL
- Clippy config file path
- Clippy config file text
Next, the content of each config file was validated and parsed. We dropped 102 files with broken formatting at this step.
We ended up with 796 valid config files for further analysis.
We published all the data we used here in case you want to use it.
Here they are — configurations.
Overall, Clippy is rarely configured — even though there are almost 100k Rust projects on GitHub, we have only found around one thousand configs, some of which were empty.
Infographics
Here you can find some exciting infographics with the conclusions we made while analyzing them.
Axis X of the given histogram represents the number of parameters redefined in a config file.
Axis Y shows the share of analyzed configuration files that redefine exactly that number of parameters.
Here we can see that more than 80% of users redefined only one parameter. Less than 10% redefined two, and the case when three or more parameters are redefined is extremely rare.
The overall distribution of redefined parameters.
Here we can see that:
- All parameters that were redefined fall into just two distinct groups: identifier lists and thresholds.
- Three definite leaders comprise over half of the parameters redefined in the dataset configs.
Parameters
There are a lot of configurable options in Clippy, but here we’ll describe only the most commonly used ones:
- doc-valid-idents
- cognitive-complexity-threshold
- too-many-arguments-threshold
- type-complexity-threshold
- blacklisted-names
There will be a brief description of each option, an analysis of frequently used options, and some recommendations.
doc-valid-idents
A config parameter for Clippy lint doc_markdown
Among the other awesome things Clippy does, it looks through the doc-comments and ensures that all identifiers are appropriately highlighted. For this, it tries to find any non-whitespace sequences which don’t quite fit the form of an ordinary word written according to grammar rules — for example, ‘words’ with underscores or with capital letters in the middle.
Highlighting identifiers the right way is essential because RustDoc uses this markdown to attach smart hyperlinks to these identifiers whenever they appear in the text. However, sometimes Clippy gets it wrong and points to a word you don’t need to mark as code — for instance, GitHub or LaTeX. The doc-valid-idents property allows you to list the identifiers that do not need highlights and should be ignored.
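To make the idea concrete, here is a minimal sketch of the kind of check described above. It is a simplified approximation written for this article, not Clippy’s actual implementation: the function name needs_backticks and its exact rules (underscores, or an inner capital letter combined with a lowercase one) are our own assumptions.

```rust
/// Returns true when `word` looks like a code identifier that should be
/// wrapped in backticks. Simplified heuristic for illustration only; the
/// real `doc_markdown` lint uses its own, more careful rules.
fn needs_backticks(word: &str, valid_idents: &[&str]) -> bool {
    // Strip surrounding punctuation such as trailing commas or periods.
    let word = word.trim_matches(|c: char| !c.is_alphanumeric() && c != '_');
    // Words listed in `doc-valid-idents` are always accepted as plain text.
    if word.is_empty() || valid_idents.contains(&word) {
        return false;
    }
    // An underscore inside a word suggests a snake_case identifier.
    let has_underscore = word.contains('_');
    // A capital letter after the first character, together with at least
    // one lowercase letter, suggests a CamelCase identifier.
    let inner_upper = word.chars().skip(1).any(|c| c.is_uppercase());
    let has_lower = word.chars().any(|c| c.is_lowercase());
    has_underscore || (inner_upper && has_lower)
}
```

With this sketch, a word such as GitHub is flagged unless it appears in the list passed as valid_idents, which mirrors what the doc-valid-idents property is for.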
Example
Consider you have a comment with the following text:
/// This structure contains _phantom_data
Here we’ve got a name without backticks: _phantom_data. It represents a valid code identifier that should be wrapped in backticks according to RustDoc rules.
Clippy would give you the following message:
Hereinafter we’ll use the screenshots of monocodus comments based on Clippy output as examples because we find them a little more visual and informative than plain Clippy output. Check more examples of monocodus output here.
To fix this snippet, we should just add some backticks:
/// This structure contains `_phantom_data`
Here we are. Comments are OK now.
Default value
The doc-valid-idents parameter defaults to:
[ "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "DirectX", "ECMAScript", "GPLv2", "GPLv3", "GitHub", "GitLab", "IPv4", "IPv6", "ClojureScript", "CoffeeScript", "JavaScript", "PureScript", "TypeScript", "NaN", "NaNs", "OAuth", "GraphQL", "OCaml", "OpenGL", "OpenMP", "OpenSSH", "OpenSSL", "OpenStreetMap", "TensorFlow", "TrueType", "iOS", "macOS", "TeX", "LaTeX", "BibTeX", "BibLaTeX", "MinGW", "CamelCase", ]
It contains words that commonly appear in code comments but are rarely used as identifiers.
Of course, it’s not exhaustive, so many developers prefer to extend it with both other commonly used words and some more project-specific ones.
Commonly used options
We found 178 unique identifiers used for this option. The vast majority of them appeared 1–2 times, so we took ones that occurred more than ten times. There were 25 of them.
Recommendations
As we mentioned above, this parameter is highly individual. You should configure it manually, taking into account the lexicon of your project.
However, we can suggest you expand the default parameter using the data we collected.
According to this issue, doc-valid-idents currently doesn’t support extending the list, only overriding it. So if you just copy-paste our values, you will lose the default ones.
This explains why so many of the configs we found also contain the default values.
To complement default values with ours, you should set the parameter as follows:
doc-valid-idents = [ "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "DirectX", "ECMAScript", "GPLv2", "GPLv3", "GitHub", "GitLab", "IPv4", "IPv6", "ClojureScript", "CoffeeScript", "JavaScript", "PureScript", "TypeScript", "NaN", "NaNs", "OAuth", "GraphQL", "OCaml", "OpenGL", "OpenMP", "OpenSSH", "OpenSSL", "OpenStreetMap", "TensorFlow", "TrueType", "iOS", "macOS", "TeX", "LaTeX", "BibTeX", "BibLaTeX", "MinGW", "CamelCase", "FreeBSD", "CppCon", "HashDoS", "SipHash", "SwissTable", "SQLite", "WebIDL" ]
cognitive-complexity-threshold
A config parameter for Clippy lint cognitive complexity
Cognitive complexity is a modern metric, introduced by SonarSource, that represents how hard it is for humans to understand a given code snippet (module, function, etc.).
It takes into account breaks in linearity, nesting, and the interweaving of logical operators in functions. In short: if your code has high cognitive complexity, it’s potentially hard to understand, and you probably should rework it to be more straightforward.
The addition of this feature into Clippy has a long history. In the end, the linter of cognitive complexity replaced the cyclomatic complexity one, which maintainers considered to be not useful. You can find a brief description of what Clippy counts to determine the cognitive complexity here.
The cognitive-complexity-threshold parameter determines the maximum cognitive complexity allowed for functions in your project.
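As an illustration of what “breaks in linearity” and nesting look like in practice, the sketch below shows the same logic written twice: once with deeply nested branches and once flattened with early returns and a helper. The function names are hypothetical, and we deliberately don’t quote exact complexity scores, since those depend on how Clippy counts each construct.

```rust
/// Nested version: every extra level of `if` inside the loop makes the
/// function harder to follow and adds to its cognitive complexity.
fn classify_nested(values: &[i32]) -> Vec<&'static str> {
    let mut out = Vec::new();
    for &v in values {
        if v >= 0 {
            if v % 2 == 0 {
                if v > 100 {
                    out.push("large even");
                } else {
                    out.push("small even");
                }
            } else {
                out.push("odd");
            }
        } else {
            out.push("negative");
        }
    }
    out
}

/// Helper with early returns: each case is handled on its own line,
/// so the reader never has to track more than one level of nesting.
fn classify_one(v: i32) -> &'static str {
    if v < 0 {
        return "negative";
    }
    if v % 2 != 0 {
        return "odd";
    }
    if v > 100 { "large even" } else { "small even" }
}

/// Flat version: same behavior as `classify_nested`, built on the helper.
fn classify_flat(values: &[i32]) -> Vec<&'static str> {
    values.iter().map(|&v| classify_one(v)).collect()
}
```

Both versions return the same labels for the same input; the second one simply spreads the decisions over smaller, more linear pieces.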
Example
If Clippy finds a function with too high cognitive complexity in your project, it will give you the following error:
Default value
The default value for this parameter is 25.
Commonly used options
As you can see on the histogram, preferred values of maximal cognitive complexity vary widely. There are a few bizarre cases (like 1 or 1300), which we didn’t count. In the end, the mean value, according to the dataset, is 43.
Recommendations
In our opinion, the default value of 25 for this option is already too high to be useful. However, the data shows just the opposite: a lot of people are uncomfortable seeing this lint triggered so often. So if you don’t want to spend too much time thinking about it, just set the value to 45. If, however, you’d rather prioritize writing more easily understandable code, you can lower the value to 15 or even 10, which should still accommodate a lot of idiomatic Rust code. We checked this on a couple of our projects, and even pretty complex code, written idiomatically, may not reach a complexity of 5 even once.
too-many-arguments-threshold
A config parameter for Clippy lint too many arguments
Functions with bloated parameter lists are hard to understand, use, and maintain. They’re generally considered to be substandard code style and even more — a code smell.
A long parameter list often appears when several possible algorithms are combined in a single method and extra parameters are needed to choose between them.
Parameter too-many-arguments-threshold defines how many arguments each function is allowed to have.
Example
Imagine a function with the following signature:
fn foo(too: i32, many: i64, redundant: u32, parameters: i16, in_this: i8, func: i64)
Yes, it’s awful.
With the threshold lowered to 5 or less (the default of 7 would accept these six arguments), Clippy would react like this:
And God, Clippy is right.
Default value
The default value for this parameter is 7.
Commonly used options
There is one weird case of 100 parameters allowed. We don’t know why, so let’s just ignore it.
As we can see, distribution is concentrated in bounds from 1 to 20. The mean value is 11.
Recommendations
A good number of parameters highly depends on what exactly you are developing. In some cases, you just have to pass a massive amount of flags into functions, so you cannot reduce the number of arguments without breaking the code. Generally, a shorter parameter list is better, but you always have to consider your particular case.
In short: the default value of 7 is excellent for most cases. If you feel that it’s too low and you just can’t fit within it, you can increase it as you need. But if you seem to need more than 20 parameters somewhere, you are most likely doing something wrong.
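Before raising the threshold, it may be worth checking whether the arguments can be grouped. One common refactoring, sketched below, is to bundle related arguments into a struct. The struct name FooParams and the body of foo are hypothetical; they only reuse the parameter names from the example signature above.

```rust
/// Groups the six arguments of the example `foo` into one value.
/// `Default` lets callers set only the fields they care about.
#[derive(Debug, Clone, Copy, Default)]
struct FooParams {
    too: i32,
    many: i64,
    redundant: u32,
    parameters: i16,
    in_this: i8,
    func: i64,
}

/// Takes a single argument instead of six. The body (a plain sum) is a
/// placeholder so the example has observable behavior.
fn foo(p: &FooParams) -> i64 {
    i64::from(p.too)
        + p.many
        + i64::from(p.redundant)
        + i64::from(p.parameters)
        + i64::from(p.in_this)
        + p.func
}
```

A call site then names each value explicitly, for example foo(&FooParams { too: 1, many: 2, ..Default::default() }), which also makes it harder to swap two arguments of the same type by accident.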
type-complexity-threshold
A config parameter for Clippy lint type complexity
Type complexity is a metric that represents how complex the type is, how hard it is to understand, use, and maintain. There’s not much information on how it’s defined and counted, so we had to inspect that checker’s implementation.
In short — the complexity of the given type is counted using the following rules:
- In the beginning, the score is 0, and the nesting level is 1, and there’s a visitor that traverses the type elements recursively top-down.
- Any inferred type (_), reference, or raw pointer adds 1 to the score and does not change the nesting level when going downward.
- Any slice, named type, array, or tuple adds 10*nesting_level to the score, then adds 1 to the nesting level when going downward.
- Any function type adds 50*nesting_level to the score, then adds 1 to the nesting level when going downwards.
- When evaluating a trait object, the result depends on whether the trait bounds have lifetimes:
  - If they do, it adds 50*nesting_level to the score, then adds 1 to the nesting level when going downwards.
  - If they don’t, it adds 20*nesting_level to the score and does not increase the nesting level.
- Any other entity does not change the score and doesn’t change the nesting level.
Example
Given the following type:
use std::marker::PhantomData;

struct OverComplicatedType<A, B, C, D> {
    _phantom_data: PhantomData<(A, B, C, D)>,
}
Let’s calculate its complexity step by step.
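Traversal complete: PhantomData adds 10 at nesting level 1, the tuple adds 20 at level 2, and each of A, B, C, and D adds 30 at level 3, so the resulting type complexity is 10 + 20 + 4 × 30 = 150.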
We’ve checked this result by running Clippy twice, with type complexity thresholds of 149 and 150.
In the first case, it gave us the following warning:
In the second case, it was silent. This experiment confirms the correctness of the method described above.
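The scoring rules listed above can also be sketched as a small recursive function. This is a toy model, not Clippy’s code: the Ty enum and the score function are our own hypothetical names, a type is reduced to the four categories the rules mention, and the bounds of a trait object are not traversed any further.

```rust
/// A tiny model of a type tree (not Clippy's real internal types).
enum Ty {
    /// Inferred type, reference, or raw pointer: `_`, `&T`, `*const T`.
    InferOrPtr(Option<Box<Ty>>),
    /// Named type (with its generic arguments), slice, array, or tuple.
    Normal(Vec<Ty>),
    /// Function type, with argument and return types as children.
    BareFn(Vec<Ty>),
    /// Trait object; the flag says whether its bounds have lifetimes.
    TraitObject { has_lifetimes: bool },
}

/// Computes the complexity score of `ty` at the given nesting level,
/// following the rules from the list above.
fn score(ty: &Ty, nest: u64) -> u64 {
    match ty {
        // +1, nesting level unchanged for whatever is behind the pointer.
        Ty::InferOrPtr(inner) => 1 + inner.as_ref().map_or(0, |t| score(t, nest)),
        // +10 * nesting level, children are one level deeper.
        Ty::Normal(children) => {
            10 * nest + children.iter().map(|c| score(c, nest + 1)).sum::<u64>()
        }
        // +50 * nesting level, children are one level deeper.
        Ty::BareFn(children) => {
            50 * nest + children.iter().map(|c| score(c, nest + 1)).sum::<u64>()
        }
        // +50 or +20 times the nesting level, depending on lifetimes.
        Ty::TraitObject { has_lifetimes } => {
            if *has_lifetimes { 50 * nest } else { 20 * nest }
        }
    }
}

/// Builds the model of `PhantomData<(A, B, C, D)>` from the example.
fn phantom_example() -> Ty {
    let param = || Ty::Normal(vec![]); // a plain named type such as `A`
    let tuple = Ty::Normal(vec![param(), param(), param(), param()]);
    Ty::Normal(vec![tuple])
}
```

Calling score(&phantom_example(), 1) starts at nesting level 1 and yields 150, the same value as the manual calculation.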
Default value
The default value for this parameter is 250.
Commonly used options
Axis X is scaled logarithmically.
We can see that the most popular option is 10000, and a lot of people also set this option to 999999. The reason is that a type complexity of 10000 is hard to exceed, so assigning this value effectively disables the lint: it’s unlikely that any type you’ll ever use will hit it.
For analysis, we divided the results into two groups.
The first one contains values < 10000. There is nothing between 1000 and 10000, so de facto, these values are <= 1000. It represents developers who seem to use this lint.
The second one is for values >= 10000. It represents developers who deliberately set this value to turn the lint off.
The geometric mean for the first group is 411.134.
We didn’t calculate it for the second group because it makes no sense: values of 10000 and above all produce the same result and disable the lint.
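For readers unfamiliar with it, the geometric mean is the n-th root of the product of n values, which makes it less sensitive than the arithmetic mean to a few very large thresholds. Below is a minimal sketch of how it can be computed; the function name is ours, and the sample values in the note after the code are purely illustrative and are not taken from the dataset.

```rust
/// Geometric mean of strictly positive values, computed through logarithms
/// to avoid overflow when multiplying many large numbers.
/// Returns `None` for an empty slice or when any value is not positive.
fn geometric_mean(values: &[f64]) -> Option<f64> {
    if values.is_empty() || values.iter().any(|&v| v <= 0.0) {
        return None;
    }
    let log_sum: f64 = values.iter().map(|v| v.ln()).sum();
    Some((log_sum / values.len() as f64).exp())
}
```

For two illustrative thresholds, 100 and 1000, the geometric mean is the square root of 100000, roughly 316, while the arithmetic mean would be 550.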
Recommendations
This lint is one of the most controversial in Clippy’s arsenal. As we can see from the dataset, 37% of the configs that set this option use it just to disable the lint.
This metric is certainly not perfect. It cannot unambiguously indicate lousy code, so many developers prefer not to use it because they know better which level of complexity is acceptable (or they think so). If you’re skilled enough to see such things yourself, you may like to set this threshold to 10000 to turn this lint off.
However, if you’re not a very experienced developer, you may find this lint handy. Sometimes it can point out code with redundant type complexity that should be reworked for maintainability and intelligibility. In this case, the default value, 250, can be good enough, and if you feel that your project needs some more complex types, you can increase it, for example, to the geometric mean we calculated: 411.
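One common way to rework an overly complex type is to factor parts of it into type aliases, so that no single type expression is deeply nested. The sketch below assumes a hypothetical event-handler table; the names Callback, HandlerTable, register, and dispatch are ours and only serve the illustration.

```rust
use std::collections::HashMap;

/// Without aliases, the table would be spelled out everywhere as
/// `HashMap<String, Vec<Box<dyn Fn(i32) -> i32>>>`.
type Callback = Box<dyn Fn(i32) -> i32>;
type HandlerTable = HashMap<String, Vec<Callback>>;

/// Adds a callback for the given event name.
fn register(table: &mut HandlerTable, event: &str, cb: Callback) {
    table.entry(event.to_string()).or_default().push(cb);
}

/// Runs every callback registered for `event` and collects the results.
/// Unknown events produce an empty vector.
fn dispatch(table: &HandlerTable, event: &str, payload: i32) -> Vec<i32> {
    table
        .get(event)
        .map(|cbs| cbs.iter().map(|cb| cb(payload)).collect())
        .unwrap_or_default()
}
```

The signatures now mention only short names, which tends to be easier to read regardless of the threshold you settle on.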
blacklisted-names
A config parameter for Clippy lint blacklisted names.
Probably the simplest lint. It checks all variable names present in the code against the list of forbidden identifiers defined in the config. It’s important to know that it checks only variable names, so function names, argument names, type names, etc. are left unchecked.
Example
Well, I don’t think we need an example here, but let’s give it a try for the sake of a consistent format.
Suppose you’ve got the name “quux” blacklisted in the config, and you’ve got the following code snippet somewhere in your project:
let quux = 6;
Clippy will react as follows:
Default value
The default value for this parameter is ["foo", "baz", "quux"].
Commonly used options
Here’s the list of all options found. As we can see, the top 3 exactly repeat the default config, because Clippy doesn’t allow extending the default list, only overriding it.
Besides that, nine names occurred five times or more.
Recommendations
It’s quite hard to recommend something for this lint. If you think that some names should never be used in code, just add them to the config. But if you want to extend the default list a little using the data we found, you can set the parameter as follows:
blacklisted-names = ["foo", "baz", "quux", "toto", "tata", "titi", "42", "bar", "unreadable_literal"]
Conclusion
This article discussed the most popular linting tool for Rust, Clippy, and its five most popular config options. Clippy has a lot of handy options and can be useful for all developers, experienced or not.
Clippy is one of the best and most convenient existing static analysis tools. Its default configs are suitable for most cases, and it’s pretty rarely reconfigured. We got only 796 valid config files among almost 100k Rust repositories on GitHub. Even though not every single Rust developer uses Clippy, it’s ubiquitous, so we can say that just a small fraction of users find the default configs not good enough.
Although Clippy is mostly good with the default config, sometimes one may need to change something. These modifications mainly consist of increasing or decreasing thresholds or extending the default name lists, but sometimes they relate to something more complex, like cognitive complexity or type complexity. Clippy’s official documentation lacks detail on how individual lints work, and we’ve tried to fill the gap with this article.
Below we repeat the values that are, in our opinion, optimal for each config parameter discussed in this article. If you have never used Clippy before and need some config to start, or just never tried to configure this precious linter, you can use them.
doc-valid-idents = [ "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "DirectX", "ECMAScript", "GPLv2", "GPLv3", "GitHub", "GitLab", "IPv4", "IPv6", "ClojureScript", "CoffeeScript", "JavaScript", "PureScript", "TypeScript", "NaN", "NaNs", "OAuth", "GraphQL", "OCaml", "OpenGL", "OpenMP", "OpenSSH", "OpenSSL", "OpenStreetMap", "TensorFlow", "TrueType", "iOS", "macOS", "TeX", "LaTeX", "BibTeX", "BibLaTeX", "MinGW", "CamelCase", "FreeBSD", "CppCon", "HashDoS", "SipHash", "SwissTable", "SQLite", "WebIDL" ]

# Use this value if you want to write readable, understandable code. In other cases, just leave this parameter at its default.
cognitive-complexity-threshold = 15

# If you have to pass many arguments to functions and the default value is too low, set it to 10 or 15, depending on your case. Never set it above 20.
too-many-arguments-threshold = 10

# Use this value if you are often required to use some complex types. In other cases, leave the parameter at its default.
type-complexity-threshold = 411

blacklisted-names = ["foo", "baz", "quux", "toto", "tata", "titi", "42", "bar", "unreadable_literal"]
See you in the next articles :)
P.S.
We integrated Clippy and all the other tools we discuss into monocodus — a code analysis platform that incorporates the best open-source and self-developed static analysis tools with tight integration into GitHub.
It provides reports on code flaws and even more — code suggestions in the form of GitHub pull request code reviews.
Monocodus has friendly default configuration options for all integrated linters, so if you don’t want to dig into the configs yourself, just click “install” and enjoy our automated code reviews.
That’s all for now.
This article was written by:
Pavel Ignatovich, software engineer
Alena Yuryeva, software engineer
on behalf of Monocodus