Introducing Vale, an NLP-powered linter for prose
We’re pleased to announce the v1.0.0 release of Vale, a command-line tool that brings code-like linting to prose. Vale is cross-platform (Windows, macOS, and Linux), written in Go (Google’s open-source programming language), and available on GitHub.
Linting is the process of ensuring that written work (source code or prose) adheres to a particular style: for example, Python’s PEP 8 style guide (code) or Google’s Documentation Style Guide (prose).
Before we get into the details of what makes Vale useful, there’s one point we’d like to clarify: Vale is not a general-purpose writing aid. It doesn’t teach you how to write; it’s a tool designed to be used by writers. More specifically, Vale focuses on the style of writing rather than its grammatical correctness — making it fundamentally different from, for example, Grammarly.
In other words, Vale focuses on ensuring consistency across multiple authors (according to customizable guidelines) rather than the general “correctness” of a single author’s work.
Your style, our editor
One of Vale’s most important features is its ability to support external styles through its extension system, which only requires some familiarity with the YAML file format (and, optionally, regular expressions).
To get a better idea of how this works, let’s look at an example from the Linode documentation:
In the above example, we’ve defined a few terms that have a particular capitalization style. If Vale finds an instance of a term that matches a pattern on the left of `swap` (case-insensitive) but doesn’t exactly match the value on the right, it issues an error. So, for example, `nodebalancer` or any other variation that doesn’t exactly match `NodeBalancer` will be flagged as an error.
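To make the matching behavior concrete, here is a minimal Python sketch of the same idea. It is not how Vale is implemented (Vale is written in Go and driven by YAML rules); the `SWAP` table and the `check_swaps` helper are hypothetical names, and only the NodeBalancer entry comes from the text above.

```python
import re

# Hypothetical swap table: pattern (matched case-insensitively) -> required form.
# Only "NodeBalancer" comes from the article; "Linode CLI" is illustrative.
SWAP = {
    "nodebalancer": "NodeBalancer",
    "linode cli": "Linode CLI",
}


def check_swaps(text, swap=SWAP):
    """Return sorted (offset, found, expected) tuples for terms not written exactly as required."""
    errors = []
    for pattern, expected in swap.items():
        # Find every occurrence of the term regardless of case.
        regex = re.compile(r"\b" + re.escape(pattern) + r"\b", re.IGNORECASE)
        for match in regex.finditer(text):
            found = match.group(0)
            # Flag it only when the casing differs from the required form.
            if found != expected:
                errors.append((match.start(), found, expected))
    return sorted(errors)


if __name__ == "__main__":
    sample = "Create a nodebalancer, then manage the NodeBalancer with the Linode cli."
    for offset, found, expected in check_swaps(sample):
        print(f"{offset}: use '{expected}' instead of '{found}'")
```

The key design point is the two-step comparison: a case-insensitive search to find candidates, followed by an exact comparison against the required form.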
While this example may appear quite simple, it’s possible to achieve fairly high coverage on complete editorial style guides. Check out the vale-boilerplate repository for an example using the 18F Content Guide.
- Improved support for markup (see the next section for more information), including the ability to ignore code blocks and target only certain sections of text (e.g., checking headers for a specific capitalization style).
- Avoiding the need to install and configure npm (Node.js), pip (Python), or other language-specific tools. With Vale, you get all the functionality in a standalone binary available for Windows, macOS, and Linux.
- The ability to easily combine, mix and match, or otherwise customize each style.
Syntax- and context-aware linting
Another feature that separates Vale from other linters is its ability to understand its input at both a syntactic and contextual level.
This level of understanding gives you fine-grained control over the linting process, including the ability to limit rules to certain sections (e.g., only headings) or ignore sections entirely (block and inline code are ignored by default).
Additionally, since Vale is built on top of an NLP library, you can also target specific segments of text. This allows you to, for example, warn about paragraphs that exceed a certain number of words or sentences that end with prepositions.
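As a rough illustration of what segment-level checks involve, the Python sketch below flags overly long paragraphs and sentences that end with a preposition. It assumes naive regex-based segmentation rather than a real NLP library, and the preposition list, the 150-word threshold, and all function names are hypothetical choices for the example, not Vale's defaults.

```python
import re

# Small illustrative preposition list; a real rule would use a fuller one.
PREPOSITIONS = {"at", "by", "for", "from", "in", "of", "on", "to", "with"}
MAX_PARAGRAPH_WORDS = 150  # hypothetical threshold


def split_paragraphs(text):
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Naive segmentation: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def lint(text, max_words=MAX_PARAGRAPH_WORDS):
    """Return warnings for long paragraphs and sentences ending in a preposition."""
    warnings = []
    for index, paragraph in enumerate(split_paragraphs(text), start=1):
        word_count = len(paragraph.split())
        if word_count > max_words:
            warnings.append(f"paragraph {index}: {word_count} words (limit {max_words})")
        for sentence in split_sentences(paragraph):
            words = re.findall(r"[A-Za-z']+", sentence)
            if words and words[-1].lower() in PREPOSITIONS:
                warnings.append(f"paragraph {index}: sentence ends with '{words[-1]}'")
    return warnings
```

Regex-based sentence splitting breaks on abbreviations such as "e.g.", which is one reason a linter benefits from proper NLP segmentation.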
Vale also supports context-specific rule configuration via HTML-style comments:
<!-- vale Style.Rule = NO -->
This is some text that has `Style.Rule` disabled.
<!-- vale Style.Rule = YES -->
This makes it possible to selectively break certain rules without having to disable them globally.
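The effect of the comment toggles shown above can be sketched in a few lines of Python. This is only a model of the behavior described here, assuming each toggle comment sits on its own line; the regular expression and the `disabled_rules_by_line` helper are hypothetical and do not reflect Vale's internals.

```python
import re

# Matches the comment form shown above, e.g. <!-- vale Style.Rule = NO -->
TOGGLE = re.compile(r"<!--\s*vale\s+([\w.]+)\s*=\s*(YES|NO)\s*-->")


def disabled_rules_by_line(text):
    """Map each non-comment line number (1-based) to the set of rules disabled on it."""
    disabled = set()
    result = {}
    for number, line in enumerate(text.splitlines(), start=1):
        match = TOGGLE.fullmatch(line.strip())
        if match:
            rule, state = match.groups()
            if state == "NO":
                disabled.add(rule)      # rule is off from here on
            else:
                disabled.discard(rule)  # rule is back on
            continue
        # Snapshot the currently disabled rules for this line of prose.
        result[number] = frozenset(disabled)
    return result
```

A linter built this way would consult the returned mapping before reporting a rule violation on a given line.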
Vale is designed to be fast enough to be included in continuous integration test suites for large (> 1,000 files) repositories. To give you an idea of its performance, we ran Vale on three GitHub repositories of varying sizes and formats:
The tests were performed using a MacBook Pro 2.9 GHz Intel Core i7 running macOS High Sierra 10.13.3.
In each case, Vale was configured to lint against its built-in write-good style. Across the three repositories, it took an average of only about 0.0441 seconds (roughly 44 ms) to lint each file.
If Vale seems like it could be useful for your company or organization, you may want to consider our Integration Assistance consulting service.
While spell-checking services are used by nearly everyone, they’re still surprisingly difficult to employ as part of a continuous integration (CI) test suite. The primary reasons for this are:
1. Markup syntax: If you’re writing technical documentation of any kind, there’s a good chance that you need to include code snippets, command-line prompts, and other non-prose sections that shouldn’t cause spelling errors.
2. Specialized terminology: Almost every organization uses terminology that you won’t find in standard spell-checking dictionaries (e.g., brand names, technical terms, and websites).
Vale’s spell-checking extension point is capable of addressing both of these issues. For (1), you get to leverage the scoping system we discussed earlier — making it possible to intelligently handle markup, which is impossible with standard spell-checking utilities.
For (2), we’ll generate a custom spelling vocabulary file from your actual content. This will be a case-insensitive, plain-text file consisting of one spelling exception per line:
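To show how a vocabulary file of this shape could be used, the Python sketch below loads one exception per line and applies it case-insensitively on top of a base dictionary. The entries in `VOCAB_TEXT`, the tiny stand-in dictionary in the usage example, and the function names are all hypothetical; this illustrates the file format described above rather than Vale's actual spell-checking code.

```python
import re

# Hypothetical vocabulary content: one spelling exception per line.
VOCAB_TEXT = """\
NodeBalancer
Linode
YAML
"""


def load_vocabulary(text):
    """Read one exception per line; lowercase so lookups are case-insensitive."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def unknown_words(prose, dictionary, vocabulary):
    """Return words found in neither the base dictionary nor the custom vocabulary."""
    words = re.findall(r"[A-Za-z]+", prose)
    return [
        word for word in words
        if word.lower() not in dictionary and word.lower() not in vocabulary
    ]


if __name__ == "__main__":
    vocabulary = load_vocabulary(VOCAB_TEXT)
    dictionary = {"configure", "a", "with"}  # stand-in for a real dictionary
    print(unknown_words("Configure a nodebalancer with YAMLL", dictionary, vocabulary))
```

Because entries are lowercased on load, a single line in the file covers every capitalization of that term.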
We’ll create a Vale-compatible version of your house style, allowing you to ensure that all of your content adheres to your internal guidelines.
You’ll receive a complete, easy-to-maintain repository (following the layout of our boilerplate) with unit tests and examples for your style guide.
We’ll create custom scoping patterns that teach Vale to handle these sections correctly, allowing you to keep your output free of false positives.
There’s still a lot planned for future versions of Vale, including adding support for LaTeX, expanding the selection of built-in styles, and adding SEO-related scopes.
If you’d like to get involved, or if you run into any problems, feel free to open an issue over at the GitHub repository.