A Non-Inclusive Language Detector Lint Rule for SwiftKey

Published in Microsoft Mobile Engineering, Mar 4, 2022

Both Android Studio and SwiftKey’s build chain make use of Lint rules — assertions that code should comply with some particular condition, for example that an indent in a source file should use spaces and not tabs, and that there should be four spaces per “level” and not, say, five or three.

As well as these pre-defined rules, it’s also possible to create custom ones, for example a rule that might be specific to an organisation’s use of language within code, or a rule that discourages use of a particular library. This article walks through a custom rule we built in our team to align with our company culture: a Non-Inclusive Language Detector.

Define what’s operated upon

Rules are broadly split into those which operate upon resources and those which operate upon code (Java and Kotlin), although it is also possible to write a single rule which operates on both. The two kinds behave slightly differently and implement different interfaces.

Java and Kotlin

Java (which for this discussion also includes Kotlin) is converted by the Source Code Scanner framework into a Universal Abstract Syntax Tree (UAST). This abstracts away language-specific details and presents both languages in a unified way, so a single rule is broadly applicable to both. So, to write a rule which targets Java, create a class which implements the SourceCodeScanner interface:

class NonInclusiveLanguageDetector : SourceCodeScanner { … }

We then need to override one or more of the interface’s methods depending upon which features of the code we want to inspect, e.g. constants, method names or classes. These come in pairs, where there’s one method to return an optional list of specific targets (not implementing or returning null will mean all nodes of that type are passed to the visitor receiver), and another visitor receiver which accepts each of those targets in turn and performs some sort of validation on them.

getApplicableMethodNames(): List<String>?
visitMethodCall(JavaContext, UCallExpression, PsiMethod)

getApplicableConstructorTypes(): List<String>?
visitConstructor(JavaContext, UCallExpression, PsiMethod)

getApplicableReferenceNames(): List<String>?
visitReference(JavaContext, UReferenceExpression, PsiElement)

applicableSuperClasses(): List<String>?
visitClass(JavaContext, UClass)

getApplicableUastTypes(): List<Class<out UElement>>?
createUastHandler(JavaContext)

The last of these pairs is intended for more complicated scenarios and works slightly differently from the others: instead of implementing a single visitor method, we return a UAST handler which itself implements a visitor receiver for each of the types specified in getApplicableUastTypes(). We’re going to implement this pair in order to inspect the text content of our source files and look for non-inclusive language.
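To make the "applicable types plus handler" pairing more concrete, here is a toy model of the dispatch in plain Kotlin. Every name in it (ToyNode, ToyClass, ToyMethod, ToyHandler, ToyScanner, dispatch) is a hypothetical stand-in invented for illustration. None of them are lint classes, and the real framework does considerably more than this.

```kotlin
// Toy stand-ins for UAST node types. These are NOT the real lint classes.
sealed interface ToyNode { val text: String }
data class ToyClass(override val text: String) : ToyNode
data class ToyMethod(override val text: String) : ToyNode

// Toy handler with one visitor per node type, loosely mirroring UElementHandler.
open class ToyHandler {
    open fun visitClass(node: ToyClass) {}
    open fun visitMethod(node: ToyMethod) {}
}

// Toy detector: declares which node types it wants and supplies a handler for them.
interface ToyScanner {
    fun applicableTypes(): List<Class<out ToyNode>>
    fun createHandler(): ToyHandler
}

// Toy driver: walks the nodes and only dispatches the types the scanner asked for.
fun dispatch(scanner: ToyScanner, nodes: List<ToyNode>) {
    val wanted = scanner.applicableTypes()
    val handler = scanner.createHandler()
    for (node in nodes) {
        if (wanted.none { it.isInstance(node) }) continue
        when (node) {
            is ToyClass -> handler.visitClass(node)
            is ToyMethod -> handler.visitMethod(node)
        }
    }
}
```

A scanner that lists only ToyClass::class.java and overrides visitClass() would see the class nodes and never the method nodes, which is the same division of labour as between getApplicableUastTypes() and createUastHandler().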

Implement the rule

In our rule, we want to inspect the entire contents of every Java or Kotlin Class, so we return the type UClass (Universal Class) as a single entry in the applicable UAST list:

override fun getApplicableUastTypes(): List<Class<out UElement>> {
  return listOf<Class<out UElement>>(UClass::class.java)
}

Other objects that extend UElement include UFile, UMethod, UVariable and UEnumConstant. There’s a definitive list in the visitor definitions of UElementHandler.

We then need to create a UAST handler which extends the UElementHandler class and which, in this case, only has to override a single method because we only specified a single applicable UAST type:

override fun createUastHandler(context: JavaContext): UElementHandler {
  return object : UElementHandler() {
    override fun visitClass(node: UClass) { … }
  }
}

Now that we have a node, we can do various things with it. In our example, we’re interested in any text in the class, so we can just access the text field. However, there are many other properties which can be inspected in the node object, including constructors, comments, initialisers, inner classes or methods. So, extending our code example we now have:

override fun visitClass(node: UClass) {
  val text = node.text.lowercase(Locale.ENGLISH)
  for (term in listOf("blacklist", "whitelist", …)) {
    if (text.contains(term)) { /* report it */ }
  }
}
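The matching step can be tried out on its own, without any lint dependency. The sketch below is a minimal stand-alone version of it. The function name findTerms is hypothetical and not part of our rule.

```kotlin
import java.util.Locale

// Hypothetical helper: returns every entry of `terms` that occurs in `text`,
// ignoring case. This is plain substring matching, so an identifier such as
// "whitelisted" also matches the term "whitelist".
fun findTerms(text: String, terms: Collection<String>): List<String> {
    val haystack = text.lowercase(Locale.ENGLISH)
    return terms.filter { term -> haystack.contains(term.lowercase(Locale.ENGLISH)) }
}
```

For instance, findTerms("val WhiteList = 1", listOf("blacklist", "whitelist")) returns a list containing only "whitelist", because the text is lower-cased before the comparison.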

Reporting the error

The final step in this part of the rule is to actually report the error. To do this, we call the `report` method on the JavaContext that’s passed to us in the createUastHandler call. This takes an Issue object, the node we’re reporting on, a named location of the node, and a string containing the user-facing message we want to show with the error. Together, it all looks like this:

context.report(
  ISSUE,
  node,
  context.getNameLocation(node),
  "There's an error in %s".format(term)
)

where ISSUE is an additional object which specifies details about what sort of error we want to treat it as, and contains a unique ID, a brief and a long description, a category (for instance CORRECTNESS, SECURITY or COMPLIANCE), a numerical priority, a severity (e.g. FATAL, ERROR, WARNING or INFORMATIONAL) and an Implementation. For our example, it looks like this:

@JvmField
val ISSUE: Issue = Issue.create(
  "NonInclusiveLanguageUse",
  "Find usages of non-inclusive language",
  "Looks for non-inclusive language in XML resources and Java/Kotlin files…",
  Category.CORRECTNESS,
  7,
  Severity.ERROR,
  IMPLEMENTATION
)

The @JvmField annotation above is necessary because we define ISSUE as a val in the class’s Companion object, but we need Java to be able to see the definition as if it was a regular public static field.

The IMPLEMENTATION in the above Issue is another Companion-object field which defines a kind-of scope of operations. It is created by passing in a reference to the detector class (NonInclusiveLanguageDetector::class.java) and a vararg list of scopes. In our case, because we’re writing a class to analyse Java and XML we only need to pass in Scope.JAVA_AND_RESOURCE_FILES, giving an object like this:

private val IMPLEMENTATION = Implementation(
  NonInclusiveLanguageDetector::class.java,
  Scope.JAVA_AND_RESOURCE_FILES,
)

Pre-run set up

There’s one last thing that we do in our implementation of the rule, and that’s to load the list of non-inclusive words we want to scan for from a resources file, rather than hard-code them in the rule. If nothing else, that means we can have a separate non-changing list of words in our tests. This, or any other set up we want to perform before the rule actually runs, can be done by overriding the beforeCheckRootProject() method. Ours looks like this:

override fun beforeCheckRootProject(context: Context) {
  nonInclusiveTerms = File(
    context.project.dir,
    "nonInclusiveTerms"
  )
    .readLines()
    .map { it.trim() }
    .filter { it.isNotEmpty() && !it.startsWith("##") }
    .toHashSet()
}
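The filtering part of that set-up is easy to check in isolation. As a minimal sketch, assuming the same rules as above (trim each line, drop blanks, treat lines starting with "##" as comments), it could be pulled out into a helper. The name parseTerms is hypothetical and does not appear in our rule.

```kotlin
// Hypothetical helper mirroring the filtering done in beforeCheckRootProject():
// trims each line, drops empty lines and "##" comment lines, and de-duplicates.
fun parseTerms(lines: List<String>): Set<String> =
    lines
        .map { it.trim() }
        .filter { it.isNotEmpty() && !it.startsWith("##") }
        .toHashSet()
```

Note that only a double hash marks a comment here: a line beginning with a single "#" would be kept as a term.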

That’s it. The entire block for our Java-analysis Lint rule should now look like:

override fun createUastHandler(context: JavaContext): UElementHandler {
  return object : UElementHandler() {
    override fun visitClass(node: UClass) {
      val text = node.text.lowercase(Locale.ENGLISH)
      for (term in nonInclusiveTerms) {
        if (text.contains(term)) {
          context.report(
            ISSUE,
            node,
            context.getNameLocation(node),
            "Error in %s".format(term)
          )
        }
      }
    }      
  }
}

All that remains is to add the new rule to the Issue Registry, which in SwiftKey means adding the NonInclusiveLanguageDetector.ISSUE definition from above to the list of issues in SwiftKeyIssueRegistry, a class which extends IssueRegistry. The Implementation field contained in the Issue, as defined in the rule class’s companion object, includes a reference to NonInclusiveLanguageDetector::class.java — it’s this that provides the connection back to the class which actually contains the rule.

XML Resources

Testing non-Java files, such as XML layouts, navigation, values and drawables, requires extending a different abstract class: ResourceXmlDetector(). This needs an implementation of the method appliesTo(), which performs a similar function to getApplicableUastTypes(), except that it returns a Boolean indicating whether we want to inspect files in the given resource folder type. In our example, we implement it like this:

override fun appliesTo(folderType: ResourceFolderType): Boolean {
  return ResourceFolderType.VALUES == folderType ||
    ResourceFolderType.LAYOUT == folderType ||
    ResourceFolderType.XML == folderType ||
    ResourceFolderType.NAVIGATION == folderType
}

We can then further specify which elements we are interested in (we use a pre-defined constant, an empty collection which is treated as meaning everything), and finally implement a single visitor common to all XML data, which together looks like this:

override fun getApplicableElements(): Collection<String> {
  return XmlScannerConstants.ALL
}

override fun visitElement(context: XmlContext, element: Element) {
  findAndReportProblemStrings(context, element, nonInclusiveTerms)
}

From this point, incoming XML Elements can be handled as standard XML Nodes using conventional parsing with the W3C Document Object Model (DOM). For instance, node.attributes?.item(0)?.textContent gives the value of the element’s first attribute (and nodeName gives its name). To look for actual text content, the element can be accessed as its super-type Node and its children walked like this:

node.childNodes?.let {
  if (it.length > 0) {
    for (c in 0 until it.length) {
      val child = it.item(c)
      if (child.nodeType == Node.TEXT_NODE) {
        // check text contents here //
      }
    }
  }
}
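The same DOM calls can be exercised outside lint with the JDK’s own XML parser. The sketch below parses an XML string and collects an element’s direct text and attribute values. The helper names (parseRoot, directText, attributeValues) are hypothetical and are not the findAndReportProblemStrings() helper used in our rule.

```kotlin
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element
import org.w3c.dom.Node

// Parses an XML string with the JDK parser and returns its root element.
fun parseRoot(xml: String): Element =
    DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(ByteArrayInputStream(xml.toByteArray()))
        .documentElement

// Collects the trimmed, non-blank contents of the element's direct text nodes.
fun directText(element: Element): List<String> {
    val result = mutableListOf<String>()
    val children = element.childNodes
    for (i in 0 until children.length) {
        val child = children.item(i)
        if (child.nodeType == Node.TEXT_NODE) {
            val text = child.textContent.trim()
            if (text.isNotEmpty()) result.add(text)
        }
    }
    return result
}

// Collects the values of all attributes on the element.
fun attributeValues(element: Element): List<String> {
    val attrs = element.attributes
    return (0 until attrs.length).map { attrs.item(it).nodeValue }
}
```

Either list can then be scanned for the non-inclusive terms in the same way as the class text on the Java side.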

Reporting any errors works in the same way as we saw in Java, except that we call report() on the supplied XmlContext, and supply the Element and context.getLocation(element) as its location.

Conclusion

Once your first custom rule has been set up, it’s easy to create as many more as you need. We’ve used this one to check for non-inclusive language (and it will break our toolchain until any issue has been fixed), but we have many others as well, for example one which just warns us about using a particular library in Kotlin when native functions could do the job just as well.

Are you using custom Lint rules already? Got any good ones to share?