How I contributed my first custom rule to Semgrep Rule Registry

Published in AppSec Untangled

Mar 19, 2024

When I learned that Semgrep lets users create and use custom SAST rules, I was instantly intrigued. The value we get from any SAST tool is only as good as its rules and what they do or don't cover. However experienced the security team writing those rules is, there will always be situations specific to your own company or organization that lead to false positives (reported findings that are not valid) and false negatives (valid findings that are missed).

SAST tools usually have false positives and false negatives

Example of False Negatives (Misses) from Built-in Rules

For example, your company could be using custom middleware for authorization, input validation, CSRF protection, throttling, or other security controls. Any route missing this middleware could be vulnerable, but no SAST tool's built-in rules would ever detect that, because the middleware exists only in your company's code.

Example of False Positives from Built-in Rules

Similarly, your company may use a custom input validation or sanitization function that protects against specific types of attacks. Because this function is also specific to your company, built-in SAST rules typically don't understand that context. They will still flag a finding whenever a dangerous sink is reached (e.g. user input passed to a SQL statement), even when that input first went through your custom function.

Customization Empowers Security Engineers and Developers

These are just two of many examples where built-in rules fall short because of something specific to the code base being scanned. This is why I really liked Semgrep's custom rules feature: it empowers security engineers and developers to adapt how the SAST tool runs to their environment and needs. Is a rule giving false positives? Modify it to make it more accurate. Found an issue that isn't flagged? Write a new rule to cover it.

Semgrep Custom Rules help reduce false positives and false negatives

All of that works thanks to Semgrep's versatile SAST engine and its reasonably easy-to-learn custom rule syntax, which I will demonstrate in the rest of this story. Semgrep also provides a process for contributing useful custom rules to the community through an open-source rule registry.

This has fostered a vibrant and active community around Semgrep. That community makes it easier to get help and guidance when writing and using custom rules. It also makes the open-source rule registry more useful in many situations than any rule set built by a single security team or organization. For example, a security researcher who discovers a new vulnerability can often publish a Semgrep custom rule for it within days, whereas waiting for another tool's vendor to add coverage can take months.

Where I found the chance to write my first custom rule

In the previous story on this blog, “What is wrong with this code?”, we discussed a code snippet I found interesting because it contained a subtle authentication bypass vulnerability. This kind of flaw is typically very hard to spot in code reviews or SAST scans.

To summarize, the issue came from how the decode() function of the jwt-simple library handles its arguments. It expects the 3rd argument to be noVerify. If any truthy value is passed there (e.g. a non-empty string), the JWT token's signature is not verified, potentially leading to authentication bypass. That made lines like the one below vulnerable: the developer misplaced the algorithm, putting it in the 3rd argument instead of the 4th.

const decoded = jwt.decode(token, secretKey, 'HS256');
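A few lines of JavaScript can model the check to show why the misplaced 'HS256' is dangerous. This is a simplified sketch, not jwt-simple's actual source. It assumes the library runs signature verification only when its noVerify parameter is falsy. The name signatureIsVerified is a hypothetical helper used just for illustration.

```javascript
// Simplified model (an assumption, not jwt-simple's source) of how
// decode(token, key, noVerify, algorithm) treats its third argument:
// the signature is only checked when noVerify is falsy.
function signatureIsVerified(noVerify) {
  return !noVerify;
}

// The misplaced algorithm from the vulnerable line, plus a few other values.
const cases = [
  ['HS256', signatureIsVerified('HS256')],     // non-empty string is truthy: check skipped
  ['true', signatureIsVerified(true)],         // explicit opt-out: check skipped
  ["'false'", signatureIsVerified('false')],   // the string 'false' is still truthy
  ['false', signatureIsVerified(false)],       // falsy: signature checked
  ['omitted', signatureIsVerified(undefined)], // falsy: signature checked
];

for (const [label, verified] of cases) {
  console.log(`${label.padEnd(8)} -> ${verified ? 'verified' : 'NOT verified'}`);
}
```

Under this model, the safe calls are jwt.decode(token, secretKey), which leaves noVerify unset, and jwt.decode(token, secretKey, false, 'HS256'), which passes the algorithm in the 4th position.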

The previous story's main goal was to encourage readers not to rely fully on code reviews and SAST scans, and to run dynamic tests for authentication, authorization, and other business-logic security controls. It also gave me the chance I had been waiting for to try Semgrep's custom rules feature. I scanned the code with Semgrep and got no findings (a false negative), so this was my chance to close the gap by writing a custom rule.

Writing the custom rule

I went through the official “Writing custom rules” guide to understand the rule syntax. I also played around with some examples in Semgrep's Playground, which is a perfect place to start writing and testing custom rules. Then I started writing the rule (with some help from AI, to be honest) and ended up with the rule below.

rules:
  - id: jwt-simple-noverify
    message: Detected the decoding of a JWT token without a verify step. JWT tokens
      must be verified before use, otherwise the token's integrity is unknown.
      This means a malicious actor could forge a JWT token with any claims. Set
      'verify' to `true` before using the token.
    severity: ERROR
    metadata:
      owasp:
        - A05:2021 - Security Misconfiguration
        - A07:2021 - Identification and Authentication Failures
      cwe:
        - "CWE-287: Improper Authentication"
        - "CWE-345: Insufficient Verification of Data Authenticity"
        - "CWE-347: Improper Verification of Cryptographic Signature"
      category: security
      subcategory:
        - vuln
      technology:
        - jwt-simple
        - jwt
      confidence: HIGH
      likelihood: MEDIUM
      impact: HIGH
      references:
        - https://www.npmjs.com/package/jwt-simple
        - https://cwe.mitre.org/data/definitions/287
        - https://cwe.mitre.org/data/definitions/345
        - https://cwe.mitre.org/data/definitions/347
      license: Commons Clause License Condition v1.0[LGPL-2.1-only]
      vulnerability_class:
        - Cryptographic Issues
        - Improper Authentication
    languages:
      - javascript
      - typescript
    patterns:
      - pattern-inside: |
          $JWT = require('jwt-simple');
          ...
      - pattern: $JWT.decode($TOKEN, $SECRET, $NOVERIFY, ...)
      - metavariable-pattern:
          metavariable: $NOVERIFY
          patterns:
            - pattern-either:
                - pattern: |
                    true
                - pattern: |
                    "..."

Let me walk through it to show how simple it is once you get the hang of it. Custom rules are YAML files. The first part of the rule holds the metadata: the rule id, message, severity, related CWEs and OWASP Top 10 categories, references, relevant languages, and so on. The actual logic of the rule is in the second part, shown below:

patterns:
  - pattern-inside: |
      $JWT = require('jwt-simple');
      ...
  - pattern: $JWT.decode($TOKEN, $SECRET, $NOVERIFY, ...)
  - metavariable-pattern:
      metavariable: $NOVERIFY
      patterns:
        - pattern-either:
            - pattern: |
                true
            - pattern: |
                "..."

Here is how this part works. Under patterns there are 3 fields:

1. pattern-inside

- pattern-inside: |
    $JWT = require('jwt-simple');
    ...

$JWT = require('jwt-simple'); matches the line where the library is imported. It binds the name of the variable the library is assigned to (e.g. jwt) to the metavariable $JWT so it can be reused elsewhere in the rule. The ellipsis operator ... then matches zero or more statements that follow. Effectively, this field means a finding must come after the jwt-simple library has been imported.

2. pattern

$JWT.decode($TOKEN, $SECRET, $NOVERIFY, ...) reuses the $JWT metavariable bound above and looks for calls to its decode() function with 3 or more arguments (in this context, ... means zero or more further arguments). It binds the first 3 arguments to metavariables. The 3rd one, $NOVERIFY, is the one used in the next part.

3. metavariable-pattern

This specifies which values of $NOVERIFY should produce findings. The first is true, which explicitly skips signature verification. The second is "...", which matches any string literal, because a string such as 'HS256' is truthy and also skips verification.

Together, these 3 parts match any call to the jwt-simple library's decode() function whose 3rd argument is true or a string literal. If you are interested in more details about the rule syntax, I invite you to go through the official “Writing custom rules” guide.
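A toy re-implementation of the rule's logic in plain JavaScript makes the interplay of the three parts concrete. It is only a sketch. Semgrep matches real syntax trees with its own engine, whereas this code assumes calls have already been parsed into a hypothetical, hand-written shape ({ object, method, args }, with each argument tagged as a literal or an identifier). The function and helper names are also illustrative.

```javascript
// Toy model of the jwt-simple-noverify rule (not Semgrep's engine).
// importedAs: the variable bound by `... = require('jwt-simple')` ($JWT),
//             or null if the file never imports jwt-simple.
// call:       a hypothetical pre-parsed call, e.g.
//             { object: 'jwt', method: 'decode', args: [...] }, where each arg is
//             { kind: 'literal', value } or { kind: 'identifier', name }.
function ruleMatches(importedAs, call) {
  // 1. pattern-inside: only calls after a jwt-simple import are in scope.
  if (importedAs === null) return false;

  // 2. pattern: $JWT.decode($TOKEN, $SECRET, $NOVERIFY, ...) needs the object
  //    to be the same name bound to $JWT and at least three arguments.
  if (call.object !== importedAs || call.method !== 'decode') return false;
  if (call.args.length < 3) return false;
  const noVerify = call.args[2]; // bound to $NOVERIFY

  // 3. metavariable-pattern: $NOVERIFY must be the literal true or a string literal.
  if (noVerify.kind !== 'literal') return false;
  return noVerify.value === true || typeof noVerify.value === 'string';
}

// Small builders for the hypothetical call shape.
const id = (name) => ({ kind: 'identifier', name });
const lit = (value) => ({ kind: 'literal', value });
const decodeCall = (...extra) => ({
  object: 'jwt',
  method: 'decode',
  args: [id('token'), id('secretKey'), ...extra],
});

console.log(ruleMatches('jwt', decodeCall(lit('HS256')))); // misplaced algorithm
console.log(ruleMatches('jwt', decodeCall(lit(false))));   // explicit false
console.log(ruleMatches('jwt', decodeCall()));             // only two arguments
console.log(ruleMatches(null, decodeCall(lit('HS256')))); // no jwt-simple import
```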

Writing tests for the rule

Before submitting the rule, we need a test file to confirm that the rule works as expected. For that, I took the code snippet from the last story and added a few lines covering different positive and negative cases. Positive cases must be preceded by the comment // ruleid: <my-rule-id>, and negative cases by the comment // ok: <my-rule-id>. For example:

    const jwt = require('jwt-simple');

    // ruleid: jwt-simple-noverify
    const decoded1 = jwt.decode(token, secretKey, 'HS256');

    // ruleid: jwt-simple-noverify
    const decoded2 = jwt.decode(token, secretKey, true);

    // The string 'false' is truthy, so verification is still skipped
    // ruleid: jwt-simple-noverify
    const decoded3 = jwt.decode(token, secretKey, 'false');

    // ok: jwt-simple-noverify
    const decoded4 = jwt.decode(token, secretKey);

    // ok: jwt-simple-noverify
    const decoded5 = jwt.decode(token, secretKey, false);
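The annotation convention is simple enough to sketch. The functions below are a hypothetical illustration of the idea, not Semgrep's actual test runner (which you invoke with semgrep --test). The sketch assumes that an annotation applies to the line directly below it and that every ruleid: line must produce a finding. It also assumes that any finding elsewhere, including on ok: lines, counts as unexpected.

```javascript
// Hypothetical sketch of the ruleid/ok annotation convention; not Semgrep's runner.
// An annotation comment applies to the line directly below it.
function expectedLines(source, ruleId) {
  const expected = { match: [], noMatch: [] };
  source.split('\n').forEach((line, index) => {
    const annotation = line.trim().match(/^\/\/\s*(ruleid|ok):\s*(\S+)/);
    if (!annotation || annotation[2] !== ruleId) return;
    const targetLine = index + 2; // the next line, as a 1-based line number
    if (annotation[1] === 'ruleid') {
      expected.match.push(targetLine);
    } else {
      expected.noMatch.push(targetLine);
    }
  });
  return expected;
}

// Compare expectations with the line numbers a scan reported.
function checkFindings(source, ruleId, reportedLines) {
  const { match } = expectedLines(source, ruleId);
  const expected = new Set(match);
  const reported = new Set(reportedLines);
  const missed = match.filter((line) => !reported.has(line)); // false negatives
  const unexpected = [...reported].filter((line) => !expected.has(line)); // false positives
  return { missed, unexpected, passed: missed.length === 0 && unexpected.length === 0 };
}

const testFile = [
  "const jwt = require('jwt-simple');",
  '// ruleid: jwt-simple-noverify',
  "const decoded1 = jwt.decode(token, secretKey, 'HS256');",
  '// ok: jwt-simple-noverify',
  'const decoded2 = jwt.decode(token, secretKey, false);',
].join('\n');

// A scan that reports only line 3 satisfies both annotations.
console.log(checkFindings(testFile, 'jwt-simple-noverify', [3]));
```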

Here is how the rule and its test cases look in Semgrep's Playground:

Submitting the rule

With the rule written and tested, it was time to submit it to the open-source Semgrep Rule Registry. The process is straightforward and fully documented by Semgrep. In short, you can submit the rule directly from the Semgrep Playground, which opens a pull request for you, or you can open the pull request yourself if you prefer.

The pull request review took about 2 days, and once it was merged I could see my new rule in the Semgrep Rule Registry here: https://github.com/semgrep/semgrep-rules/blob/develop/javascript/jwt-simple/security/jwt-simple-noverify.yaml

Using the rule

Now that the rule is in the Semgrep Rule Registry, anyone using a ruleset that includes it will run the new rule. For example, our rule was added under the path javascript/jwt-simple/security/jwt-simple-noverify.yaml, which means that the rulesets r/javascript and r/javascript.jwt-simple both include it.
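The naming in this example follows a visible pattern. Take the rule's file path, turn the slashes into dots, and drop the .yaml extension: the result prefixes the rule id in the check id printed by the scan (javascript.jwt-simple.security.jwt-simple-noverify.jwt-simple-noverify). The sketch below derives that id and checks whether a dotted ruleset name covers the rule. Treating ruleset inclusion as a prefix match on dotted segments is an assumption inferred from this one example, not documented registry behavior, and all function names here are hypothetical.

```javascript
// Hypothetical helpers inferred from this single example; not a Semgrep API.
// "javascript/jwt-simple/security/x.yaml" -> "javascript.jwt-simple.security.x"
function toDottedPath(rulePath) {
  return rulePath.replace(/\.ya?ml$/, '').split('/').join('.');
}

// Check id as printed in scan output: dotted file path + the rule's own id.
function qualifiedRuleId(rulePath, ruleId) {
  return `${toDottedPath(rulePath)}.${ruleId}`;
}

// Assumption: "r/<prefix>" covers every rule whose dotted path equals the
// prefix or starts with it on a segment boundary.
function rulesetCovers(ruleset, rulePath) {
  const prefix = ruleset.replace(/^r\//, '');
  const dotted = toDottedPath(rulePath);
  return dotted === prefix || dotted.startsWith(`${prefix}.`);
}

const rulePath = 'javascript/jwt-simple/security/jwt-simple-noverify.yaml';
console.log(qualifiedRuleId(rulePath, 'jwt-simple-noverify'));
for (const ruleset of ['r/javascript', 'r/javascript.jwt-simple', 'r/python']) {
  console.log(ruleset, rulesetCovers(ruleset, rulePath));
}
```

The segment-boundary check keeps a hypothetical ruleset such as r/javascript.jwt from accidentally covering javascript.jwt-simple rules.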

Here's what a finding from the new rule looks like:

$ semgrep --config="r/javascript.jwt-simple"

┌─────────────┐
│ Scan Status │
└─────────────┘
  Scanning 4209 files (only git-tracked) with 1 Code rule:
  CODE RULES
  Scanning 2 files.
  SUPPLY CHAIN RULES
  💎 Run `semgrep ci` to find dependency
     vulnerabilities and advanced cross-file findings.
  PROGRESS
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
┌────────────────┐
│ 1 Code Finding │
└────────────────┘
    app.js
   ❯❯❱ javascript.jwt-simple.security.jwt-simple-noverify.jwt-simple-noverify
          Detected the decoding of a JWT token without a verify step. JWT tokens must be verified before use,
          otherwise the token's integrity is unknown. This means a malicious actor could forge a JWT token
          with any claims. Set 'verify' to `true` before using the token.
          Details: https://sg.run/zdjod
           66┆ const decoded = jwt.decode(token, secretKey, 'HS256');
┌──────────────┐
│ Scan Summary │
└──────────────┘
Some files were skipped or only partially analyzed.
  Scan skipped: 1149 files matching .semgrepignore patterns
  For a full list of skipped files, run semgrep with the --verbose flag.
Ran 1 rule on 2 files: 1 finding.

Conclusion

Accuracy, in terms of both false positives and false negatives, is usually one of the main issues affecting SAST tools. Custom rules are one of the best ways to bridge this gap, because they create the feedback loop needed to add and modify rules and continuously improve accuracy and coverage. With Semgrep's versatile and easy-to-learn rule syntax, it is now possible for both security engineers and developers to use this feature and get the most value out of their SAST scans.

As for me, I'm looking forward to my next chance to submit another rule to Semgrep's Rule Registry. Maybe the next time you run a Semgrep scan, you'll get a finding from one of my rules. Who knows!