How to “Shift-Left” SAST scans (Semgrep as an example)

Published in AppSec Untangled · 9 min read · Jan 26, 2024

“Shift-Left” has become quite the buzzword, and you have probably already seen a dozen talks and blog posts that discuss the security benefits of “Shifting-Left” and give high-level suggestions for achieving it. In this story I want to focus instead on what “Shifting-Left” looks like in the real world. To do that, we will take one Application Security activity, SAST scans, discuss two ways to “Shift it left”, and illustrate the benefits with a real-world example using Semgrep.

What does “Shift-left” actually mean?

The gist of “Shifting-Left” is to run security activities and scans as early as possible in the SDLC, so that developers get feedback early and unnecessary friction and delays are avoided. The other consideration is to embed these activities and scans in the tools and workflows the developers already use, rather than create new, disruptive workflows.

Hence, to be able to “Shift-left” a security activity or scan, we need to follow the tasks, tools, and workflows the developers go through to deliver a project, and embed the security activity or scan in the most fitting place within the existing workflow. So let’s try to do that!

Let’s follow the development process

Let’s try to dive deep into what developers do when they are developing a new feature or project. Most of the features and projects would typically follow a workflow like the one below:

A “User story” is submitted and goes through the normal sprint planning process until it is assigned and scheduled to start.
The feature or project goes through the design phase until a design and implementation plan is completed.
Once development starts, typically a new feature branch is added to an existing repo.
Then developers start committing code to the new branch and testing the new code locally or on a test environment.
Once the developers are satisfied with the new code changes, they submit a pull request to merge them to the main branch of the repo.
This is typically when the code changes go through a code review and different types of tests.
Once the review and tests are complete, the pull request is approved and the code is merged to the main branch.
This triggers a pipeline to deploy the code to production.

Where NOT to run SAST scans

To see the value of “Shifting-Left”, let’s first look at the many disadvantages of running the SAST scan at the step furthest to the right: the final pipeline that deploys the new code to production.

First of all, this gives very little time to triage and fix any issues and could lead to either delaying launches or deciding to launch without fixing the findings.
Also, if we run a full repo scan, we will probably get many findings that were not introduced by the new feature and are not relevant to it. These findings still need to be fixed, but it doesn’t make sense to block the launch over them, and doing so usually causes a lot of friction between security and engineering teams.
Any fix has to repeat many of the steps already performed for the new code, such as testing on a test environment and code review, which adds a lot of unnecessary repetitive work.
A blocked pipeline for one feature that has findings could sometimes lead to another feature that has no findings also becoming delayed until the pipeline is unblocked.

As you can see, this is a terrible experience for everyone: security teams, developers, and even other teams that may not be involved at all.

Now let’s Shift-Left

Arguably, a much better place to run the SAST scan is when the Pull Request is submitted. This is the stage where a “code review” is performed, and the SAST scan findings provide useful input for that review. It also gives the developers enough time to triage and fix the findings themselves.

Also, some SAST scanners support running a scan to only show the new findings introduced in the new branch instead of a full repo scan, which ensures the findings are relevant to the new feature and the developers have enough context to triage and fix the findings.
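To make the idea of "only the new findings" concrete, here is a minimal conceptual sketch in Python. It is not how Semgrep or any other scanner implements baseline scanning; the `fingerprint` and `new_findings` helpers and the sample findings are hypothetical and only illustrate the principle of subtracting the baseline results from the current results.

```python
from typing import Dict, List, Tuple

Finding = Dict[str, str]


def fingerprint(finding: Finding) -> Tuple[str, str, str]:
    """Identify a finding by rule, file, and matched code (not line number),
    so code that merely moves within a file is not reported as new."""
    return (finding["check_id"], finding["path"], finding["lines"].strip())


def new_findings(baseline: List[Finding], current: List[Finding]) -> List[Finding]:
    """Return the findings present in the current scan but not in the baseline."""
    seen = {fingerprint(f) for f in baseline}
    return [f for f in current if fingerprint(f) not in seen]


# Hypothetical sample data: one pre-existing finding, one introduced by the branch.
baseline_scan = [
    {"check_id": "rule.sql-injection", "path": "routes/index.js", "lines": "db.query(q)"},
]
current_scan = baseline_scan + [
    {"check_id": "rule.open-redirect", "path": "app.js", "lines": "res.redirect(req.query.url);"},
]

for f in new_findings(baseline_scan, current_scan):
    print(f["check_id"], f["path"])
```

Only the finding introduced on the branch is reported; the pre-existing one stays in the backlog instead of blocking the pull request.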

Example: Semgrep scan for Pull Requests

Let’s see how this could work using the open source version of Semgrep, and GitHub Actions. For this example, I’ve forked a sample Node.js application to the repo maboelkheir-test1/nodejs-shopping-cart.

Let’s start by cloning the repo.

git clone https://github.com/maboelkheir-test1/nodejs-shopping-cart
cd nodejs-shopping-cart

Next, let’s install semgrep and run a quick scan with the default auto configuration.

$ python3 -m pip install semgrep
$ semgrep --config auto
┌─────────────┐
│ Scan Status │
└─────────────┘
  Scanning 25 files tracked by git with 1707 Code rules, 606 Pro rules:
  Language      Rules   Files          Origin      Rules
 ─────────────────────────────        ───────────────────
  <multilang>      59      50          Community    1101
  js              241       4          Pro rules     606
  json              4       4
  yaml             28       2
  python          352       1
  bash              4       1
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
┌──────────────────┐
│ 12 Code Findings │
└──────────────────┘
... Verbose findings redacted ...
┌──────────────┐
│ Scan Summary │
└──────────────┘
Some files were skipped or only partially analyzed.
  Scan was limited to files tracked by git.
Ran 1707 rules on 25 files: 12 findings.

As you can see we have 12 findings in our backlog. Note that these are existing findings that are not relevant to the feature we are going to develop.

In this repo, we have added the .github/workflows/semgrep.yml file with the below contents. This is the file that configures the GitHub Actions workflow that runs Semgrep when a Pull Request is submitted and uploads the findings as comments.

name: Semgrep CI
on:
 pull_request:
   paths:
     # '**.js' matches .js files in any directory; '*.js' matches only the repository root
     - '**.js'
     - '**.jsx'
jobs:
 semgrep:
   runs-on: ubuntu-latest
   container:
     image: returntocorp/semgrep:latest
   steps:
     - name: Checkout code
       uses: actions/checkout@v2
       with:
         fetch-depth: 0
     - name: Run Semgrep
       run: |
         env
         semgrep --config p/r2c --baseline-commit HEAD~ --json > findings.json
     - name: Set up Python environment
       uses: actions/setup-python@v2
       with:
         python-version: '3.x'
     
     - name: Install dependencies
       run: pip install requests
          
     - name: Post comments for each finding
       run: python post_comments.py
       env:
         GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

Note that the command semgrep --config p/r2c --baseline-commit HEAD~ --json > findings.json runs the scan, and the option --baseline-commit HEAD~ only shows findings introduced in the new branch.

We have also added this post_comments.py file which uses GitHub’s REST API to upload the findings as comments. This is used from within the above GitHub Actions workflow:

import json
import os
import requests
from urllib.parse import quote

# Load the findings from the JSON file
with open('findings.json', 'r') as f:
    findings = json.load(f)
# Parse the repository and pull request number from the GITHUB_REF environment variable
pr_number = os.environ['GITHUB_REF'].split('/')[2]
owner = os.environ['GITHUB_REPOSITORY_OWNER']
repo = os.environ['GITHUB_REPOSITORY'].split('/')[1]
# Set up the headers for the GitHub API request
headers = {
    'Authorization': f'token {os.environ["GITHUB_TOKEN"]}',
    'Accept': 'application/vnd.github.v3+json',
}
# Fetch the commits on the pull request
response = requests.get(
    f'https://api.github.com/repos/{owner}/{repo}/pulls/{pr_number}/commits',
    headers=headers,
)
response.raise_for_status()
# Parse the response JSON
commits = response.json()
# The API lists commits oldest first, so the latest commit is the last item
latest_commit = commits[-1]['sha']
# Iterate over the findings and post a comment for each one
for finding in findings['results']:
    body = f'''
## <img src="https://semgrep.dev/docs/img/semgrep.svg" width="30" height="30"> Semgrep finding
* **Rule ID:** {finding['check_id']}
* **File:** {finding['path']}
* **Line:** {finding['start']['line']}
* **Description:** {finding['extra']['message']}
* **Impact:** {finding['extra']['metadata']['impact']}
* **Confidence:** {finding['extra']['metadata']['confidence']}
* **Semgrep Rule:** [Link](https://semgrep.dev/r/{quote(finding['check_id'])})
    '''
    payload = {
        'body': body,
        'commit_id': latest_commit,
        'path': finding['path'],
        'line': finding['start']['line'],
    }
    response = requests.post(
        f'https://api.github.com/repos/{owner}/{repo}/pulls/{pr_number}/comments',
        headers=headers,
        json=payload,
    )
    if response.status_code != 201:
        raise Exception(f'Failed to post comment: {response.content}')
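One thing to watch for in the script above: it indexes `finding['extra']['metadata']['impact']` and `['confidence']` directly, so a finding from a rule that does not carry those metadata fields would raise a `KeyError` and stop the remaining comments from being posted. A minimal sketch of a more tolerant lookup follows; the `describe_finding` helper and the `"not specified"` fallback text are assumptions for illustration, not part of the original script.

```python
def describe_finding(finding: dict) -> dict:
    """Collect the fields used in the PR comment, with fallbacks for missing ones."""
    extra = finding.get("extra", {})
    metadata = extra.get("metadata", {})
    return {
        "rule_id": finding.get("check_id", "unknown"),
        "path": finding.get("path", "unknown"),
        "line": finding.get("start", {}).get("line"),
        "message": extra.get("message", ""),
        # Not every rule is guaranteed to define these metadata keys.
        "impact": metadata.get("impact", "not specified"),
        "confidence": metadata.get("confidence", "not specified"),
    }


# Hypothetical finding that lacks the optional metadata block entirely.
sample = {
    "check_id": "example.rule-id",
    "path": "app.js",
    "start": {"line": 60},
    "extra": {"message": "Example message"},
}
print(describe_finding(sample))
```

The comment body could then be built from the returned dictionary instead of from nested subscripts.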

Now let’s start development by creating a new branch:

git checkout -b test_semgrep_pr

Now let’s add some extra code to app.js. For the sake of testing, we will introduce the following snippet at line 57, a route that is vulnerable to open redirect:

app.get('/vulnerable', (req, res) => {
  if (req.query.url) {
    res.redirect(req.query.url);
  } else {
    res.redirect('https://www.example.com');
  }
});

Then we commit the changes and push them.

git commit -m "added vulnerable code to test semgrep PR scan" app.js
git push origin test_semgrep_pr

Now we can create the pull request.

This automatically triggers the GitHub Actions workflow shown above, which generates a finding for the open-redirect vulnerability and adds a comment with the details to the PR at the line of the finding, as shown below. Note also that none of the 12 existing findings from our initial scan appear here, as they are not relevant to the new code.

Semgrep findings as Pull Request comment

As you can see this gives clear and actionable feedback to the developers which is relevant to the feature they are working on, and now they can work on fixing this as part of the code review.
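As for what a fix could look like, the Semgrep rule message suggests an allow-list approach to validating URLs. Below is a minimal sketch of that check, written in Python to match post_comments.py; the Express route would apply the same logic before calling res.redirect. The `ALLOWED_REDIRECT_HOSTS` set, the `DEFAULT_REDIRECT` value, and the `safe_redirect_target` helper are hypothetical names chosen for illustration.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical allow-list: only these hosts may be redirect targets.
ALLOWED_REDIRECT_HOSTS = {"www.example.com"}
DEFAULT_REDIRECT = "https://www.example.com"


def safe_redirect_target(requested: Optional[str]) -> str:
    """Return the requested URL only if it is https and its host is allow-listed."""
    if not requested:
        return DEFAULT_REDIRECT
    try:
        parts = urlsplit(requested)
    except ValueError:
        # Malformed URLs fall back to the default target.
        return DEFAULT_REDIRECT
    if parts.scheme == "https" and parts.hostname in ALLOWED_REDIRECT_HOSTS:
        return requested
    return DEFAULT_REDIRECT


print(safe_redirect_target("https://www.example.com/cart"))
print(safe_redirect_target("https://evil.test/phish"))
```

Comparing the parsed hostname, rather than checking whether the string starts with a trusted prefix, avoids being fooled by URLs such as `https://www.example.com@evil.test/`.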

Now let’s “Shift-Left” even more

Another possible place to run SAST scans is when code is committed locally, which can be done through pre-commit hooks. This way, developers get relevant findings as soon as they try to commit code that introduces a vulnerability, which lets them discover and fix issues even earlier. The disadvantage of this approach is that pre-commit hooks are much harder to enforce and monitor than PR scans, so my recommendation is to use them along with PR scans, not instead of them.

Example: Semgrep scan as a pre-commit hook

In your working directory, create the file .git/hooks/pre-commit with the below content and make it executable.

#!/bin/sh

# Get the last commit hash
last_commit=$(git rev-parse HEAD)
# Run Semgrep with the --baseline-commit option
semgrep --config p/r2c --baseline-commit "$last_commit"

Now follow the same steps as above. Once you try to commit the vulnerable code, you should get the output below, showing the new finding.

$ git commit -m "added vulnerable code to test semgrep PR scan" app.js

┌─────────────┐
│ Scan Status │
└─────────────┘
  Scanning 1 file tracked by git with 137 Code rules, 62 Pro rules:
  Language      Rules   Files          Origin      Rules
 ─────────────────────────────        ───────────────────
  js               33       1          Community      75
  <multilang>       1       1          Pro rules      62
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
  Current version has 2 findings.
Creating git worktree from '5dddadc6c197aa1553541aaf54fa8104c118f823' to scan baseline.
┌─────────────┐
│ Scan Status │
└─────────────┘
  Scanning 1 file tracked by git with 2 Code rules:
  Scanning 1 file with 2 js rules.
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
┌────────────────┐
│ 1 Code Finding │
└────────────────┘
    app.js
       javascript.express.security.audit.express-open-redirect.express-open-redirect
          The application redirects to a URL specified by user-supplied input `req` that is not
          validated. This could redirect users to malicious locations. Consider using an allow-list
          approach to validate URLs, or warn users they are being redirected to a third-party website.
          Details: https://sg.run/EpoP

           60┆ res.redirect(req.query.url);

┌──────────────┐
│ Scan Summary │
└──────────────┘
Some files were skipped or only partially analyzed.
  Scan was limited to files changed since baseline commit.
Ran 137 rules on 1 file: 1 finding.
A new version of Semgrep is available. See https://semgrep.dev/docs/upgrading
[master c4faad1] added vulnerable code to test semgrep PR scan
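Note that in the output above the commit is still created (the last line shows the new commit hash), so the hook as written informs the developer without blocking. Git aborts a commit when the pre-commit hook exits with a non-zero status, so a team that wants blocking behaviour needs the hook to return one; Semgrep's `--error` flag is one way to get that. If you would rather block only on some findings, the sketch below shows one possible policy, assuming the hook runs Semgrep with `--json` and passes the output to a helper. The `hook_exit_code` function, the `BLOCKING_SEVERITIES` policy, and the sample severity value are assumptions for illustration.

```python
import json

# Assumed policy: only findings with these severities block the commit.
BLOCKING_SEVERITIES = {"ERROR"}


def hook_exit_code(semgrep_json: str, blocking=BLOCKING_SEVERITIES) -> int:
    """Return 1 (abort the commit) if any finding has a blocking severity, else 0."""
    results = json.loads(semgrep_json).get("results", [])
    blocking_findings = [
        r for r in results if r.get("extra", {}).get("severity") in blocking
    ]
    for r in blocking_findings:
        print(f"{r['path']}:{r['start']['line']} {r['check_id']}")
    return 1 if blocking_findings else 0


# Illustrative input shaped like Semgrep's JSON output; the severity is made up.
sample_output = json.dumps({
    "results": [{
        "check_id": "javascript.express.security.audit.express-open-redirect.express-open-redirect",
        "path": "app.js",
        "start": {"line": 60},
        "extra": {"severity": "ERROR"},
    }]
})
print(hook_exit_code(sample_output))
```

A hook script would pass the returned value to `sys.exit` so that git sees the status code.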

Conclusion

To “Shift-Left”, we need to understand the full development process in our organization and embed the needed security scans and activities at the earliest stage possible, within the tools and workflows the developers already use. This makes the best use of those scans and activities without adding unnecessary delays or friction with development teams, which improves the experience of both developers and security teams and strengthens collaboration.

In this story, we discussed one example for one activity (SAST scans) through a generic development process, but you can use the same approach and adapt it to whatever tool you want and to the process used in your organization.