Supercharging Your Code Quality with Semgrep SAST in GitHub Actions

Mostafa Mahmoud
Jul 18, 2023


In today’s fast-paced world of software development, ensuring code quality and security is paramount. As developers, we strive to deliver bug-free, secure, and maintainable code to our users. One powerful tool that can help us achieve these goals is Semgrep, a fast, open-source static analysis tool that detects bugs and vulnerabilities and enforces coding standards. By integrating Semgrep into our GitHub Actions workflows, we can run these checks automatically instead of relying on developers to remember to run them by hand.

In this article, we’ll explore the world of Semgrep and how it can supercharge our code quality efforts within the GitHub Actions ecosystem. We’ll dive into the benefits of using Semgrep as a static analysis tool, discuss its key features, and most importantly, learn how to integrate it into our GitHub Actions workflows. So, let’s embark on this exciting journey together and unlock the true potential of Semgrep!

Topics:

  • Installing Semgrep Locally
  • Trying Semgrep on Your Computer
  • Writing Custom Rules for Semgrep
  • Creating a Semgrep Workflow in GitHub Actions
  • Creating a Repository with Custom Rules
  • Leveraging Reusable Workflows and Repository Templates
  • Pushing Towards Continuous Code Quality
  • Conclusion

Installing Semgrep Locally

Semgrep can be installed locally with Python's package manager, pip.

Install:
python3 -m pip install semgrep

Confirm it is installed:
semgrep --version

Trying Semgrep on Your Computer

Before diving into GitHub Actions, let’s put Semgrep to the test on our local codebase. We’ll demonstrate how to run Semgrep on your computer, scanning your code and uncovering potential bugs and vulnerabilities. This hands-on experience will help you familiarize yourself with Semgrep’s capabilities and see its impact firsthand.

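Let’s now run Semgrep on Vulpy, a web application security lab, and save the results to a JSON file. From the root of the Vulpy checkout, a command along these lines does it:

semgrep --config=auto --json --output results.json

--config=auto automatically obtains rules from the Semgrep Registry tailored to the project. It cannot be combined with --metrics=off, because Semgrep needs metrics enabled to build the automatic configuration.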
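
The same scan can also be driven from a script, which helps when you want to post-process the results. The sketch below uses Python's subprocess module and assumes the semgrep CLI is on your PATH. The helper names and the ./vulpy directory are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_scan_command(target: str, output: str, config: str = "auto") -> list[str]:
    """Assemble the semgrep arguments: rule source, JSON format, report file, target."""
    return ["semgrep", f"--config={config}", "--json", f"--output={output}", target]


def run_scan(target: str, output: str = "results.json") -> Path:
    """Run semgrep on `target` and return the path of the JSON report it wrote."""
    if shutil.which("semgrep") is None:
        raise RuntimeError("semgrep not found; install it with: python3 -m pip install semgrep")
    # check=True raises CalledProcessError if semgrep exits with a non-zero status
    subprocess.run(build_scan_command(target, output), check=True)
    return Path(output)


if __name__ == "__main__":
    # Print the command for a hypothetical ./vulpy checkout instead of running it
    print(" ".join(build_scan_command("vulpy", "results.json")))
```

Calling run_scan("vulpy") actually executes the scan. Keeping the argument list in one function makes it easy to swap --config=auto for a fixed ruleset later.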

Writing Custom Rules for Semgrep

One of the strengths of Semgrep lies in its ability to enforce custom rules tailored to your project’s requirements. In this section, we’ll delve into the art of writing custom Semgrep rules. We’ll guide you through the process, sharing best practices and examples, enabling you to create rules that align perfectly with your coding standards and security policies.

To add a custom rule to the Semgrep scan, create a custom_rule.yaml file that detects calls to print("Hello, World!") in the code:

rules:
  - id: Test_custom_rule
    message: |
      A custom rule for testing
    languages: [python]
    severity: ERROR
    metadata:
      category: security
      cwe: "N/A"
      subcategory: [N/A]
      confidence: N/A
      likelihood: N/A
      impact: N/A
      description: "Testing custom rule"
      references:
        - N/A
    # The pattern must match the literal exactly as it appears in test.py
    pattern: |
      print("Hello, World!")

Next, create a test.py file containing the line print("Hello, World!"). Then run Semgrep with the custom rule and save the results to a JSON file:

semgrep --config custom_rule.yaml --json test.py > results.json
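
To check the result, a small Python script can read results.json and list each finding. This sketch assumes Semgrep's JSON layout, where each entry under "results" carries check_id, path, start.line, and extra.severity. The helper names are hypothetical. If the rule matches test.py, the script prints one line per match, followed by a count per severity.

```python
from __future__ import annotations

import json
from collections import Counter
from pathlib import Path


def summarize(report: dict) -> Counter:
    """Count findings per severity (ERROR, WARNING, INFO) in semgrep's JSON output."""
    return Counter(r["extra"]["severity"] for r in report.get("results", []))


def describe(report: dict) -> list[str]:
    """Render each finding as 'path:line check_id'."""
    return [
        f'{r["path"]}:{r["start"]["line"]} {r["check_id"]}'
        for r in report.get("results", [])
    ]


if __name__ == "__main__":
    path = Path("results.json")
    if path.exists():
        report = json.loads(path.read_text(encoding="utf-8"))
        for line in describe(report):
            print(line)
        print(dict(summarize(report)))
    else:
        print("results.json not found; run the semgrep command first")
```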

Creating a Semgrep Workflow in GitHub Actions

Now that we have a clear understanding of Semgrep and its capabilities, let’s dive into creating a Semgrep workflow within GitHub Actions. This workflow will automate the process of running Semgrep on our codebase and saving the results for further analysis. Follow these steps to set up the workflow:

Step 1: Clone the codebase to the runner.

Step 2: Run Semgrep on the codebase and save the results in a JSON file.

Step 3: Save the Semgrep results as a pipeline artifact.

Step 4: Download the Semgrep results from the pipeline artifact.

name: Semgrep Full Scan

on:
  workflow_dispatch:
  schedule:
    - cron: '0 1 * * 6'   # every Saturday at 01:00 UTC

jobs:

  semgrep-full:
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep

    steps:

      # step 1
      - name: clone application source code
        uses: actions/checkout@v4

      # step 2
      - name: full scan
        run: |
          semgrep \
            --sarif --output report.sarif \
            --metrics=off \
            --config="p/default"

      # step 3
      - name: save report as pipeline artifact
        uses: actions/upload-artifact@v4
        with:
          name: report.sarif
          path: report.sarif

  # step 4: artifacts move files between jobs, so the download runs in a separate job
  download-report:
    needs: semgrep-full
    runs-on: ubuntu-latest
    steps:
      - name: Download report
        uses: actions/download-artifact@v4
        with:
          name: report.sarif
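
Once report.sarif has been downloaded from the workflow's artifacts, a short Python script can list the findings. This is a sketch that assumes the standard SARIF 2.1.0 layout (runs → results → locations). The function name is hypothetical, and the exit-on-error gate is an optional addition that is not part of the workflow above.

```python
from __future__ import annotations

import json
import sys
from pathlib import Path


def sarif_findings(sarif: dict) -> list[tuple[str, str, str, int]]:
    """Flatten SARIF runs into (level, ruleId, file, startLine) tuples."""
    findings = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Use the first location if present; tolerate a missing or empty list
            loc = (result.get("locations") or [{}])[0].get("physicalLocation", {})
            findings.append((
                result.get("level", "warning"),  # fall back to "warning", SARIF's usual default
                result.get("ruleId", "?"),
                loc.get("artifactLocation", {}).get("uri", "?"),
                loc.get("region", {}).get("startLine", 0),
            ))
    return findings


if __name__ == "__main__":
    path = Path("report.sarif")
    if path.exists():
        findings = sarif_findings(json.loads(path.read_text(encoding="utf-8")))
        for level, rule, uri, line in findings:
            print(f"{level:8} {uri}:{line} {rule}")
        # Optional gate: exit non-zero when any error-level finding exists
        if any(level == "error" for level, *_ in findings):
            sys.exit(1)
    else:
        print("report.sarif not found; download the artifact first")
```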

Creating a Repository with Custom Rules

To promote consistency and reusability, we’ll create a repository dedicated to housing our custom Semgrep rules. This central repository gives all your projects access to the same set of rules. The reusable workflow below checks it out next to the application code, scans the code, and publishes the report both as a pipeline artifact and to an S3 bucket. It follows these steps:

Step 1: Checkout Semgrep Rules

Step 2: Checkout Code

Step 3: Install Semgrep

Step 4: Run Semgrep with Custom Rules

Step 5: Get the Current Date to Name the Report

Step 6: Upload the Report as a Pipeline Artifact

Step 7: Read the Scan Summary Result

Step 8: Authenticate with AWS

Step 9: Upload the Report to AWS S3 Bucket

name: Semgrep weekly Scan
on:
  workflow_call:
    secrets:
      Repo_token:
        required: true
      AWS_STS:
        required: true

env:
  BUCKET_NAME: "semgrep-report"
  AWS_REGION: "eu-west-1"

permissions:
  id-token: write # This is required for requesting the JWT
  contents: read  # This is required for actions/checkout

jobs:
  semgrep-full:
    runs-on: ubuntu-latest
    steps:

      #step1: Checkout semgrep rules
      - name: Checkout Semgrep rules
        uses: actions/checkout@v4
        with:
          repository: xxxx/Semgrep_Rules
          ref: main
          path: ./SemgrepRules
          token: ${{ secrets.Repo_token }}

      #step2: Checkout Code.
      - name: clone application source code
        uses: actions/checkout@v4
        with:
          path: ./code

      #step3: Install Semgrep
      - name: Install Semgrep
        run: |
          python3 -m pip install --upgrade pip
          python3 -m pip install semgrep

      #step4: Scan ./code with the custom rules plus registry rulesets
      - name: Scan
        run: |
          python3 -m semgrep --config ./SemgrepRules --config "p/ci" --config auto ./code \
            --json --output report.json > scan_summary.txt 2>&1

      #step5: get current date
      - name: Get current date
        id: date
        run: echo "date=$(date +'%d-%m-%Y')" >> "$GITHUB_OUTPUT"

      #step6: Push report to artifact (only the report, not the whole workspace)
      - name: save report as pipeline artifact
        uses: actions/upload-artifact@v4
        with:
          name: ${{ steps.date.outputs.date }}.json
          path: report.json

      #step7: print the scan summary
      - name: Print scan summary
        id: scan_summary
        run: cat scan_summary.txt

      # step8: authenticate with AWS.
      - name: configure aws credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_STS }}
          role-session-name: GHASession
          aws-region: ${{ env.AWS_REGION }}

      # step9: Upload the report to AWS S3, named after the date
      - name: Uploading report to s3
        run: |
          aws s3 cp ./report.json s3://${{ env.BUCKET_NAME }}/${{ github.event.repository.name }}/${{ steps.date.outputs.date }}.json
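
The date step and the S3 upload together decide where each weekly report lands. The Python sketch below reproduces that naming so you can predict the key for a given day. The %d-%m-%Y format and the <bucket>/<repository name>/ prefix come from the workflow. The repository name vulpy is only an example.

```python
from __future__ import annotations

from datetime import date


def report_name(day: date) -> str:
    """Same format as the workflow's `date +'%d-%m-%Y'`, plus the .json suffix."""
    return f"{day.strftime('%d-%m-%Y')}.json"


def s3_destination(bucket: str, repo: str, day: date) -> str:
    """Mirror the workflow layout: s3://<bucket>/<repository name>/<date>.json."""
    return f"s3://{bucket}/{repo}/{report_name(day)}"


if __name__ == "__main__":
    print(s3_destination("semgrep-report", "vulpy", date(2023, 7, 18)))
    # → s3://semgrep-report/vulpy/18-07-2023.json
```

Because dd-mm-yyyy names do not sort chronologically, an S3 listing will not show the reports in date order. Switching both the workflow and this helper to %Y-%m-%d would fix that.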

Note: When you check out two different repositories in the same workflow, the workflow needs permission to read both. The default GITHUB_TOKEN only grants access to the repository running the workflow. If the rules repository is private, check it out with a token that has read access to it, passed in here as ${{ secrets.Repo_token }}. Also check out the two repositories into different paths on the runner, because a second checkout into the same path overwrites the first. That is why the workflow uses path: ./SemgrepRules and path: ./code.

Leveraging Reusable Workflows and Repository Templates

To promote consistency and efficiency across our repositories, we’ll look at reusable workflows and repository templates. Because the workflow above is triggered by workflow_call, any other repository can run it from a job with uses: <owner>/<repo>/.github/workflows/<file>@<ref> instead of copying its steps. A minimal caller looks like this:

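name: Reusable Workflow user

on:
  workflow_dispatch:

jobs:
  Static_analysis:
    # The called workflow requests an OIDC token for AWS, so the caller must grant it
    permissions:
      id-token: write
      contents: read
    uses: xxxx/DevSecOps_CICD/.github/workflows/SAST.yaml@main
    secrets:
      Repo_token: ${{ secrets.REPO_READ_PAT }}
      # AWS_STS is also required by the called workflow; assumes the caller defines this secret
      AWS_STS: ${{ secrets.AWS_STS }}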

By adopting the reusable Semgrep workflow and setting repositories up as templates, we gain consistency and efficiency in our code quality practices. Callers reference the shared workflow at @main. As a result, a change to the Semgrep configuration there, or to the rules in the rules repository, applies to every repository that uses it on its next run, which keeps code analysis uniform. Setting repositories up as templates with the Semgrep caller workflow already in place also streamlines onboarding for new projects, because nobody has to configure Semgrep by hand in each repository. This approach helps development teams maintain consistent code quality standards across the entire organization with little per-repository effort.
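
The reusable workflow declares both Repo_token and AWS_STS with required: true, so every caller must pass both, either explicitly or with secrets: inherit. A caller that passes only Repo_token fails when the workflow runs. The hypothetical pre-flight check below makes that contract explicit. The secret lists are written out by hand to mirror the YAML rather than parsed from it.

```python
from __future__ import annotations


def missing_secrets(required: dict[str, bool], passed: set[str]) -> list[str]:
    """Return the required workflow_call secrets that the caller does not pass."""
    return sorted(
        name for name, is_required in required.items()
        if is_required and name not in passed
    )


# Hand-written mirror of the reusable workflow's on.workflow_call.secrets block
CALLEE_SECRETS = {"Repo_token": True, "AWS_STS": True}

if __name__ == "__main__":
    print(missing_secrets(CALLEE_SECRETS, {"Repo_token"}))  # → ['AWS_STS']
```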

Conclusion

In this code quality quest, we have explored the seamless integration of Semgrep and GitHub Actions. We covered various aspects, including the installation and utilization of Semgrep on local machines, creating custom rules, incorporating Semgrep workflows into GitHub Actions, establishing a repository for custom rules, and implementing reusable workflows for consistency.

By integrating Semgrep into GitHub Actions, we have equipped ourselves with a powerful tool to elevate our code quality efforts. Remember that the pursuit of code excellence is an ongoing journey. With Semgrep and GitHub Actions as trusted companions, we can keep working toward continuous code quality: catching bugs, fortifying code against vulnerabilities, and maintaining coding standards.

Now, as developers, let’s forge ahead with confidence, knowing that Semgrep and GitHub Actions empower us to deliver code of impeccable quality and heightened security. The possibilities for enhanced code quality are boundless with Semgrep and GitHub Actions as invaluable allies!

References

Writing custom rules: https://semgrep.dev/docs/writing-rules/overview/

Configuring OIDC between GitHub Actions and AWS: https://aws.amazon.com/blogs/security/use-iam-roles-to-connect-github-actions-to-actions-in-aws/
