How to introduce Semgrep to your organization

MalBot · January 12, 2024, 2:30pm

By Maciej Domanski, Application Security Engineer

Semgrep, a static analysis tool for finding bugs and specific code patterns in more than 30 languages, is set apart by its ease of use, many built-in rules, and the ability to easily create custom rules. We consider it an essential automated tool for discovering security issues in a codebase. Since Semgrep can directly improve your code’s security, it’s easy to say, “Just use it!” But what does that mean?

Semgrep is designed to be flexible to fit your organization’s specific needs. To get the best results, it’s important to understand how to run Semgrep, which rules to use, and how to integrate it into the CI/CD pipeline. If you are unsure how to get started, here is our seven-step plan to determine how to best integrate Semgrep into your SDLC, based on what we’ve learned over the years.

The 7-step Semgrep plan

Review the list of supported languages to understand whether Semgrep can help you.

Explore: Try Semgrep on a small project to evaluate its effectiveness. For example, navigate into the root directory of a project and run:
$ semgrep --config auto

There are a few important notes to consider when running this command:
- The --config auto option submits metrics to Semgrep, which may not be desirable.
- Running Semgrep this way prints an overview of the identified issues, including their number and severity. In general, this CLI flag gives you a broad view of which of your technologies Semgrep covers.
- Semgrep identifies programming languages by file extensions rather than analyzing their contents. Some paths are excluded from scanning by default using the default .semgrepignore file. Additionally, Semgrep excludes untracked files listed in a .gitignore file.

Dive deep: Instead of using the auto option, use the Semgrep Registry to select rulesets based on key security patterns and on your tech stack and needs.

Fine-tune: Arrive at your ideal combination of rulesets by reviewing how effective the rulesets you currently use are.
- Check out non-security rulesets, too, such as best practices rules. This will enhance code readability and may prevent the introduction of vulnerabilities in the future. Also, consider covering other aspects of your project:
  - Shell scripts, configuration files, generic files, Dockerfiles
  - Third-party dependencies (Semgrep Supply Chain, a paid feature, can help you detect whether you are using a vulnerable package in an exploitable way)
- To make Semgrep ignore a false-positive match, add a nosemgrep comment on the first line of the match or on the line immediately preceding it, e.g., // nosemgrep: go.lang.security.audit.xss. Also, document why you decided to disable the rule or record a risk-acceptance reason.
- Create a customized .semgrepignore file to reduce noise by excluding specific files or folders from the Semgrep scan. Semgrep ignores files listed in .gitignore by default; to keep that behavior after creating a .semgrepignore file, add the line :include .gitignore to it.

Create an internal repository to aggregate custom Semgrep rules specific to your organization. Its README file should include a short tutorial on using Semgrep and applying the custom rules from your repository, as well as an inventory table of the custom rules. A contribution checklist will also help your team maintain the quality of the rules (see the Trail of Bits Semgrep rule development checklist). Ensure that every new rule added to your internal Semgrep repository goes through peer review to reduce false positives and false negatives.

Evangelize: Train developers and other relevant teams on effectively using Semgrep.
- Present the pilot test results and give advice on improving the organization’s code quality and security. Point out Semgrep’s potential limitations (e.g., single-file analysis only).
- Include the official Learn Semgrep resource and present the Semgrep Playground with “simple mode” for easy rule creation.
- Provide an overview of how to write custom rules and emphasize that writing them is easy. Mention that custom rules can offer an autofix through the fix: key. Encourage adding metadata (e.g., CWE, confidence, likelihood, impact) to custom rules to support the vulnerability management process.
- To help a developer answer the question, “Should I create a Semgrep rule for this problem?” you can use these follow-up questions:
  - Can we detect a specific security vulnerability?
  - Can we enforce best practices/conventions or maintain code consistency?
  - Can we optimize the code by detecting code patterns that affect performance?
  - Can we validate a specific business requirement or constraint?
  - Can we identify deprecated/unused code?
  - Can we spot any misconfiguration in a configuration file?
  - Is this a recurring question as you review your code?
  - How is code documentation handled, and what are the requirements for documentation?
- Create places for the team to discuss Semgrep, write custom rules, troubleshoot (e.g., a Slack channel), and jot down ideas for Semgrep rules (e.g., on a Trello board). Also, consider writing custom rules for bugs found during your organization’s security audits/bug bounty program. A good idea is to aggregate quick notes to help your team use Semgrep (see the appendix below).
- Pay attention to the Semgrep Community Slack, where the Semgrep community helps with problems or writing custom rules.
- Encourage the team to report limitations and bugs they encounter while using Semgrep to the Semgrep team by filing GitHub issues (see this example issue submitted by Trail of Bits).
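To make the fix: key and the metadata fields mentioned above more concrete, here is a minimal sketch of a custom rule. The rule ID, message, and metadata values are illustrative assumptions, not a rule from the Semgrep Registry. It flags PyYAML’s yaml.load() when it is called with a loader that can construct arbitrary Python objects, and proposes yaml.safe_load() as the autofix.

```yaml
rules:
  - id: pyyaml-unsafe-load  # hypothetical rule ID
    languages: [python]
    severity: ERROR
    message: >-
      yaml.load() with an unsafe Loader can instantiate arbitrary Python
      objects from $DATA. Use yaml.safe_load() instead.
    pattern-either:
      - pattern: yaml.load($DATA, Loader=yaml.Loader)
      - pattern: yaml.load($DATA, Loader=yaml.UnsafeLoader)
    # Autofix: the whole match is replaced with this text, with $DATA substituted back in.
    fix: yaml.safe_load($DATA)
    # Metadata that can feed the vulnerability management process (values are illustrative).
    metadata:
      cwe:
        - "CWE-502: Deserialization of Untrusted Data"
      confidence: HIGH
      likelihood: MEDIUM
      impact: HIGH
```

Saved as, say, pyyaml-unsafe-load.yaml (a hypothetical file name), the rule can be run with semgrep scan --config pyyaml-unsafe-load.yaml, and adding --autofix applies the suggested replacement in place.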

Implement Semgrep in the CI/CD pipeline after getting acquainted with the Semgrep documentation for your CI vendor. Incorporating Semgrep incrementally is important to avoid overwhelming developers with too many results, so start with a pilot test on a single repository. Then, run a full Semgrep scan of the main branch on a schedule in the CI/CD pipeline. Finally, add diff-aware scanning when an event triggers the pipeline (e.g., a pull/merge request). A diff-aware scan examines only the changes introduced by the triggering event, which keeps it efficient. It should use a fine-tuned set of rules that yield high-confidence, true-positive results. Once the Semgrep implementation is mature, configure the CI/CD pipeline to block pull requests that have unresolved Semgrep findings.
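The staged rollout described above could look roughly like the following GitHub Actions workflow. This is a sketch under several assumptions: GitHub Actions as the CI vendor, Semgrep installed with pip instead of Semgrep’s own semgrep ci integration (which its CI documentation describes), p/default as the ruleset for the scheduled scan, and a hypothetical in-repository directory, semgrep-rules/high-confidence, holding the fine-tuned rules for pull requests.

```yaml
name: semgrep
on:
  pull_request: {}
  schedule:
    - cron: "0 3 * * *"  # nightly; scheduled runs check out the default (main) branch
jobs:
  semgrep:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, so the PR base commit is available locally
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: python -m pip install semgrep
      # Scheduled full scan of main with a broad ruleset; findings are written to SARIF, not enforced.
      - name: Scheduled full scan
        if: github.event_name == 'schedule'
        run: semgrep scan --config p/default --metrics=off --sarif --output semgrep.sarif
      # Diff-aware PR scan: only findings not already present in the base commit are reported.
      # --error fails the job when findings remain, which blocks the PR; omit it during the pilot.
      - name: Diff-aware PR scan
        if: github.event_name == 'pull_request'
        run: >-
          semgrep scan --config semgrep-rules/high-confidence
          --baseline-commit ${{ github.event.pull_request.base.sha }}
          --metrics=off --error
```

The split mirrors the plan: the broad, noisier ruleset runs on a schedule where nobody is blocked, while the enforcing job only sees the high-confidence rules and only the code a pull request changes.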

What’s next? Maximizing the value of Semgrep in your organization

As you introduce Semgrep to your organization, remember that it undergoes frequent updates. To make the most of its benefits, assign one person in your organization to analyze new features (e.g., Semgrep Pro, which extends Semgrep’s single-file approach with cross-file analysis of the whole codebase), keep the team informed about external repositories of Semgrep rules, and assess the value of the paid subscription (e.g., access to premium rules).

Furthermore, use the Trail of Bits Testing Handbook, a concise guide that helps developers and security professionals maximize the potential of static and dynamic analysis tools. The first chapter of this handbook focuses specifically on Semgrep. Check it out to learn more!

Appendix: Things I wish I’d known before I started using Semgrep

Using Semgrep

Use the --sarif output flag with the SARIF Viewer extension in Visual Studio Code to navigate through the findings efficiently.
The --config auto option may miss some vulnerabilities. Manual language selection (--lang) and rulesets can be more effective.
To avoid forgetting to disable metrics, use an alias such as alias semgrep="semgrep --metrics=off" or set the SEMGREP_SEND_METRICS environment variable (e.g., SEMGREP_SEND_METRICS=off).
Use ephemeral rules, e.g., semgrep -e 'exec(...)' --lang=py ./, to quickly run Semgrep in the style of the grep tool.
You can enable shell autocompletion and use the TAB key to work faster on the command line.
You can run several predefined configurations simultaneously: semgrep --config p/cwe-top-25 --config p/jwt.
The Semgrep Pro Engine removes Semgrep’s limitation of analyzing only single files.
Rules from the Semgrep Registry can be tested in the Semgrep Playground (see the Trail of Bits anonymous-race-condition rule).
Metavariable analysis supports two analyzers: redos and entropy.
You can use metavariable-pattern to match patterns across different languages within a single file (e.g., JavaScript embedded in HTML).
The focus-metavariable operator can reduce false positives in taint mode.
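The last three tips can be sketched in a single rules file. The rule IDs, messages, and the Flask-style request.args source are assumptions chosen for illustration; the operators themselves (metavariable-analysis, metavariable-pattern with a language key, and focus-metavariable inside a taint sink) follow Semgrep’s rule syntax.

```yaml
rules:
  # metavariable-analysis: keep only regexes that the redos analyzer flags.
  - id: redos-prone-regex
    languages: [python]
    severity: WARNING
    message: The regular expression $REGEX may be vulnerable to ReDoS.
    patterns:
      - pattern: re.compile($REGEX, ...)
      - metavariable-analysis:
          analyzer: redos
          metavariable: $REGEX

  # metavariable-pattern: re-parse the body of an inline <script> block as JavaScript.
  - id: eval-in-inline-script
    languages: [generic]
    severity: WARNING
    message: eval() is called inside an inline <script> block.
    patterns:
      - pattern: <script ...>$...JS</script>
      - metavariable-pattern:
          metavariable: $...JS
          language: javascript
          pattern: eval(...)

  # Taint mode: focus-metavariable makes only the command argument the sink,
  # so tainted data reaching other arguments (e.g., cwd=) is not reported.
  - id: request-arg-to-subprocess
    languages: [python]
    severity: ERROR
    message: Data from request.args reaches the command passed to subprocess.run().
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
    pattern-sinks:
      - patterns:
          - pattern: subprocess.run($CMD, ...)
          - focus-metavariable: $CMD
```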

Writing rules

Metavariables must be capitalized: $A, not $a
Use pattern-regex: (?s)\A.*\Z pattern to identify a file that does not contain a specific string (see example)
When writing a regular expression across multiple lines, use the >- block style, not |. The >- style folds each line break into a single space, whereas | keeps it as a newline character (\n), which will likely cause the regex to fail (see example)
You can use typed metavariables, e.g., $X == (String $Y)
Semgrep supports matching variable assignment statements.
Semgrep supports matching method chaining.
The Deep Expression Operator, written <... pattern ...>, matches a pattern nested anywhere inside a complex expression.
It is possible to apply specific rules to specific paths using the paths keyword (see the avoid-apt-get-upgrade rule, which applies only to Dockerfiles):

    paths:
      include:
        - "*dockerfile*"
        - "*Dockerfile*"
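Several of the pattern features listed above, written out as rules. The rule IDs, messages, and the APIs being matched (String comparison in Java; tempfile.mktemp, hashlib.md5, and an is_admin() check in Python) are illustrative assumptions, not rules from the Semgrep Registry. Note that every metavariable is capitalized.

```yaml
rules:
  # Typed metavariable: only match == when the right operand is a String (Java).
  - id: string-reference-equality
    languages: [java]
    severity: WARNING
    message: $X == $Y compares String references; use $X.equals($Y).
    pattern: $X == (String $Y)

  # Assignment statement: $PATH binds the variable that receives the result.
  - id: insecure-mktemp-assignment
    languages: [python]
    severity: WARNING
    message: $PATH is created with tempfile.mktemp(), which is race-prone; use mkstemp().
    pattern: $PATH = tempfile.mktemp(...)

  # Method chaining: the pattern describes the whole call chain.
  - id: md5-hexdigest
    languages: [python]
    severity: INFO
    message: MD5 digest computed with a chained call; MD5 is not collision-resistant.
    pattern: hashlib.md5(...).hexdigest()

  # Deep expression operator: is_admin() may appear anywhere inside the condition.
  - id: admin-check-in-condition
    languages: [python]
    severity: INFO
    message: Authorization decision based on $USER.is_admin().
    pattern: |
      if <... $USER.is_admin() ...>:
        ...
```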
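The two regex tips from this list can be sketched as follows. The SPDX header requirement and the chmod check are hypothetical examples. The first rule matches the whole file with (?s)\A.*\Z and then discards that match when the file contains the marker, so only files missing it are reported. The second shows why >- works for multi-line regexes: YAML folds each line break into one space, so break lines only where the regex should contain a literal space.

```yaml
rules:
  # Report files that do NOT contain an SPDX-License-Identifier marker.
  - id: missing-spdx-header
    languages: [generic]
    severity: INFO
    message: This file has no SPDX-License-Identifier line.
    patterns:
      # Match the entire file as one region...
      - pattern-regex: (?s)\A.*\Z
      # ...and drop that match when the same region contains the marker.
      - pattern-not-regex: (?s)\A.*SPDX-License-Identifier.*\Z

  # Multi-line regex with >-: the resulting value is "chmod (-R\s+)?777".
  # With | it would be "chmod\n(-R\s+)?777\n" and would require a line break.
  - id: chmod-777
    languages: [generic]
    severity: WARNING
    message: chmod 777 makes the target world-writable.
    pattern-regex: >-
      chmod
      (-R\s+)?777
```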

And last, Trail of Bits has a public Semgrep rules repository! Check it out here and use it immediately with the semgrep --config p/trailofbits command.

Useful links

For more on creating custom rules, read our blogs on machine learning libraries and discovering goroutine leaks.

We’ve compiled a list of additional resources to further assist you in your Semgrep adoption process. These links provide a variety of perspectives and detailed information about the tool, its applications, and the community that supports it:

Languages and technologies supported by Semgrep
Semgrep Privacy Policy
p/default, p/owasp-top-ten, p/cwe-top-25 rulesets
Ignoring files, folders, or code in Semgrep
Experimental feature: generic pattern matching
Tips and tricks for writing fixes
Getting started with Semgrep in continuous integration
Semgrep Community Slack

Article Link: How to introduce Semgrep to your organization | Trail of Bits Blog