By Matt Schwager and Travis Peters
We are publishing another set of custom Semgrep rules, bringing our total number of public rules to 115. This blog post will briefly cover the new rules, then explore two Semgrep features in depth: regex mode (especially how it compares against generic mode), and HCL language support for technologies such as Terraform and Nomad. With these features, we can search for security vulnerabilities in more than just application code. This new release joins our existing collection of Semgrep rules, our public CodeQL queries, and our Testing Handbook as part of our long-term effort to share our technical expertise with the security community.
Semgrep is a vast and capable tool, and it contains many nooks and crannies that can be exploited to get the most value possible out of a static analysis tool. Like our previous Semgrep rules release post, this post will highlight some interesting Semgrep functionality. Publicly releasing rules is a great start, but we feel that we can do even better by explaining why rules are written the way they are.
For this release, we focused on supply chain issues related to a lack of short-lived OIDC tokens in GitHub Actions; infrastructure concerns in Terraform code, Nomad jobs, and insecure database connections; and general application security concerns in Ruby code. Many of these Ruby rules were written during our recent Ruby Central (rubygems.org) audit. We will be publishing more information about this audit shortly.
Without further ado, here are our new rules:
Mode | Rule ID | Rule description |
---|---|---|
Ruby | action-dispatch-insecure-ssl | Found Rails application with insecure SSL setting. |
Ruby | action-mailer-insecure-tls | Found ActionMailer SMTP configuration with insecure TLS setting. These settings do not require that a successful, encrypted, verified TLS connection is made. Set enable_starttls: true and openssl_verify_mode to verify peer. |
Ruby | active-record-encrypts-misorder | Found an ActiveRecord value with encryption before serialization. The declaration of the serialized attribute should go before the encryption declaration. |
Ruby | active-record-hardcoded-encryption-key | Found hard-coded ActiveRecord encryption key. |
Ruby | global-timeout | Found Timeout::timeout (or timeout) use. Setting a global timeout can cause an exception to be raised anywhere in the passed block of code. This precludes any possible clean up action typically associated with rescuing from exceptions. This can lead to denial-of-service, data integrity failure, and general availability concerns. Instead prefer to use the library’s built in timeout functionality, if it has any, to ensure processing happens as expected. If it does not have built in timeout functionality, then consider implementing it. |
Ruby | faraday-disable-verification | Found Faraday HTTP request disabling SSL/TLS verification. |
Ruby | ruby-saml-skip-validation | SAML response validation disabled for $KEY. |
Ruby | yaml-unsafe-load | Found YAML call to unsafe_load. This can lead to deserialization bugs and RCE. |
Ruby | rails-cookie-attributes | Found Rails cookie set with insecure attribute. |
Ruby | rails-cache-store-marshal | Found Rails cache store configured to allow Marshaling. As of Rails 7.1 the default serializer is :marshal_7_1. If an attacker can inject data into the cache store (SSRF, etc.), then they can achieve code execution when the object is later deserialized. Consider using the :message_pack serializer or a custom serializer. |
Ruby | json-create-deserialization | Found json_create class method. This implies custom JSON deserialization is occurring. This can lead to RCE and other deserialization-type bugs. Usage should be audited and, at least, fuzzed. |
Ruby | insecure-rails-cookie-session-store | Found Rails session cookie missing SameSite=Secure. As of Rails 7.2, session cookies default to SameSite=Lax. |
Ruby | rest-client-disable-verification | Found RestClient HTTP request disabling SSL/TLS verification. |
Regex | postgres-insecure-sslmode | Found PostgreSQL connection string disabling SSL verification. |
Regex | mongodb-insecure-transport | Found insecure MongoDB connection, prefer TLS encrypted transport by setting the tls=true connection option and ensuring proper verification. |
Regex | mysql-insecure-sslmode | Found MySQL connection string disabling SSL verification. |
Generic | amqp-unencrypted-transport | Found unencrypted AMQP connection, prefer TLS encrypted amqps:// transport. |
Generic | redis-unencrypted-transport | Found unencrypted Redis connection, prefer TLS encrypted rediss:// transport. |
Generic | node-disable-certificate-validation | Setting this environment variable disables TLS certificate validation. This makes TLS, and HTTPS by extension, insecure. The use of this environment variable is strongly discouraged. |
HCL | aws-oidc-role-policy-duplicate-condition | Found AWS role policy for GitHub Actions with duplicate condition. This overrides previous conditions, and the last condition with the duplicated key “wins.” This likely breaks access controls and allows unauthorized access. |
HCL | aws-oidc-role-policy-missing-sub | Found AWS role policy for GitHub Actions missing OIDC subject. This means any GitHub repository can assume this role in CI. |
HCL | vault-hardcoded-token | Found Terraform Vault instance with hard-coded token. |
HCL | vault-skip-tls-verify | Found Terraform Vault instance with TLS verification disabled. |
HCL | root-user | Found Nomad task using root user. |
HCL | docker-hardcoded-password | Found Nomad task using Docker auth with hard-coded password. |
HCL | docker-privileged-mode | Found Nomad task using Docker containers in privileged mode. |
HCL | tls-hostname-verification-disabled | Found Nomad tls block with server hostname verification disabled. |
HCL | podman-tls-verify-disabled | Found Nomad task using Podman with registry TLS verification disabled. |
YAML | jfrog-hardcoded-credential | Found long-term access key. Instead prefer JFrog temporary OIDC security credentials. |
YAML | aws-secret-key | Found long-term access key. Instead prefer AWS role assumption and temporary OIDC security credentials. |
YAML | gcp-credentials-json | Found long-term access key. Instead prefer GCP workload identity federation and temporary OIDC security credentials. |
YAML | rubygems-publish-key | Found long-term access key. Instead prefer RubyGems trusted publishing and temporary OIDC security credentials. |
YAML | vault-token | Found long-term access key. Instead prefer Vault role assumption and temporary OIDC security credentials. |
YAML | pypi-publish-password | Found long-term access key. Instead prefer PyPI trusted publishing and temporary OIDC security credentials. |
YAML | azure-principal-secret | Found long-term access key. Instead prefer Azure subscription ID and temporary OIDC security credentials. |
Semgrep isn’t just for programming languages
The first post in this series included perspectives on two lesser-known Semgrep features: generic mode and YAML support. This post introduces two additional considerations: regular expressions vs. generic mode and HashiCorp Configuration Language (HCL) support for infrastructure-as-code (IaC) security. We will continue the trend of bringing Semgrep to all forms of textual data.
Heuristics: Regular expressions vs. generic-mode
Regular expression patterns are another lesser-known feature of Semgrep. This is the so-called pattern-regex operator and regex language. But why would you want to use regular expressions in Semgrep rules? Doesn’t that defeat the purpose of static analysis tools like Semgrep? Why not simply use ripgrep or classic grep? Doesn’t generic mode obviate the need for regex mode?
The following heuristics will help you understand when to use regex mode. The more “yeses” you answer below, the more likely you should be using regex mode.
Heuristic #1: Does the text you are looking for generally span a single line of code?
Dealing with multi-line whitespace in a regular expression is a pain. If you find yourself searching for multi-line patterns, and language-specific rules aren’t possible, then you will probably be best served by generic mode. So remember: when using regex mode, the text you’re searching for will almost always span a single line.
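To make the multi-line pain concrete, here is a minimal sketch that uses Python's re module as a stand-in for a regex engine (it is not Semgrep itself). The Terraform snippet and the provider named "other" are hypothetical. The goal is to flag skip_tls_verify = true only inside the vault provider block, which a plain regex struggles to express.

```python
import re

# Hypothetical Terraform file: the vault provider is configured safely,
# and the risky flag lives in a different, unrelated block.
HCL = '''
provider "vault" {
  address = "https://vault.example.com"
}

provider "other" {
  skip_tls_verify = true
}
'''

PATTERN = r'provider "vault" \{.*skip_tls_verify = true'

# Without DOTALL, "." stops at newlines, so a block that spans lines never matches.
single_line_hit = re.search(PATTERN, HCL) is not None

# With DOTALL, ".*" runs past the closing brace and reaches into the next block,
# so the vault provider is flagged for a setting it does not contain.
multi_line_hit = re.search(PATTERN, HCL, re.DOTALL) is not None

print(single_line_hit, multi_line_hit)  # False True
```

Neither result is what we want: the first pattern cannot see across lines at all, and the second produces a false positive. Scoping a match to a block is the kind of job better left to generic mode or a language-aware pattern.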
Heuristic #2: Does this pattern exist in many languages or types of text files?
The beauty of Semgrep is that it’s a one-stop-shop for all things textual analysis. If the text you are searching for may exist in many languages, then it may be a good fit for regex mode. For example, consider URL parameters. If you’re searching for, say, sslmode=disable, then the following regular expression would be a good start: [?&]sslmode=(disable|allow|prefer). This is great because it will find this insecure URL parameter in any connection URI to any PostgreSQL library in any language. We don’t have to write separate rules for separate libraries and languages. It will also find this pattern in shell scripts, documentation, CI jobs, and more.
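As a quick illustration of that language independence, the sketch below runs the same regular expression over lines from three kinds of files. It uses Python's re module rather than Semgrep, and the file names and connection strings are made up for the example.

```python
import re

# The regular expression from the text, unchanged.
SSLMODE = re.compile(r"[?&]sslmode=(disable|allow|prefer)")

# Hypothetical one-line excerpts from different kinds of files.
SAMPLES = {
    "app.py": 'conn = connect("postgresql://app@db:5432/prod?sslmode=disable")',
    "deploy.sh": "export DATABASE_URL=postgres://app@db/prod?connect_timeout=5&sslmode=prefer",
    "ci.yml": "DATABASE_URL: postgres://app@db/prod?sslmode=verify-full",
}

# One pattern covers Python source, a shell script, and a CI config alike.
flagged = sorted(name for name, line in SAMPLES.items() if SSLMODE.search(line))
print(flagged)  # ['app.py', 'deploy.sh']
```

The verify-full connection in ci.yml is left alone because it is not one of the three weak modes in the alternation.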
Heuristic #3: Do you want to share your regular expressions with others?
Again, the beauty of Semgrep is that it consolidates the functionality of tools like ripgrep or classic grep under a single tool. ripgrep can be useful when you’re quickly iterating on regular expressions and searching through your code for patterns, but Semgrep rules really shine once it comes time to codify, test, and publish a regex. Your regex findings will exist next to your Python and Kubernetes findings, and you can track all of your findings and manage rules from a single location.
Heuristic #4: Do you need to match specific characters or character classes?
Regex mode and generic mode often serve similar needs. Our previous post discussed the advantages of generic mode, so when should you use regex mode? Regex mode is preferred over generic mode when you would like to match specific characters or character classes, or use other regular expression functionality such as alternation. For example, in the sslmode regular expression above, we search for sslmode prefixed by a character class with ? and &. These two prefixes give us additional confidence that what we find will in fact be a URL parameter. As far as we know, there is not an easy way to express this in generic mode. We can always use pattern-either, but this can get quite verbose for more complex expressions. On the other hand, generic mode’s primary advantage is that it supports the ellipsis operator (i.e., ...), which allows easily skipping non-matching elements and whitespace used in multi-line patterns.
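The effect of the [?&] prefix can be seen in a small comparison. This is only a sketch using Python's re module with invented sample lines; it contrasts a bare search for sslmode=disable with the prefixed version.

```python
import re

BARE = re.compile(r"sslmode=disable")
PREFIXED = re.compile(r"[?&]sslmode=disable")

# Hypothetical lines: two real URL parameters, then two look-alikes.
LINES = [
    "postgres://app@db/prod?sslmode=disable",
    "postgres://app@db/prod?application_name=x&sslmode=disable",
    "# TODO: never ship with sslmode=disable",  # prose in a comment
    "legacy_sslmode=disable",                   # unrelated key with the same suffix
]

bare_hits = [bool(BARE.search(line)) for line in LINES]
prefixed_hits = [bool(PREFIXED.search(line)) for line in LINES]

print(bare_hits)      # [True, True, True, True]
print(prefixed_hits)  # [True, True, False, False]
```

The character class keeps both genuine URL parameters and drops the two look-alikes, which is the extra confidence described above.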
As you can see, there are often multiple ways to approach searching for specific code patterns in Semgrep. The heuristics above provide a good baseline for when you may want to use regex mode. The more important consideration is that regex mode exists, and it’s a valuable tool in your toolbelt when searching through textual data.
HCL support and IaC security
Infrastructure as Code (IaC) has transformed cloud management. It brings faster deployments, improved consistency and repeatability, and better security by putting environments that previously relied on manual configuration under version control. By codifying infrastructure, organizations can integrate these definitions with CI/CD pipelines, enabling automated testing, deployment, and static analysis.
HashiCorp Configuration Language (HCL) is foundational to many IaC tools, including Terraform, Nomad, and Consul. Recognizing the increasing importance of IaC, Semgrep introduced HCL support back in 2021. With dedicated HCL support, Semgrep now allows for a unified approach, bringing the same level of scrutiny to both application code and infrastructure configurations, ensuring they work together harmoniously within CI/CD pipelines.
We’ve learned that even the most straightforward Semgrep rules can uncover significant issues that continue to pose risks in 2024. Take, for example, the common practice of disabling TLS verification during development. If this configuration is inadvertently deployed, it could expose sensitive data. Here’s how easy it is to detect such vulnerabilities in Vault infrastructure with Semgrep:
    rules:
      - id: vault-skip-tls-verify
        message: |
          Found Terraform Vault instance with TLS verification disabled
        languages: [hcl]
        severity: WARNING
        patterns:
          - pattern-inside: provider "vault" { ... }
          - pattern: skip_tls_verify = true
Figure 1: Semgrep rule searching for disabled TLS verification (hcl/terraform/vault-skip-tls-verify.yaml)
Another frequent misstep is hard-coding credentials—a security risk that Semgrep can easily catch:
    rules:
      - id: vault-hardcoded-token
        message: |
          Found Terraform Vault instance with hardcoded token
        languages: [hcl]
        severity: WARNING
        patterns:
          - pattern-inside: provider "vault" { ... }
          - pattern: token = "..."
Figure 2: Semgrep rule searching for hard-coded Vault tokens (hcl/terraform/vault-hardcoded-token.yaml)
By coupling this step with configuring your CI/CD pipelines to block PRs with unresolved Semgrep findings (one of our recommended practices), you can easily keep these issues out of production infrastructure.
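One way such a gate could look is sketched below. It assumes the CI job has already run Semgrep with JSON output and loaded the report into a dictionary; the function name blocking_findings, the choice of blocking severities, and the sample report are all hypothetical. Semgrep also has an --error flag that makes the CLI exit non-zero when there are findings, which may be all a simple pipeline needs.

```python
def blocking_findings(report: dict, severities=("ERROR", "WARNING")) -> list[str]:
    """Return one 'path:line rule-id' string per finding that should block a merge."""
    blocked = []
    for result in report.get("results", []):
        severity = result.get("extra", {}).get("severity", "")
        if severity in severities:
            location = f'{result["path"]}:{result["start"]["line"]}'
            blocked.append(f'{location} {result["check_id"]}')
    return blocked


# Trimmed, hypothetical report in the shape of Semgrep's JSON output.
SAMPLE_REPORT = {
    "results": [
        {
            "check_id": "vault-skip-tls-verify",
            "path": "infra/main.tf",
            "start": {"line": 12},
            "extra": {"severity": "WARNING"},
        },
        {
            "check_id": "example-informational-rule",
            "path": "infra/main.tf",
            "start": {"line": 40},
            "extra": {"severity": "INFO"},
        },
    ]
}

blocked = blocking_findings(SAMPLE_REPORT)
print(blocked)  # ['infra/main.tf:12 vault-skip-tls-verify']
```

A CI step would then fail the job (for example, by exiting with a non-zero status) whenever the returned list is not empty.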
HCL’s structured nature also makes it particularly effective for detecting more complex patterns and ensuring that we keep false positives as low as possible. For instance, consider the following rule that identifies AWS role policies for GitHub Actions that are missing the OIDC subject—a critical misconfiguration that could allow any GitHub repository to assume the role in CI:
    rules:
      - id: aws-oidc-role-policy-missing-sub
        message: |
          Found AWS role policy for GitHub Actions missing OIDC subject.
          This means any GitHub repository can assume this role in CI.
        languages: [hcl]
        severity: WARNING
        patterns:
          - pattern-inside: |
              {
                ...
                Statement = [...]
                ...
              }
          - pattern-inside: |
              {
                ...,
                "Action": "sts:AssumeRoleWithWebIdentity",
                ...
              }
          - pattern: |
              {
                ...
                "Condition": {
                  ...
                  "StringEquals": {
                    ...
                    "token.actions.githubusercontent.com:aud": ...,
                    ...
                  }
                  ...
                }
                ...
              }
          - pattern-not: |
              {
                ...
                "Condition": {
                  ...
                  "StringEquals": {
                    ...
                    "token.actions.githubusercontent.com:sub": ...,
                    ...
                    "token.actions.githubusercontent.com:aud": ...,
                    ...
                  }
                  ...
                }
                ...
              }
          # Remaining pattern-nots truncated to save space
Figure 3: Semgrep rule searching for missing OIDC subjects (hcl/terraform/aws-oidc-role-policy-missing-sub.yaml)
Role policies for GitHub Actions can be configured in many ways, and we can use pattern-inside and pattern-not to properly contextualize the pattern we are looking for (i.e., instances where the subject is not defined). This rule is a powerful example of how Semgrep can help enforce security policies and prevent configuration errors that could lead to serious vulnerabilities.
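The logic the rule encodes can be restated in a few lines of Python. This is a rough sketch, not the Semgrep rule itself: it works on an already-parsed policy statement, the function name missing_sub and both sample statements are hypothetical, and because the remaining pattern-nots are truncated above, it simply assumes that a subject under any condition operator counts as present.

```python
AUD = "token.actions.githubusercontent.com:aud"
SUB = "token.actions.githubusercontent.com:sub"


def missing_sub(statement: dict) -> bool:
    """True when a web-identity trust statement pins the audience but not the subject."""
    if statement.get("Action") != "sts:AssumeRoleWithWebIdentity":
        return False
    condition = statement.get("Condition", {})
    # Collect condition keys across all operators (StringEquals, StringLike, ...).
    keys = {key for operator in condition.values() for key in operator}
    return AUD in keys and SUB not in keys


# Hypothetical statement: only the audience is checked, so any repository qualifies.
OPEN = {
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {"StringEquals": {AUD: "sts.amazonaws.com"}},
}

# Hypothetical statement: the subject restricts the role to a single repository.
PINNED = {
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
        "StringEquals": {AUD: "sts.amazonaws.com"},
        "StringLike": {SUB: "repo:example-org/example-repo:*"},
    },
}

print(missing_sub(OPEN), missing_sub(PINNED))  # True False
```

The positive pattern corresponds to the audience check, and each pattern-not corresponds to one way the subject could legitimately appear.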
Text is the universal interface
If text is the universal interface, then Semgrep can help secure arbitrary interfaces, from bytes and strings to IaC, YAML, and more. Combining the power of Semgrep with regular expressions, generic mode, YAML, and IaC support allows us to go beyond just code in programming languages. As the industry moves everything toward “as-code” solutions, we need to be able to apply scalable tooling to domains like supply chain, CI/CD, and IaC.
With IaC, you can apply the same rigor of static analysis to your infrastructure as you do to your application code, catching issues early and avoiding costly mistakes in production (“shifting left,” as it were). Manual audits and dynamic scans against production environments are slow and do not scale well. We encourage you to try out our newly released Terraform and Nomad rules, explore Semgrep’s terraform rules, and consider incorporating them into your projects. To our knowledge, these are the first open-source Semgrep rules targeting Nomad, a fact we’re excited to share with the community in the hope that it inspires others to build upon them.
If you’d like to read more about our work on Semgrep, we have used its capabilities in several ways, such as securing machine learning pipelines, discovering goroutine leaks, and securing Apollo GraphQL servers.
Contact us if you’re interested in custom Semgrep rules for your project!
Article Link: 35 more Semgrep rules: infrastructure, supply chain, and Ruby | Trail of Bits Blog