Publishing Trail of Bits’ CodeQL queries

MalBot · December 6, 2023, 2:20pm

By Paweł Płatek

We are publishing a set of custom CodeQL queries for Go and C. We have used them to find critical issues that the standard CodeQL queries would have missed. This new release of a continuously updated repository of CodeQL queries joins our public Semgrep rules and Automated Testing Handbook in an effort to share our technical expertise with the community.

For the initial release of our internal CodeQL queries, we focused on issues like misused cryptography, insecure file permissions, and bugs in string methods:

Language	Query name	Vulnerability description
Go	Message not hashed before signature verification	This query detects calls to (EC)DSA APIs with a message that was not hashed. If the message is longer than the expected hash digest size, it is silently truncated.
Go	File permission flaws	This query finds non-octal (e.g., `755` vs `0o755`) and unsupported (e.g., `04666`) literals used as a filesystem permission parameter (`FileMode`).
Go	`Trim` functions misuse	This query finds calls to `string.{Trim,TrimLeft,TrimRight}` with the second argument not being a cutset but a continuous substring to be trimmed.
Go	Missing `MinVersion` in `tls.Config`	This query finds cases when you do not set the `tls.Config.MinVersion` explicitly for servers. By default, version 1.0 is used, which is considered insecure. This query does not mark explicitly set insecure versions.
C	`CStrNFinder`	This query finds calls to functions that take a string and its size as separate arguments (e.g., `strncmp`, `strncat`) but the size argument is wrong.
C	Missing null terminator	This query finds incorrectly initialized strings that are passed to functions expecting null-byte-terminated strings.

CodeQL 101

CodeQL is the static analysis tool powering GitHub Advanced Security and is widely used throughout the community to discover vulnerabilities. CodeQL operates by transforming the code being tested into a database that is queryable using a Datalog-like language. While the core engine of CodeQL remains proprietary and closed source, the tool offers open-source libraries implementing various analyses and sets of security queries.

To test our queries, install the CodeQL CLI by following the official documentation. Once the CodeQL CLI is ready, download Trail of Bits’ query packs and check whether the new queries are detected:

codeql pack download trailofbits/{cpp,go}-queries
codeql resolve qlpacks | grep trailofbits

Now go to your project’s root directory and generate a CodeQL database, specifying either go or cpp as the programming language:

codeql database create codeql.db --language go

If the generation hasn’t succeeded or the project has a complex build system, use the command flag. Finally, execute Trail of Bits’ queries against the database:

codeql database analyze database.db --format=sarif-latest --output=./tob.sarif -- trailofbits/go-queries

Output of the analysis is in the Static Analysis Results Interchange Format (SARIF). Use Visual Studio Code with SARIF Viewer plugin to open it and triage findings. Alternatively, upload results to GitHub or use --format csv to get results in text form.

(EC)DSA silent input truncation in Go

Let’s sign the /etc/passwd file using ECDSA. Is the following implementation secure?

func main() {
    privateKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
    if err != nil { panic(err) }
data, err := os.ReadFile("/etc/passwd")
if err != nil { panic(err) }

sig, err := ecdsa.SignASN1(rand.Reader, privateKey, data)
if err != nil { panic(err) }
fmt.Printf("signature: %x\n", sig)

valid := ecdsa.VerifyASN1(&amp;privateKey.PublicKey, data, sig)
fmt.Println("signature verified:", valid)

}

Figure 1: An example signature generation and verification function

Of course it isn’t. The issue lies in passing raw, unhashed, and potentially long data to the ecdsa.SignASN1 and ecdsa.VerifyASN1 methods, while the Go crypto/ecdsa package (and a few other packages) expects data for signing and verification to be a hash of the actual data.

This behavior means that the code signs and verifies only the first 32 bytes of the file, as the size of the P-256 curve used in the example is 32 bytes.

The silent truncation of input data occurs in the hashToNat method, which is used internally by the ecdsa.{SignASN1,VerifyASN1} methods:

// hashToNat sets e to the left-most bits of hash, according to
// SEC 1, Section 4.1.3, point 5 and Section 4.1.4, point 3.
func hashToNat[Point nistPoint[Point]](c *nistCurve[Point], e *bigmod.Nat, hash []byte) {
    // ECDSA asks us to take the left-most log2(N) bits of hash, and use them as
    // an integer modulo N. This is the absolute worst of all worlds: we still
    // have to reduce, because the result might still overflow N, but to take
    // the left-most bits for P-521 we have to do a right shift.
    if size := c.N.Size(); len(hash) > size {
        hash = hash[:size]

Figure 2: The silent truncation of input data (crypto/ecdsa/ecdsa.go)

We have seen this vulnerability in real-world codebases and the impact was critical. To address the issue, there are a couple of approaches:

Length validation. A simple approach to prevent the lack-of-hashing issues is to validate the length of the provided data, as done in the go-ethereum library.
```
func VerifySignature(pubkey, msg, signature []byte) bool {
    if len(msg) != 32 || len(signature) != 64 || len(pubkey) == 0 {
        return false
    }
```
Figure 3: Validation function from the go-ethereum library
(go-ethereum/crypto/secp256k1/secp256.go#126–129)
Static detection. Another approach is to statically detect the lack of hashing. For this purpose, we developed the tob/go/msg-not-hashed-sig-verify query, which detects all data flows to potentially problematic methods, ignoring flows that initiate from or go through a hashing function or slicing operation.
An interesting problem we had to solve was how to set starting points (sources) for the data flow analysis? We could have used the UntrustedFlowSource class for that purpose. Then the analysis would be finding flows from any input potentially controlled by an attacker. However, UntrustedFlowSource often needs to be extended per project to be useful, so using it for our analysis would result in a lot of flows missed for a lot of projects. Therefore, our query focuses on finding the longest data flows, which are more likely to indicate potential vulnerabilities.

File permissions flaws in Go

Can you spot a bug in the code below?

if err := os.Chmod(“./secret_key”, 400); err != nil {
    return
}

Figure 4: Buggy Go code

Okay, so file permissions are usually represented as octal integers. In our case, the secret key file would end up with the permission set to 0o620 (or rw--w----), allowing non-owners to modify the file. The integer literal used in the call to the os.Chmod method is—most probably—not the one that a developer wanted to use.

To find unexpected integer values used as FileModes, we implemented a WYSIWYG (“what you see is what you get”) heuristic in the tob/go/file-perms-flaws CodeQL query. The “what you see” is a cleaned-up integer literal (a hard-coded number of the FileMode type)—with removed underscores, a removed base prefix, and left-padded zeros. The “what you get” is the same integer converted to an octal representation. If these two parts are not equal, there may be a bug present.

// what you see
fileModeAsSeen = ("000" + fileModeLitStr.replaceAll("_", "").regexpCapture("(0o|0x|0b)?(.+)", 2)).regexpCapture("0*(.{3,})", 1)
// what you get

and fileModeAsOctal = octalFileMode(fileModeInt)
// what you see != what you get

and fileModeAsSeen != fileModeAsOctal

Figure 5: The WYSIWYG heuristic in CodeQL

To minimize false positives, we filter out numbers that are commonly used constants (like 0755 or 0644) but in decimal or hexadecimal form. These known, valid constants are explicitly defined in the isKnownValidConstant predicate. Here is how we implemented this predicate:

predicate isKnownValidConstant(string fileMode) {
  fileMode = ["365", "420", "436", "438", "511", "509", "493"]
  or
  fileMode = ["0x16d", "0x1a4", "0x1b4", "0x1b6", "0x1ff", "0x1fd", "0x1ed"]
}

Figure 6: The CodeQL predicate that filters out common file permission constants

Using non-octal representation of numbers isn’t the only possible pitfall when dealing with file permissions. Another issue to be aware of is the use of more than nine bits in calls to permission-changing methods. File permissions are encoded only as the first nine bits, and the other bits encode file modes such as sticky bit or setuid. Some permission changing methods—like os.Chmod or os.Mkdir—ignore a subset of the mode bits, depending on the operating system. The tob/go/file-perms-flaws query warns about this issue as well.

String trimming misuses in Go

API ambiguities are a common source of errors, especially when there are multiple methods with similar names and purposes accepting the same set of arguments. This is the case for Go’s strings.Trim family of methods. Consider the following calls:

strings.TrimLeft("file://FinnAndHengest", "file://")
strings.TrimPrefix("file://FinnAndHengest", "file://")

Figure 7: Ambiguous Trim methods

Can you tell the difference between these calls and determine which one works “as expected”?

According to the documentation, the strings.TrimLeft method accepts a cutset (i.e., a set of characters) for removal, rather than a prefix. Consequently, it deletes more characters than one would expect. While the above example may seem innocent, a bug in a cross-site scripting (XSS) sanitization function, for example, could have devastating consequences.

When looking for misused strings.Trim{Left,Right} calls, the tricky part is defining what qualifies as “expected” behavior. To address this challenge, we developed the tob/go/trim-misuse CodeQL query with simple heuristics to differentiate between valid and possibly mistaken calls, based on the cutset argument. We consider a Trim operation invalid if the argument contains repeated characters or meets all of the following conditions:

Is longer than two characters
Contains at least two consecutive alphanumeric characters
Is not a common list of continuous characters

While the heuristics look oversimplified, they worked well enough in our audits. In CodeQL, the above rules are implemented as shown below. The cutset is a variable corresponding to the cutset argument of a strings.Trim{Left,Right} method call.

// repeated characters imply the bug
cutset.length() != unique(string c | c = cutset.charAt(_) | c).length()
or

(

// long strings are considered suspicious

cutset.length() > 2
// at least one alphanumeric

and exists(cutset.regexpFind(“[a-zA-Z0-9]{2}”, _, _))
// exclude probable false-positives

and not cutset.matches(“%1234567%”)

and not cutset.matches(“%abcdefghijklmnopqrstuvwxyz%”)

)

Figure 8: CodeQL implementation of heuristics for a Trim operation

Interestingly, misuses of the strings.Trim methods are so common that Go developers are considering deprecating and replacing the problematic functions.

Identifying missing minimum TLS version configurations in Go

When using static analysis tools, it’s important to know their limitations. The official go/insecure-tls CodeQL query finds TLS configurations that accept insecure (outdated) TLS versions (e.g., SSLv3, TLSv1.1). It accomplishes that task by comparing values provided to the configuration’s MinVersion and MaxVersion settings against a list of deprecated versions. However, the query does not warn about configurations that do not explicitly set the MinVersion.

Why should this be a concern? The reason is that the default MinVersion for servers is TLSv1.0. Therefore, in the example below, the official query would mark only server_explicit as insecurely configured, despite both servers using the same MinVersion.

server_explicit := &http.Server{
    TLSConfig: &tls.Config{MinVersion: tls.VersionTLS10}
}
server_default := &http.Server{TLSConfig: &tls.Config{}}

Figure 9: Explicit and default configuration of the MinVersion setting

The severity of this issue is rather low since the default MinVersion for clients is a secure TLSv1.2. Nevertheless, we filled the gap and developed the tob/go/missing-min-version-tls CodeQL query, which detects tls.Config structures without the MinVersion field explicitly set. The query skips reporting configurations used for clients and limits false positives by filtering out findings where the MinVersion is set after the structure initialization.

String bugs in C and C++

Building on top of the insightful cstrnfinder research conducted by one of my Trail of Bits colleagues, we developed the tob/cpp/cstrnfinder query. This query aims to identify invalid numeric constants provided to calls to functions that expect a string and its corresponding size as input—such as strncmp, strncpy, and memmove. We focused on detecting three erroneous cases:

Buffer underread. This occurs when the size argument (number 20 in the example below) is slightly smaller than the source string’s length:
```
if (!strncmp(argv[1], "org/tob/test/SafeData", 20)) {
    puts("Secure");
} else {
    puts("Not secure");
}
```
Figure 10: A buffer underread bug example

Here, the length of the "org/tob/test/SafeData" string is 21 bytes (22 if we count the terminating null byte). However, we are comparing only the first 20 bytes. Therefore, a string like "org/tob/test/SafeDatX" is incorrectly matched.
Buffer overread. This arises when the size argument (14 in the example below) is greater than the length of the input string, causing the function to read out of bounds.
```
int check(const char *password) {
    const char pass[] = "Silmarillion";
    return memcmp(password, pass, 14);
}
```
Figure 11: A buffer overread bug example

In the example, the length of the "Silmarillion" string is 12 bytes (13 with the null byte). If the password is longer than 13 bytes and starts with the "Silmarillion" substring, then the memcmp function reads data outside of the pass buffer. While functions operating on strings stop reading input buffers on a null byte and will not overread the input, the memcmp function operates on bytes and does not stop on null bytes.
Incorrect use of string concatenation function. If the size argument (BUFSIZE-1 in the example below) is greater than the source string’s length (the length of “, Beowulf\x00”, so 10 bytes), the size argument may be incorrectly interpreted as the destination buffer’s size (BUFSIZE bytes in the example), instead of the input string’s size. This may indicate a buffer overflow vulnerability.
```
#define BUFSIZE 256
char all_books[BUFSIZE];

FILE *books_f = fopen(“books.txt”, “r”);

fgets(all_books, BUFSIZE, books_f);

fclose(books_f);
strncat(all_books, “, Beowulf”, BUFSIZE-1);

// safe version: strncat(all_books, “, Beowulf”, BUFSIZE-strlen(dest)-1);
```
Figure 12: A strncat function misuse bug example

In the code above, the all_books buffer can hold a maximum 256 bytes of data. If the books.txt file contains 250 characters, then the remaining space in the buffer before the call to the strncat function is 6 bytes. However, we instruct the function to add up to 255 (BUFSIZE-1) bytes to the end of the all_books buffer. Therefore, a few bytes of the “, Beowulf” string will end up outside the allocated space. What we should do instead is instruct the strncat to add at most 5 bytes (leaving 1 byte for the terminating \x00).

There is a similar built-in query with ID cpp/unsafe-strncat, but it doesn’t work with constant sizes.

Missing null terminator bug in C

Both C and C++ allow developers to construct fixed-size strings with an initialization literal. If the length of the literal is greater than or equal to the allocated buffer size, then the literal is truncated and the terminating null byte is not appended to the string.

char b1[18] = "The Road Goes Ever On";  // missing null byte, warning
char b2[13] = "Ancrene Wisse";  // missing null byte, NO WARNING
char b3[] = "Farmer Giles of Ham"; // correct initialization
char b4[3] = {'t', 'o', 'b'} // not a string, lack of null byte is expected

Figure 13: Example initializations of C strings

Interestingly, C compilers warn against initializers longer than the buffer size, but don’t raise alarms for initializers of a length equal to the buffer size—even though neither of the resulting strings are null-terminated. C++ compilers return errors for both cases.

The tob/cpp/no-null-terminator query uses data flow analysis to find incorrectly initialized strings passed to functions expecting a null-terminated string. Such function calls result in out-of-bounds read or write vulnerabilities.

CodeQL: past, present, and future

This will be a continuing project from Trail of Bits, so be on the lookout for more soon! One of our most valuable developments is our expertise in automated bug finding. This new CodeQL repository, the Semgrep rules, and the Automated Testing Handbook are key methods to helping others benefit from our work. Please use these resources and report any issues or improvements to them!

If you’d like to read more about our work on CodeQL, we have used its capabilities in several ways, such as detecting iterator invalidations, identifying unhandled errors, and uncovering divergent representations.

Contact us if you’re interested in customizing CodeQL queries for your project.

Article Link: Publishing Trail of Bits’ CodeQL queries | Trail of Bits Blog