Catching OpenSSL misuse using CodeQL

By Damien Santiago

I’ve created five CodeQL queries that catch potentially serious bugs in code that uses the OpenSSL libcrypto API, a widely adopted but often unforgiving API that can be misused to cause memory leaks, authentication bypasses, and other subtle cryptographic issues in implementations. These queries, which I developed during my internship with my mentors, Fredrik Dahlgren and Filipe Casal, help prevent misuse by ensuring proper key handling and entropy initialization and by checking that bignums are cleared.

To run our queries on your own codebase, you must first download them from the repository using the following command:

codeql pack download trailofbits/cpp-queries

To run the queries on a pre-generated C or C++ database using the CodeQL CLI, simply pass the name of the query pack to the tool as follows:

codeql database analyze database.db \
    --format=sarif-latest        \
    --output=./tob-cpp.sarif -- trailofbits/cpp-queries

Now, with that out of the way, let’s dig into the actual queries I wrote during my internship.

Oh no, not my keys!

Using a too-short key when initializing a cipher using OpenSSL can lead to a serious problem: the OpenSSL API will still accept this key as valid and simply read out of bounds when the cipher is initialized, potentially initializing the cipher with a weak key and leaving your data vulnerable. For this reason, we decided to make a query that tested for too-short keys by checking the key size against the algorithm being used. Fortunately for us, OpenSSL uses a naming scheme that makes it easy to implement this query. (More on that later.)

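The function EVP_EncryptInit_ex, which is used to initialize a new symmetric cipher, is declared as follows:

int EVP_EncryptInit_ex(EVP_CIPHER_CTX *ctx, const EVP_CIPHER *type,
                       ENGINE *impl, const unsigned char *key,
                       const unsigned char *iv);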

Notice how the function takes a key as the fourth argument. With this in mind, we can define a Key type in CodeQL using data flow analysis. If there is data flow from a variable into the key parameter of EVP_EncryptInit_ex, the variable most likely represents a key (or, at the very least, is used as one). Thus, we can define a key in CodeQL as a variable that flows into that parameter.
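A minimal sketch of such a definition might look like the following. It is an illustration rather than the published query: the class layout, the restriction to local (intraprocedural) data flow, and the import path (which depends on the version of the CodeQL C/C++ library) are all assumptions.

```ql
import cpp
import semmle.code.cpp.dataflow.new.DataFlow

/** A call to the OpenSSL function `EVP_EncryptInit_ex`. */
class EVP_EncryptInit_ex extends FunctionCall {
  EVP_EncryptInit_ex() { this.getTarget().getName() = "EVP_EncryptInit_ex" }

  /** Gets the key argument (the fourth argument, at index 3). */
  Expr getKey() { result = this.getArgument(3) }
}

/** A variable with local data flow into the key argument of `EVP_EncryptInit_ex`. */
class Key extends Variable {
  Key() {
    exists(FunctionCall init |
      // The inline cast holds only if `init` is a call to EVP_EncryptInit_ex.
      DataFlow::localFlow(
        DataFlow::exprNode(this.getAnAccess()),
        DataFlow::exprNode(init.(EVP_EncryptInit_ex).getKey())
      )
    )
  }
}
```

Because this sketch uses local flow only, a key that is created in one function and passed to EVP_EncryptInit_ex in another would not be matched.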

Here, we use data flow to ensure that the key flows into the key parameter of a call to EVP_EncryptInit_ex. This works because the cast to EVP_EncryptInit_ex holds only if init satisfies the CodeQL definition of EVP_EncryptInit_ex (i.e., if it represents a call to a function with the name EVP_EncryptInit_ex). The call to getKey() simply returns the argument at the position of the key parameter in the call to EVP_EncryptInit_ex.

Next, we need to be able to evaluate the size of a key using CodeQL. In order to check if a given key has the correct size, we need to know two things: the size of the key and the key size of the cipher the key is passed to. Obtaining the size of the key is simple, as CodeQL has a getSize() predicate that returns the size of a type in bytes. The call to getUnderlyingType() is used to resolve typedefs and get the underlying type of the key.
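As a rough sketch, the size lookup can be written as a small helper predicate. The name getKeySizeInBytes is hypothetical, and the example query restricts itself to array-typed variables purely for illustration.

```ql
import cpp

/**
 * Gets the size in bytes of the declared type of `key`,
 * after typedefs have been resolved.
 */
int getKeySizeInBytes(Variable key) { result = key.getUnderlyingType().getSize() }

// List every array variable together with its size in bytes.
from Variable v
where v.getUnderlyingType() instanceof ArrayType
select v, getKeySizeInBytes(v)
```

Note that this measures the declared type. For a key declared as an array, such as unsigned char key[16], it yields the buffer size, but for a key held through a pointer it yields the size of the pointer itself, so a check built on it is only meaningful for statically sized buffers.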

Now, we need to identify what the size of the key should be. This clearly depends on which cipher is used. However, CodeQL doesn’t know what a cipher is. In OpenSSL, each cipher exposed by the high-level EVP API is an instance of the type EVP_CIPHER, and each cipher is obtained using a particular function from the API. For example, if we want to use AES-256 in CBC mode, we pass an instance of EVP_CIPHER returned from EVP_aes_256_cbc() to EVP_EncryptInit_ex. Since the API name contains the name of the cipher, we can use the getName() and matches() predicates in CodeQL to compare the names of function calls to patterns in the names of the ciphers.

Since the cipher is given by (the return value of) a function call, and we want to match against the name of the target function, we need to use getTarget() to get the underlying target of the call. To constrain the key size of the cipher, we add a field for the key size and constrain the value of the field in the constructor.
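One way to sketch this idea, assuming that only the AES getters are of interest, is a class over calls whose target name matches a pattern, with the expected key size stored in a field. The patterns and the restriction to AES are assumptions made for this example.

```ql
import cpp

/** A call to an OpenSSL AES cipher getter such as `EVP_aes_256_cbc()`. */
class EVP_CIPHER extends FunctionCall {
  // Expected key size in bytes, fixed by the constructor below.
  int keySize;

  EVP_CIPHER() {
    exists(string name | name = this.getTarget().getName() |
      // In `matches`, `%` matches any sequence of characters and
      // `_` matches any single character (including a literal underscore).
      name.matches("EVP_aes_128_%") and keySize = 16
      or
      name.matches("EVP_aes_192_%") and keySize = 24
      or
      name.matches("EVP_aes_256_%") and keySize = 32
    )
  }

  /** Gets the key size in bytes that this cipher expects. */
  int getKeySize() { result = keySize }
}
```

These patterns are deliberately coarse. Modes that take a double-length key, such as XTS, would need their own case, since EVP_aes_128_xts() expects 32 bytes of key material rather than 16.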

Next, we need to check if the size of the key passed to the cipher is equal to the expected size. However, we have to be careful and check that the cipher we’re comparing against is actually used together with the key, as opposed to grabbing some random cipher instance from the codebase. Let’s first define a member predicate on the Key type that checks the size of the key against the key size of a given cipher.

As we have noted, this predicate does not restrict the cipher to ensure that the key is used together with the cipher. Let’s add another predicate to Key that can be used to obtain all ciphers that the key is used together with. This means that the cipher is passed as a parameter in the call to EVP_EncryptInit_ex where the key is used. (Note that the key may be used with different ciphers in different locations in the codebase.)
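Putting the pieces together, a self-contained sketch of the whole check could look like this. The predicate names isTooShortFor, getACipher, getSizeInBytes, and flowsTo are hypothetical, only two AES key sizes are modeled to keep the example short, and the sketch has not been run against a real database.

```ql
import cpp
import semmle.code.cpp.dataflow.new.DataFlow

/** Holds if there is local data flow from `source` to `sink`. */
predicate flowsTo(Expr source, Expr sink) {
  DataFlow::localFlow(DataFlow::exprNode(source), DataFlow::exprNode(sink))
}

class EVP_EncryptInit_ex extends FunctionCall {
  EVP_EncryptInit_ex() { this.getTarget().getName() = "EVP_EncryptInit_ex" }

  Expr getCipher() { result = this.getArgument(1) } // second argument
  Expr getKey() { result = this.getArgument(3) } // fourth argument
}

class EVP_CIPHER extends FunctionCall {
  int keySize;

  EVP_CIPHER() {
    this.getTarget().getName().matches("EVP_aes_128_%") and keySize = 16
    or
    this.getTarget().getName().matches("EVP_aes_256_%") and keySize = 32
  }

  int getKeySize() { result = keySize }
}

class Key extends Variable {
  Key() { exists(EVP_EncryptInit_ex init | flowsTo(this.getAnAccess(), init.getKey())) }

  int getSizeInBytes() { result = this.getUnderlyingType().getSize() }

  /** Holds if this key is shorter than `cipher` expects (for any cipher). */
  predicate isTooShortFor(EVP_CIPHER cipher) { this.getSizeInBytes() < cipher.getKeySize() }

  /** Gets a cipher passed to the same `EVP_EncryptInit_ex` call as this key. */
  EVP_CIPHER getACipher() {
    exists(EVP_EncryptInit_ex init |
      flowsTo(this.getAnAccess(), init.getKey()) and
      flowsTo(result, init.getCipher())
    )
  }
}

from Key key, EVP_CIPHER cipher
where cipher = key.getACipher() and key.isTooShortFor(cipher)
select key,
  "Key of " + key.getSizeInBytes().toString() + " bytes is too short for " +
    cipher.getTarget().getName() + "."
```

Restricting cipher to key.getACipher() in the where clause is what ties the size comparison to a cipher that the key is actually used with.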

That’s it! The final query, as well as a small test case to demonstrate how the Key and EVP_CIPHER types work, can be found on GitHub.

My engine’s falling apart!

OpenSSL 1.1.1 supports dynamic loading of cryptographic modules called engines at runtime. This can be used to load custom algorithms not implemented by the library or to interface with hardware. However, before an engine can be used, it must be initialized, which requires the user to call a few different functions in a specific order: you must first select an engine to load, then call the engine initialization function, and finally set the mode of operation for the engine. Failing to initialize the engine could potentially lead to invalid outputs or segmentation faults. Failing to set the engine as the default could mean that a different implementation is used by OpenSSL. To create a query to detect if a loaded engine is properly initialized, we decided to use data flow to check if the correct functions were called to initialize the loaded engine.

After reading the documentation on the OpenSSL engine API, it seems that the API user can create an engine object in a few different ways. We decided to write a CodeQL class that simultaneously captured the four different functions a user could use to load a new engine. (These functions either create a new unselected instance, create a new instance selected by ID, or select an engine from a list using “previous” and “next” style function names.)
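A class along these lines could be sketched as follows. The mapping from the description above to the four OpenSSL function names ENGINE_new, ENGINE_by_id, ENGINE_get_prev, and ENGINE_get_next is my reading of the text, not a listing taken from the published query.

```ql
import cpp

/** A call that creates or selects an OpenSSL engine object. */
class CreateEngine extends FunctionCall {
  CreateEngine() {
    this.getTarget().getName() in [
        "ENGINE_new", // new, unselected instance
        "ENGINE_by_id", // instance selected by ID
        "ENGINE_get_prev", // previous engine in the list
        "ENGINE_get_next" // next engine in the list
      ]
  }
}

// List every location where an engine is created or selected.
from CreateEngine create
select create, "Engine obtained via " + create.getTarget().getName() + "."
```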

Next, we needed to check that the user initialized the newly created engine object using ENGINE_init, which takes the engine object as a parameter. Not only does this function initialize the engine, it also performs error checking to make sure the engine is working properly. As a result, it’s important that the user does not forget to call this function.

The third and final function that the user needs to call is ENGINE_set_default, which is used to register the engine as the default implementation of the specified algorithms. ENGINE_set_default takes an engine and a flag parameter. We create a CodeQL type that represents this function in the same way as for ENGINE_init above.
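The two call types can be sketched as a pair of small classes. The accessor names getEngine and getFlags are hypothetical. The argument positions follow the OpenSSL declarations ENGINE_init(ENGINE *e) and ENGINE_set_default(ENGINE *e, unsigned int flags).

```ql
import cpp

/** A call to `ENGINE_init`. */
class ENGINE_init extends FunctionCall {
  ENGINE_init() { this.getTarget().getName() = "ENGINE_init" }

  /** Gets the engine argument (the first argument, at index 0). */
  Expr getEngine() { result = this.getArgument(0) }
}

/** A call to `ENGINE_set_default`. */
class ENGINE_set_default extends FunctionCall {
  ENGINE_set_default() { this.getTarget().getName() = "ENGINE_set_default" }

  /** Gets the engine argument (the first argument, at index 0). */
  Expr getEngine() { result = this.getArgument(0) }

  /** Gets the flags argument, for example `ENGINE_METHOD_ALL` (index 1). */
  Expr getFlags() { result = this.getArgument(1) }
}
```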

Now that we have defined the functions used to initialize a new engine using CodeQL, we need to define what the corresponding data flow should look like. We want to make sure that data flows from CreateEngine to ENGINE_init and ENGINE_set_default.

To finalize this query and put it all together, we flag any loaded engine that is not passed to both ENGINE_init and ENGINE_set_default. The complete query and a corresponding test case can be found on GitHub.
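A compact, self-contained sketch of such a query is shown below. The helper predicate flowsToEngineArgOf is hypothetical, the list of creation functions is an assumption based on the description earlier in this section, and only local data flow is considered.

```ql
import cpp
import semmle.code.cpp.dataflow.new.DataFlow

/** A call that creates or selects an OpenSSL engine object. */
class CreateEngine extends FunctionCall {
  CreateEngine() {
    this.getTarget().getName() in
      ["ENGINE_new", "ENGINE_by_id", "ENGINE_get_prev", "ENGINE_get_next"]
  }
}

/**
 * Holds if the engine returned by `create` flows, within the same function,
 * into the first argument of a call to the function named `name`.
 */
predicate flowsToEngineArgOf(CreateEngine create, string name) {
  exists(FunctionCall call |
    call.getTarget().getName() = name and
    DataFlow::localFlow(DataFlow::exprNode(create), DataFlow::exprNode(call.getArgument(0)))
  )
}

from CreateEngine create
where
  not flowsToEngineArgOf(create, "ENGINE_init")
  or
  not flowsToEngineArgOf(create, "ENGINE_set_default")
select create, "This engine is not passed to both ENGINE_init and ENGINE_set_default."
```

Since the sketch tracks local flow only, an engine that is created in one function and initialized in another would be reported even though it is handled correctly. A production query would likely need global data flow to avoid such false positives.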

Moving forward

The OpenSSL libcrypto API is full of sharp edges that could create problems for developers. As with every cryptographic implementation, the smallest of mistakes can lead to serious vulnerabilities. Tools such as CodeQL help shine a light on these issues by allowing developers and code reviewers the opportunity to build and share queries to secure their code. I invite you not only to try out our queries found in our GitHub repository (which also contains additional queries for both Go and C++), but to open your IDE of choice and create some of your own amazing queries!

Article Link: Catching OpenSSL misuse using CodeQL | Trail of Bits Blog