Enhancing trust for SGX enclaves

By Artur Cygan

Creating reproducible builds for SGX enclaves used in privacy-oriented deployments is a difficult task that lacks a convenient and robust solution. We propose using Nix to achieve reproducible and transparent enclave builds so that anyone can audit whether the enclave is running the source code it claims, thereby enhancing the security of SGX systems.

In this blog post, we will explain how we enhanced trust for SGX enclaves through the following steps:

  • Analyzed reproducible builds of Signal and MobileCoin enclaves
  • Analyzed a reproducible SGX SDK build from Intel
  • Packaged the SGX SDK in Nixpkgs
  • Prepared a reproducible-sgx repository demonstrating how to build SGX enclaves with Nix

Background

Introduced in 2015, Intel SGX is an implementation of confidential (or trusted) computing. More specifically, it is a trusted execution environment (TEE) that allows users to run confidential computations on a remote computer owned and maintained by an untrusted party. Users trust the manufacturer (Intel, in this case) of a piece of hardware (CPU) to protect the execution environment from tampering, even by the most privileged code, such as kernel malware. SGX code and data live in special encrypted and authenticated memory areas called enclaves.

During my work at Trail of Bits, I observed a poorly addressed trust gap in systems that use SGX, where the user of an enclave doesn’t necessarily trust the enclave author. Instead, the user is free to audit the enclave’s open-source code to verify its functionality and security. This setting can be observed, for instance, in privacy-oriented deployments such as Signal’s contact discovery service or MobileCoin’s consensus protocol. To validate trust, the user must check whether the enclave was built from trusted code. Unfortunately, this turns out to be hard because the builds tend to be difficult to reproduce and rely on a substantial amount of precompiled binary code. In practice, hardly anyone verifies the builds, and users are left with no option but to trust the enclave author.

To give another perspective: a similar situation arises in the blockchain world, where smart contracts are deployed as bytecode. For instance, Etherscan will try to reproduce on-chain EVM bytecode to attest that it was compiled from the claimed Solidity source code. Users are free to perform the same verification themselves if they don’t trust Etherscan.

A solution to this problem is to build SGX enclaves in a reproducible and transparent way so that multiple parties can independently arrive at the same result and audit the build for any supply chain–related issues. To achieve this goal, I helped port Intel’s SGX SDK to Nixpkgs, which allows building SGX enclaves with the Nix package manager in a fully reproducible way so any user can verify that the build is based on trusted code.

To see how reproducible builds complete the trust chain, it is important to first understand what guarantees SGX provides.

How does an enclave prove its identity?

Apart from the TEE protection mentioned above (nothing leaks out and execution can’t be altered), SGX can remotely prove an enclave’s identity, including its code hash, signer, and runtime configuration. This feature is called remote attestation and can feel a bit foreign to someone unfamiliar with this type of technology.

When an enclave is loaded, its initial state (including code) is hashed by the CPU into a measurement hash, also known as MRENCLAVE. The hash changes only if the enclave’s code changes. This hash, along with other data such as the signer and environment details, is placed in a special memory area accessible only to the SGX implementation. The enclave can ask the CPU to produce a report containing all this data, including a piece of enclave-defined data (called report_data), and then pass the report to the special Intel-signed quoting enclave, which signs it (the signed report is called a quote from now on) so that it can be delivered to the remote party and verified.
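To make this step more concrete, here is a minimal sketch of the in-enclave side, based on the SGX SDK’s sgx_create_report API. The helper function, its name, and the way the quoting enclave’s target info reaches the enclave are illustrative assumptions, not code from any of the projects discussed here.

```c
/*
 * Minimal sketch (trusted, in-enclave code): ask the CPU for a report that
 * is bound to enclave-chosen report_data. The untrusted host is assumed to
 * have obtained the quoting enclave's target info beforehand (e.g., via the
 * DCAP quoting library) and passed it in through an ECALL.
 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sgx_error.h>
#include <sgx_report.h>
#include <sgx_utils.h>   /* sgx_create_report */

sgx_status_t make_report(const sgx_target_info_t *qe_target_info,
                         const uint8_t *user_data, size_t user_data_len,
                         sgx_report_t *report)
{
    sgx_report_data_t report_data = {0};          /* 64 bytes, zero-padded */
    if (user_data_len > sizeof(report_data.d))
        return SGX_ERROR_INVALID_PARAMETER;
    memcpy(report_data.d, user_data, user_data_len);

    /* The CPU fills in MRENCLAVE, MRSIGNER, attributes, etc., and MACs the
     * report so the quoting enclave can verify it before signing a quote. */
    return sgx_create_report(qe_target_info, &report_data, report);
}
```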

Next, the verifier checks the quote’s authenticity with Intel and extracts the relevant information from it. Although there are a few additional checks and steps at this point, in our case, the most important thing to check is the measurement hash, which is a key component of trust verification.

What do we verify the hash against? The simplest solution is to hard-code a trusted MRENCLAVE value into the client application itself. This solution is used, for instance, by Signal, where MRENCLAVE is placed in the client’s build config and verified against the hash from the signed quote sent by the Signal server. Bundling the client and MRENCLAVE makes sense; after all, we need to audit and trust the client application code too. The downside is that the client application has to be rebuilt and re-released whenever the enclave code changes. If enclave modifications are expected to be frequent, or if it is important to quickly move clients to another enclave (for instance, in the event of a security issue), clients can use a more dynamic approach and fetch MRENCLAVE values from trusted third parties.
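As a hedged illustration of what hard-coded pinning can look like on the client side, here is a small C sketch. It assumes the quote’s signature chain has already been verified (for example, with Intel’s quote verification library); the constant and function names are hypothetical, and the field layout follows the SGX SDK’s sgx_quote.h and sgx_report.h headers.

```c
/*
 * Sketch of client-side MRENCLAVE pinning. Run this only after the quote's
 * signature chain has been verified; the measurement itself is not secret,
 * so a plain memcmp is sufficient here.
 */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sgx_quote.h>   /* sgx_quote_t, SGX_HASH_SIZE (via sgx_report.h) */

/* Hypothetical build-time constant; in Signal's case, the trusted value
 * comes from the client's build config. */
static const uint8_t TRUSTED_MRENCLAVE[SGX_HASH_SIZE] = {
    /* 32-byte measurement of the audited enclave build */
};

bool mrenclave_matches(const sgx_quote_t *quote)
{
    return memcmp(quote->report_body.mr_enclave.m,
                  TRUSTED_MRENCLAVE, SGX_HASH_SIZE) == 0;
}
```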

Secure communication channel

SGX can prove the identity of an enclave and a piece of report_data that it produced, but it’s up to the enclave and the verifier to establish a trusted and secure communication channel. Since SGX enclaves are flexible and can freely communicate with the outside world through ECALLs and OCALLs (and from there over the network), SGX itself doesn’t impose any specific protocol or implementation for the channel. The enclave developer is free to decide, as long as the channel is encrypted and authenticated and terminates inside the enclave.

For instance, the SGX SDK implements an example of an authenticated key exchange scheme for remote attestation. However, the scheme assumes a DRM-like system where the enclave’s signer is trusted and the server’s public key is hard coded in the enclave’s code, so it’s unsuitable for use in a privacy-oriented deployment of SGX such as Signal.

If we don’t trust the enclave’s author, we can leverage the report_data to establish such a channel. This is where the SGX guarantees essentially end, and from now on, we have to trust the enclave’s source code to do the right thing. This fact is not obvious at first but becomes evident if we look, for instance, at the RA-TLS paper on how to establish a secure TLS channel that terminates inside an enclave:

The enclave generates a new public-private RA-TLS key pair at every startup. The RA-TLS key need not be persisted since generating a fresh key on startup is reasonably cheap. Not persisting the key reduces the key’s exposure and avoids common problems related to persistence such as state rollback protection. Interested parties can inspect the source code to convince themselves that the key is never exposed outside of the enclave.

To maintain the trust chain, RA-TLS uses the quote’s report_data to commit to the hash of the enclave’s public key. A similar method can be observed in Signal’s protocol, which implements Noise Pipes and commits to the handshake hash in the report_data.
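To illustrate the idea, here is a rough, in-enclave sketch of binding a freshly generated public key to report_data with the SDK’s tcrypto SHA-256. The function name and the DER encoding of the key are assumptions for illustration, not the exact RA-TLS or Signal implementation.

```c
/*
 * Sketch (trusted, in-enclave code): commit to the enclave's public key by
 * hashing it into report_data. The resulting report is then turned into a
 * quote as shown earlier, so the quote binds the key to MRENCLAVE.
 */
#include <stdint.h>
#include <string.h>
#include <sgx_error.h>
#include <sgx_report.h>
#include <sgx_tcrypto.h>   /* sgx_sha256_msg, sgx_sha256_hash_t (32 bytes) */

sgx_status_t bind_pubkey_to_report_data(const uint8_t *pubkey_der,
                                        uint32_t pubkey_der_len,
                                        sgx_report_data_t *report_data)
{
    sgx_sha256_hash_t digest;
    sgx_status_t status = sgx_sha256_msg(pubkey_der, pubkey_der_len, &digest);
    if (status != SGX_SUCCESS)
        return status;

    /* report_data is 64 bytes: the 32-byte hash fills the first half and the
     * rest stays zero. The verifier recomputes the hash from the public key
     * it sees in the TLS handshake and compares it to the quoted value. */
    memset(report_data, 0, sizeof(*report_data));
    memcpy(report_data->d, digest, sizeof(digest));
    return SGX_SUCCESS;
}
```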

SGX encrypts and authenticates the enclave’s memory, but it’s up to the code running in the enclave to protect the data. Nothing stops the enclave code from disclosing any information to the outside world. If we don’t know what code runs in the enclave, anything can happen.

Fortunately, we know the code because it’s open source, but how do we make sure that the code at a particular Git commit maps to the MRENCLAVE hash an enclave is presenting? We have to reproduce the enclave build, calculate its MRENCLAVE hash, and compare it with the hash obtained from the quote. If the build can’t be reproduced, our remaining options are either to trust someone who confirmed the enclave is safe to use or to audit the enclave’s binary code.

Why are reproducible builds hard?

The kind of reproducibility we care about is bit-for-bit reproducibility. Two builds of the same software might be semantically identical despite minor differences in their artifacts, but since we calculate hashes of those artifacts, even a one-bit difference will produce a different hash. SGX enclaves are built into .dll or .so files using the SGX SDK and must be signed with the author’s RSA key. We might get away with minor differences, as the measurement process omits some details of the enclave executable file (such as the signer), but full file reproducibility is desirable. Achieving it is a non-trivial task that can be approached in multiple ways.

Both Signal and MobileCoin treat this task seriously and aim to provide a reproducible build for their enclaves. For example, Signal claims the following:

The enclave code builds reproducibly, so anyone can verify that the published source code corresponds to the MRENCLAVE value of the remote enclave.

The initial version of Signal’s contact discovery service build (archived early 2023) is based on Debian and uses a .buildinfo file to lock down the system dependencies; however, locking is done based on versions rather than hashes. This is a limitation of Debian, as we read on the BuildinfoFiles page. The SGX SDK and a few other software packages are built from sources fetched without checking the hash of downloaded data. While those are not necessarily red flags, more trust than necessary is placed in third parties (Debian and GitHub).

From the README, it is unclear how the .buildinfo file is produced because there is no source for the mentioned derebuild.pl script. Most likely, the .buildinfo file is generated during the original build of the enclave’s Debian package and checked into the repository. It is unclear whether this mechanism guarantees capture of all the build inputs and doesn’t let any implicit dependencies fall through the cracks. Unfortunately, I couldn’t reproduce the build because both the Docker and Debian instructions from the README failed, and shortly after that, I noticed that Signal moved to a new iteration of the contact discovery service.

The current version of Signal’s contact discovery service build is slightly different. Although I didn’t test the build, it is based on a Docker image that suffers from similar issues, such as installing dependencies from a system package manager with network access, which doesn’t guarantee reproducibility.

Another example is MobileCoin, which provides a prebuilt Docker image with a build environment for the enclave. Building the same image from the Dockerfile most likely won’t result in a reproducible hash we can validate, so the image provided by MobileCoin must be used to reproduce the enclave. The problem here is that it’s quite difficult to audit Docker images that are hundreds of megabytes in size, and we essentially need to trust MobileCoin that the image is safe.

Docker is a popular choice for reproducing environments, but it doesn’t come with any tools to support bit-for-bit reproducibility; instead, it focuses on delivering functionally similar environments. A complex Docker image might reproduce the build for a limited time, but if no special care is taken, the builds will inevitably diverge due to filesystem timestamps, randomness, and unrestricted network access.

Why Nix can do it better

Nix is a cross-platform, source-based package manager that features the Nix language for describing packages and a large collection of community-maintained packages called Nixpkgs. NixOS is a Linux distribution built on top of Nix and Nixpkgs, designed from the ground up with a focus on reproducibility. Nix is very different from conventional package managers. For instance, it doesn’t install anything into regular system paths like /bin or /usr/lib. Instead, it uses its own /nix/store directory and symlinks to the packages installed there. Every package path is prefixed with a hash that captures all of the build inputs, such as the dependency graph and compilation options. This means it is possible to have the same package installed in multiple variants that differ only by build options; from Nix’s perspective, each variant is a different package.

Nix does a great job of surfacing most of the issues that could render a build unreproducible. For example, a Nix build will most likely break during development when an impurity (i.e., a dependency that is not explicitly declared as an input to the build) is encountered, forcing the developer to fix it. Impurities are often captured from the environment, such as environment variables or hard-coded system-wide directories like /usr/lib. Nix addresses these issues by sandboxing builds and setting filesystem timestamps to a fixed value. Nix also requires all inputs fetched from the network to be pinned by hash. On top of that, Nixpkgs contains many patches (gnumake, for instance) that fix reproducibility issues in common software such as compilers and build systems.

Reducing impurities increases the chance of build reproducibility, which in turn increases the trust in source-to-artifact correspondence. However, ultimately, reproducibility is not something that can be proven or guaranteed. Under the hood, a typical Nix build runs compilers that could rely on some source of randomness that could leak into the compiled artifacts. Ideally, reproducibility should be tracked on an ongoing basis. An example of such a setup is the r13y.com site, which tracks reproducibility of the NixOS image itself.

Apart from strong reproducibility properties, Nix also shines when it comes to dependency transparency. While Nix caches the build outputs by default, every package can be built from source, and the dependency graph is rooted in an easily auditable stage0 bootstrap, which reduces trust in precompiled binary code to the minimum.

Issues in Intel’s use of Nix

Remember the quoting enclave that signs attestation reports? To deliver all SGX features, Intel needed to create a set of privileged architectural enclaves, signed by Intel, that perform tasks too complex to implement in CPU microcode. The quoting enclave is one of them. These enclaves are a critical piece of SGX because they have access to hardware keys burned into the CPU and perform trusted tasks such as remote attestation. A bug in the quoting enclave’s code could therefore invalidate the security guarantees of the whole remote attestation protocol.

Aware of this, Intel prepared a reproducible Nix-based build that produces the SGX SDK (required to build any enclave) and all of the architectural enclaves. The solution uses Nix inside a Docker container. I was able to reproduce the build, but after closer examination, I identified a number of issues with it.

First, the build pins neither the Docker image nor the SDK source hashes. The SDK can be built from source, but the architectural enclave build downloads a precompiled SDK installer from Intel and doesn’t even check its hash. Although Nix is used, many steps happen outside the Nix build.

The Nix part of the build is unfortunately incorrect and doesn’t deliver much value. The dependencies are hand-picked from the prebuilt cache, which circumvents the build transparency Nix provides. The build runs in a nix-shell, which is intended only for development purposes; the shell doesn’t provide the same sandboxing as a regular Nix build and allows various kinds of impurities. In fact, I discovered some impurities when porting the SDK build to Nixpkgs. Some of these issues were also noticed by another researcher but remain unaddressed.

Bringing SGX SDK to Nixpkgs

I concluded that the SGX SDK should belong to Nixpkgs to achieve truly reproducible and transparent enclave builds. It turned out there was already an ongoing effort, which I joined and helped finish. The work has been expanded and maintained by the community since then. Now, any SGX enclave can be easily built with Nix by using the sgx-sdk package. I hope that once this solution matures, the Nixpkgs community can maintain it together with Intel and bring it into the official SGX SDK repository.

We prepared the reproducible-sgx GitHub repository to show how to build Intel’s sample enclaves with Nix and the ported SDK. While this shows the basics, SGX enclaves can be almost arbitrarily complex and use different libraries and programming languages. If you wish to see another example, feel free to open an issue or a pull request.

In this blog post, we discussed only a slice of the possible security issues concerning SGX enclaves. For example, numerous side-channel attacks have been demonstrated against SGX, such as the recent attack on Blu-ray DRM. If you need help with the security of a system that uses SGX or Nix, don’t hesitate to contact us.
