We build X.509 chains so you don’t have to

MalBot · January 25, 2024, 2:50pm

By William Woodruff

For the past eight months, Trail of Bits has worked with the Python Cryptographic Authority to build cryptography-x509-verification, a brand-new, pure-Rust implementation of the X.509 path validation algorithm that TLS and other encryption and authentication protocols are built on. Our implementation is fast, standards-conforming, and memory-safe, giving the Python ecosystem a modern alternative to OpenSSL’s misuse- and vulnerability-prone X.509 APIs for HTTPS certificate verification, among other protocols. This is a foundational security improvement that will benefit every Python network programmer and, consequently, the internet as a whole.

Our implementation has been exposed as a Python API and is included in Cryptography’s 42.0.0 release series, meaning that Python developers can take advantage of it today! Here’s an example usage, demonstrating its interaction with certifi as a root CA bundle:

As part of our design we also developed x509-limbo, a test vector and harness suite for evaluating the standards conformance and consistent behavior of various X.509 path validation implementations. x509-limbo is permissively licensed and reusable, and has already found validation differentials across Go’s crypto/x509, OpenSSL, and two popular pre-existing Rust X.509 validators.

X.509 path validation

X.509 and path validation are both too expansive to reasonably summarize in a single post. Instead, we’ll grossly oversimplify X.509 to two basic facts:

- X.509 is a certificate format: it binds a public key and some metadata for that key (what it can be used for, the subject it identifies) to a signature, which is produced by a private key. The subject of a certificate can be a domain name, or some other relevant identifier.
- Verifying an X.509 certificate entails obtaining the public key for its signature, using that public key to check the signature, and (finally) validating the associated metadata against a set of validity rules (sometimes called an X.509 profile). In the context of the public web, there are two profiles that matter: RFC 5280 and the CA/B Forum Baseline Requirements (“CABF BRs”).

These two facts make X.509 certificates chainable: an X.509 certificate’s signature can be verified by finding the parent certificate containing the appropriate public key; the parent, in turn, has its own parent. This chain building process continues until an a priori trusted certificate is encountered, typically because of trust asserted in the host OS itself (which maintains a pre-configured set of trusted certificates).

Chain building (also called “path validation”) is the cornerstone of TLS’s authentication guarantees: it allows a web server (like x509-limbo.com) to serve an untrusted “leaf” certificate along with zero or more untrusted parents (called intermediates), which must ultimately chain to a root certificate that the connecting client already knows and trusts.

As a visualization, here is a valid certificate chain for x509-limbo.com, with arrows representing the “signed by” relationship:

x509-limbo.com (leaf) → Let’s Encrypt R3 (intermediate) → ISRG Root X1 (trusted root)

In this scenario, x509-limbo.com serves us two initially untrusted certificates: the leaf certificate for x509-limbo.com itself, along with an intermediate (Let’s Encrypt R3) that signs for the leaf.

The intermediate in turn is signed for by a root certificate (ISRG Root X1) that’s already trusted (by virtue of being in our OS or runtime trust store), giving us confidence in the complete chain, and thus the leaf’s public key for the purposes of TLS session initiation.

What can go wrong?

The above explanation of X.509 and path validation paints a bucolic picture: to build the chain, we simply iterate through our parent candidates at each step, terminating on success once we reach a root of trust or with failure upon exhausting all candidates. Simple, right?

Unfortunately, the reality is far messier:

- The abstraction above (“one certificate, one public key”) is a gross oversimplification. In reality, a single public key (corresponding to a single “logical” issuing authority) may have multiple “physical” certificates, for cross-issuance purposes.
- Because the trusted set is defined by the host OS or language runtime, there is no “one true” chain for a given leaf certificate. In reality, most (leaf, [intermediates]) tuples have several candidate solutions, any of which is a valid chain.
  - This is the “why” for the first bullet: a web server can’t guarantee that any particular client has any particular set of trusted roots, so intermediate issuers typically have multiple certificates for a single public key to maximize the likelihood of a successfully built chain.
- Not all certificates are made equal: certificates (including different “physical” certificates for the same “logical” issuing authority) can contain constraints that prevent otherwise valid paths: name restrictions, overall length restrictions, usage restrictions, and so forth. In other words, a correct path building implementation must be able to backtrack after encountering a constraint that eliminates the current candidate chain.
- The X.509 profile itself can impose constraints on both the overall chain and its constituent members: the CABF BRs, for example, forbid known-weak signature algorithms and public key types, and many path validation libraries additionally allow users to constrain valid chain constructions below a configurable maximum length.

In practice, these (non-exhaustive) complications mean that our simple recursive linear scan for chain building is really a depth-first graph search with both static and dynamic constraints. Failing to treat it as such has catastrophic consequences:

- Failing to implement a dynamic search typically results in overly conservative chain constructions, sometimes with Internet-breaking outcomes. OpenSSL 1.0.x’s inability to build the “chain of pain” in 2020 is one recent example of this.
- Failing to honor the interior constraints and profile-wide certificate requirements can result in overly permissive chain constructions. CVE-2021-3450 is one recent example of this, causing some configurations of OpenSSL 1.1.x to accept chains built with non-CA certificates.

Consequently, building an X.509 path validator that is both correct and maximal (in the sense of finding any valid chain) is of the utmost importance, both for availability and security.
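As a rough illustration of the search described above, here is a toy depth-first chain builder with backtracking. It is a sketch under heavy assumptions, not the algorithm used by cryptography-x509-verification: the `Cert` type, its string “keys”, the CA names, and `build_chain` are all hypothetical, and real validation checks signatures, validity periods, name constraints, and much more.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Cert:
    """Toy certificate: plain strings stand in for names, keys, and signatures."""
    subject: str
    issuer: str
    key: str        # the subject's public key
    signed_by: str  # the key that produced this certificate's signature
    is_ca: bool = False
    path_len: Optional[int] = None  # max intermediates allowed beneath this CA


def may_issue(parent: Cert, child: Cert, chain: Sequence[Cert]) -> bool:
    """Check that parent could have signed child and satisfies its constraints."""
    if parent.subject != child.issuer or parent.key != child.signed_by:
        return False
    if not parent.is_ca:  # non-CA certificates must never act as issuers
        return False
    # chain[0] is the leaf, so len(chain) - 1 intermediates sit beneath parent.
    return parent.path_len is None or len(chain) - 1 <= parent.path_len


def build_chain(
    leaf: Cert,
    intermediates: Sequence[Cert],
    roots: Sequence[Cert],
    max_depth: int = 8,
) -> Optional[List[Cert]]:
    """Return any valid chain from leaf to a trusted root, or None."""

    def search(chain: List[Cert]) -> Optional[List[Cert]]:
        tip = chain[-1]
        for root in roots:
            if may_issue(root, tip, chain):
                return chain + [root]
        if len(chain) - 1 >= max_depth:
            return None
        for candidate in intermediates:
            if candidate in chain or not may_issue(candidate, tip, chain):
                continue
            found = search(chain + [candidate])
            if found is not None:
                return found
        return None  # dead end: the caller backtracks to its next candidate

    return search([leaf])


# One "logical" issuer with two "physical" certificates for the same key.
leaf = Cert("example.test", "Issuing CA", "leaf-key", "issuing-key")
cross = Cert("Issuing CA", "Legacy Root", "issuing-key", "legacy-key", is_ca=True)
direct = Cert("Issuing CA", "Modern Root", "issuing-key", "modern-key", is_ca=True)
modern_root = Cert("Modern Root", "Modern Root", "modern-key", "modern-key", is_ca=True)

chain = build_chain(leaf, [cross, direct], [modern_root])
print([cert.subject for cert in chain])
```

Here the client trusts only the hypothetical “Modern Root”. The search first tries the cross-signed certificate, hits a dead end because “Legacy Root” is not trusted, and backtracks to the directly issued one. A greedy scan that committed to the first matching parent would have failed outright.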

Quirks, surprises, and ambiguities

Despite underpinning the Web PKI and other critical pieces of Internet infrastructure, there are relatively few independent implementations of X.509 path validation: most platforms and languages reuse one of a small handful of common implementations (OpenSSL and its forks, NSS, Go’s crypto/x509, GnuTLS, etc.) or the host OS’s implementation (CryptoAPI on Windows, Security on macOS). This manifests as a few recurring quirks and ambiguities:

- A lack of implementation diversity means that mistakes and design decisions (such as overly or insufficiently conservative profile checks) leak into other implementations: users complain when a PKI deployment that was only tested on OpenSSL fails to work against crypto/x509, so implementations frequently bend their specification adherence to accommodate real-world certificates.
- The specifications often mandate surprising behavior that (virtually) no client implements correctly. RFC 5280, for example, stipulates that path length and name constraints do not apply to self-issued intermediates, but this is widely ignored in practice.
- Because the specifications themselves are so infrequently interpreted, they contain still-unresolved ambiguities: treating roots as “trust anchors” versus policy-bearing certificates, handling of serial numbers that are 20 bytes long but DER-encoded with 21 bytes, and so forth.
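The serial number case is easy to reproduce. DER integers are signed, so a positive value whose top bit is set needs a leading zero byte, and a 20-byte serial can occupy 21 content bytes on the wire. The sketch below uses a hypothetical helper, `der_integer`, that handles only non-negative values with short-form lengths:

```python
def der_integer(value: int) -> bytes:
    """Encode a non-negative integer as a DER INTEGER (tag, length, content)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    # bit_length() // 8 + 1 bytes is the minimal two's-complement size for a
    # non-negative value; it adds a leading 0x00 only when the top bit is set.
    content = value.to_bytes(value.bit_length() // 8 + 1, "big")
    if len(content) > 127:
        raise ValueError("long-form lengths are out of scope for this sketch")
    return bytes([0x02, len(content)]) + content


# A 20-byte serial number with its most significant bit set.
serial = int.from_bytes(b"\xff" * 20, "big")
encoded = der_integer(serial)
content_length = encoded[1]
print(content_length)
```

Whether such a serial respects a 20-octet limit depends on whether the limit is read as applying to the integer's magnitude or to its encoded content, and implementations have to pick a reading.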

Our implementation needed to handle each of these families of quirks. To do so consistently, we leaned on four basic strategies:

- Test first, then implement: To give ourselves confidence in our designs, we built x509-limbo and pre-validated it against other implementations. This gave us both a coverage baseline for our own implementation, and empirical justification for relaxing various policy-level checks, where necessary.
- Keep everything in Rust: Rust’s performance, strong type system and safety properties meant that we could make rapid iterations to our design while focusing on algorithmic correctness rather than memory safety. It certainly didn’t hurt that PyCA Cryptography’s X.509 parsing is already done in Rust, of course.
- Obey Sleevi’s Laws: Our implementation treats path construction and path validation as a single unified step with no “one true” chain, meaning that the entire graph is always searched before giving up and returning a failure to the user.
- Compromise where necessary: As mentioned above, implementations frequently maintain compatibility with OpenSSL, even where doing so violates the profiles defined in RFC 5280 and the CABF BRs. This situation has improved dramatically over the years (and improvements have accelerated in pace, as certificate issuance periods have shortened on the Web PKI), but some compromises are still necessary.

Looking forward

Our initial implementation is production-ready, and comes in at around 2,500 lines of Rust, not counting the relatively small Python-only API surfaces or x509-limbo.

From here, there’s much that could be done. Some ideas we have include:

- Expose APIs for client certificate path validation. To expedite things, we’ve focused the initial implementation on server validation (verifying that a leaf certificate attesting to a specific DNS name or IP address chains up to a root of trust). This ignores client validation, wherein the client side of a connection presents its own certificate for the server to verify against a set of known principals. Client path validation shares the same fundamental chain building algorithm as server validation, but has a slightly different ideal public API (since the client’s identity needs to be matched against a potentially arbitrary number of identities known to the server).
- Expose different X.509 profiles (and more configuration knobs). The current APIs expose very little configuration; the only things a user of the Python API can change are the certificate subject, the validation time, and the maximum chain depth. Going forward, we’ll look into exposing additional knobs, including pieces of state that will allow users to perform verifications with the RFC 5280 certificate profile and other common profiles (like Microsoft’s Authenticode profile). Long term, this will help bespoke (such as corporate) PKI use cases to migrate to Cryptography’s X.509 APIs and lessen their dependency on OpenSSL.
- Carcinize existing C and C++ X.509 users. One of Rust’s greatest strengths is its native, zero-cost compatibility with C and C++. Given that C and C++ implementations of X.509 and path validation have historically been significant sources of exploitable memory corruption bugs, we believe that a thin “native” wrapper around cryptography-x509-verification could have an outsized positive impact on the security of major C and C++ codebases.
- Spread the gospel of x509-limbo. x509-limbo was an instrumental component in our ability to confidently ship an X.509 path validator. We’ve written it in a way that should make integration into other path validation implementations as simple as downloading and consuming a single JSON file. We look forward to helping other implementations (such as rustls-webpki) integrate it directly into their own testing regimens!

If any of these ideas interests you (or you have any of your own), please get in touch! Open source is key to our mission at Trail of Bits, and we’d love to hear about how we can help you and your team take the fullest advantage of and further secure the open-source ecosystem.

Acknowledgments

This work required the coordination of multiple independent parties. We would like to express our sincere gratitude to each of the following groups and individuals:

- The Sovereign Tech Fund, whose vision for OSS security and funding made this work possible.
- The PyCA Cryptography maintainers (Paul Kehrer and Alex Gaynor), who scoped this work from the very beginning and offered constant feedback and review throughout the development process.
- The BetterTLS development team, who both reviewed and merged patches that enabled x509-limbo to vendor and reuse their (extensive) test suite.

Article Link: We build X.509 chains so you don’t have to | Trail of Bits Blog