Releasing the Attacknet: A new tool for finding bugs in blockchain nodes using chaos testing

MalBot · March 18, 2024, 1:30pm

By Benjamin Samuels (@thebensams)

Today, Trail of Bits is publishing Attacknet, a new tool built in collaboration with the Ethereum Foundation that addresses the limitations of traditional runtime verification tools. Attacknet is intended to augment the EF’s current test methods by subjecting their execution and consensus clients to some of the most challenging network conditions imaginable.

Blockchain nodes must be held to the highest level of security assurance possible. Historically, the primary tools used to achieve this goal have been exhaustive specification, tests, client diversity, manual audits, and testnets. While these tools have traditionally done their job well, they collectively have serious limitations that can lead to critical bugs manifesting in a production environment, such as the May 2023 finality incident that occurred on Ethereum mainnet. Attacknet addresses these limitations by subjecting devnets to a much wider range of network conditions and misconfigurations than is possible on a conventional testnet.

How Attacknet works

Attacknet uses chaos engineering, a testing methodology that proactively injects faults into a production environment to verify that the system is tolerant to certain failures. These faults reproduce real-world problem scenarios and misconfigurations, and can be used to create exaggerated scenarios to test the boundary conditions of the blockchain.

Attacknet uses Chaos Mesh to inject faults into a devnet environment generated by Kurtosis. By building on top of Kurtosis and Chaos Mesh, Attacknet can create various network topologies with ensembles of different kinds of faults to push a blockchain network to its most extreme edge cases.

Some of the faults include:

Clock skew, where a node’s clock is skewed forwards or backwards for a specific duration. Trail of Bits was able to reproduce the Ethereum finality incident using a clock skew fault, as detailed in our TrustX talk last year.
Network latency, where a node’s connection to the network (or its corresponding EL/CL client) is delayed by a certain amount of time. This fault can help reproduce global latency conditions or help detect unintentional synchronicity assumptions in the blockchain’s consensus.
Network partition, where the network is split into two or more segments that cannot communicate with each other. This fault can test the network’s fork choice rule, ability to re-org, and other edge cases.
Network packet drop/corruption, where gossip packets are dropped or have their contents corrupted by a certain amount. This fault can test a node’s gossip validation and test the robustness of the network under hostile network conditions.
Forced node crashes/offlining, where a certain client or type of client is ungracefully shut down. This fault can test the network’s resilience to validator inactivity, and test the ability of clients to re-sync to the network.
Disk I/O faults/latency, where a certain amount of latency or a certain error rate is applied to all I/O operations a node makes. This fault can help profile nodes to understand their resource requirements, as I/O is often the largest limiting factor of node performance.
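
To make two of the faults above more concrete, here is a minimal Python sketch that builds Chaos Mesh experiment manifests for a clock skew and a network latency fault. This is not Attacknet's own configuration format, which is not described here. The helper names, namespace, labels, and parameter values are hypothetical; only the manifest fields (timeOffset, action, delay.latency, mode, selector, duration) follow the Chaos Mesh v1alpha1 TimeChaos and NetworkChaos resources.

```python
import json


def _base_manifest(kind, name, namespace, labels, duration):
    """Fields shared by the Chaos Mesh experiments built below (hypothetical helper)."""
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": kind,
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "mode": "all",  # target every pod matched by the selector
            "selector": {
                "namespaces": [namespace],
                "labelSelectors": dict(labels),
            },
            "duration": duration,  # how long the fault stays active
        },
    }


def clock_skew_fault(name, namespace, labels, offset, duration):
    """TimeChaos manifest: shift the clock of the targeted pods by `offset`."""
    manifest = _base_manifest("TimeChaos", name, namespace, labels, duration)
    manifest["spec"]["timeOffset"] = offset  # e.g. "-10m" skews the clock backwards
    return manifest


def network_latency_fault(name, namespace, labels, latency, duration):
    """NetworkChaos manifest: delay the network traffic of the targeted pods."""
    manifest = _base_manifest("NetworkChaos", name, namespace, labels, duration)
    manifest["spec"]["action"] = "delay"
    manifest["spec"]["delay"] = {"latency": latency}
    return manifest


if __name__ == "__main__":
    # Assumed label for the pods of one consensus client in the devnet.
    target = {"app": "example-cl-client"}
    skew = clock_skew_fault("skew-example", "devnet", target, "-10m", "5m")
    delay = network_latency_fault("latency-example", "devnet", target, "500ms", "2m")
    print(json.dumps(skew, indent=2))
    print(json.dumps(delay, indent=2))
```

Serialized to YAML or JSON, manifests like these are what Chaos Mesh consumes. How Attacknet generates and applies its own fault definitions may differ.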

Once the fault concludes, Attacknet performs a battery of health checks against each node in the network to verify that they were able to recover from the fault. If all nodes recover from the fault, Attacknet moves on to the next configured fault. If one or more nodes fail health checks, Attacknet will generate an artifact of logs and test information to allow debugging.
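
The inject-then-check cycle described above can be sketched roughly as follows. This is an illustrative Python outline, not Attacknet's implementation: the function and class names are hypothetical, the injection, health check, and log collection steps are passed in as callables, and stopping at the first fault that leaves a node unhealthy is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class FaultReport:
    """Debugging artifact for a fault that some nodes did not recover from."""
    fault_name: str
    failed_nodes: List[str]
    logs: Dict[str, str] = field(default_factory=dict)


def run_faults(
    faults: List[str],
    nodes: List[str],
    inject: Callable[[str], None],      # blocks until the fault has concluded
    is_healthy: Callable[[str], bool],  # health check for one node
    collect_logs: Callable[[str], str], # fetches logs for one node
) -> Optional[FaultReport]:
    """Run each fault in order; return a report for the first one that fails."""
    for fault in faults:
        inject(fault)
        failed = [node for node in nodes if not is_healthy(node)]
        if failed:
            # Assumption: stop here and hand back logs for the failing nodes.
            logs = {node: collect_logs(node) for node in failed}
            return FaultReport(fault, failed, logs)
    return None  # every node recovered from every fault


if __name__ == "__main__":
    # Toy devnet: node "b" never recovers from the partition fault.
    state = {"a": True, "b": True}

    def fake_inject(fault: str) -> None:
        if fault == "partition":
            state["b"] = False

    report = run_faults(
        ["clock-skew", "partition", "latency"],
        ["a", "b"],
        fake_inject,
        lambda node: state[node],
        lambda node: f"logs for {node}",
    )
    print(report)
```

Passing the three steps in as callables keeps the loop independent of how faults are injected or how node health is measured, so the same loop can be exercised with stand-ins like the toy functions in the usage example.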

Future work

In this first release, Attacknet supports two run modes: one with a manually configured network topology and fault parameters, and a “planner mode” where a range of faults is run against a specific client with loosely defined topology parameters. In the future, we plan to add an “exploration mode” that will dynamically define fault parameters, inject them, and monitor network health repeatedly, similar to a fuzzer.

Attacknet is currently being used to test the Dencun hard fork, and is being regularly updated to improve coverage, performance, and debugging UX. However, Attacknet is not an Ethereum-specific tool, and was designed to be modular and easily extended to support other types of chains with drastically different designs and topologies. In the future, we plan on extending Attacknet to target other chains, including other types of blockchain systems such as L2s.

If you’re interested in integrating Attacknet with your chain/L2’s testing process, please contact us.

Article Link: Releasing the Attacknet: A new tool for finding bugs in blockchain nodes using chaos testing | Trail of Bits Blog