A Merkle Tree, central to blockchain technology, is a sophisticated data structure that forms the backbone of data verification and integrity in blockchain networks. Patented by Ralph Merkle in 1979, this tree-like structure organizes data in a manner that ensures efficient and secure verification of large data volumes. Each leaf node in a Merkle Tree is a hash of individual data blocks, encapsulating transaction details. Non-leaf nodes are hashes of their children’s hashes, creating a network of verifiable data that leads up to a single hash at the tree’s root. This architecture is a key component in prominent blockchain platforms like Bitcoin and Ethereum, emphasizing its vital role in digital ledger technology.
OpenZeppelin introduced many libraries to Solidity developers, many of which have become standard tools in the industry. Among these libraries, Merkle Trees are incorporated into various contracts, including the MerkleProof.sol library. This library verifies whether a given element is present in the Merkle tree through provided proofs.
In this article, we want to concentrate on enhancing the efficiency of using the MerkleProof library.
There is an issue of unhashed leaves, highlighted by OpenZeppelin:
It causes an intermediate node to be presented as leaves.
The root cause of the problem lies in the scenario where leaves are unhashed 32-byte values (LeafA, LeafB, LeafC, LeafD
). In this case, internal nodes (Hash_AB, Hash_CD
) can be passed as leaves to the proof verification function, even though they are not actual leaves.
To understand the nature of this issue, let’s examine this example of Merkle Tree structure:
In this Merkle Tree:
LeafA
, LeafB
, LeafC
, and LeafD
represent the actual data elements.Hash_AB
and Hash_CD
are computed by hashing the concatenation of LeafA
and LeafB
, and LeafC
and LeafD
respectively.Hash_AB
and Hash_CD
.The leaves are unhashed 32-byte values, and the internal nodes Hash_AB
and Hash_CD
could inadvertently be passed as leaves to the proof verification function, leading to incorrect verification outcomes.
To further explain the root cause of this issue, let’s define an example function that implements the MerkleProof.verify()
function from OpenZeppellin, which does not hash the given leaf parameter. For example:
This claim()
function exemplifies a common use case of the MerkleProof library.
However, an issue arises in the claim()
function: it passes unhashed leaves to the verify()
function, as the OpenZeppelin MerkleProof library does not internally apply hashing. This can lead to incorrect verification outcomes. For instance, if LeafA is a 32-byte unhashed data, the leaf parameter could be misidentified as an intermediate node, compromising the accuracy of verification. In our example of the Merkle tree, [Hash_AB]
and [Hash_CD]
could erroneously be identified as leaves, though they are actually intermediate nodes.
To illustrate, suppose we aim to verify the inclusion of LeafA
in the Merkle tree. The following proof might be applied to the MerkleProof.verify()
function as a proof parameter:
To verify that LeafA is included in the Merkle tree, the following intermediate node can be validly applied to MerkleProof.verify()
function as a leaf parameter:
Hash_AB
is not a leaf. It is an intermediate node. This misrepresentation allows for incorrect proof verification using an intermediate node rather than the actual leaves (LeafA, LeafB, LeafC, LeafD
).
To address the issue of incorrect proof verification using intermediate nodes instead of actual leaves in the OpenZeppelin MerkleProof library, we can extend the functionality by implementing additional hashing to the leaves. For example:
In this function:
keccak256()
function to ensure it is presented as a hashed value.verify()
function from the OpenZeppelin MerkleProof library is then called with the hashed leaf parameter.Above is an example of a function written in the MerkleProofExtension library that extends the OpenZeppelin verify()
function and applies additional hashing. If leaves are presented as an unhashed 32-byte value, the function will hash the leaf parameter internally, preventing the misinterpretation of internal nodes as leaves.
This issue, highlighted by the community (Rareskills article), can lead to the intermediate node passing as a leaf value.
If the leaves are produced by hashing 64 bytes, then you may prove non-existing leaves that match some internal nodes
In this example, all leaves are hashed. Unlike our first example, each leaf is formed by concatenating two integer values. For instance, LeafA
is derived as follows:
keccak256(abi.encodePacked(1, 2))
Assume the Merkle Tree is defined by:
Here, LeafA
equates to [1,2]
, and similarly, LeafB, LeafC
, and LeafD
correspond to elements in the pairs list.
Envision a claim() function that incorporates the MerkleProof.verify()
function. This function creates a leaf by hashing the provided ‘one’ and ‘two’ parameters, exemplified as follows:
In scenarios where keccak256(abi.encodePacked(one, two))
is 64 bytes long(uint256 + uint256 = 32 bytes + 32 bytes = 64 bytes) Consequently, an intermediate node may be used to circumvent the verification process. Suppose we wish to confirm that LeafA is in the Merkle tree using the claim function.
To validate LeafA’s presence in the Merkle tree, the following proof parameter can be utilized in the claim function:
[LeafB, Hash_CD]
To verify the proof, we need to provide a leaf parameter, which will be constructed by hashing the provided one and two parameters. In this case, for the same verification, the following one and two parameters can be employed:
To create a leaf parameter, these parameters will be hashed inside the claim()
function using keccak256. However, hashing these values results in the creation of an intermediate node:
[Hash_AB] = keccak256(abi.encodePacked(one, two))
In this instance, [Hash_AB]
is mistakenly provided as a leaf parameter to the MerkleProof.verify().[Hash_AB]
is not a leaf but an intermediate node with a hash value of 64 bytes:
(uint256 + uint256 = 32 bytes + 32 bytes = 64 bytes)
As it matches the 64-byte criterion and represents an intermediate node, the verify function can erroneously be executed with the supplied intermediate node instead of the actual leaf. To counteract this, implementing a safeVerify()
function is advisable. This function incorporates an additional hashing step, ensuring that internal nodes cannot be misinterpreted as leaves. Even if the intermediate node [Hash_AB]
is sent to the function as a leaf parameter, the function’s additional hashing will ensure that [Hash_AB]
, when hashed inside the safeVerify()
function, does not correspond to any intermediate node.
This article has explored the intricate details of Merkle Trees’ use in blockchain implementations, mainly focusing on OpenZeppelin’s MerkleProof.sol library. We provided an overview of how misuse or misunderstanding by developers can lead to vulnerabilities in the proof verification process. Suggested workflows help avoid issues, particularly incorrect verification outcomes and preimage attacks.
Be careful and perform your own research on the code imported to your system.
Be the first to receive our latest company updates, Web3 security insights, and exclusive content curated for the blockchain enthusiasts.
Table of contents
Tell us about your project
26 min read
Insights
8 min read
Insights
3 min read
Insights