Written by

Ciara Nightingale

Published on

August 16, 2024

What is a Merkle Tree, Merkle proof, and Merkle Root

Deep dive into a Merkle Tree's data structure and how Merkle proofs prove data is there. Answer the question: What is a Merkle Tree, Merkle Proof, and Merkle Root?

Table of Contents

What is a Merkle tree, Merkle proof, and Merkle root? Learn what they are, how they work, and how to use them.

By the end of this article, you'll understand what Merkle Trees, Merkle Proofs, and Merkle Roots are, why they are needed, how they work, and how you can build them into your project! Let's get started!

Before getting started, make sure you have a solid understanding of the following:

Knowledge of hash functions
Proficiency with Solidity Smart Contracts

‍

What is a Merkle Tree, Merkle Proof, and Merkle Root?

‍

Why do we need Merkle Trees?

The problem: How can I verify if some piece of arbitrary data is in a group of data? For example, let’s say we have a list of names of the VIPs in our club, how do we determine if a specific name is on the list?

Can’t we just loop (or manually search) through a list of data to check if the specific data we are looking for is present?

This would be challenging, especially if the list of names is extremely large (think thousands). Now, imagine this is an array of strings in a smart contract. Looping through this array would become extremely costly as the array of names grew. At some point, a denial of service could occur, and a name could not be checked due to insufficient gas. Instead, we use a Merkle Tree!

‍

What is a Merkle Tree?

Merkle trees are a data structure, and Merkle proofs are used to prove that some data is in that Merkle tree. They were invented in 1979 by Ralph Merkle, a computer scientist, mathematician, and one of the inventors of public-key cryptography.

But what actually is a Merkle Tree?

Imagine a tree... but upside down.

The leaves are at the bottom, and the root is at the top. Each leaf in our tree is a piece of data, and the branches and the root are built from this data by hashing it together. This is a Merkle tree. By the end of this article, you will understand this diagram:

An inverted tree illustrating the form of a Merkle tree including data strings, hashes, and the Root Hash. — Merkle tree illustration

‍

A leaf node is some hashed data, e.g., the string “Ciara” in the VIP example. Adjacent leaf nodes are hashed together to produce the intermediate nodes. Adjacent intermediate nodes are hashed together recursively until a single hash is produced. This hash is known as the root hash.

Merkle proofs are used to prove that some specific data is present in a group of data (in the Merkle tree). The data can be anything, such as the names of VIPs in a club, and I want to prove my name is on the VIP list.

‍ A Merkle tree is a binary tree where each leaf node represents a hash of some data, and each intermediate node is a hash of its two child nodes. The topmost node is known as the Merkle root.

‍

How do Merkle Trees work?

Let’s break down how to create a Merkle tree:

The individual pieces of data (the string “Ciara”) are hashed to produce the leaf node hashes (using a hashing algorithm like keccak256): some text
1. Hash 1 = Hash(Data 1)
2. Hash 2 = Hash(Data 2)
3. Hash 3 = Hash(Data 3)
4. Hash 4 = Hash(Data 4)

Diagram of a leaf node (Hash1) of a Merkle tree including data string and hash. — Diagram of Hash 1.

Adjacent leaf nodes are then hashed together to produce the intermediate nodes:some text
1. Hash 1-2 = Hash(Hash 1, Hash 2)
2. Hash 3-4 = Hash(Hash 3, Hash 4)

Diagram illustrating adjacent leaf nodes (Hash 1 and Hash 2) combining into Hash 1-2 . — Diagram of Hash 1-2.

‍

Adjacent intermediate nodes are then hashed together to produce the root hash (the expected root hash)some text
1. Root Hash = Hash(Hash 1-2, Hash 3-4)

Diagram illustrating intermediate nodes combining into the expected Root Hash. — Diagram of the Root Hash.

‍

Putting that all together, we can visualize the Merkle tree as:

Diagram of the full Merkle tree including all leaf nodes and intermediate nodes. — Diagram of the full Merkle tree.

‍

To prove that some data is present in the Merkle tree–prove that “Ciara” is part of the VIP club:

The prover, (which, for our example, would be the person attempting to enter the club), provides:some text
1. The data, e.g., for our VIP example, their name, “Ciara.”
2. An array of the other intermediate nodes required to reconstruct the tree and calculate a root hash. This is known as the Merkle proof.

So, to prove the name “Ciara” is in the tree, they would need to provide an array Proof = [Proof 1, Proof 2]

Where:some text
- Proof 1 = Hash 2
- Proof 2 = Hash 3-4.

The hash of the data is then hashed with the first element of the proof, Proof 1, and then that resulting hash is hashed with the second element, Proof 2, to produce a calculated root hash. In our VIP example,some text
1. Hash(“Ciara”) is hashed with the first proof array element: Hash 2
2. The result is then hashed with the second proof element: Hash 3-4, to produce an expected root hash
This calculated root hash is then compared with an expected root hash to check whether the name is in the tree. In our VIP example, only the club owner would have access to this expected root.

Using a secure hash function like keccak256 makes it practically impossible to create a hash collision; the likelihood of finding two different sets of inputs that result in the same hash is so low that it won't happen in practice. This is because when using `keccak256`, the hash of that data is unique for distinct pieces of data. If a matching root hash exists in the Merkle proof, you know this item must have been part of the original root hash calculation.

Diagram of the full Merkle tree with notes and including all leaf nodes and intermediate nodes. — Diagram of the full Merkle tree with notes.

‍

Let’s look at our VIP example in JSON. Below, you can see the inputs. This is the string of the names on the VIP list. The proofs are the two intermediate nodes required to reconstruct the tree; the root is the expected root hash, and the leaf is the leaf node - the hash of the input data, e.g., keccak256(“Ciara”).

[
   {
       "inputs": [
           "Ciara"
       ],
       "proof": [
           "0xfcedff7fd13c323a60b257d4553f350831c1694fd3a953ed9bb5af1abd127fb8",
           "0x7c1a4a44f893dd1de1745304d1b2a6413f937d10c458bdd45ede7c53b87e3a0e"
       ],
       "root": "0x6e10420ea854c1d5c4c2e7ecf3e1f79e6963f43f924a009f5bf6d45da592d1cf",
       "leaf": "0x7c03745f570a678cbf8956b39d092a3653cd96d15b52ee66f408571923aa69b8"
   },
   {
       "inputs": [
           "Sarah"
       ],
       "proof": [
           "0x7c03745f570a678cbf8956b39d092a3653cd96d15b52ee66f408571923aa69b8",
           "0x7c1a4a44f893dd1de1745304d1b2a6413f937d10c458bdd45ede7c53b87e3a0e"
       ],
       "root": "0x6e10420ea854c1d5c4c2e7ecf3e1f79e6963f43f924a009f5bf6d45da592d1cf",
       "leaf": "0xfcedff7fd13c323a60b257d4553f350831c1694fd3a953ed9bb5af1abd127fb8"
   },
   {
       "inputs": [
           "Bob"
       ],
       "proof": [
           "0xe060618ff524040946d843f7db0f20bc36d6823d1dd1adf0cefbbd8c4283eda1",
           "0x68247294d6ac7c96da06755b7e71415e1b46163839ab6a67b9f7cc896369f8f7"
       ],
       "root": "0x6e10420ea854c1d5c4c2e7ecf3e1f79e6963f43f924a009f5bf6d45da592d1cf",
       "leaf": "0xfa584e70da2cdafeba16aae05f550a31fac5d37ea2de2232d2fb02401b0527da"
   },
   {
       "inputs": [
           "Dylan"
       ],
       "proof": [
           "0xfa584e70da2cdafeba16aae05f550a31fac5d37ea2de2232d2fb02401b0527da",
           "0x68247294d6ac7c96da06755b7e71415e1b46163839ab6a67b9f7cc896369f8f7"
       ],
       "root": "0x6e10420ea854c1d5c4c2e7ecf3e1f79e6963f43f924a009f5bf6d45da592d1cf",
       "leaf": "0xe060618ff524040946d843f7db0f20bc36d6823d1dd1adf0cefbbd8c4283eda1"
   }
]

‍

Using Merkle Trees in a smart contract

To implement Merkle trees into a smart contract and verify that some arbitrary data is in a Merkle tree, we can use OpenZeppelin’s `MerkleProof.sol` smart contract. Let’s take a look at how it works:

verify takes the proof, the Merkle root, and the leaf we want to verify. Typically, the root will be stored in the smart contract, and the proof will be calculated off-chain and passed as an argument to a function call.
processProof will iterate through each element in the proof array.
Starting with the leaf node, the computed hash is updated at each step by hashing it with the next element in the proof.some text
1. Hashing both hashes together will always take the smaller value first.
2. OpenZeppelin uses assembly with the mstore and keccak256 opcodes for efficiency rather than simply using keccak256(abi.encodePacked(a, b)).
The computed hash is returned and compared to the expected root to determine whether the provided leaf was present in the Merkle tree.

function verify(bytes32[] memory proof, bytes32 root, bytes32 leaf) internal pure returns (bool) {
    return processProof(proof, leaf) == root;
}

function processProof(bytes32[] memory proof, bytes32 leaf) internal pure returns (bytes32) {
    bytes32 computedHash = leaf;
    for (uint256 i = 0; i < proof.length; i++) {
        computedHash = _hashPair(computedHash, proof[i]);
    }
    return computedHash;
}

function _hashPair(bytes32 a, bytes32 b) private pure returns (bytes32) {
    return a < b ? _efficientHash(a, b) : _efficientHash(b, a);
}

function _efficientHash(bytes32 a, bytes32 b) private pure returns (bytes32 value) {
    /// @solidity memory-safe-assembly
    assembly {
        mstore(0x00, a)
        mstore(0x20, b)
        value := keccak256(0x00, 0x40)
    }
}

‍

For an example of how to use OpenZeppelin’s MerkleProof.sol smart contract, check out the MinimalMerkle example repository.

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.24;


import {MerkleProof} from "@openzeppelin/contracts/utils/cryptography/MerkleProof.sol";


contract MinimalMerkle {
   bytes32 immutable public i_root;


   error MinimalMerkle__NotInClub();


   constructor(bytes32 _root) {
       i_root = _root;
   }


   function verifyIfInClub(string memory name, bytes32[] memory proof) public view returns (bool) {
       bytes32 leaf = keccak256(bytes.concat(keccak256(abi.encode(name))));


       if (!MerkleProof.verify(proof, i_root, leaf)) {
           revert MinimalMerkle__NotInClub();
       }


       return true;
   }
}

‍

Generating Merkle Trees & second preimage attacks

Two main libraries can be used to generate Merkle trees:

Both of these libraries hash the input data twice to prevent what is known as a second pre-image attack.

In a second preimage attack, an intermediate node is presented as a leaf node and, therefore, passes the checks since it is present in the Merkle tree.

This is possible if the function accepts 62-byte leaves as input. This means that an attacker can pass two leaf node hashes encoded together as input (before their hashing). This is known as the intermediate node's preimage rather than its hash value. Since hashes in Solidity are 32 bytes long, the result of encoding the two leaf node hashes will be 64 bytes long.

This is mitigated by not accepting leaf node data that is 64 bytes long and/or using a different hash function for the leaf nodes than the intermediate hashes. This can be achieved by hashing the data twice, analogous to using a different hash function. Intermediate hashes are then produced by only hashing once. When we used the two scripts to produce our Merkle trees, the resulting Merkle trees were resistant to preimage attacks because the data to produce the leaf nodes were hashed twice.

For more information, Rareskills published a fantastic article on second preimage attacks.

‍

Real-world Merkle use cases

Airdrops

Rather than using an array or mapping of addresses, Merkle trees can be used in airdrop smart contracts to improve performance and reduce claiming gas costs.

Merkle trees are utilized in efficient airdropping to securely and verifiably distribute tokens to a large number of recipients without needing to store the entire list of recipients on-chain. In this process, the list of eligible recipients and their respective token allocations are hashed to create leaf nodes in a Merkle tree. The root is then published on-chain within the airdrop smart contract. Each recipient can claim their tokens by providing a Merkle proof computed off-chain.

Blockchains

Block headers contain information about the block itself and contain the following fields:

Transaction root: the Keccak hash of the Merkle root of the transaction trie
Receipt root: the Keccak hash of the Merkle root of the transaction receipt trie‍
State root: the Keccak hash of the Merkle root of the state (post-execution) trie

These “roots” act like Merkle roots. The state, transactions, and receipts are all encoded into Merkle Patricia tries, and the roots are contained in the block header!

For the purpose of simplicity, you can think of a Merkle Patricia trie as the same as a Merkle tree, as the hash is dependent on the child nodes, and if any of these children change, the root hash will also change.

For more information, refer to this noxx EVM deep dive article.

Rollups

Rollups are a Layer 2 scaling solution designed to increase the throughput of blockchain networks like Ethereum. They bundle multiple off-chain transactions, and the root hash of the batch is then posted to the main chain. Here’s how Merkle trees are used in rollups:

Batch roots: Transactions are bundled and encoded into a single root hash using a Merkle tree. The batch root comprises all transactions in the batch. To prove a transaction is included in the batch, the transaction details, the batch root, and the Merkle proof showing the inclusion path need to be provided.
State roots: The rollup’s pre- and post-state changes (the previous and new batch states) are represented as Merkle trees, known as state trees, and the state roots are submitted on the layer 1 chain.

These roots are then posted on the layer 1 chain, and Merkle proofs are used to verify a specific transaction or state and prove its inclusion in the rollup.

‍

Summary

Merkle trees are a data structure that hashes adjacent arbitrary pieces of data together iteratively to create a singular node–known as the Merkle root–that contains information about all the child nodes in the tree. Merkle proofs are then used to prove whether some arbitrary data was present in the tree.

Merkle trees are useful in efficient airdropping, storing blockchain transaction, receipt, and state data, and in rollups to consolidate and prove batch and state data.