What are Opcodes?

Operation Codes (Opcodes) are machine-readable code instructions with human-readable labels that specify what operations a central processing unit (CPU) or virtual machine must perform. 

For example, the ADD opcode in the EVM instruction set is assigned the byte 0x01, while in CPython (up to version 3.10) the BINARY_ADD opcode is assigned 0x17. Every program, whatever language it is written in, is ultimately carried out as a sequence of such instructions.
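As a small illustration of the label-to-byte mapping, Python's standard `dis` module exposes the interpreter's own opcode table. The sketch below assumes CPython; the numeric values differ between versions, and `BINARY_ADD` was replaced by a generic `BINARY_OP` in CPython 3.11, so the helper (a hypothetical name, `opcode_byte`) tolerates a missing label.

```python
import dis

def opcode_byte(label):
    """Return the byte value of an opcode label as a hex string, or None if absent."""
    value = dis.opmap.get(label)
    return None if value is None else f"0x{value:02X}"

# BINARY_ADD prints None on interpreters that no longer define it.
for label in ("LOAD_CONST", "RETURN_VALUE", "BINARY_ADD"):
    print(label, opcode_byte(label))
```

Running this on different Python versions is a quick way to see that opcode numbers are an implementation detail of one VM release, not a property of the language.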

The actions performed by opcodes range from stack management and simple arithmetic to bitwise operations and interactions with blockchain storage. 

When a program runs, each opcode ultimately reaches the CPU as a binary pattern, carried by electrical signals that switch a small portion of the billions of transistors in the CPU.

Opcodes in VM-based languages

Virtual machine (VM) based languages compile high-level code into bytecode that is independent of CPU architecture. Opcode labels correspond to specific byte values; in CPython 3.10, for example, BINARY_OR is 0x42. The VM acts as middleware that interprets the opcodes.

These languages may also rely on a Just-In-Time (JIT) compiler to convert the bytecode into machine code (binary) that the CPU can execute directly.
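To make the "VM as middleware" idea concrete, here is a minimal sketch of a stack-based interpreter loop. The opcode names and byte values (`PUSH`, `ADD`, `MUL`, `HALT`) are hypothetical and do not belong to any real instruction set; the point is only the fetch, decode, execute cycle.

```python
# Hypothetical opcode assignments for a toy VM (not taken from any real instruction set).
PUSH, ADD, MUL, HALT = 0x01, 0x02, 0x03, 0xFF

def run(bytecode):
    """Interpret bytecode on a stack machine and return the final stack."""
    stack = []
    pc = 0  # program counter: index of the next byte to decode
    while pc < len(bytecode):
        op = bytecode[pc]
        pc += 1
        if op == PUSH:
            stack.append(bytecode[pc])  # the byte after PUSH is its operand
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == HALT:
            break
        else:
            raise ValueError(f"unknown opcode 0x{op:02X}")
    return stack

program = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT])  # (2 + 3) * 4
print(run(program))  # [20]
```

The same bytes produce the same result on any CPU, because only the interpreter itself has to be built for the host architecture.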

Web3 languages

Languages used to create smart contracts for blockchains use a VM-based approach. 

Blockchain-related VMs have specialized opcodes for cryptographic hashing and blockchain interactions, alongside standard opcodes commonly found in other languages. Different blockchains may use customized VMs whose opcode sets are tailored to their specific needs.

Opcodes in a blockchain VM are assigned a gas value that is positively correlated with their complexity. Gas fees exist to prevent spam by charging users for the amount of computation they request from the network.
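The charging model can be sketched as a running deduction against a gas limit. The cost table below is illustrative only: real networks publish their own schedules, and some opcodes have costs that depend on runtime state. The names `GAS_COST`, `OutOfGas`, and `charge` are hypothetical.

```python
# Illustrative per-opcode costs; real networks define their own schedules,
# and some opcodes have costs that depend on runtime state.
GAS_COST = {"PUSH1": 3, "ADD": 3, "MUL": 5, "KECCAK256": 30}

class OutOfGas(Exception):
    pass

def charge(opcodes, gas_limit):
    """Deduct gas opcode by opcode; raise OutOfGas if the limit is exceeded."""
    remaining = gas_limit
    for name in opcodes:
        cost = GAS_COST[name]
        if cost > remaining:
            raise OutOfGas(f"{name} needs {cost} gas, only {remaining} left")
        remaining -= cost
    return gas_limit - remaining  # total gas used

used = charge(["PUSH1", "PUSH1", "ADD"], gas_limit=100)
print(used)  # 9
```

Because more complex opcodes cost more, a transaction that requests heavy computation exhausts its limit sooner, which is what makes spam expensive.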

A smart contract function compiles into an immutable sequence of opcodes. Once a transaction carrying calldata for that function is placed into a specific block, the output is deterministic: it depends only on the input data, the chain state at that point, and the function's opcodes. Functions that read block header fields, such as the timestamp, have no fixed output until the transaction is placed into a specific block.

Within the context of the Ethereum Virtual Machine (EVM), validators execute smart contract logic through client software like Go-Ethereum (Geth), which processes the calldata and interprets the function's opcodes on the EVM stack, with the resulting work carried out by the validator’s CPU.

Regardless of the CPU architecture a validator uses, the client software carries out each EVM opcode using machine code native to that CPU.

Solidity, Vyper, Yul, Huff

These common web3 languages create smart contracts that compile to the same set of opcodes defined by the EVM.

  • High-level languages like Solidity and Vyper focus on readability by limiting the ability to work with opcodes directly.
  • Intermediate- and low-level languages like Yul and Huff use syntax closer to opcodes. Yul is often used for gas optimization, while Huff allows direct access to raw opcodes and explicit stack management, which is often used when formally verifying code.
// SOLIDITY
mapping(address => uint256) balances;

function deposit() public payable returns (address, uint256) {

    // Solidity-compiled bytecode begins by storing 0x80 at 0x40
    // 0x40 is the location of the free memory pointer (FMP)
    // 0x80 is the first available location in memory
    // This means the FMP initially points to 0x80
    // PUSH1 0x80, PUSH1 0x40, MSTORE

    // The opcode sequences below are simplified for illustration;
    // real compiler output is longer
    // CALLER, PUSH1 0x00, MSTORE, CALLVALUE, PUSH1 0x20, MSTORE
    address user = msg.sender;
    uint256 depositAmount = msg.value;

    // PUSH1 0x20, MLOAD, ISZERO, PUSH1 <to_jump_dest>, JUMPI,
    // PUSH1 0x00, MSTORE, PUSH1 0x20, REVERT
    require(depositAmount != 0, "Zero Amount");

    // Hashing must be done to obtain the mapping slot for the user
    // PUSH1 0x00, PUSH1 0x00, MLOAD, DUP2, KECCAK256,
    // DUP1, SLOAD, PUSH1 0x20, MLOAD, ADD, SWAP1, SSTORE
    balances[user] += depositAmount;

    // PUSH1 0x00, MLOAD, PUSH1 0x00, MSTORE,
    // PUSH1 0x20, MLOAD, PUSH1 0x20, MSTORE,
    // PUSH1 0x40, PUSH1 0x00, RETURN
    return (user, depositAmount);
}

// 0x01 = ADD
// 0x15 = ISZERO
// 0x20 = KECCAK256
// 0x33 = CALLER
// 0x34 = CALLVALUE
// 0x51 = MLOAD
// 0x52 = MSTORE
// 0x54 = SLOAD
// 0x55 = SSTORE
// 0x57 = JUMPI
// 0x60 = PUSH1
// 0x80 = DUP1
// 0x81 = DUP2
// 0x90 = SWAP1
// 0xF3 = RETURN
// 0xFD = REVERT
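Going the other direction, from raw bytes back to labels, can be sketched with a tiny disassembler. It uses a subset of the byte values listed above and handles only `PUSH1` as an opcode with an immediate operand; the function name `disassemble` is hypothetical, and well-formed input is assumed.

```python
# Byte-to-mnemonic table: a small subset of the EVM instruction set.
OPCODES = {
    0x01: "ADD", 0x15: "ISZERO", 0x20: "KECCAK256", 0x33: "CALLER",
    0x34: "CALLVALUE", 0x52: "MSTORE", 0x54: "SLOAD", 0x57: "JUMPI",
    0x60: "PUSH1", 0x80: "DUP1", 0x81: "DUP2", 0x90: "SWAP1",
    0xF3: "RETURN", 0xFD: "REVERT",
}

def disassemble(hex_code):
    """Turn a hex string of bytecode into a list of mnemonics."""
    code = bytes.fromhex(hex_code)
    out = []
    pc = 0
    while pc < len(code):
        name = OPCODES.get(code[pc], f"UNKNOWN_0x{code[pc]:02X}")
        pc += 1
        if name == "PUSH1":
            # PUSH1 is followed by one immediate byte that is data, not an opcode.
            out.append(f"PUSH1 0x{code[pc]:02X}")
            pc += 1
        else:
            out.append(name)
    return out

print(disassemble("6080604052"))  # ['PUSH1 0x80', 'PUSH1 0x40', 'MSTORE']
```

The five bytes `60 80 60 40 52` are the free memory pointer setup described in the Solidity listing.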

Rust

Rust code can mimic VM-based languages when its compilation target is Berkeley Packet Filter (BPF) bytecode.

Most Solana smart contracts are written in Rust and compiled into Solana BPF (sBPF). This is a customized variant of extended BPF (eBPF), tailored to include blockchain-related operations.

The opcodes in the sBPF instruction set are different from those in the EVM instruction set. Like Ethereum, Solana has its own client software, and the "Solana Virtual Machine" (SVM) refers to the sBPF VM, which executes sBPF bytecode.
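For a sense of how different such an instruction set looks from the EVM's single-byte opcodes, the sketch below decodes the fixed 8-byte instruction format of classic eBPF. It assumes sBPF keeps this basic layout, which may not hold for every sBPF version; the function name `decode` is hypothetical.

```python
import struct

def decode(insn):
    """Split one 8-byte eBPF-style instruction into its fields."""
    # Layout (little-endian): opcode (1 byte), registers (1 byte),
    # offset (signed 16-bit), immediate (signed 32-bit).
    opcode, regs, offset, imm = struct.unpack("<BBhi", insn)
    return {
        "opcode": opcode,
        "dst": regs & 0x0F,  # low nibble: destination register
        "src": regs >> 4,    # high nibble: source register
        "offset": offset,
        "imm": imm,
    }

mov_r0_42 = bytes.fromhex("b70000002a000000")  # 0xb7: 64-bit move of an immediate into r0
exit_insn = bytes.fromhex("9500000000000000")  # 0x95: exit

print(decode(mov_r0_42))
print(decode(exit_insn))
```

Unlike the EVM's stack model, each instruction here names registers explicitly, which is closer to how physical CPUs work.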

Traditional languages

Python

Python's reference implementation, CPython, compiles source code into bytecode and runs it on the Python Virtual Machine (PVM), an interpreter that carries out each opcode at runtime rather than translating it into machine code ahead of time. Alternative implementations such as PyPy add Just-In-Time (JIT) compilation, which compiles frequently executed code into machine code.

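def divide_numbers():
    # LOAD_CONST, STORE_FAST
    num1 = 7
    # LOAD_CONST, STORE_FAST
    num2 = 21
    # LOAD_FAST, LOAD_FAST, BINARY_TRUE_DIVIDE, STORE_FAST
    result = num2 / num1
    # LOAD_FAST, RETURN_VALUE
    return result

# Byte values as of CPython 3.10; they change between versions.
# Locals inside a function use the FAST variants; module-level
# names use STORE_NAME (0x5A) and LOAD_NAME (0x65) instead.
# 0x1B = BINARY_TRUE_DIVIDE
# 0x53 = RETURN_VALUE
# 0x64 = LOAD_CONST
# 0x7C = LOAD_FAST
# 0x7D = STORE_FAST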
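The opcode labels for a function can be inspected directly with the standard `dis` module. This is a minimal sketch assuming CPython; the exact labels depend on the interpreter version (for example, CPython 3.11 and later report a generic `BINARY_OP` for the division), so no fixed output is shown.

```python
import dis

def divide_numbers():
    num1 = 7
    num2 = 21
    result = num2 / num1
    return result

# Collect the opcode label of every instruction in the compiled function.
names = [ins.opname for ins in dis.get_instructions(divide_numbers)]
print(names)

# dis.dis prints a fuller listing with offsets and operands.
dis.dis(divide_numbers)
```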

Java

Java uses the Java Virtual Machine (JVM) to execute compiled bytecode. The JVM typically interprets the bytecode, but it can also use JIT compilation during runtime to translate the opcodes into machine code.

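public class AddNumbers {
    public static void main(String[] args) {

        // push each integer onto the stack and store it in a local.
        // iconst_1, istore_1, iconst_2, istore_2
        int a = 1;
        int b = 2;

        // load both locals, add them, and store result.
        // (a literal 1 + 2 would be folded by javac into iconst_3)
        // iload_1, iload_2, iadd, istore_3
        int result = a + b;

        // fetch System.out, load and print result, return void.
        // getstatic, iload_3, invokevirtual, return
        System.out.println(result);
    }
}

// 0x04 = iconst_1
// 0x05 = iconst_2
// 0x1B = iload_1
// 0x1C = iload_2
// 0x1D = iload_3
// 0x3C = istore_1
// 0x3D = istore_2
// 0x3E = istore_3
// 0x60 = iadd
// 0xB1 = return
// 0xB2 = getstatic
// 0xB6 = invokevirtual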

Opcodes in compiled languages

In compiled languages, opcodes are byte values that correspond to assembly mnemonics. Each instruction may include additional bytes needed for register specification, immediate value inputs, and memory addressing.

Source code is compiled into machine code at build time, and the CPU executes that machine code directly, with no VM interpreting it at runtime.

C++

int main() {
    int result = 2 + 3;
    return result;
}

; Compile with 'g++ -o example example.cpp'
; Display Intel assembly syntax with 'objdump -d -M intel example'

main:

; Save base pointer (rbp) onto stack
    55                      push   rbp

; Set rbp to current stack pointer (rsp) to mark start of frame
    48 89 e5                mov    rbp,rsp

; Store value 5 (2+3) into local var at memory address [rbp-0x4]
; Constants are already precomputed during compilation
    c7 45 fc 05 00 00 00    mov    DWORD PTR [rbp-0x4],0x5

; Load value into eax register
    8b 45 fc                mov    eax,DWORD PTR [rbp-0x4]

; Restore rbp from stack to clean up stack frame
    5d                      pop    rbp

; Return value found in eax
    c3                      ret
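The "additional bytes" that follow an x86 opcode can be seen by pulling apart the 7-byte `mov DWORD PTR [rbp-0x4],0x5` instruction from the listing above. The sketch below handles only this one encoding form (opcode `C7`, a ModR/M byte, an 8-bit displacement, and a 32-bit immediate); it is not a general x86 decoder, and the function name is hypothetical.

```python
import struct

def decode_mov_imm32(insn):
    """Decode the 7-byte form: C7, ModR/M, disp8, imm32."""
    opcode, modrm = insn[0], insn[1]
    mod = modrm >> 6             # 0b01: memory operand with an 8-bit displacement
    reg = (modrm >> 3) & 0b111   # 0 selects MOV within the C7 opcode group
    rm = modrm & 0b111           # 0b101 with mod=0b01: base register rbp
    disp8 = struct.unpack("<b", insn[2:3])[0]  # signed 8-bit displacement
    imm32 = struct.unpack("<i", insn[3:7])[0]  # little-endian 32-bit immediate
    return {"opcode": opcode, "mod": mod, "reg": reg, "rm": rm,
            "disp8": disp8, "imm32": imm32}

fields = decode_mov_imm32(bytes.fromhex("c745fc05000000"))
print(fields)
```

The displacement decodes to -4 and the immediate to 5, matching `[rbp-0x4]` and `0x5` in the disassembly.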

Rust

Rust can be compiled to native machine code for direct CPU execution by targeting architectures like x86 and ARM, rather than targeting BPF during smart contract development.

fn main() {
    let num1 = 5;
    let num2 = 10;
    let _result = num1 * num2;
}

; Intel assembly syntax, https://rust.godbolt.org/
; Removed overflow related safety checks for simplicity

main:

    ; Push (50) RAX to reserve 8 bytes of stack space
    ; (compilers commonly do this to keep the stack aligned)
    push    rax

    ; B8 05 00 00 00 = Move the immediate value 5 into EAX
    mov     eax, 5

    ; B9 0A 00 00 00 = Move the immediate value 10 into ECX
    mov     ecx, 10

    ; 0F AF C1 = Multiply EAX by the ECX value and store result in EAX
    imul    eax, ecx

    ; Pop (58) into RAX to release the reserved stack space
    ; (this overwrites the product, which is unused anyway)
    pop     rax

    ; Return (C3) from the function
    ret
