Operation Codes (Opcodes) are machine-readable code instructions with human-readable labels that specify what operations a central processing unit (CPU) or virtual machine must perform.
For example, the ADD opcode in the EVM instruction set is assigned 0x01, while in Python (CPython 3.10 and earlier) the BINARY_ADD opcode is assigned 0x17. These instructions are a large part of the infrastructure that makes modern computing possible.
The actions performed by opcodes range from stack management and simple arithmetic to bitwise operations and interactions with blockchain storage.
When a software program is run, the execution of opcodes is expressed as binary via electrical signals that toggle small portions of the billions of available transistors in a CPU.
Virtual machine (VM) based languages compile high-level code into bytecode that is independent of CPU architecture. Opcode labels like BINARY_OR correspond to specific byte values, such as 0x42. The VM acts as middleware to interpret the opcodes.
These languages may rely on a Just-In-Time (JIT) compiler to convert their bytecode into machine code (binary) that CPUs can execute.
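A minimal sketch of how a VM interprets opcodes, assuming a toy stack machine. The opcode values (PUSH, ADD, MUL, HALT) and the `run` function are hypothetical and do not belong to any real instruction set; they only illustrate the fetch, decode, and execute loop.

```python
# Hypothetical opcode table for a toy stack VM (not a real instruction set).
PUSH, ADD, MUL, HALT = 0x01, 0x02, 0x03, 0xFF


def run(bytecode: bytes) -> list:
    """Interpret the bytecode and return the final stack."""
    stack = []
    pc = 0  # program counter: index of the next byte to decode
    while pc < len(bytecode):
        op = bytecode[pc]
        pc += 1
        if op == PUSH:
            # PUSH carries a one-byte immediate operand after the opcode.
            stack.append(bytecode[pc])
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == HALT:
            break
        else:
            raise ValueError(f"unknown opcode 0x{op:02x} at offset {pc - 1}")
    return stack


# Bytecode for (2 + 3) * 4
program = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT])
final_stack = run(program)
print(final_stack)  # [20]
```

The same bytes produce the same result on any CPU that can run the interpreter, which is the sense in which bytecode is independent of CPU architecture.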
Languages used to create smart contracts for blockchains use a VM-based approach.
Blockchain-related VMs have specialized opcodes for cryptographic hashing and blockchain interactions, alongside standard opcodes commonly found in other languages. Different blockchains may use customized VMs with their own opcode sets suited to their specific needs.
Opcodes in a blockchain VM are assigned a gas value that scales with their complexity. Gas fees exist to prevent spam on the network by charging users for the computation they request from the network.
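The gas model described above can be sketched as a lookup table plus a running balance. The costs below are the static EVM costs for these four opcodes; the `charge` function and `OutOfGas` exception are hypothetical names for illustration, and opcodes with dynamic costs (such as SSTORE or KECCAK256) are deliberately left out.

```python
# Static gas costs for a few EVM opcodes. Dynamic components (memory
# expansion, storage access, etc.) are not modeled in this sketch.
GAS_COST = {"PUSH1": 3, "ADD": 3, "MUL": 5, "JUMPDEST": 1}


class OutOfGas(Exception):
    """Raised when an opcode costs more than the gas that remains."""


def charge(ops, gas_limit):
    """Charge gas for each opcode in order and return the gas left over."""
    gas_left = gas_limit
    for op in ops:
        cost = GAS_COST[op]
        if cost > gas_left:
            raise OutOfGas(f"{op} needs {cost} gas, only {gas_left} left")
        gas_left -= cost
    return gas_left


ops = ["PUSH1", "PUSH1", "ADD", "PUSH1", "MUL"]
print(charge(ops, gas_limit=20))  # 3 + 3 + 3 + 3 + 5 = 17 used, 3 left
```

Calling `charge(ops, gas_limit=10)` raises `OutOfGas` at the third PUSH1, which mirrors how execution halts once a transaction's gas is exhausted.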
A smart contract function compiles into an unchangeable group of opcodes. When calldata reaches a contract's function and the transaction is placed into a specific block, it produces a deterministic output based on the input data and the function's opcodes. Functions that rely on block header items are non-deterministic until the transaction is placed into a specific block.
Within the context of the Ethereum Virtual Machine (EVM), validators execute smart contract logic through client software like Go-Ethereum (Geth), which processes calldata and interprets function opcodes via the EVM stack for the validator’s CPU to execute.
Regardless of the CPU architecture a validator uses, the client software translates EVM opcodes into the appropriate machine code.
Common web3 languages such as Solidity create smart contracts that compile to the same set of opcodes defined by the EVM.
// SOLIDITY
// Opcode sequences in the comments are simplified for illustration.
mapping(address => uint256) balances;
function deposit() public payable returns (address, uint256) {
    // Solidity-compiled bytecode begins by storing 0x80 at 0x40
    // 0x40 is the location of the free memory pointer (FMP)
    // 0x80 is the first available location in memory
    // This means the FMP initially points to 0x80
    // PUSH1 0x80, PUSH1 0x40, MSTORE
    // CALLER, PUSH1 0x00, MSTORE, CALLVALUE, PUSH1 0x20, MSTORE
    address user = msg.sender;
    uint256 depositAmount = msg.value;
    // PUSH1 0x20, MLOAD, ISZERO, PUSH1 <to_jump_dest>, JUMPI,
    // PUSH1 0x00, MSTORE, PUSH1 0x20, REVERT
    require(depositAmount != 0, "Zero Amount");
    // Hashing must be done to obtain the mapping slot for the user
    // PUSH1 0x00, PUSH1 0x00, MLOAD, DUP2, KECCAK256,
    // DUP1, SLOAD, PUSH1 0x20, MLOAD, ADD, SWAP1, SSTORE
    balances[user] += depositAmount;
    // PUSH1 0x00, MLOAD, PUSH1 0x00, MSTORE,
    // PUSH1 0x20, MLOAD, PUSH1 0x20, MSTORE,
    // PUSH1 0x40, PUSH1 0x00, RETURN
    return (user, depositAmount);
}
// 0x01 = ADD
// 0x15 = ISZERO
// 0x20 = KECCAK256
// 0x33 = CALLER
// 0x34 = CALLVALUE
// 0x51 = MLOAD
// 0x52 = MSTORE
// 0x54 = SLOAD
// 0x55 = SSTORE
// 0x57 = JUMPI
// 0x60 = PUSH1
// 0x80 = DUP1
// 0x81 = DUP2
// 0x90 = SWAP1
// 0xF3 = RETURN
// 0xFD = REVERT
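To show how byte values map back to labels, here is a minimal disassembler sketch in Python. It covers only the opcodes used in the annotated Solidity example above, and the `disassemble` function is a hypothetical helper, not part of any Ethereum tooling.

```python
# Byte values for the opcodes used in the annotated Solidity example above.
OPCODES = {
    0x01: "ADD", 0x15: "ISZERO", 0x20: "KECCAK256", 0x33: "CALLER",
    0x34: "CALLVALUE", 0x51: "MLOAD", 0x52: "MSTORE", 0x54: "SLOAD",
    0x55: "SSTORE", 0x57: "JUMPI", 0x60: "PUSH1", 0x80: "DUP1",
    0x81: "DUP2", 0x90: "SWAP1", 0xF3: "RETURN", 0xFD: "REVERT",
}


def disassemble(code: bytes) -> list:
    """Translate raw EVM bytecode into a list of mnemonic strings."""
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        name = OPCODES.get(op, f"UNKNOWN(0x{op:02x})")
        i += 1
        # PUSH1 is followed by one immediate byte that is data, not an opcode.
        if name == "PUSH1" and i < len(code):
            name += f" 0x{code[i]:02x}"
            i += 1
        out.append(name)
    return out


# The free-memory-pointer setup: PUSH1 0x80, PUSH1 0x40, MSTORE
listing = disassemble(bytes.fromhex("6080604052"))
print(listing)  # ['PUSH1 0x80', 'PUSH1 0x40', 'MSTORE']
```

Treating the byte after PUSH1 as data matters: 0x80 would otherwise be decoded as DUP1.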
Rust code can mimic VM-based languages when its compilation target is Berkeley Packet Filter (BPF) bytecode.
Most Solana smart contracts are written in Rust and compiled into Solana BPF (sBPF), a customized version of extended BPF (eBPF) tailored to include blockchain-related operations.
The sBPF instruction set differs from the EVM instruction set. Like Ethereum, Solana has its own client software, where the "Solana Virtual Machine" (SVM) refers to the sBPF VM that executes sBPF bytecode.
Python uses the Python Virtual Machine (PVM) to execute compiled bytecode. The reference implementation, CPython, executes opcodes one at a time in an interpreter loop, while alternative implementations such as PyPy use Just-In-Time (JIT) compilation to translate frequently executed code into machine code at runtime.
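def divide_numbers():
    # Local variables inside a function use the *_FAST opcodes;
    # module-level names use STORE_NAME (0x5A) and LOAD_NAME (0x65) instead.
    # LOAD_CONST, STORE_FAST
    num1 = 7
    # LOAD_CONST, STORE_FAST
    num2 = 21
    # LOAD_FAST, LOAD_FAST, BINARY_TRUE_DIVIDE, STORE_FAST
    result = num2 / num1
    # LOAD_FAST, RETURN_VALUE
    return result
# Opcode values for CPython 3.10 and earlier
# 0x1B = BINARY_TRUE_DIVIDE
# 0x53 = RETURN_VALUE
# 0x64 = LOAD_CONST
# 0x7C = LOAD_FAST
# 0x7D = STORE_FAST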
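The opcodes of a Python function can be inspected directly with the standard library's `dis` module. The sketch below prints each instruction's byte value next to its label. Mnemonics and byte values change between CPython versions, so the output may not match the legend above on a newer interpreter.

```python
import dis


def divide_numbers():
    num1 = 7
    num2 = 21
    result = num2 / num1
    return result


# Each instruction pairs a label (opname) with its numeric opcode.
instructions = list(dis.get_instructions(divide_numbers))
for ins in instructions:
    print(f"0x{ins.opcode:02X} = {ins.opname}")
```

`dis.opmap` holds the full label-to-value table for the running interpreter, which is a quick way to check which opcode numbering a given Python version uses.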
Java uses the Java Virtual Machine (JVM) to execute compiled bytecode. The JVM typically interprets the bytecode, but it can also use JIT compilation during runtime to translate the opcodes into machine code.
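public class AddNumbers {
    public static void main(String[] args) {
        // push each integer and store it in a local variable.
        // (javac folds a literal expression like 1 + 2 into a single
        // constant, so locals are used here to keep the iadd visible.)
        // iconst_1, istore_1, iconst_2, istore_2
        int a = 1;
        int b = 2;
        // load both locals, add them, and store result.
        // iload_1, iload_2, iadd, istore_3
        int result = a + b;
        // load System.out, load result, print it, return void.
        // getstatic, iload_3, invokevirtual, return
        System.out.println(result);
    }
}
// 0x04 = iconst_1
// 0x05 = iconst_2
// 0x1B = iload_1
// 0x1C = iload_2
// 0x1D = iload_3
// 0x3C = istore_1
// 0x3D = istore_2
// 0x3E = istore_3
// 0x60 = iadd
// 0xB1 = return
// 0xB2 = getstatic
// 0xB6 = invokevirtual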
In compiled languages, opcodes are byte values that correspond to assembly mnemonics. Each instruction may include additional bytes needed for register specification, immediate value inputs, and memory addressing.
These instructions compile into machine code at build time and are executed directly by the CPU without runtime interpretation by a VM.
int main() {
    int result = 2 + 3;
    return result;
}
; Compile with 'g++ -o example example.cpp'
; Display Intel assembly syntax with 'objdump -d -M intel example'
main:
; Save base pointer (rbp) onto stack
55 push rbp
; Set rbp to current stack pointer (rsp) to mark start of frame
48 89 e5 mov rbp,rsp
; Store value 5 (2+3) into local var at memory address [rbp-0x4]
; Constants are already precomputed during compilation
c7 45 fc 05 00 00 00 mov DWORD PTR [rbp-0x4],0x5
; Load value into eax register
8b 45 fc mov eax,DWORD PTR [rbp-0x4]
; Restore rbp from stack to clean up stack frame
5d pop rbp
; Return value found in eax
c3 ret
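To illustrate the extra bytes that follow an opcode, the sketch below splits the 7-byte `mov DWORD PTR [rbp-0x4],0x5` instruction from the listing above into its fields. It is a hand-written decoder for this one encoding only, not a general x86 disassembler, and the variable names are chosen for illustration.

```python
import struct

# Machine code for: mov DWORD PTR [rbp-0x4], 0x5
raw = bytes.fromhex("c745fc05000000")

opcode = raw[0]                          # 0xC7: MOV r/m32, imm32
modrm = raw[1]                           # 0x45 = 0b01_000_101
mod = modrm >> 6                         # 0b01: an 8-bit displacement follows
reg = (modrm >> 3) & 0b111               # 0b000: opcode extension for 0xC7
rm = modrm & 0b111                       # 0b101: base register is rbp
(disp,) = struct.unpack("<b", raw[2:3])  # signed 8-bit displacement
(imm,) = struct.unpack("<i", raw[3:7])   # little-endian 32-bit immediate

print(hex(opcode), mod, reg, rm, disp, imm)  # 0xc7 1 0 5 -4 5
```

Only the first byte is the opcode. The ModRM byte selects the addressing mode and register, 0xfc is the displacement -4, and the last four bytes hold the immediate value 5 in little-endian order.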
Rust can be compiled to native machine code for direct CPU execution by targeting architectures like x86 and ARM, rather than targeting BPF during smart contract development.
fn main() {
    let num1 = 5;
    let num2 = 10;
    let _result = num1 * num2;
}
; intel assembly syntax, https://rust.godbolt.org/
; Removed overflow related safety checks for simplicity
Main:
; Push (50) RAX onto the stack to save register state
push rax
; B8 05 00 00 00 = Move the immediate value 5 into EAX
mov eax, 5
; B9 0A 00 00 00 = Move the immediate value 10 into ECX
mov ecx, 10
; 0F AF C1 = Multiply EAX by the ECX value and store result in EAX
imul eax, ecx
; Pop (58) value into RAX from stack to restore register state
pop rax
; Return (C3) from the function
ret