Instructions
Now that you understand sBPF's registers and memory regions, let's examine the instructions that manipulate them.
Instructions are the fundamental operations your program performs—adding numbers, loading from memory, or jumping to different locations.
What are Instructions?
Instructions are your program's basic building blocks. Think of them as commands that tell the processor exactly what to do:
add64 r1, r2: "Add the values in registers r1 and r2, store the result in r1"
ldxdw r0, [r10 - 8]: "Load 8 bytes from stack memory into register r0"
jeq r1, 42, +3: "If r1 equals 42, jump forward 3 instructions"
Each instruction performs exactly one operation and is encoded in exactly 8 bytes, so the VM can decode every instruction the same way.
sBPF instructions work with different data sizes:
byte = 8 bits (1 byte)
halfword = 16 bits (2 bytes)
word = 32 bits (4 bytes)
doubleword = 64 bits (8 bytes)
Most sBPF operations use 64-bit values (doublewords) because registers are 64 bits wide, but you can load and store smaller sizes when that is more efficient.
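To make the four sizes concrete, here is a minimal Python sketch (a model, not sBPF itself) of what happens when a 64-bit register value is written with a narrower store: only the low bytes survive. The SIZES table and the truncate helper are illustrative names, not part of any sBPF tooling.

```python
# Illustrative model of sBPF data sizes; SIZES and truncate are hypothetical helpers.
SIZES = {
    "byte": 1,        # 8 bits
    "halfword": 2,    # 16 bits
    "word": 4,        # 32 bits
    "doubleword": 8,  # 64 bits
}

def truncate(value: int, size_name: str) -> int:
    """Keep only the low bytes of a 64-bit value, as a narrower store would."""
    bits = SIZES[size_name] * 8
    return value & ((1 << bits) - 1)

reg = 0x1122334455667788
for name in SIZES:
    print(f"{name:<10} -> {truncate(reg, name):#x}")
```

A byte-sized store of this register keeps only 0x88, while a doubleword store keeps the whole value.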
Instruction Categories and Format
When you compile Rust, C, or assembly code, the toolchain emits a stream of fixed-width, 8-byte instructions packed into your ELF's .text section.
Each instruction follows a consistent structure that the VM can decode in a single pass:
   1 byte    4 bits   4 bits     2 bytes          4 bytes
┌──────────┬────────┬────────┬──────────────┬──────────────────┐
│  opcode  │  dst   │  src   │    offset    │       imm        │
└──────────┴────────┴────────┴──────────────┴──────────────────┘
opcode: Defines the operation type. The low 3 bits select the instruction class (arithmetic, memory, jump, call, exit), while the remaining upper bits specify the exact variant (add, multiply, load, jump-if-equal).
dst: The destination register number (r0–r10) where results are stored: arithmetic results, loaded values, or helper function returns.
src: The source register providing input. For two-operand arithmetic (add r1, r2), it supplies the second value. For memory operations, it can provide the base address. For immediate variants (add r1, 10), the src field is unused; the opcode itself marks the operand as an immediate.
offset: A small signed integer that modifies instruction behavior. For loads/stores, it's added to the source address to reach [src + offset]. For jumps, it's a relative branch target measured in instructions.
imm: The immediate value field. Arithmetic operations use it for constants (add r1, 42), CALL uses it for syscall numbers (sol_log = 16), and memory operations may treat it as an absolute pointer.
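The layout above can be sketched in Python with the standard struct module. This is a hedged illustration, not the VM's actual decoder: it assumes the little-endian eBPF convention in which dst sits in the low nibble and src in the high nibble of the second byte, and it uses the classic eBPF opcode 0x79 for a 64-bit load, which newer sBPF versions may renumber. The encode and decode names are made up for this example.

```python
import struct

def encode(opcode, dst, src, offset, imm):
    """Pack the five fields into one 8-byte little-endian instruction."""
    # Assumption: dst occupies the low nibble, src the high nibble.
    regs = ((src & 0xF) << 4) | (dst & 0xF)
    # B = opcode, B = registers, h = signed 16-bit offset, i = signed 32-bit imm
    return struct.pack("<BBhi", opcode, regs, offset, imm)

def decode(raw):
    """Split an 8-byte instruction back into its fields."""
    opcode, regs, offset, imm = struct.unpack("<BBhi", raw)
    return {
        "opcode": opcode,
        "dst": regs & 0xF,
        "src": regs >> 4,
        "offset": offset,
        "imm": imm,
    }

# ldxdw r0, [r10 - 8], assuming the classic eBPF opcode 0x79
raw = encode(0x79, dst=0, src=10, offset=-8, imm=0)
print(len(raw), raw.hex())
print(decode(raw))
```

Because every instruction has the same width and field positions, a single unpack call recovers all five fields regardless of the instruction type.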
Instruction Categories
Different instruction types use these fields in specific ways:
Data Movement: Move values between registers and memory:
mov64 r1, 42 // Put immediate value 42 into r1
// opcode=move_imm, dst=1, src=unused, imm=42
ldxdw r0, [r10 - 8] // Load 8 bytes from stack into r0
// opcode=load64, dst=0, src=10, offset=-8, imm=unused
stxdw [r1 + 16], r0 // Store r0 to memory at [r1 + 16]
// opcode=store64, dst=1, src=0, offset=16, imm=unused
Arithmetic: Perform mathematical operations:
add64 r1, r2 // r1 = r1 + r2
// opcode=add_reg, dst=1, src=2, offset=unused, imm=unused
add64 r1, 100 // r1 = r1 + 100
// opcode=add_imm, dst=1, src=unused, offset=unused, imm=100
Control Flow: Change execution sequence:
ja +5 // Jump forward 5 instructions unconditionally
// opcode=jump, dst=unused, src=unused, offset=5, imm=unused
jeq r1, r2, +3 // If r1 == r2, jump forward 3 instructions
// opcode=jump_eq_reg, dst=1, src=2, offset=3, imm=unused
jeq r1, 42, +3 // If r1 == 42, jump forward 3 instructions
// opcode=jump_eq_imm, dst=1, src=unused, offset=3, imm=42
Opcode Encoding
The opcode encoding captures multiple pieces of information beyond just the operation type:
Instruction class: Arithmetic, memory, jump, call, etc.
Operation size: 32-bit vs 64-bit operations
Source type: Register vs immediate value
Specific operation: Add vs subtract, load vs store, etc.
This creates distinct opcodes for instruction variants. For example, add64 r1, r2 (register source) uses a different opcode than add64 r1, 42 (immediate source). Similarly, add64 and add32 have different opcodes for different operation sizes.
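As a rough illustration of how one opcode byte carries several of these pieces at once, the Python sketch below splits arithmetic and jump opcodes into class, source type, and operation. It assumes the classic eBPF bit layout and opcode values that early sBPF versions inherit (class in the low 3 bits, a register/immediate flag in bit 3, the operation in the high 4 bits); later sBPF versions renumber some opcodes, and load/store opcodes use the upper bits for size and mode instead. The names CLASS_NAMES and split_opcode are hypothetical.

```python
# Assumed classic eBPF class codes; only the classes used below are listed.
CLASS_NAMES = {0x04: "alu32", 0x05: "jump", 0x07: "alu64"}

def split_opcode(opcode):
    """Decompose an arithmetic or jump opcode byte into its three parts."""
    return {
        "class": CLASS_NAMES.get(opcode & 0x07, "other"),          # low 3 bits
        "source": "register" if opcode & 0x08 else "immediate",   # bit 3
        "operation": opcode >> 4,                                  # high 4 bits
    }

examples = [
    ("add64 r1, 42", 0x07),
    ("add64 r1, r2", 0x0F),
    ("add32 r1, 42", 0x04),
    ("jeq r1, r2, +3", 0x1D),
]
for label, opcode in examples:
    print(f"{label:<16} {opcode:#04x} {split_opcode(opcode)}")
```

The two add64 forms differ only in the source bit, and add32 differs from add64 only in the class bits, which is why each variant ends up with its own opcode.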
Arithmetic operations further distinguish between signed and unsigned variants. udiv64 treats values as unsigned (0 to about 18.4 quintillion), while sdiv64 handles signed values (about -9.2 quintillion to +9.2 quintillion).
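The difference is easiest to see when the same 64 bits are divided both ways. The following Python sketch models the two interpretations; it is not the VM's implementation. It assumes two's-complement registers and signed division that truncates toward zero, and it ignores division by zero and the overflow case of the most negative value divided by -1. The helper names simply mirror the mnemonics.

```python
MASK64 = (1 << 64) - 1

def to_signed(value):
    """Reinterpret a 64-bit pattern as a two's-complement signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value

def udiv64(a, b):
    """Unsigned division: both operands are plain 64-bit magnitudes."""
    return ((a & MASK64) // (b & MASK64)) & MASK64

def sdiv64(a, b):
    """Signed division, assumed to truncate toward zero."""
    sa, sb = to_signed(a), to_signed(b)
    quotient = abs(sa) // abs(sb)
    if (sa < 0) != (sb < 0):
        quotient = -quotient
    return quotient & MASK64  # store the result back as a 64-bit pattern

bits = 0xFFFF_FFFF_FFFF_FFFE  # 18446744073709551614 unsigned, -2 signed
print(udiv64(bits, 2))
print(to_signed(sdiv64(bits, 2)))
```

The same register contents give 9223372036854775807 under udiv64 and -1 under sdiv64, so picking the wrong variant silently changes the result.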
Instruction Execution
The opcode determines how the VM interprets the remaining fields.
When the VM encounters add64 r1, r2, it reads the opcode and recognizes this as a 64-bit arithmetic operation using two registers:
The dst field indicates the result goes into r1, the src field specifies r2 as the second operand, and the offset and immediate fields are ignored.
For add64 r1, 42, the opcode changes to indicate an immediate operation. Now dst still points to r1, but src becomes meaningless, and the immediate field provides the second operand (42).
Memory operations combine multiple fields meaningfully:
For ldxdw r1, [r2+8], the opcode indicates a 64-bit memory load, dst receives the loaded value, src provides the base address, and offset (8) is added to create the final address r2 + 8.
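A toy Python model of this load shows the address arithmetic. It is a sketch under simplifying assumptions: memory is a flat bytearray rather than the real sBPF memory regions, there are no bounds or permission checks, and values are read little-endian. The function name ldxdw just mirrors the mnemonic.

```python
def ldxdw(regs, memory, dst, src, offset):
    """Load 8 bytes from memory[regs[src] + offset] into regs[dst]."""
    addr = regs[src] + offset
    regs[dst] = int.from_bytes(memory[addr:addr + 8], "little")

memory = bytearray(32)  # toy flat memory, not the real sBPF memory map
memory[16:24] = (0xDEADBEEF).to_bytes(8, "little")

regs = [0] * 11  # r0 through r10
regs[2] = 8      # base address held in r2

ldxdw(regs, memory, dst=1, src=2, offset=8)  # ldxdw r1, [r2+8]
print(hex(regs[1]))
```

With r2 holding 8 and an offset of 8, the load reads from address 16 and r1 ends up holding 0xdeadbeef.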
Control flow instructions follow the same pattern:
When you write jeq r1, r2, +5, the opcode encodes a conditional jump comparing two registers. If r1 equals r2, the VM adds the offset (5) to the program counter, jumping forward 5 instructions.
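Putting the pieces together, here is a minimal interpreter loop in Python that executes register adds, immediate adds, and a conditional jump. It is only a sketch: instruction names stand in for numeric opcodes, the tuple layout is invented for this example, and it assumes the jump offset is counted from the instruction that follows the jump, as in eBPF.

```python
MASK64 = (1 << 64) - 1

# Each toy instruction is (name, dst, src, offset, imm).
PROGRAM = [
    ("jeq_reg",   1, 2, 1, 0),    # if r1 == r2, skip the next instruction
    ("add64_imm", 0, 0, 0, 100),  # r0 += 100 (skipped when the jump is taken)
    ("add64_reg", 0, 1, 0, 0),    # r0 += r1
    ("exit",      0, 0, 0, 0),
]

def run(program, r1, r2):
    """Execute the toy program and return the final value of r0."""
    regs = [0] * 11
    regs[1], regs[2] = r1, r2
    pc = 0
    while pc < len(program):
        name, dst, src, offset, imm = program[pc]
        pc += 1  # pc now points at the following instruction
        if name == "add64_reg":
            regs[dst] = (regs[dst] + regs[src]) & MASK64  # imm and offset ignored
        elif name == "add64_imm":
            regs[dst] = (regs[dst] + imm) & MASK64        # src ignored
        elif name == "jeq_reg":
            if regs[dst] == regs[src]:
                pc += offset                              # taken: skip ahead
        elif name == "exit":
            break
        else:
            raise ValueError(f"unknown instruction: {name}")
    return regs[0]

print(run(PROGRAM, 7, 7))  # jump taken
print(run(PROGRAM, 7, 9))  # jump not taken
```

When r1 equals r2 the immediate add is skipped and r0 ends at 7; otherwise both adds run and r0 ends at 107.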
Function Calls and Syscalls
sBPF's call mechanism evolved across versions for better clarity and security. Until sBPF v3, call imm served dual purposes: the immediate value determined whether you were calling an internal function or invoking a syscall.
The runtime distinguished between these based on the immediate value range, with syscall numbers typically being small positive integers like 16 for sol_log.
From sBPF v3 onwards, the two roles are split into separate instructions. call off now handles internal function calls using relative offsets, while syscall imm explicitly invokes runtime functions. This separation makes the intent of the bytecode clear and enables better verification.
Indirect calls through callx also evolved. Earlier versions encoded the target register in the immediate field, but from v2 onwards, it's encoded in the source register field for consistency with the general instruction format.