Introduction to Assembly

Instructions

Now that you understand sBPF's registers and memory regions, let's examine the instructions that manipulate them.

Instructions are the fundamental operations your program performs: adding numbers, loading from memory, or jumping to different locations.

What are Instructions?

Instructions are your program's basic building blocks. Think of them as commands that tell the processor exactly what to do:

  • add64 r1, r2: "Add the values in registers r1 and r2, store result in r1"
  • ldxdw r0, [r10 - 8]: "Load 8 bytes from stack memory into register r0"
  • jeq r1, 42, +3: "If r1 equals 42, jump forward 3 instructions"

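Each instruction performs exactly one operation and occupies a fixed 8 bytes, so the VM can decode it quickly without parsing variable-length formats. The one exception is lddw, which spans two consecutive 8-byte slots to carry a full 64-bit constant.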

sBPF instructions work with different data sizes:

  • byte = 8 bits (1 byte)
  • halfword = 16 bits (2 bytes)
  • word = 32 bits (4 bytes)
  • doubleword = 64 bits (8 bytes)

Most sBPF operations use 64-bit values (doublewords) because registers are 64 bits wide, but loads and stores can also move smaller sizes when you need them, for example to read a 1-byte flag or a 4-byte field.
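To make the sizes concrete, here is a minimal Rust sketch of how a 1-, 2-, 4-, or 8-byte load can fill a 64-bit register. It assumes little-endian memory and zero-extension of the smaller loads, as in the eBPF-style encoding sBPF inherits; the `load` helper is hypothetical and not part of any sBPF toolchain.

```rust
/// Hypothetical helper (not an sBPF API): read `size` bytes at `addr` from a
/// little-endian memory buffer and zero-extend them to a 64-bit register value,
/// mirroring what ldxb / ldxh / ldxw / ldxdw do with 1, 2, 4 and 8 bytes.
fn load(mem: &[u8], addr: usize, size: usize) -> u64 {
    assert!(matches!(size, 1 | 2 | 4 | 8), "size must be 1, 2, 4 or 8 bytes");
    let mut buf = [0u8; 8]; // untouched upper bytes stay zero: the zero-extension
    buf[..size].copy_from_slice(&mem[addr..addr + size]);
    u64::from_le_bytes(buf)
}

fn main() {
    // Eight bytes of "memory" holding 0x1122334455667788 in little-endian order.
    let mem = 0x1122_3344_5566_7788u64.to_le_bytes();
    println!("ldxb  -> {:#x}", load(&mem, 0, 1)); // 0x88
    println!("ldxh  -> {:#x}", load(&mem, 0, 2)); // 0x7788
    println!("ldxw  -> {:#x}", load(&mem, 0, 4)); // 0x55667788
    println!("ldxdw -> {:#x}", load(&mem, 0, 8)); // 0x1122334455667788
}
```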

Instruction Categories and Format

When you compile Rust, C, or assembly code, the toolchain emits a stream of fixed-width, 8-byte instructions packed into your ELF's .text section.

Each instruction follows a consistent structure that the VM can decode in a single pass:

   1 byte    4 bits   4 bits     2 bytes         4 bytes
┌──────────┬────────┬────────┬──────────────┬──────────────────┐
│  opcode  │  dst   │  src   │   offset     │      imm         │
└──────────┴────────┴────────┴──────────────┴──────────────────┘
  • opcode: Defines the operation. In the eBPF-style encoding sBPF uses, the low 3 bits select the instruction class (load, store, arithmetic, jump; call and exit belong to the jump class), while the upper 5 bits select the exact variant (add vs. multiply, jump-if-equal vs. jump-if-greater, access size for memory operations).
  • dst: The destination register number (r0–r10). Arithmetic and load instructions write their result here; store instructions use it as the base address of the memory being written.
  • src: The source register number. For two-operand arithmetic (add64 r1, r2) it supplies the second value, for loads it holds the base address, and for register stores (stxdw) it holds the value to store. Immediate variants (add64 r1, 10) leave this field unused; the opcode itself records that the second operand comes from imm.
  • offset: A signed 16-bit integer that modifies instruction behavior. For loads and stores it is added to the base register to form the address ([src + offset] for loads, [dst + offset] for stores). For jumps it is the branch distance in instructions, counted from the instruction after the jump.
  • imm: A signed 32-bit immediate value. Arithmetic operations use it as a constant (add64 r1, 42), call (before sBPF v3) and syscall use it to identify the function or syscall to invoke, store-immediate instructions (stw, stdw, ...) use it as the value to write, and lddw splits a 64-bit constant across the imm fields of its two slots.
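The field layout above can be sketched as a small Rust decoder. It assumes the eBPF-style byte layout that sBPF inherits: byte 0 is the opcode, byte 1 holds dst in its low 4 bits and src in its high 4 bits, and offset and imm are little-endian signed integers. The `Insn` struct and `decode` function are illustrative names, not a real library API.

```rust
/// Field view of one 8-byte instruction slot (illustrative type).
#[derive(Debug, PartialEq)]
struct Insn {
    opcode: u8,
    dst: u8,     // low nibble of byte 1
    src: u8,     // high nibble of byte 1
    offset: i16, // bytes 2..4, little-endian, signed
    imm: i32,    // bytes 4..8, little-endian, signed
}

fn decode(slot: [u8; 8]) -> Insn {
    Insn {
        opcode: slot[0],
        dst: slot[1] & 0x0f,
        src: slot[1] >> 4,
        offset: i16::from_le_bytes([slot[2], slot[3]]),
        imm: i32::from_le_bytes([slot[4], slot[5], slot[6], slot[7]]),
    }
}

fn main() {
    // ldxdw r0, [r10 - 8], assuming the classic eBPF opcode 0x79
    // (opcode numbering can differ between sBPF versions).
    let slot = [0x79, 0xa0, 0xf8, 0xff, 0x00, 0x00, 0x00, 0x00];
    println!("{:?}", decode(slot));
}
```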

Instruction Categories

Different instruction types use these fields in specific ways:

  • Data Movement: Move values between registers and memory:
mov64 r1, 42           # Put immediate value 42 into r1
                       # opcode=move_imm, dst=1, src=unused, offset=unused, imm=42
 
ldxdw r0, [r10 - 8]    # Load 8 bytes from stack into r0  
                       # opcode=load64, dst=0, src=10, offset=-8, imm=unused
 
stxdw [r1 + 16], r0    # Store r0 to memory at [r1 + 16]
                       # opcode=store64, dst=1, src=0, offset=16, imm=unused
  • Arithmetic: Perform mathematical operations:
add64 r1, r2           # r1 = r1 + r2
                       # opcode=add_reg, dst=1, src=2, offset=unused, imm=unused
 
add64 r1, 100          # r1 = r1 + 100  
                       # opcode=add_imm, dst=1, src=unused, offset=unused, imm=100
  • Control Flow: Change execution sequence:
ja +5                  # Jump forward 5 instructions unconditionally
                       # opcode=jump, dst=unused, src=unused, offset=5, imm=unused
 
jeq r1, r2, +3         # If r1 == r2, jump forward 3 instructions
                       # opcode=jump_eq_reg, dst=1, src=2, offset=3, imm=unused
 
jeq r1, 42, +3         # If r1 == 42, jump forward 3 instructions  
                       # opcode=jump_eq_imm, dst=1, src=unused, offset=3, imm=42
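Going the other way, this Rust sketch packs the fields of the examples above into 8-byte slots. The opcode constants are the classic eBPF values and are an assumption for illustration: later sBPF versions renumber some instructions, so check the version you target before relying on them.

```rust
/// Pack one instruction into its 8-byte slot (eBPF-style layout).
fn encode(opcode: u8, dst: u8, src: u8, offset: i16, imm: i32) -> [u8; 8] {
    assert!(dst < 16 && src < 16, "register fields are 4 bits wide");
    let off = offset.to_le_bytes();
    let imm = imm.to_le_bytes();
    // dst goes in the low nibble of byte 1, src in the high nibble.
    [opcode, (src << 4) | dst, off[0], off[1], imm[0], imm[1], imm[2], imm[3]]
}

// Classic eBPF opcode values (assumed; version-dependent in sBPF).
const MOV64_IMM: u8 = 0xb7;
const LDXDW: u8 = 0x79;
const STXDW: u8 = 0x7b;
const ADD64_REG: u8 = 0x0f;
const ADD64_IMM: u8 = 0x07;
const JA: u8 = 0x05;
const JEQ_REG: u8 = 0x1d;
const JEQ_IMM: u8 = 0x15;

fn main() {
    let examples = [
        ("mov64 r1, 42", encode(MOV64_IMM, 1, 0, 0, 42)),
        ("ldxdw r0, [r10 - 8]", encode(LDXDW, 0, 10, -8, 0)),
        ("stxdw [r1 + 16], r0", encode(STXDW, 1, 0, 16, 0)),
        ("add64 r1, r2", encode(ADD64_REG, 1, 2, 0, 0)),
        ("add64 r1, 100", encode(ADD64_IMM, 1, 0, 0, 100)),
        ("ja +5", encode(JA, 0, 0, 5, 0)),
        ("jeq r1, r2, +3", encode(JEQ_REG, 1, 2, 3, 0)),
        ("jeq r1, 42, +3", encode(JEQ_IMM, 1, 0, 3, 42)),
    ];
    for (text, bytes) in examples {
        println!("{text:<22} {bytes:02x?}");
    }
}
```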

Opcode Encoding

The opcode encoding captures multiple pieces of information beyond just the operation type:

  • Instruction class: Arithmetic, memory, jump, call, etc.
  • Operation size: 32-bit vs 64-bit operations
  • Source type: Register vs immediate value
  • Specific operation: Add vs subtract, load vs store, etc.

This creates distinct opcodes for instruction variants. For example, add64 r1, r2 (register source) uses a different opcode than add64 r1, 42 (immediate source). Similarly, add64 and add32 have different opcodes for different operation sizes.
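For arithmetic and jump instructions, the classic eBPF layout packs these pieces into the opcode byte as: low 3 bits for the class, bit 3 for register vs. immediate source, and the high 4 bits for the operation. The Rust sketch below splits an opcode along those lines; the example opcode values are eBPF's and are assumed for illustration, since sBPF versions differ in places.

```rust
/// Split an arithmetic or jump opcode into its classic eBPF sub-fields.
/// Returns (class, uses_register_source, operation).
fn split_alu_jmp_opcode(op: u8) -> (u8, bool, u8) {
    let class = op & 0x07; // low 3 bits: e.g. 4 = 32-bit ALU, 7 = 64-bit ALU, 5 = jump
    let uses_register = op & 0x08 != 0; // bit 3: 1 = src register, 0 = imm
    let operation = op >> 4; // high 4 bits: add, sub, jeq, ...
    (class, uses_register, operation)
}

fn main() {
    let samples = [("add64 imm", 0x07u8), ("add64 reg", 0x0f), ("add32 imm", 0x04), ("jeq reg", 0x1d)];
    for (name, op) in samples {
        let (class, uses_register, operation) = split_alu_jmp_opcode(op);
        println!("{name:<10} {op:#04x}: class={class} register_source={uses_register} operation={operation}");
    }
}
```

The add64 register and immediate forms differ only in bit 3, and add64 and add32 differ only in the class bits, which is exactly the split described above.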

Arithmetic operations further distinguish between signed and unsigned variants. udiv64 (div64 in the table below) treats operands as unsigned values from 0 to 2^64 - 1 (about 18.4 quintillion), while sdiv64 treats the same bits as two's-complement values from -2^63 to 2^63 - 1 (about ±9.2 quintillion).
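A small Rust sketch shows why the distinction matters: the same 64-bit register contents divide to very different results depending on how the bits are interpreted. The functions are illustrative; they return None where division is undefined (a zero divisor, or i64::MIN / -1 for signed division) and do not attempt to reproduce the VM's exact error behavior.

```rust
/// Unsigned 64-bit division: operands are read as 0 ..= 2^64 - 1.
fn udiv64(dst: u64, src: u64) -> Option<u64> {
    dst.checked_div(src)
}

/// Signed 64-bit division: the same bits read as two's-complement i64.
fn sdiv64(dst: u64, src: u64) -> Option<u64> {
    (dst as i64).checked_div(src as i64).map(|q| q as u64)
}

fn main() {
    let reg: u64 = (-4i64) as u64; // bit pattern 0xffff_ffff_ffff_fffc
    println!("udiv64: {:#x}", udiv64(reg, 2).unwrap()); // huge positive quotient
    println!("sdiv64: {}", sdiv64(reg, 2).unwrap() as i64); // -4 / 2
}
```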

Instruction Execution

The opcode determines how the VM interprets the remaining fields.

When the VM encounters add64 r1, r2, it reads the opcode and recognizes this as a 64-bit arithmetic operation using two registers:

The dst field indicates the result goes into r1, the src field specifies r2 as the second operand, and the offset and immediate fields are ignored.

For add64 r1, 42, the opcode changes to indicate an immediate operation. dst still points to r1, but the src field is now unused, and the immediate field supplies the second operand (42).

Memory operations combine multiple fields meaningfully:

For ldxdw r1, [r2+8], the opcode indicates a 64-bit memory load, dst receives the loaded value, src provides the base address, and offset (8) is added to create the final address r2 + 8.
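The address arithmetic can be sketched in Rust. Because offset is a signed 16-bit field, it is sign-extended before being added to the base register, which is how [r10 - 8] reaches below the frame pointer. The function name and the base value are hypothetical, and the real VM additionally checks the resulting address against its memory regions before accessing it, which this sketch skips.

```rust
/// Illustrative only: compute the address a load or store touches by
/// sign-extending the 16-bit offset and adding it to the base register.
fn effective_address(base: u64, offset: i16) -> u64 {
    base.wrapping_add(offset as i64 as u64)
}

fn main() {
    let r2: u64 = 0x1000; // hypothetical base address held in r2
    println!("[r2 + 8] -> {:#x}", effective_address(r2, 8));
    println!("[r2 - 8] -> {:#x}", effective_address(r2, -8));
}
```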

Control flow instructions follow the same pattern:

When you write jeq r1, r2, +5, the opcode encodes a conditional jump comparing two registers. If r1 equals r2, the VM adds the offset (5) to the position of the instruction after the jump, skipping the next 5 instructions; otherwise execution falls through to the next instruction.

The opcode alone determines which fields are meaningful. Because the layout itself never changes, the VM needs no complex addressing modes or special-case decoding.
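The decode-then-dispatch behavior described above can be modeled with a tiny interpreter. This Rust sketch is hypothetical: it works on an already-decoded enum rather than raw bytes, supports only mov64 and add64 (immediate), add64 (register), jeq (immediate), ja, and exit, sign-extends 32-bit immediates to 64 bits, and computes jump targets relative to the instruction after the jump. exit returns r0.

```rust
#[derive(Clone, Copy)]
enum Insn {
    Mov64Imm { dst: usize, imm: i32 },
    Add64Reg { dst: usize, src: usize },
    Add64Imm { dst: usize, imm: i32 },
    JeqImm { dst: usize, imm: i32, off: i16 },
    Ja { off: i16 },
    Exit,
}

/// Run a program and return r0 when it exits.
fn run(program: &[Insn]) -> u64 {
    let mut reg = [0u64; 11]; // r0..r10
    let mut pc: usize = 0;
    loop {
        let insn = program[pc];
        pc += 1; // pc now indexes the next instruction; jumps are relative to it
        match insn {
            Insn::Mov64Imm { dst, imm } => reg[dst] = imm as i64 as u64,
            Insn::Add64Reg { dst, src } => reg[dst] = reg[dst].wrapping_add(reg[src]),
            Insn::Add64Imm { dst, imm } => reg[dst] = reg[dst].wrapping_add(imm as i64 as u64),
            Insn::JeqImm { dst, imm, off } => {
                if reg[dst] == imm as i64 as u64 {
                    pc = (pc as isize + off as isize) as usize;
                }
            }
            Insn::Ja { off } => pc = (pc as isize + off as isize) as usize,
            Insn::Exit => return reg[0],
        }
    }
}

/// Sum 3 + 2 + 1 with a counting loop.
fn countdown_sum() -> Vec<Insn> {
    vec![
        Insn::Mov64Imm { dst: 0, imm: 0 },       // index 0: r0 = 0
        Insn::Mov64Imm { dst: 1, imm: 3 },       // index 1: r1 = 3
        Insn::JeqImm { dst: 1, imm: 0, off: 3 }, // index 2: if r1 == 0, go to index 6
        Insn::Add64Reg { dst: 0, src: 1 },       // index 3: r0 += r1
        Insn::Add64Imm { dst: 1, imm: -1 },      // index 4: r1 -= 1
        Insn::Ja { off: -4 },                    // index 5: go back to index 2
        Insn::Exit,                              // index 6: return r0
    ]
}

fn main() {
    println!("r0 = {}", run(&countdown_sum()));
}
```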

Function Calls and Syscalls

sBPF's call mechanism evolved across versions for better clarity and security. Until sBPF v3, call imm served dual purposes: the immediate value determined whether you were calling an internal function or invoking a syscall.

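The runtime told the two apart by looking the immediate up in its registries rather than by checking a numeric range: syscalls were keyed by a 32-bit hash of the syscall name (for example sol_log_), and values not found there were resolved against the program's own table of internal functions.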

From sBPF v3 onwards, the instructions separated for explicit behavior. call off now handles internal function calls using relative offsets, while syscall imm explicitly invokes runtime functions. This separation makes bytecode intentions clear and enables better verification.

Indirect calls through callx also evolved. Earlier versions encoded the target register in the immediate field, but from v2 onwards, it's encoded in the source register field for consistency with the general instruction format.
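The encoding change can be sketched in Rust: the same "callx r3" reads its target register from a different field depending on the version. The `CallxEncoding` enum and `callx_target_register` function are hypothetical, and 0x8d is the classic eBPF call-register opcode, assumed here only to fill the slot.

```rust
/// Hypothetical flag for the two callx encodings.
#[derive(Clone, Copy)]
enum CallxEncoding {
    /// Earlier versions: target register number stored in the imm field.
    RegInImm,
    /// v2 onwards: target register number stored in the src nibble.
    RegInSrc,
}

fn callx_target_register(slot: [u8; 8], enc: CallxEncoding) -> u32 {
    match enc {
        CallxEncoding::RegInImm => u32::from_le_bytes([slot[4], slot[5], slot[6], slot[7]]),
        CallxEncoding::RegInSrc => u32::from(slot[1] >> 4), // src is the high nibble of byte 1
    }
}

fn main() {
    let old_style = [0x8d, 0x00, 0, 0, 3, 0, 0, 0]; // register 3 in imm
    let new_style = [0x8d, 0x30, 0, 0, 0, 0, 0, 0]; // register 3 in src
    println!("{}", callx_target_register(old_style, CallxEncoding::RegInImm));
    println!("{}", callx_target_register(new_style, CallxEncoding::RegInSrc));
}
```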

Opcodes Reference Table

Memory Load Operations

Opcode  Mnemonic                Description
lddw    lddw dst, imm           Load 64-bit immediate (first slot)
lddw    lddw dst, imm           Load 64-bit immediate (second slot)
ldxw    ldxw dst, [src + off]   Load word from memory
ldxh    ldxh dst, [src + off]   Load halfword from memory
ldxb    ldxb dst, [src + off]   Load byte from memory
ldxdw   ldxdw dst, [src + off]  Load doubleword from memory

Memory Store Operations

Opcode  Mnemonic                Description
stw     stw [dst + off], imm    Store word immediate
sth     sth [dst + off], imm    Store halfword immediate
stb     stb [dst + off], imm    Store byte immediate
stdw    stdw [dst + off], imm   Store doubleword immediate
stxw    stxw [dst + off], src   Store word from register
stxh    stxh [dst + off], src   Store halfword from register
stxb    stxb [dst + off], src   Store byte from register
stxdw   stxdw [dst + off], src  Store doubleword from register

Arithmetic Operations (64-bit)

Opcode  Mnemonic         Description
add64   add64 dst, imm   Add immediate
add64   add64 dst, src   Add register
sub64   sub64 dst, imm   Subtract immediate
sub64   sub64 dst, src   Subtract register
mul64   mul64 dst, imm   Multiply immediate
mul64   mul64 dst, src   Multiply register
div64   div64 dst, imm   Divide immediate (unsigned)
div64   div64 dst, src   Divide register (unsigned)
sdiv64  sdiv64 dst, imm  Divide immediate (signed)
sdiv64  sdiv64 dst, src  Divide register (signed)
mod64   mod64 dst, imm   Modulo immediate (unsigned)
mod64   mod64 dst, src   Modulo register (unsigned)
smod64  smod64 dst, imm  Modulo immediate (signed)
smod64  smod64 dst, src  Modulo register (signed)
neg64   neg64 dst        Negate

Arithmetic Operations (32-bit)

Opcode  Mnemonic         Description
add32   add32 dst, imm   Add immediate (32-bit)
add32   add32 dst, src   Add register (32-bit)
sub32   sub32 dst, imm   Subtract immediate (32-bit)
sub32   sub32 dst, src   Subtract register (32-bit)
mul32   mul32 dst, imm   Multiply immediate (32-bit)
mul32   mul32 dst, src   Multiply register (32-bit)
div32   div32 dst, imm   Divide immediate (unsigned 32-bit)
div32   div32 dst, src   Divide register (unsigned 32-bit)
sdiv32  sdiv32 dst, imm  Divide immediate (signed 32-bit)
sdiv32  sdiv32 dst, src  Divide register (signed 32-bit)
mod32   mod32 dst, imm   Modulo immediate (unsigned 32-bit)
mod32   mod32 dst, src   Modulo register (unsigned 32-bit)
smod32  smod32 dst, imm  Modulo immediate (signed 32-bit)
smod32  smod32 dst, src  Modulo register (signed 32-bit)

Logical Operations (64-bit)

Opcode  Mnemonic         Description
or64    or64 dst, imm    Bitwise OR immediate
or64    or64 dst, src    Bitwise OR register
and64   and64 dst, imm   Bitwise AND immediate
and64   and64 dst, src   Bitwise AND register
lsh64   lsh64 dst, imm   Left shift immediate
lsh64   lsh64 dst, src   Left shift register
rsh64   rsh64 dst, imm   Logical right shift immediate
rsh64   rsh64 dst, src   Logical right shift register
xor64   xor64 dst, imm   Bitwise XOR immediate
xor64   xor64 dst, src   Bitwise XOR register
mov64   mov64 dst, imm   Move immediate
mov64   mov64 dst, src   Move register
arsh64  arsh64 dst, imm  Arithmetic right shift immediate
arsh64  arsh64 dst, src  Arithmetic right shift register
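The difference between rsh64 and arsh64 is easiest to see on a value with the top bit set. In this Rust sketch, the logical shift fills vacated bits with zeros while the arithmetic shift copies the sign bit. The helpers are illustrative; they use Rust's wrapping_shr, which masks the shift amount to 0..=63, and make no claim about how the VM treats out-of-range shift amounts.

```rust
/// rsh64: logical right shift, vacated high bits become 0.
fn rsh64(dst: u64, shift: u32) -> u64 {
    dst.wrapping_shr(shift)
}

/// arsh64: arithmetic right shift, vacated high bits copy the sign bit.
fn arsh64(dst: u64, shift: u32) -> u64 {
    (dst as i64).wrapping_shr(shift) as u64
}

fn main() {
    let reg: u64 = 0x8000_0000_0000_0000; // only the sign bit set
    println!("rsh64  by 4: {:#018x}", rsh64(reg, 4));
    println!("arsh64 by 4: {:#018x}", arsh64(reg, 4));
}
```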

Logical Operations (32-bit)

Opcode  Mnemonic         Description
or32    or32 dst, imm    Bitwise OR immediate (32-bit)
or32    or32 dst, src    Bitwise OR register (32-bit)
and32   and32 dst, imm   Bitwise AND immediate (32-bit)
and32   and32 dst, src   Bitwise AND register (32-bit)
lsh32   lsh32 dst, imm   Left shift immediate (32-bit)
lsh32   lsh32 dst, src   Left shift register (32-bit)
rsh32   rsh32 dst, imm   Right shift immediate (32-bit)
rsh32   rsh32 dst, src   Right shift register (32-bit)
xor32   xor32 dst, imm   Bitwise XOR immediate (32-bit)
xor32   xor32 dst, src   Bitwise XOR register (32-bit)
mov32   mov32 dst, imm   Move immediate (32-bit)
mov32   mov32 dst, src   Move register (32-bit)
arsh32  arsh32 dst, imm  Arithmetic right shift immediate (32-bit)
arsh32  arsh32 dst, src  Arithmetic right shift register (32-bit)

Control Flow Operations

Opcode  Mnemonic            Description
ja      ja off              Unconditional jump (ja +0 continues at the next instruction)
jeq     jeq dst, imm, off   Jump if equal to immediate
jeq     jeq dst, src, off   Jump if equal to register
jgt     jgt dst, imm, off   Jump if greater than immediate (unsigned)
jgt     jgt dst, src, off   Jump if greater than register (unsigned)
jge     jge dst, imm, off   Jump if greater or equal immediate (unsigned)
jge     jge dst, src, off   Jump if greater or equal register (unsigned)
jset    jset dst, imm, off  Jump if dst & imm != 0 (any mask bit set)
jset    jset dst, src, off  Jump if dst & src != 0 (any mask bit set)
jne     jne dst, imm, off   Jump if not equal to immediate
jne     jne dst, src, off   Jump if not equal to register
jsgt    jsgt dst, imm, off  Jump if greater than immediate (signed)
jsgt    jsgt dst, src, off  Jump if greater than register (signed)
jsge    jsge dst, imm, off  Jump if greater or equal immediate (signed)
jsge    jsge dst, src, off  Jump if greater or equal register (signed)
jlt     jlt dst, imm, off   Jump if less than immediate (unsigned)
jlt     jlt dst, src, off   Jump if less than register (unsigned)
jle     jle dst, imm, off   Jump if less or equal immediate (unsigned)
jle     jle dst, src, off   Jump if less or equal register (unsigned)
jslt    jslt dst, imm, off  Jump if less than immediate (signed)
jslt    jslt dst, src, off  Jump if less than register (signed)
jsle    jsle dst, imm, off  Jump if less or equal immediate (signed)
jsle    jsle dst, src, off  Jump if less or equal register (signed)
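Signed and unsigned comparisons can disagree on the same bits, which is why the table has both jgt and jsgt. This Rust sketch models the conditions only (not the jump); it assumes, as in eBPF, that a 32-bit immediate is sign-extended to 64 bits before a 64-bit comparison. All function names are illustrative.

```rust
/// Sign-extend a 32-bit immediate to a 64-bit comparison operand.
fn imm64(imm: i32) -> u64 {
    imm as i64 as u64
}

/// jgt condition: unsigned greater-than.
fn jgt(dst: u64, src: u64) -> bool {
    dst > src
}

/// jsgt condition: signed greater-than on the same bits.
fn jsgt(dst: u64, src: u64) -> bool {
    (dst as i64) > (src as i64)
}

/// jset condition: any bit of the mask is set in dst.
fn jset(dst: u64, mask: u64) -> bool {
    dst & mask != 0
}

fn main() {
    let minus_one = u64::MAX; // bit pattern of -1
    println!("jgt  -1, 1 -> {}", jgt(minus_one, 1)); // unsigned: huge value > 1
    println!("jsgt -1, 1 -> {}", jsgt(minus_one, 1)); // signed: -1 > 1 is false
    println!("jgt  1, imm -1 -> {}", jgt(1, imm64(-1)));
    println!("jset 0b1010, 0b0010 -> {}", jset(0b1010, 0b0010));
}
```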

Function Call Operations

Opcode  Mnemonic                 Description
call    call imm or syscall imm  Call internal function or syscall
callx   callx imm                Indirect call (register in imm field before v2, in src field from v2)
exit    exit or return           Return from function

Byte Swap Operations

Opcode  Mnemonic     Description
be      be dst, imm  Convert to big endian (imm = 16, 32, or 64 bits)
le      le dst, imm  Convert to little endian (deprecated)
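On a little-endian VM, converting to big endian is a byte swap of the low 16, 32, or 64 bits. This Rust sketch assumes that the 16- and 32-bit forms also clear the upper bits of the register, as in eBPF; the `be` function is illustrative and returns None for widths the instruction does not accept.

```rust
/// Model of `be dst, imm` on a little-endian machine.
fn be(dst: u64, bits: u32) -> Option<u64> {
    match bits {
        16 => Some((dst as u16).swap_bytes() as u64), // keep low 16 bits, swap, zero the rest
        32 => Some((dst as u32).swap_bytes() as u64), // keep low 32 bits, swap, zero the rest
        64 => Some(dst.swap_bytes()),
        _ => None, // not a valid width for this instruction
    }
}

fn main() {
    let reg = 0x1122_3344_5566_7788u64;
    for bits in [16, 32, 64] {
        println!("be{bits}: {:#x}", be(reg, bits).unwrap());
    }
}
```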
Blueshift © 2025 | Commit: e508535