
Reading and Writing Data

When building optimized Solana programs, efficient data serialization and deserialization can significantly impact performance.

While Pinocchio doesn't force you to use low-level memory operations, understanding how to efficiently read and write account data helps you build faster programs.

The techniques in this guide work with any Solana development framework, whether you're using Pinocchio, Anchor, or the native SDK. The key is designing your data structures thoughtfully and handling serialization safely.

When to Use Unsafe Code

Use unsafe code only when:

  • You need maximum performance and have measured that safe alternatives are too slow
  • You can rigorously verify all safety invariants
  • You document the safety requirements clearly

Prefer safe alternatives when possible.
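
For instance, here is a minimal sketch of both styles (the function names are illustrative; ProgramError is assumed to come from pinocchio, but solana_program works the same):

use core::mem::size_of;
use pinocchio::program_error::ProgramError;
 
// Safe alternative: bounds-checked and endian-explicit. Prefer this by default.
fn read_counter_safe(data: &[u8]) -> Result<u64, ProgramError> {
    let bytes: [u8; 8] = data
        .get(..size_of::<u64>())
        .and_then(|slice| slice.try_into().ok())
        .ok_or(ProgramError::InvalidAccountData)?;
    Ok(u64::from_le_bytes(bytes))
}
 
// Unsafe variant: only worth it if profiling shows the safe version is too slow.
// SAFETY: the caller must guarantee that `data` is at least 8 bytes long.
unsafe fn read_counter_unchecked(data: &[u8]) -> u64 {
    u64::from_le_bytes(*(data.as_ptr() as *const [u8; 8]))
}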

Safety Principles

When working with raw byte arrays and memory operations, we must be careful to avoid undefined behavior. Understanding these principles is crucial for writing correct and reliable code.

Buffer Bounds Checking

Always validate that your buffer is large enough before any read or write operation. Reading or writing beyond allocated memory is undefined behavior.

// Good: Check bounds first
if data.len() < size_of::<u64>() {
    return Err(ProgramError::InvalidInstructionData);
}
let value = u64::from_le_bytes(data[0..8].try_into().unwrap());
 
// Bad: No prior bounds check - the slice operation panics if `data` is too short
let value = u64::from_le_bytes(data[0..8].try_into().unwrap());

Alignment Requirements

Every type in Rust has an alignment requirement that determines where it can be placed in memory. Reading a type from memory that isn't properly aligned results in undefined behavior. Most primitive types require alignment equal to their size:

  • u8: 1-byte alignment
  • u16: 2-byte alignment
  • u32: 4-byte alignment
  • u64: 8-byte alignment

This means a u64 must be stored at memory addresses divisible by 8, while a u16 must start at even addresses.
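
You can confirm these requirements at compile time with core::mem::align_of; these assertions hold on 64-bit targets such as Solana's SBF:

use core::mem::align_of;
 
// Compile-time checks: alignment equals size for these primitives
const _: () = {
    assert!(align_of::<u8>() == 1);
    assert!(align_of::<u16>() == 2);
    assert!(align_of::<u32>() == 4);
    assert!(align_of::<u64>() == 8);
};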

To satisfy these requirements, the compiler automatically inserts invisible "padding" bytes between struct fields.

Additionally, the struct's total size must be a multiple of its largest field's alignment requirement.

This is what a badly ordered struct looks like:

#[repr(C)]
struct BadOrder {
    small: u8,    // 1 byte
    // padding: [u8; 7] since `big` needs to be aligned at 8 bytes.
    big: u64,     // 8 bytes  
    medium: u16,  // 2 bytes
    // padding: [u8; 6] since the struct size needs to be aligned to 8 bytes.
}

The compiler inserts 7 padding bytes after small because big requires 8-byte alignment. It then adds 6 more bytes at the end to make the total size (24 bytes) a multiple of 8, wasting 13 bytes in total.

A better way is to order the fields of the struct like this:

#[repr(C)]
struct GoodOrder {
    big: u64,     // 8 bytes
    medium: u16,  // 2 bytes  
    small: u8,    // 1 byte
    // padding: [u8; 5] since the struct size needs to be aligned to 8 bytes.
}

By placing larger fields first, we reduce padding from 13 bytes to just 5 bytes.
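
A quick compile-time check (using Rust's const assertions) confirms both layouts:

use core::mem::size_of;
 
const _: () = {
    assert!(size_of::<BadOrder>() == 24);  // 11 bytes of data + 13 bytes of padding
    assert!(size_of::<GoodOrder>() == 16); // 11 bytes of data + 5 bytes of padding
};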

The golden rule is to order struct fields from largest to smallest alignment requirements to minimize padding and reduce memory usage.

There is another, more advanced way to serialize and deserialize data for maximum space efficiency. We can create zero-padding structs, where alignment requirements are eliminated entirely:

#[repr(C)]
struct ByteArrayStruct {
    big: [u8; 8],    // represents u64
    medium: [u8; 2], // represents u16  
    small: u8,
}

The size in this case is exactly 11 bytes, because every field is 1-byte aligned.
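
The same compile-time technique confirms that the padding is gone:

use core::mem::{align_of, size_of};
 
const _: () = {
    assert!(size_of::<ByteArrayStruct>() == 11); // no padding at all
    assert!(align_of::<ByteArrayStruct>() == 1); // safe to cast from any offset
};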

Valid Bit Patterns

Not all bit patterns are valid for every type. Types like bool, char, and enums have restricted valid values. Reading invalid bit patterns into these types is undefined behavior.
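
For example, a bool is only valid as 0 or 1, so the raw byte must be validated before conversion; a minimal sketch:

// ❌ UNDEFINED BEHAVIOR: 2 is not a valid bit pattern for bool
// let flag: bool = unsafe { core::mem::transmute(2u8) };
 
// ✅ SAFE: validate the byte, then convert
fn parse_bool(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}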

Reading Data

There are several approaches to reading data from account buffers, each with different trade-offs:

Field-by-Field Deserialization (Recommended)

The safest approach is to deserialize each field individually. This avoids all alignment issues because you're working with byte arrays:

pub struct DepositInstructionData {
    pub amount: u64,
    pub recipient: Pubkey,
}
 
impl<'a> TryFrom<&'a [u8]> for DepositInstructionData {
    type Error = ProgramError;
 
    fn try_from(data: &'a [u8]) -> Result<Self, Self::Error> {
        if data.len() < (size_of::<u64>() + size_of::<Pubkey>()) {
            return Err(ProgramError::InvalidInstructionData);
        }
 
        // No alignment issues: we're reading bytes and converting
        let amount = u64::from_le_bytes(
            data[0..8].try_into()
                .map_err(|_| ProgramError::InvalidInstructionData)?
        );
        
        let recipient = Pubkey::try_from(&data[8..40])
            .map_err(|_| ProgramError::InvalidInstructionData)?;
 
        Ok(Self { amount, recipient })
    }
}
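
In an instruction handler the conversion then becomes a single question mark; a hypothetical sketch (process_deposit and its signature are illustrative, using pinocchio's AccountInfo and ProgramResult):

use pinocchio::{account_info::AccountInfo, ProgramResult};
 
pub fn process_deposit(_accounts: &[AccountInfo], instruction_data: &[u8]) -> ProgramResult {
    // Fails with InvalidInstructionData if the buffer is too short
    let args = DepositInstructionData::try_from(instruction_data)?;
 
    // `args.amount` and `args.recipient` are now plain, validated values
    let _ = (args.amount, args.recipient);
    Ok(())
}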

Zero-Copy Deserialization

This can be used for maximum performance with properly aligned structs, but it requires careful alignment checking:

#[repr(C)]
pub struct Config {
    pub authority: Pubkey,
    pub mint_x: Pubkey, 
    pub mint_y: Pubkey,
    pub seed: u64,        // This field requires 8-byte alignment
    pub fee: u16,         // This field requires 2-byte alignment  
    pub state: u8,
    pub config_bump: u8,
}
 
impl Config {
    pub const LEN: usize = size_of::<Self>();
    
    pub fn from_bytes(data: &[u8]) -> Result<&Self, ProgramError> {
        if data.len() != Self::LEN {
            return Err(ProgramError::InvalidAccountData);
        }
 
        // Critical: Check alignment for the most restrictive field (u64 in this case)
        if (data.as_ptr() as usize) % core::mem::align_of::<Self>() != 0 {
            return Err(ProgramError::InvalidAccountData);
        }
 
        // SAFETY: We've verified length and alignment
        Ok(unsafe { &*(data.as_ptr() as *const Self) })
    }
}
 
// Alternative: Avoid alignment issues entirely by using byte arrays for types with
// alignment requirement greater than 1 and provide accessor methods
#[repr(C)]
pub struct ConfigSafe {
    pub authority: Pubkey,
    pub mint_x: Pubkey, 
    pub mint_y: Pubkey,
    seed: [u8; 8],      // Convert with u64::from_le_bytes when needed
    fee: [u8; 2],       // Convert with u16::from_le_bytes when needed
    pub state: u8,
    pub config_bump: u8,
}
 
impl ConfigSafe {
    pub fn from_bytes(data: &[u8]) -> Result<&Self, ProgramError> {
        if data.len() != size_of::<Self>() {
            return Err(ProgramError::InvalidAccountData);
        }
 
        // SAFETY: No alignment check needed - everything is u8 aligned
        Ok(unsafe { &*(data.as_ptr() as *const Self) })
    }
    
    pub fn seed(&self) -> u64 {
        u64::from_le_bytes(self.seed)
    }
    
    pub fn fee(&self) -> u16 {
        u16::from_le_bytes(self.fee)
    }
}

As you can see, both the seed and fee fields are private. This forces callers to go through the accessor methods, since the raw values are stored as little-endian byte arrays.

When you borrow a field directly (&config.seed), the compiler creates a reference to that field's memory location. If that field is not properly aligned, creating the reference is undefined behavior, even if you never dereference it!

Accessor methods avoid this by copying the value out inside the method, so no reference to the unaligned field ever escapes.

#[repr(C, packed)]  // This can cause unaligned fields!
pub struct PackedConfig {
    pub state: u8,
    pub seed: u64,    // This u64 might not be 8-byte aligned due to packing
}
 
impl PackedConfig {
    pub fn seed(&self) -> u64 {
        self.seed  // Safe: Direct value copy, no reference created
    }
}
 
// Usage:
let config: &PackedConfig = /* ... */;
 
// ❌ UNDEFINED BEHAVIOR: Creates a reference to a potentially unaligned field
// (modern compilers reject this outright for packed structs)
let seed_ref = &config.seed;
 
// ⚠️ RISKY: A by-value copy is defined behavior, but passing `config.seed` to
// a macro or generic function may implicitly borrow it, which is UB
let seed_value = config.seed;
 
// ✅ SAFE: Accessor method reads the value without exposing a reference
let seed_value = config.seed(); // No intermediate reference

u8 fields are always safe to access directly: since u8 has an alignment of 1, it is always aligned, so we can use self.state as-is.

In this case we don't have any "special types", but always remember that some types require extra care due to invalid bit patterns:

pub struct StateAccount {
    pub is_active: bool,
    pub state_type: StateType,
    pub data: [u8; 32],
}
 
#[repr(u8)]
pub enum StateType {
    Inactive = 0,
    Active = 1,
    Paused = 2,
}
 
impl StateAccount {
    pub fn from_bytes(data: &[u8]) -> Result<Self, ProgramError> {
        if data.len() < size_of::<Self>() {
            return Err(ProgramError::InvalidAccountData);
        }
 
        // Safely handle bool (only 0 or 1 are valid)
        let is_active = match data[0] {
            0 => false,
            1 => true,
            _ => return Err(ProgramError::InvalidAccountData),
        };
 
        // Safely handle enum
        let state_type = match data[1] {
            0 => StateType::Inactive,
            1 => StateType::Active, 
            2 => StateType::Paused,
            _ => return Err(ProgramError::InvalidAccountData),
        };
 
        let mut data_array = [0u8; 32];
        data_array.copy_from_slice(&data[2..34]);
 
        Ok(Self {
            is_active,
            state_type,
            data: data_array,
        })
    }
}

Dangerous Patterns to Avoid

Here are common patterns that can lead to undefined behavior and should be avoided:

  1. Using transmute() with Unaligned Data
// ❌ UNDEFINED BEHAVIOR: the resulting reference may be unaligned
let value_ref: &u64 = unsafe { core::mem::transmute::<&[u8; 8], &u64>(bytes_array) };

transmute() preserves the bits but not the guarantees: the output must uphold the target type's alignment and validity rules. If you're working with arbitrary byte slices, the alignment assumption is often violated.
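
The sound alternative is to copy the bytes into an owned value first, since a by-value conversion has no alignment requirement:

// ✅ SAFE: copy into an owned array, then convert by value
// (assumes the bounds have already been checked)
let mut bytes = [0u8; 8];
bytes.copy_from_slice(&data[0..8]);
let value = u64::from_le_bytes(bytes);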

  2. Pointer Casting to Packed Structs
#[repr(C, packed)]
pub struct PackedConfig {
    pub state: u8,
    pub seed: u64,     // This u64 is only 1-byte aligned!
    pub authority: Pubkey,
}
 
// ❌ UNDEFINED BEHAVIOR: Borrowing unaligned fields through the cast struct
let config = unsafe { &*(data.as_ptr() as *const PackedConfig) };
let seed_ref = &config.seed; // UB: creates a reference to an unaligned u64

Even though the struct fits in memory, borrowing its multi-byte fields creates unaligned references.

  3. Direct Field Access on Packed Structs
#[repr(C, packed)]
pub struct PackedStruct {
    pub a: u8,
    pub b: u64,
}
 
let packed = /* ... */;
// ❌ UNDEFINED BEHAVIOR: Creates a reference to an unaligned field
let b_ref = &packed.b;
// ⚠️ RISKY: A plain copy is defined, but handing `packed.b` to a macro or
// generic function may implicitly borrow it, which is UB
let b_value = packed.b;
  4. Assuming Alignment Without Verification
// ❌ UNDEFINED BEHAVIOR: No alignment check
let config = unsafe { &*(data.as_ptr() as *const Config) };

Just because data fits doesn't mean it's aligned properly.

  5. Using read_unaligned() Incorrectly
// ❌ WRONG: read_unaligned needs proper layout, not just size
#[repr(Rust)]  // Default layout - not guaranteed!
pub struct BadStruct {
    pub field: u64,
}
 
let value = unsafe { (data.as_ptr() as *const BadStruct).read_unaligned() };

read_unaligned() still requires the struct to have a predictable layout (#[repr(C)]).
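
For contrast, here is a sketch of read_unaligned used correctly: a #[repr(C)] layout, a length check, and a by-value copy (GoodStruct is illustrative):

use core::mem::size_of;
 
#[repr(C)]
#[derive(Clone, Copy)]
pub struct GoodStruct {
    pub field: u64,
}
 
// ✅ OK: predictable layout + length check + by-value copy
// `data: &[u8]` is an arbitrary, possibly unaligned byte buffer
if data.len() >= size_of::<GoodStruct>() {
    let _value = unsafe { (data.as_ptr() as *const GoodStruct).read_unaligned() };
}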

Writing Data

Writing data safely follows similar principles to reading:

Field-by-Field Serialization (Recommended)

impl Config {
    pub fn write_to_buffer(&self, data: &mut [u8]) -> Result<(), ProgramError> {
        if data.len() != Self::LEN {
            return Err(ProgramError::InvalidAccountData);
        }
 
        let mut offset = 0;
        
        // Write authority
        data[offset..offset + 32].copy_from_slice(self.authority.as_ref());
        offset += 32;
        
        // Write mint_x  
        data[offset..offset + 32].copy_from_slice(self.mint_x.as_ref());
        offset += 32;
        
        // Write mint_y
        data[offset..offset + 32].copy_from_slice(self.mint_y.as_ref());
        offset += 32;
        
        // Write seed
        data[offset..offset + 8].copy_from_slice(&self.seed.to_le_bytes());
        offset += 8;
        
        // Write fee
        data[offset..offset + 2].copy_from_slice(&self.fee.to_le_bytes());
        offset += 2;
        
        // Write state
        data[offset] = self.state;
        offset += 1;
        
        // Write config_bump
        data[offset] = self.config_bump;
 
        Ok(())
    }
}

This approach is the safest method because it explicitly serializes each field into the byte buffer:

  • No alignment concerns: you're writing into a byte array
  • Explicit endianness: you control byte order with to_le_bytes()
  • Clear memory layout: easy to debug and understand
  • No undefined behavior: all operations are on byte arrays
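
A typical call site borrows the account data mutably and writes in place; a hypothetical sketch, assuming account is the target AccountInfo:

// Hypothetical call site: `config` is a fully populated Config value
let mut account_data = account.try_borrow_mut_data()?;
config.write_to_buffer(&mut account_data)?;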

Direct Mutation (Zero-Copy)

For maximum performance, you can cast the byte buffer to a struct and mutate fields directly. This requires the struct to be properly aligned:

impl Config {
    pub fn from_bytes_mut(data: &mut [u8]) -> Result<&mut Self, ProgramError> {
        if data.len() != Self::LEN {
            return Err(ProgramError::InvalidAccountData);
        }
 
        // Check alignment
        if (data.as_ptr() as usize) % core::mem::align_of::<Self>() != 0 {
            return Err(ProgramError::InvalidAccountData);
        }
 
        // SAFETY: We've verified length and alignment
        Ok(unsafe { &mut *(data.as_mut_ptr() as *mut Self) })
    }
}

When alignment is verified and the struct uses #[repr(C)], direct field mutation doesn't create unaligned references.
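
For example, once from_bytes_mut succeeds, fields can be updated in place with no separate serialization step:

// `data: &mut [u8]` is the account's byte buffer
let config = Config::from_bytes_mut(data)?;
config.seed = 42; // written directly into account memory
config.fee = 500;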

Byte Array Approach with Setters (Safest + Fast)

The best of both worlds: we can use byte arrays internally but provide ergonomic setters:

#[repr(C)]
pub struct ConfigSafe {
    pub authority: Pubkey,
    pub mint_x: Pubkey, 
    pub mint_y: Pubkey,
    seed: [u8; 8],
    fee: [u8; 2],
    pub state: u8,
    pub config_bump: u8,
}
 
impl ConfigSafe {
    pub fn from_bytes_mut(data: &mut [u8]) -> Result<&mut Self, ProgramError> {
        if data.len() != size_of::<Self>() {
            return Err(ProgramError::InvalidAccountData);
        }
 
        // No alignment check needed - everything is u8 aligned
        Ok(unsafe { &mut *(data.as_mut_ptr() as *mut Self) })
    }
    
    pub fn seed(&self) -> u64 {
        u64::from_le_bytes(self.seed)
    }
    
    pub fn fee(&self) -> u16 {
        u16::from_le_bytes(self.fee)
    }
 
    // Setters that handle endianness correctly
 
    pub fn set_seed(&mut self, seed: u64) {
        self.seed = seed.to_le_bytes();
    }
    
    pub fn set_fee(&mut self, fee: u16) {
        self.fee = fee.to_le_bytes();
    }
}

This is ideal because:

  • No alignment issues: all fields are byte-aligned
  • Fast direct mutation: no serialization overhead after initial setup
  • Consistent endianness: setters handle byte order conversion
  • Type safety: setters take the expected types, not byte arrays
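
Usage mirrors the zero-copy version, with the setters hiding the byte conversion; a short sketch:

// `data: &mut [u8]` is the account's byte buffer
let config = ConfigSafe::from_bytes_mut(data)?;
config.set_seed(42);           // stored as little-endian bytes
config.set_fee(500);
assert_eq!(config.seed(), 42); // read back through the accessor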

Dynamically Sized Data

Whenever possible, avoid storing dynamically sized data directly in accounts. However, some use cases require it.

If your account contains dynamic data, always place all statically sized fields at the beginning of your struct, and append the dynamic data at the end.

Single Dynamic Field

This is the simplest case: one variable-length section at the end of your account:

#[repr(C)]
pub struct DynamicAccount {
    pub fixed_data: [u8; 32],
    pub counter: u64,
    // Dynamic data follows after the struct in memory
    // Layout: [fixed_data][counter][dynamic_data...]
}
 
impl DynamicAccount {
    pub const FIXED_SIZE: usize = size_of::<Self>();
    
    /// Safely parse account with dynamic data
    pub fn from_bytes_with_dynamic(data: &[u8]) -> Result<(&Self, &[u8]), ProgramError> {
        if data.len() < Self::FIXED_SIZE {
            return Err(ProgramError::InvalidAccountData);
        }
 
        // Check alignment: the fixed part contains a u64
        if (data.as_ptr() as usize) % core::mem::align_of::<Self>() != 0 {
            return Err(ProgramError::InvalidAccountData);
        }
 
        // SAFETY: We've verified that the buffer is large enough and aligned
        // for the fixed part, which only contains [u8; 32] and u64
        let fixed_part = unsafe { &*(data.as_ptr() as *const Self) };
        
        // Everything after the fixed part is dynamic data
        let dynamic_part = &data[Self::FIXED_SIZE..];
        
        Ok((fixed_part, dynamic_part))
    }
    
    /// Get mutable references to both parts
    pub fn from_bytes_mut_with_dynamic(data: &mut [u8]) -> Result<(&mut Self, &mut [u8]), ProgramError> {
        if data.len() < Self::FIXED_SIZE {
            return Err(ProgramError::InvalidAccountData);
        }
 
        // Check alignment before casting, since the fixed part contains a u64
        if (data.as_ptr() as usize) % core::mem::align_of::<Self>() != 0 {
            return Err(ProgramError::InvalidAccountData);
        }
 
        // Split the buffer to avoid borrowing issues
        let (fixed_bytes, dynamic_bytes) = data.split_at_mut(Self::FIXED_SIZE);
        
        // SAFETY: We've verified the size and alignment, and split safely
        let fixed_part = unsafe { &mut *(fixed_bytes.as_mut_ptr() as *mut Self) };
        
        Ok((fixed_part, dynamic_bytes))
    }
}
 
/// Writing single dynamic field
impl DynamicAccount {
    pub fn write_with_dynamic(
        data: &mut [u8], 
        fixed_data: &[u8; 32], 
        counter: u64,
        dynamic_data: &[u8]
    ) -> Result<(), ProgramError> {
        let total_size = Self::FIXED_SIZE + dynamic_data.len();
        
        if data.len() != total_size {
            return Err(ProgramError::InvalidAccountData);
        }
        
        // Write fixed part field by field (safest approach)
        data[0..32].copy_from_slice(fixed_data);
        data[32..40].copy_from_slice(&counter.to_le_bytes());
        
        // Write dynamic part
        data[Self::FIXED_SIZE..].copy_from_slice(dynamic_data);
        
        Ok(())
    }
    
    /// Update just the dynamic portion
    pub fn update_dynamic_data(&mut self, account_data: &mut [u8], new_data: &[u8]) -> Result<(), ProgramError> {
        if account_data.len() < Self::FIXED_SIZE + new_data.len() {
            return Err(ProgramError::InvalidAccountData);
        }
        
        // Write new dynamic data
        account_data[Self::FIXED_SIZE..Self::FIXED_SIZE + new_data.len()].copy_from_slice(new_data);
        
        Ok(())
    }
}

To avoid undefined behavior, always check that the account data buffer is at least as large as the statically sized portion. The dynamic section may be empty, so this check is essential.

This layout ensures that the offsets for fixed-size fields are always known, regardless of the dynamic data’s length.
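
A hypothetical read path then looks like this:

// `data: &[u8]` is the full account buffer
let (fixed, dynamic) = DynamicAccount::from_bytes_with_dynamic(data)?;
let counter = fixed.counter; // fixed-size field at a known offset
// `dynamic` holds the zero or more trailing bytes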

There are two main scenarios when reading dynamically sized data: a single dynamic field at the end, or multiple dynamic fields.

With a single dynamic field at the end, you can easily determine the size and offset of the dynamic data at runtime:
const DYNAMIC_DATA_START_OFFSET: usize = size_of::<[u8; 32]>();
 
#[repr(C)]
pub struct DynamicallySizedAccount {
    pub sized_data: [u8; 32],
    // pub dynamic_data: &'info [u8], // Not part of the struct, but follows in the buffer
}
 
impl DynamicallySizedAccount {
    /// Returns the length of the dynamic data section.
    #[inline(always)]
    pub fn get_dynamic_data_len(data: &[u8]) -> Result<usize, ProgramError> {
        if data.len() <= DYNAMIC_DATA_START_OFFSET {
            return Err(ProgramError::InvalidAccountData);
        }
 
        Ok(data.len() - DYNAMIC_DATA_START_OFFSET)
    }
 
    /// Returns a slice of the dynamic data.
    #[inline(always)]
    pub fn read_dynamic_data(data: &[u8]) -> Result<&[u8], ProgramError> {
        if data.len() <= DYNAMIC_DATA_START_OFFSET {
            return Err(ProgramError::InvalidAccountData);
        }
 
        Ok(&data[DYNAMIC_DATA_START_OFFSET..])
    }
}

Multiple Dynamic Fields

This approach is more complex since we will need a way to determine the length of each dynamic field except the last. The most common approach is to prefix each dynamic field (except the last) with its length, so we can parse the buffer correctly.

Here’s a simple and robust pattern: store the length of the first dynamic field as a u8 (or u16, etc. if you need larger sizes) immediately after the statically sized data. The first dynamic field follows, and the second dynamic field occupies the remainder of the buffer.

#[repr(C)]  
pub struct MultiDynamicAccount {
    pub fixed_data: [u8; 32],
    pub timestamp: u64,
    // Layout: [fixed_data][timestamp][len1: u8][data1][data2: remainder]
}
 
impl MultiDynamicAccount {
    pub const FIXED_SIZE: usize = size_of::<Self>();
    pub const LEN_PREFIX_SIZE: usize = size_of::<u8>();
    pub const MIN_SIZE: usize = Self::FIXED_SIZE + Self::LEN_PREFIX_SIZE;
    
    /// Parse account with two dynamic sections
    pub fn parse_dynamic_fields(data: &[u8]) -> Result<(&[u8; 32], u64, &[u8], &[u8]), ProgramError> {
        if data.len() < Self::MIN_SIZE {
            return Err(ProgramError::InvalidAccountData);
        }
        
        // Extract fixed data safely
        let fixed_data = data[..32].try_into()
            .map_err(|_| ProgramError::InvalidAccountData)?;
            
        let timestamp = u64::from_le_bytes(
            data[32..40].try_into()
                .map_err(|_| ProgramError::InvalidAccountData)?
        );
            
        // Read length of first dynamic field (single byte)
        let len = data[Self::FIXED_SIZE] as usize;
        
        // Validate we have enough data
        if data.len() < Self::MIN_SIZE + len {
            return Err(ProgramError::InvalidAccountData);
        }
        
        let data_1 = &data[Self::MIN_SIZE..Self::MIN_SIZE + len];
        let data_2 = &data[Self::MIN_SIZE + len..]; // Remainder
        
        Ok((fixed_data, timestamp, data_1, data_2))
    }
    
    /// Write account with two dynamic sections
    pub fn write_with_multiple_dynamic(
        buffer: &mut [u8],
        fixed_data: &[u8; 32],
        timestamp: u64,
        data_1: &[u8],
        data_2: &[u8]
    ) -> Result<(), ProgramError> {
        let total_size = Self::MIN_SIZE + data_1.len() + data_2.len();
        
        if buffer.len() != total_size {
            return Err(ProgramError::InvalidAccountData);
        }
        
        // Validate data_1 length fits in u8
        if data_1.len() > u8::MAX as usize {
            return Err(ProgramError::InvalidInstructionData);
        }
        
        let mut offset = 0;
        
        // Write fixed data
        buffer[offset..offset + 32].copy_from_slice(fixed_data);
        offset += 32;
        
        buffer[offset..offset + 8].copy_from_slice(&timestamp.to_le_bytes());
        offset += 8;
        
        // Write length prefix for data1 (single byte)
        buffer[offset] = data_1.len() as u8;
        offset += 1;
        
        // Write data1
        buffer[offset..offset + data_1.len()].copy_from_slice(data_1);
        offset += data_1.len();
        
        // Write data2 (remainder - no length prefix needed)
        buffer[offset..].copy_from_slice(data_2);
        
        Ok(())
    }
}
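
Reading it back is symmetric; a hypothetical sketch where the two sections hold a name and a URI:

// `data: &[u8]` is the full account buffer; `name` and `uri` are illustrative
let (fixed_data, timestamp, name, uri) = MultiDynamicAccount::parse_dynamic_fields(data)?;
// `name` is the length-prefixed first section, `uri` is the remainder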

Resizing the Account

Every time you update a dynamic field, if the size changes, you must resize the account. Here’s a general-purpose function for resizing an account:

pub fn resize_account(
    account: &AccountInfo,
    payer: &AccountInfo,
    new_size: usize,
    zero_out: bool,
) -> ProgramResult {
    // If the account is already the correct size, return early
    if new_size == account.data_len() {
        return Ok(());
    }
 
    // Calculate rent requirements
    let rent = Rent::get()?;
    let new_minimum_balance = rent.minimum_balance(new_size);
 
    // Adjust lamports to meet rent-exemption requirements
    match new_minimum_balance.cmp(&account.lamports()) {
        core::cmp::Ordering::Greater => {
            // Need more lamports for rent exemption
            let lamports_diff = new_minimum_balance.saturating_sub(account.lamports());
            **payer.try_borrow_mut_lamports()? -= lamports_diff;
            **account.try_borrow_mut_lamports()? += lamports_diff;
        }
        core::cmp::Ordering::Less => {
            // Return excess lamports to payer
            let lamports_diff = account.lamports().saturating_sub(new_minimum_balance);
            **account.try_borrow_mut_lamports()? -= lamports_diff;
            **payer.try_borrow_mut_lamports()? += lamports_diff;
        }
        core::cmp::Ordering::Equal => {
            // No lamport transfer needed
        }
    }
 
    // Reallocate the account, zeroing any newly added bytes if requested
    account.realloc(new_size, zero_out)?;
 
    Ok(())
}
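
A typical caller computes the new length and lets the helper settle rent and reallocation; a hypothetical sketch (new_payload is the data you intend to write):

// Hypothetical: grow the account before writing a larger dynamic section
let new_len = DynamicAccount::FIXED_SIZE + new_payload.len();
resize_account(&account, &payer, new_len, false)?;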