Reading and Writing Data
When building optimized Solana programs, efficient data serialization and deserialization can significantly impact performance.
While Pinocchio doesn't require low-level memory operations, understanding how to efficiently read and write account data can help you build faster programs.
The techniques in this guide work with any Solana development framework, whether you're using Pinocchio, Anchor, or the native SDK. The key is designing your data structures thoughtfully and handling serialization safely.
When to Use Unsafe Code
Use unsafe code only when:
- You need maximum performance and have measured that safe alternatives are too slow
- You can rigorously verify all safety invariants
- You document the safety requirements clearly
Safety Principles
When working with raw byte arrays and memory operations, we must be careful to avoid undefined behavior. Understanding these principles is crucial for writing correct and reliable code.
Buffer Bounds Checking
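Always validate that your buffer is large enough before any read or write operation. In safe Rust, an out-of-bounds slice index panics, which aborts the program and fails the transaction. In unsafe code (raw pointers, get_unchecked), reading or writing beyond the allocated memory is undefined behavior.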
// Good: Check bounds first
if data.len() < size_of::<u64>() {
return Err(ProgramError::InvalidInstructionData);
}
let value = u64::from_le_bytes(data[0..8].try_into().unwrap());
// Bad: No bounds checking - panics if data has fewer than 8 bytes (and would be UB with unchecked access)
let value = u64::from_le_bytes(data[0..8].try_into().unwrap());
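If you would rather return an error than pair a separate length check with unwrap(), slice::get combines both steps. The following is a minimal sketch. The names read_u64_le and ReadError are hypothetical; ReadError stands in for ProgramError::InvalidInstructionData so the example compiles without the Solana crates.

```rust
/// Hypothetical error type standing in for ProgramError in this sketch.
#[derive(Debug, PartialEq)]
pub enum ReadError {
    InvalidInstructionData,
}

/// Reads a little-endian u64 starting at `offset`, returning an error
/// instead of panicking when the buffer is too short.
pub fn read_u64_le(data: &[u8], offset: usize) -> Result<u64, ReadError> {
    // checked_add guards against overflow when computing the end index.
    let end = offset
        .checked_add(8)
        .ok_or(ReadError::InvalidInstructionData)?;
    // get() returns None instead of panicking on an out-of-range slice.
    let bytes: [u8; 8] = data
        .get(offset..end)
        .ok_or(ReadError::InvalidInstructionData)?
        .try_into()
        .map_err(|_| ReadError::InvalidInstructionData)?;
    Ok(u64::from_le_bytes(bytes))
}
```

With this helper, a short buffer or an overflowing offset becomes an ordinary error value that the instruction handler can propagate with ?.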
Alignment Requirements
Every type in Rust has an alignment requirement that determines where it can be placed in memory. Reading a type from memory that isn't properly aligned results in undefined behavior. Most primitive types require alignment equal to their size:
- u8: 1-byte alignment
- u16: 2-byte alignment
- u32: 4-byte alignment
- u64: 8-byte alignment
This means a u64 must be stored at a memory address divisible by 8, while a u16 must start at an even address.
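You can query these requirements with core::mem::align_of and check whether a given address satisfies them. The following is a small sketch. The helper is_aligned_for is a hypothetical name, and the alignment values asserted for it are those of typical 64-bit targets.

```rust
use core::mem::align_of;

/// Returns true if `ptr` sits at an address that is a multiple of
/// T's alignment, i.e. a &T could legally point there.
pub fn is_aligned_for<T>(ptr: *const u8) -> bool {
    (ptr as usize) % align_of::<T>() == 0
}
```

A u64 array is a convenient way to obtain a buffer whose first byte is 8-byte aligned when experimenting with this check.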
To satisfy these requirements, the compiler automatically inserts invisible "padding" bytes between struct fields so that each field is correctly aligned.
Here is what a poorly ordered struct looks like:
#[repr(C)]
struct BadOrder {
small: u8, // 1 byte
// padding: [u8; 7] since `big` needs to be aligned at 8 bytes.
big: u64, // 8 bytes
medium: u16, // 2 bytes
// padding: [u8; 6] since the struct size needs to be aligned to 8 bytes.
}
The compiler inserts 7 padding bytes after small because big requires 8-byte alignment. It then adds 6 more bytes at the end to make the total size (24 bytes) a multiple of 8, wasting 13 bytes in total.
A better approach is to order the fields from largest to smallest alignment:
#[repr(C)]
struct GoodOrder {
big: u64, // 8 bytes
medium: u16, // 2 bytes
small: u8, // 1 byte
// padding: [u8; 5] since the struct size needs to be aligned to 8 bytes.
}
By placing larger fields first, we reduce padding from 13 bytes to just 5 bytes.
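Rather than counting padding by hand, you can let the compiler confirm a layout with size_of and core::mem::offset_of! (stable since Rust 1.77). The following sketch adds compile-time assertions for the two structs above. The numbers hold on targets where u64 is 8-byte aligned.

```rust
use core::mem::{offset_of, size_of};

#[repr(C)]
pub struct BadOrder {
    pub small: u8,
    pub big: u64,
    pub medium: u16,
}

#[repr(C)]
pub struct GoodOrder {
    pub big: u64,
    pub medium: u16,
    pub small: u8,
}

// Evaluated at compile time: a layout change breaks the build, not the program.
const _: () = {
    assert!(offset_of!(BadOrder, big) == 8); // 7 padding bytes after `small`
    assert!(offset_of!(BadOrder, medium) == 16);
    assert!(size_of::<BadOrder>() == 24); // 6 trailing padding bytes
    assert!(offset_of!(GoodOrder, small) == 10);
    assert!(size_of::<GoodOrder>() == 16); // 5 trailing padding bytes
};
```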
There is another, more advanced way to serialize and deserialize data for maximum space efficiency: zero-padding structs, which eliminate alignment requirements entirely by storing multi-byte values as byte arrays:
#[repr(C)]
struct ByteArrayStruct {
big: [u8; 8], // represents u64
medium: [u8; 2], // represents u16
small: u8,
}
The size in this case is exactly 11 bytes because every field is 1-byte aligned, so no padding is needed.
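The following sketch pins the size and alignment down at compile time and adds accessors that decode the bytes. The accessor names (new, big, medium, small) and the choice of little-endian encoding are assumptions made for this example.

```rust
use core::mem::{align_of, size_of};

#[repr(C)]
pub struct ByteArrayStruct {
    big: [u8; 8],    // little-endian u64
    medium: [u8; 2], // little-endian u16
    small: u8,
}

// No padding and 1-byte alignment: the struct can start at any address.
const _: () = assert!(size_of::<ByteArrayStruct>() == 11);
const _: () = assert!(align_of::<ByteArrayStruct>() == 1);

impl ByteArrayStruct {
    pub fn new(big: u64, medium: u16, small: u8) -> Self {
        Self { big: big.to_le_bytes(), medium: medium.to_le_bytes(), small }
    }
    pub fn big(&self) -> u64 {
        u64::from_le_bytes(self.big)
    }
    pub fn medium(&self) -> u16 {
        u16::from_le_bytes(self.medium)
    }
    pub fn small(&self) -> u8 {
        self.small
    }
}
```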
Valid Bit Patterns
Not all bit patterns are valid for every type. Types like bool, char, and enums have restricted valid values. Reading invalid bit patterns into these types is undefined behavior.
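The valid values for each of these types are:

- bool: 0 or 1.
- char: a Unicode scalar value, meaning not above 0x10FFFF and not a surrogate in 0xD800..=0xDFFF.
- enum: one of its declared discriminants.

The safe approach is to decode the raw integer and validate it rather than reinterpreting the bytes. The following is a minimal sketch; the function names are hypothetical.

```rust
/// Decodes a bool stored as one byte; any value other than 0 or 1 is rejected.
pub fn decode_bool(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}

/// Decodes a char stored as a little-endian u32.
/// char::from_u32 returns None for surrogates and values above 0x10FFFF.
pub fn decode_char(bytes: [u8; 4]) -> Option<char> {
    char::from_u32(u32::from_le_bytes(bytes))
}
```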
Reading Data
There are several approaches to reading data from account buffers, each with different trade-offs:
Field-by-Field Deserialization (Recommended)
The safest approach is to deserialize each field individually. This avoids all alignment issues because you're working with byte arrays:
pub struct DepositInstructionData {
pub amount: u64,
pub recipient: Pubkey,
}
impl<'a> TryFrom<&'a [u8]> for DepositInstructionData {
type Error = ProgramError;
fn try_from(data: &'a [u8]) -> Result<Self, Self::Error> {
if data.len() < (size_of::<u64>() + size_of::<Pubkey>()) {
return Err(ProgramError::InvalidInstructionData);
}
// No alignment issues: we're reading bytes and converting
let amount = u64::from_le_bytes(
data[0..8].try_into()
.map_err(|_| ProgramError::InvalidInstructionData)?
);
let recipient = Pubkey::try_from(&data[8..40])
.map_err(|_| ProgramError::InvalidInstructionData)?;
Ok(Self { amount, recipient })
}
}
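When an instruction has more than a couple of fields, hard-coded ranges like 8..40 become easy to get wrong. One alternative is a small cursor that consumes fixed-size chunks from the front of the slice. The following is a self-contained sketch with these assumptions:

- Pubkey is modeled as a plain [u8; 32] alias.
- ParseError is a hypothetical name standing in for ProgramError.
- take is a hypothetical helper name.

```rust
pub type Pubkey = [u8; 32];

#[derive(Debug, PartialEq)]
pub enum ParseError {
    InvalidInstructionData,
}

/// Splits the first N bytes off `input` and advances it past them.
fn take<'a, const N: usize>(input: &mut &'a [u8]) -> Result<&'a [u8; N], ParseError> {
    if input.len() < N {
        return Err(ParseError::InvalidInstructionData);
    }
    let (head, tail) = input.split_at(N);
    *input = tail;
    // Cannot fail: `head` has exactly N bytes.
    Ok(head.try_into().expect("length checked above"))
}

pub struct DepositInstructionData {
    pub amount: u64,
    pub recipient: Pubkey,
}

impl<'a> TryFrom<&'a [u8]> for DepositInstructionData {
    type Error = ParseError;

    fn try_from(mut data: &'a [u8]) -> Result<Self, Self::Error> {
        // Each call consumes exactly the bytes of one field, in order.
        let amount = u64::from_le_bytes(*take::<8>(&mut data)?);
        let recipient = *take::<32>(&mut data)?;
        Ok(Self { amount, recipient })
    }
}
```

Because the offsets come from the field sizes, reordering or adding a field only means adding or moving one take call.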
Zero-Copy Deserialization
Zero-copy deserialization reinterprets the account bytes in place instead of copying them. It offers the best performance for properly aligned structs, but it requires careful length and alignment checks:
#[repr(C)]
pub struct Config {
pub authority: Pubkey,
pub mint_x: Pubkey,
pub mint_y: Pubkey,
pub seed: u64, // This field requires 8-byte alignment
pub fee: u16, // This field requires 2-byte alignment
pub state: u8,
pub config_bump: u8,
}
impl Config {
pub const LEN: usize = size_of::<Self>();
pub fn from_bytes(data: &[u8]) -> Result<&Self, ProgramError> {
if data.len() != Self::LEN {
return Err(ProgramError::InvalidAccountData);
}
// Critical: Check alignment for the most restrictive field (u64 in this case)
if (data.as_ptr() as usize) % core::mem::align_of::<Self>() != 0 {
return Err(ProgramError::InvalidAccountData);
}
// SAFETY: We've verified length and alignment, and every field (Pubkey, u64, u16, u8) is valid for any bit pattern
Ok(unsafe { &*(data.as_ptr() as *const Self) })
}
}
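The length and alignment checks above can be factored into one generic helper. The catch is that reinterpreting bytes is only sound for types where every bit pattern is a valid value. The following sketch encodes that requirement as an unsafe marker trait. These names are all hypothetical and exist only for this example:

- FromAnyBytes, the marker trait
- CastError
- cast_ref
- Header
- AlignedBuf, an 8-byte-aligned test buffer

The cast reads in native byte order, which is little-endian on Solana's SBF target and on common hosts.

```rust
use core::mem::{align_of, size_of};

/// Marker for types where every byte pattern of the right length is a valid
/// value. Implementing it is an unsafe promise made by the implementer.
pub unsafe trait FromAnyBytes {}

unsafe impl FromAnyBytes for u64 {}

#[derive(Debug, PartialEq)]
pub enum CastError {
    WrongLength,
    Misaligned,
}

/// Reinterprets `data` as &T after checking length and alignment.
pub fn cast_ref<T: FromAnyBytes>(data: &[u8]) -> Result<&T, CastError> {
    if data.len() != size_of::<T>() {
        return Err(CastError::WrongLength);
    }
    if (data.as_ptr() as usize) % align_of::<T>() != 0 {
        return Err(CastError::Misaligned);
    }
    // SAFETY: length and alignment are checked above, T accepts any bit
    // pattern (FromAnyBytes), and the returned reference borrows `data`.
    Ok(unsafe { &*(data.as_ptr() as *const T) })
}

#[repr(C)]
pub struct Header {
    pub seed: u64,
    pub fee: u64,
}

// SAFETY: two u64 fields, no padding, every bit pattern is valid.
unsafe impl FromAnyBytes for Header {}

/// A byte buffer guaranteed to start at an 8-byte-aligned address.
#[repr(C, align(8))]
pub struct AlignedBuf<const N: usize>(pub [u8; N]);
```

If you prefer not to hand-write such a trait, crates such as bytemuck provide vetted versions of this pattern.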
// Alternative: Avoid alignment issues entirely by using byte arrays for types with
// alignment requirement greater than 1 and provide accessor methods
#[repr(C)]
pub struct ConfigSafe {
pub authority: Pubkey,
pub mint_x: Pubkey,
pub mint_y: Pubkey,
seed: [u8; 8], // Convert with u64::from_le_bytes when needed
fee: [u8; 2], // Convert with u16::from_le_bytes when needed
pub state: u8,
pub config_bump: u8,
}
impl ConfigSafe {
pub fn from_bytes(data: &[u8]) -> Result<&Self, ProgramError> {
if data.len() != size_of::<Self>() {
return Err(ProgramError::InvalidAccountData);
}
// SAFETY: Length verified; no alignment check needed because every field (including the 32-byte Pubkey) has 1-byte alignment and accepts any bit pattern
Ok(unsafe { &*(data.as_ptr() as *const Self) })
}
pub fn seed(&self) -> u64 {
u64::from_le_bytes(self.seed)
}
pub fn fee(&self) -> u16 {
u16::from_le_bytes(self.fee)
}
}
Note that seed and fee are private. They are stored as byte arrays, so callers should go through the accessor methods, which convert them back to u64 and u16.
Accessors matter even more for #[repr(C, packed)] structs, where multi-byte fields may sit at unaligned addresses. Creating a reference to such a field (&config.seed) is undefined behavior even if the reference is never used, and the compiler rejects it. Reading the field by value is fine, because the compiler copies it with an unaligned load. An accessor that returns the value, rather than a reference, gives callers a way to read the field without ever borrowing it:
#[repr(C, packed)] // This can cause unaligned fields!
pub struct PackedConfig {
pub state: u8,
pub seed: u64, // This u64 might not be 8-byte aligned due to packing
}
impl PackedConfig {
pub fn seed(&self) -> u64 {
self.seed // Safe: Direct value copy, no reference created
}
}
// Usage (PackedConfig::load is a hypothetical loader that checks the length and casts the buffer):
let config = PackedConfig::load(account)?;
// ❌ NOT ALLOWED: would create a reference to a potentially unaligned field (UB); the compiler rejects it
let seed_ref = &config.seed;
// ✅ OK: a by-value read copies the field with an unaligned load
let seed_value = config.seed;
// ✅ SAFE: the accessor performs the same copy and keeps callers from borrowing the field
let seed_value = config.seed();
The Config examples contain no types with restricted bit patterns, but fields such as bool and enums need explicit validation, since only some of their byte values are valid:
pub struct StateAccount {
pub is_active: bool,
pub state_type: StateType,
pub data: [u8; 32],
}
#[repr(u8)]
pub enum StateType {
Inactive = 0,
Active = 1,
Paused = 2,
}
impl StateAccount {
pub fn from_bytes(data: &[u8]) -> Result<Self, ProgramError> {
if data.len() < size_of::<Self>() {
return Err(ProgramError::InvalidAccountData);
}
// Safely handle bool (only 0 or 1 are valid)
let is_active = match data[0] {
0 => false,
1 => true,
_ => return Err(ProgramError::InvalidAccountData),
};
// Safely handle enum
let state_type = match data[1] {
0 => StateType::Inactive,
1 => StateType::Active,
2 => StateType::Paused,
_ => return Err(ProgramError::InvalidAccountData),
};
let mut data_array = [0u8; 32];
data_array.copy_from_slice(&data[2..34]);
Ok(Self {
is_active,
state_type,
data: data_array,
})
}
}
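If the same enum is decoded in several places, moving the discriminant check into a TryFrom<u8> implementation keeps the list of valid values in one spot. The following is a self-contained sketch; InvalidDiscriminant is a hypothetical error type used in place of ProgramError.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(u8)]
pub enum StateType {
    Inactive = 0,
    Active = 1,
    Paused = 2,
}

#[derive(Debug, PartialEq)]
pub struct InvalidDiscriminant(pub u8);

impl TryFrom<u8> for StateType {
    type Error = InvalidDiscriminant;

    fn try_from(value: u8) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(StateType::Inactive),
            1 => Ok(StateType::Active),
            2 => Ok(StateType::Paused),
            other => Err(InvalidDiscriminant(other)),
        }
    }
}

impl From<StateType> for u8 {
    // Writing is always safe: `as` yields the declared discriminant.
    fn from(state: StateType) -> u8 {
        state as u8
    }
}
```

With this in place, the match in from_bytes reduces to StateType::try_from(data[1]) plus an error mapping.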
Dangerous Patterns to Avoid
Here are common patterns that can lead to undefined behavior and should be avoided:
- Using transmute() with Unaligned Data
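// ❌ UNDEFINED BEHAVIOR: turns a byte pointer into a &u64 without checking alignment
let value: &u64 = unsafe { core::mem::transmute(bytes_slice.as_ptr()) };
transmute() reinterprets bits without any checks. Used this way, it assumes the bytes start at an address that is properly aligned for the target type, an assumption that arbitrary byte slices often violate. (Transmuting an owned [u8; 8] value into a u64 is well-defined, but u64::from_le_bytes does the same job and makes the byte order explicit.)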
- Pointer Casting to Packed Structs
#[repr(C, packed)]
pub struct PackedConfig {
pub state: u8,
pub seed: u64, // This u64 is only 1-byte aligned!
pub authority: Pubkey,
}
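// ❌ UNDEFINED BEHAVIOR if data is shorter than size_of::<PackedConfig>(): there is no length check
let config = unsafe { &*(data.as_ptr() as *const PackedConfig) };
// ❌ UNDEFINED BEHAVIOR: a reference to the unaligned u64 (the compiler rejects this)
let seed_ref = &config.seed;
A packed struct has 1-byte alignment, so the cast itself needs no alignment check, but its multi-byte fields can sit at unaligned addresses. Check the length before casting, and read fields only by value or through accessors, never by reference.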
- Direct Field Access on Packed Structs
#[repr(C, packed)]
pub struct PackedStruct {
pub a: u8,
pub b: u64,
}
let packed = /* ... */;
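// ❌ UNDEFINED BEHAVIOR: creates a reference to an unaligned field (the compiler rejects this)
let b_ref = &packed.b;
// ❌ Same problem, implicitly: a method taking &self borrows the field
let b_text = packed.b.to_string();
// ✅ OK: a by-value read copies the field with an unaligned load
let b_value = packed.b;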
- Assuming Alignment Without Verification
// ❌ UNDEFINED BEHAVIOR: no length or alignment check
let config = unsafe { &*(data.as_ptr() as *const Config) };
Just because the data fits doesn't mean it's properly aligned. Config contains a u64, so its start address must also be a multiple of 8.
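- Using read_unaligned() Incorrectly
// ❌ WRONG: read_unaligned needs a defined layout, not just the right size
#[repr(Rust)] // Default layout - field order and padding are not guaranteed!
pub struct BadStruct {
pub field: u64,
}
// Also missing: a check that data.len() >= size_of::<BadStruct>()
let value = unsafe { (data.as_ptr() as *const BadStruct).read_unaligned() };
read_unaligned() removes the alignment requirement, but nothing else. The buffer must still hold at least size_of::<T>() bytes, and the bytes must form a valid value of T. T also needs a defined layout such as #[repr(C)], so that each field is read from the offset you expect.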
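For comparison, here is a sketch of the correct pattern:

- a #[repr(C)] struct whose fields accept any bit pattern,
- a length check,
- core::ptr::read_unaligned, which copies the value out without requiring alignment.

Header and read_header are hypothetical names. The read uses native byte order, which is little-endian on Solana's SBF target and on common hosts.

```rust
use core::mem::size_of;
use core::ptr;

#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Header {
    pub seed: u64,
    pub fee: u32,
    pub flags: u32,
}

pub fn read_header(data: &[u8]) -> Option<Header> {
    // read_unaligned still reads size_of::<Header>() bytes, so check the length.
    if data.len() < size_of::<Header>() {
        return None;
    }
    // SAFETY: the length is checked above, Header is #[repr(C)] with no
    // padding, and every field (u64, u32) accepts any bit pattern.
    Some(unsafe { ptr::read_unaligned(data.as_ptr() as *const Header) })
}
```

The function returns an owned copy, so no reference to the unaligned bytes ever exists.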
Writing Data
Writing data safely follows similar principles to reading:
Field-by-Field Serialization (Recommended)
impl Config {
pub fn write_to_buffer(&self, data: &mut [u8]) -> Result<(), ProgramError> {
if data.len() != Self::LEN {
return Err(ProgramError::InvalidAccountData);
}
let mut offset = 0;
// Write authority
data[offset..offset + 32].copy_from_slice(self.authority.as_ref());
offset += 32;
// Write mint_x
data[offset..offset + 32].copy_from_slice(self.mint_x.as_ref());
offset += 32;
// Write mint_y
data[offset..offset + 32].copy_from_slice(self.mint_y.as_ref());
offset += 32;
// Write seed
data[offset..offset + 8].copy_from_slice(&self.seed.to_le_bytes());
offset += 8;
// Write fee
data[offset..offset + 2].copy_from_slice(&self.fee.to_le_bytes());
offset += 2;
// Write state
data[offset] = self.state;
offset += 1;
// Write config_bump
data[offset] = self.config_bump;
Ok(())
}
}
This is the safest approach because it explicitly serializes each field into the byte buffer:
- No alignment concerns: you're writing into a byte array
- Explicit endianness: you control byte order with to_le_bytes()
- Clear memory layout: easy to debug and understand
- No undefined behavior: all operations are on byte arrays
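The manual offset bookkeeping can be folded into a small helper that writes a field and advances the buffer in one step. It returns an error instead of panicking when the buffer runs out. The following is a sketch; put, write_fields, and WriteError are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub enum WriteError {
    BufferTooSmall,
}

/// Copies `bytes` to the front of `buf` and advances `buf` past them.
pub fn put<'a>(buf: &mut &'a mut [u8], bytes: &[u8]) -> Result<(), WriteError> {
    if buf.len() < bytes.len() {
        return Err(WriteError::BufferTooSmall);
    }
    // Temporarily take the slice out so it can be split into two halves.
    let (head, tail) = core::mem::take(buf).split_at_mut(bytes.len());
    head.copy_from_slice(bytes);
    *buf = tail;
    Ok(())
}

/// Example: seed (u64), fee (u16), state (u8), written back to back in little-endian.
pub fn write_fields(data: &mut [u8], seed: u64, fee: u16, state: u8) -> Result<(), WriteError> {
    let mut cursor = data;
    put(&mut cursor, &seed.to_le_bytes())?;
    put(&mut cursor, &fee.to_le_bytes())?;
    put(&mut cursor, &[state])?;
    Ok(())
}
```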
Direct Mutation (Zero-Copy)
For maximum performance, you can cast the byte buffer to a struct and mutate fields directly. This requires the buffer to be properly aligned for the struct:
impl Config {
pub fn from_bytes_mut(data: &mut [u8]) -> Result<&mut Self, ProgramError> {
if data.len() != Self::LEN {
return Err(ProgramError::InvalidAccountData);
}
// Check alignment
if (data.as_ptr() as usize) % core::mem::align_of::<Self>() != 0 {
return Err(ProgramError::InvalidAccountData);
}
// SAFETY: We've verified length and alignment, and every field is valid for any bit pattern
Ok(unsafe { &mut *(data.as_mut_ptr() as *mut Self) })
}
}
Direct field mutation is sound and never creates unaligned references when three conditions hold: the alignment is verified, the struct uses #[repr(C)], and every field accepts any bit pattern.
Byte Array Approach with Setters (Safest + Fast)
The best of both worlds: we can use byte arrays internally but provide ergonomic setters:
#[repr(C)]
pub struct ConfigSafe {
pub authority: Pubkey,
pub mint_x: Pubkey,
pub mint_y: Pubkey,
seed: [u8; 8],
fee: [u8; 2],
pub state: u8,
pub config_bump: u8,
}
impl ConfigSafe {
pub fn from_bytes_mut(data: &mut [u8]) -> Result<&mut Self, ProgramError> {
if data.len() != size_of::<Self>() {
return Err(ProgramError::InvalidAccountData);
}
// SAFETY: Length verified; no alignment check needed since every field has 1-byte alignment and accepts any bit pattern
Ok(unsafe { &mut *(data.as_mut_ptr() as *mut Self) })
}
pub fn seed(&self) -> u64 {
u64::from_le_bytes(self.seed)
}
pub fn fee(&self) -> u16 {
u16::from_le_bytes(self.fee)
}
// Setters that handle endianness correctly
pub fn set_seed(&mut self, seed: u64) {
self.seed = seed.to_le_bytes();
}
pub fn set_fee(&mut self, fee: u16) {
self.fee = fee.to_le_bytes();
}
}
This is ideal because:
- No alignment issues: all fields are byte-aligned
- Fast direct mutation: no serialization overhead after initial setup
- Consistent endianness: setters handle byte order conversion
- Type safety: setters take the expected types, not byte arrays
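When many fields need the same getter/setter pair, a small wrapper type can carry the conversion instead. The following sketch uses the hypothetical names LeU64, LeU16, and Fees. Because each wrapper is #[repr(transparent)] over a byte array, it keeps the same 1-byte alignment as the raw arrays in ConfigSafe.

```rust
use core::mem::align_of;

/// A u64 stored as 8 little-endian bytes (alignment 1).
#[repr(transparent)]
#[derive(Clone, Copy, Default)]
pub struct LeU64([u8; 8]);

/// A u16 stored as 2 little-endian bytes (alignment 1).
#[repr(transparent)]
#[derive(Clone, Copy, Default)]
pub struct LeU16([u8; 2]);

impl LeU64 {
    pub fn get(&self) -> u64 {
        u64::from_le_bytes(self.0)
    }
    pub fn set(&mut self, value: u64) {
        self.0 = value.to_le_bytes();
    }
}

impl LeU16 {
    pub fn get(&self) -> u16 {
        u16::from_le_bytes(self.0)
    }
    pub fn set(&mut self, value: u16) {
        self.0 = value.to_le_bytes();
    }
}

const _: () = assert!(align_of::<LeU64>() == 1 && align_of::<LeU16>() == 1);

/// Fields can now be public: there is no raw byte array to misuse.
#[repr(C)]
#[derive(Default)]
pub struct Fees {
    pub seed: LeU64,
    pub fee: LeU16,
    pub state: u8,
}
```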
Dynamically Sized Data
Whenever possible, avoid storing dynamically sized data directly in accounts. However, some use cases require it.
If your account contains dynamic data, always place all statically sized fields at the beginning of your struct, and append the dynamic data at the end.
Single Dynamic Field
This is the simplest case: one variable-length section at the end of your account:
#[repr(C)]
pub struct DynamicAccount {
pub fixed_data: [u8; 32],
pub counter: u64,
// Dynamic data follows after the struct in memory
// Layout: [fixed_data][counter][dynamic_data...]
}
impl DynamicAccount {
pub const FIXED_SIZE: usize = size_of::<Self>();
/// Safely parse account with dynamic data
pub fn from_bytes_with_dynamic(data: &[u8]) -> Result<(&Self, &[u8]), ProgramError> {
if data.len() < Self::FIXED_SIZE {
return Err(ProgramError::InvalidAccountData);
}
// DynamicAccount contains a u64, so the buffer must also be 8-byte aligned
if (data.as_ptr() as usize) % core::mem::align_of::<Self>() != 0 {
return Err(ProgramError::InvalidAccountData);
}
// SAFETY: We've verified length and alignment, and the fixed part only
// contains [u8; 32] and u64, which accept any bit pattern
let fixed_part = unsafe { &*(data.as_ptr() as *const Self) };
// Everything after the fixed part is dynamic data
let dynamic_part = &data[Self::FIXED_SIZE..];
Ok((fixed_part, dynamic_part))
}
/// Get mutable references to both parts
pub fn from_bytes_mut_with_dynamic(data: &mut [u8]) -> Result<(&mut Self, &mut [u8]), ProgramError> {
if data.len() < Self::FIXED_SIZE {
return Err(ProgramError::InvalidAccountData);
}
// Split the buffer to avoid borrowing issues
let (fixed_bytes, dynamic_bytes) = data.split_at_mut(Self::FIXED_SIZE);
// Same alignment requirement as above
if (fixed_bytes.as_ptr() as usize) % core::mem::align_of::<Self>() != 0 {
return Err(ProgramError::InvalidAccountData);
}
// SAFETY: We've verified size and alignment, and split_at_mut guarantees the parts don't overlap
let fixed_part = unsafe { &mut *(fixed_bytes.as_mut_ptr() as *mut Self) };
Ok((fixed_part, dynamic_bytes))
}
}
/// Writing single dynamic field
impl DynamicAccount {
pub fn write_with_dynamic(
data: &mut [u8],
fixed_data: &[u8; 32],
counter: u64,
dynamic_data: &[u8]
) -> Result<(), ProgramError> {
let total_size = Self::FIXED_SIZE + dynamic_data.len();
if data.len() != total_size {
return Err(ProgramError::InvalidAccountData);
}
// Write fixed part field by field (safest approach)
data[0..32].copy_from_slice(fixed_data);
data[32..40].copy_from_slice(&counter.to_le_bytes());
// Write dynamic part
data[Self::FIXED_SIZE..].copy_from_slice(dynamic_data);
Ok(())
}
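/// Update just the dynamic portion.
/// If new_data is shorter than the existing dynamic section, the trailing old bytes remain; resize the account to match.
pub fn update_dynamic_data(account_data: &mut [u8], new_data: &[u8]) -> Result<(), ProgramError> {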
if account_data.len() < Self::FIXED_SIZE + new_data.len() {
return Err(ProgramError::InvalidAccountData);
}
// Write new dynamic data
account_data[Self::FIXED_SIZE..Self::FIXED_SIZE + new_data.len()].copy_from_slice(new_data);
Ok(())
}
}
To avoid undefined behavior, always check that the account data buffer is at least as large as the statically sized portion. The dynamic section may be empty, so this check is essential.
This layout ensures that the offsets for fixed-size fields are always known, regardless of the dynamic data’s length.
There are two main scenarios when reading dynamically sized data:
- Single dynamic field at the end: You can easily determine the size and offset of the dynamic data at runtime like this:
const DYNAMIC_DATA_START_OFFSET: usize = size_of::<[u8; 32]>();
#[repr(C)]
pub struct DynamicallySizedAccount {
pub sized_data: [u8; 32],
// pub dynamic_data: &'info [u8], // Not part of the struct, but follows in the buffer
}
impl DynamicallySizedAccount {
/// Returns the length of the dynamic data section.
#[inline(always)]
pub fn get_dynamic_data_len(data: &[u8]) -> Result<usize, ProgramError> {
if data.len() < DYNAMIC_DATA_START_OFFSET {
return Err(ProgramError::InvalidAccountData);
}
Ok(data.len() - DYNAMIC_DATA_START_OFFSET)
}
/// Returns a slice of the dynamic data (empty if there is none).
#[inline(always)]
pub fn read_dynamic_data(data: &[u8]) -> Result<&[u8], ProgramError> {
if data.len() < DYNAMIC_DATA_START_OFFSET {
return Err(ProgramError::InvalidAccountData);
}
Ok(&data[DYNAMIC_DATA_START_OFFSET..])
}
}
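If the fixed part is read field by field instead of being cast, splitting off the dynamic part needs no unsafe code and no alignment check at all. The following sketch targets the [fixed_data][counter][dynamic_data...] layout used by DynamicAccount and assumes the counter is stored little-endian. Parsed and parse are hypothetical names.

```rust
const FIXED_DATA_LEN: usize = 32;
const COUNTER_LEN: usize = 8;
const FIXED_SIZE: usize = FIXED_DATA_LEN + COUNTER_LEN;

pub struct Parsed<'a> {
    pub fixed_data: &'a [u8; 32],
    pub counter: u64,
    pub dynamic: &'a [u8],
}

pub fn parse(data: &[u8]) -> Option<Parsed<'_>> {
    // The dynamic part may be empty, so only the fixed part is required.
    if data.len() < FIXED_SIZE {
        return None;
    }
    let (fixed, dynamic) = data.split_at(FIXED_SIZE);
    let (fixed_data, counter) = fixed.split_at(FIXED_DATA_LEN);
    Some(Parsed {
        fixed_data: fixed_data.try_into().ok()?,
        counter: u64::from_le_bytes(counter.try_into().ok()?),
        dynamic,
    })
}
```

The trade-off is a copy of the counter on every read, in exchange for working at any buffer address.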
Multiple Dynamic Fields
This approach is more complex since we will need a way to determine the length of each dynamic field except the last. The most common approach is to prefix each dynamic field (except the last) with its length, so we can parse the buffer correctly.
Here’s a simple and robust pattern: store the length of the first dynamic field as a u8 (or u16, etc. if you need larger sizes) immediately after the statically sized data. The first dynamic field follows, and the second dynamic field occupies the remainder of the buffer.
#[repr(C)]
pub struct MultiDynamicAccount {
pub fixed_data: [u8; 32],
pub timestamp: u64,
// Layout: [fixed_data][timestamp][len1: u8][data1][data2: remainder]
}
impl MultiDynamicAccount {
pub const FIXED_SIZE: usize = size_of::<Self>();
pub const LEN_PREFIX_SIZE: usize = size_of::<u8>();
pub const MIN_SIZE: usize = Self::FIXED_SIZE + Self::LEN_PREFIX_SIZE;
/// Parse account with two dynamic sections
pub fn parse_dynamic_fields(data: &[u8]) -> Result<(&[u8; 32], u64, &[u8], &[u8]), ProgramError> {
if data.len() < Self::MIN_SIZE {
return Err(ProgramError::InvalidAccountData);
}
// Extract fixed data safely
let fixed_data = data[..32].try_into()
.map_err(|_| ProgramError::InvalidAccountData)?;
let timestamp = u64::from_le_bytes(
data[32..40].try_into()
.map_err(|_| ProgramError::InvalidAccountData)?
);
// Read length of first dynamic field (single byte)
let len = data[Self::FIXED_SIZE] as usize;
// Validate we have enough data
if data.len() < Self::MIN_SIZE + len {
return Err(ProgramError::InvalidAccountData);
}
let data_1 = &data[Self::MIN_SIZE..Self::MIN_SIZE + len];
let data_2 = &data[Self::MIN_SIZE + len..]; // Remainder
Ok((fixed_data, timestamp, data_1, data_2))
}
/// Write account with two dynamic sections
pub fn write_with_multiple_dynamic(
buffer: &mut [u8],
fixed_data: &[u8; 32],
timestamp: u64,
data_1: &[u8],
data_2: &[u8]
) -> Result<(), ProgramError> {
let total_size = Self::MIN_SIZE + data_1.len() + data_2.len();
if buffer.len() != total_size {
return Err(ProgramError::InvalidAccountData);
}
// Validate data_1 length fits in u8
if data_1.len() > u8::MAX as usize {
return Err(ProgramError::InvalidInstructionData);
}
let mut offset = 0;
// Write fixed data
buffer[offset..offset + 32].copy_from_slice(fixed_data);
offset += 32;
buffer[offset..offset + 8].copy_from_slice(&timestamp.to_le_bytes());
offset += 8;
// Write length prefix for data1 (single byte)
buffer[offset] = data_1.len() as u8;
offset += 1;
// Write data1
buffer[offset..offset + data_1.len()].copy_from_slice(data_1);
offset += data_1.len();
// Write data2 (remainder - no length prefix needed)
buffer[offset..].copy_from_slice(data_2);
Ok(())
}
}
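If the first dynamic field can exceed 255 bytes, the same layout works with a wider prefix. The following sketch uses a little-endian u16 length and assumes the 40-byte fixed part of MultiDynamicAccount. write_two_fields, split_two_fields, and LayoutError are hypothetical names.

```rust
const FIXED_SIZE: usize = 40; // [fixed_data: 32][timestamp: 8]
const LEN_PREFIX_SIZE: usize = 2; // u16 length of data_1, little-endian
const MIN_SIZE: usize = FIXED_SIZE + LEN_PREFIX_SIZE;

#[derive(Debug, PartialEq)]
pub enum LayoutError {
    FirstFieldTooLong,
    WrongBufferSize,
}

/// Writes [fixed][len1: u16][data_1][data_2] into a buffer of exactly the right size.
pub fn write_two_fields(
    buf: &mut [u8],
    fixed: &[u8; FIXED_SIZE],
    data_1: &[u8],
    data_2: &[u8],
) -> Result<(), LayoutError> {
    let len = u16::try_from(data_1.len()).map_err(|_| LayoutError::FirstFieldTooLong)?;
    if buf.len() != MIN_SIZE + data_1.len() + data_2.len() {
        return Err(LayoutError::WrongBufferSize);
    }
    let (head, rest) = buf.split_at_mut(FIXED_SIZE);
    head.copy_from_slice(fixed);
    let (prefix, rest) = rest.split_at_mut(LEN_PREFIX_SIZE);
    prefix.copy_from_slice(&len.to_le_bytes());
    let (first, second) = rest.split_at_mut(data_1.len());
    first.copy_from_slice(data_1);
    second.copy_from_slice(data_2);
    Ok(())
}

/// Returns (data_1, data_2), or None if the buffer is too short for its prefix.
pub fn split_two_fields(data: &[u8]) -> Option<(&[u8], &[u8])> {
    if data.len() < MIN_SIZE {
        return None;
    }
    let len = u16::from_le_bytes([data[FIXED_SIZE], data[FIXED_SIZE + 1]]) as usize;
    let rest = &data[MIN_SIZE..];
    // Reject a length prefix that points past the end of the buffer.
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}
```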
Resizing the Account
Every time you update a dynamic field, if the size changes, you must resize the account. Here’s a general-purpose function for resizing an account:
pub fn resize_account(
account: &AccountInfo,
payer: &AccountInfo,
new_size: usize,
) -> ProgramResult {
// If the account is already the correct size, return early
if new_size == account.data_len() {
return Ok(());
}
// Calculate rent requirements
let rent = Rent::get()?;
let new_minimum_balance = rent.minimum_balance(new_size);
// Adjust lamports to meet rent-exemption requirements
match new_minimum_balance.cmp(&account.lamports()) {
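core::cmp::Ordering::Greater => {
// Need more lamports for rent exemption.
// Note: a program can only debit lamports from accounts it owns. If payer is a
// system-owned wallet (the usual case), transfer the difference with a
// System Program transfer CPI instead of editing its lamports directly.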
let lamports_diff = new_minimum_balance.saturating_sub(account.lamports());
**payer.try_borrow_mut_lamports()? -= lamports_diff;
**account.try_borrow_mut_lamports()? += lamports_diff;
}
core::cmp::Ordering::Less => {
// Return excess lamports to payer
let lamports_diff = account.lamports().saturating_sub(new_minimum_balance);
**account.try_borrow_mut_lamports()? -= lamports_diff;
**payer.try_borrow_mut_lamports()? += lamports_diff;
}
core::cmp::Ordering::Equal => {
// No lamport transfer needed
}
}
// Reallocate the account
account.resize(new_size)?;
Ok(())
}
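The lamport difference that the function transfers can be estimated off-chain with the same formula the Rent sysvar uses. The following sketch assumes the default rent parameters:

- 3,480 lamports per byte-year
- a 2-year exemption threshold
- 128 bytes of per-account storage overhead

On-chain, always read the live values through Rent::get() as above. minimum_balance and resize_delta are hypothetical names.

```rust
/// Default rent parameters (assumed; read the Rent sysvar on-chain).
const LAMPORTS_PER_BYTE_YEAR: u64 = 3_480;
const EXEMPTION_THRESHOLD_YEARS: u64 = 2;
const ACCOUNT_STORAGE_OVERHEAD: u64 = 128;

/// Rent-exempt minimum balance for an account holding `data_len` bytes.
pub fn minimum_balance(data_len: usize) -> u64 {
    (ACCOUNT_STORAGE_OVERHEAD + data_len as u64) * LAMPORTS_PER_BYTE_YEAR * EXEMPTION_THRESHOLD_YEARS
}

/// Lamports the payer must add (positive) or gets back (negative) when an
/// account holding exactly its minimum balance is resized.
pub fn resize_delta(old_len: usize, new_len: usize) -> i64 {
    minimum_balance(new_len) as i64 - minimum_balance(old_len) as i64
}
```

With these defaults, each extra byte of account data costs 6,960 lamports of rent-exempt balance.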