Process Model¶
This document covers Lonala processes, the lightweight execution units within realms: their memory model, scheduling, message passing, and garbage collection.
Process Characteristics¶
Lonala processes are modeled after BEAM/Erlang processes:
| Property | Description |
|---|---|
| Lightweight | ~1-10 µs to spawn, minimal memory overhead (~2-4 KB initial) |
| Isolated heap | Each process has its own heap, no shared mutable state |
| Mailbox | Each process has an incoming message queue |
| Fault isolation | Process crash doesn't affect other processes |
| Millions per realm | A single realm can host millions of processes |
Process Structure¶
PROCESS (Pure Userspace Construct)
┌─────────────────────────────────────────────────────────────────────┐
│ │
│ Identity: │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ pid: ProcessId - Unique within realm │ │
│ │ parent: ProcessId - Who spawned this process │ │
│ │ links: Set<ProcessId> - Bidirectional crash notification │ │
│ │ monitors: Set<MonitorRef> - Unidirectional monitoring │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ Execution State: │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ status: Running | Waiting | Exited │ │
│ │ reductions: u32 - Remaining budget (weighted costs) │ │
│ │ total_reductions: u64 - Lifetime reductions (monitoring) │ │
│ │ ip: usize - Instruction pointer │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ Note: X registers (temporaries) are per-Worker, not per-Process. │
│ All processes on a worker share that worker's registers. Context │
│ switch saves/restores values via the process stack, not by │
│ copying registers between processes. │
│ │
│ Memory: │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ young_heap: contiguous block (stack + young objects) │ │
│ │ old_heap: contiguous block (promoted objects) │ │
│ │ htop: young heap allocation pointer (grows up) │ │
│ │ stop: stack pointer (grows down) │ │
│ │ old_htop: old heap allocation pointer │ │
│ │ mbuf_list: linked list of heap fragments │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ Communication: │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ mailbox: Queue<Message> - Incoming messages │ │
│ │ waiting_pattern: Option<Pattern> - What we're waiting for │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
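The structure above maps naturally onto a plain struct. Below is a minimal Rust sketch of these fields, with placeholder types where the diagram does not pin them down (Pattern, Message, Status, and the heap representation are illustrative, not the actual VM definitions):

```rust
use std::collections::{HashSet, VecDeque};

type ProcessId = u64;
type MonitorRef = u64;

// Placeholder types; the real definitions are not pinned down by the diagram.
struct Pattern;
struct Message;

enum Status {
    Running,
    Waiting,
    Exited,
}

struct Process {
    // Identity
    pid: ProcessId,
    parent: ProcessId,
    links: HashSet<ProcessId>,     // bidirectional crash notification
    monitors: HashSet<MonitorRef>, // unidirectional monitoring

    // Execution state (X registers live on the Worker, not here)
    status: Status,
    reductions: u32,       // remaining budget for this time slice
    total_reductions: u64, // lifetime count, for monitoring
    ip: usize,             // instruction pointer

    // Memory: two contiguous blocks plus overflow fragments
    young_heap: Vec<u64>,    // stack + young objects share this block
    old_heap: Vec<u64>,      // promoted objects
    htop: usize,             // young-heap allocation pointer (grows up)
    stop: usize,             // stack pointer (grows down)
    old_htop: usize,         // old-heap allocation pointer
    mbuf_list: Vec<Message>, // heap fragments (a linked list in practice)

    // Communication
    mailbox: VecDeque<Message>,       // incoming messages
    waiting_pattern: Option<Pattern>, // what a blocked receive waits for
}
```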
Process Memory Model¶
This section describes the BEAM-style memory model used by Lonala processes. Understanding this model is essential for reasoning about performance, garbage collection, and memory usage.
Process Memory Overview¶
Each process owns two memory blocks: a young heap and an old heap.
PROCESS MEMORY OVERVIEW
════════════════════════════════════════════════════════════════════════════════
Each process has two separate memory blocks:
┌─────────────────────────────────────────────────────────────────────────────┐
│ │
│ YOUNG HEAP (where most activity happens): │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ STACK │ FREE │ YOUNG OBJECTS │ │
│ │ (grows down) │ SPACE │ (grows up) │ │
│ │ ▼ │ │ ▲ │ │
│ │ stop │ │ htop │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ OLD HEAP (promoted objects): │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ PROMOTED OBJECTS (survived minor GC) │ FREE │ │
│ │ │ SPACE │ │
│ │ │ ▲ │ │
│ │ │ old_htop │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ HEAP FRAGMENTS (temporary, from message passing): │
│ ┌──────────┐ ┌──────────┐ │
│ │ fragment │─►│ fragment │─► nil │
│ └──────────┘ └──────────┘ │
│ Consolidated into old heap during Minor GC │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Young Heap Structure¶
The young heap is a single contiguous block holding both the stack and young objects, which grow toward each other:
YOUNG HEAP LAYOUT
════════════════════════════════════════════════════════════════════════════════
┌────────────────────────────────────────────────────────────────────────┐
│ │
│ STACK FREE YOUNG HEAP │
│ (grows down) SPACE (grows up) │
│ │
│ ┌─────────┐ ┌─────────┐ │
│ │ frame 2 │ │ tuple │ │
│ ├─────────┤ ├─────────┤ │
│ │ frame 1 │ │ pair │ │
│ ├─────────┤ ├─────────┤ │
│ │ frame 0 │◄─ stop htop ─►│ string │ │
│ └─────────┘ └─────────┘ │
│ │ │ │
│ ▼ ▲ │
│ grows toward ──────────────────────────────────── grows toward │
│ lower addresses higher addresses │
│ │
└────────────────────────────────────────────────────────────────────────┘
▲ ▲
│ │
hend (high address)                                        heap (low address)
Key pointers:
┌────────────────────────────────────────────────────────────────────────┐
│ heap = base address of young heap (low address) │
│ hend = end address of young heap (high address) │
│ htop = current heap top (grows UP toward hend) │
│ stop = current stack pointer (grows DOWN toward heap) │
│ heap_size = hend - heap (total young heap size) │
└────────────────────────────────────────────────────────────────────────┘
Out of memory condition: htop >= stop
When this happens, Minor GC is triggered.
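With this layout, allocation is a bounds check plus a pointer bump, and the bounds check doubles as the GC trigger. A minimal sketch with illustrative names (not the actual VM code):

```rust
/// Young heap block: `htop` and `stop` are word indices; `htop` grows up,
/// `stop` grows down, and the free space is whatever lies between them.
struct YoungHeap {
    words: Vec<u64>,
    htop: usize, // next free word for heap allocation
    stop: usize, // current stack pointer
}

impl YoungHeap {
    /// Try to allocate `need` words on the young heap.
    /// Returns None when htop would meet stop, i.e. Minor GC is required.
    fn alloc(&mut self, need: usize) -> Option<usize> {
        if self.htop + need > self.stop {
            return None; // out of memory condition: trigger Minor GC
        }
        let at = self.htop;
        self.htop += need; // O(1) pointer bump
        Some(at)
    }
}
```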
Why This Design?¶
This design, taken directly from BEAM, has several advantages:
- Simple allocation: Heap allocation is a pointer bump (O(1))
- Simple stack operations: Push/pop are pointer adjustments
- Fast Minor GC: Only young heap is scanned; old heap untouched
- Memory efficient: Stack and young heap share unused space
- Cache friendly: Related data is spatially close
- Generational hypothesis: Most objects die young, so most GC work is in the small young heap
Initial Heap Size¶
Processes start with a small heap to enable millions of lightweight processes:
INITIAL PROCESS MEMORY
════════════════════════════════════════════════════════════════════════════════
Default initial size: ~2 KB (configurable)
This small default allows:
- Spawning millions of processes with minimal memory
- Processes that allocate little stay small
- Heap grows only when needed
Spawn options can override:
(spawn-opt f {:min-heap-size 8192}) ; Start with 8 KB
Heap Growth Strategy¶
When a process needs more memory than available, the heap grows following BEAM's strategy:
HEAP GROWTH SEQUENCE
════════════════════════════════════════════════════════════════════════════════
Phase 1: Fibonacci-like growth (small heaps)
──────────────────────────────────────────────────────────────────────────────
233 → 377 → 610 → 987 → 1597 → 2584 → 4181 → 6765 → 10946 → ... (words)
In bytes (64-bit, 8 bytes/word):
~2KB → ~3KB → ~5KB → ~8KB → ~13KB → ~21KB → ~33KB → ~54KB → ~87KB → ...
Phase 2: 20% growth (large heaps, after ~1 million words / ~8 MB)
──────────────────────────────────────────────────────────────────────────────
8 MB → 9.6 MB → 11.5 MB → 13.8 MB → 16.6 MB → ...
Rationale: Fibonacci growth is too aggressive for large heaps.
20% growth balances memory efficiency with GC frequency.
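The growth sequence can be captured in a small size-selection function. A sketch assuming word-denominated sizes, the Fibonacci-like table starting at 233 words, and the ~1 million word / 20% switch-over quoted above:

```rust
/// Next heap size in words: Fibonacci-like growth for small heaps,
/// then 20% growth once past ~1 million words (~8 MB on 64-bit).
fn next_heap_size(current_words: usize) -> usize {
    const BIG_HEAP_WORDS: usize = 1 << 20; // ~1 million words

    if current_words >= BIG_HEAP_WORDS {
        // Phase 2: 20% growth for large heaps
        return current_words + current_words / 5;
    }

    // Phase 1: walk the Fibonacci-like table until it passes `current_words`
    let (mut a, mut b) = (233usize, 377usize);
    while b <= current_words {
        let next = a + b;
        a = b;
        b = next;
    }
    b
}

fn main() {
    // 233 → 377 → 610 → 987 → 1597 → 2584 → 4181 → ... (words)
    assert_eq!(next_heap_size(233), 377);
    assert_eq!(next_heap_size(2584), 4181);
    // Large heaps grow by 20%: ~1 Mi words (~8 MB) → ~9.6 MB
    assert_eq!(next_heap_size(1 << 20), (1 << 20) + (1 << 20) / 5);
}
```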
Heap Growth Mechanism¶
When a process runs out of memory, the entire heap is reallocated:
HEAP REALLOCATION DURING GC
════════════════════════════════════════════════════════════════════════════════
Before (heap exhausted):
┌────────────────────────────────────────────────────────────────────────────┐
│ STACK STACK STACK ◄─stop htop─► HEAP HEAP HEAP HEAP HEAP │
│ ├─────────────────────────│────────────────────────────────┤ │
│ NO FREE SPACE │
│ (htop and stop have met) │
└────────────────────────────────────────────────────────────────────────────┘
│
▼ GC triggered
│
Step 1: Allocate NEW larger block (next size in growth sequence)
┌────────────────────────────────────────────────────────────────────────────┐
│ │
│ NEW BLOCK (larger) │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ EMPTY - READY FOR COPYING │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────────┘
│
▼ Copy live data
│
Step 2: Copy ONLY LIVE data from old block to new block
┌────────────────────────────────────────────────────────────────────────────┐
│ │
│ OLD BLOCK NEW BLOCK │
│ ┌─────────────────────────┐ ┌────────────────────────────────┐ │
│ │ S │ S │ D │ L │ D │ L │ │ ───► │ S │ S │ │ L │ L │ │ │
│ │ │ │ │ │ │ │ │ copy │ │ │ FREE │ │ │ │ │
│ │ │ │ │ │ │ │ │ live │ │ │ │ │ │ │ │
│ └─────────────────────────┘ └────────────────────────────────┘ │
│ │ │
│ ▼ │
│ FREED S = Stack frames (always live) │
│ L = Live heap objects │
│ D = Dead objects (not copied) │
└────────────────────────────────────────────────────────────────────────────┘
Key points:
- Dead objects are NOT copied (garbage collected)
- Live objects are compacted (no fragmentation)
- Old block is freed after copy completes
- Process resumes with more free space
Process Isolation During GC¶
Critical design property: Each process GCs independently. No global pauses.
PER-PROCESS GARBAGE COLLECTION
════════════════════════════════════════════════════════════════════════════════
Process A Process B Process C
│ │ │
│ GC in progress │ executing │ executing
│ ████████████████ │ │
│ (A is paused) │ (B keeps running) │ (C keeps running)
│ │ │
│ │ GC in progress │
│ executing │ ████████████████ │ executing
│ (A keeps running) │ (B is paused) │ (C keeps running)
│ │ │
▼ ▼ ▼
NO COORDINATION between processes.
GC of Process A has ZERO impact on Process B or C.
Only the process being GC'd is paused.
This is the key to Erlang/BEAM's soft real-time properties:
- System remains responsive during GC
- GC pauses are per-process (microseconds to milliseconds)
- No stop-the-world pauses
Memory Allocation Architecture¶
This section describes how memory is allocated for process heaps, following BEAM's per-scheduler allocator design.
Per-Worker Allocator Instances¶
To avoid lock contention in multi-worker realms, each worker has its own allocator instance:
PER-WORKER ALLOCATOR ARCHITECTURE
════════════════════════════════════════════════════════════════════════════════
PROCESS POOL MEMORY REGION
(from realm's VSpace)
│
┌─────────────────────────┼─────────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ WORKER 0 │ │ WORKER 1 │ │ WORKER 2 │
│ ALLOCATOR │ │ ALLOCATOR │ │ ALLOCATOR │
│ INSTANCE │ │ INSTANCE │ │ INSTANCE │
│ │ │ │ │ │
│ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │
│ │ Carrier 0 │ │ │ │ Carrier 0 │ │ │ │ Carrier 0 │ │
│ │ [blocks] │ │ │ │ [blocks] │ │ │ │ [blocks] │ │
│ ├───────────┤ │ │ ├───────────┤ │ │ ├───────────┤ │
│ │ Carrier 1 │ │ │ │ Carrier 1 │ │ │ │ Carrier 1 │ │
│ │ [blocks] │ │ │ │ [blocks] │ │ │ │ [blocks] │ │
│ └───────────┘ │   │ └───────────┘ │   │ └───────────┘ │
│ │ │ │ │ │
│ LOCK-FREE │ │ LOCK-FREE │ │ LOCK-FREE │
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
│ │ │
▼ ▼ ▼
Processes on Processes on Processes on
Worker 0 Worker 1 Worker 2
Each worker's allocator is INDEPENDENT:
- Allocations are lock-free within a worker
- No contention between workers
- Processes running on a worker use that worker's allocator
Carriers¶
Allocators manage memory in large chunks called carriers:
CARRIER STRUCTURE
════════════════════════════════════════════════════════════════════════════════
Carriers are large memory regions obtained from the process pool.
Each carrier contains multiple allocation blocks.
┌─────────────────────────────────────────────────────────────────────────────┐
│ CARRIER (e.g., 1 MB) │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ HEADER │ Block 0 │ Block 1 │ FREE │ Block 2 │ FREE │ │
│ │ │ (P1 heap)│ (P7 heap)│ │ (P3 heap)│ │ │
│ └────────┴──────────┴──────────┴──────────┴──────────┴─────────────────┘ │
│ │
│ Blocks within a carrier: │
│ - Allocated to individual process heaps │
│ - Variable sizes (match heap size requests) │
│ - Freed when process terminates or heap shrinks │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Carrier types:
┌─────────────────────────────────────────────────────────────────────────────┐
│ │
│ MULTIBLOCK CARRIER: Contains multiple smaller blocks │
│ - Used for typical process heaps │
│ - Efficient for small to medium allocations │
│ │
│ SINGLEBLOCK CARRIER: Contains one large block │
│ - Used for very large process heaps (above threshold) │
│ - Dedicated carrier per large heap │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Allocation Path (Lock-Free)¶
The common case - allocating memory for a process - requires no locks:
ALLOCATION PATH
════════════════════════════════════════════════════════════════════════════════
Process P (running on Worker W) needs to grow heap:
1. GC determines: "I need N bytes for new heap"
2. Request goes to Worker W's allocator instance
└── This is the SAME worker running P
└── No other worker touches this allocator
└── LOCK-FREE operation
3. Allocator checks its carriers:
└── Has space in existing carrier? → Allocate from carrier
└── No space? → Get new carrier from pool
4. Return memory to process P
No locks needed for steps 1-4 (common case).
Only rare operations require coordination:
- Getting new carrier from pool (page fault handler)
- Carrier migration between workers (load balancing)
- Cross-worker deallocation (process migrated between workers)
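A sketch of the common-case path, illustrating why no locks are needed: the allocator instance is owned by the worker calling it, so an ordinary exclusive borrow suffices. Carrier layout and the pool interface are simplified placeholders, not the actual allocator:

```rust
/// A carrier: a large block carved into heap-sized allocations.
struct Carrier {
    buf: Vec<u8>,
    used: usize,
}

impl Carrier {
    fn try_alloc(&mut self, bytes: usize) -> Option<*mut u8> {
        if self.used + bytes > self.buf.len() {
            return None;
        }
        let p = unsafe { self.buf.as_mut_ptr().add(self.used) };
        self.used += bytes;
        Some(p)
    }
}

/// One instance per worker; only that worker ever calls into it,
/// so there is nothing to lock in the common case.
struct WorkerAllocator {
    carriers: Vec<Carrier>,
}

impl WorkerAllocator {
    fn alloc_heap(&mut self, bytes: usize) -> *mut u8 {
        // Fast path: space in an existing carrier.
        for c in &mut self.carriers {
            if let Some(p) = c.try_alloc(bytes) {
                return p;
            }
        }
        // Slow path: get a new carrier from the shared pool (the rare
        // operation that needs coordination; modeled here as a fresh Vec).
        let size = bytes.max(1 << 20); // e.g. 1 MB multiblock carrier
        self.carriers.push(Carrier { buf: vec![0u8; size], used: 0 });
        self.carriers.last_mut().unwrap().try_alloc(bytes).unwrap()
    }
}
```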
Carrier Migration¶
When workers become imbalanced, carriers can migrate:
CARRIER MIGRATION (Rare Operation)
════════════════════════════════════════════════════════════════════════════════
Scenario: Worker 0 has many free carriers, Worker 1 needs more
Worker 0 (lots of free carriers) Worker 1 (needs carriers)
┌─────────────────────────────────┐ ┌─────────────────────────────────┐
│ Carrier A [25% used] │ │ Carrier X [95% used] │
│ Carrier B [10% used] ◄─────────┼──┼─ "I'll take this one" │
│ Carrier C [50% used] │ │ │
└─────────────────────────────────┘ └─────────────────────────────────┘
Process:
1. Worker 1 notices it needs more carriers
2. Worker 1 checks for "abandoned" carriers from other workers
3. Worker 0 had marked Carrier B as "abandonable" (low utilization)
4. Worker 1 adopts Carrier B (atomic pointer swap)
5. Future allocations in Worker 1 can use Carrier B
This is RARE - only happens during load imbalance.
Normal allocations remain lock-free.
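The atomic adoption in step 4 can be sketched as a swap on a slot where the owner advertises an abandonable carrier. The single-slot layout here is purely illustrative; a real implementation would track many carriers per allocator:

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

struct Carrier; // blocks, free lists, utilization stats, ...

/// Slot where a worker advertises a low-utilization carrier it no longer
/// wants; another worker may adopt it with a single atomic swap.
static ABANDONED: AtomicPtr<Carrier> = AtomicPtr::new(ptr::null_mut());

/// Owner side (e.g. Worker 0): publish the carrier if the slot is free.
fn abandon(carrier: *mut Carrier) {
    let _ = ABANDONED.compare_exchange(
        ptr::null_mut(),
        carrier,
        Ordering::AcqRel,
        Ordering::Acquire,
    );
}

/// Thief side (e.g. Worker 1): atomically take ownership, never blocking.
fn try_adopt() -> Option<*mut Carrier> {
    let c = ABANDONED.swap(ptr::null_mut(), Ordering::AcqRel);
    if c.is_null() { None } else { Some(c) }
}
```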
Heap Fragments (M-Bufs)¶
When a process cannot write directly to another process's heap, it uses heap fragments:
HEAP FRAGMENTS (M-BUFS)
════════════════════════════════════════════════════════════════════════════════
Scenario: Process A sends message to Process B
Option 1: B's heap lock available (common case)
──────────────────────────────────────────────────────────────────────────────
A acquires B's heap lock
A copies message DIRECTLY into B's heap
A releases lock
Message is immediately part of B's heap
Option 2: B's heap lock busy (contention case)
──────────────────────────────────────────────────────────────────────────────
A cannot acquire B's heap lock
A allocates a HEAP FRAGMENT (m-buf) outside B's main heap
A copies message into the fragment
A links fragment to B's fragment list (atomic operation)
A continues without waiting
Fragment structure:
┌─────────────────────────────────────────────────────────────────────────┐
│ HEAP FRAGMENT │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ next: *Fragment │ size: usize │ MESSAGE DATA │ │
│ └─────────────────┴─────────────┴───────────────────────────────────┘ │
│ │
│ Linked to B's mbuf_list (process field) │
└─────────────────────────────────────────────────────────────────────────┘
Process B's view:
┌─────────────────────────────────────────────────────────────────────────────┐
│ │
│ Main Heap Block Heap Fragments │
│ ┌─────────────────────────────┐ ┌──────────────┐ │
│ │ STACK │ │ HEAP │ │ Fragment 1 │ │
│ │ │ │ │ │ (msg from A) │ │
│ │ │ │ │ ◄────│ next ────────┼──► Fragment 2 │
│ │ │ │ │ │ │ (msg from C) │
│ └─────────────┴──────┴────────┘ └──────────────┘ │
│ │
│ Fragments are considered part of young generation (above high_water) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Fragment consolidation (during GC):
┌─────────────────────────────────────────────────────────────────────────────┐
│ │
│ Before GC: │
│ Main heap + Fragment 1 + Fragment 2 (separate memory regions) │
│ │
│ During GC: │
│ Live data from main heap + fragments copied to NEW heap block │
│ │
│ After GC: │
│ Single contiguous heap block (fragments freed, data consolidated) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Benefits of heap fragments:
- Senders never block waiting for receiver's heap lock
- Reduces lock contention in high-throughput scenarios
- Fragments consolidated during normal GC (no extra passes)
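The atomic link in option 2 ("A links fragment to B's fragment list") can be sketched as a lock-free push onto the receiver's mbuf_list, using the fragment header drawn above (field names illustrative):

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

/// Heap fragment header as drawn above: next pointer, size, then payload.
struct Fragment {
    next: *mut Fragment,
    size: usize,
    data: Vec<u8>, // stand-in for the deep-copied message terms
}

/// Receiver-side list head; many senders may push concurrently.
struct MbufList {
    head: AtomicPtr<Fragment>,
}

impl MbufList {
    /// Lock-free push: the sender links its fragment and moves on,
    /// never waiting for the receiver's heap lock.
    fn push(&self, frag: Box<Fragment>) {
        let frag = Box::into_raw(frag);
        let mut head = self.head.load(Ordering::Acquire);
        loop {
            unsafe { (*frag).next = head };
            match self.head.compare_exchange_weak(
                head,
                frag,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return,
                Err(observed) => head = observed,
            }
        }
    }
}
```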
Generational Garbage Collection¶
Lonala uses a generational copying collector, following BEAM's design.
Generations¶
GENERATIONAL GC MODEL
════════════════════════════════════════════════════════════════════════════════
Process heap is divided into generations by the HIGH_WATER mark:
┌─────────────────────────────────────────────────────────────────────────────┐
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ STACK │ │ OLD GENERATION │ YOUNG GENERATION │ FREE │ │
│ │ │ │ (below h_water) │ (above h_water) │ │ │
│ │ ───► │ │ │ │ ◄─── │ │
│ └───────┴──────────┴─────────────────┴──────────────────┴─────────────┘ │
│ ▲ ▲ ▲ ▲ │
│ stop high_water (recent) htop │
│ │
│ Objects below high_water: survived at least one GC (old generation) │
│ Objects above high_water: recently allocated (young generation) │
│ Heap fragments: always considered young generation │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Two-Heap Architecture¶
Each process has two memory blocks: a young heap and an old heap:
PROCESS MEMORY: TWO-HEAP ARCHITECTURE
════════════════════════════════════════════════════════════════════════════════
Young Heap (stack + young objects):
┌─────────────────────────────────────────────────────────────────────────────┐
│ │
│ STACK FREE YOUNG OBJECTS │
│ (grows down) SPACE (grows up) │
│ │
│ ┌─────────┐ ┌─────────┐ │
│ │ frame 1 │ │ new obj │ │
│ ├─────────┤ ├─────────┤ │
│ │ frame 0 │◄─ stop htop ─►│ new obj │ │
│ └─────────┘ └─────────┘ │
│ │
│ Out of memory when htop >= stop → triggers Minor GC │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Old Heap (promoted objects, separate block):
┌─────────────────────────────────────────────────────────────────────────────┐
│ │
│ PROMOTED OBJECTS FREE │
│ (survived minor GC) SPACE │
│ │
│ ┌──────────┬──────────┬──────────┬──────────┐ │
│ │ promoted │ promoted │ promoted │ │◄─ old_htop │
│ │ (GC N-3) │ (GC N-2) │ (GC N-1) │ │ │
│ └──────────┴──────────┴──────────┴──────────┘ │
│ │
│ Never touched during Minor GC │
│ Collected only during Major GC (fullsweep) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Benefits of two heaps:
- Minor GC only touches young heap (fast, ~10-100 µs)
- Old heap grows independently
- Better cache behavior (young data is hot, old data is cold)
- Heap fragments are considered young (consolidated during GC)
Minor GC (Young Generation Only)¶
Minor GC runs when the young heap is exhausted (htop meets stop). It only processes young objects.
MINOR GC
════════════════════════════════════════════════════════════════════════════════
Triggered when: htop >= stop (young heap exhausted)
Scope: Young heap + heap fragments only. Old heap is NOT touched.
Process:
┌─────────────────────────────────────────────────────────────────────────────┐
│ │
│ 1. Scan roots (stack, registers, process dictionary) │
│ │
│ 2. For each pointer to YOUNG object: │
│ - Copy object to OLD HEAP (promotion) │
│ - Update pointer to new location in old heap │
│ - Object is now "tenured" │
│ │
│ 3. For pointers to OLD objects: │
│ - No action needed (already in old heap) │
│ │
│ 4. Copy live data from heap fragments to old heap │
│ │
│ 5. Reset young heap: htop = heap_start │
│ - All young space is now free for new allocations │
│ - Dead young objects are implicitly reclaimed │
│ │
│ 6. Free heap fragments │
│ │
│ 7. If old heap needs more space: grow old heap │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
BEFORE Minor GC:
┌─────────────────────────────────────────────────────────────────────────────┐
│ Young Heap: [STACK STACK]◄─stop htop─►[live][dead][live][dead] │
│ NO FREE SPACE (htop >= stop) │
│ │
│ Old Heap: [promoted][promoted][promoted][ FREE ] │
└─────────────────────────────────────────────────────────────────────────────┘
AFTER Minor GC:
┌─────────────────────────────────────────────────────────────────────────────┐
│ Young Heap: [STACK STACK]◄─stop [ ALL FREE ]◄─htop │
│ Young heap reset, ready for new allocations │
│ │
│ Old Heap: [promoted][promoted][promoted][new][new][ FREE ] │
│ Live young objects promoted to old heap │
└─────────────────────────────────────────────────────────────────────────────┘
Why minor GC is fast:
- Only scans young heap (typically small)
- Most objects die young (generational hypothesis) - no copy needed
- Old heap not scanned or touched
- Typical pause: 10-100 microseconds
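A toy sketch of the promote-and-reset logic, using indices instead of tagged pointers and a depth-first copy instead of the Cheney-style scan a real collector would use. It is schematic only, but shows the essential moves: live young objects are copied to the old heap, references are rewritten, and the young heap is reset:

```rust
use std::collections::HashMap;

/// Toy object graph: each object is just a list of references.
/// Real heaps use tagged words, not Vec<Obj>.
#[derive(Clone)]
struct Obj {
    refs: Vec<Ref>,
}

#[derive(Clone, Copy)]
enum Ref {
    Young(usize), // index into the young heap
    Old(usize),   // index into the old heap
}

/// Minor GC: promote every young object reachable from the roots into the
/// old heap, rewrite references, and reset the young heap.
fn minor_gc(young: &mut Vec<Obj>, old: &mut Vec<Obj>, roots: &mut [Ref]) {
    // Forwarding table (young index -> old index), so shared objects are
    // copied only once.
    let mut forwarded: HashMap<usize, usize> = HashMap::new();

    fn promote(
        r: Ref,
        young: &Vec<Obj>,
        old: &mut Vec<Obj>,
        forwarded: &mut HashMap<usize, usize>,
    ) -> Ref {
        match r {
            Ref::Old(i) => Ref::Old(i), // already tenured: no action needed
            Ref::Young(i) => {
                if let Some(&j) = forwarded.get(&i) {
                    return Ref::Old(j); // already copied during this GC
                }
                let j = old.len();
                forwarded.insert(i, j);
                old.push(young[i].clone()); // copy object to old heap
                // Promote everything it points to and rewrite its refs.
                let new_refs: Vec<Ref> = young[i]
                    .refs
                    .iter()
                    .map(|&child| promote(child, young, old, forwarded))
                    .collect();
                old[j].refs = new_refs;
                Ref::Old(j)
            }
        }
    }

    for r in roots.iter_mut() {
        *r = promote(*r, young, old, &mut forwarded);
    }
    // Dead young objects are implicitly reclaimed by the reset
    // (htop = heap_start in the real layout).
    young.clear();
}
```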
Major GC (Fullsweep)¶
Major GC collects both heaps. It runs less frequently but reclaims dead objects in the old generation.
MAJOR GC (FULLSWEEP)
════════════════════════════════════════════════════════════════════════════════
Triggered when:
- Old heap is too large relative to live data
- After N minor GCs without fullsweep (configurable)
- Explicit (gc :full) call
Scope: BOTH young heap AND old heap
Process:
┌─────────────────────────────────────────────────────────────────────────────┐
│ │
│ 1. Allocate NEW young heap block (may be larger if needed) │
│ │
│ 2. Scan ALL roots (stack, registers, process dictionary) │
│ │
│ 3. Copy ALL live objects (from both young AND old heaps) to new heap │
│ - Compacts all live data into contiguous space │
│ - Dead objects in old generation are reclaimed │
│ │
│ 4. Free old young heap block │
│ │
│ 5. Free old heap block (all data now in new young heap) │
│ │
│ 6. Allocate fresh empty old heap │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
BEFORE Major GC:
┌─────────────────────────────────────────────────────────────────────────────┐
│ Young Heap: [STACK][young live][young dead] │
│ Old Heap: [old live][old dead][old live][old dead][old live] │
│ ▲ Dead objects accumulate in old heap │
└─────────────────────────────────────────────────────────────────────────────┘
AFTER Major GC:
┌─────────────────────────────────────────────────────────────────────────────┐
│ New Young Heap: [STACK][all live data compacted][ FREE ] │
│ New Old Heap: [ EMPTY ] │
│ │
│ All live data now in young heap, fresh start │
│ Dead objects from both generations reclaimed │
└─────────────────────────────────────────────────────────────────────────────┘
Major GC characteristics:
- More expensive than minor GC (scans everything)
- Reclaims dead objects in old generation
- Compacts all live data (better locality)
- Typical pause: 100 microseconds to several milliseconds
- Frequency: much less often than minor GC
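The trigger conditions can be expressed as a small predicate. The counter and the old-heap ratio below are illustrative knobs; the document only states that the minor-GC count is configurable, not its default or the exact ratio used:

```rust
/// Illustrative fullsweep policy. `fullsweep_after` and the 4x ratio are
/// placeholder knobs, not documented defaults.
struct GcState {
    minor_gcs_since_fullsweep: u32,
    fullsweep_after: u32, // "after N minor GCs without fullsweep (configurable)"
    old_heap_words: usize,
    live_words_after_last_gc: usize,
}

fn should_fullsweep(gc: &GcState, explicit_request: bool) -> bool {
    explicit_request // (gc :full)
        || gc.minor_gcs_since_fullsweep >= gc.fullsweep_after
        // "old heap is too large relative to live data"
        || gc.old_heap_words > 4 * gc.live_words_after_last_gc.max(1)
}
```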
Large Binary Handling¶
Large binaries (>= 64 bytes) are handled specially to avoid copying overhead:
LARGE BINARY HANDLING
════════════════════════════════════════════════════════════════════════════════
Small binaries (< 64 bytes):
- Stored directly in process heap
- Copied when sent in messages
- GC'd with other heap objects
Large binaries (>= 64 bytes):
- Stored in REALM-WIDE binary heap (separate from process heaps)
- Reference counted
- NOT copied when sent in messages
┌─────────────────────────────────────────────────────────────────────────────┐
│ │
│ Process A heap: Realm Binary Heap: │
│ ┌───────────────┐ ┌─────────────────────────────────────────┐ │
│ │ BinaryRef ────┼───────────►│ refcount: 2 │ │
│ │ (8 bytes) │ │ size: 10 MB │ │
│ └───────────────┘ │ data: [................] │ │
│ └─────────────────────────────────────────┘ │
│ Process B heap: ▲ │
│ ┌───────────────┐ │ │
│ │ BinaryRef ────┼────────────────────────┘ │
│ │ (8 bytes) │ │
│ └───────────────┘ │
│ │
│ A sends binary to B: │
│ - Only BinaryRef is copied (8 bytes) │
│ - refcount incremented (2) │
│ - 10 MB data NOT copied │
│ │
│ When A or B no longer references binary: │
│ - refcount decremented │
│ - When refcount reaches 0, binary is freed │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Benefits:
- Zero-copy message passing for large data
- Process heaps stay small (fast GC)
- Efficient sharing of read-only data
- Maintains "no shared mutable state" (binaries are immutable)
Scheduling¶
Lonala uses a hybrid cooperative/preemptive scheduling model.
Reduction-Based Scheduling¶
Within a realm, processes yield cooperatively after consuming a budget of "reductions" (bytecode operations with weighted costs, so expensive operations consume more of the budget):
REDUCTION COUNTING
════════════════════════════════════════════════════════════════════════════════
const MAX_REDUCTIONS: u32 = 2000; // Tune for ~500µs (MCS budget)
fn run_process(proc: &mut Process) -> RunResult {
while proc.reductions > 0 {
let instruction = fetch(proc);
match execute(proc, instruction) {
Continue => proc.reductions -= 1,
Yield => return RunResult::Yielded,
Block(reason) => return RunResult::Blocked(reason),
Exit(reason) => return RunResult::Exited(reason),
}
}
// Out of reductions - yield to other processes
RunResult::Yielded
}
Scheduler Loop¶
VM SCHEDULER LOOP
════════════════════════════════════════════════════════════════════════════════
loop {
// Pick next runnable process
proc = run_queue.pop_front()
if proc is None {
// No runnable processes - check mailboxes, maybe idle
check_timeouts()
if run_queue.is_empty() {
wait_for_event() // Block until message/timeout/signal
}
continue
}
// Reset reduction counter
proc.reductions = MAX_REDUCTIONS
// Run the process
result = run_process(proc)
match result {
Yielded =>
// Process used its time slice, re-queue
run_queue.push_back(proc)
Blocked(mailbox) =>
// Process is waiting for a message
waiting_processes.insert(proc.pid, proc)
Exited(reason) =>
// Process terminated
notify_links(proc, reason)
notify_monitors(proc, reason)
cleanup(proc)
}
}
Scheduling Layers¶
THREE SCHEDULING LAYERS
════════════════════════════════════════════════════════════════════════════════
┌─────────────────────────────────────────────────────────────────────┐
│ Layer 3: INTRA-REALM PROCESS SCHEDULING │
│ │
│ Mechanism: Reduction counting + run queue │
│ Granularity: ~1ms time slices │
│ Fairness: Round-robin among runnable processes │
│ Control: Lona VM (userspace) │
│ │
├─────────────────────────────────────────────────────────────────────┤
│ Layer 2: INTRA-REALM WORKER SCHEDULING (if multiple workers) │
│ │
│ Mechanism: Work stealing between workers │
│ Granularity: Per-process │
│ Fairness: Load balancing across CPUs │
│ Control: Lona VM (userspace) │
│ │
├─────────────────────────────────────────────────────────────────────┤
│ Layer 1: INTER-REALM SCHEDULING │
│ │
│ Mechanism: seL4 MCS scheduler │
│ Granularity: Per-realm CPU budgets │
│ Fairness: Policy-defined (min/max budgets) │
│ Control: seL4 kernel (hardware-enforced) │
│ │
└─────────────────────────────────────────────────────────────────────┘
Key insight: Processes cooperate within a realm (reductions),
but the kernel preempts realms (MCS budgets). A misbehaving
process can't starve its realm's other processes, and a
misbehaving realm can't starve other realms.
Work Stealing (Multi-Worker)¶
If a realm has multiple workers (TCBs), they can steal work from each other:
WORK STEALING (Optional, if multiple workers per realm)
════════════════════════════════════════════════════════════════════════════════
Each worker has a local run queue:
- Owner pushes to back, pops from front (FIFO - round-robin fairness)
- Thieves steal from back (most recently added)
fn worker_loop(worker_id):
loop {
// Try local queue first
proc = local_queue.pop_front()
if proc is None {
// Try stealing from other workers
proc = steal_from_others(worker_id)
}
if proc is None {
// All queues empty - wait for work
wait_for_work()
continue
}
run_process(proc)
}
fn steal_from_others(thief_id) -> Option<Process>:
// Random start to avoid thundering herd
start = random() % num_workers
for i in 0..num_workers:
victim = (start + i) % num_workers
if victim == thief_id:
continue
if let Some(proc) = workers[victim].queue.steal_back():
return Some(proc)
None
Message Passing¶
Processes communicate exclusively through message passing. No shared mutable state.
Intra-Realm Messages (Fast Path)¶
Messages within the same realm use deep copy to the receiver's heap:
INTRA-REALM MESSAGE PASSING
════════════════════════════════════════════════════════════════════════════════
Cost: ~100-500 ns (deep copy)
sender receiver
│ │
│ (send receiver-pid [:ok data]) │
│ │
├──────────────────────────────────►│
│ │
│ 1. Try to acquire receiver's │
│ heap lock │
│ │
│ 2a. If lock acquired: │
│ Deep copy to receiver's heap │
│ (direct write) │
│ │
│ 2b. If lock busy: │
│ Allocate heap fragment │
│ Copy to fragment │
│ Link fragment to receiver │
│ │
│ 3. Enqueue in receiver's │
│ mailbox (lock-free MPSC) │
│ │
│ 4. If receiver waiting, │
│ wake it up │
│ │
Deep copy ensures:
- No shared mutable state
- Receiver owns message completely
- Sender can modify original after send
- GC of sender doesn't affect receiver
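Putting steps 1-4 together, the send path is roughly the control flow below: try the receiver's heap lock, fall back to a heap fragment on contention, then enqueue and wake. All types here are stand-ins; the real copy routine walks and deep-copies the term graph:

```rust
use std::sync::{mpsc, Mutex};

// Placeholder for a Lonala term; real terms are tagged heap words.
#[derive(Clone)]
struct Value;

struct ReceiverHeap; // young heap, old heap, fragment list, ...

struct Receiver {
    heap: Mutex<ReceiverHeap>,    // the per-process "heap lock"
    mailbox: mpsc::Sender<Value>, // stand-in for the lock-free MPSC mailbox
}

/// Intra-realm send: the sender always deep-copies and never blocks on the
/// receiver; contention only changes where the copy lands.
fn send(receiver: &Receiver, msg: &Value) {
    let copied = match receiver.heap.try_lock() {
        // 2a. Heap lock free: deep copy directly into the receiver's heap.
        Ok(_heap_guard) => msg.clone(),
        // 2b. Heap lock busy: deep copy into a heap fragment (m-buf) and
        //     link it onto the receiver's mbuf_list with an atomic push.
        Err(_busy) => msg.clone(),
    };
    // 3. Enqueue the copied term in the receiver's mailbox.
    let _ = receiver.mailbox.send(copied);
    // 4. If the receiver is blocked in `receive`, sending wakes it up.
}
```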
Inter-Realm Messages (Kernel Path)¶
Messages between realms require seL4 IPC and serialization:
INTER-REALM MESSAGE PASSING
════════════════════════════════════════════════════════════════════════════════
Cost: ~1-10 µs (serialization + IPC)
Realm A seL4 Kernel Realm B
│ │ │
│ (send pid [:ok data]) │ │
│ │ │
│ 1. Serialize message │ │
│ to IPC buffer │ │
│ │ │
├─────seL4_Call──────────►│ │
│ │ │
│ ├──────seL4_Recv───────►│
│ │ │
│ │ 2. Deserialize from │
│ │ IPC buffer │
│ │ │
│ │ 3. Deep copy to │
│ │ receiver's heap │
│ │ │
Inter-realm IPC requires:
- Realm A has endpoint capability for Realm B
- Serialization format (TBD - likely compact binary)
- Deserialization + deep copy on receive
Mailbox Implementation¶
Each process has a lock-free MPSC (multiple-producer, single-consumer) queue:
MAILBOX (Lock-Free MPSC Queue)
════════════════════════════════════════════════════════════════════════════════
struct Mailbox {
head: AtomicPtr<Message>, // Producers push here
tail: *mut Message, // Consumer pops here
}
struct Message {
next: *mut Message,
sender: ProcessId,
data: Value, // Deep-copied to receiver's heap
}
Push (any process can send):
────────────────────────────────────────────────────────────────────────────
fn push(mailbox, msg):
msg.next = null
prev = atomic_exchange(&mailbox.head, msg)
prev.next = msg // Linearization point
Pop (only owner process):
────────────────────────────────────────────────────────────────────────────
fn pop(mailbox) -> Option<Message>:
tail = mailbox.tail
next = tail.next
if next is null:
return None
mailbox.tail = next
return Some(next)
Selective Receive¶
Processes can wait for messages matching a pattern:
;; Wait for specific message pattern
(receive
[:ok result] (handle-success result)
[:error reason] (handle-error reason)
:timeout 5000 (handle-timeout))
;; Messages not matching patterns stay in mailbox
;; for later receive calls (selective receive)
SELECTIVE RECEIVE
════════════════════════════════════════════════════════════════════════════════
Process mailbox: [msg1] [msg2] [msg3] [msg4]
(receive [:ok result] ...)
1. Check msg1 against pattern [:ok result]
- No match, skip
2. Check msg2 against pattern [:ok result]
- Match! Extract result, remove msg2
3. Return to process with bound 'result'
Mailbox after: [msg1] [msg3] [msg4]
(msg2 was consumed)
Non-matching messages remain for future receives.
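The mailbox scan behind receive can be sketched as a linear walk that removes the first matching message and leaves the rest queued. Pattern matching is reduced to a predicate and the timeout/blocking path is omitted:

```rust
use std::collections::VecDeque;

struct Message {
    data: String, // placeholder for a Lonala term
}

/// Selective receive: take the first message the pattern accepts and leave
/// everything else queued for later `receive` calls. If nothing matches,
/// the real VM records `waiting_pattern` and blocks instead of returning.
fn selective_receive<F>(mailbox: &mut VecDeque<Message>, matches: F) -> Option<Message>
where
    F: Fn(&Message) -> bool,
{
    let pos = mailbox.iter().position(|m| matches(m))?;
    mailbox.remove(pos) // non-matching messages before `pos` stay queued
}

fn main() {
    let mut mailbox: VecDeque<Message> = ["msg1", "[:ok 42]", "msg3", "msg4"]
        .into_iter()
        .map(|s| Message { data: s.to_string() })
        .collect();

    // Analogue of (receive [:ok result] ...): match an :ok-shaped message.
    let got = selective_receive(&mut mailbox, |m| m.data.starts_with("[:ok"));
    assert_eq!(got.unwrap().data, "[:ok 42]");
    assert_eq!(mailbox.len(), 3); // msg1, msg3, msg4 remain for later receives
}
```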
Process Linking and Monitoring¶
Processes can be notified when other processes exit.
Links (Bidirectional)¶
PROCESS LINKS
════════════════════════════════════════════════════════════════════════════════
(spawn-link (fn [] (worker-loop)))
Creates bidirectional link:
Process A ◄────link────► Process B
If A crashes:
B receives exit signal (crashes too, unless trapping)
If B crashes:
A receives exit signal (crashes too, unless trapping)
Use case: Supervisor trees, coordinated shutdown
Monitors (Unidirectional)¶
PROCESS MONITORS
════════════════════════════════════════════════════════════════════════════════
(spawn-monitor (fn [] (worker-loop)))
Creates unidirectional monitor:
Process A ────monitors────► Process B
If B crashes:
A receives [:DOWN ref pid reason] message
If A crashes:
Nothing happens to B (unidirectional)
Use case: Watching without crashing together
Exit Signals¶
EXIT SIGNAL PROPAGATION
════════════════════════════════════════════════════════════════════════════════
Process exits with reason:
Normal exit (:normal):
- Links NOT notified (clean shutdown)
- Monitors receive [:DOWN ref pid :normal]
Crash exit (:error, exception, etc.):
- Links receive exit signal
- Linked processes crash (unless trapping exits)
- Monitors receive [:DOWN ref pid reason]
Trapping exits:
(process-flag :trap-exit true)
- Converts exit signals to messages
- Process receives [:EXIT from-pid reason]
- Used by supervisors to handle child crashes
Differences from BEAM¶
Lonala follows BEAM's process model closely. The key differences are due to running on seL4 rather than a traditional OS:
| Aspect | BEAM/Erlang | Lonala |
|---|---|---|
| Kernel | OS process on Linux/Windows | seL4 microkernel (formally verified) |
| Isolation boundary | OS process | Realm (seL4 protection domain) |
| Inter-node messaging | Erlang distribution protocol | seL4 IPC + serialization |
| CPU scheduling | BEAM scheduler only | Three layers: seL4 MCS → workers → processes |
| Memory protection | None between processes | Realm boundaries enforced by hardware |
| Hot code loading | Full OTP support | Var rebinding (simpler, no code_change callbacks) |
The core process semantics are identical:
- Per-process heap with generational GC
- Message passing with deep copy
- Links and monitors
- Selective receive
- Reduction-based preemption
Summary¶
| Aspect | Description |
|---|---|
| Process creation | ~1-10 µs, pure userspace |
| Initial heap size | ~2 KB (configurable, enables millions of processes) |
| Heap structure | Two heaps: young (stack + objects) and old (promoted) |
| Heap growth | Fibonacci sequence then 20% increments |
| Memory allocation | Per-worker allocator instances, lock-free for common case |
| GC model | Generational copying, per-process, no global pauses |
| Minor GC | Promotes live young objects to old heap, ~10-100 µs pause |
| Major GC | Full sweep of both heaps, ~100 µs - few ms pause |
| Large binaries | Reference counted, stored in realm-wide binary heap |
| Message passing | Deep copy (heap fragments if contention), lock-free mailbox |
| Intra-realm latency | ~100-500 ns (deep copy) |
| Inter-realm latency | ~1-10 µs (serialization + IPC) |
| Scheduling | Reduction-based cooperative + MCS preemptive |
| Fault isolation | Crash affects only linked processes |