The Problem: LLM Wait Time
AI coding agents spend most of their time waiting for LLM API responses. Keeping a KVM guest alive, consuming active
RAM and CPU scheduler cycles during this network I/O downtime, is a significant waste of infrastructure resources.
The Solution: What I Built
I built a Rust-based micro-VMM that instantly freezes an executing KVM guest, extracts its vCPU state, and flushes guest
memory directly to block storage via io_uring with zero-copy writes.
When the LLM responds, the VM wakes up and resumes execution exactly where it left off.
suspend_storage — live simulation
[Interactive demo: steps the pipeline Boot → Run → Suspend → io_uring Flush → Restore → Verify against a KVM vCPU running a real-mode payload, showing live RIP/RAX/RBX/RCX values, the zeroed 64 KB guest memory region, and bytes written through io_uring to the vm.snapshot block-storage target.]
The Stack
Minimal VMM
Built with kvm-ioctls and kvm-bindings against /dev/kvm. Guest RAM is explicitly allocated via mmap.
State Extraction
Uses KVM_GET_REGS and KVM_GET_SREGS to serialize the vCPU's architectural state (general-purpose, segment, and control registers), capturing everything needed to pause the guest deterministically.
Zero-Copy I/O
Bypasses intermediate userspace buffering entirely: the VMM's mmap'ed guest memory is registered with the kernel once, and IORING_OP_WRITE_FIXED submissions write guest pages directly from that region to storage, with no copy through a staging buffer.
Verification
The guest runs a bare-metal 16-bit real-mode payload that continuously mutates registers; after wake, the register values match the pre-suspend trajectory exactly, demonstrating exact-state resumption.
Why This Matters for Indexable
This solves the hyper-density problem for AI agents. Conceptually, the io_uring approach used here to write guest state
directly to block storage mirrors how ix.dev pushes state over libfabric/RDMA to its content-addressable storage layer.
The project is built entirely in safe, idiomatic Rust with a strict clippy configuration, and is designed for modern
Linux kernels.