Sidecar Context Architectures for Model Portability
A systems design problem: how do you maintain persistent state across stateless inference runtimes? This is less about AI and more about building durable architectures when your compute layer is fundamentally ephemeral.
The Statefulness Problem
Modern inference runtimes are stateless by design. Every request reconstructs context from scratch. This is the same pattern we solved in web services with session stores, caches, and databases. Why are we relearning it?
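To make the analogy concrete, here is a minimal sketch of that familiar web pattern: a stateless handler that rebuilds per-request state from an external store. All names here (`SessionStore`, `handle_request`) are illustrative, not a real framework API.

```python
# A stateless handler plus an external session store: the handler holds
# no state between calls; the store outlives every invocation.

class SessionStore:
    """External persistence layer; survives across requests."""
    def __init__(self):
        self._sessions = {}

    def load(self, session_id):
        # Reconstruct (or initialize) state for this session.
        return self._sessions.setdefault(session_id, {"history": []})

    def save(self, session_id, state):
        self._sessions[session_id] = state


def handle_request(store, session_id, message):
    # The handler itself is stateless: everything it knows about the
    # session comes from the store, and everything new goes back to it.
    state = store.load(session_id)
    state["history"].append(message)
    store.save(session_id, state)
    return len(state["history"])
```

The sidecar thesis below is this same pattern applied to inference: the runtime plays the role of the stateless handler.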
The Problem
When you swap between different models locally, all accumulated context evaporates. Retrieval systems fetch fragments. Memory systems inject tokens. But neither maintains relational structure across runtime switches.
Sidecar as a Persistence Layer
What if there were a persistent graph that lives alongside your local models? A sidecar architecture that maintains entity state, timeline, and retrieval policies independent of which runtime is active.
The Architecture
User
  ↓
Inference Runtime (Ollama)
  ↓
Context Sidecar
  ├── Semantic Graph
  ├── Entity Memory
  ├── Timeline State
  └── Retrieval Policies
The sidecar intercepts requests, enriches context from the graph, and persists new relationships back. Runtime-agnostic. Local-first. The graph survives runtime switches, updates, even complete swaps.
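The intercept–enrich–persist loop can be sketched in a few lines. This is a hypothetical shape, not a real API: `ContextSidecar` and the `run_inference` callback are assumed names, and the graph is reduced to an in-memory set of triples for illustration.

```python
# Sketch of the sidecar loop: intercept a request, enrich it from a
# persistent graph, call whichever runtime is active, persist new
# relationships back. The graph lives outside any model process, so it
# survives runtime switches and swaps.

class ContextSidecar:
    def __init__(self):
        # (subject, relation, object) triples standing in for the graph.
        self.triples = set()

    def enrich(self, prompt, entity):
        # Pull every known fact about the entity into the prompt.
        facts = [f"{s} {r} {o}" for (s, r, o) in self.triples if s == entity]
        context = "\n".join(sorted(facts))
        return f"{context}\n\n{prompt}" if context else prompt

    def persist(self, new_triples):
        # Write relationships extracted from the exchange back to the graph.
        self.triples.update(new_triples)

    def query(self, prompt, entity, run_inference):
        # run_inference is the active runtime (Ollama today, anything
        # tomorrow); the sidecar is indifferent to which one it is.
        return run_inference(self.enrich(prompt, entity))
```

Swapping runtimes means swapping the `run_inference` callable; everything the sidecar has accumulated carries over unchanged.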
Retrieval vs Structure
Vector similarity finds related chunks but doesn't encode why they're related. The graph stores relationships explicitly: causality, temporal ordering, entity connections. Any runtime can reason over structure, not just surface similarity.
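The difference between similarity and structure fits in a tiny sketch, assuming a graph whose edges carry an explicit relation type. Names (`TypedGraph`, the relation labels) are illustrative.

```python
# Edges carry an explicit relation type, so a runtime can follow causal
# or temporal links deliberately instead of inferring relatedness from
# embedding distance.

from collections import defaultdict

class TypedGraph:
    def __init__(self):
        # node -> list of (relation, neighbor) pairs
        self.edges = defaultdict(list)

    def add(self, src, relation, dst):
        self.edges[src].append((relation, dst))

    def follow(self, src, relation):
        """Neighbors reachable via one explicitly typed edge."""
        return [dst for (rel, dst) in self.edges[src] if rel == relation]

g = TypedGraph()
g.add("deploy_failed", "caused_by", "bad_config")
g.add("deploy_failed", "preceded_by", "config_change")
```

A vector index would tell you `deploy_failed` and `bad_config` are near each other; the typed edge tells you one caused the other, and any runtime can act on that distinction.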
This is a systems architecture thesis. Local-first compute needs persistent identity infrastructure. The runtimes become interchangeable; the context layer becomes the product.