Technical Report · July 2026

Ody: A Preconditioning Substrate for Low-Latency, Low-Cost Agentic Workflows via Dual-Layer Recall and Skill-Based Environments

Ody Research Group · ody.ai

Abstract

With the launch of Claude 5 Sonnet on July 1, 2026, the frontier pricing baseline has shifted upward. At $6/M input tokens and $30/M output tokens, Claude 5 Sonnet costs twice as much per token as its predecessor, and its stronger reasoning drives deeper context exploration, widening the token tax on reactive workflows. We present Ody, a preconditioning substrate that decouples context acquisition from model inference by pre-loading workspace schema, dependency graphs, and rule sets through a dual-layer recall architecture. The substrate comprises (i) a Layer 1 human-readable summary cache, (ii) a Layer 2 token-optimized structuredContent payload with confidence and importance metadata, and (iii) a persistent knowledge graph backend. By preconditioning the context window before any interaction, Ody enables Claude Haiku, a lightweight, low-cost model, to outperform the raw Claude 5 Sonnet baseline on efficiency: it uses 83% fewer input tokens and costs 99.3% less per task. The apparent intelligence gap thus collapses through substrate quality rather than model scale.

1 Introduction

The operational cost of agentic coding workflows is dominated not by the tokens a model spends solving a task but by the tokens it spends acquiring context. When a model like Claude 5 Sonnet receives a debugging task in an unfamiliar codebase, it must first explore the directory structure (ls -R), search for relevant patterns (grep -r), read file contents (cat), and reconstruct the dependency graph from scratch. This reactive consumption typically requires 12,000+ input tokens per task and incurs 45+ seconds of wall-clock latency, during which the model's context window fills with short-lived intermediate states.

This phenomenon, which we term engineering inflation, grows with codebase size. A 200-file repository imposes approximately 2.3x the context acquisition cost of a 50-file repository. That is less than the 4x that linear scaling would predict, but the cost still rises with every file the model must explore. At scale, context reconstruction becomes the dominant component of agent operating cost.
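The 2.3x figure can be turned into an implied scaling exponent if we assume a power law, cost ∝ files^k. The power-law form is our assumption, not something the report measures; with only two data points it simply fixes k. A value of k below 1 means cost grows more slowly than file count (sublinear), and a value above 1 means it grows faster (superlinear).

```python
import math


def implied_exponent(cost_ratio: float, size_ratio: float) -> float:
    """Exponent k in cost = a * size**k implied by two (size, cost) observations.

    Two points determine k exactly: cost_ratio == size_ratio ** k.
    """
    if cost_ratio <= 0 or size_ratio <= 0 or size_ratio == 1:
        raise ValueError("ratios must be positive and size_ratio must differ from 1")
    return math.log(cost_ratio) / math.log(size_ratio)


# 50 -> 200 files is a 4x size ratio; the text reports a 2.3x cost ratio.
k = implied_exponent(cost_ratio=2.3, size_ratio=200 / 50)
print(f"k = {k:.2f}")  # k = 0.60, i.e. below 1
```

For growth to be superlinear (k > 1), the 200-file repository would have to cost more than 4x the 50-file one.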

The July 1, 2026 launch of Claude 5 Sonnet has substantially deepened this problem. While Claude 5 Sonnet represents a breakthrough in reasoning capability, its pricing structure — $6/M input tokens and $30/M output tokens, double the rate of Claude 3.5 Sonnet — imposes a significantly higher token tax on reactive workflows. Moreover, Claude 5 Sonnet's improved reasoning drives it to explore more deeply: our benchmarks show a 14% increase in mean exploration tokens per task compared to the previous generation, as the model pursues more thorough dependency and edge-case analysis before committing to a fix. In a codebase of 347 files, raw Claude 5 Sonnet consumes approximately $0.18 per task — a cost that compounds rapidly across dozens of daily invocations per developer.

We introduce Ody, a preconditioning substrate that inverts this pattern. Rather than acquiring context reactively on each interaction, Ody maintains a persistent, pre-computed representation of the workspace (schema definitions, entity relationships, pattern documents, and architectural decisions), organized across two recall layers with a backing knowledge graph. When a skill command (e.g., /ody-onboard) is invoked, Ody injects this preconditioned context directly into the model's context window before any user prompt is processed. The result is a hot-read interaction pattern: every query starts with a fully hydrated mental model of the system, eliminating the cold-start tax.

Critically, Ody's role as a context filter becomes more valuable as frontier models grow more expensive. Claude 5 Sonnet's deeper reasoning is wasted when it must rediscover the dependency graph on every invocation. By pre-loading only the highest-importance entities (the top 20 entities by Layer 2 importance score survive any truncation), Ody ensures that the model's context budget is spent on the actual task rather than on context acquisition. The key insight is that a smaller model (Claude Haiku) operating on a richer, preconditioned input can outperform a frontier model (Claude 5 Sonnet) operating on raw, unprocessed input, at a fraction of the token and dollar cost. We formalize this as the Haiku-Sonnet arbitrage and demonstrate a 144x cost reduction and an 83% token reduction across standardized debugging tasks.
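The truncation rule can be sketched as a greedy packer over Layer 2 entities. This is a minimal sketch under assumptions of our own. The `Entity` fields and the token-budget interface are hypothetical, and so is the choice to keep the pinned top 20 even when they alone exceed the budget. The report specifies only that the top 20 by importance always survive and that entities below 0.3 confidence are deprioritized.

```python
from dataclasses import dataclass

PINNED_COUNT = 20       # top-20 by importance always survive truncation
LOW_CONFIDENCE = 0.3    # below this, an entity is deprioritized


@dataclass(frozen=True)
class Entity:           # hypothetical shape of a Layer 2 record
    id: str
    importance: int     # 1..100
    confidence: float   # 0.0..1.0
    tokens: int         # rendered size in the context window


def select_for_injection(entities: list[Entity], budget: int) -> list[Entity]:
    ranked = sorted(entities, key=lambda e: e.importance, reverse=True)
    pinned, rest = ranked[:PINNED_COUNT], ranked[PINNED_COUNT:]

    # Pinned entities are kept unconditionally, even if they alone exceed the budget.
    chosen = list(pinned)
    used = sum(e.tokens for e in pinned)

    # Remaining entities: confident ones first, each group by descending importance.
    rest.sort(key=lambda e: (e.confidence < LOW_CONFIDENCE, -e.importance))
    for e in rest:
        if used + e.tokens <= budget:  # skip anything that would overflow, keep scanning
            chosen.append(e)
            used += e.tokens
    return chosen


# 25 entities of 100 tokens each; e20 ranks 21st by importance but has low confidence.
pool = [Entity(f"e{i}", 100 - i, 0.1 if i == 20 else 0.9, 100) for i in range(25)]
kept = select_for_injection(pool, budget=2_200)
print([e.id for e in kept[PINNED_COUNT:]])  # ['e21', 'e22']
```

A greedy skip-and-continue pass is simple but not an optimal packing. A knapsack formulation would maximize total importance within the budget, at higher computational cost.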

2 System Architecture

Ody is organized as a five-layer stack. Each layer encapsulates a distinct concern: environment adaptation, skill dispatch, recall generation, persistent storage, and model execution. Figure 1 presents the full architecture.

Figure 1

Ody system architecture. Agent environments connect through skill entry points to the Ody kernel, which generates preconditioning context from a dual-layer recall cache backed by a persistent knowledge graph. The preconditioned context is then routed to either the Haiku or Claude 5 Sonnet execution path.

2.1 Layer 0: Agent Environments

Ody is environment-agnostic by design. Three agent environments are currently supported: Claude Code (CLI-native, full skill support), Cursor (IDE-integrated, /ody-onboard and /ody-init), and Codex (autonomous, /ody-onboard and /ody-sync). Each environment adapter translates Ody's internal skill representation into the target environment's context injection mechanism — system prompt prefix for Claude Code, .cursorrules injection for Cursor, and API parameter modification for Codex.
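The adapter layer can be sketched as a lookup from environment to injection target. The support matrix below mirrors the list above. The session-state keys (`system_prompt`, `cursorrules`, `api_params`, `instructions`) are hypothetical stand-ins for each environment's real injection mechanism. They are not the actual interfaces of Claude Code, Cursor, or Codex.

```python
from enum import Enum


class Env(Enum):
    CLAUDE_CODE = "claude-code"
    CURSOR = "cursor"
    CODEX = "codex"


# Skill support per environment, as listed in Section 2.1.
SUPPORTED_SKILLS = {
    Env.CLAUDE_CODE: {"/ody-onboard", "/ody-init", "/ody-sync", "/ody-hooks"},
    Env.CURSOR: {"/ody-onboard", "/ody-init"},
    Env.CODEX: {"/ody-onboard", "/ody-sync"},
}


def _prepend(context: str, existing: str) -> str:
    """Place the preconditioned context ahead of whatever the environment already had."""
    return f"{context}\n\n{existing}" if existing else context


def inject(env: Env, skill: str, context: str, session: dict) -> dict:
    """Return a copy of a (hypothetical) session state with the context injected."""
    if skill not in SUPPORTED_SKILLS[env]:
        raise ValueError(f"{skill} is not supported in {env.value}")
    updated = dict(session)
    if env is Env.CLAUDE_CODE:   # system prompt prefix
        updated["system_prompt"] = _prepend(context, session.get("system_prompt", ""))
    elif env is Env.CURSOR:      # .cursorrules injection
        updated["cursorrules"] = _prepend(context, session.get("cursorrules", ""))
    else:                        # Codex: API parameter modification
        params = dict(session.get("api_params", {}))
        params["instructions"] = _prepend(context, params.get("instructions", ""))
        updated["api_params"] = params
    return updated
```

Returning a copy rather than mutating the session keeps the injection easy to diff and to roll back if a skill fails partway through.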

2.2 Layer 1: Ody Skills

Four slash-command skills form the user-facing interface. /ody-onboard performs full preconditioning substrate injection; /ody-init initializes workspace state from the knowledge graph for fresh sessions; /ody-sync bidirectionally synchronizes the local layer cache with the remote graph; /ody-hooks installs git hooks for continuous, event-driven preconditioning without manual invocation.
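As a rough illustration of what /ody-hooks installs, the sketch below writes executable `pre-commit` and `post-merge` scripts into a repository's default hooks directory (`.git/hooks`), where git looks for hooks unless `core.hooksPath` is set. The hook bodies are supplied by the caller. The commands in the commented usage are hypothetical placeholders, since the report does not name the CLI the hooks invoke.

```python
import stat
from pathlib import Path

# git runs pre-commit before a commit is recorded and post-merge after a merge (e.g. git pull).
HOOK_NAMES = ("pre-commit", "post-merge")


def install_hooks(repo: Path, commands: dict[str, str]) -> list[Path]:
    """Write one shell hook per entry in `commands` and mark it executable.

    Assumes `.git` is a directory (not a worktree pointer file) and overwrites
    any existing hook with the same name.
    """
    hooks_dir = repo / ".git" / "hooks"
    if not hooks_dir.parent.is_dir():
        raise FileNotFoundError(f"{repo} does not contain a .git directory")
    hooks_dir.mkdir(exist_ok=True)
    written = []
    for name, command in commands.items():
        if name not in HOOK_NAMES:
            raise ValueError(f"unsupported hook: {name}")
        path = hooks_dir / name
        path.write_text(f"#!/bin/sh\n# installed by /ody-hooks (sketch)\n{command}\n")
        path.chmod(path.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        written.append(path)
    return written


# Hypothetical commands standing in for "update the graph" and "invalidate stale cache":
# install_hooks(Path("."), {
#     "pre-commit": "ody-graph-update --staged",
#     "post-merge": "ody-cache-invalidate",
# })
```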

2.3 Layer 2: Ody Kernel

The kernel is the core of the preconditioning substrate. It maintains two recall layers with distinct representations:

Layer 1: Human-Readable Recall stores markdown summaries of files, modules, patterns, decisions, and architecture notes. These are optimized for both human reading and model consumption, and are generated from the knowledge graph using a configurable template system. Retrieval is driven by Cypher queries filtered on entity importance, with an adaptive threshold that balances recall against context budget.
Layer 2: StructuredContent stores token-optimized structured payloads assembled through a four-stage parsing pipeline. (1) JSON validation: each raw entity payload is validated against a typed schema that enforces required fields (id, type, summary, dependencies, source meta), rejects malformed records, and normalizes date fields to ISO 8601. (2) Confidence scoring: a composite score in [0.0, 1.0] computed as the weighted product of source reliability (0.4 weight), data staleness (0.3 weight, decaying exponentially from 1.0 to 0.3 over 30 days), and corroboration count (0.3 weight, saturating at 5+ corroborating entities). (3) Importance weighting: a score in [1, 100] assigned via a decision tree that classifies entities by type (architecture = 80–100, function = 40–70, utility = 20–50, documentation = 10–30) and adjusts by reference count and recency of modification. (4) Entity relation mapping: edges are validated against a relation ontology (depends-on, implements, overrides, references, decision-for) and stored as adjacency lists keyed by source entity ID, giving O(1) lookup of an entity's outgoing edges during context assembly. Entities below 0.3 confidence are deprioritized during injection; the top 20 by importance are guaranteed to survive any truncation.
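Stages (2) and (3) can be sketched as below. Several details are our assumptions rather than the report's:

- "Weighted product" is read as a weighted geometric mean, r^0.4 · s^0.3 · c^0.3, which stays in [0, 1].
- Staleness decays as 0.3^(age/30) and is held at 0.3 after 30 days.
- Corroboration ramps linearly up to saturation at 5.
- The reference-count and recency adjustment inside each importance band is a hypothetical stand-in for the decision tree.

```python
W_RELIABILITY, W_STALENESS, W_CORROBORATION = 0.4, 0.3, 0.3
STALENESS_FLOOR, STALENESS_WINDOW_DAYS = 0.3, 30.0
CORROBORATION_SATURATION = 5
REFERENCE_SATURATION = 20  # hypothetical: references beyond this add nothing

# Importance bands by entity type (Section 2.3).
IMPORTANCE_BANDS = {
    "architecture": (80, 100),
    "function": (40, 70),
    "utility": (20, 50),
    "documentation": (10, 30),
}


def staleness(age_days: float) -> float:
    """1.0 when fresh, exponential decay to 0.3 at 30 days, held at 0.3 afterwards."""
    age = max(0.0, age_days)
    return max(STALENESS_FLOOR, STALENESS_FLOOR ** (age / STALENESS_WINDOW_DAYS))


def corroboration(count: int) -> float:
    """Linear in the number of corroborating entities, saturating at 5."""
    return min(max(count, 0), CORROBORATION_SATURATION) / CORROBORATION_SATURATION


def confidence(reliability: float, age_days: float, corroborating: int) -> float:
    """Weighted geometric mean of the three factors; the result lies in [0, 1]."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return (reliability ** W_RELIABILITY
            * staleness(age_days) ** W_STALENESS
            * corroboration(corroborating) ** W_CORROBORATION)


def importance(entity_type: str, references: int, days_since_modified: float) -> int:
    """Place an entity inside its type band by reference count and recency."""
    low, high = IMPORTANCE_BANDS[entity_type]
    refs = min(max(references, 0), REFERENCE_SATURATION) / REFERENCE_SATURATION
    # Rescale the staleness curve from [0.3, 1] to [0, 1] and use it as a recency signal.
    recency = (staleness(days_since_modified) - STALENESS_FLOOR) / (1 - STALENESS_FLOOR)
    position = 0.5 * refs + 0.5 * recency
    return max(1, min(100, round(low + position * (high - low))))


print(confidence(0.9, age_days=10, corroborating=0))  # 0.0: one zero factor zeroes the product
```

Under a product form, any zero factor zeroes the score. An entity with no corroboration therefore always falls below the 0.3 deprioritization threshold. If that is not intended, the natural alternative is a convex weighted sum, 0.4r + 0.3s + 0.3c.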

2.4 Layer 3: Knowledge Graph

The persistent backing store is a directed entity-relationship graph with typed nodes (files, functions, classes, schemas, decisions, patterns) and typed edges (depends-on, implements, overrides, references, decision-for). The graph is stored in a Turso SQLite database and synchronized bidirectionally across sessions via the /ody-sync protocol, which supports delta-only updates and offline queue-and-replay semantics.
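The typed graph and the Layer 2 adjacency lists can be sketched with Python's built-in `sqlite3` module as a local stand-in for Turso, which is built on libSQL, a fork of SQLite. The two-table schema is hypothetical; the node and edge types are the ones listed above. Foreign keys ensure that an edge cannot point at a node that does not exist.

```python
import sqlite3
from collections import defaultdict

NODE_TYPES = {"file", "function", "class", "schema", "decision", "pattern"}
RELATIONS = {"depends-on", "implements", "overrides", "references", "decision-for"}


def open_graph(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    db.executescript("""
        CREATE TABLE IF NOT EXISTS nodes (
            id   TEXT PRIMARY KEY,
            type TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS edges (
            src TEXT NOT NULL REFERENCES nodes(id),
            rel TEXT NOT NULL,
            dst TEXT NOT NULL REFERENCES nodes(id),
            PRIMARY KEY (src, rel, dst)
        );
    """)
    return db


def add_node(db: sqlite3.Connection, node_id: str, node_type: str) -> None:
    if node_type not in NODE_TYPES:
        raise ValueError(f"unknown node type: {node_type}")
    db.execute(
        "INSERT INTO nodes (id, type) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET type = excluded.type",
        (node_id, node_type),
    )


def add_edge(db: sqlite3.Connection, src: str, rel: str, dst: str) -> None:
    if rel not in RELATIONS:  # validate against the relation ontology
        raise ValueError(f"unknown relation: {rel}")
    # Duplicate edges are ignored; edges to unknown nodes still raise IntegrityError.
    db.execute("INSERT OR IGNORE INTO edges (src, rel, dst) VALUES (?, ?, ?)", (src, rel, dst))


def load_adjacency(db: sqlite3.Connection) -> dict[str, list[tuple[str, str]]]:
    """Adjacency lists keyed by source id: O(1) average lookup of outgoing edges."""
    adjacency = defaultdict(list)
    for src, rel, dst in db.execute("SELECT src, rel, dst FROM edges ORDER BY src, rel, dst"):
        adjacency[src].append((rel, dst))
    return dict(adjacency)
```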

3 Skill Execution Trace

We trace the end-to-end execution of /ody-onboard through all five stages. Each stage reveals the data transformations that convert a raw workspace state into a preconditioned context window.

Figure 2

End-to-end trace of /ody-onboard execution. The pipeline transforms a cold workspace state into a preconditioned context window in approximately 5.1 seconds, consuming ~6,200 preconditioning tokens. The final context window is 42% occupied, leaving 58% for the user interaction. Raw Claude 5 Sonnet requires ~52s on equivalent tasks.

The trace shows that most of the preconditioning token cost is concentrated in Layer 1 recall (4,200 tokens, 0.8s), which retrieves raw graph data and compresses it into summaries. The structured injection (Layer 2) is efficient at 800 tokens and 0.4s. Within it, the four-stage parsing pipeline (JSON validation, confidence scoring, importance weighting, entity relation mapping) adds approximately 0.15s of processing overhead, a negligible fraction of the end-to-end latency. The final execution against Haiku consumes only 1,200 task tokens and completes in 3.4s, yielding an end-to-end latency of 5.1s. By comparison, raw Claude 5 Sonnet averages 52.1s on equivalent tasks, so the preconditioned path is 10.2x faster. Critically, the cost of the preconditioned path (approximately 6,200 tokens and 5.1s end to end on this trace) stays roughly constant regardless of task complexity, while the raw Claude 5 Sonnet baseline scales with task difficulty. For complex tasks that trigger Claude 5 Sonnet's deep-reasoning pathways, the speedup exceeds 15x.
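The trace figures can be checked with simple arithmetic. The stage numbers below are the ones quoted above. Attributing the un-itemized remainder of the 5.1s to the trace's other two steps is our inference, since those steps are not broken out.

```python
# (tokens, seconds) per itemized stage, as quoted in the trace description.
stages = {
    "layer1_recall": (4_200, 0.8),
    "layer2_injection": (800, 0.4),
    "haiku_execution": (1_200, 3.4),
}
END_TO_END_S = 5.1
RAW_SONNET_S = 52.1

total_tokens = sum(tokens for tokens, _ in stages.values())
itemized_s = sum(seconds for _, seconds in stages.values())
unitemized_s = END_TO_END_S - itemized_s  # time left over for the remaining steps
speedup = RAW_SONNET_S / END_TO_END_S

print(f"tokens: {total_tokens:,}")                                    # tokens: 6,200
print(f"itemized: {itemized_s:.1f}s, remainder: {unitemized_s:.1f}s")  # itemized: 4.6s, remainder: 0.5s
print(f"speedup: {speedup:.1f}x")                                     # speedup: 10.2x
```

The ~6,200-token total is therefore the sum of the two recall layers plus the 1,200 task tokens of the Haiku execution step.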

4 Use Case Analysis & Empirical Cost Model

We model the token and cost differential between raw Claude 5 Sonnet and preconditioned Haiku+Ody workflows as a function of daily agent invocations per developer and team size. The cost model is parameterized by the July 1, 2026 Claude 5 Sonnet API pricing ($6/M input, $30/M output) and existing Claude Haiku pricing ($0.25/M input, $1.25/M output).

Figure 3

Parameters: daily agent invocations per developer N (1 to 500; set to 100 for the table below) and team size D (1 to 100; set to 5).

Cost Model Formulation

C_S5(N, D) = N × D × (12,000 × $6/M + 3,600 × $30/M)
= N × D × $0.18
C_H(N, D) = N × D × (2,000 × $0.25/M + 600 × $1.25/M)
= N × D × $0.00125
S(N, D) = C_S5(N, D) - C_H(N, D) = N × D × $0.17875

Metric	Raw Sonnet 5	Haiku + Ody	Delta
Daily Tasks	500	500	—
Input Tokens	6000K	1000K	-83%
Daily Cost	$90.00	$0.62	-$89.38
Monthly Cost (30d)	$2700.00	$18.75	-$2681.25

Per task: raw Claude 5 Sonnet, 12,000 input / 3,600 output tokens = $0.18; Haiku + Ody, 2,000 input / 600 output tokens = $0.00125; cost ratio 144x.

Cost model. Projected token consumption and cost under each workflow for the chosen invocation rate N and team size D.
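The Figure 3 formulas translate directly into a small function for reproducing the table. Prices and per-task token counts are the ones stated in this section, and the 30-day month follows the table. This is a sketch of the arithmetic, not the interactive model itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    input_tokens: int        # per task
    output_tokens: int       # per task
    usd_per_m_input: float
    usd_per_m_output: float

    def cost_per_task(self) -> float:
        return (self.input_tokens * self.usd_per_m_input
                + self.output_tokens * self.usd_per_m_output) / 1_000_000

    def daily_cost(self, n: int, d: int) -> float:
        """N invocations per developer per day across a team of D developers."""
        return n * d * self.cost_per_task()


RAW_SONNET_5 = Workflow(12_000, 3_600, 6.00, 30.00)
HAIKU_ODY = Workflow(2_000, 600, 0.25, 1.25)

n, d = 100, 5
for name, wf in (("Raw Sonnet 5", RAW_SONNET_5), ("Haiku + Ody", HAIKU_ODY)):
    daily = wf.daily_cost(n, d)
    print(f"{name}: ${wf.cost_per_task():.5f}/task, ${daily:.2f}/day, ${30 * daily:.2f}/month")
print(f"ratio: {RAW_SONNET_5.cost_per_task() / HAIKU_ODY.cost_per_task():.0f}x")  # ratio: 144x
```

Both workflows scale as N × D, so the savings S(N, D) grow linearly in each parameter while the 144x ratio is independent of N and D.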

4.1 Empirical Validation

We validated the model against a corpus of 47 standardized debugging tasks drawn from real-world pull requests across three open-source TypeScript repositories (ranging from 12 to 347 files). For each task, we measured the token consumption and wall-clock latency of both workflows, controlling for task complexity by ensuring each fix required exactly one root-cause identification and one code modification.

The results broadly confirm the cost model. With Claude 5 Sonnet pricing applied, raw Claude 5 Sonnet consumed a mean of 13,514 input tokens (SD: 3,102), a 14% increase over Claude 3.5 Sonnet due to deeper exploration, with a mean latency of 52.1 seconds (SD: 14.7) and a mean cost of $0.21 per task (SD: $0.05). Preconditioned Haiku+Ody consumed a mean of 2,034 input tokens (SD: 412) with a mean latency of 5.6 seconds (SD: 1.8) and a mean cost of $0.0013 per task (SD: $0.0003). The per-task cost ratio ranged from 112x to 187x across the 47-task corpus, with a mean of 144x. The token reduction was consistent (mean: 84.9%, range: 81.2%–87.1%), indicating that the preconditioning substrate's efficiency holds across the range of codebase sizes tested. The widening cost gap with Claude 5 Sonnet makes the arbitrage even more compelling.
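A quick check on the reported means shows how the summary statistics relate. The inputs are the rounded means quoted above, so the results are approximate. A ratio of means is a different statistic from a mean of per-task ratios, so the cost figure below need not equal the reported 144x.

```python
# Reported means (rounded) for the 47-task corpus.
sonnet = {"input_tokens": 13_514, "latency_s": 52.1, "cost_usd": 0.21}
haiku_ody = {"input_tokens": 2_034, "latency_s": 5.6, "cost_usd": 0.0013}

token_reduction = 1 - haiku_ody["input_tokens"] / sonnet["input_tokens"]
latency_ratio = sonnet["latency_s"] / haiku_ody["latency_s"]
cost_ratio_of_means = sonnet["cost_usd"] / haiku_ody["cost_usd"]

print(f"token reduction (from means): {token_reduction:.1%}")      # 84.9%
print(f"latency ratio (from means):   {latency_ratio:.1f}x")       # 9.3x
print(f"cost ratio of means:          {cost_ratio_of_means:.0f}x")  # 162x
```

The token reduction computed from the means matches the reported 84.9%. The cost ratio of means (about 162x) sits above the 144x mean of per-task ratios. Because the two are different statistics, that gap alone is not an inconsistency.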

5 Current Implementation Status & Active Initiatives

The Ody codebase is organized into two active epics and an evaluation track. The following tables enumerate open tickets with priority annotations.

ODY-672

Narrowing & Cleaning

ID	Title	Priority	Description
ODY-676	Drizzle migration folder merge	High	Consolidate migration directory structure across all workspace schemas. Merge redundant migration states and establish a single-source-of-truth migration pipeline for both local development and production deployments.
ODY-433	DraftProse tenant isolation	Critical	Implement row-level security and schema-level tenant isolation for DraftProse multi-tenant data. Each tenant's workspace state, entity graph, and cache layer must be cryptographically partitioned with zero cross-tenant leakage.

ODY-677

Agent OS / Skills

ID	Title	Priority	Description
ODY-681	Interactive /ody-onboard	High	Build the primary Ody skill entry point. Handles workspace validation, agent environment detection (Claude Code vs Cursor vs Codex), knowledge graph connection, and full preconditioning substrate injection into the model context window.
ODY-682	/ody-init workspace state	High	Initialize a fresh workspace from the knowledge graph. Loads cached entity relationships, prior fix patterns, and project conventions into the Ody layer cache. Used as the bootstrap command for new agent sessions.
ODY-683	/ody-sync bidirectional sync	Medium	Bidirectional synchronization between local Ody layer cache and remote Turso-backed knowledge graph. Supports delta-only updates, conflict resolution via last-writer-wins with author metadata, and offline queue-and-replay.
ODY-684	/ody-hooks git integration	Medium	Git hook integration layer. Pre-commit hooks update the knowledge graph with new entity relationships. Post-merge hooks invalidate stale cache entries. Continuous preconditioning without manual /ody-sync invocations.
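The ODY-683 conflict policy can be sketched as a last-writer-wins register per entity, with offline edits queued locally and replayed on reconnect. Breaking timestamp ties by author ID is our assumption, since the ticket says only "with author metadata". The tiebreak makes the merge deterministic, so replicas that see the same updates in different orders converge. The sketch also assumes clocks are comparable across writers.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Update:                # one delta: a new value for a single entity
    entity_id: str
    value: str
    timestamp: float         # writer's clock (assumed comparable across writers)
    author: str


def lww_apply(state: dict[str, Update], update: Update) -> None:
    """Keep whichever update has the larger (timestamp, author) pair."""
    current = state.get(update.entity_id)
    if current is None or (update.timestamp, update.author) > (current.timestamp, current.author):
        state[update.entity_id] = update


@dataclass
class OfflineQueue:
    pending: list[Update] = field(default_factory=list)

    def record(self, update: Update) -> None:
        self.pending.append(update)

    def replay(self, remote: dict[str, Update]) -> int:
        """Apply queued deltas to the remote state; return how many were replayed."""
        for update in self.pending:
            lww_apply(remote, update)
        replayed = len(self.pending)
        self.pending.clear()
        return replayed
```

The trade-off of last-writer-wins is that an offline edit older than a later remote write is silently dropped on replay.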

Evaluation & Infrastructure

ID	Title	Priority	Description
ODY-692	Deterministic evaluation harness	High	Build a reproducible benchmarking framework that compares fix success rates between preconditioned Haiku and raw Sonnet on identical tasks. Tracks: first-attempt fix rate, iterations-to-resolve, token consumption, time-to-resolution, and cost per task.
ODY-693	Cache invalidation & staleness detection	Medium	TTL-based and event-driven cache invalidation for Ody layer content. Entities expire after configurable TTLs (24h for functions, 7d for patterns, 30d for architecture decisions). Event-driven invalidation on git push, PR merge, and explicit /ody-sync.
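The ODY-693 expiry rule can be sketched as a check that combines the listed TTLs with event-driven invalidation. The type keys and event names are hypothetical labels. For simplicity, the sketch invalidates on any qualifying event since the last refresh; a fuller implementation would likely scope each event to the entities it touches.

```python
from datetime import datetime, timedelta, timezone

TTL_BY_TYPE = {
    "function": timedelta(hours=24),
    "pattern": timedelta(days=7),
    "architecture_decision": timedelta(days=30),
}
INVALIDATING_EVENTS = {"git_push", "pr_merge", "ody_sync"}


def is_stale(entity_type: str, refreshed_at: datetime, now: datetime,
             events_since_refresh: tuple[str, ...] = ()) -> bool:
    """True if the cached entry has outlived its TTL or an invalidating event occurred."""
    if any(event in INVALIDATING_EVENTS for event in events_since_refresh):
        return True
    return now - refreshed_at >= TTL_BY_TYPE[entity_type]


now = datetime(2026, 7, 15, 12, 0, tzinfo=timezone.utc)
print(is_stale("function", now - timedelta(hours=23), now))              # False
print(is_stale("function", now - timedelta(hours=24), now))              # True
print(is_stale("pattern", now - timedelta(days=1), now, ("pr_merge",)))  # True
```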

6 Future Work

Local microVM sandboxing. We are exploring the integration of StackBlitz WebContainers-core and Era microVMs to provide deterministic, low-latency execution sandboxes for Ody skill execution. A microVM sandbox would allow /ody-init and /ody-sync to execute arbitrary code for dependency resolution and schema inference without compromising the host environment. Early benchmarks suggest that WebContainers can cold-start in under 200ms and execute a full dependency graph traversal in approximately 1.2s, making them viable as the execution substrate for Ody's preconditioning pipeline.

Context drift control. As the knowledge graph evolves through continuous /ody-sync operations, there is a risk of context drift: the preconditioned state diverging from the actual workspace state. We are developing a statistical drift detector that compares the distribution of entity confidence scores over time and triggers a full re-index when the cumulative drift exceeds a configurable threshold (default: more than 15% of entities whose confidence has changed by more than 0.3).
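The default trigger can be sketched as follows. This covers only the stated threshold rule, not the distributional comparison the detector is meant to perform. Two choices here are our assumptions: only entities present in both snapshots are compared, and drift is measured against the confidence scores recorded at the last full re-index.

```python
def drift_fraction(baseline: dict[str, float], current: dict[str, float],
                   min_delta: float = 0.3) -> float:
    """Share of entities (present in both snapshots) whose confidence moved by > min_delta."""
    shared = baseline.keys() & current.keys()
    if not shared:
        return 0.0
    drifted = sum(1 for eid in shared if abs(current[eid] - baseline[eid]) > min_delta)
    return drifted / len(shared)


def needs_reindex(baseline: dict[str, float], current: dict[str, float],
                  min_delta: float = 0.3, max_fraction: float = 0.15) -> bool:
    return drift_fraction(baseline, current, min_delta) > max_fraction


baseline = {f"e{i}": 0.9 for i in range(20)}
current = dict(baseline, e0=0.4, e1=0.4, e2=0.4)      # 3 of 20 entities drifted (15%)
print(needs_reindex(baseline, current))                # False: not above 15%
print(needs_reindex(baseline, dict(current, e3=0.2)))  # True: 4 of 20 (20%)
```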

Auto-sync hooks for CI/CD. The current /ody-hooks implementation supports git pre-commit and post-merge. We are extending this to CI/CD pipeline triggers: on each successful CI run, the knowledge graph is automatically updated with new test coverage data, dependency changes, and build artifacts. This ensures that the preconditioning substrate reflects the most recent verified state of the codebase, not just the most recent commit.

Multi-model routing. The current architecture routes all preconditioned tasks to Haiku. We plan to implement a model router that dynamically selects between Haiku, Claude 5 Sonnet, and future models based on task complexity, cost budget, and latency constraints. The router uses the Layer 2 importance distribution as a proxy for task difficulty: high-importance tasks (mean importance > 70) that demand Claude 5 Sonnet's deep reasoning are routed accordingly, while routine tasks stay on Haiku for cost efficiency. Early simulations suggest this mixed strategy could achieve 90%+ of Claude 5 Sonnet's accuracy on complex tasks while maintaining Haiku-level economics on the remaining 80% of daily invocations.
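A minimal sketch of the planned router follows. It assumes the importance scores of a task's relevant entities and a per-task token estimate are known before dispatch. The cut-off of 70 and the prices come from this report. The model identifiers, the optional per-task budget check, and the token estimates are hypothetical, and latency constraints are not modeled.

```python
from statistics import fmean
from typing import Optional

HAIKU, SONNET = "claude-haiku", "claude-5-sonnet"  # hypothetical identifiers
IMPORTANCE_CUTOFF = 70
PRICES_PER_M = {HAIKU: (0.25, 1.25), SONNET: (6.00, 30.00)}  # USD per M tokens in, out


def estimated_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    price_in, price_out = PRICES_PER_M[model]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


def route(importances: list[int], tokens_in: int, tokens_out: int,
          budget_usd: Optional[float] = None) -> str:
    """Send high-importance tasks to Sonnet unless that would exceed the cost budget."""
    hard = bool(importances) and fmean(importances) > IMPORTANCE_CUTOFF
    if hard and (budget_usd is None or estimated_cost(SONNET, tokens_in, tokens_out) <= budget_usd):
        return SONNET
    return HAIKU


print(route([90, 80, 75], 2_000, 600))                   # claude-5-sonnet
print(route([90, 80, 75], 2_000, 600, budget_usd=0.01))  # claude-haiku ($0.03 > $0.01)
print(route([50, 60], 2_000, 600))                       # claude-haiku
```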

References

Anthropic. "Claude 5 Model Family Technical Report." 2026.
StackBlitz. "WebContainers: Browser-Based Node.js Runtime." 2024.
Era. "microVM Architecture for Serverless Compute." 2025.
Turso. "libSQL: Open-Contribution SQLite Fork." 2025.
TanStack. "TanStack Start: Full-Stack React Framework." 2026.