Execution
90-Day Build Plan
Three phases, thirty-four tasks, two clouds. Bedrock + AgentCore on AWS and Foundry Models + Foundry Agent Service on Azure are peer surfaces — model dispatch and the new AgentRuntime substrate method group treat them symmetrically. The full task graph lives in maestro-build-plan.yaml — Claude Code can pull tasks from it directly.
↓ maestro-build-plan.yaml ↓ maestro-architecture.md ↓ maestro-README.md
Three phases
Spine
A continuation can be created, sleep, wake, run a no-op step, and complete. On both clouds. Same test suite passes.
- Define Substrate interface
- Continuation data model & event types
- Azure: Cosmos DB store + lease
- AWS: DynamoDB store + lease
- Append-only event log
- Scheduler v0 (timer wakes)
- Worker pod (no-op tick)
- Public API v0
- Warm store CDC into Redshift / Synapse
- Observability spine
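The Spine exit criterion (create, sleep, wake, run a no-op step, complete) can be read as a small state machine whose every transition is appended to an event log. The following is a minimal sketch under that reading, not the project's data model: the `State` names, the `TRANSITIONS` table, the event type strings, and the in-memory list standing in for the event log are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    CREATED = "created"
    SLEEPING = "sleeping"
    RUNNING = "running"
    COMPLETED = "completed"


# Assumed transition table; the plan only names the lifecycle verbs.
TRANSITIONS = {
    State.CREATED: {State.SLEEPING, State.RUNNING},
    State.SLEEPING: {State.RUNNING},
    State.RUNNING: {State.SLEEPING, State.COMPLETED},
    State.COMPLETED: set(),
}


@dataclass
class Continuation:
    id: str
    state: State = State.CREATED
    events: list = field(default_factory=list)  # stand-in for the append-only log

    def __post_init__(self) -> None:
        self.events.append({"type": "created", "continuation": self.id})

    def _move(self, target: State, event_type: str) -> None:
        # Reject transitions the table does not allow, then record the change.
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {target.name}")
        self.state = target
        self.events.append({"type": event_type, "continuation": self.id})

    def sleep(self) -> None:
        self._move(State.SLEEPING, "slept")

    def wake(self) -> None:
        self._move(State.RUNNING, "woken")

    def tick(self) -> None:
        # A no-op step: nothing changes except that the step is recorded.
        if self.state is not State.RUNNING:
            raise ValueError("tick requires a running continuation")
        self.events.append({"type": "step", "continuation": self.id})

    def complete(self) -> None:
        self._move(State.COMPLETED, "completed")


c = Continuation("c-1")
c.sleep()
c.wake()
c.tick()
c.complete()
print([e["type"] for e in c.events])
```

A substrate test suite could drive the same sequence against each cloud's store and compare the resulting event types.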
Field & Policy
A continuation reads from a real context field, the policy kernel routes model calls, real money is spent against real budgets.
- Semantic channel (dairy corpus indexed)
- Stage-1 recall across all channels
- Stage-2 cross-encoder reranker
- Field materialization service
- Policy kernel v0 (rules-only)
- Tiered model dispatch (Bedrock + Foundry Models)
- Foundry Models inference provider
- Tool-schema normalization across providers
- Budget enforcement (soft-cap downshift, hard-cap terminate)
- End-to-end evidence-brief pilot
- Notebook UI v0
- Cargill tenant onboarding
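Budget enforcement as listed above (soft-cap downshift, hard-cap terminate) might look roughly like the sketch below. The tier names, the `Budget` fields, and the choice to compare projected spend (spent plus an estimated call cost) against each cap are assumptions for illustration; the plan does not specify them.

```python
from dataclasses import dataclass

# Hypothetical model tiers, ordered from most to least expensive.
TIERS = ["large", "medium", "small"]


@dataclass
class Budget:
    soft_cap_usd: float
    hard_cap_usd: float
    spent_usd: float = 0.0


def dispatch_decision(budget: Budget, requested_tier: str, estimated_cost_usd: float):
    """Return (action, tier), where action is 'allow', 'downshift', or 'terminate'."""
    projected = budget.spent_usd + estimated_cost_usd
    if projected > budget.hard_cap_usd:
        # Hard cap: the continuation is terminated, no model call is made.
        return ("terminate", None)
    if projected > budget.soft_cap_usd:
        # Soft cap: step down one tier, staying on the cheapest tier if already there.
        idx = TIERS.index(requested_tier)
        return ("downshift", TIERS[min(idx + 1, len(TIERS) - 1)])
    return ("allow", requested_tier)


print(dispatch_decision(Budget(10.0, 20.0, spent_usd=9.5), "large", 1.0))
```

Because the decision is a pure function of the budget and the request, it is rules-only and could sit on the policy kernel path without any LLM call.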
Continuity & Learning
Wake on streams and signals. Forks and merges. Policy kernel trains on real usage. Two Cargill consultants run real briefs for two weeks.
- Stream-subscription wakes
- Sibling-publish + human-signal wakes
- Forking and merging with lineage integrity
- Policy kernel v1 (GBT training pipeline)
- Policy kernel v1 shadow deployment
- Tool stream adapters (PubMed, JDS, Cargill internal)
- Researcher notebook v1 (full workflow)
- Two-consultant Cargill pilot
- Maestro public site (this site)
- ADR backlog + Days-91+ handoff
- Foundry Agent Service substrate adapter
- AgentCore substrate adapter
Cross-cutting concerns
These run across all three phases and are not optional. Every PR is expected to address them where relevant.
- Security: Short-lived tokens, tenant isolation at store layer, PII redaction on write, compliance freeze switch.
- Testing: Substrate suite against both clouds in CI. Chaos tests for lease expiry, worker death, stream replay. Nightly E2E.
- Cost discipline: Per-tenant daily $ cap at ingress. Field reranker is the #1 watched metric. Policy kernel on CPU only.
- Vendor delegation: Continuations live in the Maestro worker pool by default. Delegation to AgentCore or Foundry Agent Service is opt-in per tick, gated on session-ceiling and tenant policy class, recorded as runtime: maestro | agentcore | foundry on every model_call event.
- Documentation: README per package. ADRs for every load-bearing decision. If code disagrees with the architecture doc, one of them is wrong.
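The vendor-delegation rule can be sketched as a per-tick decision that falls back to the Maestro worker pool unless every gate passes. This is an illustrative sketch only: it assumes "session-ceiling" means the maximum session length the vendor runtime allows, and the function names, parameters, and event shape (beyond the `runtime` field and the `model_call` event type) are hypothetical.

```python
from typing import Optional

VENDOR_RUNTIMES = ("agentcore", "foundry")


def choose_runtime(
    requested: Optional[str],
    expected_seconds: float,
    session_ceiling_seconds: float,
    tenant_policy_class: str,
    delegable_classes: frozenset,
) -> str:
    """Pick the runtime for one tick; 'maestro' unless delegation is requested and allowed."""
    if requested is None or requested == "maestro":
        return "maestro"  # default: stay in the Maestro worker pool
    if requested not in VENDOR_RUNTIMES:
        raise ValueError(f"unknown runtime: {requested}")
    if tenant_policy_class not in delegable_classes:
        return "maestro"  # tenant policy class does not permit delegation
    if expected_seconds > session_ceiling_seconds:
        return "maestro"  # assumed meaning: the tick would outlive the vendor session
    return requested


def model_call_event(continuation_id: str, runtime: str) -> dict:
    # Every model_call event carries the runtime that actually served it.
    return {"type": "model_call", "continuation": continuation_id, "runtime": runtime}


runtime = choose_runtime("agentcore", 60, 300, "standard", frozenset({"standard"}))
print(model_call_event("c-1", runtime))
```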
Working with Claude Code
The plan is designed to be consumed by Claude Code. Pick the lowest-numbered task whose dependencies are met, read its deliverables and acceptance criteria, implement, verify, ship. One task per branch. The PR title is the task id followed by the task title.
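The selection rule (lowest-numbered task whose dependencies are met) is simple enough to sketch. The example below assumes each task is a mapping with an integer `id` and a `depends_on` list; these field names are hypothetical and may not match the actual schema of maestro-build-plan.yaml.

```python
def next_task(tasks, done):
    """Return the lowest-numbered unfinished task whose dependencies are all done."""
    ready = [
        t for t in tasks
        if t["id"] not in done
        and all(dep in done for dep in t.get("depends_on", []))
    ]
    # None signals that nothing is currently unblocked.
    return min(ready, key=lambda t: t["id"], default=None)


# Hypothetical excerpt of a task graph, using titles from the Spine phase.
tasks = [
    {"id": 1, "title": "Define Substrate interface", "depends_on": []},
    {"id": 2, "title": "Continuation data model & event types", "depends_on": [1]},
    {"id": 3, "title": "Append-only event log", "depends_on": [2]},
]
print(next_task(tasks, done={1})["title"])
```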
Three rules that are not negotiable:
- No LLM calls on the policy kernel critical path.
- No cloud SDK imports in core/.
- Every state change is an event in the event log. No exceptions.
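The rule against cloud SDK imports in core/ lends itself to an automated check. The sketch below uses Python's standard `ast` module to flag imports of a forbidden package. It assumes the code under core/ is Python and that `boto3`, `botocore`, and `azure` are the package roots to block; both are assumptions for illustration, not statements about the repository.

```python
import ast

# Assumed list of cloud SDK package roots; adjust to the project's real dependencies.
FORBIDDEN_ROOTS = {"boto3", "botocore", "azure"}


def forbidden_imports(source: str) -> list:
    """Return the sorted module names in `source` that import a forbidden SDK."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports, which cannot be third-party SDKs.
            names = [node.module]
        else:
            continue
        found.extend(n for n in names if n.split(".")[0] in FORBIDDEN_ROOTS)
    return sorted(found)


sample = "import json\nimport boto3\nfrom azure.cosmos import CosmosClient\n"
print(forbidden_imports(sample))
```

A CI step could run this over every file under core/ and fail the build when the returned list is non-empty.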