Soothsayer MCP kernel: from prompts to controlled orchestration
How I built a policy-governed MCP runtime where models can reason freely but execution stays deterministic, verifiable, and auditable.
Key takeaways
- Reliable agents need a runtime loop, not just model + tools.
- In this design, the model proposes actions, but the runtime decides execution.
- Lifecycle state, permission gates, and trace logs are what make orchestration trustworthy.
This started from a familiar failure mode. Early agent demos looked impressive, but as soon as tasks became multi-step, the system became hard to trust. Tools executed in surprising order, state went out of sync, and success messages appeared without proof.
The core lesson was simple: model reasoning is probabilistic, but system execution must be controlled. So instead of building a tool-calling chatbot, I built workspace-mcp as a runtime kernel with explicit orchestration rules.
Architecture map
The most important boundary is this: think and act are separated.
The model can think and propose. The runtime can approve, deny, execute, verify, and record.
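To make that boundary concrete, here is a minimal sketch of a propose/decide loop. It is not the workspace-mcp implementation: `Proposal`, `Runtime`, `handle`, and the shape of the trace record are all names I made up for illustration. The model only ever produces a `Proposal`; the runtime decides whether it runs, checks the result, and records the outcome either way.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Proposal:
    """What the model emits: an intent, never a direct call (hypothetical shape)."""
    tool: str
    args: dict[str, Any]


@dataclass
class Runtime:
    """Owns execution: approve or deny, execute, verify, record."""
    tools: dict[str, Callable[..., Any]]
    allowed: set[str]
    trace: list[dict[str, Any]] = field(default_factory=list)

    def handle(self, proposal: Proposal, verify: Callable[[Any], bool]) -> dict[str, Any]:
        if proposal.tool not in self.tools or proposal.tool not in self.allowed:
            # Denied proposals are recorded too, so the trace explains refusals.
            record = {"tool": proposal.tool, "decision": "deny", "verified": False}
        else:
            result = self.tools[proposal.tool](**proposal.args)
            record = {
                "tool": proposal.tool,
                "decision": "allow",
                "result": result,
                # Success is only claimed after an explicit check on the result.
                "verified": bool(verify(result)),
            }
        self.trace.append(record)
        return record
```

The point of the sketch is that a denial and a verified success go through the same recording path, so the trace never has gaps.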
What happened
The first version followed the typical pattern: prompt in, tool call out, answer back. It was fast to build and good for demos, but unreliable for sustained use.
Three problems kept repeating:
- tool calls retried in loops
- write operations happened without strong guardrails
- success was claimed before verification
The two gates
The first gate is permission governance. Before a tool runs, the runtime must decide: allow, deny, or ask the user.
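A three-valued decision can be sketched as a small function with a default-deny fallback. The rule table, the `WRITE_TOOLS` set, and the assumption that `create_change_bundle` counts as a write are illustrative only; the real policy format is not shown in this post.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ASK = "ask"


# Hypothetical rule table, for illustration only.
RULES = {
    "repo_search": Decision.ALLOW,
    "create_change_bundle": Decision.ASK,
}
WRITE_TOOLS = {"create_change_bundle"}  # assumed to be a write operation


def decide(tool: str, profile: str) -> Decision:
    """Return the gate decision for one proposed tool call."""
    # A read-only profile blocks writes regardless of the rule table.
    if profile == "read_only" and tool in WRITE_TOOLS:
        return Decision.DENY
    # Tools without an explicit rule are denied by default.
    return RULES.get(tool, Decision.DENY)
```

Default deny matters more than the individual rules: a tool nobody thought about should never run by accident.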
The second gate is contract integrity. If response shape drifts, clients become fragile and audit trails become noisy.
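One way to catch shape drift is to validate every response against a fixed envelope before it leaves the runtime. The sketch below assumes an envelope with `ok`, `code`, and a `meta` object carrying `run_id` and an ISO 8601 `timestamp`; those field names are my assumptions, not the documented workspace-mcp contract.

```python
from datetime import datetime

REQUIRED_META = {"run_id", "timestamp"}  # hypothetical required keys


def contract_errors(response: dict) -> list[str]:
    """Return a list of contract problems; an empty list means the shape is intact."""
    errors = []
    if not isinstance(response.get("ok"), bool):
        errors.append("ok must be a boolean")
    if not isinstance(response.get("code"), str):
        errors.append("code must be a string")
    meta = response.get("meta")
    if not isinstance(meta, dict):
        errors.append("meta must be an object")
        return errors
    missing = REQUIRED_META - meta.keys()
    if missing:
        errors.append(f"meta is missing {sorted(missing)}")
    timestamp = meta.get("timestamp")
    if timestamp is not None:
        try:
            datetime.fromisoformat(timestamp)
        except (TypeError, ValueError):
            errors.append("meta.timestamp must be an ISO 8601 string")
    return errors
```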
Message parts timeline
DAX-style event streaming becomes much easier to inspect when each response is split into structured parts instead of one long string. This is the timeline model that keeps execution debuggable.
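As a rough illustration of the parts idea, a response can be modeled as an ordered list of typed parts rather than a single string. The `Part` fields and the kind names used here are hypothetical and only meant to show why the structure is easier to inspect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    seq: int   # position in the response stream
    kind: str  # e.g. "tool_call" or "tool_result" (illustrative kinds)
    body: str


def timeline(parts: list[Part]) -> list[str]:
    """Render parts in sequence order, one line per part."""
    ordered = sorted(parts, key=lambda part: part.seq)
    return [f"{part.seq:03d} [{part.kind}] {part.body}" for part in ordered]
```

With parts like these, "what did the runtime do, and in what order" becomes a filter over a list instead of a regex over prose.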
Setup walkthrough
- Enter the workspace-mcp package in the Soothsayer repo.
- Install in editable mode with dev tooling.
- Start the kernel with explicit workspace and profile.
- Run a controlled lifecycle, not ad hoc tool calls.
cd workspace-mcp
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
workspace-mcp --workspace-root /tmp/workspace --profile dev
First-time config
Project policy layering is where this becomes useful in real teams. The packaged kernel policy is stable, while each workspace can overlay tighter controls.
workspace-mcp \
--workspace-root ./examples/minimal-project/workspace \
--policy-path ./examples/minimal-project/project_policy.yaml \
--profile dev
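The layering rule worth preserving is that an overlay may tighten the packaged policy but never loosen it. Here is a minimal sketch of that merge, assuming policies reduce to a per-tool rule of "allow", "ask", or "deny". The actual schema of project_policy.yaml is not shown here, so treat the dictionary shape and the `layer` function as assumptions.

```python
# Higher number means stricter rule.
STRICTNESS = {"allow": 0, "ask": 1, "deny": 2}


def layer(base: dict[str, str], overlay: dict[str, str]) -> dict[str, str]:
    """Merge an overlay onto a base policy, keeping the stricter rule per tool."""
    merged = dict(base)
    for tool, rule in overlay.items():
        # Tools the base policy does not know about start from "deny".
        current = merged.get(tool, "deny")
        merged[tool] = rule if STRICTNESS[rule] >= STRICTNESS[current] else current
    return merged
```

For example, an overlay that sets `create_change_bundle` to "deny" wins over a base "ask", while an overlay that tries to set it to "allow" is ignored.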
A minimal lifecycle should always follow this order:
- kernel_version
- self_check
- start_run
- repo_search
- create_change_bundle
- bundle_report
- end_run
- get_run_summary
This keeps every change traceable to one explicit run context.
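A client-side check for that ordering can be sketched as a tiny state machine over the sequence of tool names. Which tools are run-scoped is my assumption based on the order above, and `lifecycle_violations` is a hypothetical helper rather than part of the kernel.

```python
# Assumed to require an active run; adjust to the real tool contracts.
RUN_SCOPED = {"repo_search", "create_change_bundle", "bundle_report"}


def lifecycle_violations(calls: list[str]) -> list[tuple[int, str]]:
    """Return (index, reason) for every call that breaks the run lifecycle."""
    violations = []
    active = False
    for index, call in enumerate(calls):
        if call == "start_run":
            if active:
                violations.append((index, "start_run called while a run is active"))
            active = True
        elif call == "end_run":
            if not active:
                violations.append((index, "end_run called without an active run"))
            active = False
        elif call in RUN_SCOPED and not active:
            violations.append((index, f"{call} called outside a run"))
    return violations
```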
Quick checks
Build-time confidence:
pytest -q
ruff check .
mypy src
Deterministic hash parity:
python spec-tests/generate_bundle_hashes.py --check
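Hash parity only works if the same logical bundle always serializes to the same bytes. The sketch below shows one common way to get there, canonical JSON plus SHA-256. I am not claiming this is what generate_bundle_hashes.py does; `bundle_hash` and the bundle layout are illustrative.

```python
import hashlib
import json


def bundle_hash(bundle: dict) -> str:
    """Hash a bundle via a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(
        bundle,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace differences between writers
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two clients that build the same bundle with keys in a different order then agree on the hash, which is the property a parity check needs.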
Runtime behavior sanity:
workspace-mcp --workspace-root /tmp/workspace --profile read_only
For CI hardening, run the kernel under the ci profile and assert that blocked write flows return canonical violations instead of tool output.
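In a CI test, that assertion could look like the predicate below. The violation shape (`ok`, `violation` with `code`, `tool`, `profile`, and no `result` key) is a hypothetical stand-in for whatever the kernel contract actually specifies.

```python
REQUIRED_VIOLATION_KEYS = {"code", "tool", "profile"}  # assumed fields


def is_canonical_violation(response: dict) -> bool:
    """True when a blocked call returned a structured violation and no tool output."""
    violation = response.get("violation")
    return (
        response.get("ok") is False
        and "result" not in response  # blocked flows must not leak tool output
        and isinstance(violation, dict)
        and REQUIRED_VIOLATION_KEYS <= violation.keys()
    )
```

A test would call a write tool under the ci profile and assert `is_canonical_violation(response)`, so a free-form error string or a leaked result both fail the build.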
Failure modes
- Starting without start_run, then assuming subsequent steps are connected.
- Letting the model call tools directly without a policy decision boundary.
- Trusting success text without verification evidence.
- Treating violations as free-form errors rather than structured contract outputs.
- Ignoring loop behavior (repeated tool intents) until runtime costs spike.
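The last failure mode is cheap to guard against early. A minimal sketch, assuming a repeated intent means the same tool with the same arguments inside one run; `LoopGuard` and its threshold are illustrative, not a kernel feature I am documenting.

```python
import json
from collections import Counter


class LoopGuard:
    """Counts identical tool intents and refuses them past a threshold."""

    def __init__(self, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats
        self.seen: Counter = Counter()

    def allow(self, tool: str, args: dict) -> bool:
        # Sorted keys make logically equal arguments compare equal.
        key = (tool, json.dumps(args, sort_keys=True))
        self.seen[key] += 1
        return self.seen[key] <= self.max_repeats
```

Create one guard per run and consult it before the permission gate, so a looping model is stopped before it spends any tool budget.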
What made the difference
These decisions made the runtime resilient:
- Enforce runtime invariants (meta shape, code alignment, timestamp format).
- Make lifecycle explicit (run_id, owner scoping, bounded run store).
- Keep patch/change artifacts deterministic and hash-based.
- Record policy decisions in audit logs that can be explained later.
That moved the system from “assistant behavior” to “orchestrated execution.”
What I would do next time
I would add three integration harnesses on day one: one CLI loop, one IDE loop, one CI loop. The runtime is stable only when contract assumptions are tested from each client surface, not just from unit tests.
If you want the design lens behind this implementation, the Dual NLP framework explains the planning language this runtime operationalizes.