Run MCP tools by writing code instead of calling them one by one.
The normal way to use MCP is to hand the model every tool a server exposes. It works, but it gets expensive fast. All those tool definitions sit in the context, and every intermediate result has to travel back through the model before you can do anything with it.
The idea here, borrowed from Cloudflare's "code mode" and Anthropic's writeup, is to flip that around. Give the model just three tools (discover servers, find a server's tools, and execute code) and let it write a program that does the orchestration. Loops, filtering, chaining calls together, all of it happens in the code instead of the chat, so intermediate results never re-enter the context.
The code runs in an embedded Boa engine (a pure-Rust JavaScript
interpreter), locked down so it can't touch the network, the filesystem, or anything else on the
host. The only way out is the per-server modules we generate from the tools you expose, and the
allowlist behind them is enforced in Rust. The engine sits behind a CodeRuntime trait, so the
community can add other engines and languages later.
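To make the seam concrete, here is a minimal, self-contained sketch of an engine behind a trait, with the allowlist checked on the Rust side. The trait signature, the `Host` type, and the toy `LineRuntime` engine are assumptions for illustration only; the crate's actual `CodeRuntime` trait and Boa integration may look quite different.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical shape of the engine seam (not the crate's real signature).
trait CodeRuntime {
    /// Run `source`; every tool call the program makes must go through `host`.
    fn execute(&self, source: &str, host: &Host) -> Result<String, String>;
}

/// Host side: owns the allowlist and is the sandbox's only way out.
#[derive(Default)]
struct Host {
    allowed: HashMap<String, HashSet<String>>, // server name -> allowed tool names
}

impl Host {
    /// Add one (server, tool) pair to the allowlist.
    fn allow(mut self, server: &str, tool: &str) -> Self {
        self.allowed
            .entry(server.to_string())
            .or_default()
            .insert(tool.to_string());
        self
    }

    /// The check lives in Rust, so sandboxed code cannot bypass it.
    fn call_tool(&self, server: &str, tool: &str, args: &str) -> Result<String, String> {
        let ok = self.allowed.get(server).map_or(false, |tools| tools.contains(tool));
        if !ok {
            return Err(format!("tool not allowed: {}.{}", server, tool));
        }
        // Stand-in for a real MCP call: just echo what would be invoked.
        Ok(format!("{}.{}({})", server, tool, args))
    }
}

/// Toy engine standing in for Boa: each non-empty line is "server.tool args".
struct LineRuntime;

impl CodeRuntime for LineRuntime {
    fn execute(&self, source: &str, host: &Host) -> Result<String, String> {
        let mut last = String::new();
        for line in source.lines().map(str::trim).filter(|l| !l.is_empty()) {
            let (target, args) = line.split_once(' ').unwrap_or((line, ""));
            let (server, tool) = target
                .split_once('.')
                .ok_or_else(|| format!("bad call: {}", line))?;
            last = host.call_tool(server, tool, args)?;
        }
        Ok(last) // only the final result leaves the runtime
    }
}

fn main() {
    let host = Host::default().allow("filesystem", "read_file");
    let runtime: Box<dyn CodeRuntime> = Box::new(LineRuntime);
    println!("{:?}", runtime.execute("filesystem.read_file notes.txt", &host));
    println!("{:?}", runtime.execute("filesystem.write_file notes.txt", &host));
}
```

Because callers only see `dyn CodeRuntime`, swapping the toy engine for another one does not change how the allowlist is enforced.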
As a library / Rig dependency (the codemode()-for-Rust shape): wrap your MCP servers (and/or
in-Rust tools) and give a Rig agent the three-tool surface instead of every tool.

use std::sync::Arc;
use codemode::{CodeMode, ServerConfig};
use codemode::rig::CodeModeExt;

// Runs inside an async function that returns a Result (note the `.await?`).
let cm = Arc::new(
    CodeMode::builder()
        .server(ServerConfig::stdio("filesystem", "npx",
            ["-y", "@modelcontextprotocol/server-filesystem", "."]))
        .build()
        .await?,
);
// `client` and `model` come from your own Rig provider setup.
let agent = client.agent(model).preamble("…").code_mode(&cm).build();

Expose only the tools you trust the model to compose: the sandbox isolates compute, not capability, so injected code can chain any tools you allow (a read tool plus a write/send tool can exfiltrate). Keep the allowlist least-privilege. For untrusted/prompt-injected input, prefer the
subprocess runtime for hard OS isolation. See security.
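One way to keep an allowlist honest is to look for read-plus-egress combinations before you ship it. The sketch below is a standalone illustration, not part of the crate: the `Capability` labels, the `AllowedTool` struct, and the `mail.send_message` tool are hypothetical, and the labels would have to be assigned by whoever maintains the allowlist.

```rust
/// Coarse, hand-assigned capability labels (hypothetical, not a crate type).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Capability {
    Read,
    Write,
    Send,
}

/// One allowlist entry, annotated with what the tool can do.
struct AllowedTool {
    server: &'static str,
    name: &'static str,
    capability: Capability,
}

/// Every (source, sink) pair where a read tool could feed a write/send tool.
/// An empty result means injected code has no obvious path to move data out.
fn exfiltration_pairs(allowlist: &[AllowedTool]) -> Vec<(String, String)> {
    let mut pairs = Vec::new();
    for src in allowlist.iter().filter(|t| t.capability == Capability::Read) {
        for sink in allowlist
            .iter()
            .filter(|t| matches!(t.capability, Capability::Write | Capability::Send))
        {
            pairs.push((
                format!("{}.{}", src.server, src.name),
                format!("{}.{}", sink.server, sink.name),
            ));
        }
    }
    pairs
}

fn main() {
    let allowlist = [
        AllowedTool { server: "filesystem", name: "read_file", capability: Capability::Read },
        AllowedTool { server: "filesystem", name: "write_file", capability: Capability::Write },
        AllowedTool { server: "mail", name: "send_message", capability: Capability::Send },
    ];
    for (src, sink) in exfiltration_pairs(&allowlist) {
        println!("review: {} can feed {}", src, sink);
    }
}
```

A check like this is only a review aid; it does not replace the subprocess runtime when the input is untrusted.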
As a standalone MCP server any host can configure:
--config takes our servers.toml (servers + allowlist + limits) or a standard .mcp.json.
cargo test # the full suite
cargo run --bin codemode-mcp -- tools --config servers.toml # list the exposed servers and tools
cargo run --example token_savings # token usage: traditional vs. code mode (needs a local LLM)
cargo run --example rig_agent --features rig-example # the same task through a Rig agent (needs a local LLM)

The rig_agent example is the Surface B story end to end: a few lines wire CodeMode into a Rig
agent with .code_mode(&cm). The rig feature is just the adapter (you bring your own provider);
rig-example additionally pulls rig-core's openai provider so the example can reach a local server.
The token_savings example runs the same task both ways against a local OpenAI-compatible model and
reports, for each, the tokens, the number of LLM turns, and the wall-clock time. The task is a
production-shaped report (a "top sellers among well-reviewed, in-stock products" query that chains
five tools per product). A sample run reached the same answer both ways:
== Result (traditional -> code mode) ==
tokens: 11950 -> 2649 (78% fewer)
LLM turns: 6 -> 2 (each turn is a network round-trip)
latency: 205.6s -> 87.9s (57% faster)
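The percentages in this sample run can be rechecked from the raw numbers. The helper below is just arithmetic on the figures printed above, not code from the example itself; it reads "57% faster" as a 57% reduction in wall-clock time, which is the same as roughly a 2.3x speedup.

```rust
/// Relative reduction from `before` to `after`, in percent.
fn percent_reduction(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

fn main() {
    // Figures copied from the sample run above.
    println!("tokens:  {:.0}% fewer", percent_reduction(11950.0, 2649.0));
    println!("latency: {:.0}% less wall-clock time", percent_reduction(205.6, 87.9));
    println!("speedup: {:.2}x", 205.6 / 87.9);
}
```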
The win grows with the data: in traditional tool-calling every intermediate result re-enters the model's context and is re-tokenized each turn, while in code mode it stays in the sandbox and only the final answer returns.
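A toy model shows why the gap widens. It assumes one LLM turn per tool call in traditional mode, with every earlier result re-read on each turn, and exactly two turns in code mode. The token counts passed in `main` are made-up inputs chosen to show the shape of the curve; they are not measurements from the token_savings example.

```rust
/// Input tokens in traditional tool-calling: one turn per tool call plus a
/// final answer turn. Turn `i` re-reads the base prompt and `i` earlier results.
fn traditional_tokens(base: u64, calls: u64, result_tokens: u64) -> u64 {
    (0..=calls).map(|i| base + i * result_tokens).sum()
}

/// Input tokens in code mode: turn 1 reads the prompt; turn 2 reads the
/// prompt, the program the model wrote, and only the final answer.
fn code_mode_tokens(base: u64, program_tokens: u64, answer_tokens: u64) -> u64 {
    base + (base + program_tokens + answer_tokens)
}

fn main() {
    // Made-up sizes: 500-token prompt, 200 tokens per intermediate result.
    for calls in [5u64, 50] {
        println!(
            "{:>2} tool calls: traditional {:>6} tokens, code mode {:>4} tokens",
            calls,
            traditional_tokens(500, calls, 200),
            code_mode_tokens(500, 150, 50),
        );
    }
}
```

In this model the traditional cost grows quadratically with the number of calls, while the code-mode cost does not depend on it at all, since intermediate results stay in the sandbox.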
Design and contracts live in docs/: the Boa engine, the engine-agnostic seam, the MCP client, the two surfaces, the security model, and the API/CLI specs in docs/spec/.
Host configuration entry for the standalone MCP server:

{
  "mcpServers": {
    "codemode": {
      "command": "codemode-mcp",
      "args": ["serve", "--config", "servers.toml"]
    }
  }
}