Anthropic · Claude · AI Architecture · Tool Retrieval · MCP

Claude's Tool System, Rebuilt More Efficiently

The Anthropic leak is a blueprint for how production AI agents handle tools. We built the same system with one key change: retrieve first, inject second.

April 1, 2026 · 4 min read · by Agent-CoreX

The Anthropic code leak handed developers a rare thing: a detailed look at how a production-grade AI agent actually handles tool orchestration. The 512,000-line TypeScript codebase is, among other things, a thorough specification for an agentic harness.

We read it carefully. Much of what we found aligned with the architecture we've been building at Agent-CoreX and confirmed assumptions we'd made. One part pointed directly at the efficiency problem we set out to solve.

What the Leaked Harness Architecture Shows

The core of Claude Code's architecture, as revealed in the leak, is a plugin-style tool registry:

  1. Capabilities are implemented as discrete, permission-gated tools
  2. The model receives tool definitions as part of its context
  3. The model decides which tool to call based on the user's request
  4. The harness routes that call and returns the result

It's a clean, composable design. Tools are auditable, swappable, and easy to extend. The base tool definition alone spans 29,000 lines in the leaked code — a testament to how much production capability lives in the harness rather than the model.
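The pattern is compact in code. A minimal sketch of the register-and-route steps from the list above, where the Tool interface, registerTool, and routeToolCall names are illustrative rather than taken from the leaked code:

// Plugin-style tool registry: discrete, permission-gated tools behind one router.
interface Tool {
  name: string
  description: string
  inputSchema: Record<string, unknown>               // JSON Schema for arguments
  execute: (args: Record<string, unknown>) => Promise<string>
}

const registry = new Map<string, Tool>()

function registerTool(tool: Tool) {
  registry.set(tool.name, tool)
}

// Step 4 in the flow above: route the model's call, gated by permissions (step 1).
async function routeToolCall(
  name: string,
  args: Record<string, unknown>,
  allowed: Set<string>,
): Promise<string> {
  const tool = registry.get(name)
  if (!tool) throw new Error(`Unknown tool: ${name}`)
  if (!allowed.has(name)) throw new Error(`Tool not permitted: ${name}`)
  return tool.execute(args)
}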

This is the same fundamental architecture MCP codifies: a standard for how an AI model discovers and invokes external tools. Claude Code just implemented it internally, at scale.
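For reference, here is the shape a tool definition takes in an MCP tools/list response: a name, a human-readable description, and a JSON Schema for the arguments. The github_create_issue tool below is a made-up example, not any real server's tool:

// What a single MCP tool definition looks like on the wire.
const exampleTool = {
  name: "github_create_issue",
  description: "Create an issue in a GitHub repository",
  inputSchema: {
    type: "object",
    properties: {
      repo: { type: "string", description: "Repository in owner/name form" },
      title: { type: "string", description: "Issue title" },
      body: { type: "string", description: "Issue body in Markdown" },
    },
    required: ["repo", "title"],
  },
}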

The Part That Doesn't Scale

There's one step in the flow above that becomes expensive as the tool registry grows: step 2 — injecting tool definitions into context.

The traditional flow is:

Load all tool definitions → inject into prompt → LLM selects → execute

When you have 10 tools, this is cheap. When you have 50 or 100 tools spread across MCP servers for GitHub, databases, file systems, APIs, and productivity tools, you're adding tens of thousands of tokens of overhead to every single request — before the user's message is even processed.

The model still has to scan every one of those definitions to decide what to call. Token cost scales linearly with tool count, and tool-selection accuracy drops as the signal-to-noise ratio falls.
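You can measure this directly by serializing your tool definitions and estimating their token footprint. A rough sketch using the common ~4-characters-per-token heuristic (substitute a real tokenizer for exact counts):

// Rough estimate of the prompt overhead from injecting every tool definition.
// chars/4 is a crude heuristic, not a real tokenizer.
function estimateToolOverheadTokens(tools: object[]): number {
  const chars = tools.reduce((sum, t) => sum + JSON.stringify(t).length, 0)
  return Math.ceil(chars / 4)
}

// 50 tools averaging ~800 characters of JSON each ≈ 10,000 tokens per request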

The Change Agent-CoreX Makes

Agent-CoreX keeps the same plugin architecture and MCP execution layer but inserts a retrieval step before context injection:

Embed query → retrieve relevant tools → inject only those → LLM executes

Instead of loading 50 tools into context, you retrieve the 3–5 tools that are semantically relevant to the current query and pass only those. The model sees a focused, minimal toolset.

In code, the difference is one API call before you build the prompt:

// Before: all tools, every request
const { tools: allTools } = await fetch(`${ACX_API_BASE}/tools`, {
  headers: { Authorization: `Bearer ${ACX_API_KEY}` }
}).then(r => r.json())

// allTools might be 50+ definitions = 10,000+ tokens of overhead

// After: only relevant tools, per query
const { tools } = await fetch(
  `${ACX_API_BASE}/retrieve_tools?query=${encodeURIComponent(userQuery)}&top_k=5`,
  { headers: { Authorization: `Bearer ${ACX_API_KEY}` } }
).then(r => r.json())

// tools is 3–5 definitions = 600–1,500 tokens of overhead

Same result: the model still calls the right tool. The difference is everything that doesn't happen: the 45 irrelevant tool definitions that never touch the context window.
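Under the hood, a retrieval step like this is standard semantic search over tool descriptions: embed each definition once when it's registered, embed the query at request time, and rank by cosine similarity. A conceptual sketch, with embed() standing in for whatever embedding API you use:

// Semantic tool retrieval: tool vectors are computed once, then reused per query.
declare function embed(text: string): Promise<number[]>  // placeholder embedding API

type ToolDef = { name: string; description: string; inputSchema: object }
type IndexedTool = { def: ToolDef; vector: number[] }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    na += a[i] * a[i]
    nb += b[i] * b[i]
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb))
}

// Index a tool once, at registration time.
async function indexTool(def: ToolDef): Promise<IndexedTool> {
  return { def, vector: await embed(`${def.name}: ${def.description}`) }
}

// At request time: embed the query, rank all tools, keep the top k.
async function retrieveTools(query: string, index: IndexedTool[], topK = 5): Promise<ToolDef[]> {
  const q = await embed(query)
  return index
    .map(t => ({ t, score: cosine(q, t.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(({ t }) => t.def)
}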

Why This Matters Now

The leak lands at a moment when AI agent adoption is accelerating. Teams that were running 5 MCP servers six months ago are now running 20–30. The token overhead that was acceptable at small scale becomes a real cost center at production scale.

The three-layer memory architecture in the leaked code shows that Anthropic has thought carefully about context management within sessions. What's not yet optimized — in the leaked codebase or in most production AI agent implementations — is the tool context overhead.

Retrieval-first architecture is the logical next step. The leak confirms the tool-based execution model is right. We're pushing it to be efficient.

How the Execution Layer Works

Once tools are retrieved, execution happens through Agent-CoreX's MCP-based execution layer — the same /execute_tool endpoint used in the Playground:

if (message.stop_reason === "tool_use") {
  const toolUse = message.content.find(b => b.type === "tool_use")

  const result = await fetch(`${ACX_API_BASE}/execute_tool`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${ACX_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ tool: toolUse.name, args: toolUse.input }),
  }).then(r => r.json())

  // Feed result back as tool_result in the next message
}
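From there, the result goes back to the model as a tool_result content block. A sketch using the Anthropic SDK (@anthropic-ai/sdk), where priorMessages and the model id are placeholders:

import Anthropic from "@anthropic-ai/sdk"

const anthropic = new Anthropic()  // reads ANTHROPIC_API_KEY from the environment

// Send the tool output back; the model either answers or requests another tool.
const followUp = await anthropic.messages.create({
  model: "claude-sonnet-4-5",      // placeholder: use your model id
  max_tokens: 1024,
  tools,                           // the retrieved definitions from earlier
  messages: [
    ...priorMessages,              // conversation so far, ending with the user turn
    { role: "assistant", content: message.content },
    {
      role: "user",
      content: [
        {
          type: "tool_result",
          tool_use_id: toolUse.id,
          content: JSON.stringify(result),
        },
      ],
    },
  ],
})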

Agent-CoreX routes the call to the correct MCP server, handles the protocol, and returns the result. You get the full plugin architecture without managing the server-side execution yourself.
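Conceptually, that routing layer is small. A sketch using the official MCP TypeScript SDK (@modelcontextprotocol/sdk), assuming you keep one connected Client per server and a map from each tool name to the client that advertised it:

import { Client } from "@modelcontextprotocol/sdk/client/index.js"

// Populated at startup by calling tools/list on each connected MCP server.
const clientsByTool = new Map<string, Client>()

// Route a tool call to whichever server registered that tool.
async function executeTool(name: string, args: Record<string, unknown>) {
  const client = clientsByTool.get(name)
  if (!client) throw new Error(`No MCP server registered for tool: ${name}`)
  return client.callTool({ name, arguments: args })
}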

See how much you'd save with semantic retrieval →

Enable MCP servers and try it in the Playground →
