The Anthropic leak is a blueprint for how production AI agents handle tools. We built the same system with one key change: retrieve first, inject second.
The Anthropic code leak handed developers a rare thing: a detailed look at how a production-grade AI agent actually handles tool orchestration. The 512,000-line TypeScript codebase is, among other things, a thorough specification for an agentic harness.
We read it carefully. A lot of what we found aligned with the architecture we've been building at Agent-CoreX. Some of it confirmed assumptions. And one part pointed directly at the efficiency problem we set out to solve.
The core of Claude Code's architecture, as revealed in the leak, is a plugin-style tool registry:
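The leaked registry code itself isn't reproduced here, but a minimal sketch of what a plugin-style tool registry looks like in TypeScript conveys the shape. The interface and method names below are illustrative, not taken from the leaked codebase:

```typescript
// Hypothetical sketch of a plugin-style tool registry.
// Names and shapes are illustrative, not from the leaked code.
interface ToolDefinition {
  name: string
  description: string
  inputSchema: Record<string, unknown> // JSON Schema for the tool's arguments
  run: (input: Record<string, unknown>) => Promise<string>
}

class ToolRegistry {
  private tools = new Map<string, ToolDefinition>()

  register(tool: ToolDefinition): void {
    this.tools.set(tool.name, tool) // swappable: re-registering replaces
  }

  get(name: string): ToolDefinition | undefined {
    return this.tools.get(name)
  }

  list(): ToolDefinition[] {
    return [...this.tools.values()] // what gets injected into the prompt
  }
}

// Example: registering a file-read tool
const registry = new ToolRegistry()
registry.register({
  name: "read_file",
  description: "Read a file from the local filesystem",
  inputSchema: { type: "object", properties: { path: { type: "string" } } },
  run: async (input) => `contents of ${String(input.path)}`,
})
```

Each tool is a self-contained unit behind a stable interface, which is what makes the registry auditable and extensible.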
It's a clean, composable design. Tools are auditable, swappable, and easy to extend. The base tool definition alone spans 29,000 lines in the leaked code — a testament to how much production capability lives in the harness rather than the model.
This is the same fundamental architecture MCP codifies: a standard for how an AI model discovers and invokes external tools. Claude Code just implemented it internally, at scale.
There's one step in this flow that becomes expensive as the tool registry grows: the second — injecting tool definitions into context.
The traditional flow is:
Load all tool definitions → inject into prompt → LLM selects → execute
When you have 10 tools, this is cheap. When you have 50 or 100 tools spread across MCP servers for GitHub, databases, file systems, APIs, and productivity tools, you're adding tens of thousands of tokens of overhead to every single request — before the user's message is even processed.
The model still has to scan all of those definitions to decide what to call. Token cost scales linearly with tool count. Retrieval accuracy drops as the signal-to-noise ratio falls.
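The linear scaling is easy to make concrete with back-of-the-envelope arithmetic. The 200-tokens-per-definition average below is an assumption for illustration; real tool definitions vary widely in size:

```typescript
// Back-of-the-envelope: context overhead from injecting tool definitions.
// 200 tokens per definition is an assumed average, not a measured figure.
const TOKENS_PER_TOOL = 200

function injectionOverhead(toolCount: number): number {
  return toolCount * TOKENS_PER_TOOL // scales linearly with registry size
}

injectionOverhead(10) // 2,000 tokens: cheap
injectionOverhead(50) // 10,000 tokens, paid on every request before the user's message
injectionOverhead(5)  // 1,000 tokens: roughly the retrieval-first alternative
```

The overhead is per request, so it multiplies across every turn of every conversation.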
Agent-CoreX keeps the same plugin architecture and MCP execution layer but inserts a retrieval step before context injection:
Embed query → retrieve relevant tools → inject only those → LLM executes
Instead of loading 50 tools into context, you retrieve the 3–5 tools that are semantically relevant to the current query and pass only those. The model sees a focused, minimal toolset.
In code, the difference is one API call before you build the prompt:
// Before: all tools, every request
const { tools: allTools } = await fetch(`${ACX_API_BASE}/tools`, {
  headers: { Authorization: `Bearer ${ACX_API_KEY}` },
}).then(r => r.json())
// allTools might be 50+ definitions = 10,000+ tokens of overhead

// After: only relevant tools, per query
const { tools } = await fetch(
  `${ACX_API_BASE}/retrieve_tools?query=${encodeURIComponent(userQuery)}&top_k=5`,
  { headers: { Authorization: `Bearer ${ACX_API_KEY}` } },
).then(r => r.json())
// tools is 3–5 definitions = 600–1,500 tokens of overhead
Same result. The LLM executes the right tool. The difference is everything that doesn't happen: the 45 irrelevant tool definitions that never touch the context window.
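To close the loop, the retrieved definitions go into the model request where the full registry would otherwise have gone. The `tools` field shape below follows the public Anthropic Messages API (`name`, `description`, `input_schema`); the `RetrievedTool` shape and the model name are assumptions for illustration:

```typescript
// Sketch: pass only the retrieved definitions in the model request.
// RetrievedTool is an assumed shape for what /retrieve_tools returns;
// the model name is illustrative.
interface RetrievedTool {
  name: string
  description: string
  input_schema: Record<string, unknown>
}

function buildRequest(userQuery: string, retrieved: RetrievedTool[]) {
  return {
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    tools: retrieved, // 3–5 definitions instead of the full registry
    messages: [{ role: "user" as const, content: userQuery }],
  }
}
```

Nothing downstream changes: the model still selects and calls tools exactly as before, just from a smaller menu.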
The leak lands at a moment when AI agent adoption is accelerating fast. Teams that were running 5 MCP servers six months ago are now running 20–30. The token overhead that was acceptable at small scale becomes a real cost center at production scale.
The three-layer memory architecture in the leaked code shows that Anthropic has thought carefully about context management within sessions. What's not yet optimized — in the leaked codebase or in most production AI agent implementations — is the tool context overhead.
Retrieval-first architecture is the logical next step. The leak confirms the tool-based execution model is right. We're pushing it to be efficient.
Once tools are retrieved, execution happens through Agent-CoreX's MCP-based execution layer — the same /execute_tool endpoint used in the Playground:
if (message.stop_reason === "tool_use") {
  const toolUse = message.content.find(b => b.type === "tool_use")
  const result = await fetch(`${ACX_API_BASE}/execute_tool`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${ACX_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ tool: toolUse.name, args: toolUse.input }),
  }).then(r => r.json())
  // Feed result back as tool_result in the next message
}
Agent-CoreX routes the call to the correct MCP server, handles the protocol, and returns the result. You get the full plugin architecture without managing the server-side execution yourself.
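Agent-CoreX's actual routing logic isn't public, but the idea behind "routes the call to the correct MCP server" can be sketched as a lookup from a tool-name namespace to a server. The `__` prefix convention and the server URLs below are hypothetical:

```typescript
// Hypothetical routing table: tool-name prefix -> MCP server URL.
// The prefix convention and URLs are illustrative, not Agent-CoreX internals.
const servers: Record<string, string> = {
  github: "https://mcp.internal/github",
  db: "https://mcp.internal/postgres",
  fs: "https://mcp.internal/filesystem",
}

function routeTool(toolName: string): string {
  const prefix = toolName.split("__")[0] // e.g. "github__create_issue" -> "github"
  const server = servers[prefix]
  if (!server) throw new Error(`no MCP server registered for ${toolName}`)
  return server
}
```

The caller never sees this layer; it only supplies a tool name and arguments, and the harness dispatches to whichever server owns that tool.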