Building with MCP: Anthropic Guidance and Code Execution in Claude
What Is the Model Context Protocol (MCP) by Anthropic?
The model context protocol (MCP) is an open protocol introduced by Anthropic for connecting language models, such as Claude, to external tools and data sources. MCP improves dynamic tool use by enabling language models to interact with code execution environments and external tools in a structured manner.
Unlike traditional prompt engineering or basic function calling interfaces, MCP provides a protocol-based approach to enable agent behaviors within a language model’s context window. Through MCP, models can call functions, access tool output, and manage workflows as coherent sequences, rather than as isolated calls.
The protocol defines how state, inputs, outputs, and intermediate tool results are passed between the model and hosted tools. This structured context simplifies chains of interactions, allowing for the design of autonomous agents or assistants capable of executing multi-step tasks with reliable state management and tool use.
Read our Beginner’s Guide to MCP for a deeper dive into the model context protocol (MCP).
Benefits of Code Execution with MCP in Claude
Code execution through the model context protocol enables models like Claude to interact with code environments in a more stable and controllable way. Instead of relying on ad hoc prompt instructions, MCP formalizes how code is executed, how outputs are returned, and how they are reused across steps.
Key benefits include:
- Structured execution flow: MCP allows the model to call code execution tools with defined inputs and capture outputs in a structured way, reducing ambiguity and errors.
- Persistent state management: The protocol maintains a persistent context across multiple calls, enabling consistent use of variables, intermediate results, and tool state during tasks.
- Improved debugging and traceability: Because each step and its results are tracked, developers can audit execution flows, making it easier to trace bugs or understand model decisions.
- Dynamic adaptation: The model can condition its next actions on actual code execution results, enabling adaptive behavior in workflows like data analysis, simulation, or transformation tasks.
- Decoupled tool integration: MCP supports modular integration of different tools, so the same interface can handle code execution, querying APIs, or invoking custom utilities without changing the model prompt.
- Reduced prompt engineering overhead: By formalizing interaction patterns, MCP reduces the need for handcrafted prompt templates for each tool use case.
Anthropic’s MCP vs. OpenAI Function Calling
While both Anthropic’s model context protocol (MCP) and OpenAI’s function calling enable language models to interact with external tools, they differ significantly in architecture, scope, and state management.
OpenAI’s function calling is a structured extension of prompt inputs, where developers define functions with JSON schemas, and the model selects and calls these functions based on user intent. Each function call is typically treated as an isolated interaction, requiring external orchestration to maintain state or manage multi-step workflows.
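For comparison, a function definition in OpenAI's scheme is a JSON Schema object attached to the request, which the model may then "call" by returning a name and arguments. A minimal sketch (the get_weather function and its fields are hypothetical):

const tools = [
  {
    type: "function",
    function: {
      name: "get_weather", // hypothetical function for illustration
      description: "Look up the current weather for a city",
      parameters: {
        type: "object",
        properties: {
          city: { type: "string", description: "City name" }
        },
        required: ["city"]
      }
    }
  }
];
// The model responds with something like:
// { "name": "get_weather", "arguments": "{\"city\": \"Berlin\"}" }
// and the application must execute the call and return the result itself.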
MCP is built around a persistent, protocol-based context that keeps track of inputs, tool outputs, and intermediate states across multiple interactions. This enables more complex agent behaviors without requiring external state management. MCP embeds tool usage within the model’s context window, supporting a richer and more continuous interaction pattern.
MCP also emphasizes decoupling the interface from the implementation of tools. Tool invocations and responses follow a protocol, making it easier to switch or scale tools without redesigning the prompt or orchestration layer. OpenAI’s approach, while easier to adopt initially, can become more rigid in complex workflows due to its lack of built-in context persistence.
Anthropic’s Guidance for Building Efficient Agents with MCP
To build efficient agents using Anthropic’s Model Context Protocol (MCP), developers can shift from direct tool calls to code execution. This approach allows the model to generate and run code that interacts with MCP tools programmatically, reducing token usage, improving performance, and enabling more advanced control flows.
This section is adapted from an article by Anthropic engineering.
The Problem with Direct Tool Calls
By default, many MCP clients load all available tool definitions into the model’s context. This consumes significant tokens, especially when the agent has access to many tools. Additionally, intermediate tool results, such as large documents or datasets, are routed through the model’s context, often more than once. This bloats the context window and increases latency and cost.
For example, if an agent retrieves a transcript from Google Drive and then uploads it to Salesforce, the full transcript is processed twice (once when retrieved, and again when submitted), potentially adding tens of thousands of tokens per request.
Using Code Execution to Optimize Context
To solve this, tools can be exposed as code APIs. Instead of loading all tool definitions into the context window, the model can discover and invoke tools by writing and executing TypeScript code.
Here’s how this works:
// ./servers/google-drive/getDocument.ts
import { callMCPTool } from "../../../client.js";

interface GetDocumentInput {
  documentId: string;
}

interface GetDocumentResponse {
  content: string;
}

export async function getDocument(input: GetDocumentInput): Promise<GetDocumentResponse> {
  return callMCPTool<GetDocumentResponse>('google_drive__get_document', input);
}
The agent can then combine tool calls programmatically:
// Download transcript and upload to Salesforce
import * as gdrive from './servers/google-drive';
import * as salesforce from './servers/salesforce';

const transcript = (await gdrive.getDocument({ documentId: 'abc123' })).content;
await salesforce.updateRecord({
  objectType: 'SalesMeeting',
  recordId: '00G3s020403cabNDF',
  data: { Notes: transcript }
});
This avoids passing large intermediate results through the model context. Instead, data flows between tools directly via the execution environment.
Benefits of Code Execution
1. Progressive tool loading: Tools are discovered dynamically by reading the file system or using a search_tools API. This allows agents to load only the definitions they need at runtime.
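A rough sketch of filesystem-based discovery, assuming the ./servers layout shown above (the listAvailableTools helper is illustrative, not part of MCP):

import { promises as fs } from 'fs';

// Hypothetical discovery step: list the tool wrappers under ./servers
// so the agent can read only the definitions the current task needs.
async function listAvailableTools(serverDir = './servers'): Promise<string[]> {
  const tools: string[] = [];
  for (const server of await fs.readdir(serverDir)) {
    const files = await fs.readdir(`${serverDir}/${server}`);
    // Each .ts file wraps one MCP tool, e.g. google-drive/getDocument.ts
    tools.push(...files.filter(f => f.endsWith('.ts')).map(f => `${server}/${f}`));
  }
  return tools;
}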
2. Efficient data handling: Agents can filter or transform large datasets before returning results to the model. For example:
// Fetch the full sheet in the execution environment, keep only
// pending orders, and surface just the first five rows to the model.
const allRows = await gdrive.getSheet({ sheetId: 'nfc381' });
const pendingOrders = allRows.filter(row => row["Status"] === 'pending');
console.log(pendingOrders.slice(0, 5));
This replaces thousands of rows with only a few in the model’s context.
3. Structured control flow: Loops and conditionals can be handled in code:
// Poll Slack until the deployment notification appears,
// waiting five seconds between checks.
let found = false;
while (!found) {
  const messages = await slack.getChannelHistory({ channel: 'B483759' });
  found = messages.some(m => m.text.includes('deployment complete'));
  if (!found) await new Promise(r => setTimeout(r, 5000));
}
console.log('Deployment notification received');
This enables fast, context-efficient execution compared to orchestrating logic through multiple model responses.
4. Privacy preservation: Sensitive data is handled outside the model by default. The MCP client can tokenize personally identifiable information (PII), shielding it from the model:
[
  { salesforceId: '00G3s020403cabNDF', email: '[EMAIL_1]', phone: '[PHONE_1]', name: '[NAME_1]' },
  ...
]
Real values are only detokenized when passed to trusted tools, ensuring secure data handling.
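A tokenization layer of this kind might be sketched as follows; the regex-based replacement and helper names are illustrative, and a production harness would restrict detokenization to trusted tool calls:

// Hypothetical PII tokenizer: swap emails for placeholders before a
// tool result enters the model's context, keeping a lookup table so
// real values can be restored when calling trusted tools.
const lookup = new Map<string, string>();
let emailCount = 0;

function tokenizeEmails(text: string): string {
  return text.replace(/[\w.+-]+@[\w.-]+/g, (email) => {
    const token = `[EMAIL_${++emailCount}]`;
    lookup.set(token, email);
    return token;
  });
}

function detokenize(text: string): string {
  for (const [token, value] of lookup) text = text.replaceAll(token, value);
  return text;
}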
5. State persistence and reusable skills: Agents can write to the file system to track progress or resume tasks:
// Query leads once, then persist them to the workspace so a later
// session can resume without re-querying Salesforce.
import { promises as fs } from 'fs';

const leads = await salesforce.query({ query: 'SELECT Id, Email FROM Lead LIMIT 1000' });
const csvData = leads.map(l => `${l.Id},${l.Email}`).join('\n');
await fs.writeFile('./workspace/leads.csv', csvData);
They can also save working code as reusable skills:
// ./skills/save-sheet-as-csv.ts
import { promises as fs } from 'fs';
import * as gdrive from '../servers/google-drive';

export async function saveSheetAsCsv(sheetId: string) {
  const data = await gdrive.getSheet({ sheetId });
  const csv = data.map(row => row.join(',')).join('\n');
  await fs.writeFile(`./workspace/sheet-${sheetId}.csv`, csv);
  return `./workspace/sheet-${sheetId}.csv`;
}
These skills become part of the agent’s evolving toolbox for specialized tasks.
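Once saved, a skill can be imported and reused like any other module, for example applying it to the sheet from the earlier example:

import { saveSheetAsCsv } from './skills/save-sheet-as-csv';

// Reuse the saved skill; returns the path of the exported CSV.
const path = await saveSheetAsCsv('nfc381');
console.log(`Exported to ${path}`);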
Considerations and Trade-Offs
While code execution with MCP offers major benefits, especially around token efficiency, control flow, and data privacy, it also introduces new complexity that teams must manage:
- Operational overhead: Executing model-generated code requires a sandboxed runtime that enforces resource limits, blocks unauthorized access, and isolates executions. Maintaining this environment adds operational burden, especially for teams without secure execution infrastructure experience.
- Security considerations: Allowing agents to run arbitrary code increases exposure to injection risks, filesystem misuse, and compute overuse. Safeguards such as validation layers, auditing, and logging become necessary to detect misuse and abnormal behavior.
- Debugging complexity: Dynamic code generation makes debugging dependent on visibility into both the produced code and its runtime behavior. Failures can stem from incorrect model assumptions, incomplete tool metadata, or unexpected tool outputs, so diagnosing them requires access to execution logs, traces, and return values.
- Tool discoverability and usability: Dynamic tool loading reduces context use but gives models less clarity about available tools. This can increase trial-and-error and incorrect tool selection. Searchable indexes or schema summaries can help but require additional development work.
- Consistency and skill drift: Persisted agent-generated code can fall out of sync with evolving tool interfaces. Changes in backend tools or lack of validation can cause outdated assumptions, making ongoing synchronization and maintenance necessary.
Best Practices for Building with Anthropic’s MCP
Developers should consider the following best practices when using MCP to build agents.
1. Use Context Windows Efficiently
Efficient use of context windows is essential when building agents with MCP, as the model's ability to process information depends heavily on the available context. Avoid bloating the prompt with redundant or verbose data, as this quickly exhausts the window and can degrade performance. Instead, include only the necessary and recent state, tool results, and messages relevant to the current workflow.
Segmenting workflows and tasks so that only required data persists in the prompt helps prevent confusion and maintains agent reliability, especially in longer sessions. Monitoring window utilization with each interaction and adopting strategies such as summarization or selective forgetting for older state can optimize buffer usage.
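As an illustration, a crude utilization check might estimate token usage and trigger summarization as the budget fills; the four-characters-per-token heuristic and thresholds below are assumptions, not a real tokenizer:

// Rough token estimate (about 4 characters per token for English text).
function approxTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Flag when the accumulated context approaches the budget, leaving
// headroom so older state can be summarized before the window fills.
function needsCompaction(contextParts: string[], budget = 150_000): boolean {
  const used = contextParts.reduce((sum, part) => sum + approxTokens(part), 0);
  return used > budget * 0.8;
}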
2. Limit Intermediate Tool Results
Intermediate tool results often accumulate rapidly during complex agent workflows, cluttering the model's context window and increasing the risk of hallucination and degraded performance. To mitigate this, design tools to return only concise, relevant results after each invocation. Avoid logging verbose debug output or large, unnecessary artifacts unless they are strictly required for subsequent steps in the workflow.
Implement policies that truncate, summarize, or compress intermediate results before persisting them to the context window. This not only ensures that the model’s memory is used efficiently but also simplifies state transitions and reduces the potential for context overflow. Consistent enforcement of result limits helps maintain model determinism and alignment, which is particularly important in safety-critical applications.
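As a sketch, a small guard can cap what any single tool result contributes to the context before it is persisted (the limit and helper name are illustrative):

// Hypothetical guard: truncate a tool result before persisting it,
// noting how much was dropped so the model knows the result is partial.
function capResult(result: string, maxChars = 2000): string {
  if (result.length <= maxChars) return result;
  const omitted = result.length - maxChars;
  return `${result.slice(0, maxChars)}\n[...truncated ${omitted} characters]`;
}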
3. Modularize MCP Servers for Scalability
As agents and tool ecosystems grow, modularizing MCP servers becomes essential for managing complexity and supporting scalability. By organizing tools, logic, and state handlers in independent, interchangeable modules, developers can safely deploy updates, add new features, or fix bugs without disrupting the entire platform. This design approach enables better separation of concerns, making each component easier to test, maintain, and secure.
Modular MCP servers also enable horizontal scaling, as each module can be replicated or distributed according to demand or use case requirements. When one tool or logic branch needs to scale rapidly, modular deployment allows targeted resource allocation without overprovisioning other agent features. This flexibility maximizes resource efficiency and ensures that growth does not compromise reliability or maintainability.
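In the file-based layout used earlier, one concrete form of modularization is a directory per server with a single entry point that re-exports its tools, so an implementation can be swapped or versioned without touching callers. A sketch, assuming the google-drive server from the previous examples:

// ./servers/google-drive/index.ts
// One entry point per server: callers import the module, not the files,
// so individual tool wrappers can change independently.
export { getDocument } from './getDocument.js';
export { getSheet } from './getSheet.js'; // assumed to exist alongside getDocument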
4. Persist State Only When Necessary
State persistence is a powerful capability in agent architectures but should be used judiciously. Persisting too much state consumes context window space and can complicate the model’s reasoning by introducing outdated, irrelevant, or ambiguous information. Only essential variables, statuses, and tool results required for future decision-making steps should be serialized and maintained throughout the agent lifecycle.
Implement rigorous guidelines for state checkpoints, discarding obsolete or one-time-use data as soon as it is no longer needed. This practice improves the interpretability and predictability of agent decisions, as there is less historical baggage to process and fewer edge cases for the model to handle. Minimizing state also reduces security risk through data minimization, as less sensitive or proprietary information is retained.
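One way to enforce this is to whitelist which fields survive each checkpoint; a minimal sketch with hypothetical field names:

// Hypothetical checkpoint filter: persist only fields still needed
// for future steps; everything else is dropped at the checkpoint.
const PERSISTED_KEYS = ['taskId', 'pendingRecordIds', 'lastSyncedAt'];

function checkpoint(state: Record<string, unknown>): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(state).filter(([key]) => PERSISTED_KEYS.includes(key))
  );
}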
5. Test Context Serialization Thoroughly
Thorough testing of context serialization is critical for building robust MCP-powered agents. Serialization errors can result in corrupted, inconsistent, or lost state, causing agent workflows to break or behave unpredictably. Automated tests covering typical and edge-case scenarios for both serializing and deserializing agent context are necessary to ensure consistent operation through the complete lifecycle of each workflow.
Special attention should be paid to migrations or schema changes in context serialization logic, as these can introduce difficult-to-diagnose bugs. Simulate real-world state transitions, rollback scenarios, and backward compatibility as part of the testing regimen. By establishing rigorous serialization checks, developers can ensure the continuity and reliability of the agent.
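For example, a basic round-trip test can assert that serializing and deserializing agent context preserves it exactly; the AgentContext shape and JSON encoding below are assumptions for illustration:

import assert from 'node:assert';
import { test } from 'node:test';

// Hypothetical context shape; real MCP state will differ.
interface AgentContext {
  taskId: string;
  step: number;
  toolResults: Record<string, unknown>;
}

test('context survives a serialize/deserialize round trip', () => {
  const original: AgentContext = {
    taskId: 'task-42',
    step: 3,
    toolResults: { google_drive__get_document: { content: 'hello' } },
  };
  const restored = JSON.parse(JSON.stringify(original)) as AgentContext;
  assert.deepStrictEqual(restored, original);
});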