Overview
Some agents handle tool calling internally (e.g. OpenAI Agents SDK, LangChain) and only return text. arksim can capture these tool calls so the evaluator can score them. There are two ways to provide tool calls to arksim:
- Explicit: Return AgentResponse with tool calls from execute() (Python connector).
- Automatic capture: Enable trace_receiver in your config. arksim captures tool calls via the SDK’s tracing interface. The agent just returns text.
Connector Support
| Connector | Explicit (AgentResponse) | Automatic Capture |
|---|---|---|
| Python connector | Supported | Supported via ArksimTracingProcessor |
| Chat Completions | Planned | Planned |
| A2A | Supported via A2A AgentExtension | Not applicable |
Tool call capture works with the Python connector and A2A. A2A agents declare the https://arksim.arklex.ai/a2a/tool-call-capture/v1 extension and surface tool calls in Task.artifacts[*].metadata. Chat Completions support is planned.
Explicit Capture (Python Connector)
Return AgentResponse from your agent’s execute() method with the tool calls your agent made.
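A minimal sketch of the explicit path. The ToolCall and AgentResponse dataclasses below are local stand-ins so the example runs on its own; in a real agent you would import arksim's own classes, whose module path, field names, and sync/async signature may differ. The lookup_order tool, the content field, and the user_query parameter are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


# Stand-ins for arksim's classes (assumption: the real import path and
# constructor fields may differ from what is shown here).
@dataclass
class ToolCall:
    id: str
    name: str
    arguments: Dict[str, Any] = field(default_factory=dict)
    result: Optional[str] = None


@dataclass
class AgentResponse:
    content: str
    tool_calls: List[ToolCall] = field(default_factory=list)


def lookup_order(order_id: str) -> str:
    """Hypothetical tool used only for illustration."""
    return f"Order {order_id} has shipped."


class MyAgent:
    def execute(self, user_query: str, **kwargs: Any) -> AgentResponse:
        # Run the tool, then report the call alongside the text answer.
        args = {"order_id": "A-1001"}
        result = lookup_order(**args)
        call = ToolCall(id="call_1", name="lookup_order", arguments=args, result=result)
        return AgentResponse(content=result, tool_calls=[call])


response = MyAgent().execute("Where is my order?")
```

The point of the sketch is only the shape of the return value: the text answer and the list of tool calls travel together in one object.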
Metadata keys
arksim passes a metadata dict to your agent’s execute() via kwargs["metadata"]:
| Key | Type | Description |
|---|---|---|
| chat_id | str | Conversation ID assigned by agent.get_chat_id() |
| turn_id | int | Zero-based turn index within the conversation |
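As a rough illustration of reading these keys, the sketch below pulls chat_id and turn_id out of kwargs["metadata"]. The EchoAgent class and the user_query parameter are hypothetical; only the kwargs["metadata"] lookup and the two key names come from the table above.

```python
from typing import Any


class EchoAgent:
    """Hypothetical agent; only the kwargs["metadata"] lookup comes from the docs."""

    def execute(self, user_query: str, **kwargs: Any) -> str:
        metadata = kwargs.get("metadata", {})
        chat_id = metadata.get("chat_id")  # str: conversation ID
        turn_id = metadata.get("turn_id")  # int: zero-based turn index
        return f"[{chat_id} turn {turn_id}] {user_query}"


reply = EchoAgent().execute("hi", metadata={"chat_id": "chat-42", "turn_id": 0})
```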
Automatic Capture (Python Connector)
For agents using the OpenAI Agents SDK with the Python connector, add two things:
1. Register ArksimTracingProcessor with add_trace_processor at module level in your agent file.
2. Enable trace_receiver in your config.
Your execute() method just returns a string. arksim sets the routing context before calling execute(). When the SDK fires on_span_end, the processor reads this context and injects tool calls into the receiver’s buffer. No wrapping, no manual context management.
The module loader caches modules by file path, so add_trace_processor runs exactly once regardless of how many conversations the simulator creates.
See examples/customer-service/traced_agent.py for a complete working example.
Requires pip install openai-agents.
A2A Protocol Capture
A2A defines no native tool-call semantics (it defers to MCP for tool invocation), so arksim layers a versioned A2A AgentExtension on top.
Why artifacts, not messages?
Per A2A spec section 3.7, task outputs are delivered through Artifacts on a Task, not through Messages. Messages are for conversational turns (initiation, clarification, status); Artifacts are the durable output record. Tool calls are part of the agent’s task output, so they belong in an Artifact.
Server requirements
Declare the extension in AgentCard.capabilities.extensions. Inside your AgentExecutor.execute(), emit tool calls by placing them in the artifact’s metadata under the {URI}/tool_calls key and listing the URI in the artifact’s extensions.
The AgentExtension.params field is not used by v1 of this extension; leave it unset. Server-to-client parameter negotiation is not part of this convention (the schema is documented here rather than encoded in params, because A2A clients do not act on schema constraints at runtime).
If you emit streaming updates via TaskArtifactUpdateEvent followed by a final Task snapshot, include the same tool calls on both: arksim treats the final Task snapshot as authoritative and re-merges its artifacts, so including the tool calls there guarantees they survive.
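To make the artifact shape concrete, here is a sketch that builds the artifact as a plain dict rather than with A2A SDK types. The camelCase field names and the text part layout are assumptions about the JSON wire format, and the artifactId value, the build_artifact helper, and the lookup_order tool are hypothetical. The URI and the {URI}/tool_calls key come from this page.

```python
import json

EXTENSION_URI = "https://arksim.arklex.ai/a2a/tool-call-capture/v1"
METADATA_KEY = f"{EXTENSION_URI}/tool_calls"  # key is derived from the URI


def build_artifact(text: str, tool_calls: list) -> dict:
    """Return an artifact dict that carries the text answer and the tool calls."""
    return {
        "artifactId": "artifact-1",  # hypothetical ID
        "parts": [{"kind": "text", "text": text}],
        "extensions": [EXTENSION_URI],  # marks the artifact as using the extension
        "metadata": {METADATA_KEY: tool_calls},
    }


artifact = build_artifact(
    "Order A-1001 has shipped.",
    [{"id": "call_1", "name": "lookup_order", "arguments": {"order_id": "A-1001"}}],
)
serialized = json.dumps(artifact)  # the payload is plain JSON
```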
Tool call schema
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | No | Unique identifier. Defaults to "" if omitted. |
| name | string | Yes | Tool function name. Entries missing this field or with a non-string name are skipped. |
| arguments | object | No | Arguments passed to the tool. Must be a JSON object; defaults to {} if omitted. |
| result | string | No | Tool execution result. Non-string values are JSON-serialized. |
| error | string | No | Error message if the tool call failed. Non-string values are JSON-serialized. |
| source | - | N/A | Do not include. arksim sets this client-side to ToolCallSource.A2A_PROTOCOL (serialized as "a2a_protocol") for provenance tracking. Custom metrics can filter by ToolCall.source. |
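The rules in the table can be sketched as a small normalization function. This is an illustration of the documented behavior, not arksim's actual client code. The function names are hypothetical, and the handling of a non-object arguments value (replaced with {}) is an assumption, since the table does not say what happens in that case.

```python
import json
from typing import Any, Optional


def _to_text(value: Any) -> Optional[str]:
    """Leave strings and missing values alone; JSON-serialize everything else."""
    if value is None or isinstance(value, str):
        return value
    return json.dumps(value)


def normalize_tool_calls(raw: list) -> list:
    normalized = []
    for entry in raw:
        # Entries without a string name are skipped.
        if not isinstance(entry, dict) or not isinstance(entry.get("name"), str):
            continue
        arguments = entry.get("arguments", {})
        if not isinstance(arguments, dict):
            arguments = {}  # assumption: behavior for non-object arguments is not documented
        normalized.append({
            "id": entry.get("id", ""),
            "name": entry["name"],
            "arguments": arguments,
            "result": _to_text(entry.get("result")),
            "error": _to_text(entry.get("error")),
            "source": "a2a_protocol",  # set client-side, never by the server
        })
    return normalized


calls = normalize_tool_calls([
    {"name": "lookup_order", "result": {"status": "shipped"}},
    {"id": "missing-name"},
    {"name": 7},
])
```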
Streaming
For streaming agents (AgentCapabilities.streaming=True), emit tool call data on both the incremental TaskArtifactUpdateEvent and the final Task snapshot. arksim’s client:
- Accumulates tool calls from each TaskArtifactUpdateEvent as they arrive.
- On the final Task snapshot, clears accumulated state and re-merges from Task.artifacts (the snapshot is treated as authoritative).
- Treats TaskStatusUpdateEvent as a no-op that does not affect accumulated state.
Tool calls emitted on a TaskArtifactUpdateEvent must also appear on the corresponding artifact in the final Task.artifacts. Otherwise they will be dropped.
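The client behavior described above can be sketched with plain dicts standing in for A2A events. The event "kind" values and dict layout are assumptions about the wire format, and ToolCallAccumulator is a hypothetical name, not arksim's class. The example shows why a tool call that appears only on an incremental update is lost.

```python
URI = "https://arksim.arklex.ai/a2a/tool-call-capture/v1"
KEY = f"{URI}/tool_calls"


def calls_from_artifact(artifact: dict) -> list:
    """Read tool calls only from artifacts that list the extension URI."""
    if URI not in artifact.get("extensions", []):
        return []
    return list(artifact.get("metadata", {}).get(KEY, []))


class ToolCallAccumulator:
    def __init__(self) -> None:
        self.tool_calls: list = []

    def on_event(self, event: dict) -> None:
        kind = event["kind"]
        if kind == "artifact-update":
            # Incremental update: accumulate as it arrives.
            self.tool_calls.extend(calls_from_artifact(event["artifact"]))
        elif kind == "task":
            # Final snapshot is authoritative: clear, then re-merge.
            self.tool_calls = []
            for artifact in event.get("artifacts", []):
                self.tool_calls.extend(calls_from_artifact(artifact))
        # "status-update" (and anything else) leaves the state untouched.


def artifact_with(calls: list) -> dict:
    return {"extensions": [URI], "metadata": {KEY: calls}}


acc = ToolCallAccumulator()
acc.on_event({"kind": "artifact-update", "artifact": artifact_with([{"name": "search"}])})
acc.on_event({"kind": "artifact-update", "artifact": artifact_with([{"name": "refund"}])})
acc.on_event({"kind": "status-update"})
before_snapshot = [c["name"] for c in acc.tool_calls]
# The final snapshot omits "refund", so it is dropped.
acc.on_event({"kind": "task", "artifacts": [artifact_with([{"name": "search"}])]})
after_snapshot = [c["name"] for c in acc.tool_calls]
```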
Versioning
Per the A2A extensions spec, breaking changes to this schema MUST bump the URI version (/v1 -> /v2). Breaking changes include renaming the tool_calls key, renaming any field in a tool call dict, or making an optional field required. Adding a new optional field is non-breaking.
When arksim bumps to a new version, both A2AToolCaptureExtension.URI and A2AToolCaptureExtension.METADATA_KEY change together (the key is derived from the URI), and the client’s extensions= negotiation header flips to the new URI. The arksim client advertises and reads a single URI at a time; a server still on the old URI silently produces zero tool calls (the artifact is extracted but its extensions list no longer matches). For a hard cutover, update server and client together. For graceful migration, the server can declare both URIs in its AgentCard and emit two separate artifacts (one per URI, each listing its URI in extensions and carrying tool calls under its respective {URI}/tool_calls metadata key) until all clients upgrade.
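The graceful-migration path might look like the following sketch, again using plain dicts in place of A2A SDK types. The /v2 URI is hypothetical (no such version is described on this page), as are the helper names, the artifactId values, and the dict layout; only the {URI}/tool_calls key derivation and the one-artifact-per-URI rule come from the text above.

```python
V1_URI = "https://arksim.arklex.ai/a2a/tool-call-capture/v1"
V2_URI = "https://arksim.arklex.ai/a2a/tool-call-capture/v2"  # hypothetical future version


def metadata_key(uri: str) -> str:
    """The metadata key is derived from the extension URI."""
    return f"{uri}/tool_calls"


def migration_artifacts(text: str, tool_calls: list) -> list:
    """Emit one artifact per URI so old and new clients both find their data."""
    return [
        {
            "artifactId": f"tool-calls-{uri.rsplit('/', 1)[-1]}",  # hypothetical ID scheme
            "parts": [{"kind": "text", "text": text}],
            "extensions": [uri],
            "metadata": {metadata_key(uri): tool_calls},
        }
        for uri in (V1_URI, V2_URI)
    ]


def read_tool_calls(artifacts: list, client_uri: str) -> list:
    """A client reads a single URI and ignores artifacts that do not list it."""
    found = []
    for artifact in artifacts:
        if client_uri in artifact.get("extensions", []):
            found.extend(artifact.get("metadata", {}).get(metadata_key(client_uri), []))
    return found


artifacts = migration_artifacts(
    "Order A-1001 has shipped.",
    [{"name": "lookup_order", "arguments": {"order_id": "A-1001"}}],
)
```

A client on either version finds exactly one copy of the tool calls, while a client reading an unrelated URI finds none.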
See the customer-service A2A example for a complete working example using the same tools and scenarios as the Python connector variant.
Security considerations
Tool call metadata is more sensitive than the agent’s text response. Arguments can contain PII (email addresses, customer IDs), verification codes, internal tool names that expose your agent’s architecture, and verbatim tool results. A server that emits this data unconditionally exposes strictly more than it would without the extension. The arksim example server gates tool call emission on the client’s A2A-Extensions request header: if the client does not request the extension, the server omits tool call metadata from the artifact. This is the right default, but the request header is not access control; any HTTP client can send it.
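A header gate of this kind could be sketched as follows. This is not the example server's code: the function name is hypothetical, and treating the header value as a comma-separated list of URIs is an assumption about how clients send it.

```python
URI = "https://arksim.arklex.ai/a2a/tool-call-capture/v1"


def client_requested_extension(headers: dict) -> bool:
    """Return True only if the A2A-Extensions header lists the tool-call URI."""
    for name, value in headers.items():
        if name.lower() == "a2a-extensions":  # header names are case-insensitive
            requested = {item.strip() for item in value.split(",")}
            return URI in requested
    return False


def artifact_metadata(headers: dict, tool_calls: list) -> dict:
    # Omit tool call metadata unless the client asked for the extension.
    if client_requested_extension(headers):
        return {f"{URI}/tool_calls": tool_calls}
    return {}


with_header = artifact_metadata({"A2A-Extensions": URI}, [{"name": "lookup_order"}])
without_header = artifact_metadata({}, [{"name": "lookup_order"}])
```

The gate limits what is sent to clients that did not ask for it; it does nothing to stop a client that simply adds the header.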
Three deployment options, ordered by isolation strength:
Option A: Dedicated evaluation deployment. Run two instances of the agent: a production instance that does not declare the extension in its AgentCard, and an eval-only instance behind private networking that does. Zero risk of leaking tool traces to production traffic. Best fit for regulated domains (HIPAA, PCI) or when production and evaluation are operated by different teams.
Option B: Single deployment with transport auth. One endpoint, behind bearer token or mTLS declared in AgentCard.securitySchemes. arksim carries the credential via A2AConfig.headers in the config. The server still gates tool call emission on the A2A-Extensions request header. Auth keeps unauthorized clients out; the negotiation gate provides least-privilege within the authenticated group.
Option C: Environment flag gate. The server only emits the extension when an environment variable (e.g. ARKSIM_TOOL_CAPTURE_ENABLED=true) is set at startup. Simplest but weakest: useful for CI/staging where the entire deployment is evaluation-facing, but a flipped flag in production silently turns on exposure.
For most teams, Option B is the practical default.
Deduplication
When both AgentResponse and automatic capture (via ArksimTracingProcessor) provide tool calls for the same turn, arksim merges and deduplicates:
- By ID: tool calls with matching IDs are deduplicated (AgentResponse wins)
- By signature: tool calls with matching (name, arguments) are deduplicated (AgentResponse wins)
- Unique: tool calls from either source that don’t match anything are included
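The merge rules above can be sketched with plain dicts. This illustrates the documented behavior rather than arksim's implementation: the function names are hypothetical, and two details are assumptions, namely that empty IDs are ignored for ID matching and that arguments are compared through their sorted-key JSON form.

```python
import json


def _signature(call: dict) -> tuple:
    """(name, arguments) signature; arguments compared via sorted-key JSON."""
    return (call["name"], json.dumps(call.get("arguments", {}), sort_keys=True))


def merge_tool_calls(explicit: list, captured: list) -> list:
    """Merge both sources; the explicit (AgentResponse) entry wins on any match."""
    merged = list(explicit)
    seen_ids = {c["id"] for c in explicit if c.get("id")}
    seen_signatures = {_signature(c) for c in explicit}
    for call in captured:
        if call.get("id") and call["id"] in seen_ids:
            continue  # duplicate by ID
        if _signature(call) in seen_signatures:
            continue  # duplicate by (name, arguments)
        merged.append(call)  # unique to automatic capture
    return merged


explicit = [{"id": "1", "name": "lookup_order", "arguments": {"order_id": "A-1001"}, "result": "explicit"}]
captured = [
    {"id": "1", "name": "lookup_order", "arguments": {"order_id": "A-1001"}, "result": "traced"},
    {"id": "", "name": "lookup_order", "arguments": {"order_id": "A-1001"}},
    {"id": "9", "name": "send_email", "arguments": {}},
]
merged = merge_tool_calls(explicit, captured)
```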