Overview

This example walks through running ArkSim against a shopping assistant agent built for an e-commerce use case. The agent is designed to help customers navigate product discovery, orders, returns, and general shopping queries. The example includes two ready-to-run agent setups you can test against out of the box, and a guide for plugging in your own agent once you’re familiar with the setup.

Example Agents

You can run the example in two ways: Option 1 uses the OpenAI API directly (no server, minimal setup). Option 2 runs a local agent built with the OpenAI Agents SDK (e-commerce knowledge base) via A2A, Chat Completions, or a custom agent connector.

Option 1: OpenAI API

A lightweight agent that calls the OpenAI API directly. Quick start with minimal setup.

Option 2: OpenAI Agents SDK

Agent built with the OpenAI Agents SDK, backed by an e-commerce knowledge base; supports A2A, Chat Completions, or custom agent connector.

Scenarios

The example ships with a set of pre-built scenarios in scenarios.json representing realistic e-commerce customer interactions. Each scenario defines a simulated user with a distinct persona, goal, and background knowledge drawn from product and policy documentation. Sample goals include:
  • Asking about product availability, specifications, and delivery timelines
  • Checking order status and tracking a shipment
  • Initiating or following up on a return or refund request
  • Comparing products and asking for recommendations
Scenarios are defined in scenarios.json in the example directory and can be edited or extended to reflect your own use case.
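A scenario entry pairs a persona with a goal and background knowledge. The field names below are illustrative assumptions, not the canonical schema; check the shipped scenarios.json for the exact structure your ArkSim version expects:

```json
[
  {
    "persona": "Returning customer, comfortable with online shopping, slightly impatient",
    "goal": "Find out whether a recent order has shipped and when it will arrive",
    "background": "Placed an order three days ago; store policy promises dispatch within 48 hours"
  }
]
```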
Before following either path, ensure ArkSim is installed (pip install arksim).

Option 1: OpenAI API

This agent calls the OpenAI API directly, with no server setup required.
1. Set your API key

export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
2. Run simulation and evaluation

Run from the examples/e-commerce directory:
cd examples/e-commerce
arksim simulate-evaluate config.yaml

Option 2: OpenAI Agents SDK

This agent is a shopping assistant built with the OpenAI Agents SDK, backed by an e-commerce knowledge base. It can be exposed via the A2A Protocol, a Chat Completions-compatible endpoint, or loaded directly as a Python class.
1. Select agent config

In the example directory, use the config file for your chosen interface:
  • A2A: config_a2a.yaml (inline agent config; uses ${A2A_API_KEY})
  • Chat Completions: config_chat_completions.yaml (inline agent config; uses ${AGENT_API_KEY})
  • Custom agent connector: config_custom.yaml (loads agent directly as a Python class — no server needed)
Set the matching environment variable before running.
2. Install agent dependencies

uv venv --python 3.11
source .venv/bin/activate
uv pip install -r examples/e-commerce/agent_server/requirements.txt
3. Start the agent server

Exposes an A2A-compatible agent on port 9999. Run from the repository root:
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
export A2A_API_KEY=1234-4567-8910
python -m examples.e-commerce.agent_server.a2a.server
If your layout differs (for example, the examples package lives elsewhere), use the module path that resolves to agent_server/a2a/server.py under the e-commerce example.
4. Run simulation and evaluation

In a new terminal, from examples/e-commerce, run the commands for the same interface you used in step 3. Use the same API key you set there.
cd examples/e-commerce
export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
export A2A_API_KEY=1234-4567-8910
arksim simulate-evaluate config_a2a.yaml

Running with Your Own Agent

To test your own backend agent against these scenarios:
  • Chat Completions: Follow the comments in agent_server/chat_completions/server.py to swap in your own backend logic or point to your endpoint.
  • A2A: Implement your own A2A executor in agent_server/a2a/agent_executor.py.
  • Custom: Subclass BaseAgent and point config_custom.yaml at your module. See Custom agent configuration for details.
Make sure the agent_config field in the config YAML is updated. Then run simulation and evaluation as above.
See Agent configuration for supported protocols and how to configure agent connections.
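As a rough sketch of the custom-connector shape: you subclass BaseAgent and implement a response method. The real BaseAgent ships with ArkSim and its interface may differ; the stand-in base class and method name below are assumptions so the example is self-contained.

```python
# Illustrative sketch only. In a real setup you would import BaseAgent
# from arksim; a minimal stand-in is defined here for illustration.
class BaseAgent:  # stand-in for ArkSim's BaseAgent (actual interface may differ)
    def respond(self, message: str) -> str:
        raise NotImplementedError


class MyShoppingAgent(BaseAgent):
    """Wraps your own backend; point config_custom.yaml at this class."""

    def respond(self, message: str) -> str:
        # Replace this stub with a call into your own agent or backend.
        return f"(stub) You asked: {message}"


agent = MyShoppingAgent()
print(agent.respond("Is the blue kettle in stock?"))
```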

Configuration

The example uses a single config file for both simulation and evaluation.
# AGENT CONFIGURATION

agent_config:
  agent_type: chat_completions
  agent_name: e-commerce
  api_config:
    endpoint: https://api.openai.com/v1/chat/completions
    headers:
      Content-Type: application/json
      Authorization: "Bearer ${OPENAI_API_KEY}"
    body:
      model: gpt-5.1
      messages:
        - role: system
          content: >-
            You are a shopping assistant. Please provide a complete answer
            to the user's question based on your knowledge. Respond within
            50 words.

# SIMULATION SETTINGS

# Path to the scenarios file
scenario_file_path: ./scenarios.json

# Number of conversations per scenario to generate
num_conversations_per_scenario: 1

# Maximum turns per conversation
max_turns: 5

# Output file path for simulation results
output_file_path: ./results/simulation/simulation.json

# Jinja template for the simulated user's system prompt
simulated_user_prompt_template: null

# EVALUATION SETTINGS

# Output directory for evaluation results
output_dir: ./results/evaluation

# Paths to Python files defining custom
# QuantitativeMetric or QualitativeMetric subclasses
custom_metrics_file_paths:
  - ./custom_metrics.py

# Built-in metrics to run; if empty, all built-in metrics run
metrics_to_run:
  - faithfulness
  - helpfulness
  - coherence
  - verbosity
  - relevance
  - goal_completion
  - agent_behavior_failure

# Generate HTML report
generate_html_report: true

# Numeric thresholds: per-metric minimum scores on native scale
# numeric_thresholds:
#   overall_score: 0.7
#   goal_completion: 0.6

# SHARED SETTINGS

# LLM model
model: gpt-5.1

# LLM provider
provider: openai

# Workers for parallel processing
num_workers: 50
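If you enable numeric_thresholds, each listed metric's score is compared against its minimum on the metric's native scale. The gating logic amounts to a per-metric comparison, sketched here with made-up scores (the evaluation output's exact field names may differ):

```python
# Hypothetical scores keyed by metric name, on each metric's native scale.
scores = {"overall_score": 0.82, "goal_completion": 0.55}

# Thresholds mirroring the commented-out numeric_thresholds block above.
numeric_thresholds = {"overall_score": 0.7, "goal_completion": 0.6}

# Collect every thresholded metric whose score falls below its minimum.
failures = {
    metric: (score, numeric_thresholds[metric])
    for metric, score in scores.items()
    if metric in numeric_thresholds and score < numeric_thresholds[metric]
}

for metric, (score, minimum) in failures.items():
    print(f"{metric}: {score:.2f} < threshold {minimum:.2f}")
```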

Output

Results are written under the example directory:
  • ./results/simulation/simulation.json: Simulated conversations from the simulation step
  • ./results/evaluation/evaluation.json: Evaluation results (per-turn and per-conversation scores, unique errors)
  • ./results/evaluation/final_report.html: Interactive HTML report for browsing and sharing results
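The JSON outputs are plain files you can post-process. A minimal sketch, assuming simulation.json holds a top-level JSON list of conversations (adjust if your ArkSim version nests results differently):

```python
import json
from pathlib import Path


def count_conversations(path: str) -> int:
    """Return how many simulated conversations a results file contains.

    Assumes the file holds a top-level JSON list of conversations,
    which is an assumption about the output schema, not a guarantee.
    """
    p = Path(path)
    if not p.exists():
        return 0
    return len(json.loads(p.read_text()))


print(count_conversations("./results/simulation/simulation.json"))
```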