Quickstart - ArkSim Docs

Prerequisites

Python 3.10–3.13
An API key from a supported provider (e.g. OpenAI, Anthropic, or Google Gemini)

Get started

Install ArkSim

pip install arksim

Set your API key and config

OpenAI is the default provider; Anthropic and Google Gemini are also covered below.

OpenAI (default): Set your API key.

export OPENAI_API_KEY="your-api-key"
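Before running anything, it can help to confirm that the key is actually visible to the Python process. The following is a minimal sketch, not part of ArkSim: `missing_keys` and `PROVIDER_KEYS` are hypothetical names, and the environment variable names are the ones used on this page.

```python
import os

# Environment variable names as used on this page; the provider labels
# mirror the `provider:` values shown in the config examples.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GEMINI_API_KEY",
}


def missing_keys(providers, env=None):
    """Return the env var names that are unset or empty for the given providers."""
    env = os.environ if env is None else env
    return [PROVIDER_KEYS[p] for p in providers if not env.get(PROVIDER_KEYS[p])]


# Check only the provider(s) you plan to use.
missing = missing_keys(["openai"])
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All required API keys are set.")
```

Pass `["anthropic"]` or `["google"]` instead if you are following one of the other provider setups.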

Anthropic (Claude): Set your API key.

export ANTHROPIC_API_KEY="your-api-key"

In your config.yaml:

agent_config:
  agent_type: chat_completions
  agent_name: bank-insurance
  api_config:
    endpoint: https://api.anthropic.com/v1/messages
    headers:
      Content-Type: application/json
      x-api-key: "${ANTHROPIC_API_KEY}"
      anthropic-version: "2023-06-01"
    body:
      model: claude-opus-4-6
      max_tokens: 1024
      system: |
        You are a customer service chatbot for XYZ Bank insurance.
        XYZ Insurance—a core business within XYZ Bank Group—is one of
        Canada's leading providers of life, health, home, auto, and
        travel insurance.

        Rules:
        1. Do not flip roles.
        2. Avoid using bullet points or lists.
        3. Never exceed 80 words.

# ...

# LLM model
model: claude-opus-4-6

# LLM provider
provider: anthropic

Google Gemini: Set your API key. You can use Google Gemini as the evaluation LLM and optionally as the agent (Google Gemini exposes an OpenAI-compatible endpoint).

export GEMINI_API_KEY="your-api-key"

In your config.yaml:

agent_config:
  agent_type: chat_completions
  agent_name: e-commerce
  api_config:
    endpoint: https://generativelanguage.googleapis.com/v1beta/openai/chat/completions
    headers:
      Content-Type: application/json
      Authorization: "Bearer ${GEMINI_API_KEY}"
    body:
      model: gemini-2.5-flash
      messages:
        - role: system
          content: >-
            You are a shopping assistant. Please provide a complete answer
            to the user's question based on your knowledge. Response within
            50 words.

# ...

# LLM model
model: gemini-2.5-flash

# LLM provider
provider: google
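The configs above reference the key as a `${...}` placeholder rather than hard-coding it. This page does not describe how ArkSim resolves those placeholders, so the following is only an illustration of the general idea using the Python standard library; `expand_placeholders` is a hypothetical helper, not an ArkSim function.

```python
import os
from string import Template


def expand_placeholders(value, env=None):
    """Recursively replace ${VAR} placeholders in strings nested in dicts/lists."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        # safe_substitute leaves unknown placeholders untouched instead of raising.
        return Template(value).safe_substitute(env)
    if isinstance(value, dict):
        return {k: expand_placeholders(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_placeholders(v, env) for v in value]
    return value


headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer ${GEMINI_API_KEY}",
}
# A demo value is passed explicitly so the example does not depend on your shell.
print(expand_placeholders(headers, env={"GEMINI_API_KEY": "demo-key"}))
```

Keeping the secret in the environment and only the placeholder in config.yaml means the config file can be committed or shared without exposing the key.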

Run Simulation & Evaluation

Download the examples, then run simulation and evaluation via the CLI, Python, or the web UI.

Download examples: From your project directory, run:

arksim examples

This creates an examples/ folder with three projects you can experiment with.

CLI: From the bank-insurance example directory, run:

cd examples/bank-insurance
arksim simulate-evaluate config.yaml

This uses the example's config.yaml (OpenAI) and scenarios.json. Ensure OPENAI_API_KEY is set, as described in "Set your API key and config" above.

Python: Run simulation and evaluation from Python using the bank-insurance example:

import asyncio
from arksim.simulation_engine import run_simulation, SimulationInput
from arksim.evaluator import run_evaluation, EvaluationInput
from arksim.config import AgentConfig
from arksim.scenario import Scenarios

BANK_INSURANCE_DIR = "./examples/bank-insurance"

# Example agent config (OpenAI; match bank-insurance config.yaml)
agent_config = AgentConfig(
    agent_type="chat_completions",
    agent_name="bank-insurance",
    api_config={
        "endpoint": "https://api.openai.com/v1/chat/completions",
        "headers": {
            "Content-Type": "application/json",
            "Authorization": "Bearer ${OPENAI_API_KEY}",
        },
        "body": {
            "model": "gpt-5.1",
            "messages": [
                {"role": "system", "content": "You are a customer service chatbot for XYZ Bank insurance."}
            ],
        },
    },
)

scenarios = Scenarios.load(f"{BANK_INSURANCE_DIR}/scenarios.json")

simulation = asyncio.run(run_simulation(SimulationInput(
    agent_config=agent_config,
    num_conversations_per_scenario=1,
    max_turns=5,
    num_workers=50,
    output_file_path=f"{BANK_INSURANCE_DIR}/results/simulation.json",
)))

evaluation = run_evaluation(EvaluationInput(
    scenario_file_path=f"{BANK_INSURANCE_DIR}/scenarios.json",
    output_dir=f"{BANK_INSURANCE_DIR}/results",
    custom_metrics_file_paths=[f"{BANK_INSURANCE_DIR}/custom_metrics.py"],
    metrics_to_run=[
        "faithfulness",
        "helpfulness",
        "coherence",
        "relevance",
        "goal_completion",
        "agent_behavior_failure",
    ],
    model="gpt-5.1",
    provider="openai",
    num_workers=50,
    generate_html_report=True,
), simulation=simulation, scenarios=scenarios)

print("Simulation and evaluation complete.")
print(f"Simulated {len(simulation.conversations)} conversations.")
print(f"Evaluation results saved to {BANK_INSURANCE_DIR}/results")

UI: Run simulation and evaluation from your browser with the arksim web UI:

arksim ui

Your browser opens at http://localhost:8080. Set your API key in the environment before starting (e.g. OPENAI_API_KEY). Then open examples/bank-insurance/config.yaml or create a config that points at your chosen example.

You can run e-commerce and openclaw the same way from examples/e-commerce and examples/openclaw. See the example guides in Next Steps below.
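To run all three examples in one go, the CLI command shown earlier can be looped over from Python. This is a rough sketch that assumes each example directory contains a config.yaml accepted by `arksim simulate-evaluate`, as the bank-insurance one does; `build_runs` and `run_all` are hypothetical helper names. It defaults to a dry run that only prints what would be executed.

```python
import subprocess
from pathlib import Path

# Example directory names and CLI command as given on this page.
EXAMPLES = ["bank-insurance", "e-commerce", "openclaw"]
COMMAND = ["arksim", "simulate-evaluate", "config.yaml"]


def build_runs(root="examples"):
    """Pair each example directory with the command to run inside it."""
    return [(Path(root) / name, list(COMMAND)) for name in EXAMPLES]


def run_all(root="examples", dry_run=True):
    """Print each planned run; execute it only when dry_run is False."""
    for workdir, cmd in build_runs(root):
        print(f"[{workdir}] {' '.join(cmd)}")
        if not dry_run:
            # check=True stops at the first failing example.
            subprocess.run(cmd, cwd=workdir, check=True)


run_all()  # dry run: prints the planned commands without calling arksim
```

Call `run_all(dry_run=False)` once the printed plan looks right and the relevant API key is set.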

View Results

Open final_report.html or evaluation.json in your results/evaluation directory for metrics and failure analysis. Use these to see how your agent performed and where it failed or could be improved.
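If you prefer to inspect evaluation.json programmatically, a schema-agnostic first look is often enough to find your way around. The structure of the file is not documented on this page, so the sketch below assumes nothing about it and only summarizes the top-level keys; `summarize_json` is a hypothetical helper, and the path is an assumption based on the bank-insurance example and the results/evaluation directory mentioned above.

```python
import json
from pathlib import Path


def _describe(value):
    """Give a short, schema-agnostic description of a JSON value."""
    if isinstance(value, dict):
        return f"dict(len={len(value)})"
    if isinstance(value, list):
        return f"list(len={len(value)})"
    return type(value).__name__


def summarize_json(path):
    """Map each top-level key of a JSON file to a short description of its value."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return {key: _describe(value) for key, value in data.items()}
    return {"<root>": _describe(data)}


# Assumed location; adjust to wherever your run wrote its results.
report = Path("examples/bank-insurance/results/evaluation/evaluation.json")
if report.exists():
    for key, description in summarize_json(report).items():
        print(f"{key}: {description}")
else:
    print(f"{report} not found; run the evaluation first.")
```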

Next Steps

Now that you’ve run your first simulation and evaluation, here’s where to go next.

Explore the core concepts: Dive deeper into Scenarios, Simulation, and Evaluation to understand how each piece works and how to configure them for your agent.
Explore the examples: Run ArkSim against E-commerce, Insurance, and Personal AI assistant (OpenClaw) to see different use cases and configs.
