Prerequisites
Option A: Test your own agent
If you already have an agent running, use arksim init to scaffold a starter config and scenarios file, then point it at your endpoint.
Install and set your API key
pip install arksim
export OPENAI_API_KEY="your-api-key"
For other providers: pip install "arksim[anthropic]" or pip install "arksim[google]".
Scaffold a starter config
arksim init
This creates three files in the current directory:
- config.yaml pointing at ./my_agent.py with sensible defaults
- scenarios.json with four domain-agnostic starter scenarios (happy path, out of scope, ambiguous intent, multi-step)
- my_agent.py with a BaseAgent subclass ready to fill in (no server needed)
Open my_agent.py and replace the execute() body with your agent logic. All files include inline comments explaining each field.
For HTTP or A2A agents, use arksim init --agent-type chat_completions or arksim init --agent-type a2a instead. Use --force to re-scaffold if files already exist.
Run simulation and evaluation
arksim simulate-evaluate config.yaml
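If you would rather launch the same run from a script (for example in CI), the command above can be wrapped with the standard library. This is a minimal sketch, not part of ArkSim: the helper names build_command and run_arksim are hypothetical, and it assumes only that the arksim executable is on your PATH. How ArkSim uses exit codes is not documented here, so the return value is passed through untouched.

```python
import shutil
import subprocess
import sys


def build_command(config_path: str) -> list:
    """Return the argv for the simulate-evaluate command shown above."""
    return ["arksim", "simulate-evaluate", config_path]


def run_arksim(config_path: str = "config.yaml") -> int:
    """Run the CLI and return its exit code (127 if arksim is not on PATH)."""
    if shutil.which("arksim") is None:
        # Hypothetical convention: 127 mirrors the shell's "command not found".
        print("arksim not found on PATH; run `pip install arksim` first", file=sys.stderr)
        return 127
    completed = subprocess.run(build_command(config_path), check=False)
    return completed.returncode
```

Calling run_arksim("config.yaml") from the directory that holds config.yaml should behave like typing the command in a shell.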
View results
Open results/final_report.html in your browser for an interactive report with scores, failure categories, and full conversation transcripts.
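To open the report from a script rather than by hand, Python's webbrowser module is enough. The sketch below assumes the report lives at results/final_report.html relative to the working directory, as stated above; the helper names report_uri and open_report are hypothetical and not part of ArkSim.

```python
import webbrowser
from pathlib import Path


def report_uri(results_dir: str = "results") -> str:
    """Build a file:// URI for final_report.html inside results_dir."""
    # as_uri() needs an absolute path, so resolve() first.
    return Path(results_dir, "final_report.html").resolve().as_uri()


def open_report(results_dir: str = "results") -> bool:
    """Open the report in the default browser; return False if it is missing."""
    report = Path(results_dir, "final_report.html")
    if not report.is_file():
        return False
    return webbrowser.open(report_uri(results_dir))
```

open_report() returns False instead of raising when the file is absent, which makes it safe to call right after a run that may have failed.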
Option B: Explore a pre-built example
If you want to see ArkSim in action before connecting your own agent, try one of the included examples.
Install and set your API key
pip install arksim
export OPENAI_API_KEY="your-api-key"
Download examples
This creates an examples/ folder with ready-to-run projects (bank-insurance, e-commerce, customer-service, openclaw).
Run simulation and evaluation
cd examples/bank-insurance
arksim simulate-evaluate config.yaml
# The same run, driven from Python instead of the CLI.
import asyncio

from arksim.simulation_engine import run_simulation, SimulationInput
from arksim.evaluator import run_evaluation, EvaluationInput
from arksim.config import AgentConfig
from arksim.scenario import Scenarios

EXAMPLE_DIR = "./examples/bank-insurance"

# Agent under test: an OpenAI chat-completions endpoint with a fixed system prompt.
agent_config = AgentConfig(
    agent_type="chat_completions",
    agent_name="bank-insurance",
    api_config={
        "endpoint": "https://api.openai.com/v1/chat/completions",
        "headers": {
            "Content-Type": "application/json",
            "Authorization": "Bearer ${OPENAI_API_KEY}",
        },
        "body": {
            "model": "gpt-5.1",
            "messages": [
                {"role": "system", "content": "You are a customer service chatbot for XYZ Bank insurance."}
            ],
        },
    },
)

scenarios = Scenarios.load(f"{EXAMPLE_DIR}/scenarios.json")

# run_simulation is a coroutine, so drive it with asyncio.run.
simulation = asyncio.run(run_simulation(SimulationInput(
    agent_config=agent_config,
    num_conversations_per_scenario=1,
    max_turns=5,
    num_workers=50,
    output_file_path=f"{EXAMPLE_DIR}/results/simulation.json",
)))

# Score the simulated conversations and write the HTML report.
evaluation = run_evaluation(EvaluationInput(
    scenario_file_path=f"{EXAMPLE_DIR}/scenarios.json",
    output_dir=f"{EXAMPLE_DIR}/results",
    custom_metrics_file_paths=[f"{EXAMPLE_DIR}/custom_metrics.py"],
    metrics_to_run=[
        "faithfulness", "helpfulness", "coherence",
        "relevance", "goal_completion", "agent_behavior_failure",
    ],
    model="gpt-5.1",
    provider="openai",
    num_workers=50,
    generate_html_report=True,
), simulation=simulation, scenarios=scenarios)

print(f"Done. Simulated {len(simulation.conversations)} conversations.")
The web interface opens at http://localhost:8080. Load any example config and run it from the browser.
View results
Open results/evaluation/final_report.html in your browser for scores, failure analysis, and full conversation transcripts.
Using other LLM providers
ArkSim uses OpenAI by default for both the simulated user and the evaluator. To use Anthropic or Google instead, install the matching extra, export that provider's API key, and set model and provider in your config.yaml.
Anthropic
pip install "arksim[anthropic]"
export ANTHROPIC_API_KEY="your-api-key"
model: claude-opus-4-6
provider: anthropic
Google
pip install "arksim[google]"
export GEMINI_API_KEY="your-api-key"
model: gemini-2.5-flash
provider: google
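A run started with the wrong key exported tends to fail only once the first LLM call is made, so a quick pre-flight check can save time. The sketch below uses only the provider names and environment variables shown on this page; the mapping and the helper missing_api_key are illustrative and not part of ArkSim.

```python
import os
from typing import Mapping, Optional

# Provider name (as written in config.yaml) -> environment variable from this page.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GEMINI_API_KEY",
}


def missing_api_key(provider: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the name of the unset variable for provider, or None if it is set."""
    env = os.environ if env is None else env
    name = PROVIDER_ENV_VARS.get(provider)
    if name is None:
        raise ValueError(f"unknown provider: {provider!r}")
    # An empty string counts as missing.
    return None if env.get(name) else name
```

For example, missing_api_key("anthropic") returns "ANTHROPIC_API_KEY" when that variable is unset or empty, and None once it has been exported.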
Next Steps
Now that you’ve run your first simulation and evaluation, here’s where to go next.