Scenarios (scenarios.json)
The scenario file is a single JSON document that lists scenario objects. Each scenario describes one simulated user and one conversation session. It is the input to Simulation and is referenced during Evaluation.
Schema
- Schema version identifier. Use "v1".
- List of scenario objects.
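A minimal sketch of what a scenarios.json file might look like. The top-level field names ("version", "scenarios") and the per-scenario keys ("goal", "knowledge") are illustrative assumptions; only the "v1" version string, and the fact that evaluation reads a goal and knowledge from each scenario, come from this page.

```json
{
  "version": "v1",
  "scenarios": [
    {
      "goal": "Cancel an existing order and request a refund",
      "knowledge": "The order was placed three days ago and has not shipped"
    }
  ]
}
```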
Simulation output (simulation.json)
Written by arksim simulate or the simulation step of arksim simulate-evaluate. Path is set by output_file_path in your config.
Schema
- Output schema version (e.g. "v1").
- ArkSim package version that produced this file.
- UUID that uniquely identifies this simulation run. Referenced by the evaluation output as simulation_id to link the two artifacts.
- ISO-8601 UTC timestamp when the file was generated.
- One record per simulated conversation.
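A hedged skeleton of a simulation.json file. Only simulation_id is named on this page; the other field names (schema_version, arksim_version, created_at, conversations) and all values are assumptions for illustration.

```json
{
  "schema_version": "v1",
  "arksim_version": "0.1.0",
  "simulation_id": "123e4567-e89b-12d3-a456-426614174000",
  "created_at": "2025-01-01T00:00:00Z",
  "conversations": []
}
```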
Evaluation output (evaluation.json)
Written by arksim evaluate or the evaluation step of arksim simulate-evaluate. Path is {output_dir}/evaluation.json.
Schema
- Output schema version (e.g. "v1").
- ISO-8601 UTC timestamp when the file was generated.
- ArkSim package version that produced this file.
- UUID that uniquely identifies this evaluation run. Generated fresh for each arksim evaluate or arksim simulate-evaluate invocation.
- UUID of the simulation run that produced the conversations being evaluated. Copied from simulation.json to link the two artifacts.
- One record per conversation evaluated.
- Deduplicated behavior failures across all conversations.
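The simulation_id copied into the evaluation output makes the two artifacts joinable. A minimal sketch of that link check, assuming illustrative field names: only "simulation_id" is documented above, so every other key in these sample documents is a hypothetical placeholder.

```python
import json

# Hypothetical artifact shapes; only "simulation_id" is a documented field name.
simulation = json.loads("""{
    "schema_version": "v1",
    "simulation_id": "123e4567-e89b-12d3-a456-426614174000",
    "conversations": []
}""")

evaluation = json.loads("""{
    "schema_version": "v1",
    "evaluation_id": "7f3b2c10-1111-2222-3333-444455556666",
    "simulation_id": "123e4567-e89b-12d3-a456-426614174000",
    "conversations": [],
    "behavior_failures": []
}""")

def artifacts_linked(sim: dict, ev: dict) -> bool:
    """True when the evaluation references the simulation run it was scored from."""
    return ev["simulation_id"] == sim["simulation_id"]

print(artifacts_linked(simulation, evaluation))  # True
```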
Run configuration (config.yaml)
The same YAML file can be used for arksim simulate, arksim evaluate, and arksim simulate-evaluate. For simulate-evaluate, simulation and evaluation settings are merged from a single config.
Agent configuration
| Key | Type | Description |
|---|---|---|
| agent_config | object | Inline agent configuration. See Agent configuration for chat_completions and a2a. |
Simulation keys
Used by simulate and the simulation phase of simulate-evaluate.
| Key | Type | Description |
|---|---|---|
| scenario_file_path | string | Path to scenarios.json. |
| num_conversations_per_scenario | int | Number of conversations to run per scenario. |
| max_turns | int | Maximum turns per conversation. |
| output_file_path | string | Path where simulation.json is written. |
| simulated_user_prompt_template | string | Optional. Jinja template for the simulated user system prompt. |
| model | string | LLM model (e.g. gpt-5.1). |
| provider | string | LLM provider (e.g. openai). |
| num_workers | string \| int | Parallel workers (default 50). Set "auto" to auto-scale. |
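The keys above can be combined into a config for arksim simulate. A sketch with illustrative values only; the keys come from the table, and the agent_config body is elided because its shape is documented separately under Agent configuration.

```yaml
# Illustrative values; see the tables above for key semantics.
agent_config:
  # see Agent configuration (chat_completions / a2a)
scenario_file_path: scenarios.json
num_conversations_per_scenario: 3
max_turns: 10
output_file_path: simulation.json
model: gpt-5.1
provider: openai
num_workers: "auto"   # or an int, e.g. 50
```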
Evaluation keys
Used by evaluate and the evaluation phase of simulate-evaluate.
| Key | Type | Description |
|---|---|---|
| scenario_file_path | string | Path to scenarios.json (for goal and knowledge during evaluation). |
| simulation_file_path | string | Path to simulation output (simulation.json). Omitted when running simulate-evaluate (output passed in memory). |
| output_dir | string | Directory for evaluation results; evaluation.json and optional reports are written here. |
| custom_metrics_file_paths | list[string] | Paths to Python files defining custom metrics. |
| metrics_to_run | list[string] | Names of built-in metrics to run. If empty, all built-in metrics run. |
| generate_html_report | boolean | Whether to generate final_report.html. |
| score_threshold | float \| null | If set, exit with a non-zero code when any conversation score is below this value. |
| model | string | LLM model for scoring. |
| provider | string | LLM provider. |
| num_workers | string \| int | Parallel workers (default 50). Set "auto" to auto-scale. |
model, provider, and num_workers are shared across simulation and evaluation when both run from the same config.
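A sketch of a single config.yaml for arksim simulate-evaluate, combining keys from both tables. Values are illustrative, and the agent_config body is elided; simulation_file_path is omitted because simulate-evaluate passes the simulation output in memory.

```yaml
# Shared across both phases
model: gpt-5.1
provider: openai
num_workers: 50

# Agent under test (shape documented under Agent configuration)
agent_config:
  # chat_completions or a2a settings

# Simulation phase
scenario_file_path: scenarios.json
num_conversations_per_scenario: 2
max_turns: 8
output_file_path: simulation.json

# Evaluation phase
output_dir: results
generate_html_report: true
score_threshold: 0.7
```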