Scenarios (scenarios.json)
The scenario file is a single JSON document that lists scenario objects. Each scenario describes one simulated user and one conversation session. It is the input to Simulation and is referenced during Evaluation.
Schema
- Schema version identifier. Use "v1".
- List of scenario objects.
Simulation output (simulation.json)
Written by arksim simulate or the simulation step of arksim simulate-evaluate. Path is set by output_file_path in your config.
Schema
- Output schema version (e.g. "v1").
- Arksim package version that produced this file.
- ISO-8601 UTC timestamp when the file was generated.
- One record per simulated conversation.
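A hedged sketch of what a simulation output file could contain, based only on the fields listed above. Every key name here (`version`, `arksim_version`, `generated_at`, `conversations`, and the per-conversation fields) is an assumption, not a documented identifier:

```json
{
  "version": "v1",
  "arksim_version": "0.3.0",
  "generated_at": "2024-01-01T12:00:00Z",
  "conversations": [
    {
      "scenario_name": "refund_request",
      "messages": [
        {"role": "user", "content": "Hi, my order arrived broken."},
        {"role": "assistant", "content": "I'm sorry to hear that. Could you share your order number?"}
      ]
    }
  ]
}
```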
Evaluation output (evaluation.json)
Written by arksim evaluate or the evaluation step of arksim simulate-evaluate. Path is {output_dir}/evaluation.json.
Schema
- Output schema version (e.g. "v1").
- ISO-8601 UTC timestamp when the file was generated.
- Arksim package version that produced this file.
- One record per conversation evaluated.
- Deduplicated behavior failures across all conversations.
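As with the simulation output, a sketch of a possible evaluation file follows; all key names are assumptions inferred from the field list above:

```json
{
  "version": "v1",
  "generated_at": "2024-01-01T12:05:00Z",
  "arksim_version": "0.3.0",
  "conversations": [
    {"scenario_name": "refund_request", "score": 0.8}
  ],
  "behavior_failures": [
    "Agent did not confirm the order number before issuing a refund."
  ]
}
```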
Run configuration (config.yaml)
The same YAML file can be used for arksim simulate, arksim evaluate, and arksim simulate-evaluate. For simulate-evaluate, simulation and evaluation settings are merged from a single config.
Simulation keys
Used by simulate and the simulation phase of simulate-evaluate.
| Key | Type | Description |
|---|---|---|
| agent_config_file_path | string | Path to agent_config.json. |
| scenario_file_path | string | Path to scenarios.json. |
| num_conversations_per_scenario | int | Number of conversations to run per scenario. |
| max_turns | int | Maximum turns per conversation. |
| output_file_path | string | Path where simulation.json is written. |
| simulated_user_prompt_template | string | Optional. Jinja template for the simulated user system prompt. |
| model | string | LLM model (e.g. gpt-5.1). |
| provider | string | LLM provider (e.g. openai). |
| num_workers | string \| int | Parallel workers; use "auto" to auto-scale. |
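Putting the simulation keys together, a config could look like the following. The key names come from the table above; all paths and values are placeholders:

```yaml
agent_config_file_path: agent_config.json
scenario_file_path: scenarios.json
num_conversations_per_scenario: 3
max_turns: 10
output_file_path: output/simulation.json
# simulated_user_prompt_template: prompts/user.j2  # optional Jinja template
model: gpt-5.1
provider: openai
num_workers: auto  # or an integer worker count
```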
Evaluation keys
Used by evaluate and the evaluation phase of simulate-evaluate.
| Key | Type | Description |
|---|---|---|
| scenario_file_path | string | Path to scenarios.json (for goal and knowledge during evaluation). |
| simulation_file_path | string | Path to simulation output (simulation.json). Omitted when running simulate-evaluate (output passed in memory). |
| output_dir | string | Directory for evaluation results; evaluation.json and optional reports are written here. |
| custom_metrics_file_paths | list[string] | Paths to Python files defining custom metrics. |
| metrics_to_run | list[string] | Names of built-in metrics to run. If empty, all built-in metrics run. |
| generate_html_report | boolean | Whether to generate final_report.html. |
| score_threshold | float \| null | If set, exit with non-zero code when any conversation score is below this value. |
| model | string | LLM model for scoring. |
| provider | string | LLM provider. |
| num_workers | string \| int | Parallel workers; use "auto" for auto-scale. |
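A standalone evaluation config built from the keys above might look like this; paths, the metric file, and the threshold value are placeholders:

```yaml
scenario_file_path: scenarios.json
simulation_file_path: output/simulation.json
output_dir: output/eval
custom_metrics_file_paths:
  - metrics/politeness.py   # hypothetical custom-metric file
metrics_to_run: []          # empty list runs all built-in metrics
generate_html_report: true
score_threshold: 0.7
model: gpt-5.1
provider: openai
num_workers: auto
```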
model, provider, and num_workers are shared across simulation and evaluation when both run from the same config.
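To illustrate the merged form, a single simulate-evaluate config could combine both key sets, with the shared keys appearing once and simulation_file_path omitted since the output is passed in memory. Values are placeholders:

```yaml
# Simulation keys
agent_config_file_path: agent_config.json
scenario_file_path: scenarios.json
num_conversations_per_scenario: 3
max_turns: 10
# Evaluation keys (no simulation_file_path needed for simulate-evaluate)
output_dir: output/eval
generate_html_report: true
# Shared across both phases
model: gpt-5.1
provider: openai
num_workers: auto
```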