This page documents the structure of the main artifacts used and produced by Arksim: scenario files, simulation output, evaluation output, and the run configuration file. Use it when building integrations, writing custom tooling, or debugging pipeline outputs.
For agent connection details (Chat Completions, A2A, and agent_config.json), see Agent configurations.

Scenarios (scenarios.json)

The scenario file is a single JSON document that lists scenario objects. Each scenario describes one simulated user and one conversation session. It is the input to Simulation and is referenced during Evaluation.

Schema

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `schema_version` | string | Yes | Schema version identifier. Use `"v1"`. |
| `scenarios` | list[Scenario] | Yes | List of scenario objects. |
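As a sketch, a minimal scenario file contains just these two top-level keys. The shape of each entry in `scenarios` is defined by the Scenario object and is not reproduced here; the placeholder below only marks where those objects go:

```json
{
  "schema_version": "v1",
  "scenarios": [
    { "...": "one Scenario object per simulated user and session" }
  ]
}
```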

Simulation output (simulation.json)

Written by arksim simulate or the simulation step of arksim simulate-evaluate. Path is set by output_file_path in your config.

Schema

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `schema_version` | string | Yes | Output schema version (e.g. `"v1"`). |
| `simulator_version` | string | Yes | Arksim package version that produced this file. |
| `generated_at` | string | Yes | ISO-8601 UTC timestamp when the file was generated. |
| `conversations` | list[Conversation] | Yes | One record per simulated conversation. |
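A skeleton of the file these fields describe is shown below. The version string and timestamp are illustrative values, and the shape of each Conversation record is defined elsewhere; the placeholder only marks where those records go:

```json
{
  "schema_version": "v1",
  "simulator_version": "0.4.2",
  "generated_at": "2025-01-15T12:00:00Z",
  "conversations": [
    { "...": "one Conversation record per simulated conversation" }
  ]
}
```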

Evaluation output (evaluation.json)

Written by arksim evaluate or the evaluation step of arksim simulate-evaluate. Path is {output_dir}/evaluation.json.

Schema

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `schema_version` | string | Yes | Output schema version (e.g. `"v1"`). |
| `generated_at` | string | Yes | ISO-8601 UTC timestamp when the file was generated. |
| `evaluator_version` | string | Yes | Arksim package version that produced this file. |
| `conversations` | list[ConversationEvaluation] | Yes | One record per conversation evaluated. |
| `unique_errors` | list[UniqueError] | Yes | Deduplicated behavior failures across all conversations. |
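A skeleton of the evaluation output follows the same pattern. The version string and timestamp are illustrative, and the ConversationEvaluation and UniqueError shapes are defined elsewhere; the placeholders only mark where those records go:

```json
{
  "schema_version": "v1",
  "generated_at": "2025-01-15T12:05:00Z",
  "evaluator_version": "0.4.2",
  "conversations": [
    { "...": "one ConversationEvaluation record per conversation" }
  ],
  "unique_errors": [
    { "...": "one UniqueError record per deduplicated failure" }
  ]
}
```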

Run configuration (config.yaml)

The same YAML file can be used for arksim simulate, arksim evaluate, and arksim simulate-evaluate. When running simulate-evaluate, the simulation and evaluation keys are read from the same file.

Simulation keys

Used by simulate and the simulation phase of simulate-evaluate.
| Key | Type | Description |
| --- | --- | --- |
| `agent_config_file_path` | string | Path to `agent_config.json`. |
| `scenario_file_path` | string | Path to `scenarios.json`. |
| `num_conversations_per_scenario` | int | Number of conversations to run per scenario. |
| `max_turns` | int | Maximum turns per conversation. |
| `output_file_path` | string | Path where `simulation.json` is written. |
| `simulated_user_prompt_template` | string | Optional. Jinja template for the simulated user system prompt. |
| `model` | string | LLM model (e.g. `gpt-5.1`). |
| `provider` | string | LLM provider (e.g. `openai`). |
| `num_workers` | string \| int | Parallel workers; use `"auto"` to auto-scale. |
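Putting the simulation keys together, a config for arksim simulate might look like the following sketch (all paths and values are illustrative):

```yaml
agent_config_file_path: agent_config.json
scenario_file_path: scenarios.json
num_conversations_per_scenario: 3
max_turns: 10
output_file_path: simulation.json
model: gpt-5.1
provider: openai
num_workers: auto
```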

Evaluation keys

Used by evaluate and the evaluation phase of simulate-evaluate.
| Key | Type | Description |
| --- | --- | --- |
| `scenario_file_path` | string | Path to `scenarios.json` (for goal and knowledge during evaluation). |
| `simulation_file_path` | string | Path to the simulation output (`simulation.json`). Omit this key when running simulate-evaluate; the simulation output is passed in memory. |
| `output_dir` | string | Directory for evaluation results; `evaluation.json` and optional reports are written here. |
| `custom_metrics_file_paths` | list[string] | Paths to Python files defining custom metrics. |
| `metrics_to_run` | list[string] | Names of built-in metrics to run. If empty, all built-in metrics run. |
| `generate_html_report` | boolean | Whether to generate `final_report.html`. |
| `score_threshold` | float \| null | If set, exit with a non-zero code when any conversation score is below this value. |
| `model` | string | LLM model for scoring. |
| `provider` | string | LLM provider. |
| `num_workers` | string \| int | Parallel workers; use `"auto"` to auto-scale. |
model, provider, and num_workers are shared across simulation and evaluation when both run from the same config.
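For example, a single config for arksim simulate-evaluate can combine both sets of keys. All values here are illustrative; `simulation_file_path` is omitted because the simulation output is passed in memory:

```yaml
# Simulation
agent_config_file_path: agent_config.json
scenario_file_path: scenarios.json
num_conversations_per_scenario: 3
max_turns: 10
output_file_path: simulation.json

# Evaluation
output_dir: results/
metrics_to_run: []          # empty: run all built-in metrics
generate_html_report: true
score_threshold: 0.8

# Shared between simulation and evaluation
model: gpt-5.1
provider: openai
num_workers: auto
```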