Scenarios (scenarios.json)
The scenario file is a single JSON document that lists scenario objects. Each scenario describes one simulated user and one conversation session. It is the input to Simulation and is referenced during Evaluation.
Schema
- Schema version identifier. Use "v1".
- List of scenario objects.
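A minimal sketch of what a scenarios.json file might look like. The top-level field names ("version", "scenarios") and the per-scenario keys ("goal", "knowledge") are illustrative assumptions; only the "v1" version string, and the fact that evaluation reads a goal and knowledge from each scenario, come from this page.

```json
{
  "version": "v1",
  "scenarios": [
    {
      "goal": "Cancel an existing order and request a refund",
      "knowledge": "The order was placed three days ago and has not shipped"
    }
  ]
}
```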
Simulation output (simulation.json)
Written by arksim simulate or the simulation step of arksim simulate-evaluate. Path is set by output_file_path in your config.
Schema
- Output schema version (e.g. "v1").
- ArkSim package version that produced this file.
- UUID that uniquely identifies this simulation run. Referenced by the evaluation output as simulation_id to link the two artifacts.
- ISO-8601 UTC timestamp when the file was generated.
- One record per simulated conversation.
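A hedged skeleton of a simulation.json file. Only simulation_id is named on this page; the other field names (schema_version, arksim_version, created_at, conversations) and all values are assumptions for illustration.

```json
{
  "schema_version": "v1",
  "arksim_version": "0.1.0",
  "simulation_id": "123e4567-e89b-12d3-a456-426614174000",
  "created_at": "2025-01-01T00:00:00Z",
  "conversations": []
}
```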
Evaluation output (evaluation.json)
Written by arksim evaluate or the evaluation step of arksim simulate-evaluate. Path is {output_dir}/evaluation.json.
Schema
- Output schema version (e.g. "v1").
- ISO-8601 UTC timestamp when the file was generated.
- ArkSim package version that produced this file.
- UUID that uniquely identifies this evaluation run. Generated fresh for each arksim evaluate or arksim simulate-evaluate invocation.
- UUID of the simulation run that produced the conversations being evaluated. Copied from simulation.json to link the two artifacts.
- One record per conversation evaluated.
- Deduplicated behavior failures across all conversations.
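The simulation_id copied into the evaluation output makes the two artifacts joinable. A minimal sketch of that link check, assuming illustrative field names: only "simulation_id" is documented above, so every other key in these sample documents is a hypothetical placeholder.

```python
import json

# Hypothetical artifact shapes; only "simulation_id" is a documented field name.
simulation = json.loads("""{
    "schema_version": "v1",
    "simulation_id": "123e4567-e89b-12d3-a456-426614174000",
    "conversations": []
}""")

evaluation = json.loads("""{
    "schema_version": "v1",
    "evaluation_id": "7f3b2c10-1111-2222-3333-444455556666",
    "simulation_id": "123e4567-e89b-12d3-a456-426614174000",
    "conversations": [],
    "behavior_failures": []
}""")

def artifacts_linked(sim: dict, ev: dict) -> bool:
    """True when the evaluation references the simulation run it was scored from."""
    return ev["simulation_id"] == sim["simulation_id"]

print(artifacts_linked(simulation, evaluation))  # True
```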
Run configuration (config.yaml)
The same YAML file can be used for arksim simulate, arksim evaluate, and arksim simulate-evaluate. For simulate-evaluate, simulation and evaluation settings are merged from a single config.
Agent configuration
| Key | Type | Description |
|---|---|---|
| agent_config | object | Inline agent configuration. See Agent configuration for chat_completions and a2a. |
Simulation keys
Used by simulate and the simulation phase of simulate-evaluate.
| Key | Type | Description |
|---|---|---|
| scenario_file_path | string | Path to scenarios.json. |
| num_conversations_per_scenario | int | Number of conversations to run per scenario. |
| max_turns | int | Maximum turns per conversation. |
| output_file_path | string | Path where simulation.json is written. |
| simulated_user_prompt_template | string | Optional. Jinja template for the simulated user system prompt. |
| model | string | LLM model (e.g. gpt-5.1). |
| provider | string | LLM provider (e.g. openai). |
| num_workers | string \| int | Parallel workers (default 50). Set "auto" to auto-scale. |
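The keys above can be combined into a config for arksim simulate. A sketch with illustrative values only; the keys come from the table, and the agent_config body is elided because its shape is documented separately under Agent configuration.

```yaml
# Illustrative values; see the tables above for key semantics.
agent_config:
  # see Agent configuration (chat_completions / a2a)
scenario_file_path: scenarios.json
num_conversations_per_scenario: 3
max_turns: 10
output_file_path: simulation.json
model: gpt-5.1
provider: openai
num_workers: "auto"   # or an int, e.g. 50
```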
Evaluation keys
Used by evaluate and the evaluation phase of simulate-evaluate.
| Key | Type | Description |
|---|---|---|
| scenario_file_path | string | Path to scenarios.json (for goal and knowledge during evaluation). |
| simulation_file_path | string | Path to simulation output (simulation.json). Omitted when running simulate-evaluate (output passed in memory). |
| output_dir | string | Directory for evaluation results; evaluation.json and optional reports are written here. |
| custom_metrics_file_paths | list[string] | Paths to Python files defining custom metrics. |
| metrics_to_run | list[string] | Names of built-in metrics to run. If empty, all built-in metrics run. |
| generate_html_report | boolean | Whether to generate final_report.html. |
| score_threshold | float \| null | If set, exit with a non-zero code when any conversation score is below this value. |
| model | string | LLM model for scoring. |
| provider | string | LLM provider. |
| num_workers | string \| int | Parallel workers (default 50). Set "auto" to auto-scale. |
model, provider, and num_workers are shared across simulation and evaluation when both run from the same config.
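A sketch of a single config.yaml for arksim simulate-evaluate, combining keys from both tables. Values are illustrative, and the agent_config body is elided; simulation_file_path is omitted because simulate-evaluate passes the simulation output in memory.

```yaml
# Shared across both phases
model: gpt-5.1
provider: openai
num_workers: 50

# Agent under test (shape documented under Agent configuration)
agent_config:
  # chat_completions or a2a settings

# Simulation phase
scenario_file_path: scenarios.json
num_conversations_per_scenario: 2
max_turns: 8
output_file_path: simulation.json

# Evaluation phase
output_dir: results
generate_html_report: true
score_threshold: 0.7
```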