Overview

Customize your simulation by controlling conversation count, model selection, parallelism, and evaluation settings using YAML configuration files or CLI flags.
For step-by-step execution with individual stage configurations, see the Advanced Usage section.

YAML Configuration

Use YAML configuration files to run the full pipeline (scenario building, conversation simulation, and conversation evaluation).

Understanding Key Settings

  • num_conversations — How many test conversations to generate. Start with 5-10, increase for thorough testing.
  • max_turns — Maximum back-and-forth exchanges per conversation. 5 is typical for most use cases.
  • num_workers — Parallel processing threads. Higher values speed up execution but make more parallel API calls. Use auto to default to num_conversations.
  • model — The LLM used for scenario generation and evaluation (not your agent).
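A minimal sketch of a config that sets only these key settings (the values are illustrative, not recommendations; the full reference follows below):

```yaml
# Minimal config sketch covering only the key settings above.
num_conversations: 10   # generate 10 test conversations
max_turns: 5            # cap each conversation at 5 exchanges
num_workers: auto       # defaults to num_conversations
model: gpt-5.1          # LLM for scenario generation and evaluation (not your agent)
```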

Configuration File

# Path to agent setup directory
agent_setup_dir: ./examples/bank-insurance

# Output directory for all pipeline results (default: ./results/)
output_dir: null

# --- Scenario Builder Settings ---
user_attribute_file: null                 # Path to user attributes file
num_conversations: 5                      # Number of conversations to generate
num_conversations_per_persona: 1          # Conversations per persona
oversample_ratio: 3                       # Oversample ratio for user attributes
seed: null                                # Random seed for reproducibility
knowledge_method: null                    # Clustering: knee, density_cp, ppr, hdbscan, meanshift, affprop, ratio
enable_topic_modeling: true               # Use topic modeling to classify knowledge
embedding_model: text-embedding-3-small   # Embedding model for vector operations

# --- Simulation Engine Settings ---
max_turns: 5                              # Maximum turns per conversation
simulated_user_behavior_instructions: []  # Custom instructions for simulated user behavior
display_time_delay: false                 # Add delays when displaying conversations

# --- Evaluation Settings ---
code_file_path: null                      # Path to code file for fix suggestions
entry_function: null                      # Entry function for code fix generation
generate_html_report: true                # Generate HTML report
score_threshold: null                     # Score threshold (0.0-1.0)

# --- Shared Settings ---
provider: null                            # LLM provider: openai, azure
model: gpt-5.1                            # LLM model to use
temperature: 0.1                          # LLM temperature
num_workers: auto                         # Number of parallel workers (use auto to default to num_conversations)

You can override any config value using command-line flags. See CLI Flags for the complete reference.

CLI Flags

Command-line flags override config file values and can be applied to both Binary and Docker runs.

Usage

Use --flag value (or --flag=value). Flags are written in kebab-case (e.g., --num-conversations) and are mapped to YAML keys automatically.
Example (Binary):
./run_arksim.sh run config.yaml --num-conversations 2 --max-turns 2
Example (Docker):
docker run --rm \
  -v $(pwd)/examples:/app/examples \
  public.ecr.aws/f1v9v1i4/arklex/arksim:1.0.0-alpha \
  run config.yaml --num-conversations 10 --max-turns 3
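The flag-to-YAML-key mapping mentioned above can be sketched in shell. This is an assumption about the mapping rule (drop the leading dashes, replace each hyphen with an underscore), not arksim's actual implementation:

```shell
# Assumed mapping rule: strip the leading "--", then turn kebab-case
# into the snake_case YAML key.
flag="--num-conversations"
key=$(printf '%s' "$flag" | sed 's/^--//; s/-/_/g')
echo "$key"   # num_conversations
```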

Available Flags

| Flag | Applies to | Description |
| --- | --- | --- |
| --agent-setup-dir | build, simulate, evaluate, run | Path to the agent setup directory (contains agent_config.json and knowledge.json). |
| --output-dir | build, simulate, evaluate, run | Output directory. If omitted, a stage-specific default is used. |
| --model | build, simulate, evaluate, run | LLM model used by Arksim (scenario generation and/or evaluation; not your agent). |
| --provider | build, simulate, evaluate, run | LLM provider (e.g., openai, azure). |
| --temperature | run | Temperature for LLM generation (full pipeline only). |
| --embedding-model | build, run | Embedding model name for vector operations. |
| --enable-topic-modeling | build, run | Whether to enable topic modeling for knowledge. |
| --knowledge-method | build, run | Knowledge clustering method: knee, density_cp, ppr, hdbscan, meanshift, affprop, ratio. |
| --user-attribute-file | build, run | Path to the user attributes file. |
| --oversample-ratio | build, run | Oversample ratio for user attributes generation. |
| --seed | build, run | Random seed for reproducibility. |
| --num-conversations | build, simulate, run | Number of conversations to generate/simulate. |
| --num-conversations-per-persona | build, run | Number of conversations per persona. |
| --max-turns | simulate, run | Maximum turns per conversation. |
| --num-workers | simulate, evaluate, run | Number of parallel workers. |
| --display-time-delay | simulate, run | Add delays between turns when displaying conversations. |
| --input-dir | simulate, evaluate | Input directory (simulate: pre-built scenarios; evaluate: simulation outputs). |
| --generate-html-report | evaluate, run | Whether to generate an HTML report. |
| --score-threshold | evaluate, run | If any per-conversation final score is below this threshold, exit with non-zero code. |
| --code-file-path | evaluate, run | Path to code file for fix generation (requires --entry-function). |
| --entry-function | evaluate, run | Entry function for code fix generation (requires --code-file-path). |
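
The --score-threshold contract (a per-conversation score below the threshold yields a non-zero exit code) is what lets CI pipelines gate on evaluation results. As an illustration only, not arksim code, the check amounts to a numeric comparison; the score value below is made up for the example:

```shell
# Illustrative stand-in for the --score-threshold check: compare a
# per-conversation final score to the threshold and flag failure with a
# non-zero status, as a CI gate would.
score=0.62
threshold=0.80
below=$(awk -v s="$score" -v t="$threshold" 'BEGIN { if (s < t) print 1; else print 0 }')
if [ "$below" -eq 1 ]; then
  echo "FAIL: score $score is below threshold $threshold"
  status=1
else
  echo "PASS"
  status=0
fi
```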