
Overview

Use this stage to run scenario building on its own, using your agent’s configuration and knowledge base. When to use this:
  • Generating scenarios once to reuse for multiple simulation runs
  • Reviewing or editing scenarios before testing
  • Creating reproducible test sets for consistent evaluation
What it does: Analyzes your agent’s knowledge base and capabilities to generate realistic test scenarios with user personas, intents, and expected behaviors.

Output: Scenarios are saved to output_dir (default: ./results/scenario/) and can be used with the simulate command.

Running the Command

./run_arksim.sh build config_build.yaml

Configuration File

Understanding Key Settings

  • enable_topic_modeling — Uses AI to categorize your knowledge base into topics for diverse scenario coverage
  • oversample_ratio — Generates extra scenarios to ensure variety, then selects the best ones
  • num_conversations_per_persona — How many different scenarios each generated user persona should have
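To see how these three settings might relate, here is a rough Python sketch. It rests on an assumption the docs do not confirm: that Arksim creates enough personas to cover num_conversations at num_conversations_per_persona each, and that oversample_ratio multiplies that pool before the best candidates are kept. estimate_generation_counts is a hypothetical helper, not part of Arksim.

```python
import math


def estimate_generation_counts(num_conversations: int,
                               num_conversations_per_persona: int,
                               oversample_ratio: int) -> dict:
    """Hypothetical estimate of how many personas/candidates a build might need.

    Assumption (not confirmed by Arksim's docs): personas are created to cover
    num_conversations, and oversample_ratio multiplies that pool before selection.
    """
    if num_conversations_per_persona < 1 or oversample_ratio < 1:
        raise ValueError("per-persona count and oversample_ratio must be >= 1")
    # Enough personas so that each carries num_conversations_per_persona scenarios.
    personas_needed = math.ceil(num_conversations / num_conversations_per_persona)
    # Assumed oversampling: generate extra candidates, keep the best ones later.
    candidates_generated = personas_needed * oversample_ratio
    return {"personas_needed": personas_needed,
            "candidates_generated": candidates_generated}


print(estimate_generation_counts(5, 1, 3))  # {'personas_needed': 5, 'candidates_generated': 15}
print(estimate_generation_counts(5, 2, 3))  # {'personas_needed': 3, 'candidates_generated': 9}
```

Under this reading, the example values in the YAML configuration (5 conversations, 1 per persona, ratio 3) would mean roughly 15 candidates for 5 kept scenarios; check Arksim's source for the actual arithmetic.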

YAML Configuration

# Path to agent setup directory
agent_setup_dir: ./examples/bank-insurance

# Output directory for scenario results (default: scenario/)
output_dir: null

# Scenario generation settings
user_attribute_file: null                # Path to the user attributes file
num_conversations: 5                     # Number of conversations to generate/simulate
num_conversations_per_persona: 1         # Number of conversations per persona
oversample_ratio: 3                      # Oversample ratio for user attributes generation
seed: null                               # Random seed for reproducibility
knowledge_method: null                   # Knowledge clustering method: knee, density_cp, ppr, hdbscan, meanshift, affprop, ratio
enable_topic_modeling: true              # Whether to enable topic modeling for knowledge

# LLM settings
provider: null                           # LLM provider (e.g., openai, azure)
model: gpt-5.1                           # LLM model used by Arksim (not your agent)
temperature: 0.1                         # Temperature for LLM generation
embedding_model: text-embedding-3-small  # Embedding model name for vector operations
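If you script your builds, a quick pre-flight check can catch typos before a long run. The sketch assumes the YAML has already been parsed into a Python dict (for example with PyYAML's yaml.safe_load). check_build_config is a hypothetical helper, not an Arksim API, and the accepted knowledge_method values are copied from the comment in the configuration above.

```python
# Hypothetical pre-flight check for an already-parsed build config (not part of Arksim).
KNOWLEDGE_METHODS = {"knee", "density_cp", "ppr", "hdbscan",
                     "meanshift", "affprop", "ratio"}  # from the YAML comment


def _is_positive_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 1


def check_build_config(cfg: dict) -> list:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    for key in ("num_conversations", "num_conversations_per_persona"):
        if not _is_positive_int(cfg.get(key)):
            problems.append(f"{key} should be a positive integer, got {cfg.get(key)!r}")
    seed = cfg.get("seed")
    if seed is not None and not isinstance(seed, int):
        problems.append(f"seed should be null or an integer, got {seed!r}")
    method = cfg.get("knowledge_method")
    # null (None) is accepted; otherwise it must be one of the listed methods.
    if method is not None and method not in KNOWLEDGE_METHODS:
        problems.append(f"unknown knowledge_method {method!r}")
    return problems


cfg = {
    "agent_setup_dir": "./examples/bank-insurance",
    "num_conversations": 5,
    "num_conversations_per_persona": 1,
    "oversample_ratio": 3,
    "seed": None,
    "knowledge_method": None,
}
print(check_build_config(cfg))  # []
bad = dict(cfg, num_conversations=0, knowledge_method="kmeans")
print(check_build_config(bad))  # flags both the count and the unknown method
```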

Customize User Attributes (Optional)

A custom user attributes file lets you:
  • Control which attributes define simulated user personas
  • Choose how values are set — from a fixed list or generated by the LLM
The path to this file is set in your Arksim config (e.g. config.yaml) under user_attribute_file. Leave it null to use the built-in file at scenario_builder/utils/user_attributes.json.

Set the File Path

The path to the user attributes file is set in the Arksim configuration file (e.g. config.yaml used by arksim run).
| Property   | Value                       |
|------------|-----------------------------|
| Config key | user_attribute_file         |
| Type       | string or null              |
| Default    | null (use built-in default) |
# config.yaml – use built-in attributes
user_attribute_file: null
  • Relative paths are resolved from the current working directory when you run Arksim (e.g. ./config/user_attributes.json)
  • Absolute paths are supported (e.g. /path/to/user_attributes.json)
If user_attribute_file is set but the file is missing or invalid, Arksim falls back to the built-in default and logs a warning.
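The resolution and fallback rules above can be sketched in Python as follows. load_with_fallback is a hypothetical stand-in, not Arksim's load_user_attributes. It only treats a missing, unreadable, or malformed JSON file as invalid, whereas Arksim may also reject files that break the required structure.

```python
import json
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger("user_attributes_sketch")


def _read_json(path: Path):
    return json.loads(path.read_text(encoding="utf-8"))


def load_with_fallback(user_attribute_file, default_path):
    """Hypothetical loader mirroring the documented behaviour."""
    default_path = Path(default_path)
    if user_attribute_file is None:  # null in the config: use the built-in file
        return _read_json(default_path)
    path = Path(user_attribute_file)
    if not path.is_absolute():
        # Relative paths resolve against the current working directory.
        path = Path.cwd() / path
    try:
        return _read_json(path)
    except (OSError, json.JSONDecodeError) as exc:
        # Missing or malformed file: warn and fall back to the default.
        logger.warning("Cannot load %s (%s); using built-in default", path, exc)
        return _read_json(default_path)


with tempfile.TemporaryDirectory() as tmp:
    default_file = Path(tmp) / "default.json"
    default_file.write_text('{"individual": {}, "b2b": {}, "b2c": {}}', encoding="utf-8")
    # The custom file does not exist, so the default is returned and a warning logged.
    attrs = load_with_fallback(Path(tmp) / "missing.json", default_file)

print(sorted(attrs))  # ['b2b', 'b2c', 'individual']
```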

How the File is Used

1. Load at scenario build

Arksim calls load_user_attributes(user_attribute_file). If the config value is set, that path is used; otherwise the built-in default is used.

2. Build populations

Loaded attributes drive population generation:
  • Individual personas use the individual section
  • B2B / B2C personas use the b2b or b2c section based on business_type in your knowledge config (e.g. knowledge.json)

3. Resolve each attribute

For each attribute:
  • generate_values: false — values are sampled from the values array (e.g. demographics, fixed categories)
  • generate_values: true — values are generated by the LLM using the attribute’s description; values can be []

4. Use in simulation

The resulting attributes are attached to each simulated user and used for goals, profiles, and conversation behavior (e.g. demographics, B2B deal stage, B2C spending behavior).
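Steps 2 and 3 can be sketched in Python. This is a simplification under stated assumptions: the caller picks the section ("individual", "b2b", or "b2c") instead of deriving it from business_type, and generate is a placeholder for whatever LLM call turns an attribute's description into a value. iter_attributes and resolve_persona are hypothetical names, not Arksim functions, and Arksim may batch or cache LLM-generated values differently.

```python
import random
from typing import Callable, Dict, Iterator, Tuple


def iter_attributes(tree: dict, prefix: str = "") -> Iterator[Tuple[str, dict]]:
    """Yield (dotted_path, definition) for each attribute in a category tree.

    A dict holding both "values" and "generate_values" is an attribute;
    any other dict is treated as a category and walked recursively.
    """
    for key, node in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if not isinstance(node, dict):
            continue
        if "values" in node and "generate_values" in node:
            yield path, node
        else:
            yield from iter_attributes(node, path)


def resolve_persona(attributes: dict, section: str,
                    generate: Callable[[str], str],
                    rng: random.Random) -> Dict[str, str]:
    """Assign one value per attribute for a single simulated user."""
    persona = {}
    for path, spec in iter_attributes(attributes[section]):
        if spec["generate_values"]:
            # LLM-generated: the description drives the (placeholder) model call.
            persona[path] = generate(spec["description"])
        else:
            # Sampled: pick uniformly from the fixed list of values.
            persona[path] = rng.choice(spec["values"])
    return persona


def fake_llm(description: str) -> str:
    # Stand-in for a real model call; returns a recognizable placeholder.
    return f"<LLM value for: {description}>"


attributes = {
    "individual": {"demographic": {"age": {
        "description": "Age of the person in years.", "values": [], "generate_values": True}}},
    "b2b": {
        "deal_stage": {"values": ["new leads", "contract sent"], "generate_values": False},
        "company_name": {"description": "Name of the company.", "values": [], "generate_values": True},
    },
    "b2c": {},
}
print(resolve_persona(attributes, "b2b", fake_llm, random.Random(0)))
```

Passing a seeded random.Random keeps the sampled values reproducible across runs, in the same spirit as the seed setting.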

Required Structure

The JSON file must have exactly three root keys: "individual", "b2b", and "b2c". Each root holds a tree of categories; attribute definitions are the leaves.
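| Field           | Type     | Required                          | Description                                                          |
|-----------------|----------|-----------------------------------|----------------------------------------------------------------------|
| values          | string[] | Yes                               | List of allowed values. Use [] when generate_values is true.         |
| generate_values | boolean  | Yes                               | true = LLM generates values; false = sample from values.             |
| description     | string   | Required if generate_values: true | Short description for the LLM; ignored when generate_values: false.  |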
Nested structure is allowed (e.g. individual.demographic.age, b2b.deal_stage). Any object that has both values and generate_values is treated as one attribute.

Example of an LLM-generated attribute:
{
  "age": {
    "description": "Age of the person in years (e.g. 25, 42).",
    "values": [],
    "generate_values": true
  }
}
Example of a sampled attribute:
{
  "deal_stage": {
    "description": "The stage of the deal process the individual is currently in.",
    "values": ["new leads", "demo schedule", "after demo", "contract sent", "contract signed"],
    "generate_values": false
  }
}
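A quick structural check can catch mistakes before a build. validate_user_attributes is a hypothetical helper written against the rules in this section (exactly three root keys, values and generate_values on every attribute, a description when generate_values is true); it is not Arksim’s own validator. The check that a sampled attribute has a non-empty values list is an extra assumption, since there would be nothing to sample from otherwise.

```python
import json

ROOT_KEYS = {"individual", "b2b", "b2c"}


def _check_attribute(spec: dict, path: str, errors: list) -> None:
    values = spec["values"]
    if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
        errors.append(f"{path}: 'values' must be a list of strings")
    if not isinstance(spec["generate_values"], bool):
        errors.append(f"{path}: 'generate_values' must be true or false")
    elif spec["generate_values"]:
        description = spec.get("description")
        if not isinstance(description, str) or not description.strip():
            errors.append(f"{path}: 'description' is required when generate_values is true")
    elif not values:
        # Extra assumption: a sampled attribute needs something to sample from.
        errors.append(f"{path}: 'values' is empty but generate_values is false")


def _walk(node: dict, path: str, errors: list) -> None:
    for key, child in node.items():
        child_path = f"{path}.{key}"
        if not isinstance(child, dict):
            errors.append(f"{child_path}: expected a category or attribute object")
        elif "values" in child and "generate_values" in child:
            _check_attribute(child, child_path, errors)
        else:
            _walk(child, child_path, errors)  # a category: keep descending


def validate_user_attributes(data) -> list:
    """Return a list of problems; an empty list means the file looks well-formed."""
    if not isinstance(data, dict) or set(data) != ROOT_KEYS:
        return ["root must have exactly the keys 'individual', 'b2b', 'b2c'"]
    errors = []
    for root in sorted(ROOT_KEYS):
        if isinstance(data[root], dict):
            _walk(data[root], root, errors)
        else:
            errors.append(f"{root}: expected an object")
    return errors


sample = json.loads("""
{
  "individual": {"demographic": {"age": {"description": "Age of the person in years.",
                                         "values": [], "generate_values": true}}},
  "b2b": {"deal_stage": {"values": ["new leads", "contract signed"], "generate_values": false}},
  "b2c": {"budget": {"values": [], "generate_values": true}}
}
""")
print(validate_user_attributes(sample))
# ["b2c.budget: 'description' is required when generate_values is true"]
```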

Best Practices

Use a custom file when you need domain-specific personas (e.g. industry, deal stage, budget, company size) or want to restrict demographics to a fixed set.
Keep description clear and short for any attribute with generate_values: true; the LLM uses it to generate realistic, consistent values.
  • Use generate_values: false for closed sets (e.g. sex, education, deal_stage); sampled values are reproducible and easy to analyze
  • Use generate_values: true for open-ended fields (e.g. company name, job title, location, budget), and always provide a good description
Run a small test (1–2 conversations) after changing the attributes file. Version the file with your scenario/config for reproducible runs.
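For versioning, one option is to record a fingerprint of the attributes file next to your scenario config. In this sketch, attributes_fingerprint is a hypothetical helper, not part of Arksim. It hashes a canonical form of the parsed JSON, so reformatting or reordering keys leaves the result unchanged, while any change to attribute content (including the order of items in values) changes it.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def attributes_fingerprint(path) -> str:
    """SHA-256 of the parsed JSON, serialized with sorted keys and no extra whitespace."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    compact = Path(tmp) / "a.json"
    pretty = Path(tmp) / "b.json"
    compact.write_text('{"b2c": {}, "b2b": {}, "individual": {}}', encoding="utf-8")
    pretty.write_text('{\n  "individual": {},\n  "b2b": {},\n  "b2c": {}\n}\n', encoding="utf-8")
    same = attributes_fingerprint(compact) == attributes_fingerprint(pretty)

print(same)  # True
```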

Sample File

Below is a minimal valid sample with all three roots and both sampled and LLM-generated attributes.
{
  "individual": {
    "demographic": {
      "age": {
        "description": "Age of the person in years.",
        "values": [],
        "generate_values": true
      },
      "education": {
        "values": ["High school", "Bachelor's", "Master's", "PhD"],
        "generate_values": false
      }
    },
    "contact_information": {
      "name": {
        "description": "Full name (First and Last).",
        "values": [],
        "generate_values": true
      }
    }
  },
  "b2b": {
    "customer_type": {
      "values": ["new prospect", "returning customer"],
      "generate_values": false
    },
    "company_name": {
      "description": "Name of the company or organization.",
      "values": [],
      "generate_values": true
    }
  },
  "b2c": {
    "customer_type": {
      "values": ["new prospect", "returning customer"],
      "generate_values": false
    },
    "budget": {
      "description": "Budget range or amount they are willing to spend.",
      "values": [],
      "generate_values": true
    }
  }
}
The full default file is at simulator/scenario_builder/utils/user_attributes.json in the Arksim package. Copy and edit it as a starting point for your custom user_attribute_file.
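Copying and editing can also be scripted. The sketch reads the default file, adds one attribute to a root section, and writes the result to a new path that you then set as user_attribute_file. derive_attributes_file is a hypothetical helper, and the paths in the usage are placeholders; locate user_attributes.json inside your installed Arksim package yourself.

```python
import json
import tempfile
from pathlib import Path

ROOTS = ("individual", "b2b", "b2c")


def derive_attributes_file(default_file, target_file, section: str,
                           name: str, spec: dict) -> dict:
    """Write a copy of default_file with `name` added (or replaced) under `section`."""
    if section not in ROOTS:
        raise ValueError(f"section must be one of {ROOTS}, got {section!r}")
    data = json.loads(Path(default_file).read_text(encoding="utf-8"))
    data[section][name] = spec  # top-level attribute inside that root
    Path(target_file).write_text(
        json.dumps(data, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return data


industry = {
    "description": "Industry the company operates in (e.g. retail, logistics).",
    "values": [],
    "generate_values": True,
}
with tempfile.TemporaryDirectory() as tmp:
    default_copy = Path(tmp) / "user_attributes.json"  # placeholder for the packaged default
    default_copy.write_text('{"individual": {}, "b2b": {}, "b2c": {}}', encoding="utf-8")
    custom_file = Path(tmp) / "my_user_attributes.json"
    derive_attributes_file(default_copy, custom_file, "b2b", "industry", industry)
    written = json.loads(custom_file.read_text(encoding="utf-8"))

print(written["b2b"]["industry"]["generate_values"])  # True
```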

Next Steps

After scenario building completes, proceed to Simulate Conversation to run conversations with your agent using the generated scenarios.