
Overview

This example walks through running ArkSim against an OpenClaw personal AI assistant. OpenClaw is an open-source assistant that runs on your own hardware and can manage tasks, smart home devices, messaging, calendar, and files. This setup lets you simulate and evaluate conversations against your own OpenClaw deployment. The agent is exposed via a Chat Completions-compatible endpoint, and the example config.yaml is already configured for the OpenClaw gateway.

Prerequisites

Before getting started, you will need:
  • OpenClaw installed and configured: openclaw.ai
  • OpenClaw gateway running with the HTTP Chat Completions endpoint enabled
  • OpenAI API key: used by ArkSim to power the simulated user
  • OpenClaw gateway token: for authenticating to your OpenClaw instance (see ~/.openclaw/openclaw.json under gateway.auth.token)

Scenarios

The example ships with pre-built scenarios in scenarios.json representing realistic interactions with a personal AI assistant. Each scenario defines a simulated user with a distinct persona, goal, and background knowledge (files, smart home setup, contacts, tasks, etc.). The scenario goals are:
  • Manage reminders and to-do lists for files and storage (Downloads cleanup, cloud folders, storage alerts, important documents).
  • Control and automate smart home devices and scenes (lights, thermostat, blinds, locks, scenes like Good Morning, Movie Night, Leaving Home, Goodnight).
  • Send and manage messages across WhatsApp, iMessage, Slack, email, and Telegram using contacts and recent conversations.
  • Manage calendar and schedule (e.g., calling the dentist to confirm an appointment, scheduling car maintenance, recurring reminders, upcoming tasks).
  • Organize files and documents across local folders and cloud storage (Downloads cleanup, duplicate photos, Trash, important items, and shared folders).
Scenarios are defined in scenarios.json in the example directory and can be edited or extended to reflect your assistant’s capabilities.
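For orientation, a scenario entry pairs a persona with a goal and background knowledge. The field names below are illustrative only, not the authoritative schema; consult the shipped scenarios.json for the real structure:

```json
{
  "persona": "Busy parent who relies on the assistant for household logistics",
  "goal": "Turn on the Movie Night scene and dim the living room lights",
  "background_knowledge": {
    "smart_home": ["living room lights", "thermostat", "Movie Night scene"],
    "contacts": ["partner", "babysitter"]
  }
}
```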

Setup

1. Enable Chat Completions in OpenClaw

In ~/.openclaw/openclaw.json, enable the HTTP Chat Completions endpoint:
{
  "gateway": {
    "http": {
      "endpoints": {
        "chatCompletions": {
          "enabled": true
        }
      }
    }
  }
}
2. Start the OpenClaw gateway

openclaw gateway --port 18789 --verbose
3. Set environment variables

export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
export OPENCLAW_TOKEN="<YOUR_OPENCLAW_GATEWAY_TOKEN>"
Your gateway token is in ~/.openclaw/openclaw.json under gateway.auth.token.
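If jq is available, you can print the token directly instead of opening the file by hand (a convenience sketch; jq is not required by OpenClaw itself, and the key path matches the location documented above):

```shell
# Print the OpenClaw gateway token from the config file (requires jq).
jq -r '.gateway.auth.token' ~/.openclaw/openclaw.json
```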
4. Verify the OpenClaw connection

Before running the simulation, confirm the gateway is responding:
curl -sS "http://127.0.0.1:18789/v1/chat/completions" \
  -H "Authorization: Bearer $OPENCLAW_TOKEN" \
  -H "Content-Type: application/json" \
  -H "x-openclaw-agent-id: main" \
  -d '{"model":"openclaw","messages":[{"role":"user","content":"Hello"}]}'
You should receive a valid JSON response from the gateway.
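To see only the assistant's reply rather than the full JSON payload, you can pipe the same request through jq (assuming the standard Chat Completions response shape, where the reply text lives at .choices[0].message.content):

```shell
# Same verification request, but extract just the assistant's reply (requires jq).
curl -sS "http://127.0.0.1:18789/v1/chat/completions" \
  -H "Authorization: Bearer $OPENCLAW_TOKEN" \
  -H "Content-Type: application/json" \
  -H "x-openclaw-agent-id: main" \
  -d '{"model":"openclaw","messages":[{"role":"user","content":"Hello"}]}' \
  | jq -r '.choices[0].message.content'
```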
5. Run simulation and evaluation

Run from the examples/openclaw directory:
cd examples/openclaw
arksim simulate-evaluate config.yaml
In config.yaml, agent_config points at http://localhost:18789/v1/chat/completions and sends the Authorization: Bearer ${OPENCLAW_TOKEN} and x-openclaw-agent-id: main headers.

Configuration

The example uses a single config file for both simulation and evaluation. Agent configuration is specified inline under the agent_config key:
# AGENT CONFIGURATION

agent_config:
  agent_type: chat_completions
  agent_name: openclaw
  api_config:
    endpoint: http://localhost:18789/v1/chat/completions
    headers:
      Content-Type: application/json
      Authorization: "Bearer ${OPENCLAW_TOKEN}"
      x-openclaw-agent-id: main
    body:
      model: openclaw
      messages:
        - role: system
          content: "You are a helpful personal assistant."
      enable_metadata: false

# SIMULATION SETTINGS

# Path to the scenarios file
scenario_file_path: ./scenarios.json

# Number of conversations per scenario to generate
num_conversations_per_scenario: 1

# Maximum turns per conversation
max_turns: 5

# Output file path for simulation results
output_file_path: ./results/simulation/simulation.json

# Jinja template for the simulated user's system prompt
simulated_user_prompt_template: null

# EVALUATION SETTINGS

# Output directory for evaluation results
output_dir: ./results/evaluation

# Paths to Python files defining custom
# QuantitativeMetric or QualitativeMetric subclasses
custom_metrics_file_paths: []

# Built-in metrics to run; if empty, all built-in metrics run
metrics_to_run:
  - faithfulness
  - helpfulness
  - coherence
  - verbosity
  - relevance
  - goal_completion
  - agent_behavior_failure

# Generate HTML report
generate_html_report: true

# Numeric thresholds: per-metric minimum scores on native scale
# numeric_thresholds:
#   overall_score: 0.7
#   goal_completion: 0.6

# SHARED SETTINGS
# LLM model
model: gpt-5.1

# LLM provider
provider: openai

# Workers for parallel processing
num_workers: 50

Output

Results are written under the example directory:
  • ./results/simulation/simulation.json: Simulated conversations from the simulation step
  • ./results/evaluation/evaluation.json: Evaluation results (per-turn and per-conversation scores, unique errors)
  • ./results/evaluation/final_report.html: Interactive HTML report for browsing and sharing results

Example Files

  • config.yaml: Simulate-and-evaluate configuration with inline OpenClaw agent config.
  • scenarios.json: Pre-built scenarios for the personal assistant.

Adapting to Your Own Assistant

To use this example as a starting point for your own personal assistant:
  1. Scenarios: Edit or add scenarios in scenarios.json so goals and knowledge reflect your assistant’s capabilities.
  2. Agent config: If your endpoint, port, or auth headers differ from the OpenClaw defaults, update the agent_config section in config.yaml accordingly.
See Agent configuration for supported protocols and how to configure agent connections.