Overview

This example walks through running Arksim against an OpenClaw personal AI assistant. OpenClaw is an open-source assistant that runs on your own hardware and can manage tasks, smart home devices, messaging, calendar, and files. Use this setup to simulate and evaluate conversations against your own OpenClaw deployment. The agent is exposed through a Chat Completions-compatible endpoint, and the example agent_config.json is already configured for the OpenClaw gateway.

Prerequisites

Before getting started:
  • OpenClaw installed and configured: openclaw.ai
  • OpenClaw gateway running with the HTTP Chat Completions endpoint enabled
  • OpenAI API key: used by Arksim to power the simulated user
  • OpenClaw gateway token: for authenticating to your OpenClaw instance (see ~/.openclaw/openclaw.json under gateway.auth.token)

Scenarios

The example ships with pre-built scenarios in scenarios.json representing realistic interactions with a personal AI assistant. Each scenario defines a simulated user with a distinct persona, goal, and background knowledge (files, smart home setup, contacts, tasks, etc.). The scenario goals are:
  • Manage reminders and to-do lists for files and storage (cleaning up Downloads, cloud folders, storage alerts, important documents).
  • Control and automate smart home devices and scenes (lights, thermostat, blinds, locks, scenes like Good Morning, Movie Night, Leaving Home, Goodnight).
  • Send and manage messages across WhatsApp, iMessage, Slack, email, and Telegram using contacts and recent conversations.
  • Manage the calendar and schedule (e.g., calling the dentist to confirm an appointment, scheduling car maintenance, recurring reminders, upcoming tasks).
  • Organize files and documents across local folders and cloud storage (cleaning up Downloads, duplicate photos, Trash, important items, and shared folders).
Scenarios are defined in scenarios.json in the example directory and can be edited or extended to reflect your assistant’s capabilities.

Setup

1. Enable Chat Completions in OpenClaw

In ~/.openclaw/openclaw.json, enable the HTTP Chat Completions endpoint:
{
  "gateway": {
    "http": {
      "endpoints": {
        "chatCompletions": {
          "enabled": true
        }
      }
    }
  }
}
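To confirm the setting before starting the gateway, a small check like the one below can help. It is a minimal sketch: the helper name chat_completions_enabled is hypothetical, and it assumes openclaw.json parses as strict JSON with the nesting shown above.

```python
import json
from pathlib import Path


def chat_completions_enabled(config: dict) -> bool:
    """Return True only if gateway.http.endpoints.chatCompletions.enabled is true."""
    node = config
    # Walk down the nested keys, bailing out if any level is missing.
    for key in ("gateway", "http", "endpoints", "chatCompletions"):
        node = node.get(key) if isinstance(node, dict) else None
        if node is None:
            return False
    return isinstance(node, dict) and node.get("enabled") is True


if __name__ == "__main__":
    # Assumption: the config lives at the path used throughout this guide.
    path = Path.home() / ".openclaw" / "openclaw.json"
    if path.exists():
        print(chat_completions_enabled(json.loads(path.read_text())))
```

If this prints False, the endpoint block is missing or disabled and the later curl check is likely to fail.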
2. Start the OpenClaw gateway


openclaw gateway --port 18789 --verbose
3. Set environment variables


export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
export OPENCLAW_TOKEN="<YOUR_OPENCLAW_GATEWAY_TOKEN>"
Your gateway token is in ~/.openclaw/openclaw.json under gateway.auth.token.
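Rather than copying the token by hand, it can be read from the parsed config. The sketch below assumes the file is strict JSON and that the token is a non-empty string at gateway.auth.token; the helper names gateway_token and export_line are hypothetical.

```python
import shlex


def gateway_token(config: dict) -> str:
    """Extract gateway.auth.token from a parsed openclaw.json."""
    try:
        token = config["gateway"]["auth"]["token"]
    except (KeyError, TypeError) as exc:
        raise KeyError("gateway.auth.token not found") from exc
    if not isinstance(token, str) or not token:
        raise ValueError("gateway.auth.token is empty or not a string")
    return token


def export_line(token: str) -> str:
    """Build a shell-safe export statement for the token."""
    return f"export OPENCLAW_TOKEN={shlex.quote(token)}"
```

Passing the result of json.load on ~/.openclaw/openclaw.json to gateway_token, then to export_line, yields a line you can paste into your shell. Avoid printing the token in shared terminals or logs.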
4. Verify the OpenClaw connection

Before running the simulation, confirm the gateway is responding:
curl -sS "http://127.0.0.1:18789/v1/chat/completions" \
  -H "Authorization: Bearer $OPENCLAW_TOKEN" \
  -H "Content-Type: application/json" \
  -H "x-openclaw-agent-id: main" \
  -d '{"model":"openclaw","messages":[{"role":"user","content":"Hello"}]}'
You should receive a valid JSON response from the gateway.
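The same check can be scripted in Python using only the standard library. This is a sketch that mirrors the curl command above (same URL, headers, and body); the function name build_check_request is hypothetical, and the shape of the response is not assumed beyond being JSON.

```python
import json
import os
import urllib.request


def build_check_request(
    token: str, base_url: str = "http://127.0.0.1:18789"
) -> urllib.request.Request:
    """Build the POST request that the curl command sends."""
    body = {"model": "openclaw", "messages": [{"role": "user", "content": "Hello"}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "x-openclaw-agent-id": "main",
        },
        method="POST",
    )


if __name__ == "__main__":
    token = os.environ.get("OPENCLAW_TOKEN")
    if token:
        # Only contacts the gateway when the token is set in the environment.
        with urllib.request.urlopen(build_check_request(token), timeout=30) as resp:
            print(json.dumps(json.load(resp), indent=2))
```

A urllib.error.HTTPError with status 401 usually points at the token, while a connection error suggests the gateway is not listening on the expected port.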
5. Run simulation and evaluation


Run from the examples/openclaw directory:
cd examples/openclaw
arksim simulate-evaluate config.yaml
The example agent_config.json points at http://localhost:18789/v1/chat/completions and uses Authorization: Bearer ${OPENCLAW_TOKEN} and x-openclaw-agent-id: main.
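The ${OPENCLAW_TOKEN} placeholder only works if the variable is set in the shell that launches Arksim. How Arksim expands it internally is not described here, so the sketch below is an illustration rather than its implementation: it uses Python's string.Template to show what the resolved headers would look like and to fail fast when the variable is missing. The function name resolve_headers and the flat headers dictionary are assumptions.

```python
from string import Template


def resolve_headers(headers: dict[str, str], env: dict[str, str]) -> dict[str, str]:
    """Substitute ${NAME} placeholders; raises KeyError if a variable is unset."""
    return {name: Template(value).substitute(env) for name, value in headers.items()}


if __name__ == "__main__":
    # Header values as described for the example agent_config.json.
    example = {
        "Authorization": "Bearer ${OPENCLAW_TOKEN}",
        "x-openclaw-agent-id": "main",
    }
    print(resolve_headers(example, {"OPENCLAW_TOKEN": "example-token"}))
```

In practice you would pass os.environ as env, so a missing export surfaces as a KeyError before any request is sent.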

Configuration

The example uses a single config file for both simulation and evaluation. Key settings:
# Agent and scenarios (paths relative to config file location)
agent_config_file_path: ./agent_config.json
scenario_file_path: ./scenarios.json

# Simulation
num_conversations_per_scenario: 1
max_turns: 5
output_file_path: ./results/simulation/simulation.json

# Evaluation
output_dir: ./results/evaluation
custom_metrics_file_paths: []

metrics_to_run:
  - faithfulness
  - helpfulness
  - coherence
  - verbosity
  - relevance
  - goal_completion
  - agent_behavior_failure
generate_html_report: true

# Shared
model: gpt-5.1
provider: openai
num_workers: auto
The config also defines a simulated_user_prompt_template (Jinja) that uses scenario.agent_context, scenario.goal, scenario.knowledge, and simulation.profile to drive the simulated user.
The example directory contains config.yaml, scenarios.json, and agent_config.json. There are no separate simulate or evaluate config files.
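To get a rough sense of run size from these settings, the arithmetic can be sketched as follows. It assumes max_turns caps the turns in each conversation and that scenarios.json contains five scenarios (one per goal listed earlier); both are assumptions, and the function name conversation_budget is hypothetical.

```python
def conversation_budget(
    num_scenarios: int, num_conversations_per_scenario: int, max_turns: int
) -> tuple[int, int]:
    """Return (total conversations, upper bound on total turns)."""
    conversations = num_scenarios * num_conversations_per_scenario
    return conversations, conversations * max_turns


if __name__ == "__main__":
    # Values from the example config, with an assumed scenario count of 5.
    print(conversation_budget(5, 1, 5))  # (5, 25)
```

Raising num_conversations_per_scenario or max_turns scales the number of model calls accordingly, which is worth checking before a large run.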

Output

Results are written under the example directory:
Location                                  Contents
./results/simulation/simulation.json      Simulated conversations from the simulation step
./results/evaluation/evaluation.json      Evaluation results (per-turn and per-conversation scores, unique errors)
./results/evaluation/final_report.html    Interactive HTML report for browsing and sharing results
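After a run, a quick check that all three files exist can catch a failed or partial pipeline. The sketch below only tests for file presence at the paths listed above; it does not assume anything about their contents, and the name missing_outputs is hypothetical.

```python
from pathlib import Path

# Paths relative to the example directory, as listed above.
EXPECTED_OUTPUTS = (
    "results/simulation/simulation.json",
    "results/evaluation/evaluation.json",
    "results/evaluation/final_report.html",
)


def missing_outputs(example_dir: Path) -> list[str]:
    """Return the expected output paths that are not present as files."""
    return [rel for rel in EXPECTED_OUTPUTS if not (example_dir / rel).is_file()]


if __name__ == "__main__":
    missing = missing_outputs(Path.cwd())
    print("all outputs present" if not missing else f"missing: {missing}")
```

Run it from examples/openclaw so the relative paths resolve against the example directory.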

Example Files

File                 Description
config.yaml          Simulate and evaluate configuration (single pipeline).
scenarios.json       Pre-built scenarios for the personal assistant.
agent_config.json    OpenClaw agent config: Chat Completions endpoint, port 18789, ${OPENCLAW_TOKEN}, and x-openclaw-agent-id: main.

Adapting to Your Own Assistant

To use this example as a starting point for your own personal assistant:
  1. Scenarios: Edit or add scenarios in scenarios.json so goals and knowledge reflect your assistant’s capabilities.
  2. Agent config: If your endpoint, port, or auth headers differ from the OpenClaw defaults, update agent_config.json accordingly.
See Agent Compatibility for supported protocols and how to configure agent_config.json.