Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.arklex.ai/llms.txt

Use this file to discover all available pages before exploring further.

What is ArkSim?

ArkSim is an open-source agent testing tool designed to help developers validate, simulate, and harden their agent systems. It focuses on reproducible interaction simulations to identify reliability and idempotency gaps before production deployment.
ArkSim Workflow

Who is it for?

Developers looking to ensure their agents are robust, reliable, and ready for real-world interactions, without relying on manual testing.

What makes it different?

Open-source

Fully transparent and customizable, letting you extend or adapt the tool as needed.

End-to-end testing

Test your agent in realistic scenarios to uncover gaps in behavior or reliability before production.

Pre-built scenarios

Use predefined scenarios to quickly validate your agent and jump-start testing effectively.

Easily extensible

Integrate with existing systems and tailor the testing environment to your specific needs.

Core Capabilities

Scenarios

A scenario is a test case that defines a simulated user’s attributes, goals, and prior knowledge. It allows you to test different versions of your agent against the same set of scenarios to compare behavior and catch regressions.

Simulation

Simulation turns scenarios into live, multi-turn conversations with your agent. You get full transcripts (and optional metadata) that you can critically inspect or pass to evaluation.

Evaluation

Evaluation scores your agent on those conversations. You get quantitative and qualitative metrics (e.g. helpfulness, faithfulness) and agent behavior failures that you can prioritize and fix.

Frequently Asked Questions

ArkSim generates realistic multi-turn conversations between LLM-powered synthetic users and your agent, then evaluates every turn. Each synthetic user has a distinct profile, goal, and knowledge level. This reveals failures that only emerge across multiple conversation turns, like losing context, calling the wrong tool, or contradicting earlier responses.
Install with pip install arksim, run arksim init to scaffold a starter config and agent file, then run arksim simulate-evaluate config.yaml. See the quickstart guide for a full walkthrough.
ArkSim works with any AI agent, whether it is built with LangChain, CrewAI, OpenAI Agents SDK, or your own custom code. Connect through a Chat Completions HTTP endpoint, the A2A protocol, or load a Python agent class directly with no server needed. See the full list of supported integrations.
Yes. ArkSim runs as a CLI command that exits non-zero when quality thresholds are not met. Add it to any CI pipeline as a quality gate on every pull request. See the CI integration guide.
Built-in metrics include helpfulness, coherence, relevance, faithfulness, verbosity, goal completion, and agent behavior failure detection. You can also define custom quantitative and qualitative metrics with full access to the conversation context. See the evaluation guide.

Ready to start?

Test your AI agent with ArkSim. Follow the quick start or explore the source on GitHub.

Quick Start

Set up your first simulation in minutes.

Source Code

Explore full implementation on GitHub.