Agent Simulation¶
Drive your agent through realistic multi-turn conversations without writing test transcripts by hand. Three LLMs are in play:
- Your agent — the target under test (a hosted Orq agent, a callback, or an Orq deployment).
- User simulator — plays a persona pursuing a scenario goal, turn by turn.
- Judge — scores whether the goal was met and whether any rules were broken.
sequenceDiagram
participant U as User simulator
participant A as Agent under test
participant J as Judge
U->>A: next user turn
A-->>U: agent reply
loop until max_turns or stop condition
U->>A: follow-up turn
A-->>U: response
end
U->>J: full transcript + scenario
Note over J: scores goal_achieved / criteria_met Generate from a one-line description¶
The fastest start: generate_and_simulate() synthesizes the personas, scenarios, and opening messages from a short description of your agent — no hand-written Persona(...) / Scenario(...).
Point it at a hosted Orq agent with target="agent:<key>" (the agent key from AI Studio → Agents). The simulator and judge LLMs route through Orq by default.
import asyncio
from evaluatorq.simulation import generate_and_simulate
async def main():
results = await generate_and_simulate(
evaluation_name="support-agent-sim",
target="agent:my-support-agent", # hosted Orq agent, routed via ORQ_API_KEY
agent_description=(
"Customer support agent for an e-commerce store; "
"handles refunds, orders, and product questions."
),
num_personas=3,
num_scenarios=4, # → 12 persona × scenario simulations
max_turns=6,
evaluator_names=["goal_achieved", "criteria_met"],
)
passed = sum(r.goal_achieved for r in results)
print(f"Pass rate: {passed}/{len(results)}")
if __name__ == "__main__":
asyncio.run(main())
Pass sim_model= to route the simulator and judge through OpenAI directly. Use target_callback= for the agent under test.
import asyncio
from openai import AsyncOpenAI
from evaluatorq.contracts import Message
from evaluatorq.simulation import generate_and_simulate
client = AsyncOpenAI()
SYSTEM = "You are a customer support agent for Acme Corp. Be concise and helpful."
async def openai_agent(messages: list[Message]) -> str:
history = [{"role": "system", "content": SYSTEM}]
history += [{"role": m.role, "content": m.content or ""} for m in messages]
resp = await client.chat.completions.create(model="gpt-4o-mini", messages=history)
return resp.choices[0].message.content or ""
async def main():
results = await generate_and_simulate(
evaluation_name="support-agent-sim-openai",
target_callback=openai_agent,
agent_description=(
"Customer support agent for an e-commerce store; "
"handles refunds, orders, and product questions."
),
num_personas=3,
num_scenarios=4,
sim_model="gpt-4o-mini", # simulator + judge on OpenAI directly
max_turns=6,
evaluator_names=["goal_achieved", "criteria_met"],
upload_results=False,
)
passed = sum(r.goal_achieved for r in results)
print(f"Pass rate: {passed}/{len(results)}")
if __name__ == "__main__":
asyncio.run(main())
agent_description drives generation; num_personas × num_scenarios is how many conversations run. The simulator and judge LLMs resolve their provider by precedence: if ORQ_API_KEY is set they route through the Orq AI Router; otherwise they fall back to OpenAI via OPENAI_API_KEY (an explicitly passed client always wins). See Configuration.
Seed by archetype¶
The middle ground between "just give me five" and specifying every trait: name the archetype, and generate_persona() / generate_scenario() fill the rest. You get back real Persona / Scenario objects to inspect, tweak, and pass to simulate().
import asyncio
from evaluatorq.simulation import generate_persona, generate_scenario, simulate
async def main():
persona = await generate_persona(
"angry customer",
agent_description="e-commerce support agent",
)
scenario = await generate_scenario("disputes a refund denial")
results = await simulate(
evaluation_name="seeded-simulation",
target="agent:my-support-agent",
personas=[persona],
scenarios=[scenario],
max_turns=6,
evaluator_names=["goal_achieved", "criteria_met"],
)
print(f"Goal achieved: {results[0].goal_achieved}")
if __name__ == "__main__":
asyncio.run(main())
Batch forms generate_personas([...]) / generate_scenarios([...]) take a list of seeds and return one object each.
Full control: hand-build personas¶
When you want exact personas and pass/fail criteria, build them yourself and call simulate(). A persona is who is talking (patience, assertiveness, tone); a scenario is what they want plus the criteria the agent must (or must not) satisfy.
A persona requires its core traits — name, patience, assertiveness, politeness, technical_level, communication_style, and background. Only emotional_arc and cultural_context default (to None). A scenario needs just name and goal; everything else, including criteria, is optional.
Pass target="agent:<key>" (the agent key from AI Studio → Agents) to route to a hosted Orq agent.
import asyncio
from evaluatorq.simulation import simulate
from evaluatorq.simulation.types import (
CommunicationStyle, Criterion, EmotionalArc, Persona, Scenario, StartingEmotion,
)
async def main():
persona = Persona(
name="Impatient Customer",
patience=0.2, assertiveness=0.8, politeness=0.4, technical_level=0.3,
communication_style=CommunicationStyle.terse,
background="Received the wrong item and wants a refund urgently",
emotional_arc=EmotionalArc.escalating,
)
scenario = Scenario(
name="Wrong Item Refund",
goal="Get a full refund for the wrong item received",
context="Ordered headphones but received a phone case instead",
starting_emotion=StartingEmotion.frustrated,
criteria=[
Criterion(description="Agent asks for order details", type="must_happen"),
Criterion(description="Agent acknowledges the mistake", type="must_happen"),
Criterion(description="Agent blames the customer", type="must_not_happen"),
],
)
results = await simulate(
evaluation_name="basic-simulation-example",
target="agent:my-support-agent", # hosted Orq agent, routed via ORQ_API_KEY
personas=[persona],
scenarios=[scenario],
max_turns=6,
evaluator_names=["goal_achieved", "criteria_met"],
)
result = results[0]
score = result.goal_completion_score or 0.0
print(f"Goal achieved: {result.goal_achieved} score={score:.2f}")
for msg in result.messages:
who = "User" if msg.role == "user" else "Agent"
print(f"{who}: {msg.content}")
if __name__ == "__main__":
asyncio.run(main())
Use target_callback= with any async function that maps the conversation to your agent's reply. Pass sim_model= to run the simulator and judge on OpenAI directly. Set upload_results=False for a local-only run.
import asyncio
from openai import AsyncOpenAI
from evaluatorq.contracts import Message
from evaluatorq.simulation import simulate
from evaluatorq.simulation.types import CommunicationStyle, Criterion, Persona, Scenario
client = AsyncOpenAI()
SYSTEM = "You are a customer support agent for Acme Corp. Be concise and helpful."
async def openai_agent(messages: list[Message]) -> str:
"""Your agent under test — a raw OpenAI model."""
history = [{"role": "system", "content": SYSTEM}]
history += [{"role": m.role, "content": m.content or ""} for m in messages]
resp = await client.chat.completions.create(model="gpt-4o-mini", messages=history)
return resp.choices[0].message.content or ""
async def main():
persona = Persona(
name="Impatient Customer",
patience=0.2, assertiveness=0.8, politeness=0.4, technical_level=0.3,
communication_style=CommunicationStyle.terse,
background="Received the wrong item and wants a refund urgently",
)
scenario = Scenario(
name="Wrong Item Refund",
goal="Get a full refund for the wrong item received",
criteria=[
Criterion(description="Agent asks for order details", type="must_happen"),
],
)
results = await simulate(
evaluation_name="openai-agent-simulation",
target_callback=openai_agent, # your OpenAI agent
personas=[persona],
scenarios=[scenario],
sim_model="gpt-4o-mini", # simulator + judge on OpenAI directly
max_turns=6,
evaluator_names=["goal_achieved", "criteria_met"],
upload_results=False, # local-only run, no Orq experiment
)
result = results[0]
score = result.goal_completion_score or 0.0
print(f"Goal achieved: {result.goal_achieved} score={score:.2f}")
if __name__ == "__main__":
asyncio.run(main())
One persona × one scenario yields one SimulationResult with goal_achieved, goal_completion_score, turn_count, rules_broken, and the full message transcript.
The target_callback is the only structural difference from the Orq path — personas, scenarios, criteria, and the result shape are identical. Swap the callback body for any HTTP/LLM agent.
View results in the local dashboard
Run eq dashboard to browse saved red-team and simulation reports together; eq redteam ui / eq sim ui open the legacy Streamlit views for one surface.
Where to next¶
- Examples › Agent Simulation — tool simulation, hardening loops, LangGraph / CrewAI / OpenAI Agents targets.
- Red Teaming — adversarial, attack-driven testing.