CLI Reference¶

Both evaluatorq and eq are aliases for the same entry point:

# pyproject.toml [project.scripts]
evaluatorq = "evaluatorq.cli:main"
eq         = "evaluatorq.cli:main"

Subcommands are registered at startup. eq redteam requires the redteam extra; eq sim requires the simulation extra.

Top-level options¶

eq --version prints the installed version (e.g. evaluatorq 1.3.2) and exits. Running eq with no arguments prints help and exits.

`eq ui`¶

Launch the FastHTML dashboard showing all red team and simulation runs.

eq ui [PATH] [--host HOST] [--port PORT]

PATH is optional. Omit it to scan both default run stores (.evaluatorq/runs/ and .evaluatorq/sim-runs/). Pass a directory to restrict the scan; pass a file to scope to that file's parent directory and print a direct report URL.

Flag / Argument	Type / Default	Description
`PATH`	`Path \\| None` / `None`	Optional path to scan.
`--host`	`str` / `127.0.0.1`	Host to bind the dashboard server to.
`--port`	`int` / `8080`	Port for the dashboard server.

Requires evaluatorq[dashboard] (python-fasthtml, uvicorn).

`eq redteam`¶

Red teaming subcommand group. Registered only when evaluatorq[redteam] is installed.

`eq redteam run`¶

Run adversarial red teaming against one or more targets.

eq redteam run --target agent:<key> [OPTIONS]

Flag	Type / Default	Description
`--target` / `-t`	`str` (repeatable)	Target identifier(s). Use `agent:<key>` for Orq agents or `deployment:<key>`. Repeatable.
`--name` / `-n`	`str \\| None` / `None`	Experiment name (defaults to `red-team`).
`--mode`	`str` / `dynamic`	Execution mode: `dynamic`, `static`, or `hybrid`.
`--category` / `-c`	`str` (repeatable)	OWASP categories to test (e.g. `ASI01`). Repeatable and/or comma-separated. Defaults to all.
`--vulnerability` / `-V`	`str` (repeatable)	Vulnerability IDs to test (e.g. `goal_hijacking`). Repeatable and/or comma-separated. Also accepts OWASP codes. Takes precedence over `--category`.
`--strategy` / `-s`	`str` (repeatable)	Restrict to named attack strategies. Repeatable and/or comma-separated. Unknown registry names are rejected.
`--delivery-method` / `-d`	`str` (repeatable)	Restrict to one or more delivery methods. Repeatable and/or comma-separated.
`--max-turns`	`int` / `5`	Maximum conversation turns for multi-turn attacks.
`--max-per-category`	`int \\| None` / `None`	Cap strategies per category.
`--attack-model`	`str` / `gpt-5-mini`	Model for adversarial prompt generation.
`--attacker-instructions`	`str \\| None` / `None`	Domain-specific context to steer attack generation.
`--evaluator-model`	`str` / `gpt-5-mini`	Model for OWASP evaluation scoring.
`--parallelism`	`int` / `10`	Maximum concurrent jobs.
`--generated-strategy-count`	`int` / `2`	Number of LLM-generated strategies per category.
`--no-generate-strategies`	`bool` / `False`	Disable LLM-based strategy generation.
`--max-dynamic-datapoints`	`int \\| None` / `None`	Cap dynamically generated datapoints.
`--max-static-datapoints`	`int \\| None` / `None`	Cap static (dataset) datapoints.
`--no-cleanup-memory`	`bool` / `False`	Skip memory entity cleanup after dynamic runs.
`--dataset`	`str \\| None` / `None`	Dataset source: local path, `hf:org/repo`, or `hf:org/repo/file.json`.
`--output-dir`	`Path \\| None` / `None`	Directory for saved JSON files. Required when `--save detail`.
`--save`	`none \\| final \\| detail` / `final`	What to persist: `none` (no files), `final` (summary only), or `detail` (all stage artifacts).
`--save-report`	`Path \\| None` / `None`	Path to write the report JSON.
`--export-md`	`Path \\| None` / `None`	Directory for an auto-named Markdown report.
`--export-html`	`Path \\| None` / `None`	Directory for an auto-named HTML report.
`--system-prompt`	`str \\| None` / `None`	System prompt for the target model/agent.
`--yes` / `-y`	`bool` / `False`	Skip confirmation prompt.
`--verbose` / `-v`	count / `0`	Increase verbosity. `-v` info logs; `-vv` debug logs.
`--quiet` / `-q`	`bool` / `False`	Suppress progress bars and non-error output.

Delivery methods (--delivery-method): DAN, role-play, skeleton-key, base64, leetspeak, multilingual, character-spacing, crescendo, many-shot, authority-impersonation, refusal-suppression, direct-request, code-elicitation, code-assistance, tool-response, word-substitution.

Saving results. Persistence is controlled by two flags. --save accepts none (no files), final (summary JSON only), or detail (all per-stage artifacts). --output-dir DIR sets where JSON is written and is required when --save detail.

`eq redteam ui`¶

Launch the FastHTML dashboard scoped to red team runs only.

eq redteam ui [--host HOST] [--port PORT]

Flag	Type / Default	Description
`--host`	`str` / `127.0.0.1`	Host to bind the dashboard server to.
`--port`	`int` / `8080`	Port for the dashboard server.

`eq redteam validate-dataset`¶

Validate the shape of a red team dataset.

eq redteam validate-dataset [DATASET]

Argument	Type / Default	Description
`DATASET`	`str \\| None` / `None`	Local path, `hf:org/repo`, or `hf:org/repo/file.json`. Defaults to the official `orq/redteam-vulnerabilities` HuggingFace dataset.

`eq redteam runs`¶

List previously saved red team runs.

eq redteam runs [PATH] [--limit N]

Flag / Argument	Type / Default	Description
`PATH`	`Path \\| None` / `None`	Directory containing run reports. Defaults to `.evaluatorq/runs/`.
`--limit` / `-n`	`int` / `20`	Maximum number of runs to show.

`eq sim`¶

Agent simulation subcommand group. Registered only when evaluatorq[simulation] is installed.

Three main verbs: generate (datapoints only), simulate (run against pre-built datapoints), run (generate then simulate in one shot).

`eq sim run`¶

Generate personas and scenarios, then run simulations.

eq sim run --agent-description "..." --openai-model gpt-4o-mini
eq sim run --target agent:<key>

Targets — provide exactly one:

Flag	Description
`--target`	`agent:<key>` or `deployment:<key>`. Bare values default to `agent:<key>`.
`--agent-key`	Deprecated alias for Orq deployment key (requires `ORQ_API_KEY`).
`--vercel-url`	Vercel AI SDK HTTP endpoint URL.
`--openai-model`	OpenAI-compatible model name. Provider resolved from env: `ORQ_API_KEY` → Orq AI Router; `OPENAI_API_KEY` → OpenAI-compatible.

Flag	Type / Default	Description
`--agent-description`	`str \\| None` / `None`	Free-text description of the agent. May be omitted when `--target` is an Orq agent (fetched automatically).
`--name` / `-n`	`str` / `sim`	Run name for the run-store entry.
`--sim-model`	`str` / `openai/gpt-5.4-mini`	Model for user-simulator, judge, and generation.
`--max-turns`	`int` / `10`	Maximum conversation turns.
`--parallelism`	`int` / `5`	Concurrent simulations.
`--num-personas`	`int` / `5`	Number of personas to generate.
`--num-scenarios`	`int` / `5`	Number of scenarios to generate.
`--evaluator`	`str` (repeatable) / API defaults	Evaluator name(s). Repeatable.
`--no-save`	`bool` / `False`	Skip writing to `.evaluatorq/sim-runs/`.
`--save-datapoints`	`Path \\| None` / `None`	Write generated datapoints to JSONL for reproducible re-runs.
`--output` / `-o`	`Path \\| None` / `None`	Path to write results JSONL.
`--report-output`	`Path \\| None` / `None`	Path to write full SimulationRun report JSON.
`--export-md`	`Path \\| None` / `None`	Directory for an auto-named Markdown report.
`--export-html`	`Path \\| None` / `None`	Directory for an auto-named HTML report.
`--yes` / `-y`	`bool` / `False`	Skip interactive confirmation prompt.
`--verbose` / `-v`	count / `0`	Increase verbosity. `-v` info; `-vv` debug.
`--quiet` / `-q`	`bool` / `False`	Suppress non-error output.

`eq sim simulate`¶

Run simulations from a pre-built datapoints JSONL file.

eq sim simulate --datapoints dp.jsonl --target agent:<key>

Targets — same four flags as eq sim run. All other flags match eq sim run except --num-personas, --num-scenarios, and --save-datapoints are absent (datapoints are already provided).

Flag	Type / Default	Description
`--datapoints` / `-d`	`Path` (required)	Path to datapoints JSONL file.
`--name` / `-n`	`str` / `sim`	Run name for the run-store entry.
`--sim-model`	`str` / `openai/gpt-5.4-mini`	Model for user-simulator and judge.
`--max-turns`	`int` / `10`	Maximum conversation turns.
`--parallelism`	`int` / `5`	Concurrent simulations.
`--evaluator`	`str` (repeatable) / API defaults	Evaluator name(s). Repeatable.
`--no-save`	`bool` / `False`	Skip writing to `.evaluatorq/sim-runs/`.
`--output` / `-o`	`Path \\| None` / `None`	Path to write results JSONL.
`--report-output`	`Path \\| None` / `None`	Path to write full SimulationRun report JSON.
`--export-md`	`Path \\| None` / `None`	Directory for an auto-named Markdown report.
`--export-html`	`Path \\| None` / `None`	Directory for an auto-named HTML report.
`--yes` / `-y`	`bool` / `False`	Skip interactive confirmation prompt.
`--verbose` / `-v`	count / `0`	Increase verbosity.
`--quiet` / `-q`	`bool` / `False`	Suppress non-error output.

`eq sim generate`¶

Generate simulation datapoints only — no simulation is run.

eq sim generate --output dp.jsonl --agent-description "..."

Flag	Type / Default	Description
`--output` / `-o`	`Path` (required)	Path to write generated datapoints JSONL.
`--agent-description`	`str \\| None` / `None`	Free-text description of the agent.
`--target`	`str \\| None` / `None`	Agent target used to fetch the description when `--agent-description` is omitted. Accepts `agent:<key>`.
`--sim-model`	`str` / `openai/gpt-5.4-mini`	Model for persona/scenario/first-message generation.
`--num-personas`	`int` / `5`	Number of personas to generate.
`--num-scenarios`	`int` / `5`	Number of scenarios to generate.
`--verbose` / `-v`	count / `0`	Increase verbosity.
`--quiet` / `-q`	`bool` / `False`	Suppress non-error output.

`eq sim export`¶

Convert simulation results JSONL to OpenResponses payload JSON.

eq sim export --input results.jsonl --output payload.json

Flag	Type / Default	Description
`--input` / `-i`	`Path` (required)	Path to results JSONL file.
`--output` / `-o`	`Path` (required)	Path to write OpenResponses payload JSON.

`eq sim validate-dataset`¶

Validate a simulation datapoints JSONL file.

eq sim validate-dataset dp.jsonl

Argument	Type / Default	Description
`PATH`	`Path` (required)	Path to datapoints JSONL file to validate.

`eq sim runs`¶

List recent simulation runs.

eq sim runs [DIRECTORY] [--limit N]

Flag / Argument	Type / Default	Description
`DIRECTORY`	`Path \\| None` / `None`	Directory to scan. Defaults to `.evaluatorq/sim-runs/`.
`--limit` / `-n`	`int` / `20`	Maximum number of runs to show.

`eq sim ui`¶

Launch the FastHTML dashboard scoped to simulation runs only.

eq sim ui [--host HOST] [--port PORT]

Flag	Type / Default	Description
`--host`	`str` / `127.0.0.1`	Host to bind the dashboard server to.
`--port`	`int` / `8080`	Port for the dashboard server.

Recipes¶

# CI smoke run — one strategy per category, no LLM-generated strategies, quiet
eq redteam run -t agent:my-agent --max-per-category 1 --no-generate-strategies -q

# Save full per-stage artifacts to a directory
eq redteam run -t agent:my-agent --save detail --output-dir ./runs

# Quick simulation — two personas, two scenarios
eq sim run -t agent:my-agent --num-personas 2 --num-scenarios 2

CLI Reference¶

Top-level options¶

eq ui¶

eq redteam¶

eq redteam run¶

eq redteam ui¶

eq redteam validate-dataset¶

eq redteam runs¶

eq sim¶

eq sim run¶

eq sim simulate¶

eq sim generate¶

eq sim export¶

eq sim validate-dataset¶

eq sim runs¶

eq sim ui¶