Dashboard

evaluatorq ships a built-in web dashboard for browsing red team and simulation reports. It is powered by FastHTML (a lightweight Python web framework) and served locally via uvicorn. There is no external service dependency — everything runs on your machine.

The evaluatorq combined dashboard — stat band, runs by type, attack-resistance, findings by severity, token usage, and recent runs.

Install

The dashboard is an optional extra (it pulls in python-fasthtml and uvicorn):

pip install "evaluatorq[dashboard]"
# or — if you already have the redteam / simulation extras:
pip install "evaluatorq[redteam,dashboard]"

With uv:

uv add "evaluatorq[dashboard]"
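To confirm that the optional extra is actually present in the active environment, a small check along these lines may help. It is a sketch only: it assumes that python-fasthtml installs a module importable as `fasthtml`, and the helper name `missing_dashboard_modules` is made up for illustration and is not part of evaluatorq.

```python
from importlib.util import find_spec

# Import names of the packages the dashboard extra pulls in.
# Assumption: the python-fasthtml distribution is importable as `fasthtml`.
DASHBOARD_MODULES = ("fasthtml", "uvicorn")


def missing_dashboard_modules(modules=DASHBOARD_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dashboard_modules()
    if missing:
        print("Missing:", ", ".join(missing))
        print('Try: pip install "evaluatorq[dashboard]"')
    else:
        print("Dashboard dependencies found.")
```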

Launch

Launch it with eq dashboard (the evaluatorq and eq entry points are interchangeable):

# Browse both default stores at once — red team + simulation
eq dashboard

# Scope to a directory of exported reports, or a single report file
eq dashboard /path/to/my/reports
eq dashboard .evaluatorq/runs/red-team_20260626_143024.json

# Bind a custom host / port (default 127.0.0.1:8080)
eq dashboard --host 0.0.0.0 --port 8888

Invocation	What it scans
`eq dashboard`	Both default stores: `.evaluatorq/runs` (red team) and `.evaluatorq/sim-runs` (simulation)
`eq dashboard PATH`	Only that directory; pass a file to scope to its parent and print the report's direct URL
`eq redteam ui` / `eq sim ui`	Legacy Streamlit views, scoped to a single surface (see the note below)

With no PATH the server prints the local URL to open. Pointing at a file prints that report's direct URL so you land straight on it.
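If you start the dashboard from a script rather than a shell, the invocations above can be assembled as an argument list. The following is a hedged sketch: `dashboard_command` is a hypothetical helper, and it assumes that `--host` and `--port` may follow the optional PATH argument, which the examples above do not show explicitly.

```python
import subprocess


def dashboard_command(path=None, host="127.0.0.1", port=8080):
    """Build the argv for `eq dashboard` using only the options documented above."""
    cmd = ["eq", "dashboard"]
    if path is not None:
        cmd.append(str(path))  # a directory of reports or a single report file
    # Assumption: options are accepted after the positional PATH.
    cmd += ["--host", host, "--port", str(port)]
    return cmd


if __name__ == "__main__":
    # Blocks until the server is stopped (for example with Ctrl+C).
    subprocess.run(dashboard_command(".evaluatorq/runs"), check=False)
```

Passing a list to `subprocess.run` avoids shell quoting problems when the report path contains spaces.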

Legacy Streamlit views

eq redteam ui and eq sim ui launch the older Streamlit dashboards, scoped to a single surface. The FastHTML eq dashboard documented here is the preview replacement that browses both surfaces together. The CLI surface is still being consolidated — see the CLI Reference.

What the dashboard browses

The dashboard auto-discovers JSON report files in the configured root directories:

Default store	Written by
`.evaluatorq/runs/*.json`	`red_team()` / `eq redteam run`
`.evaluatorq/sim-runs/*.json`	`eq sim run` (auto-saves unless `--no-save`); `simulate()` only when called with `save=True`

Each report gets a stable URL for the lifetime of its file, so links you share keep working.
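As a rough illustration of the discovery step, the sketch below globs the two default stores for JSON files. It is not the dashboard's actual implementation: `discover_reports` is a hypothetical name, and ordering by file modification time is an assumption used as a stand-in for whatever creation timestamp the dashboard reads.

```python
from pathlib import Path

# Default stores named in the table above, relative to the working directory.
DEFAULT_STORES = (Path(".evaluatorq/runs"), Path(".evaluatorq/sim-runs"))


def discover_reports(roots=DEFAULT_STORES):
    """Return the *.json files directly inside each existing root, newest first."""
    found = []
    for root in roots:
        root = Path(root)
        if root.is_dir():  # a store that has not been written yet is simply skipped
            found.extend(p for p in root.glob("*.json") if p.is_file())
    # Assumption: modification time approximates the report's creation time.
    return sorted(found, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for report in discover_reports():
        print(report)
```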

Supported surfaces

Surface	JSON discriminator	Rendered by
Red team	`"pipeline"` key present	`redteam/reports/export_html.py`
Simulation	`"mode"` key present (`mode` wins over `pipeline`)	`simulation/reports/export_html.py`

Files that cannot be parsed (invalid JSON) are silently skipped. Files that parse but fail model validation appear in the index as broken cards with an error badge; their detail page shows a non-fatal error message instead of a traceback.
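The discriminator rules in the table can be sketched as a small classifier. This is an illustration under stated assumptions rather than the dashboard's code: `classify_report` and the `"unknown"` label for files carrying neither key are hypothetical, and the model-validation step that produces broken cards is not modelled here.

```python
import json
from pathlib import Path


def classify_report(path):
    """Return "simulation", "redteam", "unknown", or None when the file is not valid JSON."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # json.JSONDecodeError is a ValueError; unparseable files are skipped.
        return None
    if not isinstance(data, dict):
        return "unknown"  # hypothetical label: valid JSON, but not a report object
    if "mode" in data:  # `mode` wins over `pipeline`
        return "simulation"
    if "pipeline" in data:
        return "redteam"
    return "unknown"
```

Checking `mode` first is what makes a file that carries both keys resolve to the simulation surface.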

Landing (GET /)

GET / opens the combined dashboard: a stat band (total runs, per-surface counts, attack resistance), runs-by-type and attack-resistance breakdowns, findings by severity, token usage, and a recent runs list across both stores. The left sidebar switches surface — Red Team and Agent Sim open filtered run lists at ?surface=…, sorted by creation time (newest first). Each run row drills into its report view; reports whose JSON is only partially valid surface an error badge instead of a traceback. The export action on a report downloads the standalone self-contained HTML for offline sharing.

Filters

Both surfaces expose dimension filters in a sidebar:

Red team filters (7 dimensions)

Dimension	Values
`result`	VULNERABLE / RESISTANT
`severity`	critical / high / medium / low / info
`category`	framework category codes (ASI01, LLM01, …)
`vulnerability`	vulnerability enum values
`attack_technique`	technique identifiers
`delivery_method`	delivery method identifiers
`source`	dataset source identifiers

Simulation filters (4 dimensions)

Dimension	Values
`goal_outcome`	achieved / not achieved
`persona`	persona names present in the run
`scenario`	scenario names present in the run
`evaluator`	evaluator names present in the run

Filters are applied via HTMX (no page reload). The report body, summary aggregates, and download links all update in-place to reflect the active filter state.

A red team report filtered by the sidebar dimension filters.
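Conceptually, the filter state is a set of selected values per dimension, and both the visible rows and the summary aggregates are recomputed from it. The sketch below shows that idea on plain dictionaries. The row shape (one dict per attack, keyed by the dimension names in the tables above) and the helper names are assumptions for illustration, not the report schema.

```python
from collections import Counter


def apply_filters(rows, active):
    """Keep rows whose value for every active dimension is among the selected values."""
    # Dimensions with no selected values are treated as "no filter".
    selected = {dim: set(values) for dim, values in active.items() if values}
    return [
        row for row in rows
        if all(row.get(dim) in allowed for dim, allowed in selected.items())
    ]


def summarize(rows):
    """Recompute a simple aggregate (count per result) from the visible rows."""
    return dict(Counter(row.get("result") for row in rows))


if __name__ == "__main__":
    rows = [
        {"result": "VULNERABLE", "severity": "high"},
        {"result": "RESISTANT", "severity": "high"},
        {"result": "VULNERABLE", "severity": "low"},
    ]
    visible = apply_filters(rows, {"severity": ["high"]})
    print(len(visible), summarize(visible))
```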

Interactive views (red team)

The red team surface exposes four dashboard-only interactive panels alongside the static report body:

Interactive breakdown: pick a group-by and a stack-by dimension (7 × 7 combinations); the attack-success rate (ASR) is recomputed per (group, stack) cell.
Agent heatmap: select the pivot dimension (vulnerability / category / technique / severity) for the agent × dimension ASR heatmap.
Conversation viewer: drill into the full message-by-message transcript for any individual attack (system / user / assistant / tool messages plus the evaluator explanation).
Disagreement viewer: for multi-agent runs, select any agent pair and page through the attacks where their results differ (side-by-side transcripts).
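The per-cell recomputation in the interactive breakdown can be sketched as follows. This assumes that the attack-success rate of a cell is the fraction of its attacks whose `result` is VULNERABLE, and that each attack is a dict keyed by the filter dimension names; both are illustrative assumptions, and `asr_by_cell` is a hypothetical helper.

```python
from collections import defaultdict


def asr_by_cell(rows, group_by, stack_by):
    """Return {(group value, stack value): attack-success rate} for the given rows."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [vulnerable count, total count]
    for row in rows:
        cell = (row[group_by], row[stack_by])
        counts[cell][1] += 1
        if row["result"] == "VULNERABLE":  # assumption: VULNERABLE means the attack succeeded
            counts[cell][0] += 1
    return {cell: vulnerable / total for cell, (vulnerable, total) in counts.items()}


if __name__ == "__main__":
    rows = [
        {"category": "LLM01", "severity": "high", "result": "VULNERABLE"},
        {"category": "LLM01", "severity": "high", "result": "RESISTANT"},
        {"category": "ASI01", "severity": "low", "result": "RESISTANT"},
    ]
    print(asr_by_cell(rows, "category", "severity"))
```

Every cell that appears in the result has at least one attack, so the division is always defined.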

Simulation transcript viewer

Simulation reports expose a conversation transcript panel: select any conversation entry from the run to see the full multi-turn exchange between the simulated user and the target agent.

The dashboard conversation transcript viewer, message by message.

Additional red team charts

Beyond the four panels above, the red team surface recomputes several charts live that the static exported report does not carry:

Cumulative discovery curve — vulnerabilities found as a function of conversation turn depth.
Attack-failure treemap — vulnerability → technique, sized by attack count.
Token histograms — prompt and completion token distributions per attack.
Vulnerability × severity — a cross-join stacked bar.
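One plausible reading of the cumulative discovery curve is a running total of successful attacks by the turn at which they succeeded. The sketch below implements that reading only; the input (a list of 1-based turn numbers, one per successful attack) and the name `cumulative_discovery` are assumptions, and the dashboard may define the curve differently.

```python
from collections import Counter
from itertools import accumulate


def cumulative_discovery(success_turns):
    """Map each turn depth 1..max to the number of successes found at or before it.

    `success_turns` is a list of 1-based turn numbers, one per successful attack.
    """
    if not success_turns:
        return []
    per_turn = Counter(success_turns)  # missing depths count as 0
    depths = range(1, max(success_turns) + 1)
    return list(zip(depths, accumulate(per_turn[d] for d in depths)))


if __name__ == "__main__":
    for depth, found in cumulative_discovery([1, 3, 3, 2]):
        print(depth, found)
```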

Downloads

Every report page includes a download sidebar with export links:

Format	Red team	Simulation
HTML (standalone, self-contained)	yes	yes
Markdown	yes	—
CSV (filtered result rows)	yes	—
JSON (filtered result rows)	yes	yes

Download links respect the currently active filter state — the CSV/JSON exports contain only the rows visible in the filtered report body.
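To make the filter-aware export concrete, here is a minimal sketch that serializes an already filtered list of rows to CSV and JSON using only the standard library. The row dictionaries, the column list, and the helper names are assumptions for illustration; the files the dashboard actually produces may use different columns and layout.

```python
import csv
import io
import json


def rows_to_csv(rows, fieldnames):
    """Serialize rows to CSV text, keeping only the named columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a row are written as empty cells
    return buffer.getvalue()


def rows_to_json(rows):
    """Serialize the same rows to pretty-printed JSON text."""
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    visible_rows = [{"result": "VULNERABLE", "severity": "high", "notes": "example"}]
    print(rows_to_csv(visible_rows, ["result", "severity"]))
    print(rows_to_json(visible_rows))
```

Because both functions take the visible rows as input, applying the filter first is enough to keep the exports in step with the report body.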