evaluatorq.redteam — Roadmap¶
Last updated: 2026-03-20
Current State (v1)¶
The v1 red teaming engine is shipped. It covers:
- 19 vulnerabilities across OWASP ASI (10) and OWASP LLM Top 10 (9)
- Dynamic pipeline — objective generation, capability-aware strategy selection, tool-adapted attack generation, LLM-as-judge evaluation
- Static mode — load pre-built attack datasets from HuggingFace or the orq.ai platform
- Hybrid mode — combine static datasets with dynamic generation
- Multi-agent comparison — run the same attacks against multiple agents, compare results side-by-side with disagreement analysis
- Reporting — Rich terminal, Markdown, HTML, JSON auto-save, Streamlit dashboard
- Backends — ORQ agents (via platform API) and any OpenAI-compatible model
- CLI —
eq redteam run,eq redteam runsfor history - Observability — OpenTelemetry tracing, pipeline hooks
- Security — XML-escaping of traces, single-pass template substitution, prompt injection prevention
P0 — Must Have¶
1. Responsible AI & Safety Vulnerabilities¶
Bias, toxicity, and safety vulnerabilities are table-stakes for enterprise red teaming. The HuggingFace dataset (orq/redteam-vulnerabilities) already contains 130 samples for bias, toxicity, and harmful content — they need to be wired into the pipeline.
- Bias detection — religion, politics, gender, race subtypes with LLM-as-judge evaluators
- Toxicity detection — profanity, insults, threats, mockery subtypes
- Illegal activity — weapons, drugs, violent crimes, cybercrime, child exploitation
- Harmful content — graphic/sexual content, personal safety (bullying, self-harm, dangerous challenges)
2. Domain-Specific Risk Vulnerabilities¶
Risk categories for agents giving inappropriate professional advice. The HF dataset already has 20 samples across these categories.
- Legal advice risk — detect agents providing specific legal advice without disclaimers
- Medical advice risk — detect agents providing medical diagnoses or treatment recommendations
- Financial advice risk — detect agents providing specific investment or financial advice
3. Custom Vulnerability API¶
Let users extend vulnerability coverage without modifying package internals.
- Runtime registration API — extensible registry so users can define custom vulnerabilities with no code changes
- Custom evaluator criteria — accept plain-text criteria that get wrapped in an LLM-as-judge prompt
- Custom strategy attachment — attach custom attack strategies to custom vulnerabilities
P1 — Should Have¶
4. Compliance & Framework Mapping¶
Map vulnerabilities to industry-recognized frameworks for compliance reporting.
- MITRE ATLAS mapping — adversarial threat landscape for AI systems
- NIST AI RMF mapping — AI Risk Management Framework
- Regulatory compliance mapping — GDPR, EU AI Act, HIPAA, PCI DSS
- OWASP compliance report — one-click OWASP LLM Top 10 + ASI Top 10 compliance report as PDF/HTML
- Pre-configured security profiles — one-click profiles: "OWASP LLM Top 10", "EU AI Act", "GDPR"
5. Attack Method Expansion¶
High-value attack techniques proven effective and currently missing.
- Multilingual attacks — translate attacks to non-English languages; known bypass for English-trained safety filters
- Encoding attacks — Base64, ROT-13, Leetspeak deterministic transformations
- Emotional/semantic manipulation — social engineering using emotional pressure and semantic tricks
- Context flooding — flood context window to push system instructions out of attention
- BadLikertJudge — multi-turn attack using evaluative scales to extract harmful content
- Tree jailbreaking — branching conversation trees exploring multiple attack paths in parallel
- Reuse simulated test cases — skip attack regeneration on re-runs for faster iteration
6. Agentic Attack Plugins¶
Deeper agentic-specific attack coverage.
- Tool discovery attacks — probe agents to enumerate available tools and capabilities
- Tool metadata poisoning — test schema manipulation and description deception in agent tool definitions
- Cross-context retrieval — test tenant/user/role isolation in multi-tenant agent systems
7. Reporting & Regression¶
Make red teaming actionable over time.
- Interactive report design — 4-tab Streamlit dashboard
- Historical comparison — compare current run vs. previous runs with up to 4 comparison columns
- Regression detection — detect regressions and track improvement over time
- DataFrame export —
.to_df()on results for data science workflows
8. Documentation¶
- Documentation site — getting started guide, vulnerability reference, CLI reference, custom vulnerabilities, backends, reports, API reference
P2 — Nice to Have¶
9. Expanded PII & Intellectual Property¶
- PII leakage subtypes — extend current sensitive info disclosure with session leak, social manipulation, API/database access
- Intellectual property — imitation, copyright violations, trademark infringement
10. API & DX Polish¶
- Sync API wrapper —
red_team_sync()that wrapsasyncio.run() - YAML CLI configuration — run red team from a YAML config file
- Attack weighting — per-attack
weightparameter controlling selection probability - Exploitability ratings — LOW/MEDIUM/HIGH exploitability metadata per attack method
11. Research Dataset Integration¶
- Research dataset loader — generic loader for HuggingFace datasets: BeaverTails, HarmBench, ToxicChat, DoNotAnswer
- Domain-specific attack templates — pre-built templates for healthcare, finance, e-commerce
- CrowS-Pairs bias dataset — integrate EuConform CrowS-Pairs for bias/discrimination evaluation
12. Advanced Security Testing¶
- BFLA/BOLA/RBAC testing — privilege escalation, function bypass, cross-customer access
- System reconnaissance — test for file metadata, database schema, and retrieval config leakage
Out of Scope¶
| Item | Reason |
|---|---|
| Runtime guardrails | Guardrails are a runtime concern, not a testing concern. |
| RAG-specific plugins | May revisit based on demand. |
| CI/CD native integration | The CLI can be called from any CI pipeline already. |
| Web UI for results | Streamlit dashboard + HTML export cover the local use case. |
| Recursive hijacking / autonomous agent drift | Low real-world prevalence with current agent architectures. |
Contributing¶
We welcome contributions! If you're interested in working on any of these items, please open an issue or discussion to coordinate before starting work.