Contributing to evaluatorq¶
Getting Started¶
Prerequisites¶
- Python 3.10+
- uv package manager
Setup¶
# Install all dependencies (dev group + all optional extras)
uv sync --all-extras --all-groups
# Verify the setup
uv run pytest -m 'not integration' --co # list tests without running
uv run basedpyright # type check
uv run ruff check src # lint
Development Workflow¶
Running Tests¶
# Unit tests only (fast, no external services)
uv run pytest -m 'not integration'
# Specific test file
uv run pytest tests/redteam/test_vulnerability_first.py -v
# With coverage
uv run pytest -m 'not integration' --cov=src/evaluatorq
# Integration tests (requires ORQ_API_KEY in .env)
uv run pytest -m integration
Linting and Formatting¶
# Check for lint issues
uv run ruff check src
# Auto-fix lint issues
uv run ruff check src --fix
# Format code
uv run ruff format src
# Type check
uv run basedpyright
Project Structure¶
The package has two main areas:
- Core evaluation framework (src/evaluatorq/): the public evaluate() API, dataset fetching, scorers, and integrations
- Red teaming subpackage (src/evaluatorq/redteam/): adversarial testing pipeline with a vulnerability-first data model
See CLAUDE.md for a detailed file tree.
Code Conventions¶
Python Version¶
Target Python 3.10+. Use from __future__ import annotations at the top of files for modern type syntax. The codebase includes a StrEnum polyfill for Python 3.10 compatibility.
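As a rough illustration of what a StrEnum polyfill can look like, the sketch below falls back to a str/Enum mix-in on Python 3.10. It is an assumption about the general technique, not a copy of the polyfill in this codebase; the Severity enum is hypothetical.

```python
import json
import sys
from enum import Enum

if sys.version_info >= (3, 11):
    from enum import StrEnum
else:
    class StrEnum(str, Enum):
        """Minimal stand-in for enum.StrEnum on Python 3.10 (illustrative only)."""

        def __str__(self) -> str:
            # Match the 3.11+ behaviour: str(member) yields the plain value.
            return str(self.value)


class Severity(StrEnum):  # hypothetical enum, used only for this example
    LOW = "low"
    HIGH = "high"


# Members behave like plain strings, so they serialize to JSON without a custom encoder.
print(str(Severity.HIGH))
print(json.dumps({"severity": Severity.HIGH}))
```

Because each member is also a str, it can be compared with and serialized as its raw value on both interpreter versions.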
Imports¶
- Use absolute imports (from evaluatorq.redteam.contracts import ...)
- Use TYPE_CHECKING blocks for imports only needed at type-check time
- Ruff handles import sorting
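A minimal sketch of the TYPE_CHECKING pattern follows. A standard-library import stands in for a project import such as evaluatorq.redteam.contracts so that the snippet runs on its own; the first_prompt function is hypothetical. The annotation is quoted here, and with from __future__ import annotations at the top of the file the quotes would not be needed.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by type checkers such as basedpyright; skipped at runtime.
    from collections.abc import Sequence


def first_prompt(prompts: "Sequence[str]") -> str:
    """Return the first prompt; the quoted annotation is never evaluated at runtime."""
    return prompts[0]
```

Keeping type-only imports inside the guarded block avoids import cycles and keeps runtime import cost low.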
Data Models¶
- All shared data models live in redteam/contracts.py (Pydantic BaseModel)
- Enums use StrEnum for JSON serialization compatibility
- Semantic convention: passed=True means RESISTANT, passed=False means VULNERABLE
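The passed convention can be sketched as a small mapping. The Verdict enum and verdict_from_passed helper are hypothetical names for illustration and are not taken from redteam/contracts.py; the inconclusive case for passed=None follows the error-handling convention described in this guide.

```python
from enum import Enum
from typing import Optional


class Verdict(str, Enum):  # hypothetical enum, not the project's actual model
    RESISTANT = "resistant"
    VULNERABLE = "vulnerable"
    INCONCLUSIVE = "inconclusive"


def verdict_from_passed(passed: Optional[bool]) -> Verdict:
    """Translate the tri-state passed flag into a human-readable verdict."""
    if passed is None:
        # The evaluator could not decide (for example, it failed to run).
        return Verdict.INCONCLUSIVE
    # passed=True: the target resisted the attack; passed=False: it was vulnerable.
    return Verdict.RESISTANT if passed else Verdict.VULNERABLE
```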
Error Handling¶
- Custom exceptions live in redteam/exceptions.py
- Use loguru.logger for logging in the redteam subpackage
- Evaluator failures should return inconclusive results (passed=None), not raise
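One way to honor the "return inconclusive, do not raise" rule is to wrap the evaluator call, as in the sketch below. EvalResult and safe_evaluate are hypothetical names, and a dataclass stands in for the project's Pydantic models so the example has no third-party requirements. The standard-library logger is used only as a fallback when loguru is not installed.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

try:
    from loguru import logger
except ImportError:  # fallback so the sketch runs without loguru installed
    import logging

    logger = logging.getLogger("evaluatorq.redteam")


@dataclass
class EvalResult:  # hypothetical stand-in for a Pydantic result model
    passed: Optional[bool]
    explanation: str = ""


async def safe_evaluate(
    evaluator: Callable[[str], Awaitable[bool]], response: str
) -> EvalResult:
    """Run an evaluator and convert any failure into an inconclusive result."""
    try:
        return EvalResult(passed=await evaluator(response))
    except Exception as exc:
        # Log the failure, then report passed=None instead of propagating the error.
        logger.warning(f"Evaluator failed, marking result inconclusive: {exc!r}")
        return EvalResult(passed=None, explanation=str(exc))
```

Catching Exception rather than BaseException leaves task cancellation and keyboard interrupts untouched.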
Testing¶
- Unit tests go in tests/unit/, integration tests in tests/integration/, redteam tests in tests/redteam/
- Mark integration tests: @pytest.mark.integration
- Use pytest-asyncio for async test functions
- Default timeout: 120s per test
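The conventions above could look like this in a test module. The refuses_secret helper and both test names are hypothetical, and the explicit asyncio marker assumes pytest-asyncio runs in its default strict mode; if the project configures auto mode, the marker is harmless but unnecessary.

```python
import asyncio
import os

import pytest


async def refuses_secret(response: str) -> bool:  # hypothetical helper under test
    await asyncio.sleep(0)  # stand-in for real async work
    return "secret" not in response.lower()


@pytest.mark.asyncio
async def test_refusal_is_detected() -> None:
    # Unit test: no external services, selected by -m 'not integration'.
    assert await refuses_secret("I cannot share that.") is True


@pytest.mark.integration
@pytest.mark.asyncio
async def test_against_live_service() -> None:
    # Integration test: only meaningful when credentials are available.
    if not os.environ.get("ORQ_API_KEY"):
        pytest.skip("ORQ_API_KEY is not set")
    # Placeholder body; a real test would call the external service here.
    assert await refuses_secret("I cannot share that.") is True
```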
Adding Features¶
New Vulnerability / Evaluator / Framework¶
See docs/custom-evaluators-and-frameworks.md for a step-by-step guide.
New Backend (Target)¶
Implement the AgentTarget protocol from backends/base.py:
class AgentTarget(Protocol):
    async def send_prompt(self, prompt: str) -> str: ...
    def reset_conversation(self) -> None: ...
Optionally implement SupportsClone, SupportsTokenUsage, or SupportsTargetMetadata for advanced features. Register your backend by creating a BackendBundle in backends/registry.py.
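As a minimal sketch of a conforming backend, the example below re-declares the protocol locally (mirroring the two methods shown above) so that it runs without the package installed; in a real backend you would import AgentTarget from backends/base.py instead. EchoTarget is a hypothetical class, the runtime_checkable decorator is added here only to demonstrate the structural check, and registration via BackendBundle is not shown because its signature is not described here.

```python
import asyncio
from typing import List, Protocol, runtime_checkable


@runtime_checkable  # added for this demo; the real protocol may not be runtime-checkable
class AgentTarget(Protocol):
    async def send_prompt(self, prompt: str) -> str: ...
    def reset_conversation(self) -> None: ...


class EchoTarget:
    """Hypothetical backend that echoes prompts and tracks conversation history."""

    def __init__(self) -> None:
        self.history: List[str] = []

    async def send_prompt(self, prompt: str) -> str:
        self.history.append(prompt)
        return f"echo: {prompt}"

    def reset_conversation(self) -> None:
        # Drop all state so the next attack starts from a clean conversation.
        self.history.clear()


target = EchoTarget()
reply = asyncio.run(target.send_prompt("hello"))
```

Because Protocol uses structural typing, EchoTarget satisfies AgentTarget without inheriting from it.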
New Integration¶
Add integration modules under src/evaluatorq/integrations/. Add the dependency as an optional extra in pyproject.toml.
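An integration module that depends on an optional extra usually guards its import so that users without the extra get an actionable error. The helper below is a sketch of that pattern; require_optional and the extra name passed to it are hypothetical and not part of the evaluatorq API.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency or explain which extra provides it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this integration; "
            f"install the '{extra}' optional extra to enable it."
        ) from exc
```

Calling the helper inside the integration module, rather than at package import time, keeps the core package importable when the extra is absent.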
Pull Requests¶
- Branch from main
- Run uv run pytest -m 'not integration' and uv run basedpyright before pushing
- Use conventional commit format for commit messages (e.g., feat(redteam): ..., fix(evaluatorq): ...)
- Keep PRs focused: one feature or fix per PR when possible
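For a quick local check of commit subjects, the sketch below parses the type(scope): subject shape used in the examples above. The regular expression and the parse_commit_subject helper are illustrative assumptions; the project may enforce a stricter or different rule set.

```python
import re
from typing import Dict, Optional

# type, optional (scope), optional "!" for breaking changes, then ": subject"
COMMIT_RE = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[\w-]+)\))?!?: (?P<subject>\S.*)$"
)


def parse_commit_subject(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the parts of a conventional commit subject, or None if it does not match."""
    match = COMMIT_RE.match(line)
    if match is None:
        return None
    return {
        "type": match.group("type"),
        "scope": match.group("scope"),  # None when no scope is given
        "subject": match.group("subject"),
    }
```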