Skip to content

Tracing

evaluatorq ships optional OpenTelemetry tracing. When enabled, every evaluation run, job, evaluator, and LLM call becomes a span you can view in the Orq dashboard or any OTLP-compatible backend.

How tracing is enabled

Tracing initialises lazily on the first evaluation run. It turns on automatically when either condition is true:

  • ORQ_API_KEY is set — the OTLP base endpoint is https://my.orq.ai/v2/otel (or <ORQ_BASE_URL>/v2/otel if ORQ_BASE_URL is set); the exporter appends /v1/traces, so spans POST to …/v2/otel/v1/traces.
  • OTEL_EXPORTER_OTLP_ENDPOINT is set — that endpoint is used as the OTLP base.

If neither variable is set, no tracer is created and all span context managers are no-ops.

Set ORQ_DISABLE_TRACING=1 or ORQ_DISABLE_TRACING=true to suppress tracing even when the above variables are present.

Install the OTEL packages

Tracing depends on optional packages that are not installed by default:

pip install opentelemetry-api opentelemetry-sdk \
    opentelemetry-exporter-otlp-proto-http \
    opentelemetry-semantic-conventions
# or via the extras bundle:
pip install "evaluatorq[otel]"

If these packages are absent the SDK silently skips initialisation — no error is raised.

Minimal enable example

import os
import asyncio

os.environ["ORQ_API_KEY"] = "your_orq_api_key"   # tracing auto-enables

from evaluatorq import DataPoint, evaluatorq, job, string_contains_evaluator


@job("echo")
async def echo_job(data: DataPoint, _row: int) -> str:
    return str(data.inputs.get("text", ""))


asyncio.run(
    evaluatorq(
        "my-eval",
        data=[DataPoint(inputs={"text": "hello"}, expected_output="hello")],
        jobs=[echo_job],
        evaluators=[string_contains_evaluator()],
    )
)

To send traces to a custom OTLP endpoint instead:

OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 python my_eval.py

To debug tracing setup:

ORQ_DEBUG=1 python my_eval.py

This prints the resolved endpoint, auth header presence, and any initialisation errors to stdout.

OTLP exporter details

  • Protocol: HTTP/protobuf (OTLPSpanExporter from opentelemetry-exporter-otlp-proto-http)
  • Export mode: BatchSpanProcessor (asynchronous batching)
  • Timeout: 5 seconds per export request
  • Auth: Authorization: Bearer <ORQ_API_KEY> is added automatically when the resolved endpoint's hostname ends in .orq.ai or is exactly orq.ai. For any other endpoint the header is not added; use OTEL_EXPORTER_OTLP_HEADERS to supply auth manually.
  • Custom headers: parsed from OTEL_EXPORTER_OTLP_HEADERS as key1=value1,key2=value2.

Span hierarchy

Evaluation runner spans

orq.job                          # one per DataPoint — root when no ambient trace is active,
  ├── <your job code>            #   otherwise a child of the caller's span
  └── orq.evaluation             # one per evaluator applied to this job

All orq.job spans from a single evaluatorq() call share the same orq.run_id attribute, which ties them together as a logical run without requiring a common parent span.

Span attributes on orq.job:

Attribute Value
orq.trace_type "evaluatorq"
orq.run_id UUID for this evaluation run
orq.row_index Zero-based row number
orq.job_name Job name (if set via @job("name"))

Span attributes on orq.evaluation:

Attribute Value
orq.run_id Same UUID as the parent job span
orq.evaluator_name Name of the evaluator
orq.score JSON-serialised score value
orq.explanation Explanation string (if the evaluator provides one)
orq.pass Boolean pass/fail result

Red teaming spans

orq.redteam.pipeline             # root — one per red_team() call
  ├── orq.redteam.context_retrieval
  ├── orq.redteam.datapoint_generation
  │     ├── orq.redteam.capability_classification
  │     │     ├── chat (llm_purpose=classify_tools)
  │     │     └── chat (llm_purpose=infer_resources)
  │     └── orq.redteam.strategy_planning
  │           └── chat (llm_purpose=generate_strategies)
  ├── orq.job                    # one per attack datapoint
  │     └── orq.redteam.attack
  │           ├── orq.redteam.target_call
  │           └── orq.redteam.attack_turn  (x N turns)
  │                 ├── orq.redteam.adversarial_generation
  │                 │     └── chat (llm_purpose=adversarial)
  │                 └── orq.redteam.target_call
  ├── orq.evaluation             # security evaluator result
  │     └── orq.redteam.security_evaluation
  │           └── chat (llm_purpose=evaluation)
  └── orq.redteam.memory_cleanup # post-run agent memory entity cleanup

LLM spans (chat ...) carry standard GenAI attributes:

Attribute Value
gen_ai.operation.name Operation name (e.g. "chat")
gen_ai.system Provider name
gen_ai.request.model Model identifier
gen_ai.usage.input_tokens Prompt token count
gen_ai.usage.output_tokens Completion token count
gen_ai.input.messages JSON serialised input messages (gated by EVALUATORQ_CAPTURE_MESSAGE_CONTENT)
gen_ai.output.messages JSON serialised output messages (gated by EVALUATORQ_CAPTURE_MESSAGE_CONTENT)
orq.llm.purpose Cross-domain purpose tag (e.g. "adversarial", "evaluation", "target")

Content capture and truncation

Two env vars control how much text is stored on spans:

  • EVALUATORQ_CAPTURE_MESSAGE_CONTENT (default true): set to false or 0 to keep LLM message content out of traces entirely. Token counts and model name are still recorded.
  • EVALUATORQ_SPAN_MAX_TEXT_CHARS (default: no limit): set to a positive integer to truncate span text attributes. Truncated strings end with ... [truncated].

W3C trace context propagation

To propagate trace context across service boundaries, inject the active span's W3C traceparent/tracestate headers into your outgoing HTTP requests. Use the OpenTelemetry SDK's public inject() helper — a stable, supported API:

from opentelemetry.propagate import inject

headers: dict[str, str] = {}
inject(headers)          # writes `traceparent` (+ `tracestate`) for the active span
# pass `headers` into your outgoing request, e.g. httpx.get(url, headers=headers)

inject() is a no-op when no span is active, so it is safe to call whenever OpenTelemetry is installed. (The from opentelemetry.propagate import inject import itself requires OTel; if you need code that also runs without it installed, use the internal helper below, which degrades to an empty dict.)

Internal convenience helper

evaluatorq also ships get_trace_context_headers() in evaluatorq.common.tracing, which returns the same headers as a dict (empty when OTel is unavailable). It is an internal utility — not re-exported from the public evaluatorq.tracing namespace, and its import path may change without a deprecation cycle. Prefer the OpenTelemetry inject() path above for anything stable.