Sampling

Full telemetry on every turn is rarely what you want in production — it's expensive and it makes the backend UI cluttered. Sampling lets you keep a random fraction of traces while discarding the rest at the source.

The default

sample_rate: null (or unset) → AlwaysOn. Every span is kept. Best for development and low-volume deployments.

Enabling sampling

# config.yaml
sample_rate: 0.25 # keep 25% of traces, drop 75%

Or via env var:

export HERMES_OTEL_SAMPLE_RATE=0.25

Valid range: 0.0 to 1.0. Setting 0.0 disables all spans (equivalent to enabled: false but cheaper — the decision is made per-span in the SDK).
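The accepted values can be summarised in a small validation sketch. This is not hermes-otel's actual parsing code: `parse_sample_rate` is a hypothetical helper, and treating an empty string or the word "null" as unset is an assumption made for illustration.

```python
from typing import Optional, Union


def parse_sample_rate(raw: Union[str, float, int, None]) -> Optional[float]:
    """Return None for "unset" (AlwaysOn) or a float in [0.0, 1.0].

    Hypothetical helper; the plugin's real parsing may differ.
    """
    # Assumption: None, an empty string, and the word "null" all mean "unset".
    if raw is None:
        return None
    if isinstance(raw, str):
        text = raw.strip()
        if text == "" or text.lower() == "null":
            return None
        raw = text
    rate = float(raw)  # raises ValueError for non-numeric strings
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"sample_rate must be between 0.0 and 1.0, got {rate}")
    return rate


print(parse_sample_rate(None))    # None -> AlwaysOn
print(parse_sample_rate("0.25"))  # 0.25
print(parse_sample_rate(0))       # 0.0 -> every span dropped
```

Rejecting out-of-range values up front is a design choice in this sketch; a real loader might clamp or fall back to the default instead.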

Why ParentBased?

The plugin configures ParentBased(TraceIdRatioBased(rate)). What that means:

  • The root span of each trace is evaluated against the rate. The decision is a deterministic function of the trace ID, so it looks random across traces but is repeatable for any given ID.
  • Descendant spans inherit the root's decision — if the root is sampled, they all are; if it isn't, none of them are.

You never see a partial trace where half the children were sampled and half weren't. Either the whole session → llm → api → tool tree makes it to the backend or none of it does.
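The two rules above can be modelled in a few lines of plain Python. This is a simplified sketch, not the OpenTelemetry SDK's code: `ratio_decision`, `parent_based_decision`, and `sample_tree` are illustrative names, and comparing the low 64 bits of the trace ID against a threshold is an assumption about how a ratio sampler can be built.

```python
from typing import List, Optional

TRACE_ID_MASK = (1 << 64) - 1  # look only at the low 64 bits of the 128-bit ID


def ratio_decision(trace_id: int, rate: float) -> bool:
    """Root rule: keep the trace if its ID falls below a rate-sized threshold."""
    bound = round(rate * (1 << 64))
    return (trace_id & TRACE_ID_MASK) < bound


def parent_based_decision(
    parent_sampled: Optional[bool], trace_id: int, rate: float
) -> bool:
    """Child rule: copy the parent's decision; a root (no parent) uses the ratio."""
    if parent_sampled is None:
        return ratio_decision(trace_id, rate)
    return parent_sampled


def sample_tree(trace_id: int, rate: float, depth: int) -> List[bool]:
    """Decisions for a chain of nested spans, e.g. session, llm, api, tool."""
    decisions: List[bool] = []
    parent: Optional[bool] = None
    for _ in range(depth):
        parent = parent_based_decision(parent, trace_id, rate)
        decisions.append(parent)
    return decisions


kept = 0x0123456789ABCDEF0123456789ABCDEF     # low 64 bits fall below the 25% bound
dropped = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF  # low 64 bits fall above it
print(sample_tree(kept, 0.25, 4))     # [True, True, True, True]
print(sample_tree(dropped, 0.25, 4))  # [False, False, False, False]
```

Because only the root consults the rate, every list is either all `True` or all `False`, which is the "whole tree or nothing" behaviour described above.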

When to sample

Rough rules of thumb:

| Situation | Suggested rate |
| --- | --- |
| Local development | null (AlwaysOn) |
| Staging | 1.0 or null |
| Low-volume production (< 1 turn/sec) | null or 0.5 |
| Medium-volume production (1–10 turns/sec) | 0.1–0.25 |
| High-volume production (> 10 turns/sec) | 0.01–0.05, then boost on error |

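The last case (a low sample rate plus a boost on error) needs a tail-based sampler, which decides after a trace has finished. hermes-otel doesn't ship one yet: its decision is head-based and locked in when the root span starts, before anyone knows whether the turn will fail. If you need tail-based sampling, leave sample_rate unset so every span is exported, and send the spans through an OpenTelemetry Collector configured with a tail-sampling processor.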
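To make the suggested rates concrete, the expected export volume is the trace rate multiplied by the sample rate. The sketch below works that out for a few illustrative loads. It assumes each turn starts its own trace, which may not match your deployment, and `kept_traces_per_day` is a hypothetical helper, not part of hermes-otel.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def kept_traces_per_day(traces_per_second: float, sample_rate: float) -> float:
    """Expected number of traces exported per day under head-based sampling."""
    return traces_per_second * sample_rate * SECONDS_PER_DAY


# Illustrative loads only; plug in your own traffic numbers.
for label, tps, rate in [
    ("medium, 5/sec at 0.25", 5, 0.25),
    ("high, 20/sec at 0.05", 20, 0.05),
    ("high, 20/sec at 0.01", 20, 0.01),
]:
    print(f"{label}: ~{kept_traces_per_day(tps, rate):,.0f} traces/day")
# medium, 5/sec at 0.25: ~108,000 traces/day
# high, 20/sec at 0.05: ~86,400 traces/day
# high, 20/sec at 0.01: ~17,280 traces/day
```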

Sampling and metrics

Sampling only affects spans. Metrics (token counts, tool counts, API durations) are always recorded — they're aggregates, not per-trace events, so sampling them would give you wrong numbers.
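A small simulation shows why. It is a sketch under stated assumptions: every turn uses a fixed 100 tokens, trace IDs are random 128-bit integers, and `keep_span` is a simplified stand-in for a ratio sampler rather than the SDK's implementation.

```python
import random

RATE = 0.25
TOKENS_PER_TURN = 100  # assumed constant to keep the arithmetic obvious
TURNS = 1000


def keep_span(trace_id: int, rate: float) -> bool:
    """Simplified ratio sampler: compare the low 64 bits to a threshold."""
    return (trace_id & ((1 << 64) - 1)) < round(rate * (1 << 64))


rng = random.Random(0)  # fixed seed so repeated runs agree
metric_tokens = 0       # what an always-on counter would report
span_tokens = 0         # what you would get by summing exported spans

for _ in range(TURNS):
    trace_id = rng.getrandbits(128)
    metric_tokens += TOKENS_PER_TURN      # metrics ignore the sampler
    if keep_span(trace_id, RATE):
        span_tokens += TOKENS_PER_TURN    # only sampled traces reach the backend

print(metric_tokens)  # 100000
print(span_tokens)    # roughly a quarter of the total
```

Summing tokens from exported spans undercounts by about the sample rate, while the counter stays exact; that is the reason to read totals from metrics rather than from traces.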

Verifying

Debug logging prints the sampler config at startup:

[hermes-otel] Sampler: ParentBased(TraceIdRatioBased(0.25))

And each span decision is logged at debug level — see Debug logging.