Conversation capture

By default, the llm.* span's input.value is just the latest user turn. That's the obvious thing to show in a UI, but it's not what the model actually saw — the model was given the system prompt, the full conversation history, and all tool results in addition to that last user message.

Turning on capture_conversation_history attaches the full message list (as JSON) to the llm.* span. Indispensable for debugging "why did the model do that?" questions.

Enabling it

# config.yaml
capture_conversation_history: true
conversation_history_max_chars: 40000 # safety cap (default: 20000)

Or via env var:

export HERMES_OTEL_CAPTURE_CONVERSATION_HISTORY=true

What gets set

On every llm.* span:

Attribute                           Type    Example
input.value                         string  [{"role":"system","content":"You are..."},{"role":"user","content":"..."}, ...]
input.mime_type                     string  application/json
hermes.conversation.message_count   int     12

Backends that recognise input.mime_type=application/json pretty-print the JSON:

  • Phoenix: JSON view in the Input panel, fully expandable.
  • Langfuse: syntax-highlighted JSON blob.
  • SigNoz / Jaeger / Tempo: raw JSON string — readable but not folded.
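If your backend only shows the raw string, you can decode it yourself from an exported span. The following is a minimal sketch, assuming the span attributes are available as a plain dict; the helper name decode_span_input is hypothetical and not part of any plugin or backend API.

```python
import json


def decode_span_input(attributes):
    """Return the parsed message list when input.value is JSON, else the raw value.

    `attributes` is assumed to be a plain dict of span attributes
    (hypothetical shape; adapt to however your backend exports spans).
    """
    value = attributes.get("input.value")
    if value is None:
        return None
    if attributes.get("input.mime_type") == "application/json":
        try:
            return json.loads(value)
        except json.JSONDecodeError:
            # A history clipped by the size cap may no longer parse as JSON,
            # so fall back to the raw string rather than failing.
            return value
    return value


attrs = {
    "input.value": '[{"role": "system", "content": "You are..."}]',
    "input.mime_type": "application/json",
    "hermes.conversation.message_count": 1,
}
messages = decode_span_input(attrs)
```

The fallback branch matters for long conversations: once the cap clips the payload, treat input.value as text, not as guaranteed-valid JSON.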

Respects capture_previews

When capture_previews: false (privacy mode), conversation capture is also suppressed. The two interact cleanly — you don't need to remember to turn this off when you enable privacy mode.
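The interaction between the two flags can be sketched as a single boolean check. This is an illustration only: the CaptureConfig class and should_capture_history function are hypothetical names, and the capture_previews default of True is an assumption inferred from privacy mode being opt-in.

```python
from dataclasses import dataclass


@dataclass
class CaptureConfig:
    # Hypothetical config holder; field names mirror the config.yaml keys.
    capture_previews: bool = True  # assumed default; False means privacy mode
    capture_conversation_history: bool = False
    conversation_history_max_chars: int = 20000


def should_capture_history(cfg):
    """History is attached only when previews are allowed AND history is enabled."""
    return cfg.capture_previews and cfg.capture_conversation_history


privacy_mode = CaptureConfig(capture_previews=False, capture_conversation_history=True)
print(should_capture_history(privacy_mode))
```

In this sketch, privacy mode wins regardless of how capture_conversation_history is set, which matches the behaviour described above.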

Respects preview_max_chars? Not exactly

The cap on conversation history is its own field, conversation_history_max_chars, not preview_max_chars. The reasoning: conversation JSON is orders of magnitude larger than a single tool input preview, so sharing the same cap would either truncate individual messages uselessly or balloon the size of normal previews.

The default cap is 20,000 characters (≈20 KB as UTF-8 for mostly-ASCII text), which is roughly 5k tokens of conversation at about 4 characters per token. Long conversations get clipped with a trailing ... on whatever message the cap lands in the middle of.
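The clipping behaviour can be approximated as follows. This is a sketch under stated assumptions, not the plugin's actual code: it assumes the cap is applied to the serialised JSON string and that the trailing ... is appended after the cut. The function name clip_history is hypothetical.

```python
import json


def clip_history(messages, max_chars=20000):
    """Serialise messages to JSON and clip to max_chars with a trailing '...'.

    Assumption: the cap counts characters of the serialised string, and the
    ellipsis is added on top of the cap. The real implementation may differ.
    """
    payload = json.dumps(messages, ensure_ascii=False)
    if len(payload) <= max_chars:
        return payload
    return payload[:max_chars] + "..."


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "x" * 30000},
]
clipped = clip_history(history)
print(len(clipped))
```

Under these assumptions the cut lands inside the long user message, so the clipped value is no longer a complete JSON document.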

Bump it for complex agents:

conversation_history_max_chars: 100000 # 100 KB

Why only on llm.*?

api.* spans are per-HTTP-request. A single turn can include multiple api.* round-trips (one to get tool calls, another to get the final response after tool results). The conversation history changes between them (tool results get appended), so attaching it to api.* spans would double or triple the data with mostly-overlapping payloads.

The parent llm.* span represents the whole turn end-to-end. Attaching conversation history there keeps it in one place.

Performance

Conversation capture adds a JSON serialisation + size check on every pre_llm_call hook. For a 10-message conversation at ~200 tokens each, that's ~10 ms of serialisation — negligible next to a network round-trip to the model. Not a concern.
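Serialisation cost depends on your payloads and hardware, so it is worth measuring and not relying on a single figure. A minimal timing sketch, assuming synthetic messages of about 800 characters each (roughly 200 tokens at 4 characters per token, an approximation); the function name time_serialisation is hypothetical.

```python
import json
import time


def time_serialisation(messages, repeats=100):
    """Return the mean seconds per json.dumps call over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        json.dumps(messages)
    return (time.perf_counter() - start) / repeats


# Ten synthetic messages standing in for a short conversation.
sample = [{"role": "user", "content": "x" * 800} for _ in range(10)]
mean_seconds = time_serialisation(sample)
print(f"{mean_seconds * 1000:.3f} ms per serialisation")
```

Swap in a real message list from one of your own agents to get a representative number.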

The backend impact is bigger: with the default cap, each llm.* span can now carry up to ~20 KB of JSON it didn't carry before. On Langfuse Cloud's free tier (500 MB/mo), that's ~25k turns (500 MB ÷ 20 KB) before you hit the limit. Size accordingly.
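The sizing arithmetic is easy to redo for a different cap or quota. A small sketch, assuming decimal units (1 MB = 1000 KB) and that every turn fills the cap, which makes it a worst-case estimate; the function name turns_before_quota is hypothetical.

```python
def turns_before_quota(quota_mb=500, kb_per_turn=20):
    """Worst-case number of turns that fit in a monthly quota.

    Assumes decimal units (1 MB = 1000 KB) and that each turn carries
    the full kb_per_turn of conversation JSON.
    """
    return (quota_mb * 1000) // kb_per_turn


print(turns_before_quota())          # 25000 turns at the default 20 KB cap
print(turns_before_quota(500, 100))  # 5000 turns at a 100 KB cap
```

Raising conversation_history_max_chars to 100000 cuts the worst-case budget to a fifth, so check the quota before bumping the cap.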