ZenLLM
Context waste is diagnosable if the telemetry is good enough
ZenLLM is not a memory runtime. Its value is measurement: showing where context growth, retrieval overhead, and memory-rebuild patterns inflate token cost, so teams know what to fix.
Patterns ZenLLM can surface
The current backend can detect four waste patterns, provided the telemetry is explicit enough:
Context accumulation within a session or agent run.
Large static prompt bundles that repeat with little variation.
Retrieval payloads dominating useful output.
Repeated memory rebuilds across sessions.
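The first pattern, context accumulation, can be illustrated with a small detector over per-step telemetry. This is a hypothetical sketch, not ZenLLM's actual backend logic; the field names (`agent_run_id`, `agent_step`, `history_tokens`) mirror the telemetry fields described below, and the growth threshold is an assumption for illustration.

```python
def detect_context_accumulation(events, growth_threshold=1.5):
    """Flag agent runs whose history tokens grow multiplicatively per step.

    `events` is a list of dicts with agent_run_id, agent_step, and
    history_tokens keys, as emitted by instrumented LLM calls.
    """
    by_run = {}
    for e in events:
        by_run.setdefault(e["agent_run_id"], []).append(e)

    flagged = []
    for run_id, steps in by_run.items():
        steps.sort(key=lambda e: e["agent_step"])
        if len(steps) < 2:
            continue
        first = steps[0]["history_tokens"]
        last = steps[-1]["history_tokens"]
        n = steps[-1]["agent_step"] - steps[0]["agent_step"]
        # Average multiplicative growth in history tokens per agent step.
        growth = (last / first) ** (1 / n) if first and n else 1.0
        if growth > growth_threshold:
            flagged.append((run_id, round(growth, 2)))
    return flagged
```

A run whose history doubles every step gets flagged; a run that stays roughly flat does not.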
Fields worth sending
At minimum, send the standard cost telemetry. For higher-confidence analysis, include session IDs, agent-step IDs, and retrieval or memory fields when available.
```python
response = llm.chat.completions.create(  # llm: a ZenLLM-instrumented client
    model="gpt-4o",
    messages=[...],
    # Workflow and session identity
    workflow_type="support-agent",
    session_id="chat-session-7f3d",
    agent_run_id="run-2026-03-30-001",
    agent_step=3,
    # Context composition
    history_tokens=1800,
    system_prompt_tokens=650,
    # Retrieval and memory
    retrieved_chunks=12,
    retrieved_context_tokens=3200,
    memory_strategy="summary-plus-retrieval",
    # Caching
    cache_read_tokens=900,
    cache_write_tokens=140,
    context_fingerprint="support-bot:enterprise:docs-v42",
)
```
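With the retrieval fields populated, a retrieval-overhead check reduces to a ratio of retrieved context tokens to completion tokens. The function below is a hypothetical sketch: the threshold and the assumption that completion tokens come from the response's usage data are illustrative, not ZenLLM's actual heuristics.

```python
def retrieval_overhead(event, completion_tokens, ratio_threshold=4.0):
    """Return (ratio, flagged): retrieved context tokens per completion token.

    `event` is a telemetry dict like the fields sent above; `completion_tokens`
    would typically come from the provider's usage accounting.
    """
    retrieved = event.get("retrieved_context_tokens", 0)
    if completion_tokens <= 0:
        return 0.0, False
    ratio = retrieved / completion_tokens
    # Flag calls where retrieval payload dwarfs the useful output.
    return ratio, ratio > ratio_threshold
```

For the example call above, 3200 retrieved tokens against a 400-token completion gives a ratio of 8.0, well past a 4:1 threshold.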