Metric scope
Workflow metrics are projected; evidence stays tied to analyzed threads
derived-flight-cancellation-refund · Based on 68 threads · medium confidence
Flight Cancellation Refund
Derived primarily from user-authored prompts across a 300-thread slice. Full-slice prompt clustering ran on every thread. Because the slice exceeds the non-sampling threshold, Claude then consolidated the major workflow types from cluster exemplars.
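Below is a minimal sketch of the consolidation rule described above. It assumes every thread already carries a cluster label. Several parts are hypothetical: the function name select_for_consolidation, the threshold value, the per-cluster exemplar count, and the use of seeded random sampling to pick exemplars. The source does not say how exemplars are actually selected.

```python
import random


def select_for_consolidation(clusters, threshold, per_cluster=3, seed=0):
    """Return, per cluster, the thread IDs the consolidation step would read.

    clusters: mapping of cluster label -> list of thread IDs (every thread is clustered).
    threshold: hypothetical non-sampling threshold; at or below it, no sampling happens.
    per_cluster: hypothetical number of exemplars kept per cluster above the threshold.
    """
    total = sum(len(ids) for ids in clusters.values())
    if total <= threshold:
        # Small slice: consolidate from every thread.
        return {label: list(ids) for label, ids in clusters.items()}
    rng = random.Random(seed)  # seeded so the exemplar choice is reproducible
    # Large slice: keep at most `per_cluster` exemplars from each cluster.
    return {
        label: rng.sample(ids, min(per_cluster, len(ids)))
        for label, ids in clusters.items()
    }


# Illustrative 300-thread slice with a made-up threshold of 200.
clusters = {
    "flight-cancellation-refund": [f"t{i}" for i in range(68)],
    "other": [f"u{i}" for i in range(232)],
}
picked = select_for_consolidation(clusters, threshold=200)
print({label: len(ids) for label, ids in picked.items()})
```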
This workflow
Charts: token and spend trend; model mix; tokens by model (input + output); spend by model (estimated cost).
Opportunities
2 opportunities for this workflow
Failed benchmark outcomes are still paying the full workflow cost
The imported outcome labels show a high failure rate after the workflow has already spent tokens and tool calls, which points to missing early exits or weak preflight checks.
- Compare passing and failing traces for this workflow and add an early gate before the expensive tool loop starts.
- Use the imported outcome label as an evaluation dimension so regressions are ranked by wasted spend, not just by raw failure count.
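One way to act on the second suggestion is to group failed runs and sort them by the spend they consumed, not by how many there are. This is a sketch under assumptions. The Run record, its field names, the "fail" label value, and the group names in the example are all hypothetical and are not part of the imported data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Run:
    group: str       # hypothetical grouping key, e.g. a suspected regression
    outcome: str     # imported outcome label; "pass"/"fail" values are assumed
    cost_usd: float  # estimated spend for the whole run


def rank_by_wasted_spend(runs):
    """Rank groups by spend on failed runs; failure count only breaks ties."""
    spend = defaultdict(float)
    failures = defaultdict(int)
    for run in runs:
        if run.outcome == "fail":
            spend[run.group] += run.cost_usd
            failures[run.group] += 1
    rows = [(group, spend[group], failures[group]) for group in spend]
    return sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)


# Made-up data: many cheap failures vs. a few expensive ones.
runs = (
    [Run("missing-preflight", "fail", 0.01)] * 5
    + [Run("runaway-tool-loop", "fail", 0.10)] * 2
    + [Run("runaway-tool-loop", "pass", 0.10)]
)
for group, wasted, count in rank_by_wasted_spend(runs):
    print(f"{group}: ${wasted:.2f} wasted across {count} failures")
```

In this made-up example, the group with fewer failures ranks first because each of its failed runs costs ten times more. That is the ordering the suggestion above is after.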
Imported benchmark outcome ended with failure
Tool loops are dense enough to need batching or early stopping
The step labeled "tool" dominates repeated tool activity. This suggests the workflow is making incremental calls where batching, caching, or tighter stop conditions would reduce churn.
- Batch or cache repeated tool calls where the inputs overlap across adjacent steps.
- Add a per-run tool budget and stop condition so failed runs do not keep exploring after the likely answer is already unreachable.
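The caching and budget suggestions can be combined in a small wrapper around whatever executes tool calls. This is a sketch, not this workflow's actual tooling. It rests on several assumptions: BudgetedToolRunner, ToolBudgetExceeded, and the get_reservation tool name are hypothetical, and the code assumes identical (name, arguments) pairs return identical results.

```python
from collections.abc import Callable, Hashable


class ToolBudgetExceeded(RuntimeError):
    """Raised when a run has used up its tool-call budget."""


class BudgetedToolRunner:
    """Hypothetical wrapper: reuses results for repeated calls and caps real calls per run."""

    def __init__(self, call_tool: Callable[[str, tuple], object], max_calls: int):
        self._call_tool = call_tool
        self._max_calls = max_calls
        self._cache: dict[tuple[str, Hashable], object] = {}
        self.calls_made = 0  # real (uncached) tool invocations in this run

    def run(self, name: str, args: tuple) -> object:
        key = (name, args)
        if key in self._cache:
            # Overlapping input from an adjacent step: reuse the earlier result.
            return self._cache[key]
        if self.calls_made >= self._max_calls:
            # Stop condition: end the run instead of exploring further.
            raise ToolBudgetExceeded(f"tool budget of {self._max_calls} calls exhausted")
        self.calls_made += 1
        result = self._call_tool(name, args)
        self._cache[key] = result
        return result


# Usage with a stand-in tool that just echoes the reservation ID.
runner = BudgetedToolRunner(lambda name, args: {"reservation": args[0]}, max_calls=2)
runner.run("get_reservation", ("XEHM4B",))
runner.run("get_reservation", ("XEHM4B",))  # served from cache, no new call
runner.run("get_reservation", ("59XX6W",))
print(runner.calls_made)  # 2
```

Caching only helps when repeated calls are truly idempotent. Some tools return results that change between steps, such as a live reservation status. For those, the cache key would need something that invalidates stale entries.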
Charts: prompt composition (input token breakdown); tool signals.
How this workflow runs
How often steps had to re-run.
Tasks handed off to sub-agents during the workflow.
Total documents pulled in across all tool calls.
Typical time each step takes to finish.
Stage order
Typical workflow path
- Loop ×2: respond → plan → tool, repeated 2 times. Latency unavailable · 762 tok avg
  1. Respond (Kimi-K2): respond step in the workflow. Latency unavailable · 225 tok avg
  2. Plan (Kimi-K2): plan the next steps in the workflow. Latency unavailable · 78 tok avg
  3. Tool (tool): tool step in the workflow. Latency unavailable · 78 tok avg
- Loop ×97: plan → tool, repeated 97 times. Latency unavailable · 15.1K tok avg
  1. Plan (Kimi-K2): plan the next steps in the workflow. Latency unavailable · 78 tok avg
  2. Tool (tool): tool step in the workflow. Latency unavailable · 78 tok avg
- Verify: verify step in the workflow. Latency unavailable · 132 tok avg
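The loop-level averages appear to equal the per-iteration step averages multiplied by the repeat count. For the first loop, (225 + 78 + 78) × 2 = 762. For the second, (78 + 78) × 97 = 15,132, which the dashboard shows as 15.1K. This relationship is inferred from the numbers rather than documented. The sketch below only checks that reading, and its helper names are hypothetical.

```python
def loop_tokens(step_avgs, repeats):
    """Inferred rule: sum of per-step token averages times the loop's repeat count."""
    return sum(step_avgs) * repeats


def fmt_tokens(n):
    """Render a token count in the dashboard's style (e.g. 15.1K); format assumed."""
    return f"{n / 1000:.1f}K" if n >= 1000 else str(n)


# Loop x2: respond (225) -> plan (78) -> tool (78)
respond_plan_tool = loop_tokens([225, 78, 78], repeats=2)
# Loop x97: plan (78) -> tool (78)
plan_tool = loop_tokens([78, 78], repeats=97)

print(fmt_tokens(respond_plan_tool))  # 762
print(fmt_tokens(plan_tool))          # 15.1K
```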
Threads
1. Respond · claude-opus-4-5 · 7 tok · $0.00
2. Respond · claude-opus-4-5 · 52 tok · $0.00
3. Plan · claude-opus-4-5 · 24 tok · $0.00
4. Tool · Run tool (tool) · 172 tok
5. Plan · claude-opus-4-5 · 41 tok · $0.00
6. Tool · Run tool (tool) · 166 tok
7. Tool · Run tool (tool) · 166 tok
8. Respond · claude-opus-4-5 · 218 tok · $0.01
9. Plan · claude-opus-4-5 · 94 tok · $0.00
10. Tool · Run tool (tool) · 1.3K tok
11. Plan · claude-opus-4-5 · 68 tok · $0.00
12. Tool · Run tool (tool) · 1.3K tok
13. Respond · claude-opus-4-5 · 357 tok · $0.01
14. Plan · claude-opus-4-5 · 29 tok · $0.00
15. Tool · Run tool (tool) · 5 tok
16. Plan · claude-opus-4-5 · 129 tok · $0.00
17. Tool · Run tool (tool) · 5 tok
18. Plan · claude-opus-4-5 · 50 tok · $0.00
19. Tool · Run tool (tool) · 56 tok
20. Plan · claude-opus-4-5 · 52 tok · $0.00
21. Tool · Run tool (tool) · 14 tok
22. Plan · claude-opus-4-5 · 129 tok · $0.00
23. Tool · Run tool (tool) · 179 tok
24. Plan · claude-opus-4-5 · 129 tok · $0.00
25. Tool · Run tool (tool) · 195 tok
26. Plan · claude-opus-4-5 · 129 tok · $0.00
27. Tool · Run tool (tool) · 181 tok
28. Plan · claude-opus-4-5 · 129 tok · $0.00
29. Tool · Run tool (tool) · 242 tok
30. Plan · claude-opus-4-5 · 129 tok · $0.00
31. Tool · Run tool (tool) · 12 tok
32. Respond · claude-opus-4-5 · 241 tok · $0.01
33. Plan · claude-opus-4-5 · 213 tok · $0.01
34. Tool · Run tool (tool) · 14 tok
35. Plan · claude-opus-4-5 · 110 tok · $0.00
36. Tool · Run tool (tool) · 16 tok
37. Respond · claude-opus-4-5 · 269 tok · $0.01
38. Plan · claude-opus-4-5 · 103 tok · $0.00
39. Tool · Run tool (tool) · 217 tok
40. Plan · claude-opus-4-5 · 119 tok · $0.00
41. Tool · Run tool (tool) · 5 tok
42. Respond · claude-opus-4-5 · 55 tok · $0.00
43. Tool · Run dataset_evaluation (dataset_evaluation) · 100 tok
44. Verify · Imported benchmark outcome · 235 tok
The earlier plan/tool sequence reflects the normalized span order. The rows above come from imported operation records. Where a tool name is missing, the source provided only the normalized stage and operation label.
Hi! How can I help you today?
I need to cancel my upcoming flights for reservations XEHM4B and 59XX6W.
I'd be happy to help you cancel those reservations. First, I need to verify …
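As a rough check of the dense-tool-loop signal, the sketch below tests it against the single thread listed above. It encodes the thread's 44 operation rows by stage, one letter per row, then counts stages and finds the longest unbroken plan → tool streak. The letter encoding and the function name are hypothetical. Row 43 (dataset_evaluation) is counted as a tool step because the source files it under Tool.

```python
from collections import Counter

# One letter per operation row of the thread above, rows 1-44 in order:
# R = Respond, P = Plan, T = Tool (including dataset_evaluation), V = Verify.
THREAD_STAGES = "RRPTPTTR" "PTPTR" "PTPTPTPTPTPTPTPTPT" "R" "PTPT" "R" "PTPT" "RTV"


def longest_plan_tool_streak(stages: str) -> int:
    """Length of the longest run of back-to-back plan -> tool iterations."""
    best = current = 0
    i = 0
    while i < len(stages):
        if stages[i:i + 2] == "PT":
            current += 1  # one more plan -> tool iteration in the current streak
            i += 2
        else:
            best = max(best, current)  # streak broken by another stage
            current = 0
            i += 1
    return max(best, current)


counts = Counter(THREAD_STAGES)
print(dict(counts))                             # {'R': 7, 'P': 17, 'T': 19, 'V': 1}
print(longest_plan_tool_streak(THREAD_STAGES))  # 9
```

For this thread, 19 of 44 operations are tool steps. The longest streak is 9 plan → tool iterations (rows 14 to 31), which is consistent with the tool-loop density flagged under Opportunities.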