Metric scope
Workflow metrics are projected; evidence stays tied to analyzed threads
derived-flight-delay-compensation · Based on 19 threads · medium confidence
Flight Delay Compensation
Derived primarily from user-authored prompts across a 300-thread slice. Full-slice prompt clustering ran on every thread, and Claude consolidated the major workflow types from cluster exemplars because the slice exceeds the non-sampling threshold.
This workflow
Token and spend trend
Model mix
Tokens and spend by model
Tokens by model
input + output
Spend by model
estimated cost
Opportunities
2 opportunities for this workflow
Failed benchmark outcomes are still paying the full workflow cost
The imported outcome labels show a high failure rate after the workflow has already spent tokens and tool calls, which points to missing early exits or weak preflight checks.
- Compare passing and failing traces for this workflow and add an early gate before the expensive tool loop starts.
- Use the imported outcome label as an evaluation dimension so regressions are ranked by wasted spend, not just by raw failure count.
Imported benchmark outcome ended with failure
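A minimal sketch of the second recommendation, ranking by wasted spend rather than raw failure count. The Run record, its field names, and the "failure" label value are assumptions made for illustration; the dashboard's actual export schema is not shown here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical record shape; field names are illustrative only.
    workflow: str
    outcome: str     # imported outcome label, assumed "success" or "failure"
    cost_usd: float  # estimated spend for the whole run


def rank_by_wasted_spend(runs):
    """Return (workflow, wasted_usd, failure_count) rows, highest waste first."""
    wasted = defaultdict(float)
    failures = defaultdict(int)
    for run in runs:
        if run.outcome == "failure":
            # A failed run still paid its full cost, so all of it counts as waste.
            wasted[run.workflow] += run.cost_usd
            failures[run.workflow] += 1
    rows = [(wf, wasted[wf], failures[wf]) for wf in wasted]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

With this ordering, a workflow with one expensive failure ranks above a workflow with several cheap ones, which a plain failure count would hide.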
Tool loops are dense enough to need batching or early stopping
A single tool, labeled only "tool" in the source, dominates repeated tool activity, so the workflow is likely making incremental calls where batching, caching, or tighter stop conditions would reduce churn.
- Batch or cache repeated tool calls where the inputs overlap across adjacent steps.
- Add a per-run tool budget and stop condition so failed runs do not keep exploring after the likely answer is already unreachable.
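The two recommendations above can be sketched together as a thin wrapper around whatever function actually executes a tool. BudgetedToolRunner, ToolBudgetExceeded, and the call_tool callable are hypothetical names, not part of any framework referenced on this page, and the sketch assumes tool arguments are hashable keyword values.

```python
class ToolBudgetExceeded(Exception):
    """Raised when a run tries to exceed its per-run tool budget."""


class BudgetedToolRunner:
    def __init__(self, call_tool, max_calls):
        self._call_tool = call_tool  # hypothetical callable: (name, **kwargs) -> result
        self._max_calls = max_calls
        self._cache = {}
        self.calls_made = 0

    def run(self, name, **kwargs):
        # Identical calls in the same run are served from the cache
        # and do not count against the budget.
        key = (name, tuple(sorted(kwargs.items())))
        if key in self._cache:
            return self._cache[key]
        # Stop condition: refuse new tool calls once the budget is spent.
        if self.calls_made >= self._max_calls:
            raise ToolBudgetExceeded(
                f"tool budget of {self._max_calls} calls exhausted"
            )
        self.calls_made += 1
        result = self._call_tool(name, **kwargs)
        self._cache[key] = result
        return result
```

The caller decides what to do when ToolBudgetExceeded is raised, for example by moving straight to the Respond step with whatever evidence has been gathered so far.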
Prompt composition
Input token breakdown
Tool signals
How this workflow runs
How often steps had to re-run.
Tasks handed off to sub-agents during the workflow.
Total documents pulled in across all tool calls.
Typical time each step takes to finish.
Stage order
Typical workflow path
- Respond · Kimi-K2 · Latency unavailable · 313 tok avg
  Respond step in the workflow.
- Loop ×50 · Latency unavailable · 13.4K tok avg
  Loop: plan → tool, repeats 50 times.
  - 1 · Plan · Kimi-K2 · Latency unavailable · 116 tok avg
    Plan the next steps in the workflow.
  - 2 · Tool · tool · Latency unavailable · 152 tok avg
    Tool step in the workflow.

- Respond · Kimi-K2 · Latency unavailable · 313 tok avg
  Respond step in the workflow.
- Loop ×15 · Latency unavailable · 4K tok avg
  Loop: plan → tool, repeats 15 times.
  - 1 · Plan · Kimi-K2 · Latency unavailable · 116 tok avg
    Plan the next steps in the workflow.
  - 2 · Tool · tool · Latency unavailable · 152 tok avg
    Tool step in the workflow.

- Respond · Kimi-K2 · Latency unavailable · 313 tok avg
  Respond step in the workflow.
- Tool · tool · Latency unavailable · 152 tok avg
  Tool step in the workflow.
- Verify · Latency unavailable · 119 tok avg
  Verify step in the workflow.
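The loop figures in the stage order appear to equal the repeat count multiplied by the sum of the per-step averages. This is inferred from the displayed numbers, not from a documented formula, and the check can be sketched as:

```python
# Per-step averages as shown in the stage order.
PLAN_TOK_AVG = 116
TOOL_TOK_AVG = 152


def loop_tokens(iterations, step_averages):
    """Estimated tokens for a loop: repeat count times tokens per iteration."""
    return iterations * sum(step_averages)


steps = [PLAN_TOK_AVG, TOOL_TOK_AVG]
print(loop_tokens(50, steps))  # 13400, displayed as 13.4K
print(loop_tokens(15, steps))  # 4020, displayed as 4K
```

On this reading, each plan → tool iteration costs about 268 tokens, so the gap between the ×15 and ×50 paths is roughly 9.4K tokens per run.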
Threads
Pick a thread to see what happened
- 1 · 7 tok · $0.00 · Respond · claude-opus-4-5
- 2 · 130 tok · $0.00 · Respond · claude-opus-4-5
- 3 · 53 tok · $0.00 · Plan · claude-opus-4-5
- 4 · 300 tok · Tool · Run tool
- 5 · 108 tok · $0.00 · Plan · claude-opus-4-5
- 6 · 202 tok · Tool · Run tool
- 7 · 77 tok · $0.00 · Plan · claude-opus-4-5
- 8 · 219 tok · Tool · Run tool
- 9 · 180 tok · $0.00 · Plan · claude-opus-4-5
- 10 · 14 tok · Tool · Run tool
- 11 · 113 tok · $0.00 · Plan · claude-opus-4-5
- 12 · 1.3K tok · Tool · Run tool
- 13 · 102 tok · $0.00 · Plan · claude-opus-4-5
- 14 · 29 tok · Tool · Run tool
- 15 · 294 tok · $0.01 · Respond · claude-opus-4-5
- 16 · 108 tok · $0.00 · Plan · claude-opus-4-5
- 17 · 6 tok · Tool · Run tool
- 18 · 216 tok · $0.00 · Respond · claude-opus-4-5
- 19 · 171 tok · $0.00 · Plan · claude-opus-4-5
- 20 · 11 tok · Tool · Run tool
- 21 · 290 tok · $0.01 · Respond · claude-opus-4-5
- 22 · 41 tok · $0.00 · Plan · claude-opus-4-5
- 23 · 8 tok · Tool · Run tool
- 24 · 38 tok · $0.00 · Plan · claude-opus-4-5
- 25 · 5 tok · Tool · Run tool
- 26 · 138 tok · $0.00 · Plan · claude-opus-4-5
- 27 · 6 tok · Tool · Run tool
- 28 · 56 tok · $0.00 · Plan · claude-opus-4-5
- 29 · 56 tok · Tool · Run tool
- 30 · 58 tok · $0.00 · Plan · claude-opus-4-5
- 31 · 15 tok · Tool · Run tool
- 32 · 138 tok · $0.00 · Plan · claude-opus-4-5
- 33 · 41 tok · Tool · Run tool
- 34 · 160 tok · $0.00 · Respond · claude-opus-4-5
- 35 · 100 tok · Tool · Run dataset_evaluation
- 36 · 124 tok · Verify · Imported benchmark outcome
The old plan/tool string was the normalized span order. Rows above use imported operation records; when a tool name is missing, the source only provided the normalized stage and operation label.
Hi! How can I help you today? Hi, I’m calling because I’m really frustrated about my last flight—it was delayed for hours and it messed up all my plans. Can you help me with this?…