Metric scope
Workflow metrics are projected; evidence stays tied to analyzed threads
derived-cabin-class-upgrade-downgrade · Based on 29 threads · medium confidence
Cabin Class Upgrade Downgrade
Derived primarily from user-authored prompts across a 300-thread slice. Full-slice prompt clustering ran on every thread, and Claude consolidated the major workflow types from cluster exemplars because the slice exceeds the non-sampling threshold.
This workflow
Token and spend trend
Model mix
Tokens and spend by model
Tokens by model (input + output)
Spend by model (estimated cost)
Opportunities
2 opportunities for this workflow
Failed benchmark outcomes are still paying the full workflow cost
The imported outcome labels show a high failure rate after the workflow has already spent tokens and tool calls, which points to missing early exits or weak preflight checks.
- Compare passing and failing traces for this workflow and add an early gate before the expensive tool loop starts.
- Use the imported outcome label as an evaluation dimension so regressions are ranked by wasted spend, not just by raw failure count.
Imported benchmark outcome ended with failure
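As a rough illustration of ranking by wasted spend rather than raw failure count, the sketch below sums the estimated cost of failed runs per workflow. The `Run` record, its field names, and the sample values are hypothetical and are not taken from the analyzed threads.

```python
from dataclasses import dataclass


@dataclass
class Run:
    workflow: str     # hypothetical workflow identifier
    outcome: str      # imported outcome label, e.g. "success" or "failure"
    spend_usd: float  # estimated cost of the whole run


def wasted_spend_by_workflow(runs):
    """Sum the spend of failed runs per workflow, largest total first."""
    wasted = {}
    for run in runs:
        if run.outcome == "failure":
            wasted[run.workflow] = wasted.get(run.workflow, 0.0) + run.spend_usd
    return sorted(wasted.items(), key=lambda item: item[1], reverse=True)


# Illustrative data only: workflow "b" fails more often,
# but workflow "a" wastes more money on its single failure.
runs = [
    Run("a", "failure", 0.75),
    Run("a", "success", 0.50),
    Run("b", "failure", 0.125),
    Run("b", "failure", 0.125),
    Run("b", "failure", 0.125),
]
ranking = wasted_spend_by_workflow(runs)
print(ranking)
```

With this sample, "a" ranks ahead of "b" even though "b" has three failures to one, which is the ordering the second recommendation asks for.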
Tool loops are dense enough to need batching or early stopping
The step labeled "tool" dominates repeated tool activity, so the workflow is likely making incremental calls where batching, caching, or tighter stop conditions would reduce churn.
- Batch or cache repeated tool calls where the inputs overlap across adjacent steps.
- Add a per-run tool budget and stop condition so failed runs do not keep exploring after the likely answer is already unreachable.
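One way to combine the two recommendations is a thin wrapper that caches repeated calls and enforces a per-run call budget. This is a minimal sketch under stated assumptions: `BudgetedToolRunner`, `ToolBudgetExceeded`, and `lookup_reservation` are hypothetical names, and the wrapper assumes tool arguments are hashable keyword arguments.

```python
class ToolBudgetExceeded(Exception):
    """Raised when a run tries to exceed its tool-call budget."""


class BudgetedToolRunner:
    def __init__(self, tool_fn, max_calls):
        self.tool_fn = tool_fn
        self.max_calls = max_calls
        self.calls_made = 0
        self.cache = {}

    def call(self, **kwargs):
        # Identical arguments are served from the cache and cost no budget.
        key = tuple(sorted(kwargs.items()))
        if key in self.cache:
            return self.cache[key]
        # Stop condition: refuse new calls once the budget is spent.
        if self.calls_made >= self.max_calls:
            raise ToolBudgetExceeded(f"budget of {self.max_calls} calls used up")
        self.calls_made += 1
        result = self.tool_fn(**kwargs)
        self.cache[key] = result
        return result


def lookup_reservation(reservation_id):
    # Hypothetical tool used only for this example.
    return {"reservation_id": reservation_id, "cabin": "business"}


runner = BudgetedToolRunner(lookup_reservation, max_calls=2)
runner.call(reservation_id="R1")
runner.call(reservation_id="R1")  # cache hit, budget unchanged
runner.call(reservation_id="R2")
try:
    runner.call(reservation_id="R3")
    stopped = False
except ToolBudgetExceeded:
    stopped = True
```

Raising an exception keeps the stop decision explicit, so the caller can choose to respond early instead of continuing the plan and tool loop.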
Prompt composition
Input token breakdown
Tool signals
How this workflow runs
How often steps had to re-run.
Tasks handed off to sub-agents during the workflow.
Total documents pulled in across all tool calls.
Typical time each step takes to finish.
Stage order
Typical workflow path
- Respond (Kimi-K2): Respond step in the workflow. Latency unavailable · 227 tok avg
- Loop ×5: plan → tool, repeats 5 times. Latency unavailable · 1.1K tok avg
  - 1. Plan (Kimi-K2): Plan the next steps in the workflow. Latency unavailable · 112 tok avg
  - 2. Tool (tool): Tool step in the workflow. Latency unavailable · 102 tok avg
- Respond (Kimi-K2): Respond step in the workflow. Latency unavailable · 227 tok avg
- Loop ×2: plan → tool, repeats 2 times. Latency unavailable · 428 tok avg
  - 1. Plan (Kimi-K2): Plan the next steps in the workflow. Latency unavailable · 112 tok avg
  - 2. Tool (tool): Tool step in the workflow. Latency unavailable · 102 tok avg
- Respond (Kimi-K2): Respond step in the workflow. Latency unavailable · 227 tok avg
- Loop ×4: plan → tool, repeats 4 times. Latency unavailable · 856 tok avg
  - 1. Plan (Kimi-K2): Plan the next steps in the workflow. Latency unavailable · 112 tok avg
  - 2. Tool (tool): Tool step in the workflow. Latency unavailable · 102 tok avg
- Respond (Kimi-K2): Respond step in the workflow. Latency unavailable · 227 tok avg
- Loop ×60: plan → tool, repeats 60 times. Latency unavailable · 12.8K tok avg
  - 1. Plan (Kimi-K2): Plan the next steps in the workflow. Latency unavailable · 112 tok avg
  - 2. Tool (tool): Tool step in the workflow. Latency unavailable · 102 tok avg
- Respond (Kimi-K2): Respond step in the workflow. Latency unavailable · 227 tok avg
- Tool (tool): Tool step in the workflow. Latency unavailable · 102 tok avg
- Verify: Verify step in the workflow. Latency unavailable · 136 tok avg
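The loop figures in the stage order appear to equal the repeat count multiplied by the sum of the Plan and Tool averages (112 and 102 tokens). The following sketch checks that reading; the relationship is inferred from the displayed numbers and is not stated by the source.

```python
PLAN_TOK_AVG = 112  # Plan step average from the stage order
TOOL_TOK_AVG = 102  # Tool step average from the stage order


def loop_tokens(repeats, plan_tok=PLAN_TOK_AVG, tool_tok=TOOL_TOK_AVG):
    """Tokens for a plan -> tool loop, assuming each pass costs plan + tool."""
    return repeats * (plan_tok + tool_tok)


# Repeat counts shown for the four loops in the typical workflow path.
totals = {repeats: loop_tokens(repeats) for repeats in (5, 2, 4, 60)}
for repeats, tokens in totals.items():
    print(f"Loop x{repeats}: {tokens} tok")
```

The results (1070, 428, 856, and 12840) line up with the displayed 1.1K, 428, 856, and 12.8K once rounding is taken into account, and the 60-repeat loop accounts for most of the tokens on the path.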
Threads
- 1. Respond · claude-opus-4-5 · 7 tok · $0.00
- 2. Respond · claude-opus-4-5 · 144 tok · $0.00
- 3. Plan · claude-opus-4-5 · 148 tok · $0.00
- 4. Tool (Run tool) · tool · 16 tok
- 5. Plan · claude-opus-4-5 · 98 tok · $0.00
- 6. Tool (Run tool) · tool · 499 tok
- 7. Plan · claude-opus-4-5 · 148 tok · $0.00
- 8. Tool (Run tool) · tool · 1.1K tok
- 9. Respond · claude-opus-4-5 · 352 tok · $0.01
- 10. Plan · claude-opus-4-5 · 215 tok · $0.00
- 11. Tool (Run tool) · tool · 19 tok
- 12. Plan · claude-opus-4-5 · 186 tok · $0.00
- 13. Tool (Run tool) · tool · 480 tok
- 14. Plan · claude-opus-4-5 · 196 tok · $0.00
- 15. Tool (Run tool) · tool · 2 tok
- 16. Plan · claude-opus-4-5 · 170 tok · $0.00
- 17. Tool (Run tool) · tool · 40 tok
- 18. Plan · claude-opus-4-5 · 215 tok · $0.00
- 19. Tool (Run tool) · tool · 194 tok
- 20. Plan · claude-opus-4-5 · 215 tok · $0.00
- 21. Tool (Run tool) · tool · 5 tok
- 22. Plan · claude-opus-4-5 · 215 tok · $0.00
- 23. Tool (Run tool) · tool · 174 tok
- 24. Respond · claude-opus-4-5 · 509 tok · $0.01
- 25. Respond · claude-opus-4-5 · 391 tok · $0.01
- 26. Plan · claude-opus-4-5 · 33 tok · $0.00
- 27. Tool (Run tool) · tool · 5 tok
- 28. Plan · claude-opus-4-5 · 112 tok · $0.00
- 29. Tool (Run tool) · tool · 19 tok
- 30. Plan · claude-opus-4-5 · 133 tok · $0.00
- 31. Tool (Run tool) · tool · 7 tok
- 32. Plan · claude-opus-4-5 · 133 tok · $0.00
- 33. Tool (Run tool) · tool · 13 tok
- 34. Plan · claude-opus-4-5 · 62 tok · $0.00
- 35. Tool (Run tool) · tool · 56 tok
- 36. Plan · claude-opus-4-5 · 133 tok · $0.00
- 37. Tool (Run tool) · tool · 3 tok
- 38. Plan · claude-opus-4-5 · 133 tok · $0.00
- 39. Tool (Run tool) · tool · 3 tok
- 40. Plan · claude-opus-4-5 · 133 tok · $0.00
- 41. Tool (Run tool) · tool · 3 tok
- 42. Plan · claude-opus-4-5 · 133 tok · $0.00
- 43. Tool (Run tool) · tool · 3 tok
- 44. Plan · claude-opus-4-5 · 133 tok · $0.00
- 45. Tool (Run tool) · tool · 3 tok
- 46. Plan · claude-opus-4-5 · 133 tok · $0.00
- 47. Tool (Run tool) · tool · 13 tok
- 48. Respond · claude-opus-4-5 · 232 tok · $0.01
- 49. Tool (Run dataset_evaluation) · dataset_evaluation · 100 tok
- 50. Verify (Imported benchmark outcome) · 253 tok
The old plan/tool string was the normalized span order. Rows above use imported operation records; when a tool name is missing, the source only provided the normalized stage and operation label.
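To see where tokens go in a thread like this one, the step rows can be grouped by stage. The sketch below does so for the first nine steps only, using token counts as read from the rows above; the 1.1K figure is rounded in the source, so 1100 is an approximation, and `tokens_by_stage` is a hypothetical helper.

```python
from collections import defaultdict

# (stage, tokens) for steps 1-9 of the thread; 1100 stands in for "1.1K".
steps = [
    ("Respond", 7), ("Respond", 144), ("Plan", 148),
    ("Tool", 16), ("Plan", 98), ("Tool", 499),
    ("Plan", 148), ("Tool", 1100), ("Respond", 352),
]


def tokens_by_stage(rows):
    """Sum token counts per normalized stage."""
    totals = defaultdict(int)
    for stage, tokens in rows:
        totals[stage] += tokens
    return dict(totals)


summary = tokens_by_stage(steps)
print(summary)
```

In this slice the Tool stage carries roughly 1.6K of about 2.5K tokens, driven mostly by a single large tool result, which suggests checking tool output size as well as call count.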
Hi! How can I help you today? Hi. I need to downgrade all my business class flights to economy. I’m having some financial trouble right now, so I can’t afford business anymore. I …