Metric scope
Workflow metrics are projected; evidence stays tied to analyzed threads
derived-new-flight-booking · Based on 20 threads · medium confidence
New Flight Booking
Derived primarily from user-authored prompts across a 300-thread slice. Full-slice prompt clustering ran on every thread, and Claude consolidated the major workflow types from cluster exemplars because the slice exceeds the non-sampling threshold.
This workflow
Token and spend trend
Model mix
Tokens and spend by model
Tokens by model
input + output
Spend by model
estimated cost
Opportunities
2 opportunities for this workflow
Failed benchmark outcomes are still paying the full workflow cost
The imported outcome labels show a high failure rate after the workflow has already spent tokens and tool calls, which points to missing early exits or weak preflight checks.
- Compare passing and failing traces for this workflow and add an early gate before the expensive tool loop starts.
- Use the imported outcome label as an evaluation dimension so regressions are ranked by wasted spend, not just by raw failure count.
Imported benchmark outcome ended with failure
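The second recommendation above, ranking regressions by wasted spend rather than raw failure count, could be sketched roughly as follows. This is a minimal illustration only: the Run record, its field names, the "workflow-b" id, and every cost figure are hypothetical and do not come from the analyzed threads.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Run:
    workflow: str    # workflow id (second id below is hypothetical)
    outcome: str     # imported outcome label, assumed to be "pass" or "failure"
    cost_usd: float  # estimated cost of the whole run (made-up values)


def wasted_spend_by_workflow(runs):
    """Sum the cost of failed runs per workflow, largest waste first."""
    wasted = defaultdict(float)
    for run in runs:
        if run.outcome == "failure":
            wasted[run.workflow] += run.cost_usd
    return sorted(wasted.items(), key=lambda item: item[1], reverse=True)


# Illustrative data only: workflow-b fails more often but wastes less money.
runs = [
    Run("derived-new-flight-booking", "failure", 0.04),
    Run("derived-new-flight-booking", "pass", 0.03),
    Run("workflow-b", "failure", 0.01),
    Run("workflow-b", "failure", 0.01),
]

ranking = wasted_spend_by_workflow(runs)
for workflow, wasted in ranking:
    print(f"{workflow}: ${wasted:.2f} spent on failed runs")
```

In this made-up sample, a ranking by failure count would put workflow-b first, while a ranking by wasted spend puts the flight-booking workflow first, which is the distinction the recommendation draws.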
Tool loops are dense enough to need batching or early stopping
A single tool, labeled only "tool" in the imported records, dominates repeated tool activity, so the workflow is likely making incremental calls where batching, caching, or tighter stop conditions would reduce churn.
- Batch or cache repeated tool calls where the inputs overlap across adjacent steps.
- Add a per-run tool budget and stop condition so failed runs do not keep exploring after the likely answer is already unreachable.
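One way the two recommendations above might be combined is a thin wrapper that caches repeated calls and enforces a per-run budget. This is a sketch under assumptions: BudgetedTool, ToolBudgetExceeded, search_flights, and the airport codes are hypothetical names chosen for illustration, not part of the analyzed workflow or of any real library.

```python
class ToolBudgetExceeded(Exception):
    """Raised when a run tries to exceed its tool-call budget."""


class BudgetedTool:
    """Wrap a tool function with a result cache and a per-run call budget."""

    def __init__(self, tool_fn, max_calls):
        self.tool_fn = tool_fn
        self.max_calls = max_calls
        self.calls_made = 0   # only real (uncached) calls count
        self._cache = {}

    def __call__(self, **kwargs):
        # Keyword values must be hashable for this simple cache key.
        key = tuple(sorted(kwargs.items()))
        if key in self._cache:
            return self._cache[key]          # repeated input: no new call
        if self.calls_made >= self.max_calls:
            raise ToolBudgetExceeded(
                f"tool budget of {self.max_calls} calls exhausted"
            )
        self.calls_made += 1
        result = self.tool_fn(**kwargs)
        self._cache[key] = result
        return result


def search_flights(origin, destination, date):
    # Hypothetical stand-in for the real tool.
    return f"{origin}->{destination} on {date}"


tool = BudgetedTool(search_flights, max_calls=2)
tool(origin="JFK", destination="SEA", date="05-20")
tool(origin="JFK", destination="SEA", date="05-20")  # served from cache
tool(origin="LGA", destination="SEA", date="05-20")

stopped_early = False
try:
    tool(origin="EWR", destination="SEA", date="05-20")
except ToolBudgetExceeded:
    stopped_early = True  # the run should give up or respond here

print(tool.calls_made, stopped_early)
```

Counting only uncached calls against the budget is a design choice: it rewards overlapping inputs across adjacent steps while still cutting off a run that keeps exploring new inputs.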
Prompt composition
Input token breakdown
Tool signals
How this workflow runs
How often steps had to re-run.
Tasks handed off to sub-agents during the workflow.
Total documents pulled in across all tool calls.
Typical time each step takes to finish.
Stage order
Typical workflow path
- Respond · GPT-5.2 · latency unavailable · 224 tok avg
  Respond step in the workflow.
- Loop ×11 · latency unavailable · 2.4K tok avg
  Loop: plan → tool, repeats 11 times.
  1. Plan · GPT-5.2 · latency unavailable · 93 tok avg
     Plan the next steps in the workflow.
  2. Tool · tool · latency unavailable · 121 tok avg
     Tool step in the workflow.
- Respond · GPT-5.2 · latency unavailable · 224 tok avg
  Respond step in the workflow.
- Loop ×6 · latency unavailable · 1.3K tok avg
  Loop: plan → tool, repeats 6 times.
  1. Plan · GPT-5.2 · latency unavailable · 93 tok avg
     Plan the next steps in the workflow.
  2. Tool · tool · latency unavailable · 121 tok avg
     Tool step in the workflow.
- Respond · GPT-5.2 · latency unavailable · 224 tok avg
  Respond step in the workflow.
- Loop ×3 · latency unavailable · 642 tok avg
  Loop: plan → tool, repeats 3 times.
  1. Plan · GPT-5.2 · latency unavailable · 93 tok avg
     Plan the next steps in the workflow.
  2. Tool · tool · latency unavailable · 121 tok avg
     Tool step in the workflow.
- Respond · GPT-5.2 · latency unavailable · 224 tok avg
  Respond step in the workflow.
- Loop ×18 · latency unavailable · 3.9K tok avg
  Loop: plan → tool, repeats 18 times.
  1. Plan · GPT-5.2 · latency unavailable · 93 tok avg
     Plan the next steps in the workflow.
  2. Tool · tool · latency unavailable · 121 tok avg
     Tool step in the workflow.
- Respond · GPT-5.2 · latency unavailable · 224 tok avg
  Respond step in the workflow.
- Loop ×3 · latency unavailable · 642 tok avg
  Loop: plan → tool, repeats 3 times.
  1. Plan · GPT-5.2 · latency unavailable · 93 tok avg
     Plan the next steps in the workflow.
  2. Tool · tool · latency unavailable · 121 tok avg
     Tool step in the workflow.
- Respond · GPT-5.2 · latency unavailable · 224 tok avg
  Respond step in the workflow.
- Tool · tool · latency unavailable · 121 tok avg
  Tool step in the workflow.
- Verify · latency unavailable · 149 tok avg
  Verify step in the workflow.
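The loop figures in the stage list above appear to equal the repeat count multiplied by the sum of the Plan and Tool averages. That relationship is inferred from the displayed numbers, not documented by the source, and the short check below only reproduces the arithmetic; the constant and function names are chosen here for illustration.

```python
PLAN_TOK_AVG = 93    # Plan step average from the stage list
TOOL_TOK_AVG = 121   # Tool step average from the stage list


def loop_tokens(repeats):
    """Tokens for a plan -> tool loop, assuming each pass costs one Plan plus one Tool."""
    return repeats * (PLAN_TOK_AVG + TOOL_TOK_AVG)


# Repeat counts taken from the Loop rows in the stage list.
loop_totals = {repeats: loop_tokens(repeats) for repeats in (11, 6, 3, 18)}
for repeats, total in loop_totals.items():
    print(f"Loop x{repeats}: {total} tokens")
```

The results (2354, 1284, 642, and 3852 tokens) round to the displayed 2.4K, 1.3K, 642, and 3.9K, which suggests the loop totals scale linearly with the number of plan → tool passes.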
Threads
- 1 · 7 tok · $0.00 · Respond · claude-opus-4-5
- 2 · 140 tok · $0.00 · Respond · claude-opus-4-5
- 3 · 38 tok · $0.00 · Plan · claude-opus-4-5
- 4 · 207 tok · Tool · Run tool · tool
- 5 · 125 tok · $0.00 · Plan · claude-opus-4-5
- 6 · 16 tok · Tool · Run tool · tool
- 7 · 63 tok · $0.00 · Plan · claude-opus-4-5
- 8 · 1.8K tok · Tool · Run tool · tool
- 9 · 125 tok · $0.00 · Plan · claude-opus-4-5
- 10 · 90 tok · Tool · Run tool · tool
- 11 · 204 tok · $0.00 · Respond · claude-opus-4-5
- 12 · 134 tok · $0.00 · Plan · claude-opus-4-5
- 13 · 3 tok · Tool · Run tool · tool
- 14 · 134 tok · $0.00 · Plan · claude-opus-4-5
- 15 · 90 tok · Tool · Run tool · tool
- 16 · 116 tok · $0.00 · Respond · claude-opus-4-5
- 17 · 124 tok · $0.00 · Plan · claude-opus-4-5
- 18 · 947 tok · Tool · Run tool · tool
- 19 · 124 tok · $0.00 · Plan · claude-opus-4-5
- 20 · 1.3K tok · Tool · Run tool · tool
- 21 · 124 tok · $0.00 · Plan · claude-opus-4-5
- 22 · 10 tok · Tool · Run tool · tool
- 23 · 124 tok · $0.00 · Plan · claude-opus-4-5
- 24 · 209 tok · Tool · Run tool · tool
- 25 · 300 tok · $0.01 · Respond · claude-opus-4-5
- 26 · 83 tok · $0.00 · Plan · claude-opus-4-5
- 27 · 14 tok · Tool · Run tool · tool
- 28 · 189 tok · $0.00 · Respond · claude-opus-4-5
- 29 · 167 tok · $0.00 · Respond · claude-opus-4-5
- 30 · 21 tok · $0.00 · Plan · claude-opus-4-5
- 31 · 2 tok · Tool · Run tool · tool
- 32 · 5 tok · Tool · Run tool · tool
- 33 · 52 tok · $0.00 · Plan · claude-opus-4-5
- 34 · 56 tok · Tool · Run tool · tool
- 35 · 54 tok · $0.00 · Plan · claude-opus-4-5
- 36 · 417 tok · Tool · Run tool · tool
- 37 · 88 tok · $0.00 · Plan · claude-opus-4-5
- 38 · 71 tok · Tool · Run tool · tool
- 39 · 51 tok · $0.00 · Plan · claude-opus-4-5
- 40 · 14 tok · Tool · Run tool · tool
- 41 · 61 tok · $0.00 · Plan · claude-opus-4-5
- 42 · 50 tok · Tool · Run tool · tool
- 43 · 46 tok · $0.00 · Plan · claude-opus-4-5
- 44 · 146 tok · Tool · Run tool · tool
- 45 · 115 tok · $0.00 · Plan · claude-opus-4-5
- 46 · 158 tok · Tool · Run tool · tool
- 47 · 115 tok · $0.00 · Plan · claude-opus-4-5
- 48 · 12 tok · Tool · Run tool · tool
- 49 · 115 tok · $0.00 · Plan · claude-opus-4-5
- 50 · 8 tok · Tool · Run tool · tool
- 51 · 188 tok · $0.00 · Respond · claude-opus-4-5
- 52 · 100 tok · Tool · Run dataset_evaluation · dataset_evaluation
- 53 · 214 tok · Verify · Imported benchmark outcome
The old plan/tool string was the normalized span order. Rows above use imported operation records; when a tool name is missing, the source only provided the normalized stage and operation label.
Hi! How can I help you today? Hi, I’d like to book a one-way flight from New York to Seattle on May 20th. I'd be happy to help you book a one-way flight from New York to Seattle o…