Metric scope
Workflow metrics are projected; evidence stays tied to analyzed threads
derived-booking-conflict-resolution · Based on 15 threads · medium confidence
Booking Conflict Resolution
Derived primarily from user-authored prompts across a 300-thread slice. Full-slice prompt clustering ran on every thread, and Claude consolidated the major workflow types from cluster exemplars because the slice exceeds the non-sampling threshold.
This workflow
Token and spend trend
Model mix
Tokens and spend by model
Tokens by model (input + output)
Spend by model (estimated cost)
Opportunities
2 opportunities for this workflow
Failed benchmark outcomes are still paying the full workflow cost
The imported outcome labels show a high failure rate after the workflow has already spent tokens and tool calls, which points to missing early exits or weak preflight checks.
- Compare passing and failing traces for this workflow and add an early gate before the expensive tool loop starts.
- Use the imported outcome label as an evaluation dimension so regressions are ranked by wasted spend, not just by raw failure count.
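The second recommendation can be sketched as a small ranking step. This is a minimal illustration, not the report's own scoring method: the `Run` record, its field names, the "pass"/"fail" label values, the `refund-lookup` workflow, and all spend figures are hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Run:
    workflow: str     # hypothetical workflow identifier
    outcome: str      # imported outcome label, assumed to be "pass" or "fail"
    spend_usd: float  # estimated cost of the whole run


def wasted_spend_by_workflow(runs):
    """Return (workflow, wasted_usd, failure_count) rows, highest wasted spend first."""
    wasted = defaultdict(float)
    failures = defaultdict(int)
    for run in runs:
        if run.outcome == "fail":
            wasted[run.workflow] += run.spend_usd
            failures[run.workflow] += 1
    rows = [(wf, round(wasted[wf], 4), failures[wf]) for wf in wasted]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up data: one expensive failure versus three cheap ones.
runs = [
    Run("booking-conflict-resolution", "fail", 0.40),
    Run("booking-conflict-resolution", "pass", 0.35),
    Run("refund-lookup", "fail", 0.05),
    Run("refund-lookup", "fail", 0.05),
    Run("refund-lookup", "fail", 0.05),
]
ranking = wasted_spend_by_workflow(runs)
for workflow, wasted_usd, failure_count in ranking:
    print(workflow, wasted_usd, failure_count)
```

With this toy data the workflow with a single costly failure ranks above the one with three cheap failures, which is the difference between ranking by wasted spend and ranking by raw failure count.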
Imported benchmark outcome ended with failure
Tool loops are dense enough to need batching or early stopping
The generic "tool" operation dominates repeated tool activity, so the workflow is likely making incremental calls where batching, caching, or tighter stop conditions would reduce churn.
- Batch or cache repeated tool calls where the inputs overlap across adjacent steps.
- Add a per-run tool budget and stop condition so failed runs do not keep exploring after the likely answer is already unreachable.
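One way to combine the two suggestions above is a thin wrapper around the tool call. The sketch below is an assumption-laden illustration: `BudgetedToolRunner`, `ToolBudgetExceeded`, `fake_tool`, and the `get_reservation` tool name are hypothetical and not taken from the workflow. It also assumes tool arguments are hashable and that cached tools are read-only; results of calls that change state should not be cached this way.

```python
class ToolBudgetExceeded(Exception):
    """Raised when a run has used up its allowed number of real tool calls."""


class BudgetedToolRunner:
    def __init__(self, tool_fn, max_calls):
        self.tool_fn = tool_fn      # hypothetical callable: tool_fn(name, args_dict)
        self.max_calls = max_calls  # per-run budget, an assumed tuning knob
        self.calls_made = 0
        self._cache = {}

    def call(self, name, args):
        # The cache key is the tool name plus sorted arguments, so a repeated
        # call with identical inputs is answered without spending budget.
        key = (name, tuple(sorted(args.items())))
        if key in self._cache:
            return self._cache[key]
        # Stop condition: refuse new calls once the budget is exhausted.
        if self.calls_made >= self.max_calls:
            raise ToolBudgetExceeded(f"budget of {self.max_calls} tool calls exhausted")
        self.calls_made += 1
        result = self.tool_fn(name, args)
        self._cache[key] = result
        return result


def fake_tool(name, args):
    # Stand-in for a real tool; it only echoes its inputs.
    return f"{name}:{args['reservation_id']}"


runner = BudgetedToolRunner(fake_tool, max_calls=2)
runner.call("get_reservation", {"reservation_id": "R1"})
runner.call("get_reservation", {"reservation_id": "R1"})  # served from the cache
runner.call("get_reservation", {"reservation_id": "R2"})  # second real call
print(runner.calls_made)
```

A further distinct call on this runner raises `ToolBudgetExceeded`, which the surrounding loop can treat as its signal to stop exploring and respond with what it has.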
Prompt composition
Input token breakdown
Tool signals
How this workflow runs
How often steps had to re-run.
Tasks handed off to sub-agents during the workflow.
Total documents pulled in across all tool calls.
Typical time each step takes to finish.
Stage order
Typical workflow path
- Loop×2: respond → plan → tool, repeats 2 times. Latency unavailable · 1.1K tok avg
  - 1. Respond (gemini-3-pro-preview): Respond step in the workflow. Latency unavailable · 304 tok avg
  - 2. Plan (gemini-3-pro-preview): Plan the next steps in the workflow. Latency unavailable · 151 tok avg
  - 3. Tool (tool): Tool step in the workflow. Latency unavailable · 80 tok avg
- Loop×37: plan → tool, repeats 37 times. Latency unavailable · 8.5K tok avg
  - 1. Plan (gemini-3-pro-preview): Plan the next steps in the workflow. Latency unavailable · 151 tok avg
  - 2. Tool (tool): Tool step in the workflow. Latency unavailable · 80 tok avg
- Respond (gemini-3-pro-preview): Respond step in the workflow. Latency unavailable · 304 tok avg
- Loop×59: plan → tool, repeats 59 times. Latency unavailable · 13.6K tok avg
  - 1. Plan (gemini-3-pro-preview): Plan the next steps in the workflow. Latency unavailable · 151 tok avg
  - 2. Tool (tool): Tool step in the workflow. Latency unavailable · 80 tok avg
- Verify: Verify step in the workflow. Latency unavailable · 146 tok avg
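The loop figures in the stage order appear to equal the repeat count multiplied by the sum of the per-step averages. The report does not state this; it is an inference from the numbers shown, and the short check below only confirms that the arithmetic is consistent with that reading. The function and variable names are illustrative.

```python
# Per-step token averages as listed in the stage order.
STEP_AVG = {"respond": 304, "plan": 151, "tool": 80}

# (steps in one iteration, repeat count, loop figure shown in the report)
LOOPS = [
    (("respond", "plan", "tool"), 2, "1.1K"),
    (("plan", "tool"), 37, "8.5K"),
    (("plan", "tool"), 59, "13.6K"),
]


def loop_tokens(steps, repeats):
    """Tokens for a loop, assuming total = repeats x sum of per-step averages."""
    return repeats * sum(STEP_AVG[step] for step in steps)


def as_k(tokens):
    """Format a token count in thousands with one decimal, e.g. 8547 -> '8.5K'."""
    return f"{tokens / 1000:.1f}K"


for steps, repeats, reported in LOOPS:
    total = loop_tokens(steps, repeats)
    print(" -> ".join(steps), f"x{repeats}:", total, "tok =", as_k(total), "| reported", reported)
```

Under that assumption the three loops come to 1,070, 8,547, and 13,629 tokens, which round to the 1.1K, 8.5K, and 13.6K shown. The two plan → tool loops therefore account for most of the tokens in the typical path.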
Threads
- 1. Respond · claude-opus-4-5 · 7 tok · $0.00
- 2. Respond · claude-opus-4-5 · 354 tok · $0.01
- 3. Plan · claude-opus-4-5 · 202 tok · $0.00
- 4. Tool · Run tool · 147 tok
- 5. Tool · Run tool · 180 tok
- 6. Plan · claude-opus-4-5 · 291 tok · $0.00
- 7. Tool · Run tool · 19 tok
- 8. Plan · claude-opus-4-5 · 242 tok · $0.00
- 9. Tool · Run tool · 33 tok
- 10. Plan · claude-opus-4-5 · 261 tok · $0.00
- 11. Tool · Run tool · 64 tok
- 12. Plan · claude-opus-4-5 · 271 tok · $0.00
- 13. Tool · Run tool · 665 tok
- 14. Plan · claude-opus-4-5 · 251 tok · $0.00
- 15. Tool · Run tool · 12 tok
- 16. Plan · claude-opus-4-5 · 271 tok · $0.00
- 17. Tool · Run tool · 665 tok
- 18. Plan · claude-opus-4-5 · 271 tok · $0.00
- 19. Tool · Run tool · 600 tok
- 20. Plan · claude-opus-4-5 · 271 tok · $0.00
- 21. Tool · Run tool · 183 tok
- 22. Plan · claude-opus-4-5 · 293 tok · $0.00
- 23. Tool · Run tool · 3 tok
- 24. Plan · claude-opus-4-5 · 271 tok · $0.00
- 25. Tool · Run tool · 165 tok
- 26. Plan · claude-opus-4-5 · 398 tok · $0.01
- 27. Tool · Run tool · 23 tok
- 28. Respond · claude-opus-4-5 · 570 tok · $0.01
- 29. Plan · claude-opus-4-5 · 95 tok · $0.00
- 30. Tool · Run tool · 2 tok
- 31. Tool · Run tool · 5 tok
- 32. Plan · claude-opus-4-5 · 189 tok · $0.00
- 33. Tool · Run tool · 3 tok
- 34. Plan · claude-opus-4-5 · 189 tok · $0.00
- 35. Tool · Run tool · 7 tok
- 36. Plan · claude-opus-4-5 · 107 tok · $0.00
- 37. Tool · Run tool · 56 tok
- 38. Plan · claude-opus-4-5 · 189 tok · $0.00
- 39. Tool · Run tool · 57 tok
- 40. Plan · claude-opus-4-5 · 115 tok · $0.00
- 41. Tool · Run tool · 417 tok
- 42. Plan · claude-opus-4-5 · 189 tok · $0.00
- 43. Tool · Run tool · 13 tok
- 44. Respond · claude-opus-4-5 · 289 tok · $0.01
- 45. Respond · claude-opus-4-5 · 46 tok · $0.00
- 46. Tool · Run dataset_evaluation · 100 tok
- 47. Verify · Imported benchmark outcome · 108 tok
The old plan/tool string was the normalized span order. Rows above use imported operation records; when a tool name is missing, the source only provided the normalized stage and operation label.
Agent: Hi! How can I help you today?
User: Hi! I need some help with a couple of things. First, I’d like to remove a passenger named Ethan from my reservation—can you help with that? Also, I’m…