derived-flight-booking · Based on 24 threads · medium confidence
Flight Booking
Derived primarily from user-authored prompts across a 300-thread slice. Full-slice prompt clustering ran on every thread, and Claude consolidated the major workflow types from cluster exemplars because the slice exceeds the non-sampling threshold.
This workflow
Charts not reproduced in this export: token and spend trend; model mix; tokens by model (input + output); spend by model (estimated cost).
Opportunities
5 opportunities for this workflow
Failed runs diverge from successful runs before the final outcome
This workflow has enough scored runs to compare outcomes directly, and the failed cohort is large enough to deserve a dedicated regression slice.
Create a workflow-level failure review that compares passing and failing runs by first divergent stage, tool sequence, and validator result.
- 13 failed or partial runs spent $0.36 in the analyzed sample.
- Successful runs average 13.5 tool calls; failed runs average 14.8.
- Failed runs average 759 input tokens per run versus 826 on successful runs.
- Turn the successful cohort into a checklist: required context, required tools, stop condition, and final verification.
- Add a preflight gate for requests that match the failed cohort before the expensive tool loop starts.
Imported benchmark outcome ended with failure
Imported benchmark outcome ended with success
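The comparison recommended above can be sketched as a first-divergence check over normalized stage sequences. This is a minimal illustration, assuming each run is available as a list of stage labels (respond, plan, tool, verify) and that one passing run serves as the reference path; the function names are hypothetical, not part of any existing tooling.

```python
from collections import Counter
from typing import Optional, Sequence


def first_divergence(passing: Sequence[str], failing: Sequence[str]) -> Optional[int]:
    """Index of the first stage where the two sequences differ, or None if identical."""
    for i, (ok_stage, bad_stage) in enumerate(zip(passing, failing)):
        if ok_stage != bad_stage:
            return i
    # One sequence is a prefix of the other: they diverge where the shorter one ends.
    if len(passing) != len(failing):
        return min(len(passing), len(failing))
    return None


def divergence_histogram(reference: Sequence[str], failed_runs: Sequence[Sequence[str]]) -> Counter:
    """Count failed runs by the index and stage of their first divergence from a passing path."""
    counts: Counter = Counter()
    for run in failed_runs:
        idx = first_divergence(reference, run)
        if idx is None:
            counts["no divergence"] += 1
        elif idx < len(run):
            counts[f"{idx}:{run[idx]}"] += 1  # the failing run took a different stage here
        else:
            counts[f"{idx}:<ended early>"] += 1  # the failing run stopped before the reference did
    return counts
```

Grouping failed runs by where they first leave the reference path points at the stage where a preflight gate or an extra check would pay off; tool sequences or validator results could be compared with the same pattern using different labels.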
Failed benchmark outcomes are still paying the full workflow cost
The imported outcome labels show a high failure rate after the workflow has already spent tokens and tool calls, which points to missing early exits or weak preflight checks.
Compare passing and failing traces for this workflow and add an early gate before the expensive tool loop starts.
- Use the imported outcome label as an evaluation dimension so regressions are ranked by wasted spend, not just by raw failure count.
Imported benchmark outcome ended with failure
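Ranking regressions by wasted spend rather than raw failure count can be sketched as follows. This assumes each run carries an imported outcome label and an estimated cost; the Run record, its segment grouping key, and the function name are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Run:
    segment: str     # hypothetical grouping key, e.g. a prompt cluster or request type
    outcome: str     # imported outcome label, e.g. "success", "failure", "partial"
    cost_usd: float  # estimated spend for the whole run


def rank_by_wasted_spend(runs: Iterable[Run]) -> List[Tuple[str, float, int]]:
    """Rank segments by spend on non-successful runs, breaking ties by failure count."""
    wasted: Dict[str, float] = defaultdict(float)
    failures: Dict[str, int] = defaultdict(int)
    for run in runs:
        if run.outcome != "success":  # failed and partial runs both count as wasted spend
            wasted[run.segment] += run.cost_usd
            failures[run.segment] += 1
    ranked = sorted(wasted, key=lambda seg: (-wasted[seg], -failures[seg], seg))
    return [(seg, wasted[seg], failures[seg]) for seg in ranked]
```

A segment with one expensive failure can outrank a segment with several cheap ones, which is the intent: the ranking follows money spent on runs that did not succeed.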
Tool loops are dense enough to need batching or early stopping
A single tool (named tool in the trace) dominates repeated tool activity, so the workflow is likely making incremental calls where batching, caching, or tighter stop conditions would reduce churn.
Batch or cache repeated tool calls where the inputs overlap across adjacent steps.
- Add a per-run tool budget and stop condition so failed runs do not keep exploring after the likely answer is already unreachable.
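A per-run tool budget combined with a cache for overlapping inputs might look like the sketch below. It assumes tools are plain Python callables taking keyword arguments and that repeated calls with identical arguments return the same result; the class, the exception, and the search_flights tool in the test are hypothetical.

```python
import json
from typing import Any, Callable, Dict


class ToolBudgetExceeded(RuntimeError):
    """Raised when a run tries to make more real tool calls than its budget allows."""


class BudgetedToolRunner:
    def __init__(self, tools: Dict[str, Callable[..., Any]], max_calls: int) -> None:
        self.tools = tools
        self.max_calls = max_calls
        self.calls = 0                      # real (uncached) calls made in this run
        self.cache: Dict[str, Any] = {}     # results keyed by tool name + arguments

    def call(self, name: str, **kwargs: Any) -> Any:
        # sort_keys makes the key independent of argument order.
        key = name + ":" + json.dumps(kwargs, sort_keys=True, default=str)
        if key in self.cache:
            return self.cache[key]          # overlapping input: reuse, no budget spent
        if self.calls >= self.max_calls:
            raise ToolBudgetExceeded(f"tool budget of {self.max_calls} calls exhausted")
        self.calls += 1
        result = self.tools[name](**kwargs)
        self.cache[key] = result
        return result
```

Caching is only safe for read-only lookups; tools that modify a reservation should bypass the cache. When the budget runs out, the exception gives the orchestrator a clear point to stop exploring and respond or escalate.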
The same tool error repeats instead of triggering a new plan
Self-correction is not bounded tightly enough. The workflow retries the same failing tool pattern instead of switching strategy.
After the second identical tool error, require a different plan, a schema card, or a safe escalation instead of another retry.
- 23 repeated error results were observed across 10 runs.
- The normalized error signature is "attributeerror".
- 1,437 chars of error output were copied back into the workflow.
- Add an actionable error contract for this tool so the model receives allowed next steps, not raw stack text.
- Track repeated errors by tool name and normalized message so this issue becomes visible before max-step termination.
AttributeError: 'FlightDateStatusAvailable' object has no attribute 'cabins'
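A guard that normalizes error text into a signature and switches strategy after the second identical error could look like the sketch below. The signature rule (the exception class name, lowercased) is an assumption chosen to reproduce the "attributeerror" signature reported above; the class name, the dictionary fields, and the next-step labels are hypothetical.

```python
import re
from collections import Counter
from typing import Dict

# Matches an exception class name such as AttributeError or KeyError.
_EXC_RE = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*(?:Error|Exception))\b")


def normalize_error(raw: str) -> str:
    """Reduce raw error output to a stable signature such as 'attributeerror'."""
    match = _EXC_RE.search(raw)
    return match.group(1).lower() if match else "unknown"


class RepeatedErrorGuard:
    def __init__(self, max_identical: int = 2) -> None:
        self.max_identical = max_identical
        self.seen: Counter = Counter()  # occurrences per (tool name, signature)

    def on_error(self, tool: str, raw: str) -> Dict[str, object]:
        signature = normalize_error(raw)
        self.seen[(tool, signature)] += 1
        count = self.seen[(tool, signature)]
        if count >= self.max_identical:
            # Second identical error: a plain retry is no longer allowed.
            next_steps = ["replan", "load_schema_card", "escalate"]
        else:
            next_steps = ["retry_with_changed_arguments", "replan"]
        # Short, actionable summary returned instead of the raw stack text.
        return {"tool": tool, "signature": signature, "occurrences": count,
                "allowed_next_steps": next_steps}
```

The returned dictionary stands in for the actionable error contract: the model sees a short signature and a list of allowed next steps instead of raw stack text, and the per-tool counters make repeated errors visible before max-step termination.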
Users are correcting missing fields or output shape
The trace contains downstream correction language, which usually means the final answer is not satisfying the customer's expected contract.
Define the required output fields and refusal conditions for this workflow before the final response step.
- 16 correction-like user messages appeared after an assistant response.
- 8 of 24 analyzed runs had at least one correction signal.
- Validate final answers against the output contract and route missing fields back through a cheap repair step.
- Track correction categories so prompt changes are ranked by fewer user fixes, not just lower token cost.
My friend’s name is Ivan Smith. I don’t remember his date of birth, but it should be in my profile since he’s listed there. For payment, I’d like to use my certificate if the pric…
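Validating a final answer against an output contract before responding, as recommended above, can be sketched as a required-field check that routes gaps to a cheap repair step. The field names are hypothetical placeholders for a booking reply; the workflow's real contract would define them, along with its refusal conditions.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical output contract for a booking reply.
REQUIRED_FIELDS = ("reservation_id", "passengers", "payment_method", "total_price")


def _is_empty(value: Any) -> bool:
    # Treat absent, blank, and empty-container values as missing; 0 and False count as present.
    return value is None or value == "" or value == [] or value == {}


def missing_fields(answer: Dict[str, Any], required: Sequence[str] = REQUIRED_FIELDS) -> List[str]:
    return [field for field in required if _is_empty(answer.get(field))]


def route_final_answer(answer: Dict[str, Any]) -> Dict[str, Any]:
    """Send incomplete answers to a repair step instead of straight to the user."""
    gaps = missing_fields(answer)
    if gaps:
        return {"next_step": "repair", "missing": gaps}
    return {"next_step": "respond", "missing": []}
```

The example values in the test echo the request quoted above and are illustrative only. Logging the missing fields per run also gives the correction categories the last recommendation asks to track.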
Charts not reproduced in this export: prompt composition; input token breakdown; tool signals.
How this workflow runs
Metric values are not included in this export. The panel covers how often steps had to re-run, tasks handed off to sub-agents during the workflow, total documents pulled in across all tool calls, and the typical time each step takes to finish.
Stage order
Typical workflow path
Latency is unavailable for every step. Each loop iteration is one Plan step (GPT-5.2, 109 tok avg) followed by one Tool step (tool, 124 tok avg).
- Respond (GPT-5.2): 252 tok avg
- Loop ×11 (plan → tool): 2.6K tok avg
- Respond (GPT-5.2): 252 tok avg
- Loop ×6 (plan → tool): 1.4K tok avg
- Respond (GPT-5.2): 252 tok avg
- Loop ×3 (plan → tool): 699 tok avg
- Respond (GPT-5.2): 252 tok avg
- Loop ×18 (plan → tool): 4.2K tok avg
- Respond (GPT-5.2): 252 tok avg
- Loop ×3 (plan → tool): 699 tok avg
- Respond (GPT-5.2): 252 tok avg
- Tool (tool): 124 tok avg
- Verify: 160 tok avg
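The loop figures in this path are consistent with simple multiplication: each iteration costs one Plan step (109 tok avg) plus one Tool step (124 tok avg), or 233 tokens. The sketch below reproduces the displayed loop averages and, assuming stage averages simply add along the path, gives an end-to-end estimate of about 11.3K tokens. It is a back-of-the-envelope check, not how the dashboard computes its figures.

```python
PLAN_TOK, TOOL_TOK = 109, 124      # per-step averages shown for one loop iteration
RESPOND_TOK, VERIFY_TOK = 252, 160
LOOPS = [11, 6, 3, 18, 3]          # loop lengths in path order


def loop_tokens(iterations: int) -> int:
    """Average tokens for a plan -> tool loop repeated `iterations` times."""
    return iterations * (PLAN_TOK + TOOL_TOK)


def path_tokens() -> int:
    # Every loop is preceded by a Respond step, and the path ends Respond -> Tool -> Verify.
    respond_steps = len(LOOPS) + 1
    return (respond_steps * RESPOND_TOK
            + sum(loop_tokens(n) for n in LOOPS)
            + TOOL_TOK + VERIFY_TOK)


def compact(n: int) -> str:
    """Format a token count the way the path view does (e.g. 2563 -> '2.6K')."""
    return f"{n / 1000:.1f}K" if n >= 1000 else str(n)


for n in LOOPS:
    print(f"Loop x{n}: {loop_tokens(n)} tok ({compact(loop_tokens(n))})")
print("Path estimate:", path_tokens(), "tok")
```

The 18-iteration loop alone accounts for roughly 4.2K of the estimate, which is why a per-run tool budget targets the loop rather than the Respond steps.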
Threads
Step · stage · model or operation · tokens · estimated cost
- 1 · Respond · claude-opus-4-5 · 7 tok · $0.00
- 2 · Respond · claude-opus-4-5 · 354 tok · $0.01
- 3 · Plan · claude-opus-4-5 · 202 tok · $0.00
- 4 · Tool · Run tool · 147 tok
- 5 · Tool · Run tool · 180 tok
- 6 · Plan · claude-opus-4-5 · 291 tok · $0.00
- 7 · Tool · Run tool · 19 tok
- 8 · Plan · claude-opus-4-5 · 242 tok · $0.00
- 9 · Tool · Run tool · 33 tok
- 10 · Plan · claude-opus-4-5 · 261 tok · $0.00
- 11 · Tool · Run tool · 64 tok
- 12 · Plan · claude-opus-4-5 · 271 tok · $0.00
- 13 · Tool · Run tool · 665 tok
- 14 · Plan · claude-opus-4-5 · 251 tok · $0.00
- 15 · Tool · Run tool · 12 tok
- 16 · Plan · claude-opus-4-5 · 271 tok · $0.00
- 17 · Tool · Run tool · 665 tok
- 18 · Plan · claude-opus-4-5 · 271 tok · $0.00
- 19 · Tool · Run tool · 600 tok
- 20 · Plan · claude-opus-4-5 · 271 tok · $0.00
- 21 · Tool · Run tool · 183 tok
- 22 · Plan · claude-opus-4-5 · 293 tok · $0.00
- 23 · Tool · Run tool · 3 tok
- 24 · Plan · claude-opus-4-5 · 271 tok · $0.00
- 25 · Tool · Run tool · 165 tok
- 26 · Plan · claude-opus-4-5 · 398 tok · $0.01
- 27 · Tool · Run tool · 23 tok
- 28 · Respond · claude-opus-4-5 · 570 tok · $0.01
- 29 · Plan · claude-opus-4-5 · 95 tok · $0.00
- 30 · Tool · Run tool · 2 tok
- 31 · Tool · Run tool · 5 tok
- 32 · Plan · claude-opus-4-5 · 189 tok · $0.00
- 33 · Tool · Run tool · 3 tok
- 34 · Plan · claude-opus-4-5 · 189 tok · $0.00
- 35 · Tool · Run tool · 7 tok
- 36 · Plan · claude-opus-4-5 · 107 tok · $0.00
- 37 · Tool · Run tool · 56 tok
- 38 · Plan · claude-opus-4-5 · 189 tok · $0.00
- 39 · Tool · Run tool · 57 tok
- 40 · Plan · claude-opus-4-5 · 115 tok · $0.00
- 41 · Tool · Run tool · 417 tok
- 42 · Plan · claude-opus-4-5 · 189 tok · $0.00
- 43 · Tool · Run tool · 13 tok
- 44 · Respond · claude-opus-4-5 · 289 tok · $0.01
- 45 · Respond · claude-opus-4-5 · 46 tok · $0.00
- 46 · Tool · Run dataset_evaluation · 100 tok
- 47 · Verify · Imported benchmark outcome · 108 tok
The earlier plan/tool string showed only the normalized span order; the rows above use imported operation records. Where a tool name is missing, the source provided only the normalized stage and operation label.
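The path view above is built from normalized stage order. One way to turn a normalized stage sequence into the compact Loop ×N form is sketched below; it assumes an iteration is one plan step followed by one or more tool steps, which may differ from the dashboard's own rule. The function name is hypothetical.

```python
from typing import List, Sequence


def collapse_loops(stages: Sequence[str]) -> List[str]:
    """Collapse consecutive plan -> tool(+) iterations into a single 'loop xN' entry."""
    out: List[str] = []
    i, n = 0, len(stages)
    while i < n:
        iterations = 0
        # An iteration is one 'plan' immediately followed by at least one 'tool'.
        while i + 1 < n and stages[i] == "plan" and stages[i + 1] == "tool":
            i += 1                              # consume the plan step
            while i < n and stages[i] == "tool":
                i += 1                          # consume every tool call in this iteration
            iterations += 1
        if iterations:
            out.append(f"loop x{iterations}")
        else:
            out.append(stages[i])               # any other stage passes through unchanged
            i += 1
    return out
```

Applied to the 47 rows of the thread above, this rule yields respond, respond, loop x12, respond, loop x7, respond, respond, tool, verify.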
Hi! How can I help you today? Hi! I need some help with a couple of things. First, I’d like to remove a passenger named Ethan from my reservation—can you help with that? Also, I’m…