Workflow metrics are projected; evidence stays tied to analyzed threads
Projected workflow runs: 362 / mo. This workflow represented 109 of 300 analyzed threads.
Analyzed workflow sample: 109 threads. Findings, recommendations, and evidence cards are still anchored to the normalized workflow sample.
Projection factor: 3.3x. Applied to this workflow's spend, savings, runs, and token totals. Confidence: medium.
Source pool: 996 sessions. The full source-pool population used by the dashboard projection.
derived-flight-reservation-modification · Based on 109 threads · medium confidence
Flight Reservation Modification
Derived primarily from user-authored prompts across a 300-thread slice. Full-slice prompt clustering ran on every thread, and Claude consolidated the major workflow types from cluster exemplars because the slice exceeds the non-sampling threshold.
Projected spend / mo: $7.14 (sample $2.15)
Projected savings / mo: $1.99 (sample $0.60) · Could cut spend by ~28%
Projected runs / mo: 362 (sample 109)
Projected total tokens: 1.5M (avg 4.2K per run)
Projected input / output: 262.9K / 1.3M
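The projected figures above appear consistent with scaling each sample value by the ratio of the source pool (996 sessions) to the analyzed slice (300 threads). The dashboard does not state its formula, so the sketch below is an assumption that happens to reproduce the displayed numbers; the function names are hypothetical.

```python
# Assumption: projection factor = source pool size / analyzed slice size.
# The page only states "3.3x"; this ratio is inferred, not documented.
SOURCE_POOL_SESSIONS = 996
ANALYZED_THREADS = 300


def projection_factor(source_pool: int, analyzed: int) -> float:
    """Ratio used to scale sample metrics up to the full source pool."""
    return source_pool / analyzed


def project(sample_value: float, factor: float) -> float:
    """Scale one sample metric by the projection factor."""
    return sample_value * factor


factor = projection_factor(SOURCE_POOL_SESSIONS, ANALYZED_THREADS)
projected_runs = round(project(109, factor))        # sample runs
projected_spend = round(project(2.15, factor), 2)   # sample spend in USD
projected_savings = round(project(0.60, factor), 2) # sample savings in USD
savings_share = round(100 * projected_savings / projected_spend)

print(f"factor={factor:.1f}x runs={projected_runs} "
      f"spend=${projected_spend} savings=${projected_savings} "
      f"share={savings_share}%")
```

Under that assumption the sample values of 109 runs, $2.15, and $0.60 scale to 362 runs, $7.14, and $1.99, and the savings share comes out at about 28%.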
This workflow
Token and spend trend
17 hour buckets
Spend axis: $1.09, $2.17 · Legend: input tokens, output tokens
Dec 4 4AM: $0 · 14.5K tokens
Dec 4 5AM: $0 · 21.2K tokens
Dec 4 6AM: $0.03 · 23.2K tokens
Dec 4 7AM: $0 · 5.5K tokens
Dec 4 8AM: $0 · 15.7K tokens
Dec 4 9AM: $0.10 · 75.6K tokens
Dec 4 11AM: $0.03 · 17.8K tokens
Dec 4 12PM: $0.03 · 34.8K tokens
Dec 4 11PM: $0.10 · 19.6K tokens
Dec 5 12AM: $0.17 · 31.2K tokens
Dec 17 4PM: $0.33 · 69K tokens
Dec 17 5PM: $1.53 · 195.2K tokens
Dec 17 6PM: $1.16 · 207.7K tokens
Dec 17 7PM: $1.46 · 210.8K tokens
Dec 17 8PM: $2.17 · 355.1K tokens
Dec 17 9PM: $0.03 · 74.7K tokens
Dec 17 10PM: $0 · 141.8K tokens
Model mix
Tokens and spend by model
6 models
Tokens by model
input + output
Claude Opus 4.5: 274.5K tokens · 522 calls · 35.5% of total
gemini-3-pro-preview: 172.6K tokens · 475 calls · 22.3% of total
GPT-5.2: 152.8K tokens · 411 calls · 19.8% of total
Kimi-K2: 133K tokens · 492 calls · 17.2% of total
GPT-5.1-CODEX: 31.7K tokens · 96 calls · 4.1% of total
Qwen3-Coder: 9.1K tokens · 26 calls · 1.2% of total
Spend by model
estimated cost
Claude Opus 4.5: $5.46 · 74.3% of total
GPT-5.2: $1.26 · 17.1% of total
GPT-5.1-CODEX: $0.22 · 3% of total
gemini-3-pro-preview: $0.21 · 2.8% of total
Kimi-K2: $0.20 · 2.8% of total
Qwen3-Coder: $0.01 · 0.1% of total
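One way to read the two model panels together is to divide each model's estimated spend by its token count. The sketch below does that with the figures listed above. It assumes both panels are reported on the same basis, which the page does not confirm, so the results are rough ratios rather than actual price quotes.

```python
# Figures copied from the "Tokens by model" and "Spend by model" panels.
# Assumption: both panels cover the same set of calls.
tokens_k = {
    "Claude Opus 4.5": 274.5,
    "gemini-3-pro-preview": 172.6,
    "GPT-5.2": 152.8,
    "Kimi-K2": 133.0,
    "GPT-5.1-CODEX": 31.7,
    "Qwen3-Coder": 9.1,
}
spend_usd = {
    "Claude Opus 4.5": 5.46,
    "GPT-5.2": 1.26,
    "GPT-5.1-CODEX": 0.22,
    "gemini-3-pro-preview": 0.21,
    "Kimi-K2": 0.20,
    "Qwen3-Coder": 0.01,
}

# Effective USD per 1K tokens (input and output blended).
usd_per_1k = {model: spend_usd[model] / tokens_k[model] for model in tokens_k}

# Most expensive per token first.
ranked = sorted(usd_per_1k.items(), key=lambda item: item[1], reverse=True)
for model, rate in ranked:
    print(f"{model}: ${rate:.4f} per 1K tokens")
```

On these numbers Claude Opus 4.5 has the highest blended rate, which fits its 35.5% share of tokens against a 74.3% share of spend.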
Opportunities
2 opportunities for this workflow
$1.99 projected
Tool misuse
Failed benchmark outcomes are still paying the full workflow cost
54 of 109 runs fail yet pay full token and tool call costs. Adding early exit checks could cut waste fast.
$1.13 projected / month saved (sample $0.34/mo)
high risk · medium confidence
What we saw
Failure rate is 49.5%, meaning nearly half of all runs produce no useful result.
Failed runs still pay full cost in tokens and tool calls before stopping.
No preflight check appears to screen out likely failures before the workflow starts.
Sample monthly spend is $2.15 ($7.14 projected), with roughly half going to runs that fail.
Recommended changes
Add a preflight prompt that checks input quality before the full workflow runs.
Set an early exit after the first tool call if key signals suggest failure.
Log why each run fails to find the most common failure patterns quickly.
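The three recommended changes could be combined in a thin wrapper around the workflow. The following is a minimal sketch, not the workflow's actual code: the request fields (`reservation_id`, `requested_change`), the `error` key on the first tool result, and every function name are hypothetical placeholders.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RunResult:
    ok: bool
    reason: Optional[str] = None
    tool_calls: int = 0


def preflight(request: dict) -> Optional[str]:
    """Return a failure reason if a required field is empty, else None.

    The required fields here are illustrative assumptions.
    """
    for key in ("reservation_id", "requested_change"):
        if not request.get(key):
            return f"missing_{key}"
    return None


def run_workflow(
    request: dict,
    first_tool_call: Callable[[dict], dict],
    rest_of_workflow: Callable[[dict, dict], RunResult],
    failure_log: Counter,
) -> RunResult:
    # 1. Preflight: reject weak inputs before spending any tokens.
    reason = preflight(request)
    if reason:
        failure_log[reason] += 1
        return RunResult(ok=False, reason=reason, tool_calls=0)

    # 2. Early exit: stop after the first tool call if it signals failure.
    first = first_tool_call(request)
    if first.get("error"):
        reason = f"early_exit_{first['error']}"
        failure_log[reason] += 1
        return RunResult(ok=False, reason=reason, tool_calls=1)

    # 3. Otherwise continue with the full (expensive) workflow.
    return rest_of_workflow(request, first)
```

Because `failure_log` is a `Counter`, calling `failure_log.most_common()` after a batch of runs lists the most frequent failure reasons first.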
Evidence (1)
Step: Imported failing outcome
Imported benchmark outcome ended with failure
Tool misuse
Tool loops are dense enough to need batching or early stopping
One tool (recorded only as "tool") dominates repeated tool activity, so the workflow is likely making incremental calls where batching, caching, or tighter stop conditions would reduce churn.
$0.86 projected / month saved (sample $0.26/mo)
medium risk · medium confidence
Recommended changes
Batch or cache repeated tool calls where the inputs overlap across adjacent steps.
Add a per-run tool budget and stop condition so failed runs do not keep exploring after the likely answer is already unreachable.
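Caching and a per-run tool budget can live in one small wrapper around whatever function dispatches tool calls. This is a sketch under assumptions: `call_tool` stands in for the real dispatcher, tool arguments are assumed to be JSON-serializable, and the class and exception names are hypothetical.

```python
import json
from typing import Any, Callable, Dict


class ToolBudgetExceeded(RuntimeError):
    """Raised when a run tries to exceed its tool-call budget."""


class BudgetedToolRunner:
    def __init__(self, call_tool: Callable[[str, dict], Any], max_calls: int):
        self._call_tool = call_tool      # hypothetical dispatcher
        self._max_calls = max_calls      # per-run budget
        self._cache: Dict[str, Any] = {}
        self.calls_made = 0

    def run(self, name: str, args: dict) -> Any:
        # Identical (tool, args) pairs are served from the cache
        # and do not count against the budget.
        key = name + ":" + json.dumps(args, sort_keys=True)
        if key in self._cache:
            return self._cache[key]

        # Stop condition: refuse new calls once the budget is spent.
        if self.calls_made >= self._max_calls:
            raise ToolBudgetExceeded(
                f"tool budget of {self._max_calls} calls exhausted"
            )

        self.calls_made += 1
        result = self._call_tool(name, args)
        self._cache[key] = result
        return result
```

A caller would create one runner per workflow run and treat `ToolBudgetExceeded` as the signal to stop exploring and report failure. Caching is only safe for read-only tools; a call that modifies a reservation should bypass it.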
Prompt composition
Input token breakdown
78.9K tokens
user: 78.9K · 100%
Tool signals
How this workflow runs
Retries: 0
How often steps had to re-run.
Delegated subtasks: 0
Tasks handed off to sub-agents during the workflow.
Documents retrieved: 0
Total documents pulled in across all tool calls.
Median step latency: 0 ms
Typical time each step takes to finish.
Stage order
Typical workflow path
3 steps
1. Respond (Kimi-K2): Respond step in the workflow. Latency unavailable · 229 tok avg
2. Loop ×99: plan → tool, repeats 99 times. Latency unavailable · 23.9K tok avg
   2.1 Plan (Kimi-K2): Plan the next steps in the workflow. Latency unavailable · 99 tok avg
   2.2 Tool (tool): Tool step in the workflow. Latency unavailable · 142 tok avg
3. Verify: Verify step in the workflow. Latency unavailable · 144 tok avg
Threads
Pick a thread to see what happened
109 threads
Cost per run: $0.13
Monthly runs: 1
Monthly cost: $0.13
Operation path: 68 named tool/models
1. Respond · claude-opus-4-5 · 7 tok · $0.00
2. Respond · claude-opus-4-5 · 101 tok · $0.00
3. Plan · claude-opus-4-5 · 45 tok · $0.00
4. Tool · Run tool (tool) · 237 tok
5. Plan · claude-opus-4-5 · 170 tok · $0.00
6. Tool · Run tool (tool) · 16 tok
7. Plan · claude-opus-4-5 · 131 tok · $0.00
8. Tool · Run tool (tool) · 229 tok
9. Respond · claude-opus-4-5 · 248 tok · $0.01
10. Plan · claude-opus-4-5 · 154 tok · $0.00
11. Tool · Run tool (tool) · 237 tok
12. Plan · claude-opus-4-5 · 154 tok · $0.00
13. Tool · Run tool (tool) · 98 tok
14. Plan · claude-opus-4-5 · 154 tok · $0.00
15. Tool · Run tool (tool) · 16 tok
16. Plan · claude-opus-4-5 · 81 tok · $0.00
17. Tool · Run tool (tool) · 233 tok
18. Plan · claude-opus-4-5 · 154 tok · $0.00
19. Tool · Run tool (tool) · 38 tok
20. Plan · claude-opus-4-5 · 154 tok · $0.00
21. Tool · Run tool (tool) · 38 tok
22. Plan · claude-opus-4-5 · 154 tok · $0.00
23. Tool · Run tool (tool) · 43 tok
24. Plan · claude-opus-4-5 · 154 tok · $0.00
25. Tool · Run tool (tool) · 1.1K tok
26. Plan · claude-opus-4-5 · 178 tok · $0.00
27. Tool · Run tool (tool) · 39 tok
28. Respond · claude-opus-4-5 · 428 tok · $0.01
29. Plan · claude-opus-4-5 · 156 tok · $0.00
30. Tool · Run tool (tool) · 6 tok
31. Plan · claude-opus-4-5 · 156 tok · $0.00
32. Tool · Run tool (tool) · 9 tok
33. Plan · claude-opus-4-5 · 156 tok · $0.00
34. Tool · Run tool (tool) · 9 tok
35. Plan · claude-opus-4-5 · 156 tok · $0.00
36. Tool · Run tool (tool) · 9 tok
37. Respond · claude-opus-4-5 · 370 tok · $0.01
38. Plan · claude-opus-4-5 · 174 tok · $0.00
39. Tool · Run tool (tool) · 6 tok
40. Plan · claude-opus-4-5 · 148 tok · $0.00
41. Tool · Run tool (tool) · 163 tok
42. Plan · claude-opus-4-5 · 148 tok · $0.00
43. Tool · Run tool (tool) · 9 tok
44. Plan · claude-opus-4-5 · 148 tok · $0.00
45. Tool · Run tool (tool) · 9 tok
46. Plan · claude-opus-4-5 · 148 tok · $0.00
47. Tool · Run tool (tool) · 5 tok
48. Plan · claude-opus-4-5 · 148 tok · $0.00
49. Tool · Run tool (tool) · 14 tok
50. Respond · claude-opus-4-5 · 417 tok · $0.01
51. Plan · claude-opus-4-5 · 192 tok · $0.00
52. Tool · Run tool (tool) · 315 tok
53. Plan · claude-opus-4-5 · 192 tok · $0.00
54. Tool · Run tool (tool) · 188 tok
55. Plan · claude-opus-4-5 · 192 tok · $0.00
56. Tool · Run tool (tool) · 12 tok
57. Plan · claude-opus-4-5 · 192 tok · $0.00
58. Tool · Run tool (tool) · 70 tok
59. Plan · claude-opus-4-5 · 192 tok · $0.00
60. Tool · Run tool (tool) · 12 tok
61. Plan · claude-opus-4-5 · 192 tok · $0.00
62. Tool · Run tool (tool) · 9 tok
63. Plan · claude-opus-4-5 · 192 tok · $0.00
64. Tool · Run tool (tool) · 9 tok
65. Plan · claude-opus-4-5 · 192 tok · $0.00
66. Tool · Run tool (tool) · 175 tok
67. Respond · claude-opus-4-5 · 470 tok · $0.01
68. Tool · Run dataset_evaluation (dataset_evaluation) · 100 tok
69. Verify · Imported benchmark outcome · 120 tok
The old plan/tool string was the normalized span order. Rows above use imported operation records; when a tool name is missing, the source only provided the normalized stage and operation label.
Snapshots
full_transcript · Snapshot 1 · imported
Hi! How can I help you today? Hi! I’d like to make some changes to my upcoming flight. Can you help me with that? Of course, I'd be happy to help you make changes to your upcoming…