Metric scope

Workflow metrics are projected; evidence stays tied to analyzed threads

Projected workflow runs: 66 / mo. This workflow represented 20 of 300 analyzed threads.

Analyzed workflow sample: 20 threads. Findings, recommendations, and evidence cards are still anchored to the normalized workflow sample.

Projection factor: 3.3x. Applied to this workflow's spend, savings, runs, and token totals. Confidence: medium.

Source pool: 996 sessions. The full source-pool population used by the dashboard projection.
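The 3.3x factor is consistent with dividing the 996 source-pool sessions by the 300 analyzed threads (about 3.32). That relationship is inferred from the displayed numbers, not stated by the dashboard, so the sketch below treats it as an assumption; the function names are illustrative.

```python
# Assumption: projection factor = source-pool sessions / analyzed threads.
# The dashboard only shows the rounded value (3.3x), so this is an inference.
SOURCE_POOL_SESSIONS = 996
ANALYZED_THREADS = 300
WORKFLOW_SAMPLE_THREADS = 20


def projection_factor(source_pool: int, analyzed: int) -> float:
    """Ratio used to scale sample totals up to the full source pool."""
    return source_pool / analyzed


def project(sample_value: float, factor: float) -> float:
    """Scale a sample-level total (runs, spend, tokens) by the factor."""
    return sample_value * factor


factor = projection_factor(SOURCE_POOL_SESSIONS, ANALYZED_THREADS)
projected_runs = project(WORKFLOW_SAMPLE_THREADS, factor)

print(f"factor ~ {factor:.1f}x")
print(f"projected runs ~ {projected_runs:.0f} / mo")
```

Under this assumption the 20-thread sample scales to roughly 66 runs per month, which matches the projected figure on this page.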

derived-new-flight-booking · Based on 20 threads · medium confidence

New Flight Booking

Derived primarily from user-authored prompts across a 300-thread slice. Full-slice prompt clustering ran on every thread, and Claude consolidated the major workflow types from cluster exemplars because the slice exceeds the non-sampling threshold.

Projected spend / mo: $1.56 (sample $0.47)

Projected savings / mo: $0.46 (sample $0.14) · Could cut spend by ~29%

Projected runs / mo: 66 (sample 20)

Projected total tokens: 240.7K (avg 3.6K per run)

Projected input / output: 42.2K / 198.4K
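The derived figures in these cards (the ~29% savings share and the 3.6K average tokens per run) can be recomputed from the rounded values shown. This is a small check using only numbers from this page; the variable names are illustrative, and results can differ slightly from the dashboard because the inputs are already rounded.

```python
# Values copied from the metric cards on this page (already rounded).
projected_spend = 1.56      # USD per month
projected_savings = 0.46    # USD per month
projected_runs = 66         # runs per month
projected_tokens = 240_700  # total tokens per month (240.7K)

# Share of monthly spend that the listed opportunities could remove.
savings_share = projected_savings / projected_spend

# Average tokens consumed by one run of this workflow.
avg_tokens_per_run = projected_tokens / projected_runs

print(f"savings share: {savings_share:.1%}")
print(f"avg tokens per run: {avg_tokens_per_run / 1000:.1f}K")
```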

This workflow

Token and spend trend (chart: input tokens and output tokens across 7 hour buckets, from Dec 4 6AM to Dec 17 8PM)

Model mix

Tokens and spend by model

5 models

Tokens by model (input + output)

Claude Opus 4.5: 52.9K tokens · 114 calls · 40.6% of total

GPT-5.2: 50.7K tokens · 159 calls · 38.8% of total

gemini-3-pro-preview: 20.3K tokens · 62 calls · 15.6% of total

Kimi-K2: 5.4K tokens · 19 calls · 4.2% of total

GPT-5.1-CODEX: 1.2K tokens · 7 calls · 0.9% of total

Spend by model (estimated cost)

Claude Opus 4.5: $1.10 · 69.6% of total

GPT-5.2: $0.44 · 27.7% of total

gemini-3-pro-preview: $0.03 · 1.6% of total

Kimi-K2: $0.01 · 0.6% of total

GPT-5.1-CODEX: $0.01 · 0.5% of total
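To see why one model can account for about 40% of tokens but nearly 70% of spend, the per-model shares and an approximate cost per 1K tokens can be recomputed from the figures above. This is a rough sketch: the inputs are the rounded dashboard values, so recomputed shares may differ from the displayed ones by a fraction of a point, and the cost-per-1K figure is a derived estimate, not a published price.

```python
# Rounded values copied from the model mix on this page.
tokens_k = {
    "Claude Opus 4.5": 52.9,
    "GPT-5.2": 50.7,
    "gemini-3-pro-preview": 20.3,
    "Kimi-K2": 5.4,
    "GPT-5.1-CODEX": 1.2,
}
spend_usd = {
    "Claude Opus 4.5": 1.10,
    "GPT-5.2": 0.44,
    "gemini-3-pro-preview": 0.03,
    "Kimi-K2": 0.01,
    "GPT-5.1-CODEX": 0.01,
}


def shares(values: dict[str, float]) -> dict[str, float]:
    """Return each entry's fraction of the total."""
    total = sum(values.values())
    return {name: value / total for name, value in values.items()}


token_share = shares(tokens_k)
spend_share = shares(spend_usd)

# Derived estimate: observed spend divided by observed tokens (in thousands).
cost_per_1k = {name: spend_usd[name] / tokens_k[name] for name in tokens_k}

for name in tokens_k:
    print(
        f"{name}: {token_share[name]:.1%} of tokens, "
        f"{spend_share[name]:.1%} of spend, "
        f"~${cost_per_1k[name]:.4f} per 1K tokens"
    )
```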

Opportunities

2 opportunities for this workflow

$0.46 projected

Tool misuse

Failed benchmark outcomes are still paying the full workflow cost

The imported outcome labels show a high failure rate after the workflow has already spent tokens and tool calls, which points to missing early exits or weak preflight checks.

$0.27 projected / month saved (sample $0.08/mo)

high risk · medium confidence

Recommended changes

Compare passing and failing traces for this workflow and add an early gate before the expensive tool loop starts.
Use the imported outcome label as an evaluation dimension so regressions are ranked by wasted spend, not just by raw failure count.
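The two recommendations above can be sketched as follows. Everything here is hypothetical: the `Run` record, the `"pass"`/`"fail"` label values, the required booking fields, and the sample data are illustrative assumptions, not the dashboard's schema.

```python
from dataclasses import dataclass


@dataclass
class Run:
    workflow: str
    cost_usd: float
    outcome: str  # imported outcome label, assumed to be "pass" or "fail"


def wasted_spend_by_workflow(runs: list[Run]) -> list[tuple[str, float]]:
    """Rank workflows by total spend on failed runs, highest first."""
    wasted: dict[str, float] = {}
    for run in runs:
        if run.outcome == "fail":
            wasted[run.workflow] = wasted.get(run.workflow, 0.0) + run.cost_usd
    return sorted(wasted.items(), key=lambda item: item[1], reverse=True)


def preflight_ok(request: dict, required_fields: tuple[str, ...]) -> bool:
    """Hypothetical early gate: only enter the tool loop when every
    required field is present and non-empty."""
    return all(request.get(field) for field in required_fields)


# Made-up sample data for illustration only.
runs = [
    Run("new-flight-booking", 0.07, "fail"),
    Run("new-flight-booking", 0.02, "pass"),
    Run("flight-cancellation-refund", 0.03, "fail"),
]
ranking = wasted_spend_by_workflow(runs)
print(ranking[0][0])

BOOKING_FIELDS = ("origin", "destination", "date")  # assumed field names
print(preflight_ok({"origin": "New York", "destination": "Seattle"}, BOOKING_FIELDS))
```

Ranking by wasted spend rather than failure count puts an expensive failing workflow ahead of one that fails often but cheaply.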

Evidence (1)

Step: Imported failing outcome

Imported benchmark outcome ended with failure

Tool misuse

Tool loops are dense enough to need batching or early stopping

A single tool (recorded only as "tool") dominates the repeated tool activity, so the workflow is likely making incremental calls where batching, caching, or tighter stop conditions would reduce churn.

$0.20 projected / month saved (sample $0.06/mo)

medium risk · medium confidence

Recommended changes

Batch or cache repeated tool calls where the inputs overlap across adjacent steps.
Add a per-run tool budget and stop condition so failed runs do not keep exploring after the likely answer is already unreachable.
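One way to combine caching with a per-run tool budget is a thin wrapper around the tool function. This is a minimal sketch under stated assumptions: `BudgetedToolRunner`, `ToolBudgetExceeded`, and `search_flights` are hypothetical names, the tool's arguments are assumed to be JSON-serializable, and the budget of 2 is arbitrary.

```python
import json
from typing import Any, Callable


class ToolBudgetExceeded(RuntimeError):
    """Raised when a run tries to exceed its tool-call budget."""


class BudgetedToolRunner:
    """Hypothetical wrapper: caches identical calls and caps new ones per run."""

    def __init__(self, tool: Callable[..., Any], max_calls: int) -> None:
        self._tool = tool
        self._max_calls = max_calls
        self._cache: dict[str, Any] = {}
        self.calls_made = 0

    def call(self, **kwargs: Any) -> Any:
        # Identical arguments map to the same cache key regardless of order.
        key = json.dumps(kwargs, sort_keys=True)
        if key in self._cache:
            return self._cache[key]
        if self.calls_made >= self._max_calls:
            raise ToolBudgetExceeded(
                f"tool budget of {self._max_calls} calls exhausted"
            )
        self.calls_made += 1
        result = self._tool(**kwargs)
        self._cache[key] = result
        return result


def search_flights(origin: str, destination: str, date: str) -> list[str]:
    """Stand-in for a real tool; returns a made-up result."""
    return [f"{origin}->{destination} on {date}"]


runner = BudgetedToolRunner(search_flights, max_calls=2)
first = runner.call(origin="JFK", destination="SEA", date="05-20")
repeat = runner.call(origin="JFK", destination="SEA", date="05-20")  # cache hit
print(runner.calls_made)
```

Cache hits do not count against the budget, so only genuinely new calls move a run toward its stop condition.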

Prompt composition

Input token breakdown

12.7K tokens

user: 12.7K · 100%

Tool signals

How this workflow runs

Retries: 0

How often steps had to re-run.

Delegated subtasks: 0

Tasks handed off to sub-agents during the workflow.

Documents retrieved: 0

Total documents pulled in across all tool calls.

Median step latency: 0 ms

Typical time each step takes to finish.

Stage order

Typical workflow path

13 steps

1. Respond (GPT-5.2): Respond step in the workflow. Latency unavailable · 224 tok avg
2. Loop ×11: plan → tool, repeats 11 times. Latency unavailable · 2.4K tok avg
   1. Plan (GPT-5.2): Plan the next steps in the workflow. Latency unavailable · 93 tok avg
   2. Tool (tool): Tool step in the workflow. Latency unavailable · 121 tok avg
3. Respond (GPT-5.2): Respond step in the workflow. Latency unavailable · 224 tok avg
4. Loop ×6: plan → tool, repeats 6 times. Latency unavailable · 1.3K tok avg
   1. Plan (GPT-5.2): Plan the next steps in the workflow. Latency unavailable · 93 tok avg
   2. Tool (tool): Tool step in the workflow. Latency unavailable · 121 tok avg
5. Respond (GPT-5.2): Respond step in the workflow. Latency unavailable · 224 tok avg
6. Loop ×3: plan → tool, repeats 3 times. Latency unavailable · 642 tok avg
   1. Plan (GPT-5.2): Plan the next steps in the workflow. Latency unavailable · 93 tok avg
   2. Tool (tool): Tool step in the workflow. Latency unavailable · 121 tok avg
7. Respond (GPT-5.2): Respond step in the workflow. Latency unavailable · 224 tok avg
8. Loop ×18: plan → tool, repeats 18 times. Latency unavailable · 3.9K tok avg
   1. Plan (GPT-5.2): Plan the next steps in the workflow. Latency unavailable · 93 tok avg
   2. Tool (tool): Tool step in the workflow. Latency unavailable · 121 tok avg
9. Respond (GPT-5.2): Respond step in the workflow. Latency unavailable · 224 tok avg
10. Loop ×3: plan → tool, repeats 3 times. Latency unavailable · 642 tok avg
   1. Plan (GPT-5.2): Plan the next steps in the workflow. Latency unavailable · 93 tok avg
   2. Tool (tool): Tool step in the workflow. Latency unavailable · 121 tok avg
11. Respond (GPT-5.2): Respond step in the workflow. Latency unavailable · 224 tok avg
12. Tool (tool): Tool step in the workflow. Latency unavailable · 121 tok avg
13. Verify: Verify step in the workflow. Latency unavailable · 149 tok avg
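The loop averages in the typical path line up with the repeat count multiplied by the sum of the plan and tool step averages (for example, 11 × (93 + 121) = 2,354, shown as 2.4K). That relationship is observed from the displayed numbers rather than documented, so the sketch below is a consistency check, not the dashboard's formula.

```python
# Per-step averages and repeat counts copied from the typical workflow path.
PLAN_TOK_AVG = 93
TOOL_TOK_AVG = 121
LOOP_REPEATS = [11, 6, 3, 18, 3]


def loop_tokens(repeats: int, plan: int = PLAN_TOK_AVG, tool: int = TOOL_TOK_AVG) -> int:
    """Tokens for a plan -> tool loop, assuming each iteration costs plan + tool."""
    return repeats * (plan + tool)


per_loop = [loop_tokens(r) for r in LOOP_REPEATS]
total_iterations = sum(LOOP_REPEATS)
total_loop_tokens = sum(per_loop)

for repeats, tokens in zip(LOOP_REPEATS, per_loop):
    print(f"Loop x{repeats}: {tokens} tok")
print(f"{total_iterations} iterations, {total_loop_tokens} tok in loops")
```

Under this assumption, the five loops account for 41 plan/tool iterations in the typical path, which is where batching or a stop condition would have the most effect.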

Threads

Pick a thread to see what happened

20 threads

Cost per run: $0.07

Monthly runs: 1

Monthly cost: $0.07

Operation path: 52 named tool/models

1. Respond · claude-opus-4-5 · 7 tok · $0.00
2. Respond · claude-opus-4-5 · 140 tok · $0.00
3. Plan · claude-opus-4-5 · 38 tok · $0.00
4. Tool · tool · 207 tok
5. Plan · claude-opus-4-5 · 125 tok · $0.00
6. Tool · tool · 16 tok
7. Plan · claude-opus-4-5 · 63 tok · $0.00
8. Tool · tool · 1.8K tok
9. Plan · claude-opus-4-5 · 125 tok · $0.00
10. Tool · tool · 90 tok
11. Respond · claude-opus-4-5 · 204 tok · $0.00
12. Plan · claude-opus-4-5 · 134 tok · $0.00
13. Tool · tool · 3 tok
14. Plan · claude-opus-4-5 · 134 tok · $0.00
15. Tool · tool · 90 tok
16. Respond · claude-opus-4-5 · 116 tok · $0.00
17. Plan · claude-opus-4-5 · 124 tok · $0.00
18. Tool · tool · 947 tok
19. Plan · claude-opus-4-5 · 124 tok · $0.00
20. Tool · tool · 1.3K tok
21. Plan · claude-opus-4-5 · 124 tok · $0.00
22. Tool · tool · 10 tok
23. Plan · claude-opus-4-5 · 124 tok · $0.00
24. Tool · tool · 209 tok
25. Respond · claude-opus-4-5 · 300 tok · $0.01
26. Plan · claude-opus-4-5 · 83 tok · $0.00
27. Tool · tool · 14 tok
28. Respond · claude-opus-4-5 · 189 tok · $0.00
29. Respond · claude-opus-4-5 · 167 tok · $0.00
30. Plan · claude-opus-4-5 · 21 tok · $0.00
31. Tool · tool · 2 tok
32. Tool · tool · 5 tok
33. Plan · claude-opus-4-5 · 52 tok · $0.00
34. Tool · tool · 56 tok
35. Plan · claude-opus-4-5 · 54 tok · $0.00
36. Tool · tool · 417 tok
37. Plan · claude-opus-4-5 · 88 tok · $0.00
38. Tool · tool · 71 tok
39. Plan · claude-opus-4-5 · 51 tok · $0.00
40. Tool · tool · 14 tok
41. Plan · claude-opus-4-5 · 61 tok · $0.00
42. Tool · tool · 50 tok
43. Plan · claude-opus-4-5 · 46 tok · $0.00
44. Tool · tool · 146 tok
45. Plan · claude-opus-4-5 · 115 tok · $0.00
46. Tool · tool · 158 tok
47. Plan · claude-opus-4-5 · 115 tok · $0.00
48. Tool · tool · 12 tok
49. Plan · claude-opus-4-5 · 115 tok · $0.00
50. Tool · tool · 8 tok
51. Respond · claude-opus-4-5 · 188 tok · $0.00
52. Tool · dataset_evaluation · 100 tok
53. Verify · Imported benchmark outcome · 214 tok

The old plan/tool string was the normalized span order. Rows above use imported operation records; when a tool name is missing, the source only provided the normalized stage and operation label.

Snapshots

full_transcript · Snapshot 1 · imported

Hi! How can I help you today? Hi, I’d like to book a one-way flight from New York to Seattle on May 20th. I'd be happy to help you book a one-way flight from New York to Seattle o…