derived-other · Based on 52 threads · low confidence

Other

Long-tail requests were grouped into Other after prompt-first consolidation of the major workflow types.

Projected spend / mo: $2.56 (sample $0.77)

Projected savings / mo: $1.66 (sample $0.50) · Could cut spend by ~65%

Projected runs / mo: 173 (sample 52)

Projected total tokens: 708.7K (avg 4.1K per run)

Projected input / output: 204.5K / 504.2K

Metric scope

Workflow metrics are projected; evidence stays tied to analyzed threads

Projected workflow runs: 173 / mo. This workflow represented 52 of 300 analyzed threads.

Analyzed workflow sample: 52 threads. Findings, recommendations, and evidence cards are still anchored to the normalized workflow sample.

Projection factor: 3.3x. Applied to this workflow's spend, savings, runs, and token totals. Confidence: medium.

Source pool: 996 sessions. The full source-pool population used by the dashboard projection.
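
The projected figures above are consistent with scaling the sample by the ratio of source-pool sessions to analyzed threads. The sketch below reproduces that arithmetic. The formula is an inference from the numbers shown on this page, not a documented definition, and the variable names are illustrative.

```python
# Assumption: projection factor = source-pool sessions / analyzed threads.
# This relationship is inferred from the figures on this page.
SOURCE_POOL_SESSIONS = 996
ANALYZED_THREADS = 300
WORKFLOW_SAMPLE_THREADS = 52
SAMPLE_SPEND_USD = 0.77
SAMPLE_SAVINGS_USD = 0.50

factor = SOURCE_POOL_SESSIONS / ANALYZED_THREADS            # 3.32, shown as 3.3x
projected_runs = round(WORKFLOW_SAMPLE_THREADS * factor)    # runs per month
projected_spend = round(SAMPLE_SPEND_USD * factor, 2)       # USD per month
projected_savings = round(SAMPLE_SAVINGS_USD * factor, 2)   # USD per month
savings_share = projected_savings / projected_spend         # fraction of spend

print(f"factor: {factor:.2f}x")
print(f"runs / mo: {projected_runs}")
print(f"spend / mo: ${projected_spend:.2f}")
print(f"savings / mo: ${projected_savings:.2f} (~{savings_share:.0%} of spend)")
```

Under this assumption the script yields 173 runs, $2.56 spend, and $1.66 savings per month, matching the values reported above.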

This workflow

Token and spend trend

[Chart: input tokens and output tokens across 14 hourly buckets: Dec 4 4AM to 8AM, Dec 4 11PM to Dec 5 12AM, and Dec 17 4PM to 10PM]

Model mix

Tokens and spend by model

6 models

Tokens by model (input + output)

gemini-3-pro-preview: 175.7K tokens · 398 calls · 37.3% of total
Kimi-K2: 122.4K tokens · 257 calls · 25.9% of total
Claude Opus 4.5: 94.8K tokens · 175 calls · 20.1% of total
GPT-5.2: 63.6K tokens · 155 calls · 13.5% of total
GPT-5.1-CODEX: 7.6K tokens · 29 calls · 1.6% of total
Qwen3-Coder: 7.6K tokens · 26 calls · 1.6% of total

Spend by model (estimated cost)

Claude Opus 4.5: $1.83 · 67.5% of total
GPT-5.2: $0.45 · 16.5% of total
gemini-3-pro-preview: $0.20 · 7.4% of total
Kimi-K2: $0.18 · 6.6% of total
GPT-5.1-CODEX: $0.05 · 1.9% of total
Qwen3-Coder: $0 · 0.1% of total

Opportunities

5 opportunities for this workflow

$1.66 projected

Outcome cohort gap

Failed runs diverge from successful runs before the final outcome

This workflow has enough scored runs to compare outcomes directly. The generic "tool" operation and adjacent tools appear more often in failed runs.

$0.53 projected / month saved (sample $0.16/mo)

high risk · high confidence

Recommended first move

Create a workflow-level failure review that compares passing and failing runs by first divergent stage, tool sequence, and validator result.
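
One way to start such a review is to locate the first step at which a failing trace departs from a reference passing trace. The sketch below is a minimal illustration: the function name `first_divergence` and the example stage sequences are hypothetical, and real traces would also need the tool name and validator result at each step.

```python
from itertools import zip_longest
from typing import Optional, Sequence


def first_divergence(passing: Sequence[str], failing: Sequence[str]) -> Optional[int]:
    """Return the 0-based index of the first step where the traces differ.

    Returns None when both traces are identical. A trace that simply ends
    early counts as diverging at the first missing step.
    """
    for index, (ok_step, bad_step) in enumerate(zip_longest(passing, failing)):
        if ok_step != bad_step:
            return index
    return None


# Illustrative stage sequences, not taken from the analyzed threads.
passing_trace = ["respond", "plan", "tool", "respond", "verify"]
failing_trace = ["respond", "plan", "tool", "plan", "tool", "plan", "tool", "verify"]

print(first_divergence(passing_trace, failing_trace))
```

Grouping failed runs by this index shows whether failures cluster at one stage or are spread across the workflow.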

What we saw

21 failed or partial runs spent $0.46 in the analyzed sample.
Successful runs average 8.7 tool calls; failed runs average 27.6.
Failed runs average 2,094 input tokens per run versus 563 on successful runs.

More recommended changes

Turn the successful cohort into a checklist: required context, required tools, stop condition, and final verification.
Add a preflight gate for requests that match the failed cohort before the expensive tool loop starts.

Evidence (2)

Step: Failed cohort outcome

Imported benchmark outcome ended with failure

Step: Successful cohort outcome

Imported benchmark outcome ended with success

Tool misuse

Failed benchmark outcomes are still paying the full workflow cost

The imported outcome labels show a high failure rate after the workflow has already spent tokens and tool calls, which points to missing early exits or weak preflight checks.

$0.37 projected / month saved (sample $0.11/mo)

high risk · medium confidence

Recommended first move

Compare passing and failing traces for this workflow and add an early gate before the expensive tool loop starts.

More recommended changes

Use the imported outcome label as an evaluation dimension so regressions are ranked by wasted spend, not just by raw failure count.
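
Ranking by wasted spend can be sketched as below. The `Run` record, its `category` field, and the sample costs are assumptions made for illustration; only the idea of using the imported outcome label comes from the recommendation above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Run:
    category: str    # hypothetical regression label
    outcome: str     # imported outcome label: "success" or "failure"
    cost_usd: float  # spend attributed to this run


def rank_by_wasted_spend(runs: Iterable[Run]) -> List[Tuple[str, float, int]]:
    """Return (category, wasted_usd, failure_count), highest wasted spend first."""
    wasted = defaultdict(float)
    failures = defaultdict(int)
    for run in runs:
        if run.outcome != "success":
            wasted[run.category] += run.cost_usd
            failures[run.category] += 1
    rows = [(cat, wasted[cat], failures[cat]) for cat in wasted]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Illustrative data: one expensive failure outranks three cheap ones.
sample_runs = [
    Run("tool_loop", "failure", 0.15),
    Run("missing_field", "failure", 0.02),
    Run("missing_field", "failure", 0.03),
    Run("missing_field", "failure", 0.01),
    Run("tool_loop", "success", 0.04),
]
ranking = rank_by_wasted_spend(sample_runs)
for category, wasted_usd, count in ranking:
    print(f"{category}: ${wasted_usd:.2f} wasted across {count} failed run(s)")
```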

Evidence (1)

Step: Imported failing outcome

Imported benchmark outcome ended with failure

Tool misuse

Tool loops are dense enough to need batching or early stopping

The generic "tool" operation dominates repeated tool activity, so the workflow is likely making incremental calls where batching, caching, or tighter stop conditions would reduce churn.

$0.30 projected / month saved (sample $0.09/mo)

medium risk · medium confidence

Recommended first move

Batch or cache repeated tool calls where the inputs overlap across adjacent steps.

More recommended changes

Add a per-run tool budget and stop condition so failed runs do not keep exploring after the likely answer is already unreachable.
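
The caching and budget ideas can be combined in one thin wrapper around the tool dispatcher. The sketch below is a minimal illustration under stated assumptions: `BudgetedToolRunner`, `ToolBudgetExceeded`, and the `call_tool(name, args)` callable are hypothetical names, tool arguments are assumed to be JSON-serializable, and caching is only safe for read-only tools.

```python
import json
from typing import Any, Callable, Dict


class ToolBudgetExceeded(RuntimeError):
    """Raised when a run tries to exceed its tool-call budget."""


class BudgetedToolRunner:
    def __init__(self, call_tool: Callable[[str, Dict[str, Any]], Any], max_calls: int):
        self._call_tool = call_tool          # hypothetical tool dispatcher
        self._max_calls = max_calls          # per-run budget of real tool calls
        self._cache: Dict[str, Any] = {}     # results keyed by tool name + args
        self.calls_made = 0

    def run(self, name: str, args: Dict[str, Any]) -> Any:
        # Identical name + arguments reuse the earlier result and cost nothing.
        key = json.dumps([name, args], sort_keys=True)
        if key in self._cache:
            return self._cache[key]
        # Stop condition: refuse new calls once the budget is spent.
        if self.calls_made >= self._max_calls:
            raise ToolBudgetExceeded(f"budget of {self._max_calls} tool calls exhausted")
        self.calls_made += 1
        result = self._call_tool(name, args)
        self._cache[key] = result
        return result
```

The caller would catch `ToolBudgetExceeded` and move to a final response or escalation instead of continuing the plan and tool loop. The budget value itself would need tuning; the successful-run average reported on this page (8.7 tool calls) is one possible starting reference.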

Tool error loop

The same tool error repeats instead of triggering a new plan

Self-correction is not bounded tightly enough. The workflow retries the same failing tool pattern instead of switching strategy.

$0.27 projected / month saved (sample $0.08/mo)

high risk · high confidence

Recommended first move

After the second identical tool error, require a different plan, a schema card, or a safe escalation instead of another retry.

What we saw

89 repeated error results were observed across 10 runs.
The normalized error signature is "nameerror".
3,648 chars of error output were copied back into the workflow.

More recommended changes

Add an actionable error contract for the "tool" operation so the model receives allowed next steps, not raw stack text.
Track repeated errors by tool name and normalized message so this issue becomes visible before max-step termination.
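
Tracking by tool name and normalized message, with a forced change of plan on the second identical error, could look like the sketch below. The normalization rule (keep only the lowercased exception type) is an assumption chosen so that the evidence string on this page maps to "nameerror"; the dashboard's own normalization is not documented here. `RepeatedErrorGuard` and its return values are hypothetical.

```python
import re
from collections import Counter


def normalize_error(message: str) -> str:
    """Reduce an error message to its lowercased exception type when one is present."""
    match = re.match(r"\s*([A-Za-z_][A-Za-z0-9_.]*)\s*:", message)
    return (match.group(1) if match else message.strip()).lower()


class RepeatedErrorGuard:
    """Counts identical (tool, signature) errors and signals when to stop retrying."""

    def __init__(self, max_identical: int = 2):
        self.max_identical = max_identical
        self.counts = Counter()

    def record(self, tool_name: str, message: str) -> str:
        key = (tool_name, normalize_error(message))
        self.counts[key] += 1
        # From the second identical error onward, ask for a new plan or escalation.
        return "replan" if self.counts[key] >= self.max_identical else "retry"


guard = RepeatedErrorGuard()
error_text = "NameError: name 'get_cheapest_route' is not defined"
print(normalize_error(error_text))
print(guard.record("tool", error_text))
print(guard.record("tool", error_text))
```

The `counts` mapping doubles as the visibility metric: it can be logged per run so repeated signatures surface before max-step termination.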

Evidence (1)

Data: Repeated tool error

NameError: name 'get_cheapest_route' is not defined

Output contract mismatch

Users are correcting missing fields or output shape

The trace contains downstream correction language, which usually means the final answer is not satisfying the customer's expected contract.

$0.20 projected / month saved (sample $0.06/mo)

medium risk · high confidence

Recommended first move

Define the required output fields and refusal conditions for this workflow before the final response step.

What we saw

10 correction-like user messages appeared after an assistant response.
5 of 52 analyzed runs had at least one correction signal.

More recommended changes

Validate final answers against the output contract and route missing fields back through a cheap repair step.
Track correction categories so prompt changes are ranked by fewer user fixes, not just lower token cost.
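
A contract check with a single repair pass might be sketched as follows. The field names in `REQUIRED_FIELDS` and the `repair` callable are hypothetical; the actual required fields for this workflow are not listed on this page and would have to be defined first.

```python
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical contract for a reservation answer; replace with the real field list.
REQUIRED_FIELDS = ("reservation_id", "route", "travel_date")


def missing_fields(answer: Dict[str, Any], required: Sequence[str] = REQUIRED_FIELDS) -> List[str]:
    """Return required fields that are absent, None, or empty strings."""
    return [name for name in required if answer.get(name) in (None, "")]


def finalize(
    answer: Dict[str, Any],
    repair: Callable[[Dict[str, Any], List[str]], Dict[str, Any]],
) -> Dict[str, Any]:
    """Validate the answer; route gaps through one cheap repair step, then re-check."""
    missing = missing_fields(answer)
    if not missing:
        return answer
    repaired = repair(answer, missing)
    still_missing = missing_fields(repaired)
    if still_missing:
        raise ValueError(f"output contract not satisfied, missing: {still_missing}")
    return repaired
```

Limiting the repair to one pass keeps the step cheap and avoids introducing a new retry loop; answers that still fail would fall back to a refusal or escalation path.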

Evidence (1)

Data: User correction signal

Hmm, that’s strange. I thought I booked Newark to Milan for May 21, but maybe I made a mistake. I only use this user profile and email for bookings, so it should be under ivan_ros…

Prompt composition

Input token breakdown

61.5K tokens

user: 61.5K · 100%

Tool signals

How this workflow runs

Retries: 0. How often steps had to re-run.

Delegated subtasks: 0. Tasks handed off to sub-agents during the workflow.

Documents retrieved: 0. Total documents pulled in across all tool calls.

Median step latency: 0 ms. Typical time each step takes to finish.

Stage order

Typical workflow path

5 steps

1. Loop ×2: respond → plan → tool (repeats 2 times) · latency unavailable · 936 tok avg
   1. Respond (gemini-3-pro-preview): Respond step in the workflow. · latency unavailable · 259 tok avg
   2. Plan (gemini-3-pro-preview): Plan the next steps in the workflow. · latency unavailable · 127 tok avg
   3. Tool (tool): Tool step in the workflow. · latency unavailable · 82 tok avg
2. Loop ×37: plan → tool (repeats 37 times) · latency unavailable · 7.7K tok avg
   1. Plan (gemini-3-pro-preview) · latency unavailable · 127 tok avg
   2. Tool (tool) · latency unavailable · 82 tok avg
3. Respond (gemini-3-pro-preview) · latency unavailable · 259 tok avg
4. Loop ×59: plan → tool (repeats 59 times) · latency unavailable · 12.3K tok avg
   1. Plan (gemini-3-pro-preview) · latency unavailable · 127 tok avg
   2. Tool (tool) · latency unavailable · 82 tok avg
5. Verify: Verify step in the workflow. · latency unavailable · 136 tok avg

Threads

Pick a thread to see what happened

52 threads

Cost per run: $0.15

Monthly runs: 1

Monthly cost: $0.15

Operation path: 91 named tool/models

1. Respond · claude-opus-4-5 · 7 tok · $0.00
2. Respond · claude-opus-4-5 · 172 tok · $0.00
3. Plan · claude-opus-4-5 · 72 tok · $0.00
4. Tool · Run tool · 296 tok
5. Plan · claude-opus-4-5 · 129 tok · $0.00
6. Tool · Run tool · 9 tok
7. Plan · claude-opus-4-5 · 129 tok · $0.00
8. Tool · Run tool · 9 tok
9. Plan · claude-opus-4-5 · 129 tok · $0.00
10. Tool · Run tool · 4 tok
11. Plan · claude-opus-4-5 · 129 tok · $0.00
12. Tool · Run tool · 4 tok
13. Plan · claude-opus-4-5 · 65 tok · $0.00
14. Tool · Run tool · 181 tok
15. Plan · claude-opus-4-5 · 85 tok · $0.00
16. Tool · Run tool · 210 tok
17. Plan · claude-opus-4-5 · 150 tok · $0.00
18. Tool · Run tool · 13 tok
19. Plan · claude-opus-4-5 · 69 tok · $0.00
20. Tool · Run tool · 1.3K tok
21. Plan · claude-opus-4-5 · 129 tok · $0.00
22. Tool · Run tool · 45 tok
23. Plan · claude-opus-4-5 · 129 tok · $0.00
24. Tool · Run tool · 3 tok
25. Plan · claude-opus-4-5 · 129 tok · $0.00
26. Tool · Run tool · 682 tok
27. Plan · claude-opus-4-5 · 129 tok · $0.00
28. Tool · Run tool · 318 tok
29. Plan · claude-opus-4-5 · 129 tok · $0.00
30. Tool · Run tool · 727 tok
31. Plan · claude-opus-4-5 · 129 tok · $0.00
32. Tool · Run tool · 273 tok
33. Plan · claude-opus-4-5 · 129 tok · $0.00
34. Tool · Run tool · 6 tok
35. Plan · claude-opus-4-5 · 129 tok · $0.00
36. Tool · Run tool · 4 tok
37. Respond · claude-opus-4-5 · 406 tok · $0.01
38. Plan · claude-opus-4-5 · 256 tok · $0.00
39. Tool · Run tool · 8 tok
40. Plan · claude-opus-4-5 · 256 tok · $0.00
41. Tool · Run tool · 11 tok
42. Plan · claude-opus-4-5 · 256 tok · $0.00
43. Tool · Run tool · 8 tok
44. Plan · claude-opus-4-5 · 256 tok · $0.00
45. Tool · Run tool · 8 tok
46. Plan · claude-opus-4-5 · 256 tok · $0.00
47. Tool · Run tool · 3 tok
48. Plan · claude-opus-4-5 · 256 tok · $0.00
49. Tool · Run tool · 88 tok
50. Plan · claude-opus-4-5 · 256 tok · $0.00
51. Tool · Run tool · 35 tok
52. Plan · claude-opus-4-5 · 256 tok · $0.00
53. Tool · Run tool · 118 tok
54. Plan · claude-opus-4-5 · 256 tok · $0.00
55. Tool · Run tool · 14 tok
56. Plan · claude-opus-4-5 · 256 tok · $0.00
57. Tool · Run tool · 16 tok
58. Plan · claude-opus-4-5 · 256 tok · $0.00
59. Tool · Run tool · 293 tok
60. Respond · claude-opus-4-5 · 532 tok · $0.01
61. Respond · claude-opus-4-5 · 577 tok · $0.01
62. Plan · claude-opus-4-5 · 35 tok · $0.00
63. Tool · Run tool · 5 tok
64. Plan · claude-opus-4-5 · 135 tok · $0.00
65. Tool · Run tool · 6 tok
66. Plan · claude-opus-4-5 · 135 tok · $0.00
67. Tool · Run tool · 5 tok
68. Plan · claude-opus-4-5 · 66 tok · $0.00
69. Tool · Run tool · 53 tok
70. Plan · claude-opus-4-5 · 135 tok · $0.00
71. Tool · Run tool · 28 tok
72. Plan · claude-opus-4-5 · 37 tok · $0.00
73. Tool · Run tool · 2 tok
74. Plan · claude-opus-4-5 · 135 tok · $0.00
75. Tool · Run tool · 10 tok
76. Plan · claude-opus-4-5 · 135 tok · $0.00
77. Tool · Run tool · 57 tok
78. Plan · claude-opus-4-5 · 135 tok · $0.00
79. Tool · Run tool · 3 tok
80. Plan · claude-opus-4-5 · 135 tok · $0.00
81. Tool · Run tool · 10 tok
82. Plan · claude-opus-4-5 · 135 tok · $0.00
83. Tool · Run tool · 33 tok
84. Plan · claude-opus-4-5 · 135 tok · $0.00
85. Tool · Run tool · 12 tok
86. Plan · claude-opus-4-5 · 135 tok · $0.00
87. Tool · Run tool · 12 tok
88. Plan · claude-opus-4-5 · 146 tok · $0.00
89. Tool · Run tool · 287 tok
90. Respond · claude-opus-4-5 · 203 tok · $0.00
91. Tool (dataset evaluation) · Run dataset_evaluation · 100 tok
92. Verify (Imported benchmark outcome) · Imported benchmark outcome · 309 tok

The old plan/tool string was the normalized span order. Rows above use imported operation records; when a tool name is missing, the source only provided the normalized stage and operation label.

Snapshots

full_transcript · Snapshot 1 · imported

Hi! How can I help you today? Hi! I have a couple of questions. First, could you please tell me the total balance I have on my gift cards and also the total balance on my certific…