Supernova · Agentic Workflow Analysis and Optimization

derived-flight-reservation-modification · Based on 116 threads · medium confidence

Flight Reservation Modification

Derived primarily from user-authored prompts across a 300-thread slice. Full-slice prompt clustering ran on every thread, and Claude consolidated the major workflow types from cluster exemplars because the slice exceeds the non-sampling threshold.

Projected spend / mo: $7.84 (sample $2.36)
Projected savings / mo: $5.21 (sample $1.57) · Could cut spend by ~66%
Projected runs / mo: 385 (sample 116)
Projected total tokens: 1.6M (avg 4.2K per run)
Projected input / output: 286.2K / 1.3M
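The projected figures above look consistent with scaling the 116-run sample linearly to 385 runs per month. The sketch below reproduces that arithmetic. The linear-scaling rule and the helper name project_monthly are assumptions made for illustration; the report does not state how projections are computed.

```python
SAMPLE_RUNS = 116      # runs observed in the sample
PROJECTED_RUNS = 385   # projected runs per month


def project_monthly(sample_value: float) -> float:
    """Scale a sample-level figure to the projected monthly run count (assumed linear)."""
    return sample_value * PROJECTED_RUNS / SAMPLE_RUNS


projected_spend = project_monthly(2.36)     # sample spend in dollars
projected_savings = project_monthly(1.57)   # sample savings in dollars
savings_share = projected_savings / projected_spend

print(f"spend ~ ${projected_spend:.2f}")
print(f"savings ~ ${projected_savings:.2f}")
print(f"share of spend ~ {savings_share:.1%}")
```

Any small gap against the reported $7.84 would come from rounding in the displayed sample values.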

This workflow

Token and spend trend

17 hour buckets · Series: input tokens, output tokens

Model mix

Tokens and spend by model

6 models

Tokens by model

input + output
Claude Opus 4.5 · 307.3K tokens · 578 calls · 36.8% of total
gemini-3-pro-preview · 193.3K tokens · 521 calls · 23.1% of total
GPT-5.2 · 152.4K tokens · 407 calls · 18.2% of total
Kimi-K2 · 133K tokens · 492 calls · 15.9% of total
GPT-5.1-CODEX · 39.9K tokens · 113 calls · 4.8% of total
Qwen3-Coder · 9.1K tokens · 26 calls · 1.1% of total

Spend by model

estimated cost
Claude Opus 4.5 · $6.08 · 75.6% of total
GPT-5.2 · $1.26 · 15.7% of total
GPT-5.1-CODEX · $0.26 · 3.2% of total
gemini-3-pro-preview · $0.23 · 2.8% of total
Kimi-K2 · $0.20 · 2.5% of total
Qwen3-Coder · $0.01 · 0.1% of total

Opportunities

5 opportunities for this workflow

$5.21 projected
Outcome cohort gap

Failed runs diverge from successful runs before the final outcome

47% of runs fail or partially pass. Failed runs average 19.8 tool calls and 934 input tokens versus 9.9 calls and 565 tokens for successful runs.

$1.63 projected / month saved (sample $0.49/mo)
high risk · high confidence
Recommended first move

Set a tool call limit per run to stop runaway loops early.

What we saw
  • Nearly half of all scored runs fail or only partially pass.
  • Failed runs make roughly twice as many tool calls as successful ones.
  • Failed runs consume 65% more input tokens, raising cost with no benefit.
  • Extra tool calls suggest the agent loops or retries instead of resolving the task.
More recommended changes
  • Alert or pause a run when tool calls exceed roughly 12 to 14.
  • Review the most-used tool in failed runs to find where the agent gets stuck.
Evidence (2)
Step: Failed cohort outcome
Imported benchmark outcome ended with failure
Step: Successful cohort outcome
Imported benchmark outcome ended with success
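A per-run tool call limit of the kind recommended here could be sketched as follows. The thresholds reuse the 12 to 14 range suggested above, chosen to sit above the 9.9-call average of successful runs; the class and function names (ToolBudget, run_with_budget) are hypothetical and not part of any existing framework.

```python
class ToolBudgetExceeded(RuntimeError):
    """Raised when a run uses more tool calls than its budget allows."""


class ToolBudget:
    def __init__(self, warn_at: int = 12, hard_limit: int = 14):
        self.warn_at = warn_at        # past this count, flag the run for an alert
        self.hard_limit = hard_limit  # past this count, stop the run
        self.calls = 0
        self.warned = False

    def record_call(self) -> None:
        """Count one tool call; set the warning flag, then stop at the hard limit."""
        self.calls += 1
        if self.calls > self.hard_limit:
            raise ToolBudgetExceeded(
                f"{self.calls} tool calls exceeds limit of {self.hard_limit}"
            )
        if self.calls > self.warn_at:
            self.warned = True


def run_with_budget(tool_calls, budget: ToolBudget):
    """Execute zero-argument callables in order until done or the budget stops the run."""
    results = []
    for call in tool_calls:
        try:
            budget.record_call()
        except ToolBudgetExceeded:
            return results, "stopped_by_budget"
        results.append(call())
    return results, "completed"
```

A run that would otherwise make 20 calls stops after 14, while a typical 10-call run completes without triggering the warning.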
Tool misuse

Failed benchmark outcomes are still paying the full workflow cost

55 of 116 runs (47%) fail or only partially pass while still paying the full workflow cost. Adding early exit checks before heavy tool calls could cut this waste quickly.

$1.23 projected / month saved (sample $0.37/mo)
high risk · medium confidence
Recommended first move

Add a preflight prompt step to validate inputs before any tool calls run.

What we saw
  • 47% of runs end in failure or partial success, paying full token and tool call cost.
  • No early exit appears to stop failing runs before they reach expensive steps.
  • Preflight checks are likely absent or too weak to catch bad inputs early.
  • Monthly spend is low now, but failure rate will scale cost as volume grows.
More recommended changes
  • Set an early exit rule that stops a run when key retrieval returns no results.
  • Log the step where each failure first occurs to find the most costly failure point.
Evidence (1)
Step: Imported failing outcome
Imported benchmark outcome ended with failure
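The preflight check and the early exit on empty retrieval could look roughly like this. The field names (reservation_id, requested_change), the lookup callable, and the status strings are all hypothetical placeholders; the report does not describe the workflow's real inputs.

```python
from typing import Callable, Optional

# Hypothetical required inputs for a reservation change request.
REQUIRED_FIELDS = ("reservation_id", "requested_change")


def preflight(request: dict) -> Optional[str]:
    """Return a reason to reject the request, or None if it looks runnable."""
    for field in REQUIRED_FIELDS:
        value = request.get(field)
        if not isinstance(value, str) or not value.strip():
            return f"missing or empty field: {field}"
    return None


def run_workflow(request: dict, lookup: Callable[[str], Optional[dict]]) -> dict:
    """Validate inputs first, then exit early if the key retrieval is empty."""
    reason = preflight(request)
    if reason is not None:
        # Rejected before any tool call runs.
        return {"status": "rejected", "reason": reason, "tool_calls": 0}

    record = lookup(request["reservation_id"])
    if not record:
        # Stop here rather than continuing into more expensive steps.
        return {
            "status": "early_exit",
            "reason": "key retrieval returned no results",
            "tool_calls": 1,
        }
    return {"status": "ready", "record": record, "tool_calls": 1}
```

Returning the status and tool call count from every exit path also makes it straightforward to log the step where each failure first occurs.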
Tool misuse

Tool loops are dense enough to need batching or early stopping

A single tool step (recorded only as "tool", since the imported records carry no tool name) dominates repeated tool activity, so the workflow is likely making incremental calls where batching, caching, or tighter stop conditions would reduce churn.

$0.93 projected / month saved (sample $0.28/mo)
medium risk · medium confidence
Recommended first move

Batch or cache repeated tool calls where the inputs overlap across adjacent steps.

More recommended changes
  • Add a per-run tool budget and stop condition so failed runs do not keep exploring after the likely answer is already unreachable.
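One way to cache repeated tool calls with overlapping inputs is a small wrapper keyed on the tool name and its arguments. This is a minimal sketch under assumptions: CachedToolRunner and the read_only set are hypothetical, arguments are assumed to be JSON-serializable, and only tools declared read-only are cached, since caching a call that modifies a reservation would be unsafe.

```python
import json


class CachedToolRunner:
    def __init__(self, tools: dict, read_only: set):
        self.tools = tools          # tool name -> callable
        self.read_only = read_only  # names of tools that are safe to cache
        self.cache = {}
        self.executed = 0           # number of real tool executions

    def call(self, name: str, **kwargs):
        """Run a tool, reusing the earlier result for identical read-only calls."""
        if name not in self.read_only:
            self.executed += 1
            return self.tools[name](**kwargs)

        # sort_keys makes the key independent of keyword argument order.
        key = (name, json.dumps(kwargs, sort_keys=True))
        if key not in self.cache:
            self.cache[key] = self.tools[name](**kwargs)
            self.executed += 1
        return self.cache[key]
```

Comparing executed against the number of requested calls gives a rough measure of how much churn the cache removes.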
Tool error loop

The same tool error repeats instead of triggering a new plan

The agent retried a broken tool call 193 times across 2 threads without changing its approach, burning context and making no progress.

$0.80 projected / month saved (sample $0.24/mo)
high risk · high confidence
Recommended first move

Add a circuit breaker that stops retrying after 3 identical errors.

What we saw
  • The same SyntaxError fired 193 times with no change in the tool input.
  • Two threads were affected, so the loop spread beyond a single run.
  • Each failed call consumed context, shrinking room for useful work.
  • No recovery plan triggered after the first failure.
More recommended changes
  • Prompt the agent to revise its tool input when an error repeats.
  • Log the first error and skip further calls until input changes.
Evidence (1)
Data: Repeated tool error
SyntaxError: invalid syntax (<unknown>, line 1)
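A circuit breaker that stops after 3 identical errors could be sketched as below. It treats two failures as identical when the tool input, the exception type, and the message all match, which is an assumption about what "identical" should mean here; the names ErrorCircuitBreaker and CircuitOpen are hypothetical.

```python
class CircuitOpen(RuntimeError):
    """Raised when the same error has repeated too many times in a row."""


class ErrorCircuitBreaker:
    def __init__(self, max_identical_errors: int = 3):
        self.max_identical_errors = max_identical_errors
        self.last_signature = None
        self.count = 0

    def call(self, tool, tool_input):
        """Run the tool; open the circuit once the same error repeats on the same input."""
        try:
            result = tool(tool_input)
        except Exception as exc:
            signature = (repr(tool_input), type(exc).__name__, str(exc))
            if signature == self.last_signature:
                self.count += 1
            else:
                # A different input or error restarts the count.
                self.last_signature, self.count = signature, 1
            if self.count >= self.max_identical_errors:
                raise CircuitOpen(
                    f"{type(exc).__name__} repeated {self.count} times with unchanged input"
                ) from exc
            raise
        # Any success resets the breaker.
        self.last_signature, self.count = None, 0
        return result
```

The caller can catch CircuitOpen and prompt the agent to revise its tool input, so a loop like the 193 repeats seen here would end at the third attempt.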
Output contract mismatch

Users are correcting missing fields or output shape

Users are fixing incomplete or incorrectly shaped responses from your assistant. This suggests the assistant isn't following its output rules consistently. A validator can catch these problems before users see them.

$0.63 projected / month saved (sample $0.19/mo)
medium risk · high confidence
Recommended first move

Define the exact output shape: required fields, data types, and format rules.

What we saw
  • 20 correction messages across 10 threads show users fixing missing data or a wrong format.
  • Users mention expected fields that the assistant omitted or got wrong.
  • The pattern appears in ~9% of runs, indicating a systematic gap rather than random errors.
  • The finding is high confidence because the transcripts contain clear correction language.
More recommended changes
  • Add a check after each assistant response to verify that it matches the contract.
  • Log failures to find which prompts or tools cause shape mismatches.
Evidence (1)
Data: User correction signal
I see the difference is $222, but I have insurance on my reservation. According to your website, change fees and fare differences should be waived for insured tickets. Can you ple…
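A post-response validator for an output contract might be sketched as follows. The contract fields (reservation_id, fare_difference, change_fee, insurance_applied) are hypothetical, loosely suggested by the correction message above; the actual required fields are not given in the report.

```python
# Hypothetical contract: field name -> accepted type(s).
CONTRACT = {
    "reservation_id": str,
    "fare_difference": (int, float),
    "change_fee": (int, float),
    "insurance_applied": bool,
}


def validate_response(payload: dict, contract: dict = CONTRACT) -> list:
    """Return a list of contract violations; an empty list means the payload passes."""
    problems = []
    for field, expected in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
            continue
        value = payload[field]
        # bool is a subclass of int in Python, so reject it for numeric fields explicitly.
        is_stray_bool = isinstance(value, bool) and expected is not bool
        if is_stray_bool or not isinstance(value, expected):
            problems.append(f"wrong type for {field}: {type(value).__name__}")
    return problems
```

Running this check before a response reaches the user, and logging the returned list, would show which fields are most often missing or mistyped.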

Prompt composition

Input token breakdown

85.9K tokens
user: 85.9K · 100%

Tool signals

How this workflow runs

Retries: 0

How often steps had to re-run.

Delegated subtasks: 0

Tasks handed off to sub-agents during the workflow.

Documents retrieved: 0

Total documents pulled in across all tool calls.

Median step latency: 0 ms

Typical time each step takes to finish.

Stage order

Typical workflow path

3 steps
  1. Respond · Kimi-K2

    Respond step in the workflow.

    Latency unavailable · 232 tok avg
  2. Loop ×99

    Loop: plan → tool, repeats 99 times.

    Latency unavailable · 24K tok avg
    1. Plan · Kimi-K2

      Plan the next steps in the workflow.

      Latency unavailable · 101 tok avg
    2. Tool · tool

      Tool step in the workflow.

      Latency unavailable · 141 tok avg
  3. Verify

    Verify step in the workflow.

    Latency unavailable · 143 tok avg

Threads

Pick a thread to see what happened

116 threads
Cost per run: $0.13
Monthly runs: 1
Monthly cost: $0.13
Operation path: 68 named tool/models
  1. Respond · claude-opus-4-5 · 7 tok · $0.00
  2. Respond · claude-opus-4-5 · 101 tok · $0.00
  3. Plan · claude-opus-4-5 · 45 tok · $0.00
  4. Tool · Run tool · 237 tok
  5. Plan · claude-opus-4-5 · 170 tok · $0.00
  6. Tool · Run tool · 16 tok
  7. Plan · claude-opus-4-5 · 131 tok · $0.00
  8. Tool · Run tool · 229 tok
  9. Respond · claude-opus-4-5 · 248 tok · $0.01
  10. Plan · claude-opus-4-5 · 154 tok · $0.00
  11. Tool · Run tool · 237 tok
  12. Plan · claude-opus-4-5 · 154 tok · $0.00
  13. Tool · Run tool · 98 tok
  14. Plan · claude-opus-4-5 · 154 tok · $0.00
  15. Tool · Run tool · 16 tok
  16. Plan · claude-opus-4-5 · 81 tok · $0.00
  17. Tool · Run tool · 233 tok
  18. Plan · claude-opus-4-5 · 154 tok · $0.00
  19. Tool · Run tool · 38 tok
  20. Plan · claude-opus-4-5 · 154 tok · $0.00
  21. Tool · Run tool · 38 tok
  22. Plan · claude-opus-4-5 · 154 tok · $0.00
  23. Tool · Run tool · 43 tok
  24. Plan · claude-opus-4-5 · 154 tok · $0.00
  25. Tool · Run tool · 1.1K tok
  26. Plan · claude-opus-4-5 · 178 tok · $0.00
  27. Tool · Run tool · 39 tok
  28. Respond · claude-opus-4-5 · 428 tok · $0.01
  29. Plan · claude-opus-4-5 · 156 tok · $0.00
  30. Tool · Run tool · 6 tok
  31. Plan · claude-opus-4-5 · 156 tok · $0.00
  32. Tool · Run tool · 9 tok
  33. Plan · claude-opus-4-5 · 156 tok · $0.00
  34. Tool · Run tool · 9 tok
  35. Plan · claude-opus-4-5 · 156 tok · $0.00
  36. Tool · Run tool · 9 tok
  37. Respond · claude-opus-4-5 · 370 tok · $0.01
  38. Plan · claude-opus-4-5 · 174 tok · $0.00
  39. Tool · Run tool · 6 tok
  40. Plan · claude-opus-4-5 · 148 tok · $0.00
  41. Tool · Run tool · 163 tok
  42. Plan · claude-opus-4-5 · 148 tok · $0.00
  43. Tool · Run tool · 9 tok
  44. Plan · claude-opus-4-5 · 148 tok · $0.00
  45. Tool · Run tool · 9 tok
  46. Plan · claude-opus-4-5 · 148 tok · $0.00
  47. Tool · Run tool · 5 tok
  48. Plan · claude-opus-4-5 · 148 tok · $0.00
  49. Tool · Run tool · 14 tok
  50. Respond · claude-opus-4-5 · 417 tok · $0.01
  51. Plan · claude-opus-4-5 · 192 tok · $0.00
  52. Tool · Run tool · 315 tok
  53. Plan · claude-opus-4-5 · 192 tok · $0.00
  54. Tool · Run tool · 188 tok
  55. Plan · claude-opus-4-5 · 192 tok · $0.00
  56. Tool · Run tool · 12 tok
  57. Plan · claude-opus-4-5 · 192 tok · $0.00
  58. Tool · Run tool · 70 tok
  59. Plan · claude-opus-4-5 · 192 tok · $0.00
  60. Tool · Run tool · 12 tok
  61. Plan · claude-opus-4-5 · 192 tok · $0.00
  62. Tool · Run tool · 9 tok
  63. Plan · claude-opus-4-5 · 192 tok · $0.00
  64. Tool · Run tool · 9 tok
  65. Plan · claude-opus-4-5 · 192 tok · $0.00
  66. Tool · Run tool · 175 tok
  67. Respond · claude-opus-4-5 · 470 tok · $0.01
  68. Tool · Run dataset_evaluation · 100 tok
  69. Verify · Imported benchmark outcome · 120 tok

The old plan/tool string was the normalized span order. Rows above use imported operation records; when a tool name is missing, the source only provided the normalized stage and operation label.

Snapshots
full_transcript · Snapshot 1 · imported

Hi! How can I help you today? Hi! I’d like to make some changes to my upcoming flight. Can you help me with that? Of course, I'd be happy to help you make changes to your upcoming…