Supernova · Agentic Workflow Analysis and Optimization

derived-flight-cancellation-refund · Based on 87 threads · medium confidence

Flight Cancellation Refund

Derived primarily from user-authored prompts across a 300-thread slice. Full-slice prompt clustering ran on every thread, and Claude consolidated the major workflow types from cluster exemplars because the slice exceeds the non-sampling threshold.

Projected spend / mo: $3.75 · sample $1.13
Projected savings / mo: $2.22 · sample $0.67 · Could cut spend by ~59%
Projected runs / mo: 289 · sample 87
Projected total tokens: 819.4K · avg 2.8K per run
Projected input / output: 189.7K / 629.7K
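
The projected figures above look consistent with scaling the 87-run sample linearly to 289 runs per month. The sketch below reproduces that arithmetic under that assumption; the dashboard does not state its projection method, and `project` is a hypothetical helper. Because the sample values are already rounded, the results can differ from the displayed figures by about a cent.

```python
def project(sample_value: float, sample_runs: int, projected_runs: int) -> float:
    """Scale a sample figure linearly to the projected monthly run count (assumed method)."""
    return sample_value * projected_runs / sample_runs


SAMPLE_RUNS = 87       # threads in the analyzed sample
PROJECTED_RUNS = 289   # projected runs per month

spend = project(1.13, SAMPLE_RUNS, PROJECTED_RUNS)     # sample spend in USD
savings = project(0.67, SAMPLE_RUNS, PROJECTED_RUNS)   # sample savings in USD
savings_share = savings / spend                        # fraction of spend that could be cut
tokens_per_run = 819_400 / PROJECTED_RUNS              # projected total tokens / projected runs

print(f"projected spend:   ${spend:.2f}")
print(f"projected savings: ${savings:.2f}")
print(f"savings share:     {savings_share:.0%}")
print(f"avg tokens/run:    {tokens_per_run:.0f}")
```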

This workflow

Token and spend trend

[Chart: input tokens and output tokens, 12-hour buckets]

Model mix

Tokens and spend by model

6 models

Tokens by model (input + output)

gemini-3-pro-preview · 145.3K tokens · 431 calls · 30.2% of total
GPT-5.2 · 125.4K tokens · 358 calls · 26.1% of total
Claude Opus 4.5 · 116K tokens · 228 calls · 24.1% of total
Kimi-K2 · 71.4K tokens · 366 calls · 14.9% of total
GPT-5.1-CODEX · 15K tokens · 49 calls · 3.1% of total
Qwen3-Coder · 7.7K tokens · 37 calls · 1.6% of total

Spend by model (estimated cost)

Claude Opus 4.5 · $2.34 · 62.8% of total
GPT-5.2 · $1.02 · 27.4% of total
gemini-3-pro-preview · $0.17 · 4.6% of total
Kimi-K2 · $0.11 · 2.9% of total
GPT-5.1-CODEX · $0.08 · 2.2% of total
Qwen3-Coder · $0 · 0.1% of total

Opportunities

5 opportunities for this workflow

$2.22 projected
Outcome cohort gap

Failed runs diverge from successful runs before the final outcome

This workflow has enough scored runs to compare outcomes directly. The operation labeled "tool" and adjacent tools appear more often in failed runs.

$0.56 projected / month saved · sample $0.17/mo
medium risk · high confidence
Recommended first move

Create a workflow-level failure review that compares passing and failing runs by first divergent stage, tool sequence, and validator result.

What we saw
  • 34 failed or partial runs spent $0.49 in the analyzed sample.
  • Successful runs average 10.1 tool calls; failed runs average 18.1.
  • Failed runs average 795 input tokens per run versus 563 on successful runs.
More recommended changes
  • Turn the successful cohort into a checklist: required context, required tools, stop condition, and final verification.
  • Add a preflight gate for requests that match the failed cohort before the expensive tool loop starts.
Evidence (2)
Step · Failed cohort outcome
Imported benchmark outcome ended with failure
Step · Successful cohort outcome
Imported benchmark outcome ended with success
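
The recommended failure review compares passing and failing runs by their first divergent stage. A minimal sketch of that comparison follows, assuming each run is available as an ordered list of stage labels. `first_divergence` and the two example traces are hypothetical and are not taken from the analyzed threads.

```python
from itertools import zip_longest
from typing import Optional, Sequence


def first_divergence(reference: Sequence[str], candidate: Sequence[str]) -> Optional[int]:
    """Return the index of the first step where two traces differ, or None if they match."""
    for index, (ref_step, cand_step) in enumerate(zip_longest(reference, candidate)):
        if ref_step != cand_step:
            return index
    return None


# Hypothetical stage sequences for one passing and one failing run.
passing = ["respond", "plan", "tool", "plan", "tool", "verify"]
failing = ["respond", "plan", "tool", "plan", "tool", "plan", "tool", "verify"]

index = first_divergence(passing, failing)
if index is None:
    print("traces are identical")
else:
    print(f"first divergence at step {index}")
```

Grouping failed runs by this index would show whether they tend to go wrong at the same stage, which is where a preflight gate or checklist item is most likely to help.
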
Tool misuse

Failed benchmark outcomes are still paying the full workflow cost

The imported outcome labels show a high failure rate after the workflow has already spent tokens and tool calls, which points to missing early exits or weak preflight checks.

$0.53 projected / month saved · sample $0.16/mo
high risk · medium confidence
Recommended first move

Compare passing and failing traces for this workflow and add an early gate before the expensive tool loop starts.

More recommended changes
  • Use the imported outcome label as an evaluation dimension so regressions are ranked by wasted spend, not just by raw failure count.
Evidence (1)
Step · Imported failing outcome
Imported benchmark outcome ended with failure
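
Ranking regressions by wasted spend rather than raw failure count can be sketched as below. The `Run` record, the workflow names, and the costs are illustrative assumptions; the only idea taken from the recommendation is using the imported outcome label as the evaluation dimension.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Run:
    workflow: str     # workflow identifier (hypothetical values below)
    outcome: str      # imported outcome label, assumed to be "success" or "failure"
    cost_usd: float   # estimated cost of the run


def rank_by_wasted_spend(runs: Iterable[Run]) -> List[Tuple[str, float]]:
    """Sum the cost of non-successful runs per workflow, largest waste first."""
    wasted: Dict[str, float] = defaultdict(float)
    for run in runs:
        if run.outcome != "success":
            wasted[run.workflow] += run.cost_usd
    return sorted(wasted.items(), key=lambda item: item[1], reverse=True)


runs = [
    Run("refund", "failure", 0.08),
    Run("refund", "success", 0.05),
    Run("rebooking", "failure", 0.01),
    Run("rebooking", "failure", 0.02),
    Run("rebooking", "failure", 0.01),
]

ranking = rank_by_wasted_spend(runs)
for workflow, wasted_usd in ranking:
    print(f"{workflow}: ${wasted_usd:.2f} wasted")
```

In this made-up data the "refund" workflow ranks first with a single failure, because that one failure cost more than the three "rebooking" failures combined.
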
Tool misuse

Tool loops are dense enough to need batching or early stopping

The operation labeled "tool" dominates repeated tool activity, so the workflow is likely making incremental calls where batching, caching, or tighter stop conditions would reduce churn.

$0.46 projected / month saved · sample $0.14/mo
medium risk · medium confidence
Recommended first move

Batch or cache repeated tool calls where the inputs overlap across adjacent steps.

More recommended changes
  • Add a per-run tool budget and stop condition so failed runs do not keep exploring after the likely answer is already unreachable.
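
One way to combine the caching and per-run budget recommendations is a thin wrapper around the tool. The sketch below is an assumption-laden illustration: `BudgetedToolRunner`, `ToolBudgetExceeded`, and `lookup_reservation` are hypothetical names, arguments are assumed to be JSON-serializable, and caching is only safe for read-only tools whose results do not change during a run.

```python
import json
from typing import Any, Callable, Dict


class ToolBudgetExceeded(RuntimeError):
    """Raised when a run tries to exceed its allowed number of real tool calls."""


class BudgetedToolRunner:
    def __init__(self, tool: Callable[..., Any], max_calls: int) -> None:
        self._tool = tool
        self._max_calls = max_calls
        self._cache: Dict[str, Any] = {}
        self.calls_made = 0  # real (uncached) calls so far

    def call(self, **kwargs: Any) -> Any:
        # Identical arguments map to the same cache key regardless of order.
        key = json.dumps(kwargs, sort_keys=True)
        if key in self._cache:
            return self._cache[key]
        if self.calls_made >= self._max_calls:
            raise ToolBudgetExceeded(f"tool budget of {self._max_calls} calls used up")
        self.calls_made += 1
        result = self._tool(**kwargs)
        self._cache[key] = result
        return result


def lookup_reservation(reservation_id: str) -> Dict[str, str]:
    """Stand-in for a real read-only tool."""
    return {"reservation_id": reservation_id, "status": "confirmed"}


runner = BudgetedToolRunner(lookup_reservation, max_calls=2)
runner.call(reservation_id="ABC123")
runner.call(reservation_id="ABC123")   # served from cache, not counted
runner.call(reservation_id="XYZ789")
print(runner.calls_made)
```

Catching `ToolBudgetExceeded` in the agent loop gives a natural stop condition: the run can escalate or end instead of continuing to explore.
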
Tool error loop

The same tool error repeats instead of triggering a new plan

Self-correction is not bounded tightly enough. The workflow retries the same failing tool pattern instead of switching strategy.

$0.37 projected / month saved · sample $0.11/mo
high risk · high confidence
Recommended first move

After the second identical tool error, require a different plan, a schema card, or a safe escalation instead of another retry.

What we saw
  • 122 repeated error results were observed across 3 runs.
  • The normalized error signature is "error: reservation : not found".
  • 3,660 chars of error output were copied back into the workflow.
More recommended changes
  • Add an actionable error contract for the tool so the model receives allowed next steps, not raw stack text.
  • Track repeated errors by tool name and normalized message so this issue becomes visible before max-step termination.
Evidence (1)
Data · Repeated tool error
Error: Reservation : not found
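
Tracking repeated errors by tool name and normalized message, and forcing a new plan after the second identical error, could look roughly like this. The normalization rule (lowercase plus whitespace collapsing) is an assumption that happens to map the evidence string to the reported signature; `RepeatedErrorGuard` and the "retry"/"replan" return values are hypothetical.

```python
import re
from collections import Counter
from typing import Tuple


def normalize_error(message: str) -> str:
    """Lowercase and collapse whitespace so equivalent errors share one signature."""
    return re.sub(r"\s+", " ", message.strip().lower())


class RepeatedErrorGuard:
    def __init__(self, max_identical: int = 2) -> None:
        self.max_identical = max_identical
        self.counts: Counter = Counter()

    def record(self, tool_name: str, message: str) -> str:
        """Count the error and say whether the agent may retry or must change plan."""
        key: Tuple[str, str] = (tool_name, normalize_error(message))
        self.counts[key] += 1
        if self.counts[key] >= self.max_identical:
            return "replan"
        return "retry"


guard = RepeatedErrorGuard(max_identical=2)
first = guard.record("tool", "Error: Reservation : not found")
second = guard.record("tool", "error:  reservation : not found")
print(first, second)
```

The `counts` mapping doubles as the visibility metric: exporting it per run would surface a loop like the 122 repeated results long before max-step termination.
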
Output contract mismatch

Users are correcting missing fields or output shape

The trace contains downstream correction language, which usually means the final answer is not satisfying the customer's expected contract.

$0.30 projected / month saved · sample $0.09/mo
medium risk · high confidence
Recommended first move

Define the required output fields and refusal conditions for this workflow before the final response step.

What we saw
  • 10 correction-like user messages appeared after an assistant response.
  • 4 of 87 analyzed runs had at least one correction signal.
More recommended changes
  • Validate final answers against the output contract and route missing fields back through a cheap repair step.
  • Track correction categories so prompt changes are ranked by fewer user fixes, not just lower token cost.
Evidence (1)
Data · User correction signal
Sure, my user ID is sophia_taylor_9065 and my reservation number is PEP4E0. Can you please check if the insurance is there? I really need to have it added if it’s missing.
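
Validating the final answer against an output contract and routing gaps to a repair step might be sketched as follows. The field names in `REQUIRED_FIELDS` are invented for illustration (the report does not list the actual contract), and the "send"/"repair" routing labels are likewise hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical contract: the real required fields are not given in this report.
REQUIRED_FIELDS = ("reservation_id", "action_taken", "insurance_status")


def missing_fields(answer: Dict[str, Any], required: Sequence[str] = REQUIRED_FIELDS) -> List[str]:
    """Return required fields that are absent, None, or empty strings."""
    return [name for name in required if answer.get(name) in (None, "")]


def route(answer: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Send complete answers; route incomplete ones to a cheap repair step."""
    missing = missing_fields(answer)
    if missing:
        return ("repair", missing)
    return ("send", [])


draft = {"reservation_id": "PEP4E0", "action_taken": "checked reservation"}
decision, gaps = route(draft)
print(decision, gaps)
```

Logging `gaps` per run would also give the correction categories mentioned above, so prompt changes can be judged by which fields stop going missing.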

Prompt composition

Input token breakdown

56.9K tokens
user · 56.9K · 100%

Tool signals

How this workflow runs

Retries: 0
How often steps had to re-run.

Delegated subtasks: 0
Tasks handed off to sub-agents during the workflow.

Documents retrieved: 0
Total documents pulled in across all tool calls.

Median step latency: 0 ms
Typical time each step takes to finish.

Stage order

Typical workflow path

3 steps
  1. Loop ×2 · latency unavailable · 772 tok avg
     Loop: respond → plan → tool, repeats 2 times.
    1. Respond · Kimi-K2 · latency unavailable · 218 tok avg
       Respond step in the workflow.
    2. Plan · Kimi-K2 · latency unavailable · 84 tok avg
       Plan the next steps in the workflow.
    3. Tool · tool · latency unavailable · 84 tok avg
       Tool step in the workflow.
  2. Loop ×97 · latency unavailable · 16.3K tok avg
     Loop: plan → tool, repeats 97 times.
    1. Plan · Kimi-K2 · latency unavailable · 84 tok avg
       Plan the next steps in the workflow.
    2. Tool · tool · latency unavailable · 84 tok avg
       Tool step in the workflow.
  3. Verify · latency unavailable · 135 tok avg
     Verify step in the workflow.
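
A path summary like the one above can be derived from a flat stage sequence by collapsing consecutive repeats of a pattern into a loop with a count. The sketch below shows one simple way to do that for a single known pattern; it is not the dashboard's actual algorithm, and `collapse_loops` and the example sequence are hypothetical.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[Tuple[str, ...], int]  # (stage pattern, repeat count)


def collapse_loops(stages: Sequence[str], pattern: Sequence[str]) -> List[Segment]:
    """Collapse consecutive repeats of `pattern` into (pattern, count) segments."""
    loop = tuple(pattern)
    size = len(loop)
    if size == 0:
        raise ValueError("pattern must not be empty")
    segments: List[Segment] = []
    i = 0
    while i < len(stages):
        count = 0
        # Count how many back-to-back copies of the pattern start at position i.
        while tuple(stages[i + count * size : i + (count + 1) * size]) == loop:
            count += 1
        if count > 0:
            segments.append((loop, count))
            i += count * size
        else:
            segments.append(((stages[i],), 1))
            i += 1
    return segments


stages = ["respond", "plan", "tool", "plan", "tool", "plan", "tool", "verify"]
summary = collapse_loops(stages, ["plan", "tool"])
for steps, count in summary:
    print(" → ".join(steps), f"×{count}")
```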

Threads

Pick a thread to see what happened

87 threads
Cost per run: $0.08
Monthly runs: 1
Monthly cost: $0.08
Operation path: 41 named tool/models
  1. Respond · claude-opus-4-5 · 7 tok · $0.00
  2. Respond · claude-opus-4-5 · 167 tok · $0.00
  3. Plan · claude-opus-4-5 · 76 tok · $0.00
  4. Tool · Run tool · 193 tok
  5. Plan · claude-opus-4-5 · 106 tok · $0.00
  6. Tool · Run tool · 165 tok
  7. Plan · claude-opus-4-5 · 92 tok · $0.00
  8. Tool · Run tool · 167 tok
  9. Plan · claude-opus-4-5 · 79 tok · $0.00
  10. Tool · Run tool · 151 tok
  11. Plan · claude-opus-4-5 · 504 tok · $0.01
  12. Tool · Run tool · 3 tok
  13. Plan · claude-opus-4-5 · 162 tok · $0.00
  14. Tool · Run tool · 3 tok
  15. Plan · claude-opus-4-5 · 115 tok · $0.00
  16. Tool · Run tool · 115 tok
  17. Plan · claude-opus-4-5 · 115 tok · $0.00
  18. Tool · Run tool · 121 tok
  19. Respond · claude-opus-4-5 · 394 tok · $0.01
  20. Plan · claude-opus-4-5 · 176 tok · $0.00
  21. Tool · Run tool · 30 tok
  22. Plan · claude-opus-4-5 · 176 tok · $0.00
  23. Tool · Run tool · 354 tok
  24. Plan · claude-opus-4-5 · 176 tok · $0.00
  25. Tool · Run tool · 67 tok
  26. Plan · claude-opus-4-5 · 176 tok · $0.00
  27. Tool · Run tool · 8 tok
  28. Respond · claude-opus-4-5 · 500 tok · $0.01
  29. Respond · claude-opus-4-5 · 373 tok · $0.01
  30. Plan · claude-opus-4-5 · 46 tok · $0.00
  31. Tool · Run tool · 5 tok
  32. Plan · claude-opus-4-5 · 146 tok · $0.00
  33. Tool · Run tool · 7 tok
  34. Plan · claude-opus-4-5 · 57 tok · $0.00
  35. Tool · Run tool · 56 tok
  36. Plan · claude-opus-4-5 · 146 tok · $0.00
  37. Tool · Run tool · 12 tok
  38. Plan · claude-opus-4-5 · 146 tok · $0.00
  39. Tool · Run tool · 13 tok
  40. Respond · claude-opus-4-5 · 229 tok · $0.00
  41. Tool · Run dataset_evaluation · 100 tok
  42. Verify · Imported benchmark outcome · 121 tok

The old plan/tool string was the normalized span order. Rows above use imported operation records; when a tool name is missing, the source only provided the normalized stage and operation label.

Snapshots
full_transcript · Snapshot 1 · imported

Hi! How can I help you today? Hi there! I’d like some help with a few of my upcoming reservations. I need to cancel two of them and change another one to a nonstop flight if that’…