Supernova | Dashboard


Accuracy reporting

Outcome signal coverage

Score source: reward

Outcome coverage: 100% (300/300 sessions have score evidence)
Selected-source coverage: 100% (reward)
Observed success: 56.7%
Low-confidence scores: 550 of 3,880 total score records
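The coverage figures above can be reproduced with a small tally. This is a minimal sketch that makes three assumptions. First, outcome coverage is the share of sessions with at least one score record. Second, selected-source coverage counts only records from the chosen source (here `reward`). Third, "low confidence" means a per-record confidence below a threshold. The `ScoreRecord` type, its `confidence` field, and the 0.5 threshold are hypothetical and do not describe the dashboard's actual schema.

```python
from dataclasses import dataclass


@dataclass
class ScoreRecord:
    # Hypothetical record shape; the dashboard's real schema is not shown.
    session_id: str
    source: str        # e.g. "reward"
    confidence: float  # assumed 0..1 confidence attached to the score


def coverage_summary(session_ids, records, source="reward", low_conf_threshold=0.5):
    """Coverage metrics under assumed definitions (see lead-in)."""
    sessions = set(session_ids)
    if not sessions:
        raise ValueError("no sessions to summarize")
    # Sessions that have any score evidence at all.
    scored = {r.session_id for r in records} & sessions
    # Sessions that have evidence from the selected source.
    from_source = {r.session_id for r in records if r.source == source} & sessions
    # Individual records whose confidence falls below the (assumed) threshold.
    low_conf = sum(1 for r in records if r.confidence < low_conf_threshold)
    return {
        "outcome_coverage": len(scored) / len(sessions),
        "source_coverage": len(from_source) / len(sessions),
        "low_confidence_records": low_conf,
        "total_records": len(records),
    }


sessions = ["s1", "s2", "s3", "s4"]
records = [
    ScoreRecord("s1", "reward", 0.9),
    ScoreRecord("s2", "reward", 0.3),
    ScoreRecord("s3", "judge", 0.8),
]
print(coverage_summary(sessions, records))
# {'outcome_coverage': 0.75, 'source_coverage': 0.5, 'low_confidence_records': 1, 'total_records': 3}
```

On the dashboard's own numbers, the same definitions would give 300/300 = 100% coverage.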

LLM judge

Workflow evaluation (estimate pending)

Inputs: workflow, judge model, sample size, threads

Total estimate: pending (judge model: Claude Sonnet 4.6)
Discovery: pending
Final run: pending
Range: pending

Selected score source

Outcome distribution

100% selected-source coverage

Success: 170
Failure: 130
Partial: 0
Unknown: 0

Analyzed sample
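The observed success rate of 56.7% matches 170 successes out of 300 analyzed sessions. The following is a minimal sketch of that arithmetic. It assumes the rate is successes divided by all analyzed sessions. Because Partial and Unknown are both 0 here, excluding them from the denominator would give the same result. The lowercase label strings are hypothetical.

```python
from collections import Counter

LABELS = ("success", "failure", "partial", "unknown")


def outcome_distribution(outcomes: list[str]) -> tuple[dict[str, int], float]:
    """Count outcome labels and compute success / all analyzed sessions."""
    counts = Counter(outcomes)
    # Counter returns 0 for labels that never occur, so every label is reported.
    distribution = {label: counts[label] for label in LABELS}
    total = sum(counts.values())
    success_rate = counts["success"] / total if total else 0.0
    return distribution, success_rate


dist, rate = outcome_distribution(["success"] * 170 + ["failure"] * 130)
print(dist)           # {'success': 170, 'failure': 130, 'partial': 0, 'unknown': 0}
print(f"{rate:.1%}")  # 56.7%
```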

Quality by workflow

5 workflows

Workflow | Workflow ID | Coverage | Success | Outcome gap | Top signal
Flight Reservation Modification | derived-flight-reservation-modification | 100% | 52.6% | 47.4 pp | Outcome cohort gap
Flight Cancellation Refund | derived-flight-cancellation-refund | 100% | 60.9% | 39.1 pp | Outcome cohort gap
Other | derived-other | 100% | 59.6% | 40.4 pp | Outcome cohort gap
Flight Booking | derived-flight-booking | 100% | 45.8% | 54.2 pp | Outcome cohort gap
Flight Delay Compensation | derived-flight-delay-compensation | 100% | 66.7% | 33.3 pp | Outcome cohort gap
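In every row of the quality table, the outcome gap equals 100 minus the success rate, in percentage points. For example, 100 - 52.6 = 47.4 pp. This sketch assumes that this relationship holds in general and ranks workflows by gap. The dictionary simply restates the table's success rates. The helper names are hypothetical.

```python
# Success rates (%) restated from the "Quality by workflow" table.
WORKFLOW_SUCCESS = {
    "Flight Reservation Modification": 52.6,
    "Flight Cancellation Refund": 60.9,
    "Other": 59.6,
    "Flight Booking": 45.8,
    "Flight Delay Compensation": 66.7,
}


def outcome_gap_pp(success_pct: float) -> float:
    """Gap to a perfect outcome rate, in percentage points (inferred definition)."""
    return round(100.0 - success_pct, 1)


def rank_by_gap(success_by_workflow: dict[str, float]) -> list[tuple[str, float]]:
    """Return (workflow, gap) pairs, largest gap first."""
    gaps = [(name, outcome_gap_pp(pct)) for name, pct in success_by_workflow.items()]
    return sorted(gaps, key=lambda item: item[1], reverse=True)


for name, gap in rank_by_gap(WORKFLOW_SUCCESS):
    print(f"{name}: {gap} pp")
# Flight Booking: 54.2 pp
# Flight Reservation Modification: 47.4 pp
# Other: 40.4 pp
# Flight Cancellation Refund: 39.1 pp
# Flight Delay Compensation: 33.3 pp
```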

Accuracy opportunities

What could improve outcomes

25 signals

Tool misuse

55 of 116 runs fail after incurring their full cost. Adding early-exit checks before heavy tool calls could quickly reduce this waste.

Next: Add a preflight prompt step to validate inputs before any tool calls run.

105 workflows
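The preflight suggestion amounts to failing fast: reject a run before its first expensive tool call when its inputs cannot succeed. This sketch replaces the suggested prompt step with a plain code check, which is only one possible way to do it. `modify_reservation`, its fields, and `PreflightError` are hypothetical.

```python
from typing import Any, Callable


class PreflightError(ValueError):
    """Raised when inputs fail validation, before any tool runs."""


def preflight(args: dict[str, Any], required: dict[str, type]) -> None:
    """Check that every required argument is present and has the expected type."""
    missing = [name for name in required if name not in args]
    if missing:
        raise PreflightError(f"missing inputs: {missing}")
    wrong = [name for name, typ in required.items() if not isinstance(args[name], typ)]
    if wrong:
        raise PreflightError(f"wrong input types: {wrong}")


def run_tool(tool: Callable[..., Any], args: dict[str, Any], required: dict[str, type]) -> Any:
    # Exit early: an invalid request never reaches the (costly) tool call.
    preflight(args, required)
    return tool(**args)


# Hypothetical heavy tool, used only for illustration.
def modify_reservation(reservation_id: str, new_date: str) -> str:
    return f"{reservation_id} moved to {new_date}"


REQUIRED = {"reservation_id": str, "new_date": str}
print(run_tool(modify_reservation, {"reservation_id": "ABC123", "new_date": "2025-07-01"}, REQUIRED))
# ABC123 moved to 2025-07-01
```

A prompt-based preflight would make the same decision with a model call instead. The code check is cheaper but catches only structural problems.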

Outcome cohort gap

47% of runs fail or only partially pass. Failed runs average 19.8 tool calls and 934 input tokens, versus 9.9 tool calls and 565 input tokens for successful runs, so failed runs make twice as many tool calls.

Next: Set a tool-call limit per run to stop runaway loops early.

55 workflows

Flight Reservation Modification: 47.4 pp gap, high risk, high confidence
Flight Cancellation Refund: 39.1 pp gap, medium risk, high confidence
Other: 40.4 pp gap, high risk, high confidence
Flight Booking: 54.2 pp gap, high risk, high confidence
Flight Delay Compensation: 33.3 pp gap, medium risk, high confidence
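One way to implement a per-run tool-call limit is a counter that refuses further calls once the budget is spent. This is a minimal sketch. `ToolCallBudget` is a hypothetical helper. The limit of 15 used below is illustrative only and is not a recommended value. It sits between the 9.9 and 19.8 average calls reported for successful and failed runs.

```python
from typing import Any, Callable


class ToolBudgetExceeded(RuntimeError):
    """Raised when a run tries to exceed its tool-call budget."""


class ToolCallBudget:
    def __init__(self, max_calls: int) -> None:
        if max_calls < 1:
            raise ValueError("max_calls must be at least 1")
        self.max_calls = max_calls
        self.calls = 0

    def call(self, tool: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        # Stop the run instead of letting a loop keep calling tools.
        if self.calls >= self.max_calls:
            raise ToolBudgetExceeded(f"tool-call limit of {self.max_calls} reached")
        self.calls += 1
        return tool(*args, **kwargs)


def lookup(n: int) -> int:  # stand-in for a real tool
    return n


budget = ToolCallBudget(max_calls=15)  # one budget object per run
try:
    for i in range(20):  # simulates a runaway loop
        budget.call(lookup, i)
except ToolBudgetExceeded as exc:
    print(exc, "after", budget.calls, "calls")
# tool-call limit of 15 reached after 15 calls
```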

Tool error loop

The agent retried a broken tool call 193 times across 2 threads without changing its approach, burning context and making no progress.

Next: Add a circuit breaker that stops retrying after 3 identical errors.

55 workflows

Flight Reservation Modification: 193 errors, high risk, high confidence
Flight Cancellation Refund: 122 errors, high risk, high confidence
Other: 89 errors, high risk, high confidence
Flight Booking: 23 errors, high risk, high confidence
Flight Delay Compensation: 54 errors, medium risk, high confidence
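A circuit breaker wraps the tool and counts consecutive identical failures. Once the count reaches the threshold of 3, as suggested for this signal, further calls fail immediately instead of hitting the broken tool again. This sketch makes two simplifying assumptions. It treats two errors as identical when their type and message match. It also stays open for the rest of the run, with no cooldown or half-open state. `IdenticalErrorBreaker` and `CircuitOpen` are hypothetical names.

```python
from typing import Any, Callable, Optional


class CircuitOpen(RuntimeError):
    """Raised instead of calling the tool once the breaker has tripped."""


class IdenticalErrorBreaker:
    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.last_error: Optional[tuple[str, str]] = None
        self.repeats = 0

    def call(self, tool: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        # Tripped: refuse without invoking the tool at all.
        if self.repeats >= self.threshold:
            raise CircuitOpen(f"stopped after {self.repeats} identical errors: {self.last_error}")
        try:
            result = tool(*args, **kwargs)
        except Exception as exc:
            signature = (type(exc).__name__, str(exc))
            if signature == self.last_error:
                self.repeats += 1        # same failure again
            else:
                self.last_error = signature
                self.repeats = 1         # a different failure restarts the count
            raise
        # Any success resets the streak.
        self.last_error = None
        self.repeats = 0
        return result


attempts = []


def broken_tool(x: int) -> int:  # always fails the same way
    attempts.append(x)
    raise ValueError("upstream returned 500")


breaker = IdenticalErrorBreaker(threshold=3)
for _ in range(5):
    try:
        breaker.call(broken_tool, 1)
    except CircuitOpen as exc:
        print("breaker open:", exc)
        break
    except ValueError:
        pass
print("tool invoked", len(attempts), "times")  # tool invoked 3 times
```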

Output contract mismatch

Users are fixing incomplete or incorrectly shaped responses from your assistant. This suggests the assistant isn't following its output rules consistently. A validator can catch these problems before users see them.

Next: Define the exact output shape (required fields, data types, and format rules).

55 workflows

Flight Reservation Modification: 8.6 pp corrected, medium risk, high confidence
Flight Cancellation Refund: 4.6 pp corrected, medium risk, high confidence
Other: 9.6 pp corrected, medium risk, high confidence
Flight Booking: 33.3 pp corrected, medium risk, high confidence
Flight Delay Compensation: 28.6 pp corrected, medium risk, high confidence
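A validator for the output contract runs on the assistant's reply before the user sees it. This minimal sketch assumes replies are JSON objects. The contract fields, the allowed status values, and `validate_reply` are hypothetical examples of "required fields, data types, and format rules". They do not describe the product's actual contract.

```python
import json

# Hypothetical output contract for a flight-change reply; field names are illustrative.
CONTRACT: dict[str, type] = {
    "reservation_id": str,
    "status": str,
    "refund_eligible": bool,
}
ALLOWED_STATUS = {"confirmed", "pending", "rejected"}


def validate_reply(raw: str) -> list[str]:
    """Return a list of contract violations; an empty list means the reply passes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["reply must be a JSON object"]

    problems = []
    # Required fields and data types.
    for field, expected in CONTRACT.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            problems.append(
                f"{field}: expected {expected.__name__}, got {type(data[field]).__name__}"
            )
    # Format rule: status must come from a fixed vocabulary.
    status = data.get("status")
    if isinstance(status, str) and status not in ALLOWED_STATUS:
        problems.append(f"status: {status!r} is not one of {sorted(ALLOWED_STATUS)}")
    return problems


print(validate_reply('{"reservation_id": "ABC123", "status": "done"}'))
# ['missing field: refund_eligible', "status: 'done' is not one of ['confirmed', 'pending', 'rejected']"]
```

If the list is non-empty, the reply could be regenerated or held back instead of being shown. Which of these to do is a design choice the dashboard does not specify.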