Failed-Run Review

Every failed run, reviewed. In 24 hours.

Thresor is the review layer for commercial cleaning robot fleets. We label every failed or ambiguous run, flag what needs to escalate, and send you a weekly summary of where your fleet is breaking — without asking you to change your stack.

SLA: 24 hours
Pilot scope: 4 weeks · up to 500 runs
Integration: Zero-change — you send, we return
The problem

Fleet review is eating your ops team.

Cleaning robots generate too many failed or unclear runs. Someone expensive inside your company is watching the videos, guessing at the labels, and losing the pattern.

× Current state
  • Engineers and deployment leads review video by hand
  • Labels are inconsistent run-to-run and week-to-week
  • Repeat failure modes go unidentified for weeks
  • Feedback loops to product and ops are slow and noisy
  • Nobody owns "is this a site issue or a software issue?"
✓ With Thresor
  • You forward failed runs to us — we do the review
  • Consistent rubric, gold-set, double-reviewed samples
  • Per-run labels returned within 24 hours
  • Weekly summary shows repeat modes and site patterns
  • Your team reads the report; we do the watching
How it works

Four steps. One SLA.

No new dashboards, no new API. Send us the runs you already flag; we return structured output in the format you already use.

01 · You send

Daily or batched export of failed or ambiguous runs: mission video or replay, basic event logs, task metadata, site and robot ID, and your SOP or review rubric (a minimal manifest sketch follows these steps).

02 · We review

Our review team watches the video, reads the logs, and applies structured labels against the agreed rubric. Unclear cases escalate to lead review.

03 · You receive

Per-run output: success/failure decision, failure type, severity, escalation flag, reviewer note, root-cause bucket. Returned in spreadsheet, CSV, Airtable, or your format.

04 · We summarize

Weekly failure summary: top modes, repeat sites, robots with unusual failure rates, operational vs. product patterns, and concise action recommendations.
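
The export format for step 01 is agreed at kickoff, not imposed. Purely as an illustration, a batched export might pair each run's video and logs with a small manifest like the sketch below; every field name here (mission_id, video, event_log, and so on) is hypothetical, not a required schema.

```python
# Hypothetical manifest for one run in a daily batched export.
# Field names and paths are illustrative only; the real schema is agreed at kickoff.
import json

run_manifest = {
    "mission_id": "M-40284",                   # your run / mission identifier
    "robot_id": "R-12",                        # robot that executed the run
    "site_id": "warehouse-east",               # hypothetical site identifier
    "flagged_as": "failed",                    # "failed" or "ambiguous"
    "video": "runs/M-40284/replay.mp4",        # mission video or replay
    "event_log": "runs/M-40284/events.jsonl",  # basic event log
    "task_metadata": {
        "task": "aisle_clean",
        "zone": "4B",
        "shift": "17:00-23:00",
    },
}

with open("M-40284.manifest.json", "w") as f:
    json.dump(run_manifest, f, indent=2)
```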

Deliverables

Structured data, on your schedule.

Every reviewed run returns a consistent row. Every Monday, a one-page failure summary lands in your inbox.

Sample · per-run output
thresor_week_17.csv
Mission  | Decision | Failure type            | Severity | Escalate | Note
M-40281  | Failure  | Blocked aisle           | Low      |          | Pallet left in zone 4B after 17:40 shift.
M-40284  | Failure  | Navigation failure      | High     | Yes      | Robot R-12 stuck at same dock pillar 3rd time this week.
M-40287  | Success  |                         |          |          | Initially flagged; review confirms complete clean.
M-40291  | Failure  | Mapping / site change   | Med      | Yes      | New rack layout in B-row; map needs refresh.
M-40293  | Unclear  | Needs technical review  | Med      | Yes      | LiDAR returns inconsistent — recommend on-site check.
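
Because the per-run output is plain tabular data, it drops into whatever your team already uses. As a minimal sketch, assuming the column headers shown in the sample above (your agreed delivery format may differ), the rows flagged for escalation could be pulled out like this:

```python
# Minimal sketch: list this week's escalated runs from the per-run export.
# Assumes the column headers shown in the sample; the agreed format may differ.
import csv

with open("thresor_week_17.csv", newline="") as f:
    rows = list(csv.DictReader(f))

escalations = [r for r in rows if r["Escalate"].strip().lower() == "yes"]

for r in escalations:
    print(f'{r["Mission"]}: {r["Failure type"]} ({r["Severity"]}) - {r["Note"]}')
```
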
01 · Per-run labels

Consistent, rubric-driven labels

Success/failure, failure type, severity, escalation flag, reviewer note, and root-cause bucket — in your format.

02 · Escalation flags

Know what needs a human today

High-severity, repeat-mode, and safety-relevant runs surface immediately so your on-call isn't reading every row.

03 · Weekly summary

The pattern, not the noise

Top failure modes, repeat sites, robots with unusual failure rates, and operational-vs-product signal. One page, Monday.
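
The weekly summary itself is written by reviewers, but the counts behind it are simple aggregates over the per-run rows. An illustrative sketch (not our internal tooling), assuming hypothetical Site and Robot columns carried through from the IDs you send with each run:

```python
# Illustrative roll-up of per-run rows into weekly-summary style counts.
# "Site" and "Robot" are hypothetical columns carried through from the data you send.
import csv
from collections import Counter

with open("thresor_week_17.csv", newline="") as f:
    failures = [r for r in csv.DictReader(f) if r["Decision"] == "Failure"]

top_modes = Counter(r["Failure type"] for r in failures).most_common(5)
repeat_sites = Counter(r["Site"] for r in failures).most_common(5)
per_robot = Counter(r["Robot"] for r in failures)

print("Top failure modes:", top_modes)
print("Repeat sites:", repeat_sites)
print("Failures per robot:", per_robot.most_common())
```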

Quality controls

Why the labels hold up.

Shared rubric

Agreed before launch, not after the first disagreement.

Gold set

Canonical examples for each failure type, reviewed together.

Double review

A sample of live work is reviewed twice; agreement rate tracked (see the sketch below).

Weekly calibration

30 minutes with your team to resolve edge cases and recalibrate.
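
How the agreement rate is defined is part of the rubric discussion; as one simple, illustrative reading, it can be tracked as plain percent agreement on the double-reviewed sample:

```python
# Illustrative percent-agreement tracking for double-reviewed runs.
# Each pair is (first reviewer's decision, second reviewer's decision) for the same run.
double_reviewed = [
    ("Failure", "Failure"),
    ("Failure", "Unclear"),
    ("Success", "Success"),
    ("Failure", "Failure"),
]

agreement_rate = sum(a == b for a, b in double_reviewed) / len(double_reviewed)
print(f"Agreement on sample: {agreement_rate:.0%}")  # 75% in this toy example
```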

Pricing

Start with a fixed-fee pilot.

One workflow, one SLA, one volume cap. If it doesn't save your team more hours than it costs, we shouldn't expand it.

4-week pilot
$7,500 – $10,000

Fixed fee. Scope and price are set before kickoff and don't change during the pilot.


  • Up to 500 failed or ambiguous runs reviewed
  • One task family — e.g. warehouse cleaning
  • 24-hour turnaround on standard queue
  • Shared rubric, gold set, calibration
  • Weekly failure summary
  • Delivery in spreadsheet, CSV, Airtable, or agreed format
Start a pilot
Post-pilot pricing is based on volume.

Tell us what's breaking.

30-minute working session: align on the first rubric, confirm data fields, pick a start date. No deck required.