Failed-Run Review

Every failed run, reviewed. In 24 hours.

Thresor is the review layer for commercial cleaning robot fleets. We label every failed or ambiguous run, flag what needs to escalate, and send you a weekly summary of where your fleet is breaking — without asking you to change your stack.

SLA: 24 hours
Pilot scope: 4 weeks · up to 500 runs
Integration: Zero-change — you send, we return
The problem

Fleet review is eating your ops team.

Cleaning robots generate too many failed or unclear runs. Someone expensive inside your company is watching the videos, guessing at the labels, and losing the pattern.

× Current state
  • Engineers and deployment leads review video by hand
  • Labels are inconsistent run-to-run and week-to-week
  • Repeat failure modes go unidentified for weeks
  • Feedback loops to product and ops are slow and noisy
  • Nobody owns "is this a site issue or a software issue?"
✓ With Thresor
  • You forward failed runs to us — we do the review
  • Consistent rubric, gold-set, double-reviewed samples
  • Per-run labels returned within 24 hours
  • Weekly summary shows repeat modes and site patterns
  • Your team reads the report; we do the watching
How it works

Four steps. One SLA.

No new dashboards, no new API. Send us the runs you already flag; we return structured output in the format you already use.

01 · You send

Daily or batched export of failed or ambiguous runs: mission video or replay, basic event logs, task metadata, site and robot ID, and your SOP or review rubric (a minimal manifest sketch follows these steps).

02 · We review

Our review team watches the video, reads the logs, and applies structured labels against the agreed rubric. Unclear cases escalate to lead review.

03 · You receive

Per-run output: success/failure decision, failure type, severity, escalation flag, reviewer note, root-cause bucket. Returned in spreadsheet, CSV, Airtable, or your format.

04 · We summarize

Weekly failure summary: top modes, repeat sites, robots with unusual failure rates, operational vs. product patterns, and concise action recommendations.
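
The export format for step 01 is agreed at kickoff, not imposed. Purely as an illustration, a batched export might pair each run's video and logs with a small manifest like the sketch below; every field name here (mission_id, video, event_log, and so on) is hypothetical, not a required schema.

```python
# Hypothetical manifest for one run in a daily batched export.
# Field names and paths are illustrative only; the real schema is agreed at kickoff.
import json

run_manifest = {
    "mission_id": "M-40284",                   # your run / mission identifier
    "robot_id": "R-12",                        # robot that executed the run
    "site_id": "warehouse-east",               # hypothetical site identifier
    "flagged_as": "failed",                    # "failed" or "ambiguous"
    "video": "runs/M-40284/replay.mp4",        # mission video or replay
    "event_log": "runs/M-40284/events.jsonl",  # basic event log
    "task_metadata": {
        "task": "aisle_clean",
        "zone": "4B",
        "shift": "17:00-23:00",
    },
}

with open("M-40284.manifest.json", "w") as f:
    json.dump(run_manifest, f, indent=2)
```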

Deliverables

Structured data, on your schedule.

Every reviewed run returns a consistent row. Every Monday, a one-page failure summary lands in your inbox.

Sample · per-run output
thresor_week_17.csv
Mission  | Decision | Failure type            | Severity | Escalate | Note
M-40281  | Failure  | Blocked aisle           | Low      |          | Pallet left in zone 4B after 17:40 shift.
M-40284  | Failure  | Navigation failure      | High     | Yes      | Robot R-12 stuck at same dock pillar 3rd time this week.
M-40287  | Success  |                         |          |          | Initially flagged; review confirms complete clean.
M-40291  | Failure  | Mapping / site change   | Med      | Yes      | New rack layout in B-row; map needs refresh.
M-40293  | Unclear  | Needs technical review  | Med      | Yes      | LiDAR returns inconsistent — recommend on-site check.
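
Because the per-run output is plain tabular data, it drops into whatever your team already uses. As a minimal sketch, assuming the column headers shown in the sample above (your agreed delivery format may differ), the rows flagged for escalation could be pulled out like this:

```python
# Minimal sketch: list this week's escalated runs from the per-run export.
# Assumes the column headers shown in the sample; the agreed format may differ.
import csv

with open("thresor_week_17.csv", newline="") as f:
    rows = list(csv.DictReader(f))

escalations = [r for r in rows if r["Escalate"].strip().lower() == "yes"]

for r in escalations:
    print(f'{r["Mission"]}: {r["Failure type"]} ({r["Severity"]}) - {r["Note"]}')
```
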
01 · Per-run labels

Consistent, rubric-driven labels

Success/failure, failure type, severity, escalation flag, reviewer note, and root-cause bucket — in your format.

02 · Escalation flags

Know what needs a human today

High-severity, repeat-mode, and safety-relevant runs surface immediately so your on-call isn't reading every row.

03 · Weekly summary

The pattern, not the noise

Top failure modes, repeat sites, robots with unusual failure rates, and operational-vs-product signal. One page, Monday.
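
The weekly summary itself is written by reviewers, but the counts behind it are simple aggregates over the per-run rows. An illustrative sketch (not our internal tooling), assuming hypothetical Site and Robot columns carried through from the IDs you send with each run:

```python
# Illustrative roll-up of per-run rows into weekly-summary style counts.
# "Site" and "Robot" are hypothetical columns carried through from the data you send.
import csv
from collections import Counter

with open("thresor_week_17.csv", newline="") as f:
    failures = [r for r in csv.DictReader(f) if r["Decision"] == "Failure"]

top_modes = Counter(r["Failure type"] for r in failures).most_common(5)
repeat_sites = Counter(r["Site"] for r in failures).most_common(5)
per_robot = Counter(r["Robot"] for r in failures)

print("Top failure modes:", top_modes)
print("Repeat sites:", repeat_sites)
print("Failures per robot:", per_robot.most_common())
```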

Quality controls

Why the labels hold up.

Shared rubric

Agreed before launch, not after the first disagreement.

Gold set

Canonical examples for each failure type, reviewed together.

Double review

A sample of live work is reviewed twice; agreement rate tracked (see the sketch below).

Weekly calibration

30 minutes with your team to resolve edge cases and recalibrate.
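
How the agreement rate is defined is part of the rubric discussion; as one simple, illustrative reading, it can be tracked as plain percent agreement on the double-reviewed sample:

```python
# Illustrative percent-agreement tracking for double-reviewed runs.
# Each pair is (first reviewer's decision, second reviewer's decision) for the same run.
double_reviewed = [
    ("Failure", "Failure"),
    ("Failure", "Unclear"),
    ("Success", "Success"),
    ("Failure", "Failure"),
]

agreement_rate = sum(a == b for a, b in double_reviewed) / len(double_reviewed)
print(f"Agreement on sample: {agreement_rate:.0%}")  # 75% in this toy example
```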

Pricing

Start with a fixed-fee pilot.

One workflow, one SLA, one volume cap. If it doesn't save your team more hours than it costs, we shouldn't expand it.

4-week pilot
$7,500 – $10,000

Fixed fee. Scope and price are set before kickoff and don't change during the pilot.


  • Up to 500 failed or ambiguous runs reviewed
  • One task family — e.g. warehouse cleaning
  • 24-hour turnaround on standard queue
  • Shared rubric, gold set, calibration
  • Weekly failure summary
  • Delivery in spreadsheet, CSV, Airtable, or agreed format
Start a pilot
Post-pilot pricing is based on volume.

Tell us what's breaking.

30-minute working session: align on the first rubric, confirm data fields, pick a start date. No deck required.