Handoff Automation: When Rooms Run Themselves

Agent-house has seven rooms. They pass work through handoffs. A room finishes, writes down what it found, and the next room picks it up.

That works. But the person still has to read each handoff, decide which room goes next, open it, paste the task, and wait. That is fine for one chain. It does not scale to ten.

The experiment asks: what if the rooms could run the chain themselves?

Handoff automation is a separate project that builds a scheduling and execution board on top of the agent-house room definitions. It reads the same room structure, starts the same tmux sessions, and collects the same results — without a person deciding each next step.
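The post says the automation starts the same tmux sessions the rooms already use, and that closeout kills them. It does not say how sessions are named or what command a room runs. The sketch below is one way to build those tmux calls. The `runN-room` naming scheme, the room directory layout, and the helper names are all hypothetical. The tmux flags themselves are standard: `new-session -d -s NAME -c DIR`, and `kill-session -t =NAME` for an exact-name match.

```python
import shlex
import subprocess
from typing import List

# Hypothetical helpers: the session naming scheme and the room command are
# assumptions; only the tmux subcommands and flags are real.


def session_name(run_id: int, room: str) -> str:
    """Assumed naming scheme: one tmux session per room per run."""
    return f"run{run_id}-{room}"


def start_cmd(run_id: int, room: str, room_dir: str, command: str) -> List[str]:
    """argv that starts a detached (-d) tmux session in the room's directory (-c)."""
    return ["tmux", "new-session", "-d", "-s", session_name(run_id, room),
            "-c", room_dir, command]


def closeout_cmds(run_id: int, rooms: List[str]) -> List[List[str]]:
    """argv lists that kill every room session of a finished run.

    The leading '=' asks tmux for an exact session-name match, so a target
    like run1-build cannot prefix-match run1-build-studio.
    """
    return [["tmux", "kill-session", "-t", "=" + session_name(run_id, r)]
            for r in rooms]


def execute(argv: List[str], dry_run: bool = True) -> None:
    """Print the command in dry-run mode; otherwise run it and fail loudly."""
    if dry_run:
        print(shlex.join(argv))
    else:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    rooms = ["product-design", "build-studio", "quality-gate"]
    for room in rooms:
        # "agent" is a placeholder for whatever command a room actually runs.
        execute(start_cmd(1, room, f"rooms/{room}", "agent"))
    for argv in closeout_cmds(1, rooms):
        execute(argv)
```

Dry-run mode is the default on purpose. A scheduler that can kill sessions should show its commands before it is trusted to run them.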

The core is a SQLite board with four tables:

runs track the whole pipeline: what task, which rooms in what order, how many repair loops, and the final status.
tasks track each room's assignment: who claimed it, what attempt number, and where the results live.
defects record what quality gate found wrong and what fix was required.
task_links keep the dependency graph straight so rooms run in order.
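The post names the four tables and what each one tracks, but not their columns. A minimal sketch of such a board using Python's standard-library `sqlite3` is below. Every column name, the status values, and the `ready_tasks` helper are hypothetical. The point is how `task_links` lets the board decide which task may run next.

```python
import sqlite3
from typing import List

# Hypothetical schema: the table names come from the post, the columns do not.
SCHEMA = """
CREATE TABLE runs (
    id           INTEGER PRIMARY KEY,
    task         TEXT NOT NULL,               -- the original request
    room_order   TEXT NOT NULL,               -- e.g. product-design,build-studio,quality-gate
    repair_loops INTEGER NOT NULL DEFAULT 0,
    status       TEXT NOT NULL DEFAULT 'running'
);
CREATE TABLE tasks (
    id          INTEGER PRIMARY KEY,
    run_id      INTEGER NOT NULL REFERENCES runs(id),
    room        TEXT NOT NULL,
    claimed_by  TEXT,                         -- session that claimed the task
    attempt     INTEGER NOT NULL DEFAULT 1,
    result_path TEXT,                         -- where the room's results live
    status      TEXT NOT NULL DEFAULT 'pending'
);
CREATE TABLE defects (
    id           INTEGER PRIMARY KEY,
    task_id      INTEGER NOT NULL REFERENCES tasks(id),
    finding      TEXT NOT NULL,               -- what quality gate found wrong
    required_fix TEXT NOT NULL
);
CREATE TABLE task_links (
    parent_id INTEGER NOT NULL REFERENCES tasks(id),  -- must finish first
    child_id  INTEGER NOT NULL REFERENCES tasks(id),  -- waits on the parent
    PRIMARY KEY (parent_id, child_id)
);
"""


def open_board(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the board and enforce foreign keys."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def ready_tasks(conn: sqlite3.Connection, run_id: int) -> List[int]:
    """Pending tasks in a run whose parents (via task_links) are all done."""
    rows = conn.execute(
        """
        SELECT t.id FROM tasks t
        WHERE t.run_id = ? AND t.status = 'pending'
          AND NOT EXISTS (
              SELECT 1 FROM task_links l
              JOIN tasks p ON p.id = l.parent_id
              WHERE l.child_id = t.id AND p.status != 'done')
        ORDER BY t.id
        """,
        (run_id,),
    )
    return [row[0] for row in rows]
```

Promoting the next task then amounts to marking a finished task `done` and calling `ready_tasks` again. The `NOT EXISTS` join over `task_links` is what keeps build-studio from starting before product-design finishes.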

The dispatcher reads the board, routes tasks to the right room, waits for the room to finish, checks the result contract, and promotes the next task. If quality gate finds defects, it creates repair tasks and sends them back to the owning room. That loop can repeat until the work passes or the board decides the chain is blocked.
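The routing and repair loop can be shown without the board or tmux. In the sketch below, each room is a plain callable, quality gate returns defects tagged with the room that owns them, and the dispatcher sends each defect back to its owner before rechecking. The `Defect`/`Result` types, the log format, and the `max_repairs` cap are hypothetical. The post says only that the board eventually decides a chain is blocked.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Defect:
    owner: str    # room that must repair the defect
    finding: str  # what quality gate found wrong


@dataclass
class Result:
    ok: bool = True                                      # passed the contract check
    defects: List[Defect] = field(default_factory=list)  # filled only by quality gate


# A room receives the task plus any defects it must fix, and returns a Result.
Room = Callable[[str, List[Defect]], Result]


def run_chain(task: str, order: List[str], rooms: Dict[str, Room],
              gate: str = "quality-gate", max_repairs: int = 3) -> Tuple[str, List[str]]:
    """Run rooms in order and loop quality-gate defects back to their owners."""
    log: List[str] = []
    repairs = 0
    for room in order:
        result = rooms[room](task, [])
        # Repair loop: only the gate room produces defects.
        while room == gate and result.defects:
            if repairs == max_repairs:
                log.append(f"{gate}: blocked after {repairs} repairs")
                return "blocked", log
            repairs += 1
            for defect in result.defects:
                log.append(f"{gate} -> {defect.owner}: {defect.finding}")
                fix = rooms[defect.owner](task, [defect])  # repair task for the owner
                if not fix.ok:
                    log.append(f"{defect.owner}: repair failed")
                    return "blocked", log
            result = rooms[gate](task, [])  # recheck after the repairs
        if not result.ok:
            log.append(f"{room}: failed contract check")
            return "blocked", log
        log.append(f"{room}: done")
    return "awaiting-closeout", log
```

Replaying the vendor-risk test with stub rooms: quality gate reports one accessibility defect owned by build-studio, build-studio repairs it, and the recheck passes. The run ends in `awaiting-closeout` after one repair loop, which matches the manual proof described below.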

The rooms themselves know nothing about the automation. They receive the same task shape they always do, produce the same ROOM_RESULT.json contract, and write the same handoff files. The automation is a scheduler, not a rewrite.
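"Checks the result contract" could be as simple as the sketch below. The post names ROOM_RESULT.json and HANDOFF.md but does not publish the contract's fields, so the required keys (`room`, `status`, `summary`, `artifacts`) and the allowed status values are assumptions chosen for illustration.

```python
import json
from pathlib import Path
from typing import List

# Assumed contract: these keys and status values are illustrative, not the real schema.
REQUIRED_KEYS = {"room": str, "status": str, "summary": str, "artifacts": list}
ALLOWED_STATUS = {"done", "blocked"}


def check_room_result(run_dir: Path) -> List[str]:
    """Return contract problems for one room's output; an empty list means it passes."""
    path = run_dir / "ROOM_RESULT.json"
    if not path.is_file():
        return ["ROOM_RESULT.json missing"]
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"ROOM_RESULT.json is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["ROOM_RESULT.json must be a JSON object"]
    problems = []
    for key, kind in REQUIRED_KEYS.items():
        if not isinstance(data.get(key), kind):
            problems.append(f"{key}: expected {kind.__name__}")
    if data.get("status") not in ALLOWED_STATUS:
        problems.append(f"status: must be one of {sorted(ALLOWED_STATUS)}")
    # The handoff file travels with the result; the next room reads it.
    if not (run_dir / "HANDOFF.md").is_file():
        problems.append("HANDOFF.md missing")
    return problems
```

Returning a list of problems rather than raising lets the dispatcher log every contract violation at once. The rooms themselves never call this; only the scheduler does.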

A manual test proved the concept. A vendor-risk intake prototype went through product-design → build-studio → quality-gate. Quality gate found an accessibility defect. The dispatcher sent it back to build-studio for repair. Build-studio fixed it. Quality gate rechecked and passed.

One repair loop. No human needed between rooms. The chain ran start to finish, produced a working HTML prototype, logged every finding, and waited for closeout approval.

That is the line between useful and hobby.

Useful is: a task lands, rooms execute in order, defects get repaired, results land in a known location, closeout kills the sessions automatically. Hobby is: each chain is a manual ceremony where the same person plays receptionist, dispatcher, and auditor every time.

The code is written. The board works. The dispatcher runs. The manual proof passed.

What is not yet live-tested is the first fully automated run where the dispatcher manages live room sessions from start to finish without a person watching. That is the next step.

Not because the pattern is uncertain. The pattern was proven the first time quality gate sent a defect back and build-studio fixed it without a person forwarding the message.

Because automating a system that already works requires respect for what works. One careful run. Then ten.

Lab update — June 16, 2026

The command center has shipped several upgrades since this post:

Goal audit layer. Each run now creates a parent goal with per-room child goals, required criteria (valid ROOM_RESULT.json, readable HANDOFF.md, no blockers), and concrete evidence mapped to each criterion. A failed required criterion blocks the run. Every final package includes GOAL_AUDIT.json and GOALS.md.
SMART criteria validation. Goals use specific, measurable criteria with evidence types, severity levels, and an automated audit. The audit accounts for Quality Gate repair loops, so a defect that is still being repaired does not prematurely fail the parent goal.
Findings-first reports. Final reports now surface findings and decisions before the full log, so a human can close out a run after scanning a few lines instead of reading the whole transcript.
Interactive dashboard. The live dashboard got a skimmable process board with nvim-style visuals, persistent pane keys, post-run actions (evidence, verify, accept risk, close), and smoother rendering with less jitter.
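The goal audit's roll-up rule, with required failures blocking the run but an open repair loop not failing the parent early, can be sketched like this. The class names, the status strings, and the `repair_pending` flag are hypothetical, as is the exact shape of the audit output. The post does not describe the GOAL_AUDIT.json format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Criterion:
    name: str              # e.g. "valid ROOM_RESULT.json"
    required: bool = True  # only required criteria can block the run
    passed: bool = False
    evidence: str = ""     # file path or note backing the verdict


@dataclass
class RoomGoal:
    room: str
    criteria: List[Criterion] = field(default_factory=list)
    repair_pending: bool = False  # quality gate sent work back; the fix is in flight

    def status(self) -> str:
        failing = [c for c in self.criteria if c.required and not c.passed]
        if not failing:
            return "met"
        # An open repair loop is not treated as a final failure yet.
        return "repairing" if self.repair_pending else "failed"


def audit(children: List[RoomGoal]) -> Dict[str, object]:
    """Roll per-room child goals up into one parent verdict."""
    statuses = {g.room: g.status() for g in children}
    if "failed" in statuses.values():
        parent = "blocked"      # a failed required criterion blocks the run
    elif "repairing" in statuses.values():
        parent = "in-progress"  # wait for the repair loop to finish
    else:
        parent = "met"
    return {"parent": parent, "children": statuses}
```

Serializing the returned dict with `json.dumps(..., indent=2)` would give something in the spirit of GOAL_AUDIT.json, though the real file's fields are not documented here.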

Still no fully automated end-to-end run — the dispatcher works, the board works, the manual proof passed. The next step remains the same: one careful fully automated run. Then ten.

Lab update — June 18, 2026

The command center has shipped two major upgrades since the June 16 update:

Model Router. The room system now classifies each task at spawn time with a zero-token keyword lookup and routes the orchestrator and every worker to the model best suited for their role — Codex for building/review, DeepSeek for research, GLM for opportunity analysis and brand work. The router writes a models.resolved.json map to every run directory, and the orchestrator can inspect or override any worker's model mid-run via a route_workers tool or /model-router command. A fallback chain (Codex → DeepSeek → GLM → Kimi) auto-recovers from model failure without killing the run.
Triage Room (Kill Dashboard). This new room inside command-center makes opportunity-card triage systematic. It scans Obsidian opportunity cards, records advance/research/kill/defer decisions to a SQLite board, renders a triage grid, generates weekly reports with ship/kill/stale analytics, and tombstones killed cards so downstream systems (the dreaming engine) stop re-proposing them.
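A zero-token keyword router with a fallback chain needs only a lookup table and a retry loop. The sketch below uses the role-to-model mapping and chain order from the update (Codex for building and review, DeepSeek for research, GLM for opportunity and brand work, then Kimi). The keyword sets, the default route, the rule of trying the rest of the chain in order, and the `spawn` callback are assumptions.

```python
import re
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical keyword table: the model mapping follows the post, the keywords do not.
ROUTES = [
    ("research", "deepseek", {"research", "investigate", "compare", "sources"}),
    ("opportunity", "glm", {"opportunity", "market", "brand", "positioning"}),
    ("build", "codex", {"build", "implement", "fix", "review", "test"}),
]
DEFAULT = ("default", "codex")  # assumed route when no keyword matches
FALLBACK_CHAIN: List[str] = ["codex", "deepseek", "glm", "kimi"]


def classify(task: str) -> Tuple[str, str]:
    """Pick (category, model) by counting keyword hits; no model call involved."""
    words = set(re.findall(r"[a-z]+", task.lower()))
    category, model, keywords = max(ROUTES, key=lambda r: len(words & r[2]))
    return (category, model) if words & keywords else DEFAULT


def spawn_with_fallback(model: str, spawn: Callable[[str], T]) -> Tuple[str, T]:
    """Try the routed model, then the rest of the chain in order.

    `spawn` stands in for whatever launches a worker; it is assumed to raise
    RuntimeError when the model fails.
    """
    candidates = [model] + [m for m in FALLBACK_CHAIN if m != model]
    for candidate in candidates:
        try:
            return candidate, spawn(candidate)
        except RuntimeError:
            continue  # recover by moving down the chain instead of killing the run
    raise RuntimeError("every model in the fallback chain failed")
```

Because `classify` is a set intersection, routing costs no tokens. The run only pays for a model once `spawn` succeeds, and a worker that fails on Codex quietly lands on DeepSeek instead of stopping the run.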

The agent CLI also grew interactive model selection at room entry, a switch-worker subcommand for live model changes, and better auto-recovery when a worker fails to spawn on its assigned model.