How the Dreaming Engine Learned to Score, Remember, and Self-Correct

The lab runs a dreaming engine. Every day it reads goals, sessions, projects, and source intake, then writes a dream diary, proposes ideas, and sends a digest. The output lands in Obsidian and, for urgent items, in Telegram.

The problem was that the engine had no memory. It proposed the same ideas in different words week after week. It never learned which proposals had been built, killed, or shipped. Its "promotion" system searched for the literal phrase "high confidence" in the model output. Its cards sat at status: review forever. And if a quiet week passed with few completed goals, the engine just skipped the whole run.

That felt like a creature grooming its shell without ever crossing open ground. The engine was producing proposals. It was not calibrating against reality.

So we rebuilt the pipeline across four tiers.

T1: The engine learned what it already said

The first change was ingestion. The prompt now receives the last 20 proposal titles, each with its latest status (built, killed, or stale), plus an instruction: do not re-propose killed ideas; evolve in-progress ones. A portfolio snapshot injects counts of active projects, open opportunity cards, WIP goals, and recent kills. The model now knows what capacity exists before it proposes anything.
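As a rough illustration of that ingestion step, the history and portfolio sections could be rendered along these lines. This is a minimal sketch, not the engine's actual code: the names PastProposal, render_history, and render_portfolio are hypothetical, and the exact prompt wording is assumed.

```python
from dataclasses import dataclass


@dataclass
class PastProposal:
    title: str
    status: str  # "built", "killed", or "stale"


def render_history(proposals: list[PastProposal], limit: int = 20) -> str:
    """Render the most recent proposals as a prompt section."""
    recent = proposals[-limit:]  # assumes the list is ordered oldest to newest
    lines = ["Earlier proposals. Do not re-propose killed ideas; evolve in-progress ones."]
    lines += [f"- [{p.status}] {p.title}" for p in recent]
    return "\n".join(lines)


def render_portfolio(active_projects: int, open_cards: int,
                     wip_goals: int, recent_kills: int) -> str:
    """Render the capacity counts the model sees before it proposes anything."""
    return (
        f"Portfolio: {active_projects} active projects, "
        f"{open_cards} open opportunity cards, "
        f"{wip_goals} WIP goals, {recent_kills} recent kills."
    )
```

Both strings would be placed in the prompt ahead of the request for new proposals, so the model reads its own track record first.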

Proposals must name a concrete artifact, a user it serves, and a falsifiable success signal. Project paths are config-driven instead of hardcoded. Mission context is protected when context overflows — low-value intake gets cut first. And quiet weeks no longer kill the run; the engine drops into low-activity mode and biases toward finishing and maintenance.
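One way to protect mission context on overflow is to drop the lowest-priority sections first and never touch protected ones. The sketch below assumes a character budget and a per-section priority; Section, fit_to_budget, and both fields are hypothetical names, and the real engine may measure its budget in tokens instead.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    text: str
    priority: int            # higher means more valuable
    protected: bool = False  # protected sections (e.g. mission) are never dropped


def fit_to_budget(sections: list[Section], max_chars: int) -> list[Section]:
    """Drop unprotected sections, lowest priority first, until the total fits."""
    kept = list(sections)
    droppable = sorted((s for s in kept if not s.protected), key=lambda s: s.priority)
    for victim in droppable:
        if sum(len(s.text) for s in kept) <= max_chars:
            break
        kept.remove(victim)
    return kept  # original order of the surviving sections is preserved
```

With a 100-character mission (protected), a 500-character intake dump at priority 1, and 200 characters of goals at priority 5, a 350-character budget drops only the intake.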

T2: Real scoring replaced keyword grep

The structural changes were the biggest leap. The model now emits structured JSON proposals with fields for title, insight, enablers, first steps, confidence, effort, impact, and novelty. A real score replaces the old "promotion" grep: mean(confidence, impact, novelty, 1 - effort), so lower effort raises the score. Cards that score above the configured promotion_threshold get promoted to Memory.md.
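The scoring formula is small enough to write out. This sketch assumes all four fields are normalized to the range 0 to 1, which the post does not state; the function names are illustrative.

```python
from statistics import mean


def score(confidence: float, impact: float, novelty: float, effort: float) -> float:
    """mean(confidence, impact, novelty, 1 - effort); assumes inputs in [0, 1]."""
    return mean([confidence, impact, novelty, 1.0 - effort])


def should_promote(card_score: float, promotion_threshold: float) -> bool:
    """Promote only cards that score strictly above the configured threshold."""
    return card_score > promotion_threshold
```

For example, confidence 0.8, impact 0.6, novelty 0.4, and effort 0.2 give (0.8 + 0.6 + 0.4 + 0.8) / 4 = 0.65.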

An accept subcommand links proposals to goals — writing goal_id into the card frontmatter without touching the goal pipeline. A standalone dream-backflow script syncs card status from goal outcomes, so proposals learn whether they shipped or stalled. The backlog now ranks cards using the configured weights that were always defined but never applied.
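The backflow step can be pictured as a one-way sync from goals to cards. The sketch below is an assumption-heavy illustration: the goal outcome names ("done", "abandoned", "stalled"), the mapping table, and the dict-shaped cards are all hypothetical, since the post does not describe the goal pipeline's vocabulary.

```python
# Hypothetical mapping from goal outcomes to card statuses.
GOAL_TO_CARD_STATUS = {"done": "built", "abandoned": "killed", "stalled": "stale"}


def backflow(cards: list[dict], goal_outcomes: dict[str, str]) -> int:
    """Update card status from linked goal outcomes; return how many cards changed."""
    changed = 0
    for card in cards:
        outcome = goal_outcomes.get(card.get("goal_id"))  # None if card was never accepted
        new_status = GOAL_TO_CARD_STATUS.get(outcome)
        if new_status and card.get("status") != new_status:
            card["status"] = new_status
            changed += 1
    return changed
```

Cards without a goal_id are left alone, which matches the idea that only accepted proposals are linked to goals.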

Old cards auto-archive after 14 days. Dreams.md rotates when it passes 150k characters. CrossRef queries are clean and unambiguous instead of picking up random headers.
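The two housekeeping rules reduce to simple threshold checks. In this sketch the constants come from the post, while the function names and the choice of a strict "more than 14 days" comparison are assumptions.

```python
from datetime import date

ARCHIVE_AFTER_DAYS = 14
ROTATE_AT_CHARS = 150_000


def should_archive(created: date, today: date) -> bool:
    """True once a card is more than 14 days old (strictness is an assumption)."""
    return (today - created).days > ARCHIVE_AFTER_DAYS


def should_rotate(dreams_text: str) -> bool:
    """True once Dreams.md has passed 150k characters."""
    return len(dreams_text) > ROTATE_AT_CHARS
```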

T3: Interaction without leaving the session

Before the upgrade, the only way to interact with a proposal was to read the Telegram digest and then open the card in Obsidian. Now you stay in the session.

/dream proposal N surfaces any card with its score and body. /dream kill N marks a card as killed with a reason — the next dream run will not re-propose it. /dream refine N re-runs the full engine in evolve mode targeting that specific card. Four modes are available: evolve (polish an existing idea), contrarian (force one idea against current direction), kill-list (propose what to stop), and net-new (default invention).

Telegram digests now include title, one-line insight, effort/impact badges, and score. The engine scales from 1 to 5 proposals based on context richness. One slot per cycle is reserved for an out-of-focus domain from a rotatable list — so proposals do not cluster on the same topics every day.
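How the engine measures context richness is not described here, so the following is only a sketch under a stated assumption: richness is a number between 0 and 1, and the out-of-focus list rotates round-robin by cycle index. Both function names are hypothetical.

```python
def proposal_count(richness: float, lo: int = 1, hi: int = 5) -> int:
    """Map a richness score in [0, 1] to a proposal count between lo and hi."""
    clamped = min(max(richness, 0.0), 1.0)
    return lo + round(clamped * (hi - lo))


def out_of_focus_domain(domains: list[str], cycle: int) -> str:
    """Pick the reserved-slot domain by rotating through the list once per cycle."""
    return domains[cycle % len(domains)]
```

Rotating by cycle index means every domain on the list gets its turn before any repeats.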

T4: Hardening what runs

The final tier was operations. A model failure no longer kills the whole run. The engine falls back to a configured model_fallback, records the failure, and continues. Temperature is configurable. Cost and token telemetry flows into state.json.
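The fallback behaviour can be sketched as a single try/except around the model call. Here call_model is a hypothetical callable taking (model, prompt), and the model_failures key in the state dict is an assumed name, not the engine's documented state.json layout.

```python
from typing import Callable


def generate_with_fallback(
    call_model: Callable[[str, str], str],
    prompt: str,
    model: str,
    model_fallback: str,
    state: dict,
) -> str:
    """Try the primary model; on failure, record it and retry on the fallback."""
    try:
        return call_model(model, prompt)
    except Exception as exc:
        # Record the failure so the run's state shows what went wrong.
        state.setdefault("model_failures", []).append(
            {"model": model, "error": str(exc)}
        )
        return call_model(model_fallback, prompt)
```

If the fallback also fails, the exception propagates; whether the real engine handles that case differently is not stated.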

Outcome telemetry lives in a new SQLite database (dreaming.db) that tracks runs, cards, and card events. A /dream calibrate command emits ship, kill, stale, and review rates — so you can see whether the engine is producing good ideas or noise. The prompt auto-tunes: if the kill rate exceeds 60%, a grounding directive gets injected into the next run.
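A calibration query plus the auto-tune rule might look like the sketch below. The 60% limit is from the post; everything else is assumed for illustration, including a cards table with a status column, the status values, and the wording of the grounding directive.

```python
import sqlite3

KILL_RATE_LIMIT = 0.60
# Hypothetical wording; the actual directive text is not given in the post.
GROUNDING_DIRECTIVE = "Ground every proposal in an existing project or goal."
STATUSES = ("built", "killed", "stale", "review")


def calibrate(conn: sqlite3.Connection) -> dict[str, float]:
    """Return the share of cards in each status (assumes a cards.status column)."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM cards GROUP BY status"
    ).fetchall()
    total = sum(count for _, count in rows)
    rates = {status: 0.0 for status in STATUSES}
    for status, count in rows:
        if status in rates:
            rates[status] = count / total
    return rates


def tune_prompt(prompt: str, rates: dict[str, float]) -> str:
    """Prepend the grounding directive when the kill rate exceeds 60%."""
    if rates["killed"] > KILL_RATE_LIMIT:
        return f"{GROUNDING_DIRECTIVE}\n\n{prompt}"
    return prompt
```

With 7 of 10 cards killed, the kill rate is 0.7 and the directive is injected; at exactly 0.6 it is not.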

Every piece of ingested context (goals, sessions, inbox items, research notes) is wrapped in a BEGIN/END UNTRUSTED CONTEXT sandwich before reaching the model. Same-day re-runs are idempotent: the daily file overwrites cleanly, Dreams.md replaces its section instead of appending, and new cards deduplicate against today's existing cards by title similarity.
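Both ideas in this paragraph fit in a few lines. The sketch below assumes a particular marker layout and uses difflib's SequenceMatcher with a 0.85 threshold for title similarity; the engine's real marker format, similarity measure, and threshold are not specified in the post.

```python
from difflib import SequenceMatcher


def wrap_untrusted(label: str, text: str) -> str:
    """Wrap ingested text so the model can tell it is data, not instructions."""
    return (
        f"BEGIN UNTRUSTED CONTEXT ({label})\n"
        f"{text}\n"
        f"END UNTRUSTED CONTEXT ({label})"
    )


def is_duplicate(title: str, existing_titles: list[str], threshold: float = 0.85) -> bool:
    """True if title is near-identical to any of today's existing card titles."""
    norm = title.casefold().strip()
    return any(
        SequenceMatcher(None, norm, other.casefold().strip()).ratio() >= threshold
        for other in existing_titles
    )
```

Normalizing case and whitespace before comparing keeps a re-run from creating a second card that differs only in capitalization.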

84 tests that never touch the vault

The whole upgrade shipped with a hermetic pytest suite — 84 tests that patch every engine path onto a per-test temp tree. No tests hit /root/Obsidian or the live config. They run in under a second. The engine went from ~640 lines to ~1,400, but the test suite is the real measure of whether the system is safe to change.
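The hermetic pattern looks roughly like this. It is a toy sketch, not a test from the suite: the engine namespace, VAULT_DIR, and write_daily are hypothetical stand-ins, and it uses unittest.mock.patch.object for the redirection, though pytest's monkeypatch fixture would serve the same purpose.

```python
from pathlib import Path
from types import SimpleNamespace
from unittest.mock import patch

# Hypothetical stand-in for the engine's path config.
engine = SimpleNamespace(VAULT_DIR=Path("/root/Obsidian"))


def write_daily(name: str, body: str) -> Path:
    """Toy engine function: write a daily file under the configured vault."""
    target = engine.VAULT_DIR / "Dreams" / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(body, encoding="utf-8")
    return target


def test_daily_file_lands_in_temp_tree(tmp_path):
    # tmp_path is pytest's built-in per-test temporary directory fixture.
    with patch.object(engine, "VAULT_DIR", tmp_path):
        written = write_daily("day.md", "dream")
    assert tmp_path in written.parents          # nothing escaped the temp tree
    assert written.read_text(encoding="utf-8") == "dream"
    assert engine.VAULT_DIR == Path("/root/Obsidian")  # patch was undone
```

Because the patch is scoped to the with block, the live path is restored even if the test fails partway through.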

The dream diary from the first post-upgrade run caught the shift: "The agent is no longer exploring the cave. It is furnishing the rooms." The engine still produces proposals. But now it remembers what it proposed last week, knows whether it shipped, injects a grounding directive when too many of its proposals get killed, and never re-proposes what you already said no to.

The system is no longer grooming its shell. It is crossing open ground.