A unified UI for three agent engine layers.
A single-pane Next.js cockpit for the work the Marketing Ops Agent used to do as a stack of Claude skills. Three engines — Identity (dedup, title hygiene, HubSpot orphans), Attribution (24-rule LeadSource decision tree), Sync (post-merge regression detection) — share a gather → reason → review → apply → log → eval → scan pattern. Python pipeline drives state, Next.js drives the review surface, JSONL drives the audit trail and the eval harness. The continuous learning loop is the durable part: when a rule-action-confidence stratum hits >95% agreement for two weeks, it graduates from supervised to auto-apply.
The skill bundle ran my RevOps for months. Attribution at 95%, dedup at 35+ rules, hygiene at 54% auto-fix, a weekly refresh handling the operational tax. The numbers were real. The workflow wasn't.
Five skills meant five places to look. Five places state lived, five places audit got written. Cron-style scheduling against Claude skills is awkward in the first place, and 'what rules am I overriding the most this week' was a question I could only answer by grepping memory at the terminal. The next move — supervised to auto-apply — needed a coverage number and an eval harness, not a feeling.
The agent was the right architecture for exploration. It wasn't the right architecture for the next step.
RevOps Cockpit is a local-only Next.js 14 app that consolidates the agent's work into a single review-and-apply surface. Three engines — Identity, Attribution, Sync — each follow the same shape: gather → reason → review → apply → log → eval → scan. Different domain, same skeleton.
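In TypeScript terms, the shared skeleton looks roughly like this. A minimal sketch, assuming stage and field names of my own invention, not the repo's; each engine plugs domain logic into the same shape, and the cockpit only depends on the shape.

```ts
// The seven stages every engine walks through.
type Stage = "gather" | "reason" | "review" | "apply" | "log" | "eval" | "scan";

// What an engine proposes for a single record (illustrative fields).
interface Proposal {
  rule: string;       // which versioned Python rule fired
  field: string;      // the Salesforce field the change targets
  before: unknown;
  after: unknown;
  confidence: number; // engine confidence, 0..1
}

interface Engine {
  name: "identity" | "attribution" | "sync";
  gather(): Promise<unknown[]>;           // pull candidate records
  reason(records: unknown[]): Proposal[]; // run the rules, emit proposals
  // review happens in the UI; apply, log, eval, and scan close the loop
  apply(approved: Proposal[]): Promise<void>;
}
```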
Nothing about the agent gets rewritten. The Python pipeline is still the source of truth for anything that mutates Salesforce. The cockpit spawns scripts via child_process and streams stdout back over SSE. Rules live where they always lived — versioned Python modules, not app code. The cockpit's job is to surface what the engine proposes, let me approve or override, and write every interaction to JSONL for the learning loop downstream.
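The bridge is small. A sketch of what that spawn-and-stream route might look like in a Next.js 14 route handler; the script paths and route shape are my guesses at plausible names, and error handling is trimmed.

```ts
// app/api/run/[engine]/route.ts (hypothetical path)
import { spawn } from "node:child_process";

export async function GET(
  _req: Request,
  { params }: { params: { engine: string } }
) {
  // In real use, validate params.engine against an allowlist first.
  const child = spawn("python3", [`pipeline/${params.engine}_run.py`]);

  const stream = new ReadableStream({
    start(controller) {
      const encoder = new TextEncoder();
      child.stdout.on("data", (chunk: Buffer) => {
        // Each stdout line becomes its own SSE data frame.
        for (const line of chunk.toString().split("\n")) {
          controller.enqueue(encoder.encode(`data: ${line}\n`));
        }
        controller.enqueue(encoder.encode("\n"));
      });
      child.on("close", (code) => {
        controller.enqueue(encoder.encode(`event: done\ndata: ${code}\n\n`));
        controller.close();
      });
    },
    cancel() {
      child.kill(); // client disconnected; stop the pipeline run
    },
  });

  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    },
  });
}
```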
Python is the engine. Next.js is the surface that lets me steer it.
The engines:

- Identity: dedup, title hygiene, HubSpot orphan archival
- Attribution: the 24-rule LeadSource decision tree
- Sync: post-merge regression detection
Every Apply logs both the engine's original suggestion (engine_proposed) and my final value. The override scanner diffs them and groups divergences by (rule, field, before → after). The Continuous Learning page shows agreement windows + the rules where the engine and I disagree most.
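The scanner itself is a short pass over the JSONL. A sketch under assumed field names; only engine_proposed is confirmed above, the rest of the record shape is illustrative.

```ts
import { readFileSync } from "node:fs";

// One Apply event in the audit log (illustrative schema).
interface ApplyRecord {
  rule: string;
  field: string;
  engine_proposed: string; // what the engine suggested
  final: string;           // what I actually applied
}

// Count divergences, grouped by (rule, field, before → after).
function scanOverrides(path: string): Map<string, number> {
  const divergences = new Map<string, number>();
  for (const line of readFileSync(path, "utf8").split("\n")) {
    if (!line.trim()) continue;
    const rec: ApplyRecord = JSON.parse(line);
    if (rec.engine_proposed === rec.final) continue; // agreement, skip
    const key = `${rec.rule} | ${rec.field} | ${rec.engine_proposed} → ${rec.final}`;
    divergences.set(key, (divergences.get(key) ?? 0) + 1);
  }
  return divergences;
}
```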
The graduation criterion is explicit. When a (rule, action, confidence) stratum hits >95% agreement for two consecutive weeks, that stratum becomes a candidate for auto-apply. I supervise edges. The engine handles the routine. The cockpit is what makes the graduation decision provable instead of vibes-based.
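Stated as code, the check is a few lines. A sketch, again with field names I made up; the real eval harness lives in the Python pipeline.

```ts
// One evaluated Apply, bucketed into a weekly window (illustrative schema).
interface EvalRecord {
  rule: string;
  action: string;
  confidence: string; // a bucketed confidence band
  agreed: boolean;    // engine_proposed === final
  week: string;       // ISO week of the Apply, e.g. "2026-W24"
}

// Strata that clear >95% agreement in both of two consecutive weeks.
function graduationCandidates(
  records: EvalRecord[],
  weeks: [string, string]
): string[] {
  const byStratumWeek = new Map<string, { agreed: number; total: number }>();
  for (const r of records) {
    const key = `${r.rule}|${r.action}|${r.confidence}|${r.week}`;
    const s = byStratumWeek.get(key) ?? { agreed: 0, total: 0 };
    s.total += 1;
    if (r.agreed) s.agreed += 1;
    byStratumWeek.set(key, s);
  }
  const passes = (stratum: string, week: string) => {
    const s = byStratumWeek.get(`${stratum}|${week}`);
    return !!s && s.agreed / s.total > 0.95;
  };
  const strata = new Set(
    records.map((r) => `${r.rule}|${r.action}|${r.confidence}`)
  );
  return [...strata].filter((st) => passes(st, weeks[0]) && passes(st, weeks[1]));
}
```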
The cockpit went live mid-2026 against an attribution log with 256+ historical entries — the eval baseline. Identity dedup is working through nightly batches of 50 pairs per session. Attribution runs three times a day under launchd (9am, 12pm, 3pm). Sync regression detection is live. Title hygiene and HubSpot orphan archival are queued for the same treatment.
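For reference, a launchd job with those three fire times looks roughly like this; the label and script path are hypothetical stand-ins.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.example.attribution-run</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/bin/python3</string>
    <string>/path/to/pipeline/attribution_run.py</string>
  </array>
  <!-- Fire at 9:00, 12:00, and 15:00 every day. -->
  <key>StartCalendarInterval</key>
  <array>
    <dict><key>Hour</key><integer>9</integer><key>Minute</key><integer>0</integer></dict>
    <dict><key>Hour</key><integer>12</integer><key>Minute</key><integer>0</integer></dict>
    <dict><key>Hour</key><integer>15</integer><key>Minute</key><integer>0</integer></dict>
  </array>
</dict>
</plist>
```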
The work didn't change. The view on it did. The work now has a coverage number, an agreement metric, and a UI my future self can reason about. The agent layer is still there — same Python, same rules, same memory. It just has a face now.