AI-Powered Testing Platform

6-Agent System for Legacy Modernization

Vertex AI · Claude Sonnet 4.6 · Playwright · FastAPI · Postgres + pgvector · React · Vite · Tailwind · Cloud Run · Cloud SQL · Identity-Aware Proxy · Workload Identity Federation

5/6

Agents Closed-Loop

9,420

Indexed Symbols

148

Specs / Run

Platform Screenshots

multnomah-county-accessibility.app

Operator's chief-of-staff dashboard: 1 active tenant, 4 of 5 agents closed-loop, 223 items needing review, plus the OPERATOR agent's daily briefing on top findings and what to triage first


SME confirmation queue: 'Confirm what we extracted from SME conversations' — 45 confirmed sessions in pool, audit sample with per-session SME attribution (Rikki, Margretta) and confidence-banded promotion path


Knowledge Session Detail — UCR Medical Alert Workflow

Confirmed SME knowledge session expanded: full workflow context (UCR Medical Alert Service Request lifecycle), attendees (Rikki Thunstrom as primary SME, Loren as AI lead), and 13 extracted key facts that downstream TESTGEN will turn into Playwright specs


Per-tenant scoping: UCR Legacy view with Overview, Workflows, Work items, Knowledge, SME review, CODEX explorer, Code drift, Open questions, TESTGEN, and GUARDIAN — every agent surfaces its tenant-relevant state in one place
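The daily briefing on the chief-of-staff dashboard comes from OPERATOR, which the Key Features list describes as a deterministic rule engine over cross-agent state. Below is a minimal sketch of that idea, assuming OPERATOR reads a flat snapshot of per-agent counters: each rule is a pure function that returns either a proposal or nothing, and the briefing is the sorted list of proposals. The `AgentState` fields, rule names, threshold, and priorities are all hypothetical; the actual rule set is not published here.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class AgentState:
    # Hypothetical per-agent counters OPERATOR might read each morning.
    sme_review_backlog: int     # items awaiting an SME
    failed_specs_last_run: int  # GUARDIAN
    drift_findings: int         # CODEX
    open_questions: int         # stakeholder hub


@dataclass(frozen=True)
class Proposal:
    priority: int            # lower number = triage first
    title: str
    action: str              # what approving the proposal would do
    status: str = "pending"  # pending -> approved | rejected


Rule = Callable[[AgentState], Optional[Proposal]]


def regression_rule(s: AgentState) -> Optional[Proposal]:
    if s.failed_specs_last_run > 0:
        return Proposal(1, f"{s.failed_specs_last_run} GUARDIAN specs failed",
                        "Open the failing specs for triage")
    return None


def drift_rule(s: AgentState) -> Optional[Proposal]:
    if s.drift_findings > 0:
        return Proposal(2, f"{s.drift_findings} CODEX drift findings",
                        "Review drift against affected specs")
    return None


def backlog_rule(s: AgentState, threshold: int = 100) -> Optional[Proposal]:
    # Hypothetical threshold: only escalate when the review queue is large.
    if s.sme_review_backlog > threshold:
        return Proposal(3, f"{s.sme_review_backlog} items need SME review",
                        "Batch the oldest items for the next SME session")
    return None


def questions_rule(s: AgentState) -> Optional[Proposal]:
    if s.open_questions > 0:
        return Proposal(4, f"{s.open_questions} open questions",
                        "Nudge stakeholders on unanswered questions")
    return None


RULES: list[Rule] = [regression_rule, drift_rule, backlog_rule, questions_rule]


def daily_briefing(state: AgentState, rules: list[Rule] = RULES) -> list[Proposal]:
    # Pure function of the snapshot: the same state always yields the same briefing.
    proposals = [p for p in (rule(state) for rule in rules) if p is not None]
    return sorted(proposals, key=lambda p: (p.priority, p.title))


def decide(p: Proposal, approve: bool) -> Proposal:
    # Proposal is frozen, so an Approve/Reject decision returns a new record.
    return replace(p, status="approved" if approve else "rejected")
```

Keeping the rules deterministic means a given snapshot can always be replayed to explain why a proposal appeared, which is what makes the Approve/Reject history auditable.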

Overview

A 6-agent system that keeps legacy systems safe to change. CODEX (codebase intelligence: 9,420 indexed symbols and 8 drift detectors) reads the existing code. KNOWLEDGE (typed extraction across 704 sessions, with plain-language search) captures tribal knowledge from retiring SMEs before they walk out the door. TESTGEN (Playwright generation, now at v2, with an auto-confirm policy that moves 146 drafts/day) turns that knowledge into executable specs. GUARDIAN (regression watch with a queued-runner pattern) runs the specs and flags drift. OPERATOR (the orchestrator, acting as Mounika's chief of staff) synthesizes cross-agent state into a daily briefing with actionable proposals. INTEGRATION (Phase F) is designed but not yet built.

Privacy guardrails were verified on real data. A two-stage PHI classifier (a heuristic short-circuit followed by an LLM second pass) held 321 files for human review. After-the-fact audits found 11 leaks that the heuristic had missed and the LLM second pass caught; those files were purged, and the routing config was updated so they SKIP on re-ingest.

Multi-tenant architecture from day one: row-level security (RLS) on every table keyed by tenant_id, per-agent service accounts, Identity-Aware Proxy (IAP) gating, and Workload Identity Federation (no service-account JSON keys). UCR is tenant 1; ACHP onboards in Phase G.
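The two-stage PHI classifier can be sketched as below, assuming one plausible reading of "heuristic short-circuit": a regex match holds a file immediately without spending an LLM call, and only files the heuristic lets through reach the LLM second pass. The regex patterns, the `llm_flags_phi` callback (standing in for the actual model call), and the `skip_paths` set (standing in for the routing config) are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Literal

Decision = Literal["PASS", "HOLD", "SKIP"]

# Hypothetical stage-1 patterns; real ones would be tuned on the tenant's data.
PHI_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),          # SSN-shaped numbers
    re.compile(r"\bDOB[:\s]", re.IGNORECASE),      # date-of-birth labels
    re.compile(r"\bMRN[:\s]*\d+", re.IGNORECASE),  # medical record numbers
]


@dataclass(frozen=True)
class Verdict:
    decision: Decision
    stage: str  # which step made the call: "routing", "heuristic", or "llm"


def classify(path: str, text: str,
             llm_flags_phi: Callable[[str], bool],
             skip_paths: set[str]) -> Verdict:
    # Routing config: files purged after an audit are skipped on re-ingest.
    if path in skip_paths:
        return Verdict("SKIP", "routing")
    # Stage 1: cheap heuristic. A match holds the file for human review
    # immediately, short-circuiting the LLM call.
    if any(p.search(text) for p in PHI_PATTERNS):
        return Verdict("HOLD", "heuristic")
    # Stage 2: only text the heuristic let through reaches the LLM second pass.
    if llm_flags_phi(text):
        return Verdict("HOLD", "llm")
    return Verdict("PASS", "llm")
```

Putting the cheap check first keeps LLM cost proportional to the files the heuristic cannot decide, while the second pass covers the heuristic's blind spots.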

Impact & Results

5/6

Agents Closed-Loop

CODEX · KNOWLEDGE · TESTGEN · GUARDIAN · OPERATOR shipped

9,420

Indexed Symbols

CODEX codebase intelligence with 8 drift detectors

146 / day

Auto-Confirm Rate

drafts moved out of SME backlog (38% inbox clearance)

148 specs

Latest GUARDIAN Run

144 passed in 22 minutes against UCR QAT

11

PHI Leak Catches

missed by heuristic, caught by LLM second-pass, purged

1 → N

Tenants

UCR is tenant 1; ACHP onboards in Phase G
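The auto-confirm figure comes from a policy that, per the Key Features list, clears drafts by pattern match and keeps a full audit trail. A minimal sketch, assuming each draft carries a title and a model-reported confidence: a draft is auto-confirmed only when it matches a rule's title pattern and clears that rule's confidence floor, and every auto-confirmation is logged. The `Rule` patterns and thresholds are hypothetical; the real policy's match criteria are not described here.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Draft:
    draft_id: str
    title: str
    confidence: float  # model-reported confidence in [0, 1]


@dataclass(frozen=True)
class Rule:
    name: str
    title_pattern: re.Pattern
    min_confidence: float


# Hypothetical rules; the real policy's patterns and floors are not published.
RULES = [
    Rule("nav-smoke", re.compile(r"^navigate to ", re.IGNORECASE), 0.90),
    Rule("field-validation", re.compile(r"required field", re.IGNORECASE), 0.95),
]


@dataclass
class TriageResult:
    auto_confirmed: list[str] = field(default_factory=list)
    needs_sme: list[str] = field(default_factory=list)
    audit_log: list[dict] = field(default_factory=list)


def triage(drafts: list[Draft], rules: list[Rule] = RULES) -> TriageResult:
    result = TriageResult()
    for d in drafts:
        # First rule whose pattern matches AND whose confidence floor is met.
        rule = next((r for r in rules
                     if r.title_pattern.search(d.title)
                     and d.confidence >= r.min_confidence), None)
        if rule is None:
            result.needs_sme.append(d.draft_id)  # stays in the SME inbox
            continue
        result.auto_confirmed.append(d.draft_id)
        # Every auto-confirmation is logged so an SME can audit it later.
        result.audit_log.append({
            "draft_id": d.draft_id,
            "rule": rule.name,
            "confidence": d.confidence,
            "at": datetime.now(timezone.utc).isoformat(),
        })
    return result


def clearance_rate(result: TriageResult) -> float:
    # Share of the inbox cleared without SME involvement.
    total = len(result.auto_confirmed) + len(result.needs_sme)
    return len(result.auto_confirmed) / total if total else 0.0
```

Requiring both a pattern match and a per-rule confidence floor keeps the policy conservative: anything unusual falls through to an SME by default.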

Key Features

6 specialized agents: CODEX (codebase intelligence + 8 drift detectors), KNOWLEDGE (typed SME extraction), TESTGEN (Playwright generation), GUARDIAN (regression watch), OPERATOR (orchestrator), INTEGRATION (designed)

TESTGEN v2 prompt, iterated after live QAT runs: 208 drafts at 98.1% compile-clean, with sequential-workflow and UCR hover-menu navigation discipline enforced

Auto-confirm policy moves 146 drafts/day out of the SME review backlog via pattern matching: 38% inbox clearance without SME involvement, fully audited

GUARDIAN queued-runner pattern — browser "Trigger run" button writes to Postgres work queue; polling runner claims jobs and executes Playwright (~35s end-to-end)

OPERATOR deterministic rule engine reads cross-agent state and produces a prioritized daily briefing for Mounika, plus Approve/Reject proposals; this closes the "who orchestrates the orchestrator?" gap that breaks fleets of 5 specialist agents at 20 tenants

Two-stage PHI privacy guardrails (heuristic short-circuit + LLM second pass) validated on real data: 321 files held for review; 11 leaks the heuristic missed were caught and purged, and routing was updated so they are skipped on re-ingest

Multi-tenant from day one — RLS keyed by tenant_id, per-agent service accounts, IAP gating, Workload Identity Federation (no service-account JSON keys)

Stakeholder hub with an open-questions surface: Rikki, Margretta, Antonio, and Michelle answer tribal-knowledge questions asynchronously; 50 of 77 resolved
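The GUARDIAN queued-runner pattern decouples the browser from Playwright: the "Trigger run" handler only inserts a queue row, and a separate polling runner claims and executes it. A minimal sketch of the claim step, using the standard-library `sqlite3` module so it runs anywhere; the table and function names are hypothetical. The claim is an optimistic compare-and-set (`UPDATE ... WHERE status = 'queued'`), so two runners cannot both win the same job. On Postgres, a common alternative is `SELECT ... FOR UPDATE SKIP LOCKED` inside a transaction.

```python
import sqlite3
from typing import Callable, Optional

SCHEMA = """
CREATE TABLE run_queue (
    id         INTEGER PRIMARY KEY,
    tenant_id  TEXT NOT NULL,
    status     TEXT NOT NULL DEFAULT 'queued',  -- queued | running | done
    claimed_by TEXT
)
"""


def enqueue_run(conn: sqlite3.Connection, tenant_id: str) -> int:
    # What a "Trigger run" endpoint would do: record the request and return at once.
    cur = conn.execute("INSERT INTO run_queue (tenant_id) VALUES (?)", (tenant_id,))
    conn.commit()
    return cur.lastrowid


def claim_next(conn: sqlite3.Connection, worker_id: str) -> Optional[int]:
    while True:
        row = conn.execute(
            "SELECT id FROM run_queue WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # queue is empty
        # Compare-and-set: only flips the row if it is still queued.
        cur = conn.execute(
            "UPDATE run_queue SET status = 'running', claimed_by = ? "
            "WHERE id = ? AND status = 'queued'",
            (worker_id, row[0]),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row[0]  # this worker won the job
        # Another runner claimed it first; loop and look at the next queued row.


def complete(conn: sqlite3.Connection, job_id: int) -> None:
    conn.execute("UPDATE run_queue SET status = 'done' WHERE id = ?", (job_id,))
    conn.commit()


def poll_once(conn: sqlite3.Connection, worker_id: str,
              run_specs: Callable[[int], None]) -> Optional[int]:
    job_id = claim_next(conn, worker_id)
    if job_id is None:
        return None
    run_specs(job_id)  # hypothetical hook; a real runner would launch Playwright here
    complete(conn, job_id)
    return job_id
```

A real runner would call `poll_once` in a loop with a short sleep between empty polls; because the browser only writes a row, the request returns immediately while the Playwright run proceeds on the runner.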
