
Underwriting Review Console Implementation Plan

For agentic workers: Implement phase-by-phase. Each phase is its own branch and PR — do not bundle phases. Steps use checkbox (- [ ]) syntax for tracking. Do not edit this plan file while implementing; mark todos in_progress as you go.
Goal: Build an internal tool for the MFM team’s weekly underwriting review (CONTEXT.md → “Underwriting review”): pick recent credit pulls, replay each Decision in slow motion (gates + compounding multipliers), judge accuracy, tweak calibration knobs and see the effect across the whole batch, then emit a Cursor prompt + calibration-log entry that an agent applies as a code change.
Architecture: The underwriting orchestrator (runUnderwriting) is already a pure function and is the only compute seam this tool needs. v1 adds no schema and no new mutations: reads are admin cross-location queries, what-if compute is pure (runUnderwriting runs client-side), and all persistence happens through the generated Cursor prompt (the browser never writes files or config). The one engine change is an optional what-if override seam (ADR docs/adr/0001) that lets the tool preview changes to hardcoded multiplier constants while staying byte-identical when unused.
Tech Stack: Next.js App Router, React, TypeScript, Convex queries (no new mutations in v1), Bun tests, ESLint. Package manager: bun only.
Read before starting:
  • CONTEXT.md → “Underwriting orchestration” and “Underwriting review” (vocabulary — use these terms exactly)
  • docs/adr/0001-underwriting-whatif-override-seam.md (the override seam contract + the “production must never pass this” constraint)
  • convex/_generated/ai/guidelines.md (Convex rules)
CI gates (every phase must pass before PR):
  • bunx tsc --noEmit → 0 errors
  • bun run lint → 0 errors (pre-existing warnings OK)
  • bun test → full suite green

Key facts the implementation depends on

  • Pure orchestrator: runUnderwriting(input: UnderwritingInput): UnderwritingDecision in convex/lib/runUnderwriting.ts. No ctx/IO. Importable from client code (the test suite imports it directly; so can the browser).
  • Trace already exists: decision.results.trace (convex/lib/buildUnderwritingTrace.ts) carries multipliers (score, newCards, derog, thinFile, latePayment, unsecuredRecent, oldBankruptcy, each { value, band, reason, inputs }), gates (derogAutoDecline, derogManualReview, newCardHardDecline, thinFileDQ, each { triggered, reason, thresholds }), pathRationale, and a flat thresholds snapshot. The replay is a presentation transform over this + the raw multiplier values on results.
  • Runtime knobs (changeable by passing a different thresholds): the ~27 fields of ResolvedThresholds (convex/lib/resolveThresholds.ts), incl. BASE_MULT_WITH_TIB/NO_TIB/BUSINESS.
  • Constant knobs (hardcoded; require the override seam to preview): in lib/types/funding-categories.ts: SCORE_MULT_BANDS (gte760:1.2, gte710:1.1, gte680:1.0, gte650:0.85), NEW_CARD_MULT (0:1.0, 1:0.9, 2:0.7, 3:0.4), THIN_MULT_CONFIG; plus inline literals in convex/lib/creditMetrics.ts: the 0.65^n old-BK penalty, the business-tier cutoffs, and the 760/710/680/650 score-band boundaries.
  • Ruleset discipline: RULESET_CODE_VERSION in convex/lib/rulesetVersion.ts (currently "2026.05.18-2"). Editing any constant knob in a real code change requires bumping this and updating golden snapshots. The override seam, when overrides are absent, must NOT change behavior → no bump.
  • Golden snapshots: tests/underwriting.test.ts (+ .snap), 12 persona fixtures in tests/fixtures/personas.ts, driven via runUnderwriting(buildUnderwritingInput(persona)) (PR #842). Reuse buildUnderwritingInput as the canonical input builder.
  • Input reconstruction: UnderwritingInput needs business inputs (statedIncome, timeInBiz, avgRevenue, avgBankDeposits) that are NOT in the bureau reportData blob. The stored underwritingResults.fullResults echoes these (see buildResults in convex/lib/estimateBuilder.ts); the data layer (Phase 3) must surface them so the input can be rebuilt faithfully. reportData itself is the creditData shape (scores/tradelines/inquiries/publicRecords/requestData/dti).
  • Admin auth: admin-internal pages use useAdminLocation() + Convex useQuery against queries gated by requireAdminAccess(undefined, adminLocationId) (NOT requireGHLInstallation, NOT Convex identity). Cross-location reads use the by_createdAt index (see convex/adminMetricsReads.ts).

File Map

Create
  • docs/adr/0001-underwriting-whatif-override-seam.md — already created.
  • convex/lib/underwritingOverrides.ts — types + defaults for the what-if override object; resolveMultiplierConfig(overrides?) returning the live config (defaults = current constants).
  • convex/lib/underwritingReviewReplay.ts — pure buildStageTimeline(decision) transform (ordered stages + running multiplier product + inline gates).
  • lib/underwriting/knobRegistry.ts — the typed knob registry (single source of truth for the tweak panel AND the prompt generator).
  • lib/underwriting/buildCalibrationPrompt.ts — pure builder: (session verdicts + knob changes) → { cursorPrompt, calibrationLogEntry }.
  • convex/underwritingReview.ts — admin cross-location queries (recent pulls + funded comparison + input reconstruction fields).
  • app/admin/internal/underwriting-review/page.tsx — the console.
  • app/admin/internal/underwriting-review/_components/* — picker, replay stepper, verdict control, tweak panel, batch-diff table, output panel.
  • underwriting/CALIBRATION_LOG.md — created on first real calibration entry (by the agent executing a generated prompt, not by this build).
  • Test files per phase (see phases).
Modify
  • convex/lib/creditMetrics.ts — accept optional multiplier-config override (defaults to current constants). Add the ADR-referencing comment at the seam.
  • convex/lib/runUnderwriting.ts — thread the optional override from UnderwritingInput into computeCreditMetrics. Add the ADR-referencing comment.
  • convex/lib/buildUnderwritingTrace.ts — only if the timeline needs a value not currently exposed (prefer reading existing fields).

Phase 1 — What-if override seam (engine foundation, no UI)

Branch/PR: feat/uw-override-seam. Keystone — must land first, fully isolated, with golden-snapshot proof. Files:
  • Create: convex/lib/underwritingOverrides.ts
  • Modify: convex/lib/creditMetrics.ts, convex/lib/runUnderwriting.ts
  • Create: tests/underwritingOverrides.test.ts
  • Step 1: Define the override type + resolver. In underwritingOverrides.ts, define MultiplierOverrides (all-optional: scoreMultBands?, newCardMult?, thinMultConfig?, oldBkPenaltyBase?, businessTierCutoffs?, scoreBandBoundaries?) and resolveMultiplierConfig(overrides?): ResolvedMultiplierConfig that deep-merges over the current constants from lib/types/funding-categories.ts. When overrides is undefined, it must return a config in which every value is strictly equal (===) to today’s constants.
  • Step 2: Thread into computeCreditMetrics. Add an optional field to CreditMetricsInput (e.g. multiplierConfig?: ResolvedMultiplierConfig); replace the direct SCORE_MULT_BANDS / NEW_CARD_MULT / THIN_MULT_CONFIG / 0.65^n / business-tier / band-boundary literals with reads off the resolved config (defaulting via resolveMultiplierConfig(undefined) when absent). Add a comment: // What-if override seam — see docs/adr/0001. Production passes nothing; only the Underwriting Review Console passes overrides.
  • Step 3: Thread into runUnderwriting. Add optional multiplierOverrides?: MultiplierOverrides to UnderwritingInput; resolve once and pass to computeCreditMetrics. Same ADR comment at the seam.
  • Step 4: Prove byte-identical default behavior. Run bun test tests/underwriting.test.ts — all 12 golden snapshots must remain byte-identical (no .snap diff). Do NOT bump RULESET_CODE_VERSION.
  • Step 5: Test the override changes outputs. In underwritingOverrides.test.ts: assert (a) no override → Decision deep-equals baseline for ≥2 personas; (b) raising scoreMultBands.gte710 raises the estimated range for a 710–759 persona; (c) resolveMultiplierConfig(undefined) equals the constants.
Acceptance: golden snapshots unchanged; override path provably alters outputs; ADR comments present at both seams; CI green.
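
A minimal sketch of the Step 1 resolver, to make the "defaults when absent, merge when present" contract concrete. It assumes a cut-down config: only the score bands, new-card multipliers, and old-BK base quoted in this plan are modeled, and THIN_MULT_CONFIG, the business-tier cutoffs, and the overridable band boundaries are left out because their shapes are not given here. The type shapes, DEFAULT_MULTIPLIER_CONFIG, and scoreMultiplierFor are hypothetical names for illustration, not the real engine code.

```typescript
// Hypothetical shapes; only the numeric values below come from this plan.
type ScoreMultBands = { gte760: number; gte710: number; gte680: number; gte650: number };
type NewCardMult = Record<0 | 1 | 2 | 3, number>;

interface ResolvedMultiplierConfig {
  scoreMultBands: ScoreMultBands;
  newCardMult: NewCardMult;
  oldBkPenaltyBase: number;
}

interface MultiplierOverrides {
  scoreMultBands?: Partial<ScoreMultBands>;
  newCardMult?: Partial<NewCardMult>;
  oldBkPenaltyBase?: number;
}

// Stand-in for the constants that live in funding-categories.ts and creditMetrics.ts.
const DEFAULT_MULTIPLIER_CONFIG: ResolvedMultiplierConfig = {
  scoreMultBands: { gte760: 1.2, gte710: 1.1, gte680: 1.0, gte650: 0.85 },
  newCardMult: { 0: 1.0, 1: 0.9, 2: 0.7, 3: 0.4 },
  oldBkPenaltyBase: 0.65,
};

function resolveMultiplierConfig(overrides?: MultiplierOverrides): ResolvedMultiplierConfig {
  // Production path: no overrides, so the defaults are returned untouched.
  if (overrides === undefined) return DEFAULT_MULTIPLIER_CONFIG;
  // What-if path: build a new object so the defaults are never mutated.
  return {
    scoreMultBands: { ...DEFAULT_MULTIPLIER_CONFIG.scoreMultBands, ...overrides.scoreMultBands },
    newCardMult: { ...DEFAULT_MULTIPLIER_CONFIG.newCardMult, ...overrides.newCardMult },
    oldBkPenaltyBase: overrides.oldBkPenaltyBase ?? DEFAULT_MULTIPLIER_CONFIG.oldBkPenaltyBase,
  };
}

// Band lookup with the 760/710/680/650 boundaries fixed (this sketch does not override them).
function scoreMultiplierFor(score: number, config: ResolvedMultiplierConfig): number | undefined {
  const bands = config.scoreMultBands;
  if (score >= 760) return bands.gte760;
  if (score >= 710) return bands.gte710;
  if (score >= 680) return bands.gte680;
  if (score >= 650) return bands.gte650;
  return undefined; // handling below 650 is not described in this plan
}
```

Under these assumptions, a 710 to 759 score reads 1.1 from the default config and picks up whatever value an override supplies for gte710, which is the behavior the Step 5(b) test is meant to pin down.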

Phase 2 — Knob registry, replay timeline, prompt builder (pure libs, no UI)

Branch/PR: feat/uw-review-pure-libs. Files:
  • Create: lib/underwriting/knobRegistry.ts, convex/lib/underwritingReviewReplay.ts, lib/underwriting/buildCalibrationPrompt.ts
  • Create: tests/underwritingReviewReplay.test.ts, tests/knobRegistry.test.ts, tests/buildCalibrationPrompt.test.ts
  • Step 1: Knob registry. Export KNOBS: KnobDef[] where KnobDef = { id; label; kind: 'runtime' | 'constant'; file; symbol; defaultValue; min?; max?; step?; appliesTo: 'thresholds' | 'multiplierOverrides'; path }. Cover every runtime threshold (→ thresholds) and every constant knob (→ multiplierOverrides). file/symbol/path must be exact (drive the prompt generator).
  • Step 2: applyKnobChanges(base, changes). Pure: given baseline { thresholds, multiplierOverrides } and a list of { knobId, newValue }, return the modified thresholds and multiplierOverrides for re-running runUnderwriting.
  • Step 3: Replay timeline. buildStageTimeline(decision): Stage[] where each Stage = { id; label; band?; reason; multiplier?; runningProduct; gate?: { triggered; reason } }. Order: base mult → score → newCards → derog → thinFile → latePayment → unsecuredRecent → oldBankruptcy → clamp → range = averageLimit × rawMult. runningProduct compounds stage-by-stage and must equal rawMult at the clamp step. Gates render inline at the stage they relate to. Pure transform over decision.results (+ trace); no engine logic.
  • Step 4: Prompt builder. buildCalibrationOutput(session): { cursorPrompt: string; calibrationLogEntry: string }. Input: verdicts + chosen knob changes + motivating report ids + ruleset version. Output prompt must include, per changed knob: exact file/symbol/old→new (from registry); plus “bump RULESET_CODE_VERSION in convex/lib/rulesetVersion.ts”, “update affected golden snapshots in tests/underwriting.test.ts (expected to change: …; must NOT change: …)”, “append this entry to underwriting/CALIBRATION_LOG.md”, and the motivating reports.
  • Step 5: Tests. Timeline runningProduct matches rawMult; gates land on correct stages; registry paths resolve against real symbols (source-string assertion); prompt contains exact file/symbol/old→new + the 3 chore reminders.
Acceptance: all pure, unit-tested; registry paths verified against real source; CI green.
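
The registry-driven applyKnobChanges from Steps 1 and 2 could look roughly like the sketch below. It makes several assumptions: KnobDef is trimmed to the three fields the function needs, path is treated as a dot-separated string (the plan does not fix its format), the two registry entries and their ids are illustrative, and setAtPath is a hypothetical helper.

```typescript
type KnobTarget = "thresholds" | "multiplierOverrides";

// Trimmed KnobDef: label, file, symbol, defaultValue, min/max/step are omitted here.
interface KnobDef { id: string; appliesTo: KnobTarget; path: string }
interface KnobChange { knobId: string; newValue: number }

type Bag = Record<string, unknown>;
interface WhatIfInputs { thresholds: Bag; multiplierOverrides: Bag }

// Illustrative entries only; the real registry covers every knob.
const KNOBS: KnobDef[] = [
  { id: "baseMultWithTib", appliesTo: "thresholds", path: "BASE_MULT_WITH_TIB" },
  { id: "scoreMult710", appliesTo: "multiplierOverrides", path: "scoreMultBands.gte710" },
];

// Returns a copy of `obj` with `value` written at `path`, cloning only objects along the path.
function setAtPath(obj: Bag, path: string[], value: unknown): Bag {
  const [head, ...rest] = path;
  if (head === undefined) return obj;
  if (rest.length === 0) return { ...obj, [head]: value };
  const child = obj[head];
  const childBag: Bag = typeof child === "object" && child !== null ? (child as Bag) : {};
  return { ...obj, [head]: setAtPath(childBag, rest, value) };
}

// Pure: never mutates `base`; unknown knob ids fail loudly.
function applyKnobChanges(
  base: WhatIfInputs,
  changes: KnobChange[],
  registry: KnobDef[] = KNOBS,
): WhatIfInputs {
  return changes.reduce<WhatIfInputs>((acc, change) => {
    const knob = registry.find((k) => k.id === change.knobId);
    if (!knob) throw new Error(`Unknown knob: ${change.knobId}`);
    const updated = setAtPath(acc[knob.appliesTo], knob.path.split("."), change.newValue);
    return knob.appliesTo === "thresholds"
      ? { ...acc, thresholds: updated }
      : { ...acc, multiplierOverrides: updated };
  }, base);
}
```

Routing every write through the registry entry keeps the tweak panel and the prompt generator agreeing on which knob a change refers to, which is the reason the plan calls the registry the single source of truth.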

Phase 3 — Admin data layer

Branch/PR: feat/uw-review-data. Files:
  • Create: convex/underwritingReview.ts
  • Create: tests/underwritingReview.test.ts
  • Step 1: Recent-pulls query. listRecentPullsForReview (query): args { adminLocationId; sinceMs?; untilMs?; limit? }, requireAdminAccess(undefined, adminLocationId), scan creditReportRequests by_createdAt over the window (default last 7 days, cap e.g. 50). Join latest underwritingResults per report. Return list rows: requestId, contact name, createdAt, score, decision, estimated range, fundingTrack.
  • Step 2: Single-report hydration. getReportForReview (query): args { adminLocationId; requestId }. Return the reportData blob (= creditData) AND the reconstructed business inputs from underwritingResults.fullResults (statedIncome, timeInBiz, avgRevenue, avgBankDeposits, dti) plus the persisted Decision (for baseline comparison) so the client can rebuild UnderwritingInput and re-run.
  • Step 3: Funded comparison. Include, when present, the contact’s funded fundingPlans rows (sum of funded amount) so the UI can show estimated-range vs actual-funded. Bound the read (indexed by contact/location; .take(N), never unbounded .collect()).
  • Step 4: Tests. Source/behavior assertions for requireAdminAccess usage, the by_createdAt window, bounded reads, and that hydration returns the fields needed to rebuild UnderwritingInput.
Acceptance: admin-gated, cross-location, bounded reads (pre-commit linter clean); returns enough to faithfully reconstruct inputs; CI green.
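
The window and cap handling in Step 1 is easy to isolate as a pure helper, which also makes it unit-testable without a Convex context. The sketch below is one way to do that; resolveReviewWindow and its constants are hypothetical names, and the cap of 50 is the plan's "e.g." figure rather than a settled value.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;
const DEFAULT_WINDOW_DAYS = 7; // "default last 7 days"
const MAX_LIMIT = 50;          // "cap e.g. 50"; adjust if the real cap differs

interface WindowArgs { sinceMs?: number; untilMs?: number; limit?: number }
interface ResolvedWindow { sinceMs: number; untilMs: number; limit: number }

// Normalizes optional query args into a bounded window and row limit.
function resolveReviewWindow(args: WindowArgs, nowMs: number): ResolvedWindow {
  const untilMs = args.untilMs ?? nowMs;
  const sinceMs = args.sinceMs ?? (untilMs - DEFAULT_WINDOW_DAYS * DAY_MS);
  if (sinceMs > untilMs) throw new Error("sinceMs must not be after untilMs");

  // Missing or non-finite limits fall back to the cap; everything is clamped to [1, MAX_LIMIT].
  const requested = args.limit !== undefined && Number.isFinite(args.limit) ? args.limit : MAX_LIMIT;
  const limit = Math.min(Math.max(1, Math.floor(requested)), MAX_LIMIT);

  return { sinceMs, untilMs, limit };
}
```

Inside listRecentPullsForReview, the resolved sinceMs and untilMs would bound the by_createdAt range and limit would feed the bounded read, so the query can never fall back to an unbounded scan even when the caller omits every optional argument.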

Phase 4 — Console UI: picker + replay stepper (read-only)

Branch/PR: feat/uw-review-ui-readonly. Files:
  • Create: app/admin/internal/underwriting-review/page.tsx + _components/{ReportPicker,ReplayStepper,FundedComparison}.tsx
  • Create: tests/underwritingReviewUi.test.ts (source-wiring assertions, matching repo convention)
  • Step 1: Page shell under the admin-internal layout, useAdminLocation(), window controls (default last 7 days).
  • Step 2: Picker from listRecentPullsForReview — table of recent pulls with score/decision/range; row select hydrates via getReportForReview.
  • Step 3: Replay stepper — rebuild UnderwritingInput client-side, call runUnderwriting, feed buildStageTimeline, render a click-through stepper (advance one stage; show band/reason, stage multiplier, running product, inline gates; final range step). Use on-brand components (default gold Button, Sora/DM Sans). No raw multiplier jargon beyond what the stepper intentionally surfaces (this is an internal tool — exact values are fine here).
  • Step 4: Funded comparison panel when data exists.
Acceptance: select a real recent pull and step through its Decision; numbers reconcile with the persisted Decision; CI green.
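
To illustrate what the stepper walks through, here is a cut-down version of the timeline transform that covers only the base multiplier and the seven multiplier stages, in the order this plan gives. It is a sketch under assumptions: TimelineInput is a hypothetical flattened input rather than the real decision.results shape, the labels are placeholders, and the clamp step, the final range step, and inline gates are deliberately left out.

```typescript
// Stage order as listed in this plan (clamp and range steps are not modeled here).
const MULTIPLIER_ORDER = [
  "score", "newCards", "derog", "thinFile", "latePayment", "unsecuredRecent", "oldBankruptcy",
] as const;
type MultiplierKey = (typeof MULTIPLIER_ORDER)[number];

interface MultiplierTrace { value: number; band?: string; reason: string }

// Hypothetical input: the real transform reads decision.results and its trace.
interface TimelineInput {
  baseMult: number;
  multipliers: Record<MultiplierKey, MultiplierTrace>;
}

interface Stage {
  id: string;
  label: string;
  band?: string;
  reason: string;
  multiplier: number;
  runningProduct: number;
}

// Pure presentation transform: compounds the multipliers stage by stage.
function buildMultiplierStages(input: TimelineInput): Stage[] {
  let running = input.baseMult;
  const stages: Stage[] = [{
    id: "base",
    label: "Base multiplier",
    reason: "starting multiplier",
    multiplier: input.baseMult,
    runningProduct: running,
  }];
  for (const key of MULTIPLIER_ORDER) {
    const m = input.multipliers[key];
    running *= m.value; // each stage multiplies into the product so far
    stages.push({
      id: key,
      label: key,
      band: m.band,
      reason: m.reason,
      multiplier: m.value,
      runningProduct: running,
    });
  }
  return stages;
}
```

Because the product is accumulated in floating point, a reconciliation check against the engine's own multiplier is safer with a small tolerance than with strict equality.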

Phase 5 — Verdicts + tweak panel + batch evaluation

Branch/PR: feat/uw-review-tweak-batch. Files:
  • Create: _components/{VerdictControl,TweakPanel,BatchDiffTable}.tsx, lib/underwriting/useReviewSession.ts (client session state)
  • Create: tests/reviewSession.test.ts
  • Step 1: Review session state (client-only, ephemeral in v1): selected report ids, per-report Review verdict (correct|too_high|too_low|approved_in_error|declined_in_error) + note, and the working knob changes.
  • Step 2: Verdict control per report (informed by the funded comparison; not auto-derived).
  • Step 3: Tweak panel driven by KNOBS — sliders/inputs with min/max/step; produces { thresholds, multiplierOverrides } via applyKnobChanges.
  • Step 4: Batch eval. On any knob change, re-run runUnderwriting (with overrides) across ALL reviewed reports; render BatchDiffTable: per-report before→after decision + range, moved-toward vs moved-away-from the recorded verdict, and regression flags for any report marked correct whose Decision changed.
  • Step 5: Persona guardrail. Also re-run the 12 personas (buildUnderwritingInput + overrides) and surface which golden snapshots would break.
Acceptance: a knob change shows batch-wide before/after with regression + persona-snapshot flags; CI green.
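
The per-report logic behind BatchDiffTable in Step 4 might be sketched as follows. The Outcome shape (a decision label plus a low/high range) is a guess at what the table needs, not the real Decision type, and comparing range midpoints to decide "moved toward" versus "moved away" is an assumption made for illustration. The two *_in_error verdicts are reported as "n/a" here because the plan does not spell out the decision labels needed to judge them.

```typescript
type Verdict = "correct" | "too_high" | "too_low" | "approved_in_error" | "declined_in_error";

// Hypothetical summary of a Decision: just what the diff table compares.
interface Outcome { decision: string; rangeLow: number; rangeHigh: number }

interface ReviewedReport { requestId: string; verdict: Verdict; before: Outcome; after: Outcome }

type Movement = "toward" | "away" | "none" | "n/a";
interface BatchDiffRow { requestId: string; changed: boolean; regression: boolean; movement: Movement }

const midpoint = (o: Outcome): number => (o.rangeLow + o.rangeHigh) / 2;

function diffReport(r: ReviewedReport): BatchDiffRow {
  const changed =
    r.before.decision !== r.after.decision ||
    r.before.rangeLow !== r.after.rangeLow ||
    r.before.rangeHigh !== r.after.rangeHigh;

  // Positive delta means the estimate went up after the knob change.
  const delta = midpoint(r.after) - midpoint(r.before);
  let movement: Movement = "n/a";
  if (r.verdict === "too_high") movement = delta < 0 ? "toward" : delta > 0 ? "away" : "none";
  if (r.verdict === "too_low") movement = delta > 0 ? "toward" : delta < 0 ? "away" : "none";

  return {
    requestId: r.requestId,
    changed,
    // Regression: the reviewer marked this Decision correct, yet the tweak changed it.
    regression: r.verdict === "correct" && changed,
    movement,
  };
}

function diffBatch(reports: ReviewedReport[]): BatchDiffRow[] {
  return reports.map(diffReport);
}
```

The inputs would come from running runUnderwriting twice per reviewed report, once with the baseline and once with the output of applyKnobChanges, and summarizing each Decision into an Outcome.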

Phase 6 — Output generation

Branch/PR: feat/uw-review-output. Files:
  • Create: _components/OutputPanel.tsx
  • Create: tests/outputPanel.test.ts
  • Step 1: Wire buildCalibrationOutput to the session (verdicts + chosen knob changes + motivating report ids + current ruleset version).
  • Step 2: Output panel showing the generated Cursor prompt and calibration-log entry, each with copy buttons. Make explicit in the UI that applying the change is done by pasting the prompt into Cursor — the tool itself changes nothing in code or config.
  • Step 3: Test that the rendered prompt for a sample session contains the exact file/symbol/old→new for each changed knob plus the ruleset-bump, golden-snapshot, and calibration-log reminders.
Acceptance: a full session (pick → judge → tweak → verify batch) yields a copy-pasteable prompt that an agent can execute to make the code change, bump the ruleset, fix snapshots, and append the calibration log; CI green.
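
As a rough illustration of the prompt text the output panel would display, the sketch below assembles the per-knob lines and the three chore reminders into one string. The PromptSession shape, the exact wording, and buildCursorPrompt itself are assumptions; the real buildCalibrationOutput also returns a calibration-log entry and lists which snapshots are expected to change or stay fixed, both omitted here.

```typescript
// Hypothetical, flattened view of what the prompt builder needs per changed knob.
interface KnobChangeForPrompt { file: string; symbol: string; oldValue: number; newValue: number }

interface PromptSession {
  rulesetVersion: string;
  reportIds: string[];
  changes: KnobChangeForPrompt[];
}

// Pure string builder: same session in, same prompt out.
function buildCursorPrompt(session: PromptSession): string {
  const changeLines = session.changes.map(
    (c) => `- ${c.file}: ${c.symbol} ${c.oldValue} → ${c.newValue}`,
  );
  return [
    "Apply these underwriting calibration changes:",
    ...changeLines,
    `Bump RULESET_CODE_VERSION in convex/lib/rulesetVersion.ts (currently "${session.rulesetVersion}").`,
    "Update affected golden snapshots in tests/underwriting.test.ts.",
    "Append this entry to underwriting/CALIBRATION_LOG.md.",
    `Motivating reports: ${session.reportIds.join(", ")}`,
  ].join("\n");
}
```

Keeping the builder a pure function of the session is what lets the Step 3 test assert on exact substrings (file, symbol, old and new values, and each reminder) without rendering any UI.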

Out of scope for v1 (note for a future phase)

  • Persisting verdicts/sessions to Convex (would power “already reviewed” badges + cross-week history). v1 keeps sessions ephemeral and pushes the durable record into underwriting/CALIBRATION_LOG.md via the generated prompt.
  • Persisting knob changes as operator customValues (deliberately rejected — see ADR 0001; these are global calibration, not per-operator config).
  • Auto-applying the code change from the browser (deliberately rejected — all writes go through the Cursor agent).