Skill example · logbooks marketplace

deep-code-review

Hotspot-first deep code review for PRs, branches, and diffs.

/plugin install deep-code-review@logbooks
/deep-code-review

What it does

deep-code-review is optimized to be right about a few important things rather than produce many comments. A run models behavior changes from the diff, picks risky hotspots, fans out per-hotspot lens subagents, runs a skeptic pass, and surfaces at most 5 outputs as findings or questions. Each run is persisted to a per-PR SQLite ledger and a per-run JSONL trace under ./.logbooks/code-review/ in the reviewed repo, so follow-up reviews can see what was already flagged. Formatting-only changes, trivial renames, and speculative style comments are ignored by default.

How it does it

01

Gather inputs

Resolve the review target (PR, branch, pasted diff, or current WIP) into a stable PR_REF, then initialize the SQLite ledger and JSONL trace under ./.logbooks/code-review/.

JSONL trace: { type: "run", run_id, pr_ref, target_repo, started_at }
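The initialization step can be sketched as below. This is a minimal illustration, not the skill's actual code: the file-naming scheme (one `.sqlite` file per PR_REF, one `.jsonl` file per run) and the `init_run` helper are assumptions; only the directory and the fields of the `run` record come from the description above.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone
from pathlib import Path


def init_run(repo_root: str, pr_ref: str) -> dict:
    """Create the ledger and trace files, then log a `run` record."""
    base = Path(repo_root) / ".logbooks" / "code-review"
    base.mkdir(parents=True, exist_ok=True)

    run_id = uuid.uuid4().hex
    # Assumed naming: one ledger per PR, one trace per run.
    safe_ref = pr_ref.replace("/", "_").replace("#", "_")
    ledger_path = base / f"{safe_ref}.sqlite"
    trace_path = base / f"{run_id}.jsonl"

    # Open and close the per-PR ledger; tables are created by later steps.
    sqlite3.connect(ledger_path).close()

    record = {
        "type": "run",
        "run_id": run_id,
        "pr_ref": pr_ref,
        "target_repo": str(Path(repo_root).resolve()),
        "started_at": datetime.now(timezone.utc).isoformat(),
    }
    # JSONL: one JSON object per line, appended in order.
    with trace_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")

    return {
        "run_id": run_id,
        "ledger_path": str(ledger_path),
        "trace_path": str(trace_path),
    }
```

Keying the ledger on the PR and the trace on the run is what would let a follow-up review reuse earlier judgments while keeping each run's action log separate.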
02

Build change map + select hotspots

Read the diff once, classify edits into archetypes (guard-removed, public-contract-changed, persistence-schema-changed…), then pick up to 8 risky changed units worth focused review.

SQLite hotspots row: hotspot_id, file_path, symbol, lines, change_archetypes, risk_tags, why_selected
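A rough sketch of how the hotspot cap and the `hotspots` row might look in code. The column names follow the row listed above; the column types, the JSON encoding of list fields, and the "more risk tags ranks higher" heuristic are assumptions made only for illustration.

```python
import json
import sqlite3

MAX_HOTSPOTS = 8  # "up to 8 risky changed units"

HOTSPOTS_DDL = """
CREATE TABLE IF NOT EXISTS hotspots (
    hotspot_id        TEXT PRIMARY KEY,
    file_path         TEXT NOT NULL,
    symbol            TEXT,
    lines             TEXT,  -- assumed format, e.g. "10-42"
    change_archetypes TEXT,  -- assumed: JSON-encoded list
    risk_tags         TEXT,  -- assumed: JSON-encoded list
    why_selected      TEXT
)
"""


def select_hotspots(units: list) -> list:
    """Keep at most MAX_HOTSPOTS units, riskiest first (assumed heuristic)."""
    ranked = sorted(units, key=lambda u: len(u["risk_tags"]), reverse=True)
    return ranked[:MAX_HOTSPOTS]


def save_hotspots(conn: sqlite3.Connection, hotspots: list) -> None:
    """Write the selected hotspots into the ledger."""
    conn.execute(HOTSPOTS_DDL)
    conn.executemany(
        "INSERT OR REPLACE INTO hotspots VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (
                h["hotspot_id"],
                h["file_path"],
                h["symbol"],
                h["lines"],
                json.dumps(h["change_archetypes"]),
                json.dumps(h["risk_tags"]),
                h["why_selected"],
            )
            for h in hotspots
        ],
    )
    conn.commit()
```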
03

Pick lenses per hotspot

Choose review lenses for each hotspot from a fixed catalog (correctness, security, concurrency, performance, api-contract, …). Correctness and maintainability are always-on.

JSONL trace: { type: "hotspot", hotspot_id, lenses: ["correctness", "security", ...] }
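Lens selection might be sketched as follows. The always-on pair comes from the text above; the risk-tag names and the tag-to-lens mapping are hypothetical, since the page does not list how the catalog is matched to a hotspot.

```python
ALWAYS_ON = ("correctness", "maintainability")

# Hypothetical mapping from a hotspot's risk tags to optional lenses.
LENS_BY_RISK_TAG = {
    "auth": "security",
    "shared-state": "concurrency",
    "hot-path": "performance",
    "public-api": "api-contract",
}


def pick_lenses(risk_tags: list) -> list:
    """Return the always-on lenses plus any lens triggered by a risk tag."""
    lenses = list(ALWAYS_ON)
    for tag in risk_tags:
        lens = LENS_BY_RISK_TAG.get(tag)
        if lens and lens not in lenses:
            lenses.append(lens)
    return lenses


def hotspot_trace_record(hotspot_id: str, risk_tags: list) -> dict:
    """Build the `hotspot` trace record shown above."""
    return {
        "type": "hotspot",
        "hotspot_id": hotspot_id,
        "lenses": pick_lenses(risk_tags),
    }
```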
04

Generate candidate findings (parallel fan-out)

One subagent per (hotspot × lens) combination acquires minimal local context, then emits findings or questions with evidence, severity, and a local confidence score.

SQLite candidate_findings rows: candidate_id, hotspot_id, lens, output_type, summary, evidence, severity, confidence_local
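The fan-out pattern can be illustrated with a thread pool. This is only a sketch of the shape: `review_stub` is a placeholder standing in for a real subagent call, and the worker count and the `c1`, `c2`, … ID scheme are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def review_stub(hotspot_id: str, lens: str) -> list:
    """Placeholder for a subagent; returns zero or more candidate rows."""
    return [
        {
            "hotspot_id": hotspot_id,
            "lens": lens,
            "output_type": "question",
            "summary": f"Check {hotspot_id} under the {lens} lens",
            "evidence": "",
            "severity": "low",
            "confidence_local": 0.5,
        }
    ]


def fan_out(hotspot_lenses: dict, review=review_stub, max_workers: int = 4) -> list:
    """Run one review task per (hotspot, lens) pair and flatten the results."""
    pairs = [
        (hotspot_id, lens)
        for hotspot_id, lenses in hotspot_lenses.items()
        for lens in lenses
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the order of `pairs`, whichever finishes first.
        batches = list(pool.map(lambda pair: review(*pair), pairs))

    candidates = [c for batch in batches for c in batch]
    for i, candidate in enumerate(candidates, start=1):
        candidate["candidate_id"] = f"c{i}"  # assumed ID scheme
    return candidates
```

Because each task sees only its own hotspot and lens, the tasks share no state and can run in any order; only the flattening step afterwards is sequential.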
05

Skeptic + dedup + priority score

A skeptic pass challenges each candidate, root-cause fingerprints collapse duplicates within the run, and a multi-factor priority score (0–100) ranks survivors.

Patch: candidate row updates: detection_state ∈ {selected, dropped, duplicate-in-run}, priority_score (0–100)
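One way the skeptic, dedup, and scoring passes could fit together is sketched below. The three `detection_state` values and the 0–100 range come from the text; everything else is assumed for illustration: the `root_cause` field, the fingerprint recipe, the severity weights, and a two-factor score standing in for the real multi-factor one.

```python
import hashlib

# Assumed weights; the actual scoring factors are not listed on this page.
SEVERITY_WEIGHT = {"low": 0.3, "medium": 0.6, "high": 1.0}


def fingerprint(candidate: dict) -> str:
    """Hash the hotspot plus a normalized root cause (hypothetical field)."""
    key = f'{candidate["hotspot_id"]}|{candidate["root_cause"].strip().lower()}'
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def priority_score(candidate: dict) -> int:
    """Combine severity and local confidence into a 0-100 integer."""
    raw = 100 * SEVERITY_WEIGHT[candidate["severity"]] * candidate["confidence_local"]
    return max(0, min(100, round(raw)))


def triage(candidates: list, skeptic) -> list:
    """Set detection_state and priority_score on every candidate in place."""
    seen = set()
    for candidate in candidates:
        candidate["priority_score"] = priority_score(candidate)
        fp = fingerprint(candidate)
        if not skeptic(candidate):
            candidate["detection_state"] = "dropped"
        elif fp in seen:
            candidate["detection_state"] = "duplicate-in-run"
        else:
            seen.add(fp)
            candidate["detection_state"] = "selected"
    return candidates
```

Here `skeptic` is any callable that returns `True` when a candidate survives the challenge; in the skill itself that role is played by a model pass rather than a simple predicate.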
06

Persist + report

The top-ranked survivors (≤5) are surfaced as PR comments or a chat report; everything else stays in the ledger as suppressed judgment for future runs.

JSONL trace + PR output: { type: "output", surfacing_state: "posted" } + ≤5 PR comments emitted
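The final cut can be sketched as a sort and a slice. The limit of 5 and the `posted` state are from the text; the `suppressed` state name and the `candidate_id` field on the output record are assumptions.

```python
MAX_SURFACED = 5  # "at most 5 outputs"


def surface(candidates: list) -> list:
    """Mark the top survivors as posted and return their output records."""
    survivors = [c for c in candidates if c["detection_state"] == "selected"]
    survivors.sort(key=lambda c: c["priority_score"], reverse=True)
    posted = survivors[:MAX_SURFACED]
    posted_ids = {c["candidate_id"] for c in posted}

    for candidate in candidates:
        # Everything not posted stays in the ledger (state name is assumed).
        candidate["surfacing_state"] = (
            "posted" if candidate["candidate_id"] in posted_ids else "suppressed"
        )

    return [
        {"type": "output", "candidate_id": c["candidate_id"], "surfacing_state": "posted"}
        for c in posted
    ]
```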

Schema overview

Table / Stream | Purpose
hotspots (SQLite) | Planning: which units got reviewed, why selected, which lenses applied
candidate_findings (SQLite) | Judgment: every candidate the model produced, with evidence + severity + state
JSONL trace | Action log: run, hotspot, candidate, decision, output, pr_comment_dedup records
(computed view) | Surfacing: only the ≤5 candidates with surfacing_state = posted

Four concerns kept separate: trace, judgment, planning, presentation. (findings.logbook.md:40-43)
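The computed view in the last row could be expressed as an ordinary SQLite view over `candidate_findings`. The sketch below is an assumption about how that might look: the view name `surfaced_findings`, the column types, and the placement of `detection_state`, `priority_score`, and `surfacing_state` on the same table are all illustrative.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS candidate_findings (
    candidate_id     TEXT PRIMARY KEY,
    hotspot_id       TEXT,
    lens             TEXT,
    output_type      TEXT,
    summary          TEXT,
    evidence         TEXT,
    severity         TEXT,
    confidence_local REAL,
    detection_state  TEXT,
    priority_score   INTEGER,
    surfacing_state  TEXT
);

-- Presentation layer: derived from judgment rows, never stored separately.
CREATE VIEW IF NOT EXISTS surfaced_findings AS
SELECT candidate_id, hotspot_id, lens, output_type, summary, severity, priority_score
FROM candidate_findings
WHERE surfacing_state = 'posted';
"""


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    """Open a ledger and make sure the table and the view exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def surfaced(conn: sqlite3.Connection) -> list:
    """Read the surfaced findings, highest priority first."""
    return conn.execute(
        "SELECT candidate_id, priority_score FROM surfaced_findings "
        "ORDER BY priority_score DESC"
    ).fetchall()
```

Deriving the surfaced set from a view keeps presentation from drifting out of sync with the judgment rows it summarizes.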

Go deeper