Skill example · logbooks marketplace

deep-code-review

Hotspot-first deep code review for PRs, branches, and diffs.

Install: /plugin install deep-code-review@logbooks
Run: /deep-code-review

What it does

deep-code-review is optimized to be right about a few important things rather than produce many comments. A run models behavior changes from the diff, picks risky hotspots, fans out per-hotspot lens subagents, runs a skeptic pass, and surfaces at most 5 outputs as findings or questions. Each run is persisted to a per-PR SQLite ledger and a per-run JSONL trace under ./.logbooks/code-review/ in the reviewed repo, so follow-up reviews can see what was already flagged. Formatting-only changes, trivial renames, and speculative style comments are ignored by default.

How it does it

Gather inputs

Resolve the review target (PR, branch, pasted diff, or current WIP) into a stable PR_REF, then initialize the SQLite ledger and JSONL trace under ./.logbooks/code-review/.

JSONL trace: { type: "run", run_id, pr_ref, target_repo, started_at }
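A minimal sketch of this initialization step, using only Python's standard library. The directory and the fields of the `run` record come from the text above; the function name `init_run`, the file names `ledger.sqlite` and `run-<run_id>.jsonl`, and the UUID-based `run_id` are assumptions made for illustration, not the skill's actual layout.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone
from pathlib import Path


def init_run(repo_root: str, pr_ref: str) -> dict:
    """Create the ledger and trace files, then append a `run` record."""
    base = Path(repo_root) / ".logbooks" / "code-review"
    base.mkdir(parents=True, exist_ok=True)

    # Hypothetical file name; the source only names the directory.
    # Connecting creates the database file if it does not exist yet.
    sqlite3.connect(base / "ledger.sqlite").close()

    record = {
        "type": "run",
        "run_id": uuid.uuid4().hex,  # assumption: any unique id would do
        "pr_ref": pr_ref,
        "target_repo": str(Path(repo_root).resolve()),
        "started_at": datetime.now(timezone.utc).isoformat(),
    }

    # JSONL: one JSON object per line, appended to a per-run file.
    trace_path = base / f"run-{record['run_id']}.jsonl"
    with trace_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

Appending one object per line keeps the trace readable with line-oriented tools and lets later steps add `hotspot`, `candidate`, and `output` records to the same file.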

Build change map + select hotspots

Read the diff once, classify edits into archetypes (guard-removed, public-contract-changed, persistence-schema-changed…), then pick up to 8 risky changed units worth focused review.

SQLite hotspots row: hotspot_id, file_path, symbol, lines, change_archetypes, risk_tags, why_selected
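The selection step can be sketched as a rank-and-truncate over classified units. The cap of 8 and the archetype names come from the text above; the numeric weights, the `ChangedUnit` type, and the rule "sum the weights" are hypothetical stand-ins, since the source does not say how risk is actually judged.

```python
from dataclasses import dataclass, field

MAX_HOTSPOTS = 8  # cap stated in the text

# Hypothetical weights: the source names these archetypes,
# but not their relative risk.
ARCHETYPE_RISK = {
    "guard-removed": 3,
    "public-contract-changed": 2,
    "persistence-schema-changed": 2,
}


@dataclass
class ChangedUnit:
    file_path: str
    symbol: str
    change_archetypes: list = field(default_factory=list)


def risk(unit: ChangedUnit) -> int:
    """Sum the weights of every archetype attached to the unit."""
    return sum(ARCHETYPE_RISK.get(a, 0) for a in unit.change_archetypes)


def select_hotspots(units: list) -> list:
    """Keep only risky units, highest risk first, capped at MAX_HOTSPOTS."""
    risky = [u for u in units if risk(u) > 0]
    risky.sort(key=risk, reverse=True)
    return risky[:MAX_HOTSPOTS]
```

Units with no recognized archetype score zero and are never selected, which matches the stated default of ignoring formatting-only changes and trivial renames.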

Pick lenses per hotspot

Choose review lenses for each hotspot from a fixed catalog (correctness, security, concurrency, performance, api-contract, …). Correctness and maintainability are always-on.

JSONL trace: { type: "hotspot", hotspot_id, lenses: ["correctness", "security", ...] }
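One way to picture lens selection: start from the two always-on lenses and add others based on the hotspot's risk tags. The lens names and the always-on pair come from the text above; the risk-tag names and the tag-to-lens mapping below are invented for illustration.

```python
ALWAYS_ON = ["correctness", "maintainability"]  # stated in the text

# Hypothetical mapping; the real catalog and its triggers are not given.
LENSES_BY_RISK_TAG = {
    "auth": ["security"],
    "shared-state": ["concurrency"],
    "hot-path": ["performance"],
    "public-api": ["api-contract"],
}


def pick_lenses(risk_tags: list) -> list:
    """Always-on lenses first, then tag-driven lenses without duplicates."""
    lenses = list(ALWAYS_ON)
    for tag in risk_tags:
        for lens in LENSES_BY_RISK_TAG.get(tag, []):
            if lens not in lenses:
                lenses.append(lens)
    return lenses


def hotspot_record(hotspot_id: str, risk_tags: list) -> dict:
    """Build the `hotspot` trace record described above."""
    return {
        "type": "hotspot",
        "hotspot_id": hotspot_id,
        "lenses": pick_lenses(risk_tags),
    }
```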

Generate candidate findings (parallel fan-out)

One subagent per (hotspot × lens) combination acquires minimal local context, then emits findings or questions with evidence, severity, and a local confidence score.

SQLite candidate_findings rows: candidate_id, hotspot_id, lens, output_type, summary, evidence, severity, confidence_local
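The fan-out shape can be sketched with a thread pool: expand every hotspot into (hotspot, lens) pairs and submit one task per pair. This is only an illustration of the pattern; the `review` function below is a placeholder for a real subagent call, and the pool size is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def review(hotspot_id: str, lens: str) -> dict:
    """Placeholder for a lens subagent; a real one would inspect code."""
    return {
        "hotspot_id": hotspot_id,
        "lens": lens,
        "output_type": "question",
        "summary": f"placeholder output for {lens} on {hotspot_id}",
    }


def fan_out(lenses_by_hotspot: dict, reviewer=review, max_workers: int = 4) -> list:
    """Run one reviewer task per (hotspot, lens) pair and collect results."""
    pairs = [
        (hotspot_id, lens)
        for hotspot_id, lenses in lenses_by_hotspot.items()
        for lens in lenses
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(reviewer, h, l) for h, l in pairs]
        # Reading futures in submission order keeps results aligned with pairs.
        return [f.result() for f in futures]
```

With 8 hotspots and a handful of lenses each, the number of tasks stays small, so a bounded pool is enough to keep the run parallel without unbounded concurrency.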

Skeptic + dedup + priority score

A skeptic pass challenges each candidate, root-cause fingerprints collapse duplicates within the run, and a multi-factor priority score (0–100) ranks survivors.

Patch (candidate row updates): detection_state ∈ {selected, dropped, duplicate-in-run}, priority_score (0–100)
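A rough sketch of how the three state values could be assigned. The state names and the 0–100 range come from the text above; everything else is assumed: the `root_cause` field, the fingerprint recipe, the severity weights, and the two-factor score are simple stand-ins for the skill's unspecified multi-factor formula, and `skeptic` is any callable that returns True when a candidate survives the challenge.

```python
import hashlib

# Hypothetical weights; the real scoring factors are not documented here.
SEVERITY_WEIGHT = {"low": 0.4, "medium": 0.7, "high": 1.0}


def fingerprint(candidate: dict) -> str:
    """Hash the hotspot and a (hypothetical) root_cause field."""
    key = f'{candidate["hotspot_id"]}|{candidate["root_cause"]}'
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def score(candidate: dict) -> int:
    """Severity times local confidence, clamped to 0..100."""
    raw = 100 * SEVERITY_WEIGHT[candidate["severity"]] * candidate["confidence_local"]
    return max(0, min(100, round(raw)))


def triage(candidates: list, skeptic) -> list:
    """Set priority_score and detection_state on every candidate."""
    for c in candidates:
        c["priority_score"] = score(c)

    seen = set()
    # Visit the strongest candidates first so they win over their duplicates.
    for c in sorted(candidates, key=lambda c: c["priority_score"], reverse=True):
        fp = fingerprint(c)
        if not skeptic(c):
            c["detection_state"] = "dropped"
        elif fp in seen:
            c["detection_state"] = "duplicate-in-run"
        else:
            seen.add(fp)
            c["detection_state"] = "selected"
    return candidates
```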

Persist + report

The top-ranked survivors (≤5) are surfaced as PR comments or a chat report; everything else stays in the ledger as suppressed judgment for future runs.

JSONL trace + PR output: { type: "output", surfacing_state: "posted" } + ≤5 PR comments emitted
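The surfacing cap can be sketched as a sort and a cut at five. The limit, the `posted` state, and the `output` record type come from the text above; the `suppressed` state value and the `candidate_id` field on the output record are assumptions.

```python
MAX_SURFACED = 5  # cap stated in the text


def surface(candidates: list) -> list:
    """Mark the top selected candidates as posted; return their output records."""
    selected = [c for c in candidates if c.get("detection_state") == "selected"]
    selected.sort(key=lambda c: c["priority_score"], reverse=True)

    outputs = []
    for rank, c in enumerate(selected):
        if rank < MAX_SURFACED:
            c["surfacing_state"] = "posted"
            outputs.append({
                "type": "output",
                "candidate_id": c["candidate_id"],  # assumed field
                "surfacing_state": "posted",
            })
        else:
            # Hypothetical value: stays in the ledger, not shown to the user.
            c["surfacing_state"] = "suppressed"
    return outputs
```

Candidates that were dropped or marked as duplicates are left untouched here, so the ledger still records why they were not shown.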

Schema overview

Table / Stream	Purpose
`hotspots` (SQLite)	Planning — which units got reviewed, why selected, which lenses applied
`candidate_findings` (SQLite)	Judgment — every candidate the model produced, with evidence + severity + state
JSONL trace	Action log — `run`, `hotspot`, `candidate`, `decision`, `output`, `pr_comment_dedup` records
(computed view)	Surfacing — only the ≤5 candidates with `surfacing_state = posted`

The schema keeps four concerns separate: planning, judgment, trace, and presentation. (findings.logbook.md:40-43)
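For orientation, the two tables and the computed view could be declared roughly as follows. The table and column names are taken from the rows listed in the steps above; the column types, the view name `surfaced`, and storing list-valued fields as text are assumptions, not the skill's actual DDL.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS hotspots (
    hotspot_id        TEXT PRIMARY KEY,
    file_path         TEXT NOT NULL,
    symbol            TEXT,
    lines             TEXT,
    change_archetypes TEXT,  -- assumed: serialized list
    risk_tags         TEXT,  -- assumed: serialized list
    why_selected      TEXT
);
CREATE TABLE IF NOT EXISTS candidate_findings (
    candidate_id     TEXT PRIMARY KEY,
    hotspot_id       TEXT NOT NULL REFERENCES hotspots(hotspot_id),
    lens             TEXT NOT NULL,
    output_type      TEXT,
    summary          TEXT,
    evidence         TEXT,
    severity         TEXT,
    confidence_local REAL,
    detection_state  TEXT,
    priority_score   INTEGER,
    surfacing_state  TEXT
);
-- Hypothetical name for the computed view of surfaced candidates.
CREATE VIEW IF NOT EXISTS surfaced AS
    SELECT * FROM candidate_findings
    WHERE surfacing_state = 'posted'
    ORDER BY priority_score DESC;
"""


def create_ledger(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a ledger and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Keeping presentation as a view rather than a table means the posted set is always derived from the judgment rows and cannot drift from them.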
