GoingNinja Manual Knowledge Wiki Lean MVP bootstrap documentation // Felix Rascher (WIP)

1. Why Prompt Quality Matters

The system does not rely on one giant review prompt.

Audit quality depends on four things staying explicit:

  • which files are in scope
  • which reviewer role is active
  • which failure modes the reviewer should look for
  • which output format the reviewer must return

If those points stay vague, the audit becomes hard to trust and hard to automate.
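The four points above can be made mechanically explicit. A minimal sketch, with hypothetical names (AuditPromptSpec and render are illustrative, not platform API), of what a fully specified audit prompt carries:

```python
from dataclasses import dataclass

# Illustrative sketch, not platform code: the four explicit points
# of an audit prompt captured as one spec object.
@dataclass
class AuditPromptSpec:
    files: list[str]          # which files are in scope
    role: str                 # which reviewer role is active
    failure_modes: list[str]  # which failure modes to look for
    output_shape: list[str]   # which output sections must come back

    def render(self) -> str:
        lines = [f"Role: {self.role}", "Files under review:"]
        lines += [f"  - {f}" for f in self.files]
        lines.append("Look for these failure modes:")
        lines += [f"  - {m}" for m in self.failure_modes]
        lines.append("Return exactly these sections:")
        lines += [f"  - {s}" for s in self.output_shape]
        return "\n".join(lines)
```

If any field is empty, the prompt is not yet runnable, which is exactly the property that makes the audit automatable.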

2. What The Official Docs Actually Support

For each tool: the official support we rely on, and why it matters for audit prompts.

  • current proof builder surface. OpenAI documents Codex as a public coding-agent surface with repo context and task execution (Codex developer overview, Codex product page). For audit prompts this means they should be scoped, concrete, file-aware, and structured rather than conversational.
  • Claude Code. Anthropic documents CLAUDE.md project memory, structured memory, and .claude/settings.json as the official settings surface (Claude Code memory, Claude Code settings). For audit prompts this means review prompts should stay short, because stable project context belongs in repo memory and settings, not repeated inside every audit prompt.
  • Gemini CLI. Google documents non-interactive -p, settings-based behavior, and the repo-based GEMINI.md pattern in the official CLI repository (google-gemini/gemini-cli). For audit prompts this means they should be compatible with headless CLI execution, scripting, and explicit context loading.

The official docs do not tell us to use one universal mega-prompt for UX, content, systems, and investor story at once. That multi-goal pattern is a platform choice, and it is a bad one.

3. GoingNinja Audit Prompt Rules

These are platform rules derived from the official tool surfaces above.

3.1 Prompt Ownership And Role Handoff

GoingNinja treats prompt creation and audit execution as two different jobs.

Default handoff:

  • the current proof builder surface scopes the change, assembles the exact file list, and writes the audit prompt
  • Claude Code receives that prompt as the hard-review lane and returns blocking findings
  • Gemini CLI receives that prompt as the broad-reading lane and returns contradiction or scope-risk findings

Why this split exists:

  • OpenAI documents Codex as a public coding-agent surface for scoped repo work
  • Anthropic documents stable Claude project context in CLAUDE.md and .claude/settings.json, which makes Claude a better fit for repeatable review against a repo contract
  • Google documents Gemini CLI as a scriptable non-interactive CLI surface, which makes explicit one-lane audit runs and contradiction scans practical

Prompt authorship by the builder CLI is therefore a GoingNinja convention derived from these primitives, not a vendor-mandated role. The builder does not invent the audit criteria from taste. It translates the human-signed repo contract, such as PLAN.md, REQUIREMENTS.json, and the declared review lane, into a machine-runnable prompt for the reviewer tools.

GoingNinja therefore does not let the auditor invent its own role on the fly. The role is assigned by the prompt author, and the auditor stays inside that lane.

3.2 One Prompt, One Lane

Each prompt audits one lane only:

  • UX and information architecture
  • content and documentation precision
  • systems and vendor-proof

Reason:

  • OpenAI recommends well-scoped tasks
  • Claude project memory already carries shared context
  • Gemini works best in explicit non-interactive runs when the goal is narrow

3.3 Exact File List

Every audit prompt must name the exact files under review.

That keeps the audit:

  • reproducible
  • diffable
  • compatible with headless CLI use

Without an explicit file list, the reviewer can silently read too much or too little.
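One cheap guard follows from this rule: fail the run before any reviewer starts if a named file is missing. A sketch with a hypothetical helper (check_file_list is illustrative, not platform code):

```python
from pathlib import Path

# Illustrative guard, not platform code: refuse to run an audit
# whose explicit file list does not match the repo on disk.
def check_file_list(repo_root: str, files: list[str]) -> list[str]:
    root = Path(repo_root)
    missing = [f for f in files if not (root / f).is_file()]
    return missing  # empty list means the declared scope is runnable

missing = check_file_list(".", ["README.md", "docs/missing.md"])
if missing:
    print("audit scope broken, missing:", missing)
```

A non-empty return is an execution-surface failure, not an audit finding, and should stop the run before a model reads anything.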

3.4 Explicit Reviewer Role

Every prompt must name the reviewer lens:

  • information architect
  • technical documentation architect
  • systems engineer

This is a platform convention, not a vendor primitive. The reason is simple: it forces one clear evaluation standard per run.

3.5 Explicit Standards

Every prompt must say what “good” means.

Examples:

  • readable by professors
  • precise enough for senior engineers
  • credible for operators
  • not decorative

This prevents vague praise and vague criticism.

3.6 Explicit Failure Modes

Every prompt should name the concrete failure modes it cares about:

  • duplicated explanations
  • broken read path
  • weak diagrams
  • vendor claims stated too strongly
  • stale cross-links
  • claims stronger than proof
  • sticky navigation that only works in one viewport state
  • compact menus that stay open after navigation or hide the destination content
  • sticky header and sticky sidebar gaps caused by bad shared offsets
  • desktop menus that disappear after a compact-to-desktop resize
  • compact menu shells whose width or spacing drifts away from the shared shell contract

This follows the same logic as the rest of GoingNinja: review should be driven by observable failure modes, not by taste alone.
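Some of these failure modes can be demoted from audit prompt to deterministic check entirely. A sketch for one of them, stale cross-links, assuming markdown files with relative link targets (the regex and function are illustrative, not platform code):

```python
import re
from pathlib import Path

# Illustrative stale cross-link check: every relative markdown link
# target in a document must exist on disk. Not platform code.
LINK = re.compile(r"\]\(([^)#]+)")

def stale_links(doc: Path) -> list[str]:
    stale = []
    for target in LINK.findall(doc.read_text()):
        if target.startswith(("http://", "https://", "mailto:")):
            continue  # external links are out of scope here
        if not (doc.parent / target).exists():
            stale.append(target)
    return stale
```

Anything caught this way never needs to burn a reviewer run, which keeps the model-facing prompt focused on the failure modes that actually need judgment.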

3.7 Fixed Output Shape

Every audit prompt must force a small deterministic output:

  • Verdict
  • Findings
  • What Works
  • one best rewrite or closure pass

This is not a vendor requirement. It is a platform rule that makes:

  • repeated audits comparable
  • closure easier
  • CI integration possible later
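The CI point can be sketched concretely: a fixed shape means a reviewer reply can be validated mechanically before its verdict is trusted. The section header strings below are assumptions for illustration, not the platform's actual schema:

```python
# Illustrative check, not platform code: a reviewer reply is usable
# only if every required section header appears exactly once.
REQUIRED_SECTIONS = ["Verdict", "Findings", "What Works", "Best Rewrite"]

def output_shape_ok(reply: str) -> bool:
    lines = [line.strip() for line in reply.splitlines()]
    return all(lines.count(f"## {name}") == 1 for name in REQUIRED_SECTIONS)
```

A reply that fails this check is rejected as malformed rather than interpreted, which is what makes repeated audits comparable.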

3.8 The Auditor Does Not Rewrite The Contract

The prompt may ask Claude or Gemini to criticize the current contract, but it must not leave the reviewer free to redefine:

  • which files are in scope
  • which lane is active
  • whether the run is blocking or non-blocking
  • who owns merge authority

Those points belong to the platform contract, not to the auditor's mood in one run.

3.9 One Deterministic Workspace For System Lanes

If an audit lane spans multiple repos, the prompt must not point the reviewer CLI at a giant parent directory and hope the tool infers the right boundary.

For system lanes and other multi-repo audits, GoingNinja stages one copied workspace from the frozen snapshot first.

Why this rule exists:

  • headless CLIs often apply workspace boundaries, startup scans, or path safety checks
  • symlinked or sprawling parent volumes create false negatives that look like audit findings but are really execution-surface bugs
  • a copied staged tree gives Claude and Gemini the same deterministic path set

The platform path is:

npm --prefix /absolute/path/to/goingninja-os run audit:stage -- \
  --name wiki-system-audit \
  /absolute/path/to/goingninja-manual \
  /absolute/path/to/goingninja-os \
  /absolute/path/to/example-product

The resulting AUDIT_WORKSPACE.json becomes the machine-readable record of which copied repos and snapshot refs were actually reviewed.

4. What We Avoid On Purpose

GoingNinja does not use these audit-prompt patterns:

  • one mega-prompt that mixes UX, content, systems, strategy, and marketing
  • prompts that omit the file list
  • prompts that ask for “thoughts” instead of findings
  • prompts that let the model choose its own output format
  • prompts that repeat stable repo rules instead of relying on AGENTS.md, CLAUDE.md, GEMINI.md, and settings

These anti-patterns produce long but low-signal output and make closure hard.

5. Current Prompt Contract

The active prompt family in this repo follows the rules above:

  • a current UX/UI surface audit prompt
  • a current systems surface audit prompt
  • a current UX closure prompt
  • a current content closure prompt

The canonical prompt files live under docs/audits/prompts/ in the repo. Older 2026-04-16 prompt files remain there as archived predecessors, not as the primary current naming family.

Each of them:

  • fixes the file list
  • fixes the reviewer role
  • fixes the output shape
  • avoids mixed audit goals

6. Headless And CI Constraints

Prompt quality is not only about wording. It also has to match the execution surface.

For headless audits, the prompt must be compatible with:

  • claude --print ... or the current non-interactive Claude Code equivalent documented by Anthropic for the CLI version in use
  • gemini -p ...

Google explicitly documents non-interactive gemini -p use in the official Gemini CLI repository. That matters because an audit prompt that only works interactively is not yet CI-ready.

Anthropic documents that stable project context belongs in CLAUDE.md and .claude/settings.json. That matters because repeating the full repo policy in every audit prompt makes headless runs heavier and more fragile than necessary.

For system lanes, headless compatibility also means the filesystem boundary must be deterministic. If the reviewer cannot traverse the same copied workspace that the prompt names, the execution surface is still wrong even if the prompt wording looks good.
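The two headless lanes can be assembled the same way in a wrapper. A sketch that only builds the argv list and does not execute anything; the flag shapes mirror the claude --print and gemini -p forms referenced above, and the exact flags for any given install are whatever that CLI version documents:

```python
# Illustrative wrapper, not platform code: build the headless
# reviewer command for one declared lane without executing it.
def headless_cmd(tool: str, prompt: str) -> list[str]:
    if tool == "claude":
        return ["claude", "--print", prompt]
    if tool == "gemini":
        return ["gemini", "-p", prompt]
    raise ValueError(f"unknown reviewer lane: {tool}")
```

Keeping command assembly in one place means a CI runner and a human operator invoke the same lane the same way, which is the point of headless compatibility.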

7. Practical Rule For GoingNinja Operators

In practice, the working rule is:

  • the builder surface prepares and scopes the audit in the current proof
  • Claude and Gemini receive a defined reviewer role
  • the human stays the final merge owner

This keeps the audit deterministic and prevents “agent theater,” where one tool improvises builder, reviewer, and approver roles at the same time.

8. Closure Writeback

External audits are not the final memory of the system.

If a bug class repeats across audit rounds, the closure is incomplete until the lesson is written back into:

  • AGENTS.md
  • INVARIANTS.md
  • a reusable template
  • or a deterministic check

That is how GoingNinja keeps future repos from relearning the same lesson by pain.

9. Closure Mode And Audit Caps

Prompt quality is not only about wording. It is also about when to stop using a normal audit prompt and switch to a closure prompt.

GoingNinja now uses this rule:

  • if one audit lane returns passable repeatedly
  • and the remaining issues are medium/low contract points rather than a new architecture failure

then the next prompt must be a closure prompt, not just another ordinary review prompt.

That closure prompt must ask for:

  • the full remaining blocker list to reach strong
  • separation of required fixes from optional polish
  • an explicit answer on whether the current snapshot is already strong enough

Scope must be declared before the first external run:

  • single-surface: one runtime surface or one static builder, with no database/auth/provider-plane change.
  • multi-surface: two or three tightly coupled surfaces in one repo; shared route, asset, or security contracts are allowed.
  • system: anything larger, or anything with cross-repo/provider/data-plane change.

Audit caps then follow that declared scope:

  • single-surface: up to three normal audit rounds, then one closure-mode audit, then one confirmation audit after the closure fixes.
  • multi-surface: up to four normal audit rounds, then one closure-mode audit, then one confirmation audit after the closure fixes.
  • system: no fixed numeric cap, but two repeated passable verdicts in the same lane force a prompt/process rewrite before more runs.

If the applicable cap does not converge, the prompt/process contract must be rewritten before more runs are burned.

This is no longer prose alone. The active lane must be recorded in AUDIT_STATE.json with:

  • scope class
  • frozen snapshot ref
  • current mode
  • normal round count
  • closure round count
  • confirm round count
  • repeated passable streak for system lanes

That file is the mechanical exit path for capped audits.
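A sketch of that mechanical exit path, assuming the AUDIT_STATE.json fields listed above (the field names and the schema here are illustrative; the real contract lives in the repo file):

```python
import json

# Illustrative cap check against AUDIT_STATE.json. Cap values mirror
# the declared scopes above; field names are assumptions.
NORMAL_CAPS = {"single-surface": 3, "multi-surface": 4}

def next_mode(state: dict) -> str:
    scope = state["scope_class"]
    if scope == "system":
        # two repeated passable verdicts force a process rewrite
        if state.get("passable_streak", 0) >= 2:
            return "rewrite-process"
        return state["mode"]
    cap = NORMAL_CAPS[scope]
    if state["mode"] == "normal" and state["normal_rounds"] >= cap:
        return "closure"
    if state["mode"] == "closure" and state["closure_rounds"] >= 1:
        return "confirm"
    return state["mode"]

state = json.loads('{"scope_class": "single-surface", "mode": "normal", '
                   '"normal_rounds": 3, "closure_rounds": 0}')
print(next_mode(state))  # closure
```

Because the decision reads only the recorded state, the switch to closure mode is forced by the file, not by anyone's patience in the moment.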

After one complete Claude closure list on a fresh snapshot, the next Claude step should usually be a narrow confirm on the exact prior blockers, not another broad closure rerun. If the broad claude --print lane is silent for too long, reduce Claude effort on that narrow confirm instead of reopening a full audit lane.

10. Why This Matters

The failure mode is simple:

  • an ordinary audit prompt is optimized to find the next issue
  • a closure prompt is optimized to say what still truly blocks strong

If we keep using the first kind of prompt too long, the auditor can keep finding one more small point forever.

That is not rigor. That is a broken stopping condition.

11. Sources