GoingNinja Manual Knowledge Wiki Lean MVP bootstrap documentation // Felix Rascher (WIP)

1. Why Prompt Quality Matters

The system does not rely on one giant review prompt.

Audit quality depends on four things staying explicit:

  • which files are in scope
  • which reviewer role is active
  • which failure modes the reviewer should look for
  • which output format the reviewer must return

If those points stay vague, the audit becomes hard to trust and hard to automate.
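The four points above can be made mechanically explicit. A minimal sketch, with hypothetical names (AuditPromptSpec and render are illustrative, not platform API), of what a fully specified audit prompt carries:

```python
from dataclasses import dataclass

# Illustrative sketch, not platform code: the four explicit points
# of an audit prompt captured as one spec object.
@dataclass
class AuditPromptSpec:
    files: list[str]          # which files are in scope
    role: str                 # which reviewer role is active
    failure_modes: list[str]  # which failure modes to look for
    output_shape: list[str]   # which output sections must come back

    def render(self) -> str:
        lines = [f"Role: {self.role}", "Files under review:"]
        lines += [f"  - {f}" for f in self.files]
        lines.append("Look for these failure modes:")
        lines += [f"  - {m}" for m in self.failure_modes]
        lines.append("Return exactly these sections:")
        lines += [f"  - {s}" for s in self.output_shape]
        return "\n".join(lines)
```

If any field is empty, the prompt is not yet runnable, which is exactly the property that makes the audit automatable.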

2. What The Official Docs Actually Support

For each tool: the official support we rely on, and why it matters for audit prompts.

  • current proof builder surface. OpenAI documents Codex as a public coding-agent surface with repo context and task execution (Codex developer overview, Codex product page). For audit prompts this means they should be scoped, concrete, file-aware, and structured rather than conversational.
  • Claude Code. Anthropic documents CLAUDE.md project memory, structured memory, and .claude/settings.json as the official settings surface (Claude Code memory, Claude Code settings). For audit prompts this means review prompts should stay short, because stable project context belongs in repo memory and settings, not repeated inside every audit prompt.
  • Gemini CLI. Google documents non-interactive -p, settings-based behavior, and the repo-based GEMINI.md pattern in the official CLI repository (google-gemini/gemini-cli). For audit prompts this means they should be compatible with headless CLI execution, scripting, and explicit context loading.

The official docs do not tell us to use one universal mega-prompt for UX, content, systems, and investor story at once. That multi-goal pattern is a platform choice, and it is a bad one.

3. GoingNinja Audit Prompt Rules

These are platform rules derived from the official tool surfaces above.

3.1 Prompt Ownership And Role Handoff

GoingNinja treats prompt creation and audit execution as two different jobs.

Default handoff:

  • the current proof builder surface scopes the change, assembles the exact file list, and writes the audit prompt
  • Claude Code receives that prompt as the hard-review lane and returns blocking findings
  • Gemini CLI receives that prompt as the broad-reading lane and returns contradiction or scope-risk findings

Why this split exists:

  • OpenAI documents Codex as a public coding-agent surface for scoped repo work
  • Anthropic documents stable Claude project context in CLAUDE.md and .claude/settings.json, which makes Claude a better fit for repeatable review against a repo contract
  • Google documents Gemini CLI as a scriptable non-interactive CLI surface, which makes explicit one-lane audit runs and contradiction scans practical

Prompt authorship by the builder CLI is therefore a GoingNinja convention derived from these primitives, not a vendor-mandated role. The builder does not invent the audit criteria from taste. It translates the human-signed repo contract, such as PLAN.md, REQUIREMENTS.json, and the declared review lane, into a machine-runnable prompt for the reviewer tools.

GoingNinja therefore does not let the auditor invent its own role on the fly. The role is assigned by the prompt author, and the auditor stays inside that lane.

3.2 One Prompt, One Lane

Each prompt audits one lane only:

  • UX and information architecture
  • content and documentation precision
  • systems and vendor-proof

Reason:

  • OpenAI recommends well-scoped tasks
  • Claude project memory already carries shared context
  • Gemini works best in explicit non-interactive runs when the goal is narrow

3.3 Exact File List

Every audit prompt must name the exact files under review.

That keeps the audit:

  • reproducible
  • diffable
  • compatible with headless CLI use

Without an explicit file list, the reviewer can silently read too much or too little.
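One cheap guard follows from this rule: fail the run before any reviewer starts if a named file is missing. A sketch with a hypothetical helper (check_file_list is illustrative, not platform code):

```python
from pathlib import Path

# Illustrative guard, not platform code: refuse to run an audit
# whose explicit file list does not match the repo on disk.
def check_file_list(repo_root: str, files: list[str]) -> list[str]:
    root = Path(repo_root)
    missing = [f for f in files if not (root / f).is_file()]
    return missing  # empty list means the declared scope is runnable

missing = check_file_list(".", ["README.md", "docs/missing.md"])
if missing:
    print("audit scope broken, missing:", missing)
```

A non-empty return is an execution-surface failure, not an audit finding, and should stop the run before a model reads anything.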

3.4 Explicit Reviewer Role

Every prompt must name the reviewer lens:

  • information architect
  • technical documentation architect
  • systems engineer

This is a platform convention, not a vendor primitive. The reason is simple: it forces one clear evaluation standard per run.

3.5 Explicit Standards

Every prompt must say what “good” means.

Examples:

  • readable by professors
  • precise enough for senior engineers
  • credible for operators
  • not decorative

This prevents vague praise and vague criticism.

3.6 Explicit Failure Modes

Every prompt should name the concrete failure modes it cares about:

  • duplicated explanations
  • broken read path
  • weak diagrams
  • vendor claims stated too strongly
  • stale cross-links
  • claims stronger than proof
  • sticky navigation that only works in one viewport state
  • compact menus that stay open after navigation or hide the destination content
  • sticky header and sticky sidebar gaps caused by bad shared offsets
  • desktop menus that disappear after a compact-to-desktop resize
  • compact menu shells whose width or spacing drifts away from the shared shell contract

This follows the same logic as the rest of GoingNinja: review should be driven by observable failure modes, not by taste alone.
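Some of these failure modes can be demoted from audit prompt to deterministic check entirely. A sketch for one of them, stale cross-links, assuming markdown files with relative link targets (the regex and function are illustrative, not platform code):

```python
import re
from pathlib import Path

# Illustrative stale cross-link check: every relative markdown link
# target in a document must exist on disk. Not platform code.
LINK = re.compile(r"\]\(([^)#]+)")

def stale_links(doc: Path) -> list[str]:
    stale = []
    for target in LINK.findall(doc.read_text()):
        if target.startswith(("http://", "https://", "mailto:")):
            continue  # external links are out of scope here
        if not (doc.parent / target).exists():
            stale.append(target)
    return stale
```

Anything caught this way never needs to burn a reviewer run, which keeps the model-facing prompt focused on the failure modes that actually need judgment.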

3.7 Fixed Output Shape

Every audit prompt must force a small deterministic output:

  • Verdict
  • Findings
  • What Works
  • one best rewrite or closure pass

This is not a vendor requirement. It is a platform rule that makes:

  • repeated audits comparable
  • closure easier
  • CI integration possible later
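The CI point can be sketched concretely: a fixed shape means a reviewer reply can be validated mechanically before its verdict is trusted. The section header strings below are assumptions for illustration, not the platform's actual schema:

```python
# Illustrative check, not platform code: a reviewer reply is usable
# only if every required section header appears exactly once.
REQUIRED_SECTIONS = ["Verdict", "Findings", "What Works", "Best Rewrite"]

def output_shape_ok(reply: str) -> bool:
    lines = [line.strip() for line in reply.splitlines()]
    return all(lines.count(f"## {name}") == 1 for name in REQUIRED_SECTIONS)
```

A reply that fails this check is rejected as malformed rather than interpreted, which is what makes repeated audits comparable.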

3.8 The Auditor Does Not Rewrite The Contract

The prompt may ask Claude or Gemini to criticize the current contract, but it must not leave the reviewer free to redefine:

  • which files are in scope
  • which lane is active
  • whether the run is blocking or non-blocking
  • who owns merge authority

Those points belong to the platform contract, not to the auditor's mood in one run.

3.9 One Deterministic Workspace For System Lanes

If an audit lane spans multiple repos, the prompt must not point the reviewer CLI at a giant parent directory and hope the tool infers the right boundary.

For system lanes and other multi-repo audits, GoingNinja stages one copied workspace from the frozen snapshot first.

Why this rule exists:

  • headless CLIs often apply workspace boundaries, startup scans, or path safety checks
  • symlinked or sprawling parent volumes create false negatives that look like audit findings but are really execution-surface bugs
  • a copied staged tree gives Claude and Gemini the same deterministic path set

The platform path is:

npm --prefix /absolute/path/to/goingninja-os run audit:stage -- \
  --name wiki-system-audit \
  /absolute/path/to/goingninja-manual \
  /absolute/path/to/goingninja-os \
  /absolute/path/to/example-product

The resulting AUDIT_WORKSPACE.json becomes the machine-readable record of which copied repos and snapshot refs were actually reviewed.

4. What We Avoid On Purpose

GoingNinja does not use these audit-prompt patterns:

  • one mega-prompt that mixes UX, content, systems, strategy, and marketing
  • prompts that omit the file list
  • prompts that ask for “thoughts” instead of findings
  • prompts that let the model choose its own output format
  • prompts that repeat stable repo rules instead of relying on AGENTS.md, CLAUDE.md, GEMINI.md, and settings

These anti-patterns produce long but low-signal output and make closure hard.

5. Current Prompt Contract

The active prompt family in this repo follows the rules above:

  • a current UX/UI surface audit prompt
  • a current systems surface audit prompt
  • a current UX closure prompt
  • a current content closure prompt

The canonical prompt files live under docs/audits/prompts/ in the repo. Older 2026-04-16 prompt files remain there as archived predecessors, not as the primary current naming family.

Each of them:

  • fixes the file list
  • fixes the reviewer role
  • fixes the output shape
  • avoids mixed audit goals

6. Headless And CI Constraints

Prompt quality is not only about wording. It also has to match the execution surface.

For headless audits, the prompt must be compatible with:

  • claude --print ... or the current non-interactive Claude Code equivalent documented by Anthropic for the CLI version in use
  • gemini -p ...

Google explicitly documents non-interactive gemini -p use in the official Gemini CLI repository. That matters because an audit prompt that only works interactively is not yet CI-ready.

Anthropic documents that stable project context belongs in CLAUDE.md and .claude/settings.json. That matters because repeating the full repo policy in every audit prompt makes headless runs heavier and more fragile than necessary.

For system lanes, headless compatibility also means the filesystem boundary must be deterministic. If the reviewer cannot traverse the same copied workspace that the prompt names, the execution surface is still wrong even if the prompt wording looks good.
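The two headless lanes can be assembled the same way in a wrapper. A sketch that only builds the argv list and does not execute anything; the flag shapes mirror the claude --print and gemini -p forms referenced above, and the exact flags for any given install are whatever that CLI version documents:

```python
# Illustrative wrapper, not platform code: build the headless
# reviewer command for one declared lane without executing it.
def headless_cmd(tool: str, prompt: str) -> list[str]:
    if tool == "claude":
        return ["claude", "--print", prompt]
    if tool == "gemini":
        return ["gemini", "-p", prompt]
    raise ValueError(f"unknown reviewer lane: {tool}")
```

Keeping command assembly in one place means a CI runner and a human operator invoke the same lane the same way, which is the point of headless compatibility.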

7. Practical Rule For GoingNinja Operators

In practice, the working rule is:

  • the builder surface prepares and scopes the audit in the current proof
  • Claude and Gemini receive a defined reviewer role
  • the human stays the final merge owner

This keeps the audit deterministic and prevents “agent theater,” where one tool improvises builder, reviewer, and approver roles at the same time.

8. Closure Writeback

External audits are not the final memory of the system.

If a bug class repeats across audit rounds, the closure is incomplete until the lesson is written back into:

  • AGENTS.md
  • INVARIANTS.md
  • a reusable template
  • or a deterministic check

That is how GoingNinja keeps future repos from relearning the same lesson by pain.

9. Closure Mode And Audit Caps

Prompt quality is not only about wording. It is also about when to stop using a normal audit prompt and switch to a closure prompt.

GoingNinja now uses this rule:

  • if one audit lane returns passable repeatedly
  • and the remaining issues are medium/low contract points rather than a new architecture failure

then the next prompt must be a closure prompt, not just another ordinary review prompt.

That closure prompt must ask for:

  • the full remaining blocker list to reach strong
  • separation of required fixes from optional polish
  • an explicit answer on whether the current snapshot is already strong enough

Scope must be declared before the first external run:

  • single-surface: one runtime surface or one static builder, with no database/auth/provider-plane change.
  • multi-surface: two or three tightly coupled surfaces in one repo; shared route, asset, or security contracts are allowed.
  • system: anything larger, or anything with cross-repo/provider/data-plane change.

Audit caps then follow that declared scope:

  • single-surface: up to three normal audit rounds, then one closure-mode audit, then one confirmation audit after the closure fixes.
  • multi-surface: up to four normal audit rounds, then one closure-mode audit, then one confirmation audit after the closure fixes.
  • system: no fixed numeric cap, but two repeated passable verdicts in the same lane force a prompt/process rewrite before more runs.

If the applicable cap does not converge, the prompt/process contract must be rewritten before more runs are burned.

This is no longer prose alone. The active lane must be recorded in AUDIT_STATE.json with:

  • scope class
  • frozen snapshot ref
  • current mode
  • normal round count
  • closure round count
  • confirm round count
  • repeated passable streak for system lanes

That file is the mechanical exit path for capped audits.
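A sketch of that mechanical exit path, assuming the AUDIT_STATE.json fields listed above (the field names and the schema here are illustrative; the real contract lives in the repo file):

```python
import json

# Illustrative cap check against AUDIT_STATE.json. Cap values mirror
# the declared scopes above; field names are assumptions.
NORMAL_CAPS = {"single-surface": 3, "multi-surface": 4}

def next_mode(state: dict) -> str:
    scope = state["scope_class"]
    if scope == "system":
        # two repeated passable verdicts force a process rewrite
        if state.get("passable_streak", 0) >= 2:
            return "rewrite-process"
        return state["mode"]
    cap = NORMAL_CAPS[scope]
    if state["mode"] == "normal" and state["normal_rounds"] >= cap:
        return "closure"
    if state["mode"] == "closure" and state["closure_rounds"] >= 1:
        return "confirm"
    return state["mode"]

state = json.loads('{"scope_class": "single-surface", "mode": "normal", '
                   '"normal_rounds": 3, "closure_rounds": 0}')
print(next_mode(state))  # closure
```

Because the decision reads only the recorded state, the switch to closure mode is forced by the file, not by anyone's patience in the moment.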

After one complete Claude closure list on a fresh snapshot, the next Claude step should usually be a narrow confirm on the exact prior blockers, not another broad closure rerun. If the broad claude --print lane is silent for too long, reduce Claude effort on that narrow confirm instead of reopening a full audit lane.

10. Why This Matters

The failure mode is simple:

  • an ordinary audit prompt is optimized to find the next issue
  • a closure prompt is optimized to say what still truly blocks strong

If we keep using the first kind of prompt too long, the auditor can keep finding one more small point forever.

That is not rigor. That is a broken stopping condition.

11. Sources