The reflexive audit
There is no separate audit verb. The skill audits itself in three layers. Each layer catches something the others cannot.
Why three layers and not one
A single post-emit check would catch consistency violations after they had already landed on disk. A single pre-emit check would catch structural drift but miss the moments during extraction where the model was uncertain. A marker discipline alone would catch in-the-moment uncertainty but miss the cross-file consistency only a script can see.
The three layers cover three distinct failure modes. They are run in this order: in-flight markers accumulate during extraction, a pre-emit tick-list is pasted just before scaffold, and a deterministic script runs after persist.
Layer A · Pre-emit structural re-read
runs just before scripts/scaffold.sh in Phase 3
The agent re-reads the audit section of SKILL.md and the per-component contract, then pastes a six-point tick-list back into the conversation as visible chain-of-thought. The tick-list is the contract:
- All eight required sections. Every component file has public imports, when to use, key props, accessibility, composition examples, source references, common mistakes, things to never invent. Missing sections are not allowed. The explicit no-op bullet covers components with nothing distinctive to say.
- Every extracted rule cited. Each rule points at a file:line source, or carries a [VERIFY] marker, or has been dropped. Uncited rules without a marker are forbidden.
- Scope clean. No copy, voice, or casing rules made it into the skill. Any copy rule surfaced during extraction was recorded in the discovery summary as a candidate for a sibling copy skill.
- Routing table rows resolve. Every row in the SKILL.md routing table points at a file the scaffolder is about to write. Phantom rows pointing at files that will not exist are the most common consistency bug. The pre-emit tick is the cheapest place to catch them.
- Slugs follow the registry. Every rule slug uses the pattern component/<name>-<rule>, token/<name>-<rule>, or asset/<name>-<rule>. Hyphenated, namespaced, one slug per concept.
- Examples lift from real files. Every file under references/examples/ corresponds to a real app/**/page.tsx or components/showcase/*.tsx in the reference project. If the reference project ships zero exemplars, the examples directory is omitted, not stubbed.
If any tick fails, the agent fixes the problem in scratch and re-runs the tick-list. The tick-list lives in the conversation so a human reviewing the transcript can verify the audit actually ran.
What this layer catches: structural drift before any file is written. Missing sections, phantom routing rows, untracked slug formats, fabricated example files.
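Two of the ticks above, slug format and routing rows, are mechanical enough to sketch in code. The following is a minimal illustration, not the skill's own implementation: the function names, the example paths, and the exact regular expression are assumptions, and the pattern reads `<name>-<rule>` as two or more lowercase hyphenated words.

```python
import re

# Assumed reading of the slug pattern: a namespace, a slash, then at least
# two lowercase hyphen-separated words covering <name> and <rule>.
SLUG_RE = re.compile(r"(component|token|asset)/[a-z0-9]+(-[a-z0-9]+)+")


def invalid_slugs(slugs):
    """Return the slugs that do not match the namespaced, hyphenated pattern."""
    return [s for s in slugs if not SLUG_RE.fullmatch(s)]


def phantom_rows(routing_targets, planned_files):
    """Return routing-table targets that the scaffolder is not about to write."""
    planned = set(planned_files)
    return [t for t in routing_targets if t not in planned]
```

A non-empty result from either function would correspond to a failed tick, which the agent fixes in scratch before re-running the tick-list.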
Layer B · In-flight [VERIFY] markers
runs continuously during extraction in Phase 1 and Phase 2
The discipline is simple. The moment the agent extracts a rule it cannot fully back up with source code, it writes the literal token [VERIFY] inline at the point of extraction, followed by a one-line reason. The token never gets silently dropped. It never gets paraphrased ("probably", "seems to", "likely"). It is the literal eight-character string the post-emit script greps for.
A running tally surfaces twice:
- In the Phase 2 proof-point line. "K open [VERIFY] markers" appears alongside the props-verified, tokens-resolved, and hallucinations counts. The list with file paths and reasons prints below.
- In the Phase 3 closing message. Every remaining [VERIFY] appears one last time so you see exactly what landed on disk unverified.
A [VERIFY] marker is not a defect. An undecided [VERIFY] at the end of Phase 3 is. Every marker needs a decision before the gate: accept as known limitation, re-read source, or drop the rule.
What this layer catches: uncertainty as it happens. The marker is cheap. A rule with a known gap is safer than a rule that pretends to be grounded.
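The running tally can be pictured as a literal-string scan. This sketch assumes the marker's reason is whatever follows it on the same line and that the files of interest are Markdown. `markers_in_text` and `tally` are hypothetical names, not functions from the skill.

```python
from pathlib import Path

MARKER = "[VERIFY]"  # the literal token; hedges such as "probably" are not counted


def markers_in_text(text):
    """Return (line_number, reason) for each line that carries the marker."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        index = line.find(MARKER)
        if index != -1:
            # Everything after the first marker is treated as the one-line reason.
            found.append((number, line[index + len(MARKER):].strip()))
    return found


def tally(root):
    """Map each Markdown file under root to its open markers, skipping clean files."""
    result = {}
    for path in sorted(Path(root).rglob("*.md")):
        hits = markers_in_text(path.read_text(encoding="utf-8"))
        if hits:
            result[str(path)] = hits
    return result
```

Because the scan matches only the literal token, a paraphrased hedge never reaches the tally, which is why the discipline forbids paraphrase.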
Layer C · Post-emit consistency script
runs after Phase 3 writes complete
scripts/check-skill-docs.sh runs against the persisted skill on disk. Its stdout is the audit result. Exit code 0 means every assertion passed. Non-zero means at least one failed, the script names the failures, and the closing message does not declare success.
The assertions:
- Every routing-table row in SKILL.md resolves to a real file under .claude/skills/<slug>/.
- Every rule slug in the registry resolves to a citation in a reference file.
- Every component file ships all eight required sections.
- Every [VERIFY] marker in any file is grep-counted and surfaced.
- design-craft.md is a byte-for-byte copy of the meta-skill's original asset.
- TOKEN_COVERAGE still passes against the lifted CSS imports.
- SHELL_INVARIANTS verifies that any wiring lifted in Phase 2 produced the matching Hard Rules and anti-patterns rows.
- The component slate declared in SKILL.md cross-checks against the emitted component contract sections.
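To make the exit-code contract concrete, here is a rough Python sketch of two of these assertions feeding one exit code. The real check is a shell script whose internals are not shown here. The function names and failure messages are illustrative assumptions.

```python
import filecmp
from pathlib import Path


def check_routing(skill_dir, routing_targets):
    """Return one failure per routing target that is not a real file under skill_dir."""
    return [
        f"routing row has no file: {target}"
        for target in routing_targets
        if not (Path(skill_dir) / target).is_file()
    ]


def check_byte_copy(original, emitted):
    """Return a failure unless emitted is byte-for-byte identical to original."""
    # shallow=False compares file contents instead of trusting size and mtime.
    if Path(emitted).is_file() and filecmp.cmp(original, emitted, shallow=False):
        return []
    return [f"{emitted} differs from {original}"]


def run_checks(failures):
    """Print each named failure and return the exit code: 0 only if none failed."""
    for failure in failures:
        print(f"FAIL {failure}")
    return 1 if failures else 0
```

A caller would concatenate the failure lists from every assertion and pass the result of `run_checks` to `sys.exit`, so a single failed assertion is enough to make the run non-zero.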
One more assertion runs through a sibling script: scripts/verify-citations.sh re-reads every file:line citation in every persisted file (including files carried over byte-for-byte from a previous run) and checks the cited line still states the claim. A drifted line number, an unsupported claim, or an unregistered cite fails the run with a precise reason.
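How the sibling script decides that a cited line "still states the claim" is not specified here. The sketch below swaps in a deliberately simple stand-in: each citation carries an expected snippet, and the check is a substring match on the cited line. `verify_citation`, the `sources` mapping, and the reason strings are all assumptions made for illustration.

```python
def verify_citation(path, line_number, expected, sources):
    """Return None when the cited line contains expected, else a reason string.

    sources maps a file path to its list of lines (hypothetical shape).
    """
    lines = sources.get(path)
    if lines is None:
        return f"{path}:{line_number} cites an unregistered file"
    if not 1 <= line_number <= len(lines):
        return f"{path}:{line_number} is out of range"
    # Citations are 1-indexed, so line N lives at list index N - 1.
    if expected not in lines[line_number - 1]:
        return f"{path}:{line_number} no longer contains {expected!r}"
    return None
```

Under this stand-in, a citation that has drifted by one line fails with a precise reason, while the same citation pointing at the right line passes.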
What this layer catches: consistency violations a human or a model would not reliably see. Phantom routing rows, drifted citations, missing slugs, accidental design-craft regeneration, missing shell-invariant rules.
How the three layers cover each other
| Failure mode | Caught by |
|---|---|
| Agent forgets a required section in a component file | A (visible tick-list) + C (post-emit assertion) |
| Agent invents a prop the types file does not export | B (the agent flags it as [VERIFY]) + the Phase 2 typecheck |
| Routing table row points at a file the scaffolder will not write | A (pre-emit) + C (post-emit if the agent skipped A) |
| Citation drifts a line or two from the text it anchors | C (verify-citations.sh re-reads every cite) |
| design-craft.md gets accidentally regenerated | C (byte-diff against the original asset) |
| A copy rule slips in despite the scope guardrail | A (the agent re-reads the scope rule in the tick-list) |
| Carried-over file from a prior run has a stale citation | C (citations are re-checked on every run, no carry-over exemption) |
Why there is no separate audit verb
The rules in SKILL.md and the reference files are the rubric. A separate audit-ds-skill verb would be a second source of truth, with all the drift problems that come with it. The same rules that the meta-skill follows during extraction are the rules check-skill-docs.sh re-asserts after persist.
A future re-extract verb might need a standalone audit. For now, the three layers are the audit.
What to take away
- Layer A is structural. A six-point tick-list re-read just before scaffold catches drift before any file is written.
- Layer B is in-the-moment. [VERIFY] markers go inline at the point of uncertainty. They are cheap, visible, and require a decision before the gate.
- Layer C is mechanical. A script runs after persist and its exit code is the truth. It also re-reads every citation, so prose-only discipline cannot drift between runs.
- Together the layers cover three different failure shapes: structural drift, in-the-moment uncertainty, and cross-file consistency.
Primary source: .claude/skills/extract-ds-skill/SKILL.md ("Reflexive audit" section) and references/validate.md (citation-verification step).