Generate

This is the hands-on. You build the merged-PR theater (the most satisfying moment on GitHub, the seconds where a pull request goes from waiting to Merged) with the primer-react skill loaded. One prompt, real Primer React components, about 13 minutes. That's the time the stage teaching fills while your run churns.

The skill is not a component cheat sheet. It routes the agent to the installed package's types, the real design tokens, and the system's own rules, and it makes the agent check its work before it finishes. What you're testing is the quality of what comes out: state, color, density, and timing.

Run it

Launch in fast mode. Start Claude with permissions skipped so the run is hands-off (Opus, fast mode):
claude --dangerously-skip-permissions
Run the one prompt. Paste this into the session:
/primer-react implement prompts/pr-merged-switch-dark-mode.md
Then eyes up: let it churn. No stepping in, no follow-ups. The agent reads the prompt, loads the skill's references, builds the sequence, and checks it in both color modes before it finishes. Work on main. Don't push.
Watch it land. The agent starts the dev server itself. If it didn't, run pnpm dev and open the printed URL. Reload the page and watch the sequence play.

What to watch

Checks resolve: they clear one by one over about six seconds.
The merge stays locked: while checks run, merging is genuinely unavailable. A keyboard user can't trigger it, and a screen reader doesn't announce it as actionable. It's not just greyed out.
The "ready" cue: once the checks pass, the box signals it can act.
The box opens: the merge box appears and you can act on it.
The two flips: the light↔dark color-mode toggle flips the whole page, including the background; and the capsule flips from green Open to purple Merged. Both come from the design system's own tokens and motion, not hex codes and magic numbers.

Try a second model

Run it once first. If you have time, run it again with a different model or effort and compare density, restraint, and how each one stages the motion. Same skill, same prompt: the model is the only variable.

The standard comparison is the Evidence on the big screen: the same prompt and skill on Sonnet versus Opus. The skill holds the quality floor; the model changes the ceiling. Across models, it's about quality, not cost.

A tiny note if it comes up: the floor run costs about $2, the ceiling $5–14. That's a side note, not the point.

When the run finishes, the agent prints a report about its own work. Check Your Output explains how to read it.

Next: Check Your Output