Skill Quality Playbook

This playbook captures practical quality rules for Claude Arsenal skills. It complements the Skill Format Policy: the format policy says where files live, while this playbook says what makes a skill useful once the model discovers it.

Source context: Anthropic's June 2026 post, "Lessons from building Claude Code: How we use skills", describes skills as folders of instructions, scripts, and resources, not just markdown files. It also highlights trigger descriptions, gotchas, progressive disclosure, verification scripts, setup state, persistent memory, hooks, distribution, composition, and usage measurement.

Skill Types

Pick one primary type before writing or expanding a skill. Skills that straddle several types tend to confuse the model.

Type	Best use
Library and API reference	Correct use of a library, CLI, SDK, or internal platform, including common footguns.
Product verification	Browser, CLI, tmux, API, or state checks that prove the product works.
Data fetching and analysis	Canonical queries, dashboards, metric names, and analysis helpers.
Business process automation	Repeatable team workflow with required fields, previous-state logs, or handoffs.
Code scaffolding and templates	Boilerplate where natural-language requirements still matter.
Code quality and review	Deterministic style, review, testing, or policy checks.
CI/CD and deployment	Build, smoke test, deploy, monitor, rollback, or PR babysitting.
Runbooks	Symptom-driven investigation across tools that returns a structured report.
Infrastructure operations	Guarded maintenance workflows, especially destructive or costly operations.

Quality Bar

Every new or materially changed skill should satisfy these checks:

Trigger description: The description is written for the model, not for humans. It says when to use the skill, includes likely user phrasing or symptoms, and names neighboring cases the skill should not handle when mis-triggering is likely.
Non-obvious value: SKILL.md should not restate what the model can infer by reading the repo or general documentation. Put scarce knowledge first.
Gotchas: Mature skills include real failure modes, naming mismatches, false-success signals, risky defaults, or common recovery paths.
Progressive disclosure: Keep SKILL.md as the router. Move detailed API tables, long examples, templates, scripts, evals, and assets into support directories and reference them from SKILL.md.
Verification: Workflow skills should include checks that prove completion: scripts, assertions, smoke tests, browser checks, health checks, or explicit done-when signals.
Setup state: Skills that need user or environment context should define a config file or setup flow instead of asking repeatedly.
Memory: Repeated business workflows may store append-only logs, JSON, or a small database in the plugin data directory when the history of earlier runs should change what the next run does.
Hooks: Use on-demand hooks only for cases where temporary guardrails are valuable, such as production operations or frozen edit scopes.
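
The setup-state and memory checks above can be sketched as a pair of small helpers. This is a minimal illustration, not code from the repository: the file names `config.json` and `runs.jsonl`, the function names, and the record fields are all hypothetical, and the caller is assumed to pass in whatever plugin data directory the skill actually uses.

```python
import json
import time
from pathlib import Path
from typing import Optional


def load_or_create_config(data_dir: Path, defaults: dict) -> dict:
    """Return saved setup state, writing the defaults on the first run."""
    config_path = data_dir / "config.json"  # hypothetical file name
    if config_path.exists():
        return json.loads(config_path.read_text(encoding="utf-8"))
    data_dir.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(defaults, indent=2), encoding="utf-8")
    return dict(defaults)


def append_run(data_dir: Path, record: dict) -> None:
    """Append one JSON line per run; earlier lines are never rewritten."""
    data_dir.mkdir(parents=True, exist_ok=True)
    entry = {"ts": time.time(), **record}
    with (data_dir / "runs.jsonl").open("a", encoding="utf-8") as fh:  # hypothetical file name
        fh.write(json.dumps(entry) + "\n")


def last_run(data_dir: Path) -> Optional[dict]:
    """Return the most recent record, or None when there is no history yet."""
    log_path = data_dir / "runs.jsonl"
    if not log_path.exists():
        return None
    lines = log_path.read_text(encoding="utf-8").splitlines()
    return json.loads(lines[-1]) if lines else None
```

With helpers like these, a skill would ask for setup details only when no config exists, and could read `last_run` before acting so that the previous outcome shapes the current run.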

Review Workflow

Use the lightweight audit before opening a PR:

python3 scripts/audit_skill_quality.py

Scope it to a changed skill when working incrementally:

python3 scripts/audit_skill_quality.py skill-creator

Use warnings as review prompts, not as automatic blockers. A warning can be acceptable when the skill is intentionally tiny, subjective, or still in manual validation. For release gates, opt in explicitly:

python3 scripts/audit_skill_quality.py --fail-on-warn
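
The policy of treating warnings as prompts unless a release gate opts in could be expressed roughly as follows. This is a sketch of the idea only and does not reproduce `scripts/audit_skill_quality.py`: the function names and the exit-code convention (0 for pass, 1 for fail) are assumptions, and only the `--fail-on-warn` flag and the optional skill argument are taken from the commands shown here.

```python
import argparse
from typing import List


def exit_code(warnings: List[str], errors: List[str], fail_on_warn: bool) -> int:
    """Errors always fail; warnings fail only when the caller opts in."""
    if errors:
        return 1
    if warnings and fail_on_warn:
        return 1
    return 0


def parse_args(argv: List[str]) -> argparse.Namespace:
    """Parse an optional skill name and the opt-in strictness flag."""
    parser = argparse.ArgumentParser(description="Hypothetical audit front end")
    parser.add_argument("skill", nargs="?", help="optional skill name to scope the audit")
    parser.add_argument("--fail-on-warn", action="store_true")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["skill-creator"])
    # A single warning without --fail-on-warn still yields a passing exit code.
    print(exit_code(["description is short"], [], args.fail_on_warn))
```

Keeping the strict mode behind an explicit flag means day-to-day runs stay advisory, while a release pipeline can pass the flag and get a failing exit code from the same findings.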

The existing registry validation remains the required correctness gate:

python3 scripts/validate_skills.py --check
