Agent Skills, Production-grade Engineering Skills for AI Coding Agents
Project Introduction
https://github.com/addyosmani/agent-skills
Author Addy Osmani has long worked on developer experience and engineering efficiency. When AI coding agents emerged, he identified a key problem: AI can write code, but lacks the discipline to write good code. So he encoded the engineering best practices Google has accumulated over the years (code review standards, the testing pyramid, change management, and so on, as described in Software Engineering at Google and Google's engineering practices guide) into structured workflows that AI can follow. The result is Agent Skills.
This project overlaps functionally with the Superpowers that Zirui Sheng discussed earlier, but the two projects have different positioning. Superpowers focuses more on workflow orchestration (how to organize work), while Agent Skills focuses more on engineering practices (how to do each step well).
Background Problems
Problems encountered in AI coding
- Used a function that doesn't exist in the package
- Changed file A without realizing file B depends on A, causing B to crash
- Wrote code directly without clarifying requirements and boundaries first
- Changed a dozen files at once, can't track what changed where
- Wrote no tests, uncertain if changes are correct
What is the root cause of these problems?
AI agent's "shortest path" problem
AI is trained to "give answers quickly." When you say "implement X feature," its default behavior is:
Receive requirement ──→ Write code directly ──→ "Done!"
But a reliable development process should be:
Receive requirement → Clarify requirement → Plan approach → Step-by-step implementation → Test & verify → Review quality → Commit
AI skips all the intermediate steps that make code reliable.
Even more dangerous, AI will make excuses for itself—
"This change is too simple, doesn't need testing."
"I'll add tests later."
"I already know how to do it, don't need to check documentation."
These excuses sound reasonable, but each one can lead to bugs.
Agent Skills: Installing Engineering Discipline in AI
Agent Skills is a set of carefully designed Markdown workflow files. Once loaded into an agent's context, they instruct the AI to follow a defined process instead of taking the "shortest path," giving it a discipline to work within.
Three Core Mechanisms
Mechanism 1: Seven Slash Commands Form a Complete Development Lifecycle
Earlier we mentioned AI's default behavior is "receive requirement → write code directly → done." Agent Skills uses 7 slash commands to replace this "shortest path" with a complete engineering pipeline:
| What you're doing | Command |
|---|---|
| ① Define | /spec |
| ② Plan | /plan |
| ③ Build | /build |
| ④ Verify | /test |
| ⑤ Review | /review |
| ⑥ Simplify | /code-simplify |
| ⑦ Ship | /ship |
Each command automatically calls one or more Skills (22 total), where each Skill has its own workflow.
These 7 commands are not 7 independent tools; together they form a complete development lifecycle. Going through them in order covers the entire process from "having an idea" to "code is deliverable."

Mechanism 2: Rationalization Reversal Table
This is Agent Skills' most distinctive design. Each Skill has a built-in "excuses and rebuttals" table:
| AI's common excuses | The reality is... |
|---|---|
| "This is too simple, doesn't need testing" | Simple things become complex. Testing is proof, not overhead |
| "I'll add tests later" | "Later" never comes. Write tests first, then implementation |
| "Skipping specs is faster" | Code written without thinking will be rewritten multiple times |
| "This is just a small change" | Small changes can introduce big bugs |
When AI tries to skip steps, this table stops it.
Mechanism 3: Evidence Verification
"Seems right" is never enough. Each Skill requires AI to provide evidence at the end:
- Screenshots of test passes
- Build success logs
- Actual code execution results
No evidence = not done.
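The "no evidence = not done" rule can be pictured as a small completion gate. The sketch below is illustrative only: Agent Skills expresses this rule as Markdown instructions, and the `Evidence` class, its fields, and `is_done` are hypothetical names that do not come from the project. Accepting any single kind of evidence is also an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical record of what an agent attaches when it claims a task is finished."""
    test_output: str = ""   # e.g. pasted test-runner output
    build_log: str = ""     # e.g. the tail of a successful build log
    run_result: str = ""    # e.g. actual output from executing the code


def is_done(evidence: Evidence) -> bool:
    """A task counts as done only if at least one piece of evidence is non-empty."""
    return any(
        text.strip()
        for text in (evidence.test_output, evidence.build_log, evidence.run_result)
    )


print(is_done(Evidence()))                                # False
print(is_done(Evidence(test_output="3 passed in 0.4s")))  # True
```

A stricter gate could require all three kinds of evidence; the point is only that a bare claim of "done" is rejected.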
A Case Study Using Several Core Skills
spec-driven-development — Think clearly before starting
Problem solved: AI starts writing code before understanding what you want, finishes 200 lines only to find the direction is wrong.
How it works: Forces AI to output a structured specification before writing the first line of code:
You say: "Help me write a module to run HF with PySCF, then convert results to quantum circuit initial state"
AI won't write code directly, but first responds:
Specification
▸ Goal: Use PySCF for RHF/UHF calculation, generate HF reference state needed for quantum computing (molecular orbital coefficients + occupation numbers)
▸ Input: Molecular coordinates (xyz or Z-matrix), basis set, charge/spin
▸ Output: MO coefficient matrix, 1e/2e integrals, Hartree-Fock energy, occupied orbital information
▸ Boundary conditions: How to handle open-shell? How to choose active space?
▸ Success criteria: H₂O/STO-3G HF energy matches literature value to within 1e-8 Hartree
My assumptions:
1. Quantum computing framework is Qiskit (not TensorCircuit)
2. Default Jordan-Wigner transformation to map to qubits
→ Is the above correct?
Value: Discover misunderstandings before AI starts working—for example, you're using TensorCircuit not Qiskit, or you need Parity not Jordan-Wigner. At this point, the cost of correction is almost zero.
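One way to picture the "confirm before coding" gate is as a data structure whose assumptions must all be signed off. This is a minimal sketch, not how the Skill is implemented (the Skill is a Markdown workflow); the `Spec` class and its methods are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """Hypothetical container mirroring the fields of the specification."""
    goal: str
    inputs: list[str]
    outputs: list[str]
    boundary_conditions: list[str]
    success_criteria: list[str]
    # Each assumption maps to True once the user has confirmed it.
    assumptions: dict[str, bool] = field(default_factory=dict)

    def unconfirmed(self) -> list[str]:
        """Assumptions the user has not yet confirmed."""
        return [text for text, ok in self.assumptions.items() if not ok]

    def ready_to_build(self) -> bool:
        """Coding may start only when every assumption has been confirmed."""
        return not self.unconfirmed()


spec = Spec(
    goal="Run RHF/UHF with PySCF and produce the HF reference state",
    inputs=["molecular coordinates", "basis set", "charge/spin"],
    outputs=["MO coefficients", "1e/2e integrals", "HF energy"],
    boundary_conditions=["open-shell handling", "active space choice"],
    success_criteria=["H2O/STO-3G energy matches the reference value"],
    assumptions={"Framework is Qiskit": False, "Mapping is Jordan-Wigner": False},
)
print(spec.ready_to_build())  # False: nothing confirmed yet
spec.assumptions["Framework is Qiskit"] = True
print(spec.unconfirmed())     # ['Mapping is Jordan-Wigner']
```

If the user replies "I'm using TensorCircuit," the assumption is corrected in the spec before any code exists.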
planning-and-task-breakdown — Break large tasks into small pieces
Problem solved: Faced with a complex task, AI tries to write everything at once. The result is long, messy code, and when something breaks you can't tell where the error is.
How it works: Breaks a large task into small, ordered steps, each with clear "completion criteria":
Without this Skill:
Task: Write entire HF→quantum state module

With this Skill:
Task 1: Build molecule object + run RHF
→ Verify: H₂O energy matches literature
Task 2: Extract MO coefficients and 1e/2e integrals
→ Verify: Reconstruct HF energy with integrals, error < 1e-8
Task 3: Choose active space (frozen core)
→ Verify: Number of active orbitals = expected value
Task 4: Fermion→qubit mapping (Jordan-Wigner)
→ Verify: Qubit Hamiltonian eigenvalue = FCI energy
Task 5: Generate HF reference state quantum circuit
→ Verify: Circuit measurement expectation = HF energy
Key principles:
- Vertical slicing: Not "finish all PySCF parts first, then all quantum circuit parts," but "run complete workflow for H₂ first, then extend to more complex molecules"
- Verifiable each step: Each Task can be run and validated after completion, never in a "half-written can't run" state
- Task size control: If a task modifies more than 5 files, it's too large and needs further breakdown
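The task-size rule is easy to check mechanically. The following sketch assumes a plan is stored as a list of hypothetical `Task` records with an estimated file count; neither the class nor the `oversized` helper comes from Agent Skills itself.

```python
from dataclasses import dataclass

MAX_FILES_PER_TASK = 5  # threshold taken from the rule of thumb above


@dataclass
class Task:
    """Hypothetical plan entry: a small step plus the check that proves it is finished."""
    name: str
    verify: str             # completion criterion in plain language
    files_touched: int = 1  # estimated number of files the task modifies


def oversized(tasks: list[Task]) -> list[str]:
    """Names of tasks that modify more than MAX_FILES_PER_TASK files."""
    return [t.name for t in tasks if t.files_touched > MAX_FILES_PER_TASK]


plan = [
    Task("Build molecule + run RHF", "H2O energy matches the reference value", 2),
    Task("Extract MO coefficients and integrals", "rebuilt HF energy error < 1e-8", 2),
    Task("Everything else", "end-to-end run succeeds", 9),
]
print(oversized(plan))  # ['Everything else']
```

Any task the check flags goes back to planning to be split further.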
incremental-implementation — Do one thing at a time
Problem solved: AI writes 500 lines of code at once and modifies a dozen files, so when a bug appears you can't tell which step introduced it.
How it works: Strictly follows the "implement one piece → test → verify → commit → next piece" cycle:
Implement → Test → Verify → Commit → Next piece (back to Implement)
Core rules:
| Rule | Description |
|---|---|
| One thing at a time | One commit for one logical change, no mixing |
| Keep it runnable | After each change, code must run |
| Simplicity first | Write the simplest correct version first, optimize after tests pass |
| Don't expand scope | Only change what's needed for current task; note "nice to haves" but don't touch |
Value: If the active space selection step has a problem, you know it was introduced at exactly that step, because the earlier RHF and integral extraction steps have already been verified and committed.
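The cycle can be sketched as a loop that refuses to move on until the current piece passes its test. This is an illustration under stated assumptions: each step is a hypothetical `(name, implement, test)` triple, and appending to a list stands in for a real commit.

```python
from typing import Callable, Optional

# A step is a (name, implement, test) triple. Both callables are hypothetical
# stand-ins: implement() makes one logical change, test() returns True on success.
Step = tuple[str, Callable[[], None], Callable[[], bool]]


def run_increments(steps: list[Step]) -> tuple[list[str], Optional[str]]:
    """Apply steps one at a time, committing each only after its test passes.

    Returns (committed step names, name of the first failing step or None).
    """
    committed: list[str] = []
    for name, implement, test in steps:
        implement()
        if not test():
            return committed, name  # stop here: the failure belongs to this step
        committed.append(name)      # stand-in for a real commit
    return committed, None


state: dict = {}
steps = [
    ("rhf", lambda: state.update(energy=-1.0), lambda: "energy" in state),
    ("integrals", lambda: None, lambda: "h1e" in state),  # forgot to store h1e
    ("active_space", lambda: None, lambda: True),
]
print(run_increments(steps))  # (['rhf'], 'integrals')
```

Because "rhf" was already committed when "integrals" failed, the search for the bug is confined to a single step.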
test-driven-development — Write tests before code
Problem solved: AI's code "looks right" but has never been verified, so you can't confirm whether it is actually correct.
How it works: Forces Red-Green-Refactor flow—
| Phase | What you do | What it proves |
|---|---|---|
| Red | Write a test; it must fail | The test really detects a failure |
| Green | Write minimal code to pass | The code is correct |
| Refactor | Clean up code; keep tests passing | The code is clean |
Then repeat the cycle.
Example:
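from types import SimpleNamespace
from pyscf import gto, scf  # third-party dependency: PySCF

# Red: write the test first; it fails because run_rhf is not implemented yet
def test_rhf_energy():
    """H₂O/STO-3G RHF energy must match a known reference value."""
    mol = build_molecule("O 0 0 0; H 0 0.757 0.587; H 0 -0.757 0.587",
                         basis="sto-3g")
    result = run_rhf(mol)
    # Literature value; it must correspond to exactly this geometry and basis
    assert abs(result.energy - (-74.942080)) < 1e-6
    assert result.mo_coeff.shape[0] == 7  # STO-3G: 7 basis functions
    assert result.n_occupied == 5         # H₂O: 5 occupied orbitals

# Green: write the minimal code that makes the test pass
def build_molecule(atom, basis):
    return gto.M(atom=atom, basis=basis)

def run_rhf(mol):
    mf = scf.RHF(mol).run()  # run the SCF calculation
    return SimpleNamespace(energy=mf.e_tot, mo_coeff=mf.mo_coeff,
                           n_occupied=mol.nelectron // 2)

# Refactor: clean up the code; the test must still pass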
Value: Every piece of code has accompanying tests. Run the tests to know if code is correct—no manual inspection needed.
debugging-and-error-recovery — Systematically solve bugs
Problem solved: When it hits an error, AI starts guessing blindly ("try this," "change that"), and the code gets messier with every attempt.
How it works: Enforces a five-step method that tracks down the root cause systematically, like running an experiment:
| Step | What to do |
|---|---|
| ① Reproduce | Get a stable reproduction |
| ② Locate | Find which layer is at fault |
| ③ Isolate | Reduce to a minimal case |
| ④ Fix | Target the root cause |
| ⑤ Prevent | Add tests to prevent recurrence |
Key discipline—Stop-the-Line:
When encountering an error, immediately stop writing new code. Don't think "skip this bug and continue with the rest"—errors accumulate. For example, if MO coefficient extraction has a bug, the subsequent active space selection and Jordan-Wigner transformation are all built on a wrong foundation.
Value: AI no longer blindly "tries and changes," but systematically solves problems with "control variables, step-by-step investigation" thinking.
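The "Reproduce" and "Isolate" steps lend themselves to a small sketch: first confirm the failure reproduces, then greedily shrink the input while it still fails. This is one simple way to reach a minimal case, not the procedure the Skill prescribes; `minimize` and the `fails` predicate are hypothetical names.

```python
from typing import Callable, Sequence


def minimize(items: Sequence, fails: Callable[[list], bool]) -> list:
    """Greedily drop items while the failure still reproduces.

    `fails` is a hypothetical predicate that returns True when the bug
    shows up for the given input.
    """
    current = list(items)
    if not fails(current):
        raise ValueError("the failure must reproduce on the full input first")
    i = 0
    while i < len(current):
        candidate = current[:i] + current[i + 1:]
        if fails(candidate):
            current = candidate  # item i was irrelevant; keep it removed
        else:
            i += 1               # item i is needed to trigger the bug
    return current


# Toy bug: the failure appears whenever both "H" and "O" atoms are present.
atoms = ["C", "H", "N", "O", "H"]
print(minimize(atoms, lambda xs: "H" in xs and "O" in xs))  # ['O', 'H']
```

With the input reduced to two atoms, the fix can target the root cause, and the minimal case becomes a regression test for the "Prevent" step.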
code-review-and-quality — Review code from five dimensions
Problem solved: The code is done, but "it runs" doesn't mean "no problems." Self-review tends to focus only on whether the logic is correct and misses security, performance, readability, and other dimensions.
How it works: Enforces a review along five dimensions, one at a time, with each finding marked with a severity level:
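| Dimension | Question |
|---|---|
| Correctness | Is the logic right? Are edge cases handled? |
| Readability | Can others understand it? |
| Architecture | Is the module division reasonable? |
| Security | Are there any vulnerabilities? |
| Performance | Are there any bottlenecks? |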
Example: After AI reviews your PySCF module, it might give:
| Severity | Finding |
|---|---|
| Critical | run_rhf() doesn't handle SCF non-convergence, will return wrong energy |
| Important | MO coefficient matrix has no shape validation, changing basis set causes silent error |
| Suggestion | build_molecule() function is 60 lines too long, suggest splitting coordinate parsing and basis set setup |
Key design: Severity grading lets you know what must be changed, what can wait. Critical must be fixed, Suggestion can be noted for later—avoiding review becoming endless perfectionist changes.
shipping-and-launch — Three experts check in parallel, give go/no-go
Problem solved: The code has passed review, but "can merge" doesn't mean "can deploy." One person can't see all the risks.
How it works: Dispatches three AI expert roles in parallel. Each checks independently, and their opinions are then merged into a release decision:
| Expert role | What it checks |
|---|---|
| Code reviewer | Logic, readability, architecture |
| Security auditor | Vulnerabilities, permissions, input validation |
| Test engineer | Coverage, boundaries, omissions |
Your code → three experts (in parallel) → merge findings → GO / NO-GO
Core principle: Any Critical finding means NO-GO by default. A rollback plan is also required: if problems occur after deployment, how will the change be reverted?
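The merge rule can be written down in a few lines. The sketch below is a hedged illustration: the `Finding` class and `release_decision` function are hypothetical, and treating a missing rollback plan as NO-GO is an assumption of this sketch about how "requires a rollback plan" would be enforced.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    reviewer: str  # "code", "security", or "test" in this sketch
    severity: str  # "Critical", "Important", or "Suggestion"
    message: str


def release_decision(findings: list[Finding], rollback_plan: str) -> str:
    """Merge all reviewers' findings into a single GO / NO-GO verdict."""
    if any(f.severity == "Critical" for f in findings):
        return "NO-GO"  # any Critical finding blocks the release
    if not rollback_plan.strip():
        return "NO-GO"  # assumption: no rollback plan also blocks the release
    return "GO"


findings = [
    Finding("code", "Suggestion", "build_molecule() is too long"),
    Finding("security", "Important", "input coordinates are not validated"),
]
print(release_decision(findings, rollback_plan="revert the merge commit"))  # GO
findings.append(Finding("test", "Critical", "SCF non-convergence is untested"))
print(release_decision(findings, rollback_plan="revert the merge commit"))  # NO-GO
```

Important and Suggestion findings are recorded but do not block, which matches the severity grading used in code review.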
How Seven Skills Work Together
spec (think clearly) → plan (break down) → build (step by step) → test (verify) → review (five dimensions) → ship (three experts)
↓
Bug encountered → debug (systematic investigation) → After fix, continue
Quick Start
Set up
Claude Code
Marketplace install:
/plugin marketplace add addyosmani/agent-skills
/plugin install agent-skills@addy-agent-skills
Gemini
Install from the repo:
gemini skills install https://github.com/addyosmani/agent-skills.git --path skills
Install from a local clone:
gemini skills install ./agent-skills/skills/
Others
Skills are plain Markdown - they work with any agent that accepts system prompts or instruction files.
Give the agent the repository URL https://github.com/addyosmani/agent-skills and let it handle the installation, or copy the Markdown files into the corresponding folder yourself.
Workflow
The repository contains a Skill called using-agent-skills. When we tell the agent to use agent-skills to complete a task, it will automatically call the included skills to complete the Development Lifecycle mentioned above, rather than having to use Slash commands for each step.
Summary
Without Agent Skills:
Requirement → Write code → "Done"
↓
bugs, rework, unmaintainable

With Agent Skills:
Requirement → Spec → Plan → Code → Test → Review → Done
↑
Checkpoints and evidence at each step