Agent Skills, Production-grade Engineering Skills for AI Coding Agents

Zhongqi Zhao May 14, 2026 ai agent engineering tutorial

Project Introduction

https://github.com/addyosmani/agent-skills

Author Addy Osmani has long worked on developer experience and engineering efficiency. When AI coding agents emerged, he identified a key problem: AI can write code, but it lacks the discipline to write good code. So he encoded the engineering best practices Google has accumulated over the years (code review standards, the testing pyramid, change management, and so on, drawing on Software Engineering at Google and Google's engineering practices guide) into structured workflows that AI can follow. The result is Agent Skills.

This project overlaps functionally with the Superpowers project that Zirui Sheng discussed earlier, but the two are positioned differently. Superpowers focuses more on workflow orchestration (how to organize work), while Agent Skills focuses more on engineering practices (how to do each step well).

Background Problems

Problems encountered in AI coding

It calls a function that doesn't exist in the package.
It changes file A without realizing that file B depends on A, so B crashes.
It writes code directly without first clarifying requirements and boundaries.
It changes a dozen files at once, so you can't track what changed where.
It writes no tests, so you can't tell whether the changes are correct.

What's the root cause of these problems?

The AI agent's "shortest path" problem

AI is trained to "give answers quickly." When you say "implement X feature," its default behavior is:

Receive requirement ──→ Write code directly ──→ "Done!"

But a reliable development process should be:

Receive requirement → Clarify requirement → Plan approach → Step-by-step implementation → Test & verify → Review quality → Commit

AI skips all the intermediate steps that make code reliable.

Even more dangerous, AI will make excuses for itself:

"This change is too simple, doesn't need testing."

"I'll add tests later."

"I already know how to do it, don't need to check documentation."

These excuses sound reasonable, but each one can lead to bugs.

Agent Skills: Installing Engineering Discipline in AI

Agent Skills is a set of carefully designed Markdown workflow files. Once they are loaded into an agent's context, the AI must follow the process rather than take the "shortest path." In effect, it installs engineering discipline that the AI cannot quietly skip.

Three Core Mechanisms

Mechanism 1: Seven Slash Commands Form a Complete Development Lifecycle

Earlier we mentioned AI's default behavior is "receive requirement → write code directly → done." Agent Skills uses 7 slash commands to replace this "shortest path" with a complete engineering pipeline:

What you're doing	Command
① Define	/spec
② Plan	/plan
③ Build	/build
④ Verify	/test
⑤ Review	/review
⑥ Simplify	/code-simplify
⑦ Ship	/ship

Each command automatically calls one or more Skills (22 total), where each Skill has its own workflow.

These 7 commands are not 7 independent tools, but together form a complete development lifecycle. Going through them in order completes the entire process from "having an idea" to "code is deliverable":

[Figure: Agent Skills Lifecycle]
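
The ordering described above can be sketched as a small check. This is only an illustration of the idea, not how Agent Skills works internally (the skills are Markdown files, not Python); `LIFECYCLE` and `next_command` are hypothetical names.

```python
# Illustrative sketch only: Agent Skills itself is plain Markdown.
# LIFECYCLE and next_command are hypothetical names.
LIFECYCLE = ["/spec", "/plan", "/build", "/test",
             "/review", "/code-simplify", "/ship"]


def next_command(completed):
    """Return the next command to run, or None when the lifecycle is done.

    Raises ValueError if `completed` skips or reorders a stage.
    """
    if completed != LIFECYCLE[:len(completed)]:
        raise ValueError(f"stages out of order: {completed}")
    if len(completed) == len(LIFECYCLE):
        return None
    return LIFECYCLE[len(completed)]


print(next_command([]))                  # /spec
print(next_command(["/spec", "/plan"]))  # /build
```

Trying to jump straight to `/build` without `/spec` and `/plan` raises an error here, which mirrors the point that the commands form one ordered pipeline rather than seven independent tools.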

Mechanism 2: Rationalization Reversal Table

This is Agent Skills' most distinctive design. Each Skill has a built-in "excuses and rebuttals" table:

AI's common excuses	The reality is...
"This is too simple, doesn't need testing"	Simple things become complex. Testing is proof, not overhead
"I'll add tests later"	"Later" never comes. Write tests first, then implementation
"Skipping specs is faster"	Code written without thinking will be rewritten multiple times
"This is just a small change"	Small changes can introduce big bugs

When AI tries to skip steps, this table stops it.
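
As a rough mental model, the table behaves like a lookup from excuse to rebuttal. The sketch below uses naive substring matching as a stand-in; in Agent Skills the table is Markdown text that the agent reads, and `REBUTTALS` and `find_rebuttals` are hypothetical names.

```python
# Hypothetical sketch: the real table lives in each Skill's Markdown file.
REBUTTALS = {
    "too simple": "Simple things become complex. Testing is proof, not overhead.",
    "add tests later": "'Later' never comes. Write tests first, then implementation.",
    "skipping specs is faster": "Code written without thinking will be rewritten multiple times.",
    "just a small change": "Small changes can introduce big bugs.",
}


def find_rebuttals(message):
    """Return the rebuttal for every known excuse that appears in `message`."""
    text = message.lower()
    return [reply for excuse, reply in REBUTTALS.items() if excuse in text]


for reply in find_rebuttals("This is just a small change, I'll add tests later."):
    print(reply)
```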

Mechanism 3: Evidence Verification

"Seems right" is never enough. Each Skill requires AI to provide evidence at the end:

Screenshots of test passes
Build success logs
Actual code execution results

No evidence = not done.
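
A minimal sketch of the "no evidence = not done" rule, assuming evidence means the captured output and exit code of a real command. `Evidence`, `collect_evidence`, and `is_done` are hypothetical names, and the command in the usage lines is a stand-in for an actual test run.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Evidence:
    command: list
    exit_code: int
    output: str


def collect_evidence(command):
    """Run `command` and keep its exit code and output as evidence."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return Evidence(command, proc.returncode, proc.stdout + proc.stderr)


def is_done(evidence):
    """A task counts as done only if evidence exists and the command succeeded."""
    return evidence is not None and evidence.exit_code == 0


# Stand-in for a real test command such as ["pytest", "-q"].
ok = collect_evidence([sys.executable, "-c", "print('3 passed')"])
print(is_done(ok), ok.output.strip())  # True 3 passed
print(is_done(None))                   # False
```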

A Case Study Using Several Core Skills

spec-driven-development — Think clearly before starting

Problem solved: AI starts writing code before understanding what you want, finishes 200 lines only to find the direction is wrong.

How it works: Forces AI to output a structured specification before writing the first line of code:

You say: "Help me write a module to run HF with PySCF, then convert results to quantum circuit initial state"

AI won't write code directly, but first responds:

Specification

▸ Goal: Use PySCF for RHF/UHF calculation, generate HF reference state needed for quantum computing (molecular orbital coefficients + occupation numbers)

▸ Input: Molecular coordinates (xyz or Z-matrix), basis set, charge/spin

▸ Output: MO coefficient matrix, 1e/2e integrals, Hartree-Fock energy, occupied orbital information

▸ Boundary conditions: How to handle open-shell? How to choose active space?

▸ Success criteria: H₂O/STO-3G HF energy matches the literature value to within 1e-8 Hartree

My assumptions:
1. Quantum computing framework is Qiskit (not TensorCircuit)
2. Jordan-Wigner transformation is the default mapping to qubits
→ Is the above correct?

Value: Discover misunderstandings before AI starts working—for example, you're using TensorCircuit not Qiskit, or you need Parity not Jordan-Wigner. At this point, the cost of correction is almost zero.
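
One way to picture the gate this Skill creates is a structured record that must be complete, with its assumptions confirmed, before coding starts. This is a hedged sketch: the field names follow the example specification above, while `Spec` and `blockers` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    goal: str = ""
    inputs: str = ""
    outputs: str = ""
    boundary_conditions: str = ""
    success_criteria: str = ""
    assumptions: list = field(default_factory=list)
    assumptions_confirmed: bool = False


def blockers(spec):
    """List what still prevents implementation from starting."""
    sections = ("goal", "inputs", "outputs",
                "boundary_conditions", "success_criteria")
    problems = [f"missing section: {name}" for name in sections
                if not getattr(spec, name).strip()]
    if spec.assumptions and not spec.assumptions_confirmed:
        problems.append("assumptions not confirmed by the user")
    return problems


spec = Spec(
    goal="Run RHF/UHF with PySCF and produce the HF reference state",
    inputs="Molecular coordinates, basis set, charge/spin",
    outputs="MO coefficients, 1e/2e integrals, HF energy",
    success_criteria="H2O/STO-3G energy matches the literature value",
    assumptions=["Framework is Qiskit", "Jordan-Wigner mapping"],
)
print(blockers(spec))
# ['missing section: boundary_conditions', 'assumptions not confirmed by the user']
```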

planning-and-task-breakdown — Break large tasks into small pieces

Problem solved: Faced with a complex task, AI tries to write everything at once. The result is long, messy code, and when problems arise you can't tell where the error is.

How it works: Breaks a large task into small, ordered steps, each with clear "completion criteria":

Without this Skill:               With this Skill:
─────────────────                 ─────────────────
Task: Write entire HF→quantum     Task 1: Build molecule object + run RHF
      state module                → Verify: H₂O energy matches literature
                                  Task 2: Extract MO coefficients and 1e/2e integrals
                                  → Verify: Reconstruct HF energy with integrals, error < 1e-8
                                  Task 3: Choose active space (frozen core)
                                  → Verify: Number of active orbitals = expected value
                                  Task 4: Fermion→qubit mapping (Jordan-Wigner)
                                  → Verify: Qubit Hamiltonian eigenvalue = FCI energy
                                  Task 5: Generate HF reference state quantum circuit
                                  → Verify: Circuit measurement expectation = HF energy
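
For reference, the Task 2 check ("reconstruct HF energy with integrals") typically relies on the standard closed-shell energy expression. The notation is an assumption of this note, not taken from the project: real orbitals, AO indices μ, ν, λ, σ, one-electron integrals h, two-electron integrals in chemists' notation, MO coefficients C, density matrix D, and nuclear repulsion energy E_nuc.

```latex
E_{\mathrm{HF}} = \sum_{\mu\nu} D_{\mu\nu}\, h_{\mu\nu}
  + \frac{1}{2} \sum_{\mu\nu\lambda\sigma} D_{\mu\nu} D_{\lambda\sigma}
    \left[ (\mu\nu|\lambda\sigma) - \tfrac{1}{2}\, (\mu\lambda|\nu\sigma) \right]
  + E_{\mathrm{nuc}},
\qquad
D_{\mu\nu} = 2 \sum_{i \in \mathrm{occ}} C_{\mu i}\, C_{\nu i}
```

If the extracted integrals and coefficients are right, evaluating this expression should reproduce the SCF energy to within the stated tolerance.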

Key principles:

Vertical slicing: Not "finish all PySCF parts first, then all quantum circuit parts," but "run complete workflow for H₂ first, then extend to more complex molecules"
Verifiable each step: Each Task can be run and validated after completion, never in a "half-written can't run" state
Task size control: If a task modifies more than 5 files, it's too large and needs further breakdown
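
The size and verifiability rules lend themselves to a mechanical check. A minimal sketch, assuming each task records the files it touches and its completion criterion; `Task`, `plan_problems`, and the file names are hypothetical.

```python
from dataclasses import dataclass

MAX_FILES_PER_TASK = 5  # threshold quoted in the principles above


@dataclass
class Task:
    name: str
    files: list
    verify: str  # completion criterion


def plan_problems(tasks):
    """Return human-readable problems found in a task breakdown."""
    problems = []
    for task in tasks:
        if len(task.files) > MAX_FILES_PER_TASK:
            problems.append(
                f"{task.name}: touches {len(task.files)} files, split it further")
        if not task.verify.strip():
            problems.append(f"{task.name}: no completion criterion")
    return problems


plan = [
    Task("Build molecule + run RHF", ["scf.py", "test_scf.py"],
         "H2O energy matches literature"),
    Task("Write entire HF-to-circuit module",
         ["a.py", "b.py", "c.py", "d.py", "e.py", "f.py"], ""),
]
for problem in plan_problems(plan):
    print(problem)
```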

incremental-implementation — Do one thing at a time

Problem solved: AI writes 500 lines of code at once and modifies a dozen files, so when a bug appears you can't tell which step introduced it.

How it works: Strictly follows the "implement one piece → test → verify → commit → next piece" cycle:

Implement → Test → Verify → Commit → Next piece (repeat)

Core rules:

Rule	Description
One thing at a time	One commit for one logical change, no mixing
Keep it runnable	After each change, code must run
Simplicity first	Write the simplest correct version first, optimize after tests pass
Don't expand scope	Only change what's needed for current task; note "nice to haves" but don't touch

Value: If the active space selection step has a problem, you know exactly it was introduced at that step—because the earlier RHF and integral extraction have already been verified and committed.
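
The cycle and its payoff can be sketched as a loop that stops at the first piece whose check fails. This is an illustrative sketch, not the Skill's implementation; `run_increments` and the simulated pieces are hypothetical.

```python
def run_increments(pieces, commit):
    """Apply pieces one at a time; stop at the first one whose check fails.

    `pieces` is a list of (name, implement, check) tuples, where `implement`
    and `check` are callables. `commit` is called with the piece name after a
    successful check. Returns the name of the failing piece, or None.
    """
    for name, implement, check in pieces:
        implement()
        if not check():
            return name   # the bug was introduced by this piece
        commit(name)      # one commit per verified logical change
    return None


committed = []
state = {"rhf": False, "integrals": False}
pieces = [
    ("run RHF", lambda: state.update(rhf=True), lambda: state["rhf"]),
    ("extract integrals", lambda: state.update(integrals=True),
     lambda: state["integrals"]),
    ("choose active space", lambda: None, lambda: False),  # simulated bug
]
failed = run_increments(pieces, committed.append)
print(failed)     # choose active space
print(committed)  # ['run RHF', 'extract integrals']
```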

test-driven-development — Write tests before code

Problem solved: AI's code "looks right" but is unverified, so you can't confirm that it is actually correct.

How it works: Forces the Red-Green-Refactor flow:

Red (write a test; it must fail) → Green (write minimal code to pass) → Refactor (clean up code; keep tests passing) → Repeat

Step	What it proves
Red	The test really detects a failure
Green	The code is correct
Refactor	The code is clean

Example:

# Red: write the test first; it fails because run_rhf() is not implemented yet.
# build_molecule() and run_rhf() are the module's own helpers, not PySCF functions.
def test_rhf_energy():
    """H₂O/STO-3G RHF energy must match the known value."""
    mol = build_molecule("O 0 0 0; H 0 0.757 0.587; H 0 -0.757 0.587",
                         basis="sto-3g")
    result = run_rhf(mol)
    # Literature value; it must correspond to exactly this geometry and basis
    assert abs(result.energy - (-74.942080)) < 1e-6
    assert result.mo_coeff.shape[0] == 7  # STO-3G: 7 basis functions
    assert result.n_occupied == 5  # H₂O: 5 occupied orbitals

# Green: write the minimal code that makes the test pass
def run_rhf(mol):
    # ... implement the RHF calculation with PySCF ...
    raise NotImplementedError("RHF calculation elided in this example")

# Refactor: clean up the code while the tests keep passing

Value: Every piece of code has accompanying tests. Run the tests to know if code is correct—no manual inspection needed.

debugging-and-error-recovery — Systematically solve bugs

Problem solved: When encountering errors, AI starts guessing blindly, "try this," "change that," getting messier and messier.

How it works: Forces a five-step method that finds the root cause systematically, like an experiment:

① Reproduce (stable reproduction) → ② Locate (find which layer) → ③ Isolate (minimal case) → ④ Fix (target root cause) → ⑤ Prevent (add tests to prevent recurrence)

Key discipline—Stop-the-Line:

When encountering an error, immediately stop writing new code. Don't think "skip this bug and continue with the rest"—errors accumulate. For example, if MO coefficient extraction has a bug, the subsequent active space selection and Jordan-Wigner transformation are all built on a wrong foundation.

Value: AI no longer blindly "tries and changes," but systematically solves problems with "control variables, step-by-step investigation" thinking.
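
The "Isolate" step can be illustrated with a simple greedy reduction: keep removing inputs as long as the failure still reproduces. This is a generic sketch of the idea rather than the Skill's own procedure; `isolate`, `still_fails`, and the simulated bug are hypothetical, and the starting input is assumed to fail.

```python
def isolate(items, still_fails):
    """Greedily drop items while the failure persists.

    `still_fails(candidate)` must return True when the bug still reproduces
    on `candidate`. The result is 1-minimal: removing any single remaining
    item makes the failure disappear.
    """
    current = list(items)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                current = candidate
                changed = True
                break
    return current


settings = ["basis=sto-3g", "charge=1", "spin=0", "symmetry=True", "verbose=4"]


def still_fails(candidate):
    # Simulated bug: reproduces only when both settings are present.
    return "charge=1" in candidate and "spin=0" in candidate


print(isolate(settings, still_fails))  # ['charge=1', 'spin=0']
```

With the minimal case in hand, the fix can target the interaction between those two settings instead of guessing across the whole configuration.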

code-review-and-quality — Review code from five dimensions

Problem solved: The code is done, but "runs" doesn't mean "no problems." Self-review tends to focus only on whether the logic is correct and misses security, performance, readability, and other dimensions.

How it works: Forces review from five dimensions one by one, each finding marked with severity level:

Dimension	Question
Correctness	Logic OK? Edges OK?
Readability	Can others understand?
Architecture	Module division reasonable?
Security	Any vulnerabilities?
Performance	Any bottlenecks?

Example: After AI reviews your PySCF module, it might give:

Severity	Finding
Critical	run_rhf() doesn't handle SCF non-convergence, will return wrong energy
Important	MO coefficient matrix has no shape validation, changing basis set causes silent error
Suggestion	build_molecule() is 60 lines long, which is too long; suggest splitting coordinate parsing from basis set setup

Key design: Severity grading tells you what must be changed and what can wait. Critical findings must be fixed, while Suggestions can be noted for later, which keeps review from turning into endless perfectionist changes.
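
A minimal sketch of severity grading, using the three levels from the example table. `Finding`, `group_by_severity`, and `must_fix` are hypothetical names; how to treat Important findings is left to the reader, since the text only fixes the rule for Critical and Suggestion.

```python
from dataclasses import dataclass

SEVERITIES = ("Critical", "Important", "Suggestion")  # most severe first


@dataclass
class Finding:
    severity: str
    message: str


def group_by_severity(findings):
    """Group finding messages by severity, most severe level first."""
    groups = {level: [] for level in SEVERITIES}
    for finding in findings:
        groups[finding.severity].append(finding.message)
    return groups


def must_fix(findings):
    """Critical findings block the change until they are fixed."""
    return group_by_severity(findings)["Critical"]


review = [
    Finding("Suggestion", "build_molecule() is too long"),
    Finding("Critical", "run_rhf() doesn't handle SCF non-convergence"),
    Finding("Important", "MO coefficient matrix has no shape validation"),
]
print(must_fix(review))
# ["run_rhf() doesn't handle SCF non-convergence"]
```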

shipping-and-launch — Three experts check in parallel, give go/no-go

Problem solved: Code passes review, but "can merge" doesn't mean "can deploy." One person can't see all risks.

How it works: Dispatches three AI expert roles at the same time. Each checks independently, and their opinions are then merged into a release decision:

Expert role	Checks
Code reviewer	Logic, readability, architecture
Security auditor	Vulnerabilities, permissions, input validation
Test engineer	Coverage, boundaries, omissions

Your code → three experts in parallel → Merge → GO / NO-GO

Core principle: Any Critical finding means NO-GO by default. A rollback plan is also required: if problems occur after deployment, how do you revert?

How Seven Skills Work Together

spec (think clearly) → plan (break down) → build (step by step) → test (verify) → review (five dimensions) → ship (three experts)

Bug encountered → debug (systematic investigation) → After fix, continue

Quick Start

Set up

Claude Code

Marketplace install:

/plugin marketplace add addyosmani/agent-skills
/plugin install agent-skills@addy-agent-skills

Gemini

Install from the repo:

gemini skills install https://github.com/addyosmani/agent-skills.git --path skills

Install from a local clone:

gemini skills install ./agent-skills/skills/

Others

Skills are plain Markdown - they work with any agent that accepts system prompts or instruction files.

Give the agent the repository URL https://github.com/addyosmani/agent-skills and let it handle the installation, or copy the Markdown files into the corresponding folder yourself.

Workflow

The repository contains a Skill called using-agent-skills. When you tell the agent to use agent-skills to complete a task, it automatically calls the included skills to work through the development lifecycle described above, so you don't have to invoke a slash command for each step.

Summary

Without Agent Skills              With Agent Skills
─────────────────                 ─────────────────
Requirement → Write code → "Done"  Requirement → Spec → Plan → Code → Test → Review → Done
   ↓                                                                ↑
bugs, rework, unmaintainable                            Checkpoints and evidence at each step
