When most people write their first Skill, they instinctively write it as a longer prompt.

Background, rules, caveats, examples, references — all stuffed into one SKILL.md. It looks complete. It usually isn’t.

The real value of a Skill is this:

When the user describes a certain kind of task, the agent recognizes the situation, loads the right workflow, picks the right tools, and finishes the job by a fixed method. And the next time a similar task comes along, the agent does it the same way, reliably.

A prompt is a one-off instruction.

A Skill is a reusable, self-triggering, maintainable, evolvable workflow.

1. What a Skill Is

1.1 How does a Skill relate to MCP?

Before getting into Skills, it helps to understand MCP.

MCP stands for Model Context Protocol. It’s an open standard, introduced by Anthropic, that lets Claude plug into external tools and data (Notion, Asana, your internal systems: all of that runs through MCP).

Skills and MCP are partners.

The Anthropic docs use a kitchen metaphor.

MCP gives you a “professional kitchen” — the tools, data, and call interfaces that connect to a service (an MCP server for Notion, Feishu, TickTick, WeChat Official Accounts, or Xiaohongshu, for example).

A Skill gives you the “recipe” — it tells Claude how to use those tools, step by step, to produce something useful.

MCP without a Skill: the user connects your MCP server but has no idea what to do next. Every conversation starts from scratch. Results are inconsistent.

A Skill without MCP: a Skill can also run on its own — generating documents, making images, organizing material. Anthropic’s own docx / pptx / xlsx skills are this kind.

Both together: the user can complete a real task right out of the gate. MCP decides what Claude can do; the Skill decides how Claude should do it.

1.2 Pin down the use case before you start

Before writing anything, answer four questions:

  • What is the user trying to accomplish?
  • What multi-step workflow does that involve?
  • Which tools are needed (Claude’s built-ins, or something from MCP)?
  • What domain knowledge or best practices should be baked in?

Anthropic recommends defining 2-3 concrete use cases when you start a Skill. For each one, spell out four things:

Use case: Weekly WeChat Official Account topic planning
Trigger: User says "help me pick next week's topics" or "what should I write this week"
Steps:
  1. Pull the last 7 days from the WeChat groups, X bookmarks, and Xiaohongshu viral hits the user follows
  2. Filter out anything off-brand for the user's account positioning (e.g. "AI tools + careers")
  3. Produce 5 candidate topics, each with a suggested title and angle
  4. Tag which one is most likely to go viral, which is easiest to write, and which the user hasn't covered before
Result: A list of 5 candidate topics the user can immediately pick from to start writing

Another example, this one from white-collar daily life:

Use case: Weekly status report
Trigger: User says "write my weekly report" or "summarize what I did this week"
Steps:
  1. Read this week's calendar, Notion task records, and email drafts
  2. Bucket items into "done / in progress / next week"
  3. Translate each item into language a manager cares about (outcome-focused, not action-focused)
  4. Output a weekly draft under 300 words, with 3 highlights from the week
Result: A weekly draft ready to send to your manager

If you can’t come up with 2-3 concrete use cases, what you’re describing is probably a one-off prompt, not a Skill.

1.3 Figure out which category your Skill falls into

Anthropic groups all Skills into three categories. Which one you’re building changes what you focus on.

Category 1: Document and resource creation.

For producing carefully formatted, high-quality output — documents, slide decks, spreadsheets, webpages, design files, code.

The canonical examples are the official docx / pptx / xlsx generation skills, plus the frontend-design skill.

What to focus on for this category: embed a style guide, use a template structure for consistency, and run a quality checklist before final delivery. You usually don’t need external tools — Claude’s built-in capabilities are enough.

Category 2: Workflow automation.

For multi-step workflows where the steps are fixed and there’s a real methodology to follow, often spanning multiple MCP servers.

The canonical example is the skill-creator skill (which walks the user step by step through use-case definition, frontmatter generation, instruction writing, and validation).

What to focus on for this category: gate every step with validation, give common structures a template, and document the rollback path when a step fails.

Category 3: MCP enhancement.

For attaching a “how to use this” manual to an MCP server you’ve already wired up. This category is mostly for MCP service providers writing skills for their own users: a survey-tool company shipping a skill alongside their MCP that “automatically analyzes survey results and generates a chart report,” or a CRM company shipping one that “pulls customer data every week and generates a sales briefing.”

If you’re not running an MCP service, this category is basically irrelevant to you; document generation (Category 1) and workflow automation (Category 2) are what most people will build.

Document generation skills live or die on the quality checklist. Workflow skills live or die on the seams between steps. MCP enhancement skills live or die on how well the domain knowledge is embedded. Decide which one you’re building before you start.

Now that we know what a Skill is, we can start building one.

2. Respect what makes a Skill a Skill: on-demand loading

A Skill is not the same as CLAUDE.md or a system prompt.

CLAUDE.md is more like persistent context. As long as you’re working in this project, it stays loaded and shapes the model’s behavior.

A slash command is a manual command. The user has to type the command — only then does the agent run the workflow.

A Skill sits between the two.

Its defining property is on-demand loading.

The full SKILL.md is not in context by default. It only loads when the user’s input matches the Skill’s description.

Two benefits:

  1. Saves context.
  2. Cuts down noise from rules that don’t apply to the current task.

But here’s the detail that matters:

The full Skill body isn’t persistent, but the description is — it stays in the matching pool the whole time.

Whether the description is well-written directly determines whether the Skill ever fires correctly.

2.1 The standard structure of a description

Anthropic gives a three-part formula:

[what this skill does] + [when to use it] + [key capabilities]

Three hard rules:

  • Stay under 1024 characters
  • Must include both “what it does” and “when to use it”
  • Must be written in the third person
✅ Processes Excel files and generates reports
✅ Breaks down viral Xiaohongshu notes and produces a reusable template
❌ I can help you process Excel files
❌ You can use this to break down Xiaohongshu viral hits

A side-by-side comparison:

❌ Too vague:

description: Handles topic selection

This kind of description is almost guaranteed not to fire correctly — Claude sees the sentence and doesn’t know when to load it.

❌ Missing trigger phrases:

description: Generates well-structured multi-page WeChat Official Account posts

Says what it does, doesn’t say when to use it.

✅ What a good description looks like:

description: Breaks down the cover, headline, opening, structure, and keywords of a viral Xiaohongshu note and outputs a reusable template. Use this skill when the user says "break down this note," "analyze this viral hit," "Xiaohongshu breakdown," or pastes a Xiaohongshu link for Claude to learn from.

Says what it does (breaks down viral notes → outputs a template) and when to use it (the user says break down / analyze / pastes a Xiaohongshu link).
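
The hard rules are mechanical enough to lint before you test triggering at all. Here's a minimal sketch in Python. The function name `lint_description`, the pronoun check, and the list of cue phrases are all assumptions made for illustration, not an official validator, and the third-person check is only a rough heuristic.

```python
import re

MAX_DESCRIPTION_CHARS = 1024  # limit stated in the rules above

# Illustrative heuristic: openers that suggest first- or second-person phrasing.
NON_THIRD_PERSON = re.compile(r"^\s*(i|we|you)\b", re.IGNORECASE)

# Illustrative cue phrases that usually introduce the "when to use it" part.
TRIGGER_CUES = ("use when", "use this skill when", "use for", "when the user")


def lint_description(description: str) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    if len(description) > MAX_DESCRIPTION_CHARS:
        problems.append(
            f"too long: {len(description)} > {MAX_DESCRIPTION_CHARS} characters"
        )
    if NON_THIRD_PERSON.search(description):
        problems.append("not third person: starts with I / we / you")
    if not any(cue in description.lower() for cue in TRIGGER_CUES):
        problems.append("no trigger clause: say when the skill should be used")
    return problems
```

Run against the examples above, "Handles topic selection" comes back with the missing-trigger warning, and the good Xiaohongshu description comes back clean. A clean result only means the form is right; whether the Skill actually fires still has to be tested.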

2.2 What to do when you have too many Skills

If you’ve installed a lot of Skills, or one Skill is unusually long and specialized but used rarely, you can selectively turn off its auto-loading.

Claude Code provides a dedicated frontmatter field (Claude Code only):

---
name: publish-post
description: One-click publish the current draft to WeChat Official Account, Zhihu, and Xiaohongshu
disable-model-invocation: true
---

With disable-model-invocation: true, Claude won’t auto-load this skill. It only fires when the user manually types /publish-post.

Why force publish-style actions to be manual? Because they have side effects — once it runs, it’s out the door, and you can’t undo it. You don’t want Claude to read your draft, decide “looks good enough,” and hit publish on its own.

Skills that are worth turning off auto-loading:

  • Year-end review Skill: used a few times a year
  • Resume rewrite Skill: only during a job search
  • Cover image Skill: only when publishing content
  • Course notes Skill: only when working through a specific course
  • Deploy / commit / send-message actions, anything with side effects (you don’t want Claude deciding when to fire those)

Why turn off auto-loading: every Skill’s description gets preloaded into Claude’s system prompt at startup and stays in the matching pool. More Skills, more descriptions, higher persistent matching cost. Anthropic’s own recommendation: if you have more than 20-50 Skills enabled at once, take a look at which ones you can turn off.

The cost of turning one off is that the user has to remember to summon it.

There’s a counterpart to disable-model-invocation called user-invocable: false — the opposite direction, which blocks the user from manually invoking the skill from the / menu and only lets Claude trigger it. That fits “background knowledge” skills (something like context about an old internal system at your company that Claude needs in relevant tasks, but the user typing the command directly makes no sense).

So the first thing to decide about a Skill isn’t what rules to write. It’s whether the Skill should auto-trigger or only fire on manual call.

3. Set the right tool boundaries for a Skill

A Skill can restrict which tools it’s allowed to use.

This isn’t required, but it’s useful, because different Skills need different permissions.

In Claude Code, for example:

allowed-tools:
  - Bash
  - Read
  - Grep

For organizing study material, this might be enough:

allowed-tools:
  - Read
  - Grep
  - Write

For document organization:

allowed-tools:
  - Read
  - Write
  - Grep

For cover image generation (read references, write a prompt, then call the image tool; ImageGenerate below stands in for whatever image tool you have connected):

allowed-tools:
  - Read
  - Write
  - ImageGenerate

For batch material processing, you’d open up more permissions:

allowed-tools:
  - Read
  - Write
  - Edit
  - Bash

The principle is simple: grant only the minimum permissions needed to do the job.

Don’t let a Skill that just generates suggestions modify files by default.

Don’t let a read-only analysis Skill execute arbitrary commands by default.

More tools doesn’t mean more capable. Sometimes it just means more risk.

3.1 allowed-tools can be scoped down to specific subcommands

A lot of people don’t realize allowed-tools can go finer than tool types — you can also restrict the specific call patterns.

For example, a skill that publishes articles, only allowed to run publish-related commands and nothing else:

# Bash is limited to the publish script and curl (for image uploads); Read and Write are still allowed
allowed-tools: "Bash(python publish.py *) Bash(curl *) Read Write"

Or a data analysis skill, allowed to read files and run Python scripts, with no Write or Edit tool:

# Read plus Python only: no Write or Edit tool, no other shell commands
allowed-tools: "Read Bash(python:*)"

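This way, even if the Skill gets misused, it can’t directly run something dangerous like rm. The pattern limits which commands can start, not what a script does once it is running, so the scripts you allow still need to be ones you trust.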
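
To make the scoping concrete, here's a rough sketch of how a space-separated allowed-tools string splits into entries, and how a shell command could be checked against the Bash(...) patterns. The function names and the matching rules (glob matching for `*`, prefix matching for a trailing `:*`) are assumptions for illustration; Claude Code's real matcher may behave differently.

```python
import re
from fnmatch import fnmatchcase
from typing import Optional

# One entry is a bare tool name (Read) or a tool with a pattern: Bash(curl *).
ENTRY = re.compile(r"(\w+)(?:\(([^)]*)\))?")


def parse_allowed_tools(spec: str) -> list[tuple[str, Optional[str]]]:
    """Split 'Bash(curl *) Read' into [('Bash', 'curl *'), ('Read', None)]."""
    return [(m.group(1), m.group(2)) for m in ENTRY.finditer(spec)]


def _matches(pattern: str, command: str) -> bool:
    # Assumed semantics: 'python:*' means "any command starting with python".
    if pattern.endswith(":*"):
        return command.startswith(pattern[:-2])
    # Otherwise treat the pattern as a shell-style glob.
    return fnmatchcase(command, pattern)


def bash_command_allowed(spec: str, command: str) -> bool:
    """True if any Bash entry in the spec admits the command."""
    for tool, pattern in parse_allowed_tools(spec):
        if tool != "Bash":
            continue
        if pattern is None or _matches(pattern, command):
            return True
    return False
```

Under these assumptions, `python publish.py draft.md` passes the publish spec and `rm -rf /tmp/x` does not. The check only looks at the command line, so whatever an allowed script does internally is outside its reach.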

If you don’t write code and your Skill doesn’t call scripts, this section doesn’t apply — skip ahead to the security rules.

3.2 Security rules

This section isn’t a suggestion. These are hard rules Anthropic spells out explicitly.

No XML angle brackets < > in YAML frontmatter. The reason: frontmatter ends up in Claude’s system prompt. If a description or other field contains something like <instructions>do XYZ</instructions>, it can get treated as prompt injection.

Skill names can’t be prefixed with “claude” or “anthropic”. Those prefixes are reserved by Anthropic.

The filename has to be exactly SKILL.md. Case-sensitive. SKILL.MD / skill.md / Skill.md will all fail to upload.

The folder name must be kebab-case. No spaces, no underscores, no uppercase letters. For example:

  • ✅ notion-project-setup
  • ❌ Notion Project Setup
  • ❌ notion_project_setup
  • ❌ NotionProjectSetup

Don’t put a README.md inside the Skill folder. Any documentation Claude should read goes in SKILL.md or references/. A human-facing README belongs at the root of the GitHub repo when you publish the skill — not inside the skill folder.
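
These rules are easy to check with a script before you zip anything. A minimal sketch follows; the function names are hypothetical, and it assumes the folder name doubles as the skill name, which is common but not something the rules above state.

```python
import re
from pathlib import Path

KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
RESERVED_PREFIXES = ("claude", "anthropic")


def check_skill(folder_name: str, file_names: set[str], frontmatter: str) -> list[str]:
    """Apply the rules above to already-collected facts about a skill folder."""
    problems = []
    if not KEBAB_CASE.match(folder_name):
        problems.append(f"folder name {folder_name!r} is not kebab-case")
    # Assumption: the folder name is also the skill name.
    if folder_name.lower().startswith(RESERVED_PREFIXES):
        problems.append("name starts with a reserved prefix (claude / anthropic)")
    if "SKILL.md" not in file_names:
        problems.append("no file named exactly SKILL.md")
    if "README.md" in file_names:
        problems.append("README.md sits inside the skill folder")
    if "<" in frontmatter or ">" in frontmatter:
        problems.append("angle brackets found in the frontmatter")
    return problems


def check_skill_folder(folder: Path) -> list[str]:
    """Collect the facts from disk, then run the same checks."""
    file_names = {p.name for p in folder.iterdir()}
    text = ""
    if "SKILL.md" in file_names:
        text = (folder / "SKILL.md").read_text(encoding="utf-8")
    parts = text.split("---", 2)
    frontmatter = parts[1] if text.startswith("---") and len(parts) == 3 else ""
    return check_skill(folder.name, file_names, frontmatter)
```

The angle-bracket check is deliberately blunt: it will also flag a YAML folded scalar such as `description: >`, which is the safer side to err on given the rule.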

4. Configure the right model for the Skill

A Skill can also specify a model. The option is often overlooked, but it is useful.

Different tasks place different demands on the model.

Writing documents, generating images, analyzing data, scraping, organizing study notes, generating long-form posts — these really shouldn’t all default to the same model.

A few examples:

  • Writing documents: needs clear expression and stable structure. Use a model strong at writing and summarization.
  • Image / design work: needs visual understanding and layout sense. Multimodal or design-strong models fit better.
  • Data analysis: needs reliable handling of tables and explanation of results. A solid reasoning model at lower cost works.
  • Scraping: often batch processing and extraction. Doesn’t need the strongest model — a cheap, fast one is a better fit.
  • Study notes: needs categorization, distillation, and preservation of meaning. A stable cheap model is fine.
  • Long-form X posts: needs structure, voice, pacing, and judgment about what’s worth saying. Use a model strong at writing.
  • Resume editing: needs to weigh job requirements against what to emphasize. Use a more careful model.

Some models are stronger but more expensive.

Some are cheaper and already good enough for simple tasks.

A mature Skill system shouldn’t default every task to the same model.

It should pick the right execution model based on the task type.

How to write it

Claude Code provides a model field (Claude Code only). The official docs put it this way:

Model to use when this skill is active. The override applies for the rest of the current turn and is not saved to settings; the session model resumes on your next prompt. Accepts the same values as /model, or inherit to keep the active model.

Two things worth noting:

  1. The switch only applies to the current turn. The next time the user sends a new prompt, it automatically reverts to the original session model. This is a temporary switch, not a permanent settings change.
  2. inherit is the default behavior — keep the current session model. If you only want certain skills to specifically use a stronger model, write the specific model name in those skills’ frontmatter; leave the others blank or set them to inherit.

Heavy task with a strong model:

---
name: deep-post-analysis
description: Deep breakdown of viral Xiaohongshu notes — cover, copy, pacing, psychological hooks, all dimensions
model: claude-opus-4-5
---

# Deep Viral Breakdown
...

Simple task with a cheap model:

---
name: daily-topic-pool
description: Pull today's candidate topics from the WeChat Official Accounts and X bookmarks the user follows
model: claude-haiku-4-5
---

Set to inherit (sticks with the current session model; leaving the field out has the same effect):

---
name: weekly-summary
description: Organize this week's WeChat Official Account back-end analytics into a review report
model: inherit
---

A quick note on the effort field

Paired with model is the effort field (also Claude Code only), which controls reasoning depth:

effort: high   # low / medium / high / xhigh / max

Simple tasks use low effort to save time and money; complex decisions use high effort to trade cost for accuracy. The available levels depend on the model — Haiku doesn’t support max; Opus does.

5. Progressive disclosure: keep SKILL.md small

Skill file loading is a three-tier system:

  • Tier 1 (Description): always loaded into Claude’s system prompt. So the description needs to carry enough information for Claude to decide “when to use this skill,” but not be too long.
  • Tier 2 (SKILL.md body): loaded when Claude decides the skill is relevant to the current task. Contains the full instructions.
  • Tier 3 (bundled files): extra files sitting in the skill directory. Claude only reads them when it actually needs to.

Once you understand the three tiers, you understand why SKILL.md shouldn’t carry everything.

Think of it as the entry point.

It should hold:

  • When to trigger.
  • The principles.
  • The execution steps.
  • Which tools are needed.
  • Where the rest of the material lives.
  • How to validate at the end.

Don’t dump every reference, long example, script, and template into it.

A better approach is layered:

generate-cover-image/
  SKILL.md
  references/
    cover-image-style-guide.md
    common-platform-sizes.md
    good-vs-bad-examples.md
  scripts/
    check-image-dimensions.py
    export-multi-platform.py
  assets/
    cover-image-template.md
    fonts-and-color-examples.json

Different directories hold different things.

references/ holds long docs, style guides, platform sizes, detailed cases.

scripts/ holds executable code — check image dimensions, batch rename files, export multi-platform versions, format tables. Deterministic operations are more reliable as scripts than as model improv every time.

assets/ holds templates, schemas, images, sample files, output formats.

The benefit:

  • The agent doesn’t have to load everything every time.
  • It reads SKILL.md first to understand the overall flow. When it actually needs detail, it reads a reference or runs a script.

That’s Progressive Disclosure.

Read on demand. Execute on demand.

Don’t shove everything into context up front.
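
The three tiers can be sketched as three separate reads of the same folder. This is an illustration of the idea, not how Claude actually loads skills: the function names are hypothetical, and the frontmatter parser only handles flat `key: value` lines.

```python
from pathlib import Path


def split_skill_file(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter fields, body)."""
    parts = text.split("---", 2)
    if not text.startswith("---") or len(parts) < 3:
        return {}, text  # no frontmatter block found
    fields = {}
    for line in parts[1].strip().splitlines():
        key, _, value = line.partition(":")  # split on the first colon only
        fields[key.strip()] = value.strip()
    return fields, parts[2].lstrip("\n")


def tier1_description(skill_dir: Path) -> str:
    """Tier 1: the only text that stays in context all the time."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    return split_skill_file(text)[0].get("description", "")


def tier2_body(skill_dir: Path) -> str:
    """Tier 2: the instructions, read once the skill is judged relevant."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    return split_skill_file(text)[1]


def tier3_bundled_files(skill_dir: Path) -> list[str]:
    """Tier 3: paths only; contents are read one file at a time when needed."""
    return sorted(
        p.relative_to(skill_dir).as_posix()
        for p in skill_dir.rglob("*")
        if p.is_file() and p.name != "SKILL.md"
    )
```

Notice that tier 3 returns paths rather than contents. That is the whole trick: the body of SKILL.md says where things live, and nothing under references/, scripts/, or assets/ costs context until it is opened.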

5.1 Why bother with progressive disclosure

Reason 1: save tokens (save money)

API usage is billed by token.

If every Skill stuffs all its content into context, that’s a few thousand extra tokens per conversation. One user calling it 100 times a day is about 3,000 calls a month, which adds up to millions of extra input tokens: real money on the API bill.

The Anthropic docs put it directly: the point of progressive disclosure is “minimizing token usage while preserving expertise.” The three-tier system loads content only when Claude actually needs it, saving money at the source.
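
A back-of-the-envelope version of that cost. Every number here is an illustrative assumption (3,000 extra tokens per call, 100 calls a day, a placeholder price of 3 USD per million input tokens), so plug in your own usage and your model's actual pricing.

```python
def monthly_extra_tokens(extra_tokens_per_call: int, calls_per_day: int, days: int = 30) -> int:
    """Tokens spent per month on skill content that did not need to be in context."""
    return extra_tokens_per_call * calls_per_day * days


def monthly_extra_cost(tokens: int, usd_per_million_input_tokens: float) -> float:
    """Convert a token count to dollars at a given (assumed) input price."""
    return tokens / 1_000_000 * usd_per_million_input_tokens


# Placeholder figures, not real pricing.
tokens = monthly_extra_tokens(extra_tokens_per_call=3_000, calls_per_day=100)
cost = monthly_extra_cost(tokens, usd_per_million_input_tokens=3.0)
print(f"{tokens:,} extra tokens per month, about ${cost:.2f}")
```

With these placeholder figures, one user generates 9,000,000 extra input tokens a month. The per-user number looks small; multiply it by every user and every bloated Skill and it stops being small.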

Reason 2: save context window space (preserve model performance)

This one’s more subtle but just as critical.

The context window has a cap — say, 200K tokens. Everyone shares that space:

  • System prompt
  • The last few dozen rounds of conversation history
  • Project rules in CLAUDE.md
  • All currently loaded Skills
  • The user’s prompt this round
  • Tool call return values

If every Skill stuffs detailed references, case libraries, and long instructions into context, three things happen:

First, it squeezes out conversation history — old conversation gets auto-compacted or dropped, and Claude forgets what was said earlier. This is most obvious in long conversations — you hit turn twenty and notice Claude has forgotten the key decisions from the first ten turns, often because the space got eaten by a stack of skills.

Second, it hurts focus — too much content and the model gets lost in long stretches of text. Research calls this “lost in the middle”: key instructions buried in the middle get ignored.

Third, it triggers auto-compaction: past a threshold, the system force-summarizes the earlier context, original details are lost, and model performance drops. In Claude Code, once auto-compaction fires, only the first 5,000 tokens of each skill’s most recent invocation are carried forward; anything beyond that is dropped.

What the three-tier system does here: only “navigation info” lives in context for SKILL.md; the detailed material stays on disk for Claude to read on demand. Material that isn’t in context doesn’t take up space, and doesn’t drag down the model.

Saving money is visible. Saving context space isn’t — but in long conversations the second matters more.

5.2 How big should SKILL.md be?

Claude Code’s docs put it directly: “Keep SKILL.md under 500 lines.”

Not because 500 is a magic number, but because past that length you’ve usually mixed too many things together.

If a Skill is too long, split it in this order:

  1. Move long explanations to references/.
  2. Turn stable operations into scripts/.
  3. Move templates and samples into assets/.

If it’s still long after that, you might not have one Skill — you have several.
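
When a SKILL.md does run long, it helps to see which sections are doing the damage before deciding what to move. A small sketch with hypothetical function names; it treats any line starting with # outside a code fence as a heading, which is an assumption about how the file is written.

```python
MAX_LINES = 500  # the limit quoted above


def exceeds_limit(markdown: str, max_lines: int = MAX_LINES) -> bool:
    """True when the file is not under the line limit."""
    return len(markdown.splitlines()) >= max_lines


def section_sizes(markdown: str) -> list[tuple[str, int]]:
    """Return (heading, content line count) pairs, largest section first."""
    sizes: dict[str, int] = {}
    current = "(before first heading)"
    in_fence = False
    for line in markdown.splitlines():
        if line.strip().startswith("```"):
            in_fence = not in_fence  # '#' inside a fence is a comment, not a heading
        if line.startswith("#") and not in_fence:
            current = line.lstrip("#").strip()
            sizes.setdefault(current, 0)
        else:
            sizes[current] = sizes.get(current, 0) + 1
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)
```

The sections at the top of the list are the first candidates for references/, scripts/, or assets/.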

6. After writing the Skill, you still need to validate, score, and iterate

Writing the Skill doesn’t mean it works.

This part matters.

If you want a Skill to be reliable, run at least three kinds of validation:

  1. Does it run.
  2. Does it trigger correctly.
  3. Are the results actually better than not using the Skill.

The third is the easiest one to skip.

Many Skills just “look done.” But did it raise quality? Did it cut errors? Did it make the output match what the user wanted? Without an evaluation, nobody actually knows.

Claude Code’s Skill Creator gives a useful framing: don’t ship a Skill on vibes. Set up test data, run an evaluation, look at the results, iterate.

The core flow:

  1. Define the task the Skill solves.
  2. Write a first draft of the Skill.
  3. Prepare test data / test prompts.
  4. Run the Skill against those tests.
  5. Evaluate each result.
  6. Score each one.
  7. Modify the Skill based on the failures.
  8. Run another round.
  9. Repeat until the main tests hit the minimum acceptable score.

Think of it as writing tests for the Skill.

Not as rigid as unit tests, but the idea is similar: don’t trust the Skill file itself, look at how it performs on sample tasks.

6.1 Three ways to test

Anthropic recommends three ways to test, in increasing order of rigor:

  • Manual testing in Claude: just type and try it. Fastest, zero setup, good for early rapid iteration.
  • Scripted testing in Claude Code: automated test cases you can rerun after every change, good for the mid-stage when you need repeatable validation.
  • Programmatic testing through the Skills API: a full evaluation suite running against a fixed test set. Right for a mature Skill being shipped to thousands of users.

Which one you pick depends on the Skill’s reach. Personal use can stop at manual testing; team use needs at least scripted testing; a Skill serving thousands of enterprise users needs programmatic testing.

6.2 Layer 1: validate that it runs

Test the most basic execution first.

For example:

  • Are the file paths right?
  • Are the tool permissions enough?
  • Does the script execute?
  • Can the references be read?
  • Are the assets templates used correctly?
  • Does the output match the expected format?
  • Are any required steps missing?

If a Skill can’t even complete a real task, talking about triggers and quality is pointless.

6.3 Layer 2: validate that it triggers correctly

For auto-loaded Skills, also test triggering.

You can’t just test one phrase:

Please run the X long-form post skill.

Real users don’t talk like that.

Prepare a set of phrases real users might actually use:

Expand this idea into a long thread
Write me a long X post
Can this be turned into longer content
Rewrite this in my voice
Give me a more attention-grabbing opening

For a cover image Skill, you might test:

Make me a cover image
Add an image to this article
Make a tech-style poster for this title
Make a Xiaohongshu cover

That’s trigger testing.

Each test should be tagged:

[
  {"query": "Expand this idea into a long thread", "should_trigger": true},
  {"query": "Check today's weather for me", "should_trigger": false}
]

If something that should trigger doesn’t, add keywords to the description.

If something that shouldn’t trigger does, narrow the description.

This step optimizes the Skill’s entry point.

If the entry is wrong, the body doesn’t matter.
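
Once the cases are tagged, the bookkeeping is simple. Here's a sketch that takes the tagged list plus a `would_trigger` callable and reports misses and false triggers. The callable is the assumption: in a real run it would wrap however you observe whether Claude loaded the Skill for a query, and the function name `trigger_report` is made up for this example.

```python
from typing import Callable, Optional


def trigger_report(cases: list[dict], would_trigger: Callable[[str], bool]) -> dict:
    """Compare observed triggering against the should_trigger tags."""
    observed = {case["query"]: would_trigger(case["query"]) for case in cases}
    missed = [
        c["query"] for c in cases if c["should_trigger"] and not observed[c["query"]]
    ]
    false_triggers = [
        c["query"] for c in cases if not c["should_trigger"] and observed[c["query"]]
    ]
    positives = sum(1 for c in cases if c["should_trigger"])
    hit_rate: Optional[float] = None
    if positives:
        hit_rate = (positives - len(missed)) / positives
    return {"hit_rate": hit_rate, "missed": missed, "false_triggers": false_triggers}
```

Each entry in `missed` is a phrase to work into the description; each entry in `false_triggers` is a reason to narrow it.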

6.4 Layer 3: prepare test data and evaluation cases

The bigger task is validating result quality.

You need test data.

That doesn’t mean the user has to hand-craft a pile of test material.

A more reasonable approach: have the agent generate a first batch of test data and evaluation cases based on the Skill’s purpose, and the user just reviews whether they match real scenarios.

For example:

  • Cover image Skill: 5 titles, 5 content types, a few reference images.
  • Long-form X Skill: 5 raw ideas, target audience, ideal final-draft style.
  • Study notes Skill: a few class notes, excerpts, OCR text from screenshots.
  • Resume rewrite Skill: different job descriptions, the original resume, ideal direction.
  • Weekly status report Skill: a week of scattered notes, meeting minutes, completed items.

Each test case should ideally include:

{
  "id": 1,
  "prompt": "How a user would phrase this task",
  "files": ["test input file"],
  "expected_output": "What counts as a good result",
  "assertions": [
    "Must preserve the user's original point",
    "Must produce a strong opening",
    "Must not fabricate experiences the user didn't mention"
  ]
}

The format isn’t the point. The mindset is:

Define “what counts as good” up front.

Otherwise the evaluation becomes a vibe check after the fact.
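
One way to enforce that mindset is to refuse to load a test case that has no assertions. A minimal sketch, assuming the cases are stored as a JSON array of objects shaped like the one above; `EvalCase` and `load_cases` are hypothetical names.

```python
import json
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    id: int
    prompt: str
    expected_output: str
    assertions: list[str]
    files: list[str] = field(default_factory=list)  # input files are optional


def load_cases(raw_json: str) -> list[EvalCase]:
    """Parse a JSON array of cases and reject any case without assertions."""
    cases = [EvalCase(**item) for item in json.loads(raw_json)]
    for case in cases:
        if not case.assertions:
            raise ValueError(
                f"case {case.id} has no assertions: define what counts as good"
            )
    return cases
```

A case with an unknown or missing key fails at construction time, and a case with an empty assertions list fails with an explicit message, so a vague test never makes it into the evaluation run.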

6.5 Layer 4: scoring needs both quantitative and qualitative metrics

Anthropic recommends splitting success criteria into two kinds:

Quantitative metrics (things you can count directly):

  • Skill triggers on 90% of relevant queries
  • Tool calls to finish the workflow ≤ X
  • 0 failed API calls
  • Tokens consumed

Qualitative metrics (things you have to observe to judge):

  • After running, the user doesn’t have to follow up with “now what”
  • Same request run 3-5 times produces consistent structure and quality
  • A new user can finish the task on their first try without much guidance

After running the tests, don’t just write “looks fine.”

Score each case, say 0 to 10.

A simple rubric:

  • 0-2: didn’t complete the task, or wrong direction.
  • 3-4: tangentially related, missed key requirements.
  • 5-6: usable but with obvious problems.
  • 7-8: stable quality, minor details to fix.
  • 9-10: matches expectations, can serve as a reference output.

A reasonable bar: main test cases score at least 5.

If an important case scores below 5, the Skill isn’t reliable yet.

For a high-frequency Skill, that usually means it’s broken in a common scenario.
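
The rubric and the minimum bar translate directly into a lookup and a filter. A sketch with hypothetical names; the band labels are copied from the rubric above and the default bar of 5 is the one suggested there.

```python
# (upper bound of the band, label) pairs taken from the rubric above.
RUBRIC = [
    (2, "didn't complete the task, or wrong direction"),
    (4, "tangentially related, missed key requirements"),
    (6, "usable but with obvious problems"),
    (8, "stable quality, minor details to fix"),
    (10, "matches expectations, can serve as a reference output"),
]


def rubric_band(score: int) -> str:
    """Map a 0-10 score to its rubric description."""
    if not 0 <= score <= 10:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    for upper, label in RUBRIC:
        if score <= upper:
            return label
    raise AssertionError("unreachable: every score up to 10 has a band")


def failing_cases(scores: dict[str, int], minimum: int = 5) -> list[str]:
    """Names of the test cases that score below the minimum bar."""
    return [name for name, score in scores.items() if score < minimum]
```

For example, `failing_cases({"weekly report": 7, "topic planning": 4})` returns `["topic planning"]`, which is the case to fix before the next round.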

6.6 Layer 5: baseline comparison

If you want to be more rigorous, run a baseline comparison.

Same test, two runs:

  1. Without the Skill.
  2. With the Skill.

Then compare.

Claude Code’s Skill Creator evaluation includes something like this — A/B comparison, skill-enabled vs skill-disabled.

Worth doing.

A Skill shouldn’t just “run.” It should prove it’s useful.

If the no-Skill result is already 7 and the Skill version is also 7, the Skill isn’t adding much.

If no-Skill is 4 and with-Skill is 8 — that’s the Skill actually encoding useful experience.
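
The comparison itself is just paired subtraction. A sketch, assuming you already have one score per test case for each run, in the same order; `skill_uplift` is a hypothetical name.

```python
from statistics import mean


def skill_uplift(without_skill: list[int], with_skill: list[int]) -> dict:
    """Compare paired scores for the same test cases, run without and with the Skill."""
    if len(without_skill) != len(with_skill) or not with_skill:
        raise ValueError("need the same non-empty set of cases for both runs")
    deltas = [after - before for before, after in zip(without_skill, with_skill)]
    return {
        "mean_without": mean(without_skill),
        "mean_with": mean(with_skill),
        "mean_uplift": mean(deltas),
        # Indexes of cases where the Skill made the result worse.
        "regressions": [i for i, delta in enumerate(deltas) if delta < 0],
    }
```

A mean uplift near zero is the "7 versus 7" situation: the Skill runs, but it isn't earning its place. The regressions list matters even when the mean looks good, because a Skill can help on average while breaking one specific case.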

6.7 Layer 6: fix the Skill based on the failures

Scoring isn’t decoration. Scoring tells you what to change.

Common fixes:

  • Trigger fails: edit the description, add real trigger phrases.
  • False trigger: narrow the description, add negative cases.
  • Missing steps: edit the workflow in SKILL.md.
  • Unstable output format: add an output template to assets/.
  • Unstable deterministic checks: write a script.
  • Reference info too long: split it into references/.
  • Tool permissions too tight: add allowed-tools.
  • Tool permissions too loose: tighten allowed-tools.
  • Model too weak: switch to a better-fit model.
  • Reasoning depth too shallow: bump up the effort level.

Then run the evaluation again.

That’s the Skill optimization loop:

Write Skill
→ Prepare test data
→ Run evaluation
→ Score each case
→ Find failure causes
→ Modify the Skill
→ Re-run evaluation
→ Hit the minimum bar, or at least know its limits

For a high-frequency, complex, or shared Skill, run this loop.

Otherwise it’s just an unvalidated prompt file.

6.8 Use the official skill-creator to scaffold and review

Anthropic ships a built-in Skill called skill-creator. You can install it from the Claude.ai plugin directory, or download it directly in Claude Code.

It can do a few things for you:

  • Generate a first draft of a skill from a natural-language description
  • Produce a correctly formatted SKILL.md (with frontmatter)
  • Suggest trigger phrases and structure
  • Review an existing skill and flag common issues (vague description, missing triggers, structural problems)
  • After you run tests and hit edge cases, fold that feedback back in for iteration

Calling it is simple:

Use the skill-creator skill to help me build a skill for [my use case]

It won’t run evaluations automatically and won’t produce quantitative reports, but it’ll save you from the most common from-scratch mistakes.

Beginners should use skill-creator for the first-pass skeleton, then edit it by hand.

7. How do you share a Skill once it’s written?

Once a Skill is written, the next step is getting it into other people’s hands.

Anthropic’s currently recommended path:

Step 1: host it on GitHub

  • Use a public repo (if it’s an open-source skill)
  • Write a clear README at the repo root (this is for human readers — doesn’t conflict with the “no README.md in the skill folder” rule, since the human README sits alongside the skill folder, not inside it)
  • Include a few screenshots of it in use

Step 2: upload to Claude.ai

  • Zip the entire skill folder
  • Open Claude.ai → Settings → Capabilities → Skills → Upload skill
  • Pick the zip → enable → test
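
If you package skills often, the zip step is worth scripting. A sketch using Python's standard zipfile module; the function name is hypothetical, and keeping the skill folder itself as the top-level entry in the archive is an assumption about what the upload expects.

```python
import zipfile
from pathlib import Path


def zip_skill_folder(skill_dir: Path, out_zip: Path) -> list[str]:
    """Zip skill_dir with the folder itself as the archive's top-level entry."""
    skill_dir = skill_dir.resolve()
    out_zip = out_zip.resolve()
    added = []
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(skill_dir.rglob("*")):
            if not path.is_file() or path == out_zip:
                continue  # skip directories and the archive being written
            # Store paths relative to the parent so the folder name is kept.
            arcname = path.relative_to(skill_dir.parent).as_posix()
            archive.write(path, arcname)
            added.append(arcname)
    return added
```

The returned list doubles as a quick sanity check: SKILL.md should be in it, and a stray README.md should not.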

Step 3: organization-level distribution

Anthropic has shipped organization-level skill deployment — an admin can push a skill to the entire workspace in one shot. Every member gets it automatically, updates automatically, managed centrally. If your skill is for a team, this path is more reliable than asking everyone to upload on their own.

Step 4: programmatic use via API

For apps, agents, or automation pipelines:

  • Use the /v1/skills endpoint to list and manage skills
  • Add skills to Messages API requests via the container.skills parameter
  • Version-control them in the Claude console
  • Combine with the Claude Agent SDK to build custom agents

Which path is best depends on the situation.

7.1 When describing your Skill, focus on outcomes

When writing a README, marketing copy, or the skill intro in MCP docs, remember: users don’t care what you are, they care what result you give them.

❌ Weak:

The ProjectHub skill is a folder containing YAML frontmatter and Markdown instructions that calls our MCP server’s tools.

✅ Strong:

The ProjectHub skill lets a team spin up a complete project workspace — pages, databases, templates — in 30 seconds, instead of 30 minutes of manual setup.

The first version describes the file structure (user doesn’t care). The second describes the time saved (user gets it instantly).

8. What does a good Skill look like?

A good Skill has a few properties:

  1. It triggers when it should.
  2. It stays quiet when it shouldn’t.
  3. Tool permissions are enough, not excessive.
  4. It uses the right model and reasoning depth.
  5. SKILL.md is short enough (under 500 lines), with detailed material loaded on demand.
  6. Important Skills come with test data, an evaluation, failure cases, and iteration.

So when writing a Skill, don’t only ask:

“What do I tell the model?”

Also ask:

  • What user phrases should make the model invoke this Skill?
  • Once it fires, what workflow should it follow?
  • Which tools is it allowed to use, and which is it not?
  • Which model is best for it?
  • What content stays in the main file, and what gets split out?
  • How do I prove it actually helps?
  • How do other people install it, upgrade it, and recover when something breaks?

That’s the real difference between a Skill and a prompt.

Appendix A: SKILL.md starter template

Copy and adapt. Note: YAML field names (name / description / model, etc.) are hard requirements of the Skill system and must stay in English. But the field values, Markdown body, and examples can all be in your own language.

---
name: weekly-topic-planner
description: Plans next week's WeChat Official Account topics for a content creator. Use when the user says "pick next week's topics," "what should I write this week," or "help me figure out what to publish next week."
---

# Weekly WeChat Official Account Topic Planning

## Steps

### Step 1: Gather material
Find hot topics from the last 7 days across the WeChat groups, X bookmarks, and viral Xiaohongshu notes the user follows.

### Step 2: Filter by account positioning
Read the account positioning in `references/account-positioning.md` and filter out anything off-brand.

### Step 3: Produce the topic list
Output 5 candidate topics, each with:
- Suggested title
- Angle
- Estimated viral probability
- Writing difficulty

(Add more steps as needed.)

## Examples

### Example 1: typical case
User says: "Help me pick next week's topics"
Steps:
1. Pull viral hits from the last 7 days (from the material pool)
2. Filter out off-brand topics by account positioning
3. Output a 5-topic candidate table
Result: 5 candidate topics, ready for the user to pick one and start writing

## Troubleshooting

### Error: can't pull material
**Cause:** source link broken or network down
**Fix:** check the links in `references/sources.md`, update broken sources

Appendix B: full Skill YAML field list

Example:

---
name: weekly-topic-planner
description: Plans next week's WeChat Official Account topics for a content creator. Use when the user says "pick next week's topics" or "what should I write this week."
license: MIT
allowed-tools: "Bash(python:*) Bash(npm:*) WebFetch"
model: claude-opus-4-5
effort: high
metadata:
  author: Zhang Da
  version: 1.0.0
  mcp-server: wechat-mp-server
  category: content-creator
---

A note for cross-platform skills: if you want the skill to also run on Codex or Hermes, stick to the universal fields.

Claude Code-only fields are ignored on other platforms.

Appendix C: quick checklist

Before you start

  • Defined 2-3 concrete use cases
  • Decided which tools to use (Claude’s built-ins, or MCP)
  • Read through a few example skills
  • Mapped out the folder structure

While writing

  • Folder name is kebab-case
  • SKILL.md filename is exactly right (case included)
  • YAML frontmatter has --- delimiters
  • name field: kebab-case, no spaces, no uppercase
  • description includes both “what it does” and “when to use it”
  • No XML angle brackets < > anywhere
  • Instructions are clear and actionable
  • Error handling is included
  • Examples are provided
  • References are linked cleanly
  • SKILL.md is under 500 lines
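
Several items in this list are mechanical and can be checked by a script. The following Python sketch covers the delimiter, name, description, angle-bracket, and length checks. It is a rough linter, not an official validator: the function name `check_skill_md` is made up, the frontmatter parse is a naive `key: value` split, not a real YAML parser, and the "when to use it" test is a heuristic that simply looks for a "Use when" clause.

```python
import re

# kebab-case: lowercase letters and digits, separated by single hyphens
NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def check_skill_md(text: str, max_lines: int = 500) -> list[str]:
    """Return a list of checklist violations for the text of a SKILL.md."""
    problems = []
    lines = text.splitlines()

    if len(lines) >= max_lines:
        problems.append(f"SKILL.md has {len(lines)} lines; keep it under {max_lines}")

    # Frontmatter must open with '---' on the first line and close with '---'.
    if not lines or lines[0].strip() != "---":
        return problems + ["missing opening --- delimiter"]
    try:
        end = next(i for i, ln in enumerate(lines[1:], 1) if ln.strip() == "---")
    except StopIteration:
        return problems + ["missing closing --- delimiter"]

    # Naive parse of top-level 'key: value' fields (indented lines are skipped).
    fields = {}
    for ln in lines[1:end]:
        if ln and not ln[0].isspace() and ":" in ln:
            key, value = ln.split(":", 1)
            fields[key.strip()] = value.strip()

    name = fields.get("name", "")
    if not NAME_RE.match(name):
        problems.append(f"name {name!r} is not kebab-case")

    desc = fields.get("description", "")
    if not desc:
        problems.append("description is missing")
    elif "use when" not in desc.lower():
        # Heuristic only: treats a "Use when ..." clause as the "when" half.
        problems.append("description does not say when to use the skill")

    # The checklist says "anywhere"; narrow this to the frontmatter if too strict.
    if "<" in text or ">" in text:
        problems.append("contains angle brackets < >")

    return problems
```

An empty list means the file passed these checks. Items such as "instructions are clear and actionable" still need a human read.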

Before uploading

  • Tested triggering for obvious tasks
  • Tested triggering for paraphrased requests
  • Verified it doesn’t trigger on unrelated topics
  • Functional tests pass
  • Tool integrations work (if any)
  • Zipped up
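
The three triggering checks above are easier to repeat if you keep the test prompts in a small table and record what happened each time. Here is a minimal Python sketch, assuming you run each prompt by hand and note whether the skill fired. `TriggerCase` and `trigger_report` are hypothetical names, and nothing here calls a model.

```python
from dataclasses import dataclass


@dataclass
class TriggerCase:
    prompt: str
    should_trigger: bool  # what you expect
    did_trigger: bool     # what you observed when you ran the prompt


def trigger_report(cases: list[TriggerCase]) -> dict:
    """Summarize under-triggering (missed) and over-triggering (false fires)."""
    missed = [c.prompt for c in cases if c.should_trigger and not c.did_trigger]
    false_fires = [c.prompt for c in cases if not c.should_trigger and c.did_trigger]
    return {
        "total": len(cases),
        "under_triggered": missed,
        "over_triggered": false_fires,
        "passed": not missed and not false_fires,
    }


cases = [
    # Obvious task
    TriggerCase("Help me pick next week's topics", True, True),
    # Paraphrased request
    TriggerCase("what should I write this week", True, False),
    # Unrelated topic
    TriggerCase("What's the weather tomorrow?", False, False),
]
report = trigger_report(cases)
```

A prompt listed under `under_triggered` usually points at a description that is missing that phrasing. One under `over_triggered` points at a description that is too broad.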

After uploading

  • Monitor for under-triggering and over-triggering
  • Collect user feedback
  • Iterate on description and instructions based on feedback

A prompt solves this one conversation.

A Skill solves the next hundred similar tasks.

It encodes experience into a triggerable, testable, distributable workflow.