Building an internal agent: Adding support for Agent Skills

When Anthropic introduced Agent Skills,
I was initially a bit skeptical of the problem they solved (can't we just use prompts and tools?), but I've subsequently
come to appreciate them, and have explicitly implemented skills in our internal agent framework.
This post talks about the problem skills solve, how the engineering team at Imprint implemented them,
how well they've worked for us, and where we might take them next.

This is part of the Building an internal agent series.

What problem do Agent Skills solve?

Agent Skills are a set of techniques that solve three important workflow problems:

  1. use progressive disclosure to more effectively utilize constrained context windows
  2. minimize conflicting or unnecessary content in the context window
  3. provide reusable snippets for recurring problems, so that individual workflow-creators
    don't each have to solve things like Slack formatting or dealing with large files

All three of these problems initially seemed insignificant when we started building out our internal workflows,
but once the number of internal workflows reached into the dozens, they became difficult to manage.
Without reusable snippets, I lost the leverage to improve all workflows at once, and without progressive disclosure
the agents would get a vast amount of irrelevant content that could confuse them, particularly when it came to things
like inconsistencies between Markdown and Slack's mrkdwn formatting language, both of which are important to different
tools used by our workflows.
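
To make that concrete: Markdown renders links as [label](url), while Slack's mrkdwn expects <url|label> (and bold is **text** in Markdown versus *text* in mrkdwn). Here's a minimal sketch, in the spirit of what a formatting skill has to encode, of translating Markdown links into mrkdwn; the function name is illustrative rather than something from our framework:

    import re

    def md_link_to_mrkdwn(text: str) -> str:
        # Markdown link: [label](url)  ->  Slack mrkdwn link: <url|label>
        return re.sub(r"\[([^\]]+)\]\(([^)]+)\)", r"<\2|\1>", text)

    # md_link_to_mrkdwn("see [the post](https://lethain.com/agents-skills/)")
    # => 'see <https://lethain.com/agents-skills/|the post>'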

How we implemented Agent Skills

As a disclaimer, I recognize that it's not strictly necessary to implement Agent Skills yourself,
as you can integrate with e.g. Claude's API support for Agent Skills.
However, one of our design decisions is to remain largely platform agnostic, so that we can switch
across model providers, and consequently we decided to implement skills within our framework.

With that out of the way, we started implementing by reviewing
the Agent Skills documentation at agentskills.io,
and cloning their Python reference implementation skills-ref
into our repository to make it accessible to Claude Code.

The resulting implementation has these core features:

  1. Skills live in a skills/ directory in the repository, with each skill in its own
    sub-directory containing a SKILL.md

  2. Each skill is a Markdown file with metadata along these lines:

    ---
    name: pdf-processing
    description: Extract text and tables...
    metadata:
      author: example-org
      version: "1.0"
    ---
    
  3. The list of available skills, including their descriptions from metadata, is injected into the system prompt at the beginning of each workflow,
    and the load_skills tool is available to the agent to load a skill's entire file into the context window
    (a sketch of this flow appears after this list).

  4. We updated the workflow configuration to optionally specify required, allowed, and prohibited skills,
    which modify the list of exposed skills injected into the system prompt.

    My guess is that requiring specific skills for a given workflow is a bit of an anti-pattern ("just let the agent decide!"),
    but it was trivial to implement and the sort of thing that I could imagine being useful in the future.

  5. We used the Notion MCP to retrieve all the existing prompts in our prompt repository,
    identify existing implicit skills in the prompts we had created, write those initial
    skills, and identify which Notion prompts to edit to eliminate the now-redundant sections
    of their prompts.
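
To give a flavor of how these pieces fit together, here's a minimal sketch of the loading flow described above. It's not our production code: the yaml dependency and names like discover_skills and skill_listing are illustrative assumptions.

    import dataclasses
    import pathlib

    import yaml  # assumed dependency for parsing the frontmatter

    SKILLS_DIR = pathlib.Path("skills")

    @dataclasses.dataclass
    class Skill:
        name: str
        description: str
        body: str

    def load_skill(path: pathlib.Path) -> Skill:
        # SKILL.md files start with YAML frontmatter delimited by ---
        _, frontmatter, body = path.read_text().split("---", 2)
        meta = yaml.safe_load(frontmatter)
        return Skill(meta["name"], meta["description"], body.strip())

    def discover_skills() -> dict[str, Skill]:
        # One sub-directory per skill, each containing a SKILL.md
        paths = sorted(SKILLS_DIR.glob("*/SKILL.md"))
        return {skill.name: skill for skill in map(load_skill, paths)}

    def skill_listing(skills, required=(), allowed=None, prohibited=()):
        # Rendered into the system prompt at the start of each workflow;
        # required/allowed/prohibited come from the workflow configuration.
        lines = []
        for name, skill in skills.items():
            if name in prohibited:
                continue
            if allowed is not None and name not in allowed and name not in required:
                continue
            suffix = " (required)" if name in required else ""
            lines.append(f"- {name}{suffix}: {skill.description}")
        return "Available skills:\n" + "\n".join(lines)

    def load_skills(names: list[str]) -> str:
        # Exposed to the agent as a tool: pulls full skill bodies into context
        skills = discover_skills()
        return "\n\n".join(skills[name].body for name in names)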

Then we shipped it into production.

How they’ve worked

Humans make mistakes all the time. For example, I’ve seen many dozens of JIRA tickets from
humans that don’t explain the actual problem they are having. People are used to that,
and when a human makes a mistake, they blame the human.
However, when agents make a mistake, a surprising percentage of people view it as a fundamental limitation of agents
as a category, rather than thinking that, “Oh, I should go update that prompt.”

Skills have been extremely helpful as the tool for steadily refining these edge cases
where we'd relied on implicit behavior because specifying the exact behavior was simply overwhelming.
As one example, we ask that every Slack message end with a link to the prompt that drove the
response. That always worked, but the details of the formatting would vary in an annoying, distracting
way: sometimes the link text would be the full post title, sometimes just the word "link", and
sometimes the raw URL, e.g. https://lethain.com/agents-skills/.
With skills, it is now (almost always) consistent, without anyone thinking to include those instructions
in their workflow prompts.

Similarly, handling large files requires a series of different tools that benefit from
In-Context Learning (aka ICL, which is a fancy term for including a handful of examples of correct and incorrect usage),
which absolutely no one is going to add to their workflow prompt but is extremely effective
at improving how the workflow uses those tools.
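
To illustrate, a large-file skill might embed those examples directly in its SKILL.md body. Everything below (the tool names, the size threshold) is hypothetical, but it shows the shape of that ICL-style guidance:

    ## Reading large files
    For files over ~50KB, read a bounded range instead of the whole file.

    Correct:
        read_file_range(path="logs/app.log", start_line=1, end_line=200)

    Incorrect:
        read_file(path="logs/app.log")  # floods the context window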

For something that I was initially deeply skeptical about, I now wish I had implemented skills much earlier.

Where we might go next

While our skills implementation is working well today, there are a few
opportunities I’d like to take advantage of in the future:

  1. Add a load_subskill tool to support files in skills/{skill}/* beyond the SKILL.md.
    So far, this hasn't been a major blocker, but as some skills get more sophisticated,
    the ability to split varied use-cases into distinct files would improve our ability
    to use skills for progressive disclosure (a sketch of such a tool follows this list).

  2. One significant advantage that Anthropic has over us is their sandboxed Python interpreter,
    which allows skills to include entire Python scripts that tools can run on the agent's behalf.
    For example, a script for parsing PDFs might be included in a skill, which is extremely handy.
    We don't currently have a sandboxed interpreter handy for our agents,
    but this could, in theory anyway, significantly cut down on the number of custom skills
    we need to implement.

    At a minimum, it would do a much better job at operations that require reliable math
    than relying on the LLM to do its best at performing math-y operations.
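
For the first of these, a minimal sketch of what a load_subskill tool could look like, reusing the hypothetical SKILLS_DIR from the earlier sketch; the resolve/is_relative_to check keeps the agent from reading outside the skills tree:

    def load_subskill(skill: str, filename: str) -> str:
        # Tool: load a supporting file from skills/{skill}/ beyond SKILL.md
        path = (SKILLS_DIR / skill / filename).resolve()
        if not path.is_relative_to(SKILLS_DIR.resolve()):
            raise ValueError("path escapes the skills directory")
        return path.read_text()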

I think both of these are actually pretty straightforward to implement.
The first is just a simple feature that Claude could implement in a few minutes.
The latter feels annoying to implement, but could also be done in less than
an hour by running a second Lambda running Node.js with Pyodide,
and exposing access to that Lambda as a tool. It's just so inelegant for a Python process
to call a Node.js process to run sandboxed Python that I haven't done it quite yet.
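
If we did build it, the Python side could be as small as a single tool that forwards code to that Lambda. The function name pyodide-sandbox and the payload shape below are assumptions rather than a real deployment:

    import json

    import boto3  # assumes AWS credentials are already configured

    def run_python(code: str) -> str:
        # Tool: execute untrusted Python inside the Node.js + Pyodide Lambda
        response = boto3.client("lambda").invoke(
            FunctionName="pyodide-sandbox",  # hypothetical Lambda name
            Payload=json.dumps({"code": code}).encode(),
        )
        return json.loads(response["Payload"].read())["stdout"]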


