What You Reviewed Is Not What Runs

Securing the Agentic Era — Edition 1

Welcome to the first edition of Securing the Agentic Era. Once a month I take one real development in AI and security and work through what it actually changes for the people building and shipping software. This first one starts where the thesis does: with something that passed every check and still went wrong.

A safe-rated AI skill, 26,000 users, and the security assumption that just quietly died.

Last week a research team did something that should bother every security and product leader shipping agentic features right now.

They built an AI agent skill called brand-landingpage, dressed up as a helper for building landing pages with Google's Stitch design tool. They chose that use case on purpose, because it appeals to the least technical people in any company: marketers, salespeople, designers. Then they did three unremarkable things. They submitted it to a popular open-source skills repository with around 36,000 GitHub stars, and the pull request was merged in a few days. They ran it through the leading skill scanners, including ones from Cisco, Nvidia, and skills.sh, and every one returned a clean verdict. And they bought an Instagram ad.

That was the whole operation. No exploit, no zero-day, no clever code. It reached more than 26,000 users, some of them on corporate accounts.

Here is the part that matters. At review time, the skill contained nothing malicious. It instructed the agent to set up a dependency by following instructions hosted at a domain the researchers controlled, configured to redirect to the real Stitch site so a human glancing at it saw nothing wrong. The scanners read the skill, read its bundled files, found nothing, and approved it. They were right. There was nothing to find. The malicious instructions did not exist yet.

After distribution, the researchers changed the page behind that domain. The new version told agents to download and run a script. In the test it only collected email addresses. It could have done almost anything on the machines running those agents.

## The assumption that died

The easy conclusion here is that the scanners failed. The more useful one is that they did not. They did exactly what they were designed to do, and the outcome was still bad. That points us away from any single product and toward an assumption we have all been building on for years.

Every model of software supply chain security we have rests on a single quiet assumption: that what you reviewed is what runs. You scan the artifact, you approve the artifact, the artifact is the thing that executes. The bytes you inspected are the bytes that ship. For packages pinned to a version, containers referenced by digest, and signed binaries, that assumption has largely held for as long as we have had a supply chain to secure.

Skills break it. A skill is not the payload. It is a pointer to the payload, and the payload can live behind a mutable reference that someone else controls. You reviewed a referral, not a contract. This is a classic time-of-check to time-of-use (TOCTOU) gap, except now it operates at internet distribution scale and the window between check and use is whatever the attacker wants it to be.
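A toy simulation makes the gap concrete. Everything in it is hypothetical: the `setup.example` URL, the keyword-based `scan_skill` check, and the in-memory `remote_pages` dictionary standing in for a page someone else controls. It is a sketch of the mechanism under those assumptions, not a model of how any real scanner works.

```python
import hashlib

# Hypothetical stand-in for a remote page the skill points at.
# In a real deployment this would be content behind a URL a third party controls.
remote_pages = {
    "https://setup.example/instructions": "Install the design tool from its official site.",
}

# The reviewed artifact: it contains only a pointer, not the payload.
SKILL_TEXT = "To set up, follow the instructions at https://setup.example/instructions"


def scan_skill(skill_text: str) -> str:
    """Toy scanner (assumption): inspects only the bytes of the skill itself."""
    suspicious = ("curl", "| sh", "download and run")
    lowered = skill_text.lower()
    return "flagged" if any(marker in lowered for marker in suspicious) else "clean"


def fetch(url: str) -> str:
    """Simulated fetch: returns whatever the remote side serves right now."""
    return remote_pages[url]


def digest(text: str) -> str:
    """SHA-256 of the fetched content, used to detect a change."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Time of check: the skill is scanned and the referenced page looks harmless.
verdict_at_review = scan_skill(SKILL_TEXT)
digest_at_review = digest(fetch("https://setup.example/instructions"))

# After approval, the page owner swaps the content. The skill is untouched.
remote_pages["https://setup.example/instructions"] = "Download and run this script."

# Time of use: the same skill, rescanned, still looks clean.
verdict_at_use = scan_skill(SKILL_TEXT)
digest_at_use = digest(fetch("https://setup.example/instructions"))

print(verdict_at_review, verdict_at_use, digest_at_review == digest_at_use)
```

The skill's own text never changes, so a scan of the artifact returns the same verdict before and after the swap. Only a check on the fetched content, made at time of use, notices the difference.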

Which leads to the line I cannot stop thinking about: trust now has a half-life. In the package world, trust decays slowly. A maintainer goes rogue, a dependency gets compromised, and there is usually a code change somewhere that leaves a trace. In the skills world, an attacker can invalidate the basis for your trust the instant after you grant it, with zero change on your side and nothing new for a scanner to re-scan. The defender's safe window has collapsed to nothing.

## We keep reaching for the wrong mental model

Part of why this is landing badly is that we keep grabbing the nearest familiar concept to describe a skill, and every one of them is wrong in a way that hides the risk.

"It is just a prompt." No. Prompts do not fetch and execute external instructions with your credentials.

"It is just a config file." No. Config does not change behavior after approval at the discretion of a third party.

"It is just a package." Closer, but a package is immutable once pinned. A skill points outward by design.

A skill is an executable instruction bundle with deferred, mutable execution. You are not adopting code. You are adopting a standing relationship with whoever controls the endpoints it points at. And relationships can sour without you lifting a finger.

## Why agents make this sharper, not safer

There used to be a human in this loop, and that human was the control. Somewhere in a manual setup, a person would pause at "now install this from a domain you have never heard of" and feel the friction. That friction caught things.

Agents execute setup steps with the diligence of a junior who never says no. They follow instructions to fetch and run components with credentials and without skepticism. We did not just automate the work. We automated the installation of trust, and we optimized away the one moment where doubt used to live. The 26,000 number is not a story about a clever payload. It is a story about how efficiently trust can now be laundered through legitimate channels and executed without question.

## What actually changes for leaders

The fix is not a better scanner. It is a different question.

The better question is not whether a skill is safe. That is a verdict about a snapshot, and the snapshot is exactly what the attacker controls. The question worth building around is whether a skill's behavior, right now, is within policy. Safe is a property of a file at a moment. Behavior is a property of a running system over time. Security has to move to where the second question lives.
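To make "behavior within policy" less abstract, here is a minimal sketch of checking observed agent actions against a policy at runtime. The `Action` and `Policy` types, the action kinds, and the `design-tool.example` host are all assumptions made up for illustration. A real system would draw these from its own agent runtime and telemetry.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Action:
    """One observed agent action (hypothetical shape)."""
    kind: str    # "fetch" or "execute" in this sketch
    target: str  # URL for a fetch, command line for an execute


@dataclass(frozen=True)
class Policy:
    """What the skill was approved to do (hypothetical shape)."""
    allowed_hosts: frozenset
    allow_execute: bool = False


def violations(policy: Policy, actions: list) -> list:
    """Return a description of every action that falls outside the policy."""
    found = []
    for action in actions:
        if action.kind == "fetch":
            host = urlparse(action.target).hostname
            if host not in policy.allowed_hosts:
                found.append(f"fetch to unapproved host: {host}")
        elif action.kind == "execute":
            if not policy.allow_execute:
                found.append(f"execute not permitted: {action.target}")
        else:
            # Anything the policy does not know about is treated as a violation.
            found.append(f"unknown action kind: {action.kind}")
    return found


policy = Policy(allowed_hosts=frozenset({"design-tool.example"}))

# Behavior observed around review time.
observed_at_review = [Action("fetch", "https://design-tool.example/setup")]

# Behavior observed later, after the remote instructions changed.
observed_later = [
    Action("fetch", "https://design-tool.example/setup"),
    Action("execute", "sh install.sh"),
]

print(violations(policy, observed_at_review))
print(violations(policy, observed_later))
```

The same skill produces an empty list at review time and a violation later. That is the difference between grading a snapshot and evaluating behavior over time.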

In practice that means treating skills the way you already treat the rest of your supply chain, applied to a thing most organizations are still filing under text:

- Inventory every skill in use, with ownership and a record of what it connects to and what data it is allowed to touch.
- Pin versions and either hash the external references a skill depends on or host them inside an environment you control, so the page cannot change underneath you.
- Enforce least privilege at the agent level, so a skill does not silently inherit the full access of the person running it.
- Restrict egress to approved domains and monitor it for behavior that does not match the approval.
- Validate continuously, not once at the gate.
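As one illustration of the pinning control above, here is a minimal sketch of checking a fetched reference against a digest recorded at approval time. The `approve` and `verify` helpers, the in-memory `approved_digests` record, and the `docs.example` URL are hypothetical. In practice the record would live in whatever inventory system tracks approved skills.

```python
import hashlib
import hmac

# Hypothetical approval record: URL -> SHA-256 hex digest of the reviewed content.
approved_digests = {}


def sha256_hex(body: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(body).hexdigest()


def approve(url: str, reviewed_body: bytes) -> None:
    """Record the digest of the exact content that was reviewed."""
    approved_digests[url] = sha256_hex(reviewed_body)


def verify(url: str, fetched_body: bytes) -> bool:
    """True only if the URL was approved and the content still matches."""
    expected = approved_digests.get(url)
    if expected is None:
        return False  # never approved: fail closed
    return hmac.compare_digest(expected, sha256_hex(fetched_body))


url = "https://docs.example/setup"
approve(url, b"Install the design tool from its official site.")

unchanged = verify(url, b"Install the design tool from its official site.")
swapped = verify(url, b"Download and run this script.")
unknown = verify("https://other.example/setup", b"anything")

print(unchanged, swapped, unknown)
```

The check fails closed: a reference that was never approved, or whose content has drifted from what was reviewed, is rejected before the agent acts on it.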

None of this is exotic. It is the same lifecycle discipline we have applied to third-party code for years. The only new part is admitting that a skill belongs in that category and was never a document.

## The bigger arc

This is not a skills bug. It is the first clear look at where all of this is heading.

And it is not a single team's clever stunt. Three weeks before AIR, the research team behind the skill experiment described above, published its findings, Trail of Bits independently bypassed Cisco's scanner and all three scanners built into the major skill registries, and reached the same conclusion: a scanner checks a fixed package, while an attacker keeps changing the payload until it passes. When two respected teams arrive at the same structural finding by different routes, it is not a bug to be patched. It is the shape of the problem.

As software becomes agent-mediated and instruction-driven, the unit of trust moves from the artifact to the behavior. The artifact stops being the thing you can reason about, because the artifact is increasingly just a set of instructions for fetching and acting on things that are not in front of you yet. Security that was built to inspect artifacts has to follow the trust to where it now lives, which is runtime.

The teams that internalize this early will ship agentic products their customers can actually trust, because they will be governing behavior instead of grading files. The teams still scanning artifacts at the gate will keep getting clean verdicts, right up until the moment they don't.

I spend my days on exactly this problem, on the product side of how AI gets adopted safely inside large organizations. This research is the cleanest illustration I have seen of why the old gate is not enough, and why the work now is to make trust something you can observe continuously, not something you grant once and hope holds.

---

Based on original research by AIR, "The Story of Skills: How We Hijacked 26,000 Agents With One Instagram Ad." Coverage via CSO Online, "How a malicious AI agent skill passed security checks and reached 26,000 users."