The human layer in agentic design systems

This article is part 7 of the series "Towards an agentic design system".

Last week, one of our developers was building a warning message. She noticed the color contrast looked off. She could have done what developers do all the time in that situation: swap the token, hardcode a value that looks right, avoid the back-and-forth over something that simple. Ship it and move on.

She didn't. She came to me.

We checked the implementation together. Right tokens, right mapping, right usage. I ran the token-auditor skill to confirm. The component was using the system exactly as designed.

The error was mine.

I went back to the primitive yellow scale and found it. At some point, I had messed up the values. A bad decision baked into the foundation, invisible until someone looked closely enough to question it. I rebuilt the scale, tested it in Figma, exported the JSON, fixed the tokens, pushed the package. Done.

This is the moment I want to sit with. Because the interesting part isn't the fix. The fix took an hour. The interesting part is the sequence.

A developer caught a visual problem. She didn't patch around it. She traced it back through the system. The auditor confirmed the implementation was correct. The error pointed to the primitive layer. And the primitive layer pointed at me.

In a traditional workflow, that contrast failure gets caught (if it gets caught) during QA. Someone files a ticket. A designer investigates. Fingers point at the developer. The developer points at the spec. The spec points at the token. Nobody knows where the actual decision went wrong because the chain between decision and execution has too many links.

In our workflow, the developer trusted the system enough to trace the problem instead of working around it. That trust is the environment working. And the environment traced the failure back to me.

That's accountability. And it changes everything about how I think about my role.

What I actually built

I've spent the last six articles documenting the infrastructure: metadata, codebase indexing, orchestration, governance. Each piece solves a technical problem. But the thing that connects all of them isn't technical. It's organizational.

When I started building the agentic design system at Enara, my goal was practical. I'm the sole designer working directly with engineering. No design system team. No design team, period. I needed a way to codify design decisions into something engineers could consume without me being in every conversation, reviewing every PR, explaining every spacing choice.

I always say my biggest disadvantage is also my biggest advantage. I never had the infrastructure of a design systems team. I also never had the friction. No adoption politics. No evangelization cycles. No fighting for a seat at the table. I brought my own chair and claimed my space.

What I didn't expect was what that space would demand. When you own the system that encodes your decisions and executes them in production, you own the output. The yellow scale wasn't a "design system bug." It was my bug. And the system told me so.

The surface where designops meets devops

I keep running into something in conversations with other designers, especially design system practitioners: this work doesn't fit cleanly into either designops or devops. It sits at the interface where they share a surface.

A design token governance rule and a CI pipeline linting check serve the same structural purpose: enforce decisions at the point of execution. But they operate on different materials, different feedback loops, different failure modes. Designops cares about whether the right token exists and communicates the right intent. Devops cares about whether the build is clean and the component renders correctly. The agentic design system is infrastructure that works the seam between them.

My token auditor doesn't just check that tokens exist (that's a linter). It checks whether the relationships between tokens violate the design intent (that's governance). When it flags that --foreground-muted is being used for body copy, it's catching a designops failure at the point where devops would normally never look. When it reports that a component's shadow doesn't follow the elevation scale, it's enforcing a design decision inside the development pipeline.

This isn't merging the two disciplines. They still have different needs, different expertise, different reasons to exist. But the teams that treat them as fully separate are missing the joint. And the teams that try to collapse them lose necessary specificity. The work is in the seam.

The system is agentic when the human loop works

The agentic design system can resolve, build, audit, and report. I've documented all of that. But there's a nuance that gets lost in the technical framing: it only works correctly based on what's encoded as correct, wrong, and exceptional.

The yellow scale story proves this from both sides. The system executed my decision faithfully. It was wrong, and the system didn't know. A human caught it. She noticed the contrast looked off and traced the problem instead of patching it.

That's the feedback loop that actually matters. The system can enforce what you encode. It can audit against rules you define. It can report violations you've articulated. What it can't do is know that you encoded something wrong in the first place. That requires a human who trusts the system enough to trace problems through it, and a designer accountable enough to fix what they find at the source.

Teams need to sit down and set these rules. Surface errors when they catch them. If they don't, the result is predictable: errors scale, drift compounds, and AI confidently builds the wrong thing and ships it. The system doesn't create correctness. It scales whatever you give it. Including your mistakes.

The real shift

My role as a designer right now is not what it was a year ago. I'm not just a product designer anymore. I'm designing environments. The conditions and rules that enable correct execution at a speed I physically cannot match by reviewing every screen and every PR.

I didn't set out to do this. I set out to solve a practical problem: how can I codify design into something my engineers can consume without needing to evangelize about design? The system gave me ownership of the output. I didn't need to fight for influence. I needed to encode my decisions and accept that the execution would trace back to me.

That's the shift I care about. AI didn't change what I design. I still make decisions about color, spacing, hierarchy, type. What changed is what I own. I own the execution. When the environment is right, the system produces correct output without me in the room. When the environment is wrong, the system tells me.

The yellow scale incident is the clearest proof I have. The system didn't surface an error. It surfaced whose error it was. Mine. And it gave me the infrastructure to fix it in 30 minutes and push corrected tokens to the entire codebase.

Most designers hand off decisions and hope. I encode decisions and know. The tool makes that possible. The willingness to be accountable makes it real.

AIDesign SystemsDesignProduct DesignProduct

← Back to Thoughts