· 8 min read

Encoding governance on agentic design systems

How AI makes it possible for a designer to encode, enforce, and guarantee design decisions at scale.

Cristian Morales


Product Designer

This article is part 6 of the series "Towards an agentic design system".

A developer asked Claude to generate a new component. Claude wrote var(--color-blue-500) in the CSS module. The component compiled. The PR passed review. The color never appeared in the browser because that color doesn't exist in our design system. Never did, never will.

Nobody caught it. Not the developer. Not the AI. Not code review. The component shipped with a silent failure baked in.

When I found it later while building the token auditor, my first reaction wasn't frustration. It was recognition. The developer didn't know the color didn't exist. Claude didn't know either. Neither of them had the system. They had instructions, and instructions have gaps. This is exactly what happens when a system grows beyond one person's head.

Shifting responsibilities

There's a conversation happening right now about where AI actually changes design. Most of the noise is around prototyping tools. Figma Make, v0, Lovable, Replit, you name it. Generate screens faster, explore ideas with less friction, reduce the gap between idea and artifact. That's real and useful in the right context.

But it "optimizes" something we already knew how to do.

The problem that breaks teams working on complex products at scale is maintaining coherence over time. Making sure what gets designed is what gets implemented. Making sure a system built by one person stays legible when ten people are building on top of it. Making sure decisions don't drift silently into chaos while everyone's focused on shipping.

Prompting faster won't solve that.

What I've been building for the past several months, and what moved into production at Enara last month, is something different: infrastructure where AI doesn't propose screens but enforces rules. Where design decisions aren't captured in a Figma file or a documentation page that someone has to read and interpret. They're encoded as executable constraints the system applies automatically, every time, without me.

Solo efforts can break the wall

I built the Enara design system infrastructure over a weekend.

Not because I'm fast. Because we had spent three months trying to do it the right way: inside the main monorepo, using Tamagui Core as a foundation. Tamagui is opinionated in ways that made sense for its creators and made development miserable for our team. Devs struggled with it. AI tools struggled with it. Every component took far longer than it should have. The design system was supposed to accelerate the team, and it was doing the opposite.

I pushed to kill it. Got the greenlight. Opened a blank repo on a Friday before lunch.

The first thing I scaffolded wasn't components. It was the monorepo structure: packages for the token system and component library, apps for a custom documentation site. Before I built a single component, I built the environment those components would live in and the observability layer we'd use to understand how they were doing.

Then the token system: JSON files following the W3C design token spec, processed through Style Dictionary so the tokens are platform-agnostic. Colors, spacing, typography, elevation, motion. All of it expressed once, in a format that can compile to CSS custom properties, React Native StyleSheet, or whatever comes next. Then six atoms. Then the skills and rules I'd been developing in my personal system for months, adapted to this new context.
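As a sketch of that "define once, compile per platform" idea, here is a hand-rolled transform from a W3C-style token fragment to CSS custom properties. The token names are hypothetical and the real pipeline uses Style Dictionary; this only illustrates the single-source principle.

```typescript
// Minimal sketch of the single-source token idea (hypothetical token names).
// The production pipeline uses Style Dictionary; this hand-rolled transform
// only illustrates compiling one JSON source to one of several targets.

type TokenGroup = { [key: string]: { $value: string } | TokenGroup };

// A fragment in the shape of the W3C design token spec: each leaf carries $value.
const tokens: TokenGroup = {
  color: {
    "foreground-default": { $value: "#1a1a1a" },
    "foreground-muted": { $value: "#6b6b6b" },
  },
  spacing: {
    sm: { $value: "8px" },
    md: { $value: "16px" },
  },
};

// Flatten nested groups into "color-foreground-default" style names.
function flatten(group: TokenGroup, prefix: string[] = []): Map<string, string> {
  const out = new Map<string, string>();
  for (const [key, node] of Object.entries(group)) {
    const leaf = (node as { $value?: unknown }).$value;
    if (typeof leaf === "string") {
      out.set([...prefix, key].join("-"), leaf);
    } else {
      for (const [k, v] of flatten(node as TokenGroup, [...prefix, key])) out.set(k, v);
    }
  }
  return out;
}

// One target among several: CSS custom properties.
function toCss(flat: Map<string, string>): string {
  const lines = [...flat].map(([name, value]) => `  --${name}: ${value};`);
  return `:root {\n${lines.join("\n")}\n}`;
}

console.log(toCss(flatten(tokens)));
```

A React Native StyleSheet target would be a second function over the same flattened map, which is the point: the tokens are expressed once and every platform output is derived.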

By Monday it was running. Real constraints. Devs started adding CI/CD. A real team that would soon start building on top of it.

That's when governance became an actual problem instead of a theoretical one.

I was the system

My role changed when the system went into production.

Before, I was the design system. Every decision passed through me, explicitly or through osmosis. Developers would ask, I'd answer, the answer would propagate. I caught violations by reviewing. I maintained consistency by being the person everyone checked with. The mental model lived in me and I lent it out.

That doesn't scale. And honestly, it isn't interesting work.

If the knowledge lives in you, the system reflects your understanding. The moment someone else starts building components, that knowledge stays in you while the system keeps growing. Everything they build requires interpretation. And interpretation is where drift begins.

Encoding design decisions

So I built something that would carry the knowledge instead of me.

Developers started using the skills I'd created: the scaffold-component skill, the component templates, the established patterns. The infrastructure was doing its job. Generating consistent file structures, correct TypeScript interfaces, CSS modules with token references.

My token-auditor skill scanned CSS modules, matched every var(--*) reference against the JSON source files, flagged anything that wasn't in the system. Hardcoded hex values. Hardcoded pixel spacing where tokens existed. Missing references. It found 43 violations across the molecule layer. Useful. Concrete. Actionable.
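A minimal sketch of that v1 existence check, with illustrative rules and token names (the real token-auditor skill is richer than this): every var(--*) reference must resolve to a token in the JSON source, and raw hex or pixel values are flagged where tokens should be used.

```typescript
// Sketch of a v1-style existence audit (illustrative, not the production skill).
// It answers exactly one question per reference: does this token exist?

const knownTokens = new Set(["color-foreground-default", "spacing-md"]);

interface Violation { line: number; kind: string; detail: string }

function auditCss(css: string): Violation[] {
  const violations: Violation[] = [];
  css.split("\n").forEach((text, i) => {
    const line = i + 1;
    // Token references that don't exist in the JSON source files.
    for (const m of text.matchAll(/var\(--([\w-]+)\)/g)) {
      if (!knownTokens.has(m[1])) {
        violations.push({ line, kind: "missing-token", detail: `--${m[1]}` });
      }
    }
    // Hardcoded hex colors where a color token exists.
    for (const m of text.matchAll(/#[0-9a-fA-F]{3,8}\b/g)) {
      violations.push({ line, kind: "hardcoded-hex", detail: m[0] });
    }
    // Hardcoded pixel spacing where spacing tokens exist.
    for (const m of text.matchAll(/:\s*(\d+px)/g)) {
      violations.push({ line, kind: "hardcoded-px", detail: m[1] });
    }
  });
  return violations;
}

const sample = `.card {
  color: var(--color-blue-500);
  background: #ffffff;
  padding: 16px;
}`;

// Flags --color-blue-500 (doesn't exist), #ffffff, and 16px.
console.log(auditCss(sample));
```

This is exactly the check that would have caught the var(--color-blue-500) failure from the opening anecdote: the reference compiles and renders nothing, but it never matches the source JSON.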

But it only knew one question: does this token exist?

Between v1 and v2, I did something harder than adding more checks. I encoded the design system's intent (the decisions that live in my head, the rules I apply automatically without thinking) into a knowledge layer the auditor could reason against.

Visual hierarchy. Foreground sequence. System feedback patterns. Elevation-size coherence. Each rule encoded the same way: here's what's valid, here's what violates the intent even when the syntax is correct. Using --foreground-muted for body copy isn't a missing token — it's the wrong position in the hierarchy. A dropdown with a custom rgba shadow isn't breaking syntax — it's bypassing a design decision. Those are different failures, and v1 had no way to tell them apart.
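One way to picture the difference is a rule that maps text roles to the foreground tokens valid for them. The mapping below is an assumed, illustrative encoding, not the production knowledge layer; the point is that a token can exist and still violate intent.

```typescript
// Sketch of a v2-style intent rule (assumed role mapping, illustrative only).
// v1 could only answer "does this token exist?"; v2 also asks "is it the
// right position in the hierarchy for this usage?".

// Which foreground tokens are valid for which text roles (hypothetical).
const foregroundHierarchy: Record<string, string[]> = {
  heading: ["foreground-default"],
  body: ["foreground-default"],
  caption: ["foreground-muted", "foreground-subtle"],
};

const allForegroundTokens = new Set([
  "foreground-default",
  "foreground-muted",
  "foreground-subtle",
]);

type Verdict = "ok" | "missing-token" | "wrong-hierarchy-position";

function checkForeground(role: string, token: string): Verdict {
  if (!allForegroundTokens.has(token)) return "missing-token"; // v1's only check
  const allowed = foregroundHierarchy[role] ?? [];
  return allowed.includes(token) ? "ok" : "wrong-hierarchy-position"; // v2's question
}

console.log(checkForeground("body", "foreground-muted"));    // "wrong-hierarchy-position"
console.log(checkForeground("caption", "foreground-muted")); // "ok"
console.log(checkForeground("body", "foreground-accent"));   // "missing-token"
```

The first call is the body-copy case from the text: the token is real, the syntax is valid, and the usage is still a violation, which is a failure mode v1 had no vocabulary for.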

When I ran v2 on the same files, it found 64 violations. Twenty-one more than v1.

That number could look like regression. It isn't. The auditor got smarter. It found components where every individual token was valid but their relationships violated the hierarchy rules.

v1 checked existence. v2 checks intent.

The difference between those two things is the difference between a linter and governance.

FilterBar: zero violations. Fully tokenized. Clean. It was also the component built most recently, after the infrastructure was established. I can't prove causation. But the scaffolding skill, the metadata structure, and the established token conventions are easier to follow correctly than to work around. When the system is well designed, staying on the rails is the path of least resistance. The auditor didn't make FilterBar clean. The environment did. The auditor is how we verify that the environment is working.

The goal isn't to catch violations. It's to make violations the harder path.

That shift changed my position in every conversation about quality. When design owns the infrastructure that enforces its own decisions, the question stops being "how do we get more design QA?" and becomes "design already has quality solved." The result: I'm now working directly on product UI, owning visual and behavior QA across the whole system. From designer who delivers to designer who guarantees.

Designing the conditions

The piece I wrote last month was about agent orchestration: structuring context, strategies, and tools so AI can participate meaningfully in a design pipeline. This is the layer that comes after you have that working.

Orchestration gets AI working inside the system. Governance keeps the system coherent as it grows.

They're not the same problem. Orchestration is about capability. Governance is about trust. About building the infrastructure that lets you trust that what gets built is what you designed, without having to verify every decision manually.

The harder part of governance isn't technical. It's epistemic. You have to articulate rules you've been following unconsciously. The process of building the token auditor forced me to write down what "correct" means. Not as documentation, but as knowledge the system can execute. Every rule I encoded was something I already knew. The work was making that knowledge legible to someone, or something, other than me.

That's design work at a different level of abstraction. You're not designing the interface. You're designing the conditions under which interfaces get built correctly.

The system doesn't just find individual violations. It surfaces the shape of how drift happens, which is the only way to stop it before it compounds.

That's self-healing infrastructure. Not autonomous. I still decide which rules to encode, which gaps to fill, which patterns mean something. But the execution doesn't need me. The system closes the loop.

The people talking about agentic design systems as a future direction are describing something that's already running. The gap between syntactically valid and semantically right (between code that works and code that belongs) is where design decisions either hold or dissolve over time.

This is how they hold.

Agentic AI · Design Systems · AI Infrastructure · Design Thinking