Ed O'Connell

Content Infrastructure · Solution Architecture

I've been in content management for twenty years — UMass system, multi-college environments, the kind of places where six departments have six websites saying six different things about the same program. The problem has always been the same: how does an institution speak with one voice when its content is scattered across disconnected systems?

That problem just became urgent. AI doesn't create institutional incoherence — it accelerates whatever is already there. If your content is structured, typed, and relational, an AI agent can reason about your institution. If it isn't, the AI will generate plausible text that sounds institutional but isn't grounded in anything real. Contradictions compound as quickly as coherence.

Structured content has always been the answer to this. What changed is the consumption layer. AI agents can now traverse modeled objects — a program has a coach, a coach belongs to a department, a department serves a division. The agent doesn't infer those connections. They're in the schema. That's the difference between an AI that knows your institution and one that guesses.
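
As a minimal sketch of what "the connections are in the schema" can look like, the TypeScript below models the program, coach, department, division chain as explicit ID references and follows them without inference. The type names, field names, and sample records are hypothetical illustrations, not SCA's actual content model.

```typescript
// Hypothetical document shapes; each relationship is an explicit ID reference.
interface Division { _id: string; name: string }
interface Department { _id: string; name: string; divisionId: string }
interface Person { _id: string; name: string; departmentId: string }
interface Program { _id: string; title: string; coachId: string }

interface ContentStore {
  divisions: Map<string, Division>;
  departments: Map<string, Department>;
  people: Map<string, Person>;
  programs: Map<string, Program>;
}

// Index a list of documents by _id so references can be followed directly.
function indexById<T extends { _id: string }>(items: T[]): Map<string, T> {
  const map = new Map<string, T>();
  for (const item of items) map.set(item._id, item);
  return map;
}

// Follow a reference, failing loudly on a dangling ID instead of guessing.
function resolve<T>(map: Map<string, T>, id: string): T {
  const doc = map.get(id);
  if (doc === undefined) throw new Error(`Dangling reference: ${id}`);
  return doc;
}

// Walk program -> coach -> department -> division using only modeled links.
function describeProgram(store: ContentStore, programId: string): string {
  const program = resolve(store.programs, programId);
  const coach = resolve(store.people, program.coachId);
  const department = resolve(store.departments, coach.departmentId);
  const division = resolve(store.divisions, department.divisionId);
  return `${program.title}: coached by ${coach.name} (${department.name}, ${division.name})`;
}

// Made-up sample records, for illustration only.
const sampleStore: ContentStore = {
  divisions: indexById<Division>([{ _id: "division-upper", name: "Upper School" }]),
  departments: indexById<Department>([
    { _id: "department-science", name: "Science", divisionId: "division-upper" },
  ]),
  people: indexById<Person>([
    { _id: "person-coach", name: "A. Coach", departmentId: "department-science" },
  ]),
  programs: indexById<Program>([
    { _id: "program-robotics", title: "Robotics", coachId: "person-coach" },
  ]),
};

console.log(describeProgram(sampleStore, "program-robotics"));
```

The point of `resolve` throwing on a missing ID is that a broken relationship surfaces as an error rather than as a plausible guess.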

The Situation

Springfield Commonwealth Academy is an international school in China. When their president said she wanted to be "AI-forward," I checked the infrastructure. The school's website ran on Webflow — professional-looking, but the content was locked inside a visual builder. No API. No structured data. No way for any system to query the school's information programmatically.

SCA also has multiple disconnected web domains and is pursuing Cognia accreditation as a system. That means coherent institutional data isn't optional — it's a compliance requirement. And the gap between "AI-forward" aspiration and Webflow reality was the entire problem.

Why Sanity

I evaluated Contentful and Strapi before choosing Sanity. The decision wasn't about features; it was about architecture. Sanity's schemas are defined in code, which means they live in version control and can be reviewed, tested, and evolved like any other part of the system. The API is real-time, with no build step required to serve fresh content. GROQ gives you a query language that traverses relationships between content types. And Sanity Studio lets editors work without calling a developer.
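
To make "schemas in code can be reviewed and tested" concrete, here is a small sketch. It declares three document types as plain objects in the name/type/fields shape that Sanity schema definitions use, then checks that every reference field points at a type that is actually defined. The local `FieldDef` and `DocumentDef` interfaces, the specific types and fields, and the `findDanglingReferences` helper are assumptions made for illustration; they are not taken from the SCA project or from the Sanity package.

```typescript
// Minimal local types mirroring the plain-object shape of schema definitions.
interface FieldDef { name: string; type: string; to?: { type: string }[] }
interface DocumentDef { name: string; type: "document"; title: string; fields: FieldDef[] }

// Hypothetical document types; the real project has twelve.
const schemaTypes: DocumentDef[] = [
  {
    name: "department",
    type: "document",
    title: "Department",
    fields: [{ name: "name", type: "string" }],
  },
  {
    name: "person",
    type: "document",
    title: "Person",
    fields: [
      { name: "name", type: "string" },
      { name: "department", type: "reference", to: [{ type: "department" }] },
    ],
  },
  {
    name: "program",
    type: "document",
    title: "Program",
    fields: [
      { name: "title", type: "string" },
      { name: "coach", type: "reference", to: [{ type: "person" }] },
    ],
  },
];

// A check that could run in CI: list reference fields whose target type is not defined.
function findDanglingReferences(types: DocumentDef[]): string[] {
  const defined = new Set(types.map((t) => t.name));
  const problems: string[] = [];
  for (const t of types) {
    for (const field of t.fields) {
      if (field.type !== "reference") continue;
      for (const target of field.to ?? []) {
        if (!defined.has(target.type)) {
          problems.push(`${t.name}.${field.name} -> ${target.type}`);
        }
      }
    }
  }
  return problems;
}

// A GROQ query that follows the same references with the -> operator.
const programsWithCoachQuery = `*[_type == "program"]{
  title,
  "coach": coach->{ name, "department": department->name }
}`;

console.log(findDanglingReferences(schemaTypes));
console.log(programsWithCoachQuery);
```

Because the definitions are ordinary data, a check like this can sit next to the schema in version control and fail a review before a broken relationship reaches editors.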

But the deeper reason: Sanity functions as a structuring gate. What you know about your institution — what a human and an AI agent can inspect and perceive together — passes through a content model on its way to expression. The schema controls what gets through and how it's organized. That's not a CMS feature. That's content infrastructure.

What I Built

I modeled SCA's content architecture with an AI agent. Not over weeks of requirements gathering — in a few focused hours. The agent and I inspected the school's existing content together: I provided the institutional context and navigated the material constraints, the agent analyzed the DOM structure of every page, and we structured what we found into typed, relational data.

The Content Infrastructure
  • 12 document types with typed fields and relationships — pages, programs, news, people, departments, events, student projects, alumni stories, media galleries, admissions paths, boarding features, and site settings
  • Programmatic extraction — Playwright-based scripts inspected every Webflow page, extracted content as structured data, and converted HTML to Sanity-native Portable Text. Webflow had no API; everything was DOM-based.
  • Idempotent import pipeline — three-mode operation (dry-run, images-only, create) with slug-based deduplication. Run it twice, get the same result.
  • Automated provisioning — a Node.js agent polls Sanity for new student projects, provisions Google Drive folders via GAM with correct permissions, and updates the document with the folder URL. State machine with crash recovery: if the process dies mid-provision, it picks up where it left off without creating duplicates.
  • Reference frontend in Astro 5 — SSR for dynamic routes, static for landing pages, JSON-LD structured data, Vercel deployment. Built to prove the data model works end-to-end against real content.
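
The idempotent import described in the list above can be sketched roughly as follows. This is a simplified, synchronous illustration against an in-memory dataset, not the actual pipeline: the `Dataset` stand-in, the field names, and the exact behavior of each mode (in particular, that images-only uploads images without creating documents) are assumptions.

```typescript
type ImportMode = "dry-run" | "images-only" | "create";

interface ExtractedPage { slug: string; title: string; imageUrls: string[] }

// In-memory stand-in for the content store (hypothetical; a real run would call the CMS API).
interface Dataset {
  documentsBySlug: Map<string, ExtractedPage>;
  uploadedImages: Set<string>;
}

interface ImportReport {
  newSlugs: string[];      // created, or would be created in dry-run
  skipped: string[];       // slug already present, so nothing written
  imagesUploaded: number;
}

function runImport(pages: ExtractedPage[], dataset: Dataset, mode: ImportMode): ImportReport {
  const report: ImportReport = { newSlugs: [], skipped: [], imagesUploaded: 0 };
  // Track slugs seen so far so duplicates inside one batch are also caught.
  const seen = new Set(dataset.documentsBySlug.keys());

  for (const page of pages) {
    // Images are uploaded in images-only and create modes; known URLs are skipped.
    if (mode !== "dry-run") {
      for (const url of page.imageUrls) {
        if (!dataset.uploadedImages.has(url)) {
          dataset.uploadedImages.add(url);
          report.imagesUploaded += 1;
        }
      }
    }
    if (mode === "images-only") continue;

    // Slug-based deduplication: an existing slug is never written twice.
    if (seen.has(page.slug)) {
      report.skipped.push(page.slug);
      continue;
    }
    seen.add(page.slug);

    // dry-run only reports; create actually writes.
    if (mode === "create") dataset.documentsBySlug.set(page.slug, page);
    report.newSlugs.push(page.slug);
  }
  return report;
}

// Made-up sample pages, for illustration only.
const demoPages: ExtractedPage[] = [
  { slug: "about", title: "About", imageUrls: ["https://example.com/campus.jpg"] },
  { slug: "admissions", title: "Admissions", imageUrls: [] },
];
const demoDataset: Dataset = { documentsBySlug: new Map(), uploadedImages: new Set() };

console.log(runImport(demoPages, demoDataset, "dry-run")); // reports, writes nothing
console.log(runImport(demoPages, demoDataset, "create"));  // writes both pages
console.log(runImport(demoPages, demoDataset, "create"));  // second run skips both
```

Keying on the slug is what makes a second run a no-op: the check happens before any write, so the end state is the same however many times the script runs.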

All of this is running. The content API serves live data. The extraction scripts are documented and repeatable. The schemas, the import pipeline, the automation agent, the frontend: auditable and in version control.
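
The crash-recovery behavior of the provisioning agent can be illustrated with a small state machine. This is a sketch under assumptions, not the agent's real code: the state names, the `DrivePort` and `DocPort` interfaces, and the folder naming rule are hypothetical, and the actual GAM and Sanity calls are replaced by interfaces. The idea it shows is to persist state after every transition and to look for an existing folder before creating one, so a restart resumes instead of duplicating.

```typescript
type ProvisionState = "pending" | "folder-created" | "done";

interface ProjectDoc {
  _id: string;
  slug: string;
  state: ProvisionState;
  folderUrl?: string;
}

// Hypothetical ports standing in for the Drive tooling and the content API.
interface DrivePort {
  findFolderByName(name: string): string | undefined; // folder URL if it already exists
  createFolder(name: string): string;                 // returns the new folder URL
  grantAccess(folderUrl: string): void;               // assumed safe to repeat
}
interface DocPort {
  save(doc: ProjectDoc): void;
}

// One transition. Each case is safe to repeat if the previous attempt crashed.
function step(doc: ProjectDoc, drive: DrivePort): ProjectDoc {
  switch (doc.state) {
    case "pending": {
      // Deterministic name: a retry finds the folder made by a crashed attempt.
      const name = `project-${doc.slug}`;
      const folderUrl = drive.findFolderByName(name) ?? drive.createFolder(name);
      return { ...doc, state: "folder-created", folderUrl };
    }
    case "folder-created": {
      if (!doc.folderUrl) throw new Error(`Missing folderUrl on ${doc._id}`);
      drive.grantAccess(doc.folderUrl);
      return { ...doc, state: "done" };
    }
    case "done":
      return doc;
  }
}

// Drive a document to "done", persisting after every transition so that a
// restart can resume from the last saved state.
function provision(doc: ProjectDoc, drive: DrivePort, docs: DocPort): ProjectDoc {
  let current = doc;
  while (current.state !== "done") {
    current = step(current, drive);
    docs.save(current);
  }
  return current;
}
```

If the process dies after the folder is created but before the new state is saved, the document is still "pending" on restart; the lookup by name then finds the existing folder, so no second folder is created.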

The Human Work

The content architecture took hours. Getting the communication right took weeks.

Once the infrastructure existed, SCA engaged a WordPress development partner to build the production site. I needed to brief them on the content architecture — what exists, how it's structured, why the decisions were made — so they could build on it. The first version was technically accurate and communicatively wrong. It read like instructions from above. In a Chinese international school context, where relationships, face, and positioning all matter, that document would have been dead on arrival.

The briefing went through multiple major revisions. Each one was driven by a communication insight, not a technical one:

Revision Voice: "we" → "I"

The first draft used "we built" throughout. But I was the sole technical resource. Using "we" obscured accountability and made it unclear who was responsible for what. First person singular is more honest and more useful to the reader.

Revision Remove direct address

The draft spoke to the developer: "You should implement this as..." That's patronizing. It positioned the briefing as instructions rather than shared context. Rewritten in third person: here's what exists, here's the architecture, here are the options. The partner brings their own expertise.

Revision "Known Gaps" → "Implementation Opportunities"

A section called "Known Gaps" is a confession. It invites skepticism. Same information, reframed as opportunities — areas where the WordPress build adds value — positions the partner as the expert, not the janitor.

Revision "Core Requirements" → "Institutional Target Architecture"

"Requirements" implies a client dictating to a vendor. "Target architecture" implies a shared destination. Same technical content, different relationship dynamic.

Revision Add phased implementation

The original briefing presented the full architecture as a single deliverable. That's an ultimatum. Breaking it into phases — Foundation, then Infrastructure — gives the partner an on-ramp. They can deliver value early and expand from a position of success rather than obligation.

Revision Maintain bilingual EN/ZH throughout

In a Chinese international school, an English-only technical document signals that the author doesn't understand the operating context. Every version was maintained in parallel Chinese translation — not as an afterthought, but as a structural commitment to the audience.

Revision Downgrade overpromises

An AI auditor flagged three claims that oversold what the reference implementation could do. Each was quietly reduced to what was actually true. Honest calibration builds more trust than aspiration.

None of these revisions changed the architecture. The schema was the same. The API was the same. What changed was the approach — the framing, the tone, the relationship dynamics encoded in word choices. The document went from "here's what you need to build" to "here's what we've learned, and here's how we might build on it together." The briefing is published and live.

What This Connects To

I've been watching institutions fail at this for twenty years. The failure mode is always the same: content gets created for a specific channel — a website, a brochure, an email — and stays locked there. When the next channel arrives, you rebuild. When the one after that arrives, you rebuild again. Each cycle compounds the incoherence.

AI is the channel that changes the equation. Not because it's new — new technology comes along every few years. Because AI is an accelerator. It amplifies whatever it finds. If your content is structured, AI amplifies coherence. If it isn't, AI amplifies confusion. And it does both at institutional scale, faster than anyone can manually correct.

Modeling content first is like making sure the wiring is insulated before you plug in the light. Electricity seeks ground no matter what. AI seeks salience no matter what. What you can control is the path it takes. Structured content (typed, relational, with real provenance) is that insulation.

People install new tools over a weekend because of FOMO. Institutions can't do that. The stakes are too high and the contradictions travel too far. The organizations that will navigate this well are the ones that have their content modeled as infrastructure before the acceleration begins. That's what Sanity is for. And that's what Solution Architects help organizations see.