Systems AI Field Notes

The Feedback Loop Nobody Built



Every AI system I’ve worked on was designed as a pipeline: data goes in, output comes out. Almost none were designed as a loop.

This is the most consequential architectural mistake in enterprise AI right now.

Pipelines vs. loops

A pipeline is linear: input → processing → output. Most AI implementations follow this pattern. A user asks a question. The system retrieves relevant documents. The model generates an answer. The answer is displayed.

Done. Next question.

A loop is circular: input → processing → output → measurement → adjustment → input. The output feeds back into the system and changes how the next input is processed.

The difference isn’t theoretical. It’s the difference between a system that stays the same and a system that improves.
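A toy sketch makes the contrast concrete. The `answer` callable, the `top_k` knob, and the 1-to-5 rating are stand-ins for illustration, not any particular stack:

```python
def pipeline(question, answer):
    # Open loop: process and stop. Nothing about this call
    # changes how the next call behaves.
    return answer(question)

class Loop:
    # Closed loop: the rating on each output adjusts state
    # that the next input is processed with.
    def __init__(self, answer, top_k=3):
        self.answer = answer
        self.top_k = top_k  # e.g. how many documents to retrieve

    def ask(self, question):
        return self.answer(question, self.top_k)

    def rate(self, score):
        # Measurement -> adjustment: widen retrieval after a bad answer.
        if score <= 2:
            self.top_k += 1
```

The open-loop version returns the same quality forever; the closed-loop version carries state that the feedback signal can move.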

What’s missing

When I audit AI implementations, I look for the feedback loop first. Specifically:

Is anyone measuring output quality? Not uptime. Not latency. Quality. Are the answers accurate? Are they relevant? Are they appropriate? In most implementations, the answer is: we don’t know. The system produces output. Nobody systematically evaluates whether that output is good.

Is quality data feeding back into the system? Even when output is evaluated — say, a user rates an answer as unhelpful — does that signal go anywhere? In most systems, it doesn’t. The thumbs-down disappears into a log nobody reads. The system doesn’t learn. It doesn’t adjust. It just keeps producing the same quality of output forever.
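What a thumbs-down needs, instead of a log line, is a structure someone can query. A minimal sketch; field names like `doc_ids` are assumptions, not any real product's schema:

```python
from collections import Counter

class FeedbackStore:
    """Structured store for helpful/unhelpful ratings, so the signal
    accumulates instead of vanishing into a log. (Illustrative only.)"""
    def __init__(self):
        self.down_by_doc = Counter()
        self.events = []

    def record(self, question, doc_ids, helpful):
        # Keep the raw event, and index thumbs-downs by source document.
        self.events.append({"q": question, "docs": doc_ids, "helpful": helpful})
        if not helpful:
            for d in doc_ids:
                self.down_by_doc[d] += 1

    def worst_documents(self, n=5):
        # The queryable question a write-only log can never answer:
        # which documents keep showing up behind bad answers?
        return self.down_by_doc.most_common(n)
```

The point is not the data structure; it is that someone can ask `worst_documents()` on Monday morning.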

Is the retrieval layer being refined? The model is almost never the bottleneck. The retrieval layer is. But retrieval tuning requires knowing which documents are being surfaced, whether they’re the right documents, and whether the chunking strategy is producing useful segments. Most teams set up retrieval once during development and never touch it again.
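The chunking strategy is one of those set-once knobs, and it is only a few lines of code. A minimal fixed-window chunker for illustration; the `max_words` and `overlap` defaults are placeholders, not recommendations:

```python
def chunk(text, max_words=80, overlap=20):
    # Fixed-size word windows with overlap: one of the defaults teams
    # pick during development and rarely revisit against real queries.
    words = text.split()
    step = max_words - overlap
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), step)
        if words[i:i + max_words]
    ]
```

Whether 80-word windows produce useful segments for your documents is exactly the question the feedback loop is supposed to keep answering.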

Are prompts being iterated based on real usage? Prompts are written during development against test data. Production usage is different — messier, more varied, more ambiguous. The prompts that worked in testing may not work for the actual range of inputs. But without a feedback mechanism, nobody knows.

Why this happens

Building the pipeline is the deliverable. Building the loop is maintenance. And maintenance doesn’t get funded the same way.

The project plan says: “Deploy AI agent by Q2.” It doesn’t say: “Build a continuous improvement system that makes the agent better every month.” The first one has a deadline, a budget, and a sign-off. The second one is an ongoing cost with no clear milestone.

So the pipeline ships. The loop doesn’t get built. And the system’s quality is fixed at whatever it was on launch day. That launch-day bar is the easiest one the system will ever face, because the data it operates on keeps changing while the system itself stays static.

The systems engineering fix

In systems engineering, a system without a feedback loop is an open-loop system. Open-loop systems are inherently fragile because they can’t self-correct.

The fix is closing the loop. Concretely:

Sample and evaluate output regularly. Not everything — a sample. Weekly. A human reviews 20-30 responses and scores them on accuracy, relevance, and appropriateness. This takes an hour. It’s the most valuable hour anyone can spend on an AI system.
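The review batch is simple to mechanize; only the scoring needs a human. A sketch, assuming `responses` is whatever record of outputs you already keep:

```python
import random

RUBRIC = ("accuracy", "relevance", "appropriateness")

def weekly_sample(responses, n=25, seed=None):
    # Draw this week's review batch. A fixed seed makes the
    # draw reproducible if you need to re-audit it.
    rng = random.Random(seed)
    return rng.sample(responses, min(n, len(responses)))

def score_sheet(batch):
    # One row per response, one empty cell per rubric dimension:
    # the artifact for the hour-long human pass.
    return [{"response": r, **{c: None for c in RUBRIC}} for r in batch]
```

Twenty-five rows, three columns, once a week. That is the whole mechanism.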

Route quality signals back to the retrieval layer. When the model produces a bad answer, trace it back. Was the retrieved document wrong? Was the chunk too short? Was the right document not in the index? Fix the source, not the symptom.
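That tracing step can be written down as a checklist in code. A sketch of the triage; the `bad_case` fields (`retrieved_ids`, `gold_doc_id`, `chunk_words`) are assumed names, not a fixed schema:

```python
def triage(bad_case):
    # Walk the retrieval failure modes in order, most fundamental first.
    if not bad_case["retrieved_ids"]:
        return "nothing retrieved - check the index"
    if bad_case["gold_doc_id"] not in bad_case["retrieved_ids"]:
        return "right document not surfaced - tune retrieval"
    if bad_case["chunk_words"] < 40:
        return "chunk too short - revisit chunking"
    # Only after retrieval is ruled out does the model become a suspect.
    return "retrieval fine - look at the prompt or model"
```

Note the ordering: the model is the last suspect, not the first, which matches where the bottleneck usually is.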

Version your prompts and test changes. Treat prompts like code. Version them. A/B test changes against production traffic. Measure whether changes actually improve output quality, not just whether they feel better.
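Deterministic hash bucketing is one simple way to run such a test: the same user always sees the same prompt version, so quality scores can be compared per variant. A sketch with placeholder prompt text:

```python
import hashlib

PROMPTS = {  # versioned like code; contents here are placeholders
    "v1": "Answer using only the context below.",
    "v2": "Answer using only the context below. Say you don't know if unsure.",
}

def assign_variant(user_id, split=0.5):
    # Hash the user id into [0, 1) and split the traffic. Stable
    # across calls, so each user's scores attach to one variant.
    h = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return "v1" if (h % 1000) / 1000 < split else "v2"
```

The comparison that matters is then variant-level quality scores from the weekly review, not anyone's impression of which prompt reads better.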

Build a dashboard, not just logs. Logs are where data goes to die. A dashboard is where data becomes visible. Track output quality over time. Make the trend line visible to the team that owns the system.
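The dashboard’s core computation is small: collapse scored responses into a per-week average someone can plot. A sketch, assuming `(week, score)` pairs from the weekly review:

```python
from statistics import mean

def weekly_trend(scored):
    # Group scores by week and average them: the trend line a
    # dashboard plots, and the thing a raw log buries.
    by_week = {}
    for week, score in scored:
        by_week.setdefault(week, []).append(score)
    return {w: round(mean(s), 2) for w, s in sorted(by_week.items())}
```

Everything upstream of this (the sampling, the rubric, the feedback store) exists so that this one line chart can be put in front of the team that owns the system.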

This is all standard systems engineering practice applied to a new domain. The feedback loop isn’t a nice-to-have. It’s what separates a system from a script.