Patterns in Conversational AI
The gap between demo and production is larger for conversational AI than almost anything else I've worked on.
The demo problem
Conversational AI demos are magical. You ask a question, the AI responds intelligently, everyone is impressed. Then you deploy to production and discover that users don’t ask questions the way you expected.
What I’ve learned
Start with the unhappy path. Most development time goes into handling the cases where the AI doesn’t understand, where the user is frustrated, where something goes wrong. Build these first.
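A minimal sketch of what "unhappy path first" can look like in code: check for low confidence before doing anything clever, and cap the number of clarification retries before giving up gracefully. The `Turn`/`Session` types, intent names, and the 0.6 threshold are all illustrative assumptions, not a real API.

```python
# Hedged sketch: handle the unhappy path before the happy path.
# All names and thresholds here are invented for illustration.
from dataclasses import dataclass
from typing import Optional

FALLBACK_LIMIT = 2  # after two failed turns, stop retrying

@dataclass
class Turn:
    text: str
    intent: Optional[str]  # None when the classifier couldn't decide
    confidence: float

@dataclass
class Session:
    failed_turns: int = 0

def respond(session: Session, turn: Turn) -> str:
    # Unhappy path first: if we're not confident, don't guess.
    if turn.intent is None or turn.confidence < 0.6:
        session.failed_turns += 1
        if session.failed_turns >= FALLBACK_LIMIT:
            # User is stuck; stop looping on "please rephrase".
            return "Let me connect you with a person who can help."
        return "Sorry, I didn't catch that. Could you rephrase?"
    session.failed_turns = 0  # recovered; reset the counter
    return f"Handling intent: {turn.intent}"
```

The retry cap matters as much as the fallback message: asking a user to rephrase three or four times in a row is its own failure mode.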
Conversation design is the bottleneck. The AI models are good enough. The limiting factor is figuring out what the conversation should actually look like. This is a skill that’s different from engineering and different from product management.
Humans prefer predictability over capability. Users would rather have a bot that does three things reliably than one that tries to do everything but sometimes fails. Constrain the scope aggressively.
You need humans in the loop. Not as a crutch, but as a core part of the design. The best conversational systems I’ve seen have graceful handoff to humans when the AI is uncertain.
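The handoff idea above (plus the aggressive scope constraint) can be sketched as a small escalation policy: hand off when the topic is out of scope, when the model is uncertain, or when the user sounds frustrated. The topic list, thresholds, and signal names are assumptions for illustration.

```python
# Hedged sketch of a human-handoff policy. Topics, thresholds, and
# the sentiment scale (-1 to 1) are illustrative assumptions.

SUPPORTED_TOPICS = {"billing", "shipping", "returns"}  # scope constrained on purpose

def should_hand_off(confidence: float, sentiment: float, topic: str) -> bool:
    if topic not in SUPPORTED_TOPICS:
        return True   # out of scope: don't improvise
    if confidence < 0.7:
        return True   # uncertain: a graceful handoff beats a confident guess
    if sentiment < -0.5:
        return True   # frustrated user: escalate before they churn
    return False
```

Treating handoff as a first-class decision in the dialogue loop, rather than an error branch, is what makes it feel graceful rather than like a crash.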
Still thinking about
How to measure success in conversational AI. Resolution rate? Customer satisfaction? Deflection from human agents? All of these can be gamed, and none of them fully capture whether the system is actually helping.
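For concreteness, here is how those three candidate metrics might be computed from conversation logs, with comments on how each one gets gamed. The log schema is an invented example, not a real system's format.

```python
# Hedged sketch: three common (and gameable) conversational-AI metrics.
# The log schema below is an invented example.
from statistics import mean

logs = [
    {"resolved": True,  "csat": 5, "escalated": False},
    {"resolved": False, "csat": 2, "escalated": True},
    {"resolved": True,  "csat": 4, "escalated": False},
]

# Gameable: mark conversations "resolved" more aggressively.
resolution_rate = mean(c["resolved"] for c in logs)

# Gameable: survey timing and wording bias who responds.
avg_csat = mean(c["csat"] for c in logs)

# Gameable: make escalation harder to reach, deflection goes "up".
deflection_rate = mean(not c["escalated"] for c in logs)
```

None of these, alone or together, answers the underlying question of whether users actually got what they came for.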
How to handle the cases where the AI is confident but wrong. These are the worst failures—users trust the system and get bad information. I don’t have a good answer for this yet.