Don't declare your intents

I've recently been working on improving some customer support agents, and one thing gave me an "aha" moment: the design of the intents that route each ticket to the right path.

The intents were mapped directly from the existing support system. The LLM would identify the intent, map it to a category like "Track Delivery", "Reschedule", "Cancel Order" or "Request Invoice", and route it to the right workflow.

Anything that didn't fit went to "Other", which meant a human. The system worked, at least as far as the demo went. Then I thought of grouping the tickets by their embeddings to see which intents actually dominated.

A classifier can't report what you forgot

A classifier is a function from a query to a fixed set of labels. That is the whole contract. It cannot return a label we never defined, which means the categories we got wrong were never flagged as errors. They were invisible.

A ticket about a driver who left a parcel with the wrong neighbor didn't come back as "you forgot a category". It came back as "Track Delivery", with reasonable confidence, because that was the closest box on the shelf.
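To make that contract concrete, here is a minimal sketch of a closed-set router. It is not the system described above: the seed phrases and the bag-of-words similarity are assumptions made up for illustration, standing in for whatever the LLM did. The point is only that `route` has nowhere to put a ticket except one of the four declared labels.

```python
import math
from collections import Counter

# Hypothetical seed phrases for the declared intents (illustration only).
INTENTS = {
    "Track Delivery": "where is my parcel delivery tracking status",
    "Reschedule": "change delivery date reschedule another day",
    "Cancel Order": "cancel my order refund",
    "Request Invoice": "send invoice receipt for my order",
}


def bag(text):
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


def route(ticket):
    """Return the closest declared label. There is no 'unknown' outcome."""
    scores = {label: cosine(bag(ticket), bag(seed)) for label, seed in INTENTS.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


label, score = route("driver left my parcel with the wrong neighbor")
print(label)  # Track Delivery: the closest box on the shelf
```

Even a ticket that shares no words with any seed phrase still comes back with a label, because `max` always picks something. Nothing in the return type can say "none of these".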

The router always produced an answer. Every ticket got a label, the dashboard filled in, and the thing I most needed to know, what we had failed to model, was the one thing the system structurally couldn't report.

And it only gets worse over time. As the business evolves and new classes of problems emerge, the predefined categories go stale, and the blind spots get folded into the best-matching label and counted as successes.

"Other" was where it all hid. That bucket wasn't noise. It was a stack of intents we hadn't discovered yet, compressed into a single uninformative label so it stopped bothering me.

The corpus already knows the answer

Here is the inversion. I already had the data. We had tens of thousands of resolved tickets, each one a real thing a real customer needed, each one already resolved by a human who left a trail of what it took.

The intents were sitting in that pile the entire time. We had just decided what they were before we read them. So I went back and read them first, following the order I should have started with:

1. Take the whole corpus.
2. Embed it and find the structure that's actually there: reduce, cluster, look.
3. Name the groups that emerge, and let the long tail stay a long tail.
4. Then, and only then, freeze a router over the intents found.
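The steps above can be sketched in a few lines. This is a toy under loud assumptions, not the pipeline I actually ran: `embed` is a bag-of-words stand-in for a real embedding model, the dimensionality-reduction step is skipped entirely, `cluster` is a simple greedy "leader" clustering rather than a proper clustering algorithm, and the sample tickets are invented. The shape is what matters: embed, cluster, name, and keep the small groups as a long tail instead of forcing them into a label.

```python
import math
from collections import Counter

STOPWORDS = {"the", "my", "a", "is", "was", "to", "with", "at", "i", "for", "and"}

# Invented sample tickets, for illustration only.
TICKETS = [
    "parcel left with wrong neighbor",
    "driver left parcel with the wrong neighbor",
    "courier gave my parcel to wrong neighbor",
    "need invoice for my order",
    "please send invoice for order",
    "app crashes at checkout",
]


def embed(text):
    """Stand-in for a real embedding model: stopword-filtered word counts."""
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)


def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=0.3):
    """Greedy leader clustering: join the most similar cluster, or start a new one."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [ticket index]}
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [i]})
        else:
            best["centroid"].update(vec)  # centroid is the summed word counts
            best["members"].append(i)
    return clusters


def name(c, k=3):
    """Name a cluster by its k most frequent terms (a human would refine this)."""
    return " / ".join(word for word, _ in c["centroid"].most_common(k))


def discover(tickets, min_size=2):
    """Return (named intents, long tail). Small clusters stay in the tail."""
    clusters = cluster([embed(t) for t in tickets])
    intents = [
        (name(c), [tickets[i] for i in c["members"]])
        for c in clusters
        if len(c["members"]) >= min_size
    ]
    long_tail = [
        tickets[i] for c in clusters if len(c["members"]) < min_size for i in c["members"]
    ]
    return intents, long_tail


intents, long_tail = discover(TICKETS)
for intent_name, members in intents:
    print(intent_name, len(members))
print("long tail:", long_tail)
```

On this toy data the misdelivery tickets and the invoice tickets each form a group, and the lone checkout crash stays in the tail rather than being folded into either.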

What came out was not the handful of categories we had declared. The clustering pulled out more than 90 distinct recurring problems, and the most interesting ones existed nowhere in the original taxonomy.

That gap is human, not just technical. It's easy to sit down and declare 10 to 15 generic categories. Nobody hand-writes 91 clusters, let alone keeps them updated as the problem space moves.

Discovery is a loop, serving is a snapshot

Discovery and serving aren't the same layer. Discovery is an offline pass over the full corpus: slow, thorough, run on a cadence, allowed to take an hour. Serving is the hot path: a fast classifier that answers in milliseconds.

I wasn't going to run clustering per request any more than I'd retrain a model per request. I ran discovery offline, then distilled what it found into the cheap online router.

The serving layer is still a classifier with a fixed list. But the list is now derived from the corpus instead of from a guess, and it can be re-derived on a schedule, because the corpus moves.
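One way to picture the split, as a sketch rather than a description of my actual router: the offline job produces an immutable snapshot, and the hot path only ever reads the current one. The `Snapshot` and `Router` names, the version strings, and the keyword-overlap scoring are all hypothetical choices made to keep the example small.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Output of one offline discovery run (hypothetical structure)."""
    version: str
    keywords: dict  # intent name -> set of keywords distilled from its cluster


class Router:
    """Hot path: a fixed-list classifier over whatever snapshot is current."""

    def __init__(self, snapshot):
        self._snapshot = snapshot

    def swap(self, snapshot):
        # Called by the offline job after a fresh discovery pass.
        self._snapshot = snapshot

    def route(self, ticket):
        words = set(ticket.lower().split())
        snap = self._snapshot
        # Cheap scoring: count keyword overlap with each intent in the snapshot.
        best = max(snap.keywords, key=lambda intent: len(words & snap.keywords[intent]))
        return best, snap.version


v1 = Snapshot("v1", {
    "Track Delivery": {"parcel", "tracking", "delivery"},
    "Request Invoice": {"invoice", "receipt"},
})
v2 = Snapshot("v2", {
    **v1.keywords,
    "Misdelivered to Neighbor": {"neighbor", "wrong", "left"},
})

router = Router(v1)
print(router.route("parcel left with wrong neighbor"))  # ('Track Delivery', 'v1')
router.swap(v2)
print(router.route("parcel left with wrong neighbor"))  # ('Misdelivered to Neighbor', 'v2')
```

The router's code never changes between runs. Only the list it serves does, and returning the snapshot version with each decision makes it possible to tell later which taxonomy routed a given ticket.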

Re-derived is the operative word. Even a taxonomy discovered correctly decays, because the world it describes keeps moving. New products ship, new failure modes appear, a carrier changes a policy.

Suddenly there's a class of ticket that didn't exist last month. A frozen list routes all of it into yesterday's best match, "Other" quietly refills, and you're back where you started.

So treat intent discovery as a loop, not a launch. Re-run discovery on a schedule, diff the new clusters against the old ones, and let the diff tell you when the taxonomy has drifted.
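A minimal sketch of that diff, assuming each cluster has been summarized as a set of keywords. The Jaccard matching, the 0.5 threshold, and the example taxonomies are my assumptions for illustration. With real embeddings you would more likely compare centroids, but the logic is the same: a new cluster that matches nothing in the previous run is a candidate new intent, and an old cluster that matches nothing in the new run may have gone stale.

```python
def jaccard(a, b):
    """Overlap between two keyword sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def diff_taxonomies(old, new, threshold=0.5):
    """Compare two discovery runs, each a dict of cluster name -> keyword set.

    Returns (emerged, vanished): new clusters with no close match in the old
    run, and old clusters with no close match in the new run.
    """
    emerged = [
        name for name, kw in new.items()
        if all(jaccard(kw, old_kw) < threshold for old_kw in old.values())
    ]
    vanished = [
        name for name, kw in old.items()
        if all(jaccard(kw, new_kw) < threshold for new_kw in new.values())
    ]
    return emerged, vanished


# Invented example runs, for illustration only.
last_month = {
    "misdelivery": {"parcel", "wrong", "neighbor"},
    "invoice": {"invoice", "order", "receipt"},
}
this_month = {
    "misdelivery": {"parcel", "wrong", "neighbor", "left"},
    "invoice": {"invoice", "order", "receipt", "send"},
    "customs fee": {"customs", "fee", "charge"},
}

emerged, vanished = diff_taxonomies(last_month, this_month)
print("emerged:", emerged)    # ['customs fee']
print("vanished:", vanished)  # []
```

A non-empty `emerged` list is the signal to look at the new cluster and decide whether the router's list needs another entry.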

Discover, then declare

Before writing down a single intent, ask one question: am I describing the corpus, or my assumptions about it? Discover the intents, then declare them. Not the other way around.

And even after I looked and clustered properly, the clusters lied to me in a second, subtler way. But that is a different post.