Make it rule-based, and call the LLM as rarely as possible.
The most expensive mistake in an AI project is using the LLM for everything. The architecture that actually works comes from the opposite instinct: build a system that is, in the end, as rule-based as possible, and reach for the LLM as rarely as you can. The result is a flexible, hybrid setup where classic rules do the bulk of the work and the model is called only when the rules run out.
Start with what does not deserve a model at all. Keyword-based filtering, meaning pre-processing and post-processing on simple matches, makes little sense to hand to an LLM. What makes sense is classic, category-based filtering driven by the user's conditions. That does the job well and cheaply. Someone on a project proposed filtering with regexes, and the honest reaction is that it is fine but uninspired. The upgrade is not “throw a model at it.” The upgrade is a hybrid rule engine that uses the LLM only on corner cases. Wrap a human layer around it, with human revisions and personalization, and now it actually sounds like cutting-edge work instead of a pile of regular expressions.
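As a minimal sketch of that hybrid shape, assuming items arrive as dictionaries with a `category` field: rules are tried in order, and only an item that no rule claims reaches the model. The rule names, the categories, and `llm_fallback_stub` are all hypothetical placeholders, not part of any real project or library.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # cheap, deterministic predicate
    decision: str                    # what to do when the rule fires


def llm_fallback_stub(item: dict) -> str:
    """Hypothetical stand-in for a real model call on a corner case."""
    return "needs_review"


# Illustrative category rules; real ones would come from the user's conditions.
RULES: List[Rule] = [
    Rule("blocked_category", lambda it: it.get("category") in {"spam", "adult"}, "drop"),
    Rule("allowed_category", lambda it: it.get("category") in {"tools", "books"}, "keep"),
]


def classify(item: dict,
             rules: List[Rule] = RULES,
             fallback: Callable[[dict], str] = llm_fallback_stub) -> Tuple[str, str]:
    """Return (decision, source). The fallback runs only when no rule matches."""
    for rule in rules:
        if rule.matches(item):
            return rule.decision, rule.name
    return fallback(item), "llm_fallback"


if __name__ == "__main__":
    for item in [{"category": "spam"}, {"category": "books"}, {"category": "misc"}]:
        print(item, classify(item))
```

Returning the source of each decision alongside the decision makes it easy to count how often the fallback fires, which is the number this whole approach tries to keep small.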
The reason to be stingy with the model is not stinginess for its own sake. It is physics and economics. You cannot just point an LLM at your database. It can eat millions of dollars in tokens. The context grows enormous and gets truncated somewhere. The answers turn inaccurate, the model hallucinates, and its focus smears. The right way to implement this rests on a completely different principle: the model sees only the small slice of data a given question needs. That is the line people skip past when they imagine an AI that just reads everything. It does not work that way, and it bankrupts you while not working.
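The arithmetic behind that warning is easy to run for your own numbers. The sketch below is only a back-of-envelope calculator. Every input in the example (row count, tokens per row, price per million tokens, context window size) is a made-up placeholder, not a quote from any provider.

```python
from typing import Tuple


def full_scan_cost(rows: int, tokens_per_row: int,
                   usd_per_million_tokens: float) -> Tuple[int, float]:
    """Tokens and dollars needed to push every row through the model once."""
    total_tokens = rows * tokens_per_row
    cost_usd = total_tokens / 1_000_000 * usd_per_million_tokens
    return total_tokens, cost_usd


def fraction_that_fits(total_tokens: int, context_window: int) -> float:
    """Share of the data a single call can actually see before truncation."""
    if total_tokens <= 0:
        return 1.0
    return min(1.0, context_window / total_tokens)


if __name__ == "__main__":
    # Placeholder inputs; substitute your own table sizes and pricing.
    tokens, cost = full_scan_cost(rows=50_000_000, tokens_per_row=200,
                                  usd_per_million_tokens=3.0)
    print(f"tokens per full scan: {tokens:,}")
    print(f"cost per full scan:   ${cost:,.2f}")
    print(f"fits in one context:  {fraction_that_fits(tokens, 200_000):.6%}")
```

Multiply the per-scan figure by the number of questions per day and the case for narrowing the input before the model sees it makes itself.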
Two practical decisions fall out of this, and I have made both with clients. First, the custom-LLM question. One of my first questions on a recent project was whether they needed their own LLM, and the answer was a flat no. Somebody had clearly explained to them what a custom LLM actually involves. So you take one of the top players. They named everyone, Claude and the rest, and the choice often lands on a mainstream API. You are not training a model. You are connecting one to your already-parsed data and standing up the vector layer that lets it search that data. The work is connection, not training.
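To make "connection, not training" concrete, here is a toy sketch of the retrieval step: score parsed records against the question, keep the top few, and put only those into the prompt. The word-count "embedding" is a deliberate stand-in so the example runs with the standard library alone. A real setup would presumably use an embedding model and a vector store, and the function names and prompt wording here are assumptions.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, records: List[str], k: int = 2) -> List[str]:
    """Return the k records most similar to the query."""
    q = embed(query)
    return sorted(records, key=lambda r: cosine(q, embed(r)), reverse=True)[:k]


def build_prompt(query: str, records: List[str], k: int = 2) -> str:
    """Only the retrieved slice of the data is placed in front of the model."""
    context = "\n".join(top_k(query, records, k))
    return f"Answer using only these records:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    parsed = [
        "blue widget sku AB-1234 price 10",
        "red gadget sku CD-5678 price 20",
        "shipping policy for returns",
    ]
    print(build_prompt("red gadget price", parsed, k=1))
```

The point of the shape is that the prompt size depends on `k`, not on the size of the database.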
Second, where the model genuinely earns its keep. For these clients, the most valuable part was unmistakably the LLM piece, but for a specific job: catching errors. They said that if they can really see it identify something, find mismatches, or flag a cell where a SKU should be and some strange word that does not belong sits there instead, that is the value. The model is not there to do the bulk processing. It is there to be the thing that notices the cell that is wrong. The awkward edge cases people worry about should not have to be hand-coded, because writing a Python script for every possible scenario would cost hundreds of hours. With ML and an LLM on the corner cases, you cover them far faster, and the rule-based core handles the rest cheaply.
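One way to split that error-catching job, sketched under assumptions: a cheap format check clears the cells that are obviously fine, and only the leftovers are queued for the model to look at. The SKU pattern (two capital letters, a hyphen, four digits) and the column name are invented for illustration. Real SKU formats vary, and the queue would feed whatever model call the project uses.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical SKU format: two capital letters, a hyphen, four digits.
SKU_PATTERN = re.compile(r"[A-Z]{2}-\d{4}")


def find_suspect_cells(rows: List[Dict[str, str]],
                       column: str = "sku") -> List[Tuple[int, str]]:
    """Return (row_index, value) for cells that do not look like a SKU.

    Cells that pass never reach the model; the rest form the review queue.
    """
    suspects = []
    for index, row in enumerate(rows):
        value = str(row.get(column, "")).strip()
        if not SKU_PATTERN.fullmatch(value):
            suspects.append((index, value))
    return suspects


if __name__ == "__main__":
    table = [
        {"sku": "AB-1234", "name": "widget"},
        {"sku": "banana", "name": "gadget"},   # a strange word where a SKU should be
        {"sku": "", "name": "gizmo"},          # missing value
    ]
    for index, value in find_suspect_cells(table):
        print(f"row {index}: {value!r} goes to the review queue")
```

A pattern check only catches malformed values. A well-formed SKU attached to the wrong product is exactly the kind of mismatch the rules cannot see, and that is where the model is supposed to justify its cost.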
There are smarter patterns once you accept this division of labor. For genuinely complex queries, the kind that need research or specialized handling, you can fire the request to several models in parallel, let them each respond, and have one more model, a judge, compile the answers into a single strong one. That is firepower aimed precisely where it is justified, not sprayed across every routine lookup.
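The fan-out-and-judge pattern can be sketched with the standard library's thread pool. Everything model-related here is a stub: `model_a`, `model_b`, `model_c`, and `judge_stub` are hypothetical placeholders for real API calls, and the stub judge simply picks the longest answer, where a real judge would be another model asked to merge the candidates.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Model = Callable[[str], str]


def model_a(query: str) -> str:
    return f"A says: brief take on {query}"


def model_b(query: str) -> str:
    return f"B says: a longer and more detailed take on {query}"


def model_c(query: str) -> str:
    return f"C says: medium take on {query}"


def judge_stub(query: str, answers: List[str]) -> str:
    """Stand-in judge: returns the longest answer instead of synthesizing one."""
    return max(answers, key=len)


def fan_out(query: str, models: List[Model],
            judge: Callable[[str, List[str]], str]) -> str:
    """Send the query to every model in parallel, then let the judge decide."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        # pool.map returns results in the same order as the models list.
        answers = list(pool.map(lambda m: m(query), models))
    return judge(query, answers)


if __name__ == "__main__":
    print(fan_out("supplier risk for part X", [model_a, model_b, model_c], judge_stub))
```

Threads suit this case because each call mostly waits on the network. The cost is several model calls plus a judge call per query, which is why it belongs behind a gate that lets only the genuinely complex requests through.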
And keep a backup plan, because the central assumption is unproven until you test it. I am genuinely not yet certain that a model can find what we need inside our database, even though I suspect the real issue is the prompt. The idea that it can is still a concept, not a validated assumption, and you should treat it as one. Build the rule-based core you can trust, reserve the LLM for the corner cases and the error-catching where it is irreplaceable, and verify the risky part before you bet the architecture on it. Minimize the model. Do not worship it.
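Verifying the risky part can start very small: a handful of questions whose correct record you already know, and a count of how often the lookup finds it. The sketch below assumes the lookup is any function that takes a query and returns a list of records. `toy_finder` and the sample cases are invented for illustration.

```python
from typing import Callable, List, Tuple


def hit_rate(find: Callable[[str], List[str]],
             cases: List[Tuple[str, str]]) -> float:
    """Share of known questions for which the expected record is returned."""
    if not cases:
        return 0.0
    hits = sum(1 for query, expected in cases if expected in find(query))
    return hits / len(cases)


def toy_finder(query: str) -> List[str]:
    """Hypothetical stand-in for the model-backed search under test."""
    index = {
        "red gadget": ["red gadget sku CD-5678"],
        "blue widget": ["blue widget sku AB-1234"],
    }
    return index.get(query, [])


if __name__ == "__main__":
    known_cases = [
        ("red gadget", "red gadget sku CD-5678"),
        ("blue widget", "blue widget sku AB-1234"),
        ("green gizmo", "green gizmo sku EF-9012"),  # not in the index: a miss
    ]
    print(f"hit rate: {hit_rate(toy_finder, known_cases):.0%}")
```

If the number stays low after the prompt has been reworked, that is the signal to fall back on the rule-based path rather than keep tuning.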