"Just call ChatGPT" isn't the answer
A raw general-purpose language model (a generic ChatGPT, Claude, or similar) knows nothing specific about your business. Ask it "do you ship the ZT-500 to Germany" or "what are your hours on Sundays" and it will produce a confident, entirely fabricated answer. Not because it's malicious, but because generating plausible-sounding text is what language models do.
The fix is to constrain the model to your actual business content and to put a thin rule layer on top that handles the things language models are unreliable at: deciding when to ask the visitor for their name, knowing when to stop pitching, recognizing a request for a live person, and routing that request to you quickly. None of this is exotic, but doing it well is the difference between "a chatbot" and "a chat assistant I'd actually put on my site."
Grounded retrieval, in plain English
Grounded means the bot's reply has to come from content you gave it: your website text, an FAQ you maintain, policy documents, maybe an inventory feed. When a visitor asks a question, the system first looks up the relevant chunks of your content, then hands those to the language model with the instruction "answer only from this."
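The "answer only from this" step can be sketched in a few lines. This is a minimal, illustrative sketch, not any particular vendor's implementation; the function name and prompt wording are invented for the example.

```python
# Minimal sketch of grounding: the retrieved chunks of business content
# are pasted into the prompt, and the model is instructed to answer only
# from them. All names and wording here are illustrative.

def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Assemble the instruction + business content + question sent to the model."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the visitor's question using ONLY the content below. "
        "If the answer is not in the content, say you don't know and "
        "offer to connect the visitor with the team.\n\n"
        f"--- BUSINESS CONTENT ---\n{context}\n\n"
        f"--- QUESTION ---\n{question}"
    )

prompt = build_grounded_prompt(
    "Do you ship the ZT-500 to Germany?",
    ["Shipping: we ship within the EU, including Germany, in 3-5 days."],
)
```

The instruction block is what keeps the model from improvising: with no matching chunk, the honest "I don't know, let me connect you" path is the one the prompt permits.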
The retrieval step is where most of the quality lives. Two different techniques each catch things the other misses:
- Dense retrieval (vector-based). Every piece of your content gets converted into a numerical representation (an embedding) that captures its meaning, not just its words. A visitor asking "how long does shipping take" will match a chunk about "delivery times" even though none of the words overlap. Great for paraphrase and synonyms. Weak at exact product names, SKUs, or proper nouns.
- Lexical retrieval (keyword-based, think BM25). Classic search: what words appear where, weighted by how unusual they are. A visitor asking "price of the ZT-500" will find the ZT-500 product page immediately. Great at literal matches and rare terms. Weak at paraphrase.
A well-built assistant runs both in parallel and fuses their rankings — a technique the industry usually calls reciprocal rank fusion, or RRF. That way a question like "I need something durable for daily commuting" gets help from the dense side (catches "durable" in a product description that uses "reinforced construction"), and a question like "ZT-500 price" gets help from the lexical side (pins the exact product to #1). Neither approach alone is enough for a business with both narrative answers and branded products.
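Reciprocal rank fusion itself is a small algorithm: each retriever returns a ranked list of document IDs, and a document's fused score is the sum of 1/(k + rank) over every list it appears in. Here's a minimal sketch, with invented document IDs:

```python
# Minimal reciprocal rank fusion (RRF) sketch. Each retriever returns
# document IDs in ranked order; a document's fused score is the sum of
# 1/(k + rank) across the lists it appears in. k=60 is a common default.

def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["delivery-times", "returns", "zt500-page"]        # semantic matches
lexical = ["zt500-page", "zt500-specs", "delivery-times"]  # keyword matches
fused = rrf_fuse([dense, lexical])
# Documents found by both retrievers rise toward the top of the fused list.
```

The appeal of RRF is that it only uses ranks, never raw scores, so it doesn't matter that cosine similarities and BM25 scores live on completely different scales.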
A rule-based layer on top of the LLM
Here's a thing that sounds boring but matters a lot: pure prompt engineering is unreliable. If you just tell a language model "always ask for the visitor's name before closing the lead," it will comply often enough to look fine in a demo, but not reliably enough to run a real lead-capture flow on. Missing the name on some leads is a real business cost, and the failures are hard to catch during testing because they look identical to the successes until you review a week of transcripts.
The fix is a small deterministic policy layer: a rule-based decision per turn about what shape the next reply should take. Should the bot just answer? Should it answer and then offer to connect the visitor with a real person? Is it time to ask for a name? A contact method? Is the visitor winding down, in which case the bot should stop pitching? Is the visitor frustrated, in which case the bot should acknowledge that before anything else?
These decisions are made by code, not by asking the model. The model still writes the words, but the instructions it receives are different depending on which rule fired. Same vocabulary, different guardrails.
Why this matters for you: it's how the bot stops asking for your email on the third turn in a row, how it refuses to offer to connect twice to the same visitor, how it knows to wind down the conversation instead of keeping the pitch going when you've clearly decided not to buy. None of that is a prompt trick. It's a policy layer.
Session state and the "snapshot" lead
A chat is a conversation. The bot needs to remember things across turns. When a visitor says their name on turn 2 and their phone number on turn 5, the bot has to hold onto "name given" the whole time, even if the intervening turns talk about something else. That's session state.
The important constraint: state is per-session and time-bounded. It's not a running memory of "everything this visitor has ever told us." It's scoped to the conversation, it expires, and it doesn't train the underlying model. That matters for privacy and for predictability.
Lead capture is a related but separate event. Most chat platforms get this wrong by firing a notification every time the bot picks up a new fact about the visitor. A well-built system treats the lead as a single snapshot event: the bot collects what it can across the conversation, and when the visitor shares contact info, the system fires one notification with everything. No duplicate emails, no CRM pollution, no "is this the same person" guessing game downstream.
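The snapshot behavior is easy to express in code. This is a toy sketch under the assumptions above (facts accumulate per session; one event fires when contact info arrives); the class and callback names are illustrative.

```python
# Sketch of the "snapshot lead": facts accumulate in session state across
# turns, and exactly one lead event fires when contact info first arrives.
# Class name and notify callback are illustrative, not a real API.

class Session:
    def __init__(self, notify):
        self.facts: dict[str, str] = {}
        self.lead_fired = False
        self.notify = notify  # e.g. deduped fan-out to email/webhook/CRM

    def record(self, key: str, value: str) -> None:
        self.facts[key] = value
        # Fire one snapshot with everything known so far, once per session.
        if key in ("email", "phone") and not self.lead_fired:
            self.lead_fired = True
            self.notify(dict(self.facts))

events = []
s = Session(notify=events.append)
s.record("name", "Dana")                # turn 2: no notification yet
s.record("interest", "ZT-500")          # turn 4: still nothing
s.record("phone", "555-0123")           # turn 5: one snapshot fires
s.record("email", "dana@example.com")   # no second notification
# events == [{"name": "Dana", "interest": "ZT-500", "phone": "555-0123"}]
```

One event per session is what keeps the downstream side clean: whoever receives the lead gets one complete record, not a trickle of partial updates to reconcile.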
What a well-built chat assistant deliberately doesn't do
Some of the most useful architectural choices are about what the bot won't do:
- No open-internet search during the conversation. The bot only knows what you gave it. If a visitor asks about your competitor's pricing, the bot shouldn't know, and shouldn't pretend to know.
- No fine-tuning on your visitors. Customer conversations stay as conversations. They don't silently rewrite what the bot believes about your business. Learning happens through deliberate content updates you make.
- No general model of "businesses like yours." The bot doesn't generalize from "other dentists charge X for whitening" to your price. It tells the visitor the team can follow up if your FAQ doesn't cover it.
- No quiet drift in behavior. The rule layer is code, not an LLM prompt that slowly changes with updates. The bot's behavior is predictable between updates, and updates are intentional.
That list is a feature set, not a limitation. A bot that "learns from customer conversations" is a bot you cannot audit. A bot that "searches the web" introduces a moving source of truth you cannot review or sign off on. Neither is acceptable on a small business site.
Per-industry tuning, briefly
A message like "severe pain and bleeding" means something different to a dental office, a legal intake form, and an e-commerce support queue. A well-tuned chat assistant recognizes that and routes accordingly. This is usually done with industry-specific keyword patterns and verticalized behavior rules: a dental bot knows what an emergency looks like; a legal bot doesn't try to diagnose an injury; an e-commerce bot cares about SKUs and order numbers.
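The keyword-pattern side of vertical tuning is simple to picture. Here's a toy sketch; the patterns and vertical names are invented, and a real system would carry far richer rules per industry:

```python
# Toy sketch of per-industry routing: the same message maps to different
# behavior depending on the vertical's keyword patterns. All patterns
# and vertical names here are illustrative.

import re

EMERGENCY_PATTERNS = {
    "dental":    re.compile(r"severe pain|bleeding|knocked.out tooth", re.I),
    "legal":     re.compile(r"arrest|deadline|court date", re.I),
    "ecommerce": re.compile(r"order #?\d+|refund|chargeback", re.I),
}

def classify(vertical: str, message: str) -> str:
    pattern = EMERGENCY_PATTERNS.get(vertical)
    if pattern and pattern.search(message):
        return "urgent"    # e.g. immediate handoff for a dental emergency
    return "routine"

classify("dental", "I have severe pain and bleeding")     # "urgent"
classify("ecommerce", "I have severe pain and bleeding")  # "routine"
```

The same sentence is an emergency in one vertical and noise in another, which is exactly why a one-size-fits-all bot gets this wrong.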
None of this is glamorous. It's the difference between a generic "AI for any business" that kind of works for everyone and a per-industry assistant that actually belongs on your site.
How Simple Business Bots handles each of these
- Grounded retrieval: hybrid dense plus lexical, with rank fusion. Exact product-name matches are boosted so a visitor asking for a specific item gets that item, not a generically similar one.
- Rule layer on top of the LLM: a deterministic policy engine decides per turn whether to answer, answer-and-offer, collect a name, collect contact info, wind down, or de-escalate. The LLM writes the words; the layer decides the shape.
- Session state and snapshot lead: the bot remembers what matters across turns within a 24-hour session window, and fires exactly one lead event when contact info is captured. Fan-out to email, webhook, and CRMs is deduped.
- Hard boundaries: no web search, no fine-tuning on customer chat, no background learning. Knowledge changes when you change your content, not before.
- Per-industry behavior: a set of industry-specific verticals, from dental to auto repair to real estate to e-commerce, each with its own keyword patterns and behavioral rules. Generic fallback for anything outside that.
Questions to ask any chat vendor
If you want a short checklist for evaluating any AI chat vendor, these five questions tend to separate the careful systems from the thin wrappers:
- "Can your bot look things up on the open internet, or only from content and systems I've explicitly provided or connected?" The right answer is only content and systems I've connected (your FAQ, website, inventory feed, calendar, CRM, etc.), not the open internet.
- "How does the bot decide when to hand off to me?" The right answer is some version of "a rule decides," not "the model decides on its own."
- "What does the bot do when it can't answer?" The right answer: it admits it, captures contact info, and offers a handoff. It doesn't guess, and it doesn't leave the visitor at a dead end.
- "Do customer conversations train the underlying model?" The right answer is no.
- "Is there behavior specific to my industry, or is it the same bot for everyone?" Industry tuning isn't always essential, but it's a meaningful signal about how much care went into the product.