
How the AI Uses Your Documents

Understand how Synaptiq retrieves and uses your uploaded documents to generate accurate, grounded answers.

When a customer asks your Synaptiq AI agent a question, the answer does not come from the AI's general training data. It comes from the specific documents you uploaded to your knowledge base. This page explains how that works so you can structure your content for the best possible results.

The Short Version

  1. A customer asks a question.
  2. Synaptiq searches your knowledge base for the most relevant sections of your documents.
  3. Those sections are provided to the AI as context.
  4. The AI generates an answer based on that context.
  5. If no relevant content is found, the AI says it does not know rather than guessing.

This approach is called Retrieval-Augmented Generation (RAG), and it is the core of what makes Synaptiq accurate and trustworthy instead of creative and unreliable.
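The five steps above can be sketched end to end in a few lines. Everything in this example is illustrative: the in-memory knowledge base, the keyword-overlap retrieval (a stand-in for real embedding search), and the stubbed generation step are not Synaptiq's actual implementation.

```python
# Illustrative sketch of the retrieve-then-generate loop described above.
# A real system uses a vector database and an LLM call in place of the
# toy list and string formatting used here.

KNOWLEDGE_BASE = [
    {"doc": "pricing.md", "text": "The Standard plan costs $29/month billed annually."},
    {"doc": "trial.md",   "text": "Every plan starts with a 14-day free trial."},
]

def retrieve(question: str) -> list[dict]:
    """Return chunks sharing at least one keyword with the question
    (a crude stand-in for embedding similarity search)."""
    words = set(question.lower().split())
    return [c for c in KNOWLEDGE_BASE if words & set(c["text"].lower().split())]

def answer(question: str) -> str:
    chunks = retrieve(question)
    if not chunks:
        # Step 5: no relevant content found, so admit it instead of guessing.
        return "I don't know, that isn't covered in my knowledge base."
    # Steps 3-4: hand the retrieved context to the generator (stubbed here).
    context = " ".join(c["text"] for c in chunks)
    return f"Based on our docs: {context}"
```

Note that the "I don't know" branch is reached only when retrieval comes back empty; everything else is grounded in the retrieved text.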

Retrieval-Augmented Generation (RAG) Explained

Large language models are trained on vast amounts of text from the internet, but they do not know anything about your specific products, pricing, or policies. Left on their own, they will either make things up or give generic answers.

RAG solves this by adding a retrieval step before generation:

  • Without RAG: Customer asks a question, the AI generates an answer from its general knowledge, which may be wrong or outdated.
  • With RAG: Customer asks a question, the system retrieves relevant content from your documents, and the AI generates an answer grounded in that specific content.

Think of it like the difference between asking someone to answer from memory versus handing them the reference manual and asking them to look it up first. The second approach is far more reliable.

How Document Processing Works

When you upload a document, Synaptiq does not store it as a single blob. It goes through a multi-step processing pipeline.

Step 1: Text Extraction

The raw text is extracted from your document. For PDFs, this means parsing the PDF structure. For DOCX files, it means reading the underlying XML. For TXT and MD files, the content is used as-is.
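A sketch of that dispatch, assuming plain-text formats are read directly while PDF and DOCX are handed to dedicated parsers (pypdf and python-docx are named here only as examples of such parsers, not as what Synaptiq uses):

```python
from pathlib import Path

# Hypothetical extraction dispatch. Only the plain-text branches are
# implemented; PDF and DOCX parsing would need dedicated libraries.

def extract_text(path: str) -> str:
    suffix = Path(path).suffix.lower()
    if suffix in {".txt", ".md"}:
        # TXT and MD content is used as-is.
        return Path(path).read_text(encoding="utf-8")
    if suffix == ".pdf":
        raise NotImplementedError("parse the PDF structure, e.g. with pypdf")
    if suffix == ".docx":
        raise NotImplementedError("read the underlying XML, e.g. with python-docx")
    raise ValueError(f"unsupported format: {suffix}")
```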

Step 2: Chunking

The extracted text is split into smaller sections called chunks. Chunking is essential because when a customer asks a question, the AI does not need your entire 30-page product guide. It needs the two or three paragraphs that are actually relevant.

Synaptiq chunks your documents intelligently:

  • Heading-aware splitting. If your document uses headings (H1, H2, H3), chunks align to section boundaries. A section titled "Pricing" stays together rather than being split in half.
  • Size-controlled. Each chunk stays within a target size range (typically 500-1500 tokens) to balance completeness with precision.
  • Overlap. Adjacent chunks share a small amount of overlapping text so that information near a chunk boundary is not lost.

This is why document formatting matters. A well-structured document with clear headings produces clean, topic-focused chunks. A wall of unformatted text produces chunks with mixed topics, leading to less precise retrieval.
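A minimal chunker illustrating the three behaviors above, using characters rather than tokens for the size cap (the 400/50 numbers are invented for this sketch):

```python
# Illustrative heading-aware chunker: split on "#"-style headings, then
# enforce a size cap with overlapping windows. Real systems measure
# chunk size in tokens, not characters.

def chunk(markdown: str, max_chars: int = 400, overlap: int = 50) -> list[str]:
    sections, current = [], []
    for line in markdown.splitlines():
        # Heading-aware: a new heading closes the current section.
        if line.startswith("#") and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    # Size control: oversized sections become overlapping windows so
    # text near a window boundary appears in both neighbors.
    chunks, step = [], max_chars - overlap
    for s in sections:
        for i in range(0, len(s), step):
            chunks.append(s[i:i + max_chars])
    return chunks
```

Run it on a two-section document and each section comes back as its own chunk; run it on one long unstructured section and you get overlapping slices instead, which is exactly the "wall of text" failure mode described above.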

Step 3: Embedding

Each chunk is converted into a vector embedding, which is a numerical representation of the chunk's meaning. Two chunks about similar topics will have similar embeddings, even if they use different words.

For example, a chunk about "monthly subscription cost" and a customer question about "how much does it cost per month" will have very similar embeddings, allowing the system to match them even though the exact wording differs.

These embeddings are stored in a vector database that enables fast similarity searches across all your documents.
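As a toy illustration of why similar meanings produce similar embeddings, here is cosine similarity over invented 3-dimensional vectors. Real embeddings have hundreds or thousands of dimensions, and these values are made up purely for the sketch:

```python
import math

# Toy "embeddings": the values are invented so that the two pricing
# phrases point in nearly the same direction while the unrelated
# phrase points elsewhere.
EMBEDDINGS = {
    "monthly subscription cost":       [0.9, 0.1, 0.0],
    "how much does it cost per month": [0.8, 0.2, 0.1],
    "uploading a DOCX file":           [0.0, 0.1, 0.9],
}

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

The two pricing phrases score close to 1.0 despite sharing almost no words, while the DOCX phrase scores near 0 against either of them.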

What Happens When a Customer Asks a Question

Here is the step-by-step flow when a real conversation happens:

1. Query Embedding

The customer's question is converted into the same type of vector embedding used for your document chunks. This puts the question and your content into the same mathematical space where similarity can be measured.

2. Similarity Search

Synaptiq searches your vector database for the chunks whose embeddings are most similar to the question's embedding. It retrieves the top matching chunks, typically between 3 and 8 depending on how much relevant content exists.
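The search itself is a nearest-neighbor lookup. Assuming similarity scores have already been computed by comparing embeddings (the chunks and values below are invented), selecting the top matches is just a sort:

```python
# Sketch of top-k retrieval over pre-scored chunks. The scores stand in
# for embedding similarity; in a real system the vector database does
# this ranking internally.
SCORED_CHUNKS = [
    ("Standard plan is $29/month.",      0.91),
    ("Pro plan adds priority support.",  0.74),
    ("Our office dog is named Biscuit.", 0.12),
    ("Annual billing saves 20%.",        0.66),
]

def top_k(scored: list[tuple[str, float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k chunks most similar to the query."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```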

3. Relevance Scoring

Each retrieved chunk receives a relevance score between 0 and 1. Chunks that closely match the question's intent score higher. Chunks that are only tangentially related score lower.

Only chunks above a minimum relevance threshold are passed to the AI. This prevents low-quality matches from polluting the answer.
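A minimal sketch of that filtering step, with an invented threshold value (Synaptiq's actual cutoff is internal):

```python
RELEVANCE_THRESHOLD = 0.55  # illustrative value only

def filter_relevant(scored: list[tuple[str, float]],
                    threshold: float = RELEVANCE_THRESHOLD) -> list[tuple[str, float]]:
    """Keep only chunks scoring above the cutoff so weak matches
    never reach the AI."""
    return [(text, score) for text, score in scored if score >= threshold]
```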

4. Context Assembly

The qualifying chunks are assembled into a context window along with the conversation history and the customer's current question. The AI receives all of this as input.

5. Answer Generation

The AI generates a response using the retrieved context. It is instructed to base its answer on the provided documents and to cite which documents it drew from.
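Steps 4 and 5 can be pictured as assembling a single prompt from the qualifying chunks, the conversation history, and the current question. The prompt wording and chunk format below are illustrative, not Synaptiq's actual template:

```python
# Sketch of context assembly: everything the model needs is packed into
# one input, with each chunk labeled by its source document so the
# answer can cite where it came from.

def build_prompt(chunks: list[dict], history: list[str], question: str) -> str:
    context = "\n".join(f'[{c["doc"]}] {c["text"]}' for c in chunks)
    return (
        "Answer using ONLY the context below. Cite the source document.\n\n"
        f"Context:\n{context}\n\n"
        "Conversation so far:\n" + "\n".join(history) + "\n\n"
        f"Customer: {question}"
    )
```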

Confidence Scoring

Every answer Synaptiq generates carries an internal confidence score. This score reflects how well the retrieved document chunks matched the question and how much of the answer is grounded in the source material.

  • High confidence: The question closely matched one or more document chunks, and the answer is directly supported by the content. The AI delivers the answer normally.
  • Medium confidence: The question partially matched available content. The AI answers but may include qualifiers like "based on the available information" or suggest the customer verify specific details with the sales team.
  • Low confidence: Little or no relevant content was found. The AI falls back to its "I don't know" behavior.
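One way to picture the three tiers is a simple dispatch on the score. The 0.75 and 0.45 boundaries here are invented for illustration; the actual cutoffs are internal to Synaptiq:

```python
# Illustrative mapping from a confidence score to the three behaviors
# described above. Boundary values are made up for this sketch.

def respond(confidence: float, draft: str) -> str:
    if confidence >= 0.75:
        # High: deliver the answer normally.
        return draft
    if confidence >= 0.45:
        # Medium: answer, but add a qualifier.
        return f"Based on the available information, {draft}"
    # Low: fall back to the "I don't know" behavior.
    return "I'm not sure about that. Let me connect you with our team."
```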

You can see confidence indicators in the test interface when evaluating your knowledge base coverage.

When the AI Says "I Don't Know"

Synaptiq is designed to say "I don't know" rather than fabricate an answer. This happens when:

  • No relevant chunks were found. The customer asked about something your documents simply do not cover.
  • Retrieved chunks scored below the relevance threshold. Content was found, but it was not close enough to the question to be trustworthy.
  • The question is outside the AI's configured scope. If the customer asks something unrelated to your product (like a personal question or a request about a competitor), the AI declines.

When the AI does not know the answer, it acknowledges this honestly and offers to connect the customer with a human team member or suggests they rephrase their question.

This is a feature, not a bug. A sales AI that never says "I don't know" is one that sometimes lies. Synaptiq prioritizes trust.

Reducing "I Don't Know" Responses

If the AI is saying "I don't know" too often, the fix is almost always to add more content to your knowledge base:

  1. Use the test interface to identify which questions are failing.
  2. Determine what information would be needed to answer those questions.
  3. Upload documents that contain that information, or add the information to existing documents and re-upload them.
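That loop can be automated. A hypothetical sketch, where `ask` is a canned stub standing in for the test interface (it is not a real Synaptiq API), so the example is self-contained:

```python
# Coverage check sketch: run candidate questions through the agent and
# collect the ones that fall back to "I don't know".

def ask(question: str) -> str:
    # Stub agent: only pricing is covered by this toy knowledge base.
    return "It costs $29/month." if "cost" in question else "I don't know."

def find_gaps(questions: list[str]) -> list[str]:
    """Return the test questions the agent cannot answer yet."""
    return [q for q in questions if "i don't know" in ask(q).lower()]
```

Each question that comes back in the gap list tells you exactly which document to write or extend next.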

Knowledge Base vs. General Knowledge

Synaptiq uses a strict priority hierarchy:

  1. Your knowledge base always comes first. If your documents contain information relevant to the question, that information is used.
  2. General knowledge fills conversational gaps. The AI uses its general language understanding for things like grammar, phrasing, and basic world knowledge (knowing that "Q1" means the first quarter of a year, for instance). It does not use general knowledge to answer product-specific questions.
  3. When in doubt, the knowledge base wins. If your documents say your product costs $49/month but the AI's training data contains outdated pricing, your documents take priority.

This means keeping your knowledge base current is critical. The AI trusts your documents above all other sources of information. Outdated documents lead to outdated answers.

Practical Implications for Your Content

Understanding how RAG works has direct consequences for how you should write and organize your documents:

  • Be specific. Vague content retrieves poorly. "Our product is very affordable" will not match "how much does it cost?" nearly as well as "The Standard plan costs $29/month billed annually."
  • Cover topics thoroughly. If your pricing document mentions the price but not what is included at each tier, the AI cannot answer "what do I get with the Pro plan?"
  • Use natural language. Write the way your customers ask questions. If customers say "free trial" instead of "evaluation period," use "free trial" in your docs.
  • Avoid ambiguity. If "Enterprise" means different things in different documents, the AI may retrieve the wrong one. Be consistent with terminology.
  • Update frequently. Stale documents produce stale answers. When you change pricing, ship features, or update policies, update your knowledge base the same day.
