Building Trust in AI: Enterprise Knowledge Base Validation with MindsDB

Chandre Van Der Westhuizen, Community & Marketing Co-ordinator at MindsDB
Jul 10, 2025


When building AI systems that rely on private or domain-specific data, the quality of the underlying knowledge base (KB) directly determines the usefulness of responses. A well-structured, comprehensive KB ensures that AI agents provide relevant, grounded answers. But how do you know your KB is actually providing the right information?
With MindsDB’s EVALUATE KNOWLEDGE BASE feature, you can now systematically measure the performance of your knowledge base against a set of predefined questions and answers—all using simple SQL. This command measures how correct and relevant the data returned by the knowledge base is to the search query. Learn more about knowledge bases here.
Why Evaluate Your Knowledge Base?
Before deploying an AI agent that retrieves information from a knowledge base, you need to ensure:
The data is complete and up-to-date
Information chunks contain enough context to answer real-world questions
Queries return expected, relevant answers
Your chunking strategy (paragraphs, sections, etc.) supports accurate retrieval. In MindsDB, you can optimize how data is stored and searched by controlling how it is chunked (see the sketch below).
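As a rough illustration, chunking behavior can be tuned when the knowledge base is created. The sketch below is hypothetical: the preprocessing block and the chunk_size/chunk_overlap names are assumptions used for illustration only, so check the MindsDB documentation for the exact chunking options your version supports.

-- Hypothetical sketch: the preprocessing parameters shown here are assumptions,
-- not confirmed syntax; consult the MindsDB docs for the supported chunking options.
CREATE KNOWLEDGE_BASE chunked_kb
USING
    embedding_model = {
        "provider": "openai",
        "model_name": "text-embedding-3-large",
        "api_key": "sk-xxxxx"
    },
    preprocessing = {
        "text_chunking_config": {
            "chunk_size": 1000,
            "chunk_overlap": 100
        }
    };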
Evaluating your knowledge base lets you test and iterate on the quality of your knowledge source itself—independent of the language model being used. However, note that using different embedding and reranking models with knowledge bases can have an impact on the quality of search results.
What It Does
This SQL command, when used with the doc_id version, compares the retrieved content with the expected answers and returns a boolean flag (1 or 0) to indicate correctness. With the llm_relevancy version, it uses an LLM to evaluate the accuracy of the data returned from the KB and how well it fits the user's query.
This helps you identify:
Gaps in data coverage: Instances where the knowledge base lacks the necessary information to answer queries.
Poorly chunked content: Ineffective content segmentation leading to context-lacking responses.
Redundant or conflicting entries: Contradictory information retrieved from different chunks.
Mismatches between queries and indexed information: Failures in surfacing relevant results despite adequate coverage.
It's like a unit test suite—but for your AI assistant.

Where to Apply EVALUATE KNOWLEDGE BASE in Real-World Scenarios
Let’s explore some use cases, how evaluating your knowledge base applies to each, and the impact it can make.
Financial Industry
Use Case: Regulatory compliance assistants, internal policy bots, and investor-facing tools.
Application of EVALUATE KNOWLEDGE BASE:
Validate that your knowledge base, built from internal policies, compliance documentation, or regulatory frameworks (e.g., MiFID II, Basel III), accurately returns the correct policy or regulation when queried.
Run predefined compliance questions (e.g., “What are the KYC requirements for corporate accounts?”) against expected outputs.
Ensure assistants don’t retrieve outdated or conflicting rules by testing how well the KB reflects real-time regulatory updates.
Impact: Reduces risk of misinformation, builds trust with legal and compliance teams, and accelerates onboarding for new financial advisors or analysts.
Energy & Utilities
Use Case: Field operations support, equipment troubleshooting assistants, and ESG reporting systems.
Application of EVALUATE KNOWLEDGE BASE:
Evaluate how well a KB built from maintenance manuals, safety protocols, or engineering standards returns data chunks that are relevant to search queries.
Use real-world questions like “What’s the lockout procedure for turbine model X?” or “How do we report methane emissions under GHG Protocol?” to test coverage and accuracy.
Improve agent performance for frontline engineers by tuning chunk size, format, or document selection based on evaluation feedback.
Impact: Increases reliability of AI tools in high-stakes environments, helps maintain regulatory compliance, and supports field teams with fast, accurate answers.
Enterprise Software Vendors
Use Case: AI-powered support bots, internal documentation assistants, and onboarding tools for customers or partners.
Application of EVALUATE KNOWLEDGE BASE:
Evaluate documentation-based KBs (e.g., API docs, release notes, product wikis) to ensure they correctly resolve developer queries such as “How do I authenticate with the v2 API?” or “What’s the default rate limit for org accounts?”
QA your KB to catch issues in technical documentation structure, missing information, or inconsistent formatting before release.
Validate multilingual KBs by testing localized question-answer pairs to support global customers.
Impact: Delivers more accurate AI support, reduces ticket deflection errors, improves customer onboarding, and ensures your docs are AI-ready.
This feature unlocks powerful workflows across many more industries, including (but not limited to):
Healthcare: Hospitals can validate that their clinical assistant answers patient care questions based on actual guidelines—not generative guesswork.
Financial & Legal Compliance: Banks and legal teams can ensure assistants only return responses grounded in official regulatory policies and documentation.
Customer Support: Support teams can test their LLMs on historical tickets to confirm the assistant mirrors how human agents resolve real customer issues.
EdTech & Tutoring: Educators can benchmark tutoring assistants to make sure they provide accurate, educationally aligned answers to student questions.
Developer Tools: Engineering teams can feed in documentation + codebases and evaluate whether the assistant can correctly explain key components, APIs, or configurations.
Example Use Case: Sales Interaction Summaries
Suppose your enterprise sales team needs an AI agent capable of querying and retrieving information from sales call summaries. But before building the AI agent, the developers need to ensure that the call summary data is accurate, complete, and capable of answering real-world queries—so the agent can deliver reliable and relevant responses.
MindsDB allows you to seamlessly build a Knowledge Base and evaluate its quality before deployment. You’ll get a result set with:
Actual data retrieved from the KB
Similarity score (e.g. cosine similarity)
Boolean flag: is the answer close enough to be considered correct?
You can access MindsDB by installing it locally via Docker, via MindsDB’s extension for Docker Desktop, or through the AWS Marketplace. MindsDB uses SQL constructs extended with AI-specific syntax, so please familiarize yourself with it by checking the docs. Before creating the knowledge base, make sure you have created a database connection between your data source and MindsDB.
Let’s start with building a Knowledge Base that you will use for your agent.
First, create a database connection between your data source and MindsDB using the CREATE DATABASE statement. Below, we connect to the sample data used in this example:
CREATE DATABASE sales_manager_data
WITH ENGINE = 'postgres',
PARAMETERS = {
    "user": "demo_user",
    "password": "demo_password",
    "host": "samples.mindsdb.com",
    "port": "5432",
    "database": "sales_manager_data"
};
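Optionally, you can preview the connected data to confirm the connection works before building the knowledge base on top of it:

SELECT *
FROM sales_manager_data.call_summaries
LIMIT 5;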
Now the Knowledge Base can be created using the CREATE KNOWLEDGE_BASE statement:
CREATE KNOWLEDGE_BASE sales_kb
USING
    embedding_model = {
        "provider": "openai",
        "model_name": "text-embedding-3-large",
        "api_key": "sk-xxxxx"
    },
    reranking_model = {
        "provider": "openai",
        "model_name": "gpt-4o",
        "api_key": "sk-xxxxxx"
    },
    metadata_columns = ['created_at', 'company'],
    content_columns = ['call_summary', 'key_points', 'next_steps'],
    id_column = 'id';
We provide the knowledge base with a name, an embedding model, a reranking model, and the column names where data will be inserted. Learn more about the syntax for creating knowledge bases here.
The list below explores the various parameters in the knowledge base syntax:
embedding_model - The embedding model is a required component of the knowledge base. It stores the specifications of the embedding model to be used.
reranking_model - The reranking model is an optional component of the knowledge base. It stores the specifications of the reranking model to be used.
metadata_columns - The data inserted into the knowledge base can be classified as metadata, which enables users to filter the search results using defined data fields.
content_columns - The data inserted into the knowledge base can be classified as content, which is embedded by the embedding model and stored in the underlying vector store.
id_column - The ID column uniquely identifies each source data row in the knowledge base. It is an optional parameter. If provided, it is a string containing the name of the source data's ID column. If not provided, the ID is generated from a hash of the content columns.
Once completed, you can run the DESCRIBE KNOWLEDGE_BASE statement to confirm that the KB was successfully created.
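For the knowledge base created above, that check looks like this:

DESCRIBE KNOWLEDGE_BASE sales_kb;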
Data from the previously connected Postgres database can now be inserted into the Knowledge Base using the INSERT INTO syntax. In a similar way, you can insert data from files, web pages, etc.
INSERT INTO sales_kb
SELECT id, created_at, company, call_summary, key_points, next_steps
FROM sales_manager_data.call_summaries;
To make sure the data was successfully inserted, you can run a SELECT query against the knowledge base.
SELECT * FROM sales_kb;
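You can also try a quick semantic search against the knowledge base before evaluating it. The query below is a sketch that follows MindsDB's knowledge base search pattern of filtering on the content column; the question text is simply an illustrative example:

-- Returns the chunks most relevant to the question.
SELECT *
FROM sales_kb
WHERE content = 'What are the agreed next steps after the latest call?'
LIMIT 3;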
Now let’s evaluate the relevancy and accuracy of the data returned by the knowledge base using the EVALUATE KNOWLEDGE BASE syntax. The doc_id version will be used first to verify that the document ID returned by the knowledge base matches the expected document ID specified in the test table:
EVALUATE KNOWLEDGE_BASE sales_kb
USING
    test_table = files.my_test_table,
    version = 'doc_id',
    generate_data = {
        'from_sql': 'SELECT id, created_at || \', \' || company || \', \' || call_summaries || \', \' || key_points || \' , \' || next_steps AS content FROM sales_manager_data.call_summaries',
        'count': 5
    },
    evaluate = true,
    save_to = files.my_result_table;
This command can do two things, either separately or at once (an evaluate-only variant is sketched after the parameter list below):
Generate test data and save it into the test table.
Evaluate the knowledge base using the provided (or generated) test data.
Let’s explore the parameters:
test_table: contains test data, typically in the form of question-and-answer pairs, with its content determined by the specified version parameter.
version: the doc_id version checks whether the document ID retrieved from the knowledge base matches the expected document ID defined in the test table.
generate_data: can be set to true to generate test data from the knowledge base’s own data by default, or, as in our query, given options that control how the test data is generated.
from_sql: defines the query used to generate the test data stored in the test_table. Here we select data from the original data source and concatenate the columns into a single content column from which the test questions and answers are generated.
count: defines the size of the test dataset, which is set to 5 in our query.
evaluate: set to true to indicate that the knowledge base should be evaluated.
save_to: defines the table where the evaluation results are stored.
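If you already have a curated set of test questions and expected document IDs, you can skip data generation and run only the evaluation step. The sketch below assumes a hypothetical, manually prepared table files.my_manual_tests with the structure expected by the doc_id version:

-- Sketch: evaluate only, against an existing (hypothetical) test table.
EVALUATE KNOWLEDGE_BASE sales_kb
USING
    test_table = files.my_manual_tests,
    version = 'doc_id',
    evaluate = true,
    save_to = files.my_result_table;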
Here are the results:

Now we can select the data generated in the test table to view it:
SELECT * FROM files.my_test_table;
You will see that 5 questions have been generated with 5 answers and the corresponding doc_id.

Now let’s check the results of the evaluation:
SELECT * FROM files.my_result_table;

Let’s look at the columns of the results table:
total: stores the total number of questions asked.
total_found: records the number of questions for which the knowledge base returned correct answers.
retrieved_in_top_10: records the number of questions for which the correct answer appeared within the top 10 results returned by the knowledge base.
cumulative_recall: stores cumulative recall data that can be plotted as a chart.
average_query_time: stores the execution time of a search query against the knowledge base.
name: stores the name of the knowledge base.
created_at: displays when the KB was created.
A total of 5 questions were generated and evaluated, and the knowledge base returned the correct answer for all 5. These answers also appeared at the top of the results returned by the knowledge base. Based on this information, we can determine that the knowledge base provides accurate data and can be trusted as the foundation for an agent.
Now let’s evaluate the knowledge base using the llm_relevancy version.
EVALUATE KNOWLEDGE_BASE sales_kb
USING
    test_table = files.my_test_table2,
    version = 'llm_relevancy',
    generate_data = {
        'from_sql': 'SELECT id, created_at || \', \' || company || \', \' || call_summaries || \', \' || key_points || \' , \' || next_steps AS content FROM sales_manager_data.call_summaries',
        'count': 5
    },
    evaluate = true,
    save_to = files.my_result_table2;
Here the version has been set to llm_relevancy. This evaluator leverages a language model to assess and rank responses retrieved from the knowledge base. Note that it uses the reranking model of the knowledge base by default (if provided). It can also be defined directly in the EVALUATE command using the llm parameter.
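As a sketch, here is what specifying the evaluation model explicitly could look like, assuming the llm parameter accepts a model specification in the same form as the reranking_model defined earlier; check the docs for the exact fields supported:

-- Sketch: overriding the evaluation LLM (parameter shape assumed to mirror reranking_model).
EVALUATE KNOWLEDGE_BASE sales_kb
USING
    test_table = files.my_test_table2,
    version = 'llm_relevancy',
    llm = {
        "provider": "openai",
        "model_name": "gpt-4o",
        "api_key": "sk-xxxxxx"
    },
    evaluate = true,
    save_to = files.my_result_table2;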
Here are the results:

Let’s query the test table where the test data is stored:
SELECT * FROM files.my_test_table2;
We can see the questions asked and answers generated based on the text in the Knowledge Base:

Now let’s query the results table:
SELECT * FROM files.my_result_table2;

The evaluation output includes several key metrics for assessing knowledge base performance:
avg_relevancy: stores the average relevancy of retrieved responses.
avg_relevance_score_by_k: captures the average relevance score at a specified rank k.
avg_first_relevant_position: indicates the average position of the first relevant result.
mean_mrr: represents the Mean Reciprocal Rank (MRR), a common metric for evaluating ranked retrieval.
hit_at_k: measures whether a relevant result appears within the top k results.
bin_precision_at_k: records the binary precision at rank k.
avg_entropy: reflects the average entropy of relevance scores, providing insight into response consistency.
avg_ndcg: captures the average normalized Discounted Cumulative Gain (nDCG), which assesses ranking quality.
avg_query_time: logs the average execution time for knowledge base queries.
name: identifies the knowledge base.
created_at: stores the timestamp when the evaluation was generated.
Verifying sales call data before building an AI agent ensures faster access to accurate insights, improves customer engagement, shortens sales cycles, and supports consistent, compliant communication.
The Value of Knowledge Base Evaluation
This evaluation provides direct feedback on how useful your data is, not how well the model interprets it.
It helps you:
Validate the integrity of your KB before launch
Catch data coverage issues early
Iterate on formatting, chunking, or document selection
Build confidence that your assistant is grounded in the right knowledge
You can also integrate this evaluation into your CI/CD workflow. For example:
Run evaluations automatically when the KB is rebuilt
Require a minimum average similarity score before allowing deployment (a gating query is sketched below)
Monitor KB accuracy over time as source documents change
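As one way to wire this into a pipeline, a CI job could run a gating query against the saved results table and block deployment if no row passes. This is a sketch that assumes the llm_relevancy results table from the example above; the 0.8 threshold is an arbitrary illustrative value:

-- Gating sketch: deployment proceeds only if the evaluation meets the (illustrative) threshold.
SELECT name, avg_relevancy, avg_query_time
FROM files.my_result_table2
WHERE avg_relevancy >= 0.8;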
If you want to explore these topics in our future tutorials, feel free to subscribe to our blog.
Try It Now
You can get started with just two SQL tables—your knowledge base and your test queries.
Explore the full syntax here: MindsDB Docs: Evaluate Knowledge Base.
By validating the quality of your data before deploying AI agents, EVALUATE KNOWLEDGE BASE helps ensure the responses are grounded, trustworthy, and backed by accurate information—not just model guesswork.
Because good answers start with good data.