Beyond Keywords: Introducing MindsDB Knowledge Bases for RAG and Semantic Search

Alejandro Cantu, Senior Product Manager at MindsDB
Apr 22, 2025


Unlocking the value hidden in unstructured text data remains a challenge. While enterprises often excel at managing structured information in databases, insights trapped in documents, emails, support tickets, and conversations are frequently inaccessible through traditional means.
Keyword searching, the common fallback, barely scratches the surface, missing nuance, context, and the actual meaning embedded in the text. How can you effectively harness this unstructured knowledge to make your enterprise data truly intelligent and responsive for AI?
The key lies in bridging the gap between your diverse data landscape and the power of modern AI. This requires tools that can connect to varied data sources, unify information across types, and enable intelligent responses based on deep understanding.
To solve this challenge, MindsDB provides a platform to build AI capabilities directly on top of your data. A cornerstone of this capability, particularly for handling unstructured text, is the MindsDB Knowledge Base (KB) feature.
Knowledge Bases are a powerful, integrated system within MindsDB designed specifically to ingest, understand, and query text data based on its semantic meaning, all through the familiar language of SQL. They are instrumental in building the next generation of AI Search, AI Analytics, and AI Agents.
This post will dive deep into:
What MindsDB Knowledge Bases are and their key integrated components.
Why they are a game-changer for working with unstructured data and accelerating AI development.
How you interact with them using simple SQL commands.
1. What are MindsDB Knowledge Bases? An Integrated Semantic Engine
Think of a MindsDB Knowledge Base not just as a database feature, but as an integrated semantic engine built directly into the MindsDB platform. It elegantly combines the necessary components to transform raw text into searchable, intelligent knowledge, abstracting away significant underlying complexity.

Here's a breakdown of the core components unified within a Knowledge Base:
a) Embedding Models: Translating Text to Meaning
At the heart of understanding text lies the embedding model. This AI model reads text and converts its semantic meaning into a numerical vector embedding. Texts with similar meanings produce vectors that are mathematically close. KBs integrate this process seamlessly. You specify a model during creation (currently supporting options from OpenAI and Azure OpenAI), and the KB automatically uses its API to convert your text into meaningful vectors during ingestion and querying. This allows searches based on conceptual similarity, not just keyword matches.
SQL
-- Defining the embedding model during KB creation
USING
    embedding_model = {
        "provider": "OpenAI",
        "model_name": "text-embedding-3-small",
        "api_key": "..."  -- Use API key or ENV variable
    },
    ...
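To make "mathematically close" concrete, here is a minimal Python sketch of cosine similarity, one common way to compare embedding vectors. The three-dimensional vectors are hand-made illustrative values, not output from any real embedding model, and the source does not say which distance measure a Knowledge Base uses internally, so treat this only as an illustration of the idea.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings"; real models return hundreds or thousands of dimensions.
battery_drains_fast = [0.9, 0.1, 0.0]
short_battery_life = [0.8, 0.2, 0.1]
great_screen = [0.1, 0.1, 0.9]

sim_related = cosine_similarity(battery_drains_fast, short_battery_life)
sim_unrelated = cosine_similarity(battery_drains_fast, great_screen)

# The two battery-related texts score much higher than the unrelated pair.
print(sim_related > sim_unrelated)
```

The two battery-related vectors point in nearly the same direction even though the underlying phrases share few keywords, which is the property semantic search relies on.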
b) Vector Storage: Organizing Knowledge by Meaning
These vector embeddings require specialized storage optimized for fast similarity searches. KBs handle this integration transparently:
Built-in Default (ChromaDB): For ease of setup, MindsDB uses an embedded ChromaDB instance by default if no specific storage is defined, automatically managing the necessary collections.
Connect Your Own (PGVector): If you use PostgreSQL with the PGVector extension, you can configure the KB to use a table within your existing PGVector database (connected to MindsDB via CREATE DATABASE).
SQL
-- Optionally specifying PGVector storage
USING
    ..., -- embedding_model, etc.
    storage = my_pgvector_connection.my_kb_table,
    ...
You interact via SQL; MindsDB manages the communication with the chosen vector store for storing and retrieving embeddings.
c) Reranking Models (Optional): Refining Relevance with LLMs
Pure vector similarity sometimes isn't enough for complex queries. KBs allow the optional use of reranking models (powerful LLMs like OpenAI's GPT-4o). After an initial search retrieves candidate text chunks, the reranker intelligently re-evaluates these chunks against the original query's intent, re-ordering them for maximum relevance. This significantly boosts search quality, especially for nuanced questions crucial for reliable AI applications.
SQL
-- Optionally adding a reranking model for higher quality results
USING
    ..., -- embedding_model, storage, etc.
    reranking_model = {
        "provider": "OpenAI",
        "model_name": "gpt-4o",
        "api_key": "..."
    },
    ...
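The retrieve-then-rerank pattern described above can be sketched in a few lines of Python. This is only an illustration of the two-stage idea: the scoring function here is a simple word-overlap stand-in, whereas a Knowledge Base would call an LLM such as the one configured in reranking_model. The function names and sample texts are hypothetical.

```python
def rerank(query, candidates, score_fn, top_k=3):
    """Re-order first-stage candidates by a second score and keep the best top_k."""
    scored = [(score_fn(query, text), text) for text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]

def word_overlap_score(query, text):
    """Stand-in scorer (not an LLM): fraction of query words found in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words)

# Pretend these chunks came back from the initial vector search.
candidates = [
    "the screen is bright and sharp",
    "battery drains within a few hours",
    "the battery lasts all day",
]

reranked = rerank("battery drains quickly", candidates, word_overlap_score, top_k=2)
print(reranked)
```

Because the scorer is passed in as a parameter, the same skeleton works whether the second stage is a heuristic, a cross-encoder, or an LLM call.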
d) Content, Metadata, and IDs: Blending Unstructured and Structured
KBs understand that semantic meaning often needs context from structured data. You define:
content_columns: Which columns hold the text to be embedded and searched semantically.
metadata_columns: Which columns contain structured data (dates, authors, categories) to be stored alongside the text for traditional filtering.
id_column: The column containing the unique identifier for each record.
This design allows powerful queries that filter simultaneously on semantic relevance (content) and specific metadata attributes (WHERE author = '...').
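As a rough illustration of how a metadata filter and a semantic condition can work together, the Python sketch below first keeps only the records whose metadata matches exactly, then ranks the survivors by vector similarity. The record layout, the two-dimensional embeddings, and the search function are all hypothetical; the source does not describe the order in which a Knowledge Base applies these steps.

```python
def dot(a, b):
    """Dot product, used here as a simple similarity score between vectors."""
    return sum(x * y for x, y in zip(a, b))

def search(records, query_embedding, metadata_filter, limit=5):
    """Keep records whose metadata matches exactly, then rank them by similarity."""
    matching = [
        r for r in records
        if all(r["metadata"].get(key) == value for key, value in metadata_filter.items())
    ]
    matching.sort(key=lambda r: dot(r["embedding"], query_embedding), reverse=True)
    return matching[:limit]

# Hypothetical records with toy embeddings (not real model output).
records = [
    {"content": "setup took two minutes", "embedding": [0.9, 0.1], "metadata": {"rating": 5}},
    {"content": "battery is fine", "embedding": [0.1, 0.9], "metadata": {"rating": 5}},
    {"content": "very easy to install", "embedding": [0.95, 0.05], "metadata": {"rating": 3}},
]

# Toy embedding standing in for a query about ease of use.
results = search(records, query_embedding=[1.0, 0.0], metadata_filter={"rating": 5})
print([r["content"] for r in results])
```

Note that the 3-star review is excluded even though its embedding is the closest match, which is exactly what a metadata condition in the WHERE clause is meant to achieve.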
e) Automatic Chunking: Preparing Text for Search
Embedding models have input limits, and effective retrieval often requires finding specific passages, not entire documents. KBs automatically handle chunking, breaking down long text inputs into smaller, semantically searchable units during data ingestion. This ensures searches can pinpoint the most relevant information within larger documents.
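For readers curious what chunking looks like in practice, here is a minimal Python sketch that splits text into fixed-size word windows with some overlap between neighbours. This is one simple strategy chosen for illustration; the source does not specify which chunking method or sizes MindsDB applies, and the chunk_text function and its parameters are hypothetical.

```python
def chunk_text(text, max_words=50, overlap=10):
    """Split text into word windows of max_words, each sharing `overlap` words with the previous one."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks

# A synthetic 120-word document: word0 word1 ... word119
document = " ".join(f"word{i}" for i in range(120))
chunks = chunk_text(document, max_words=50, overlap=10)
print(len(chunks))
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of embedding some words twice.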
f) The SQL Interface: Command and Control
The defining characteristic is that this entire integrated system is managed through standard SQL commands: CREATE KNOWLEDGE_BASE, INSERT INTO, SELECT, DELETE, and DROP. This provides a familiar and powerful interface to these advanced semantic capabilities, making them feel like native database operations.
2. Why Knowledge Bases? Unlocking Value and Accelerating AI
MindsDB Knowledge Bases offer compelling advantages for any organization looking to leverage its unstructured data and build AI-driven solutions more effectively. They directly support the goal of making enterprise data intelligent and responsive.
a) Unlock the Potential of Unstructured Data
The vast majority of enterprise data is unstructured text, traditionally hard to analyze. Keyword search limitations mean valuable insights remain hidden. KBs solve this by enabling true semantic search. You can query based on meaning and intent, uncovering connections and information previously impossible to find, effectively activating this dormant data.
b) Dramatically Simplify RAG and AI Search Development
Building applications like RAG chatbots or semantic search engines usually requires stitching together multiple complex components: embedding services, vector databases, and orchestration code. This is time-consuming and requires specialized skills. MindsDB Knowledge Bases streamline this process. The core retrieval logic (embedding the query, searching vectors, filtering, reranking) is encapsulated within a single SQL SELECT statement against the KB. This reduces the code, dependencies, and infrastructure complexity, allowing teams to build and deploy sophisticated AI Search capabilities and the retrieval backbone for AI Agents in days, not months.
c) Unify Your Data Landscape (Connect & Unify)
Instead of maintaining separate silos for structured and unstructured data analysis, KBs bring semantic text understanding into the same MindsDB environment where you access your databases and applications (via the Federated Query Engine).
Query Across Types: Use SQL to query KBs alongside traditional tables.
JOIN Semantic & Structured Insights: Combine results from semantic searches on text with structured data from your databases for richer, composite insights.
Reduced Complexity: Manage fewer tools and systems, leading to a more cohesive and efficient data strategy.
d) Democratize Access to Advanced AI Capabilities
Implementing semantic AI used to require specialized ML engineers. KBs make these powerful techniques accessible via SQL, empowering a broader range of developers and analysts. If you know SQL, you can now build applications that understand natural language queries over text, fostering wider innovation. This aligns with the goal of requiring no specialized data engineering for many tasks.
e) Build Flexible and Future-Proof AI Applications
The AI world changes fast. KBs provide an abstraction layer. By coding against the SQL interface, your applications are less dependent on specific underlying embedding models or vector store technologies. As MindsDB expands support, you can adapt with potentially minimal changes to your core logic.

f) Powering High-Value Use Cases
The ability to easily query text by meaning enables numerous applications:
Intelligent Internal Q&A: Allow employees to ask natural language questions against company policies, documentation, or communication archives.
Enhanced Customer Support: Equip support agents and chatbots with tools to find relevant answers instantly from past tickets and knowledge articles based on semantic similarity.
Deep Market & Competitor Analysis: Semantically query news feeds, reports, and website content to understand trends and competitor strategies.
Content Discovery & Recommendation: Suggest relevant articles or products based on conceptual understanding, not just tags.
KBs provide the essential engine for these applications, making them faster and easier to implement within the MindsDB ecosystem.
3. How You Use Knowledge Bases: The SQL Path to Semantic Insights
While KBs perform complex operations internally, your interaction is purely through SQL. This section illustrates the typical workflow using SQL commands. For complete syntax details, refer to the official MindsDB Knowledge Base Documentation.
a) Creating Your Knowledge Base
Define the KB's structure, specifying models, storage, and data columns.
SQL
-- Define the KB structure using SQL
CREATE KNOWLEDGE_BASE product_feedback_kb
USING
    embedding_model = {
        "provider": "OpenAI",
        "api_key": "...",
        "model_name": "text-embedding-ada-002"
    },
    metadata_columns = ['product_id', 'rating', 'submission_date'],
    content_columns = ['review_text'];
b) Populating with Knowledge
Load data via INSERT INTO ... SELECT statements. MindsDB handles embedding the content_columns and storing the results.
SQL
INSERT INTO product_feedback_kb
SELECT
    review_uuid AS id,
    prod_ref AS product_id,
    star_rating AS rating,
    review_date AS submission_date,
    customer_comment AS review_text
FROM source_customer_reviews;
c) Querying by Meaning (Semantic Search)
Use SELECT with WHERE content = '...' for semantic queries.
SQL
-- Find reviews discussing 'battery life issues'
SELECT id, chunk_id, content, metadata, distance, relevance
FROM product_feedback_kb
WHERE content = 'problems with how long the battery lasts'
  AND relevance_threshold = 0.7 -- Optional: filter by relevance
LIMIT 10;
d) Combining Semantic and Metadata Filters
Mix semantic conditions with standard SQL filters on metadata.
SQL
-- Find relevant 5-star reviews mentioning 'ease of use'
SELECT content, metadata, relevance
FROM product_feedback_kb
WHERE rating = 5
  AND content = 'very easy to set up and use'
LIMIT 5;

e) Joining KBs with Other Tables (Unifying Insights)
Combine KB results with structured data using standard JOIN.
SQL
-- Join feedback KB with product table
SELECT p.product_name, kb.content, p.star_rating
FROM product_feedback_kb AS kb
JOIN products_table AS p
    ON p.id = kb.product_id
WHERE p.category = 'Mobile Devices'
  AND kb.content = 'positive comments about screen quality'
  AND kb.relevance > 0.8;
f) Managing the Knowledge Base
Use standard SQL for maintenance.
SQL
-- Delete old reviews
DELETE FROM product_feedback_kb
WHERE submission_date < '2023-01-01';

-- Drop the entire Knowledge Base
DROP KNOWLEDGE_BASE product_feedback_kb;
Through this SQL interface, MindsDB makes sophisticated semantic text processing feel like a natural extension of your existing database skills, significantly simplifying the path to building intelligent applications.
Conclusion: Activate Semantic Search on Your Data with MindsDB Knowledge Bases
Effectively leveraging the massive amount of unstructured text data within an enterprise is crucial for staying competitive and making informed decisions. MindsDB Knowledge Bases provide a powerful, integrated solution to meet this challenge, bringing advanced semantic search and RAG capabilities directly into your data environment via SQL.
By unifying embedding, vector storage, and optional reranking under a familiar interface, KBs empower organizations to:
Connect & Unify: Seamlessly integrate unstructured text knowledge with your existing structured data workflows.
Respond Intelligently: Enable precise AI Search and provide the foundation for context-aware AI Agents.
Accelerate AI Development: Significantly reduce the time and complexity required to build semantic AI applications.
Democratize Advanced AI: Make powerful text understanding techniques accessible to anyone.
MindsDB Knowledge Bases transform your text data from a passive archive into an active, intelligent asset, helping you make your enterprise data truly responsive and ready for AI.
Ready to unlock the meaning in your data?
Explore the Documentation: MindsDB Knowledge Base Docs
Get Started with MindsDB: MindsDB Installation & Setup Guide
Join the Community: MindsDB Community Slack
Start building smarter, context-aware applications today with MindsDB Knowledge Bases!
Happy Building!
Start Building with MindsDB Today
Power your AI strategy with the leading AI data solution.
© 2025 All rights reserved by MindsDB.