MindsDB Knowledge Bases: Revolutionizing AI-Powered Data Queries with SQL Algebra
MindsDB Knowledge Bases: Revolutionizing AI-Powered Data Queries with SQL Algebra

Alejandro Cantu, Senior Product Manager at MindsDB
Sep 25, 2025


MindsDB has unleashed a game-changing feature that bridges the gap between traditional SQL databases and modern AI-powered semantic search. The new Knowledge Base Algebra system transforms how we query unstructured data, making it as intuitive as working with traditional databases while leveraging the power of vector embeddings and LLMs.
What Makes MindsDB Knowledge Bases Special?
Imagine this: you’re sitting on mountains of data—support tickets, research papers, contracts, product reviews. Some of it is neat and structured (tables, ratings, categories). But most of it? Messy paragraphs, sprawling PDFs, emails, or notes.
You know the insights are there… but how do you ask the right questions and actually get meaningful answers?
The Struggle with Today’s Tools
Databases are powerful for structured data, but they don’t understand natural language.
Vector databases capture “semantic meaning,” but they can’t filter with precision.
LLMs are smart, but without guardrails they guess, hallucinate, or miss key context.
That leaves you with two bad options: either wrangle SQL for rigid results or trust an AI that can’t be trusted.
Sound familiar?
Traditional databases excel at structured data queries, while vector databases handle semantic similarity search. MindsDB Knowledge Bases combine both worlds, allowing you to run complex SQL queries that seamlessly blend:
Semantic content search using vector embeddings
Traditional metadata filtering using exact matches and ranges
Complex boolean logic with AND, OR, and NOT operations
All of this in one query, one system, no extra infrastructure.

Real-World Example: 186,000 Movie Database
Let's explore these capabilities using a real dataset of 186,000 movies, where each record contains:
id | content | metadata |
---|---|---|
movie_id | A bank security expert plots with a call girl to rob three safety deposit boxes... | {"genre":"Crime","rating":6.3,"expanded_genres":"Comedy, Crime, Drama"} |
Normally, searching it for something nuanced would be impossible without hours of filtering. With MindsDB Knowledge Bases and its SQL Algebra, you can seamlessly query this data.
Start with accessing MindsDB, you can do so by installing it locally via Docker, MindsDB’s Extension on Docker Desktop.
The next step would be to connect your database with the CREATE DATABASE statement.
CREATE DATABASE heroku WITH ENGINE = 'pgvector', PARAMETERS = { "host": "host.xxxxx", "port": 5432, "database": "db", "user": "user", "password": "xxxx" };
Once your data is connected, the Knowledge Base can be created with the CREATE KNOWLEDGE BASE statement.
CREATE KNOWLEDGE_BASE movies_kb USING embedding_model = { 'provider': 'openai', 'model_name' : 'text-embedding-3-small', 'api_key': 'sk-xxx' }, reranking_model = { "provider": "openai", "model_name": "gpt-4o", "api_key": "sk-xxx" }, storage=heroku.movies_kb, metadata_columns = ['genre', 'expanded_genres', 'rating'], content_columns = ['content'], id_column = 'movie_id';
The data in the connected database can now be inserted into your Knowledge Base using the INSERT INTO statement.
INSERT INTO movies_kb SELECT movie_id, genre, expanded_genres, rating, content FROM heroku.movies_kb;
You can select the Knowledge Base to see if the data was successfully embedded.
SELECT * FROM movies_kb

Now that you have created a Knowledge Base, you can now explore how to query it with SQL Algebra.
Revolutionary Query Capabilities
There are various ways that we can query our Knowledge Base based on the type of information you need. Let’s look at how simple SQL Algebra queries can obtain information we require from our Knowledge Base.
1. Hybrid Content + Metadata Search
You can combine semantic search of unstructured content with precise metadata filters (like dates, tags, or categories) to deliver more accurate, context-rich results. Let’s find movies that match semantic content AND specific criteria:
-- High-rated action movies with heist themes (no romance) SELECT * FROM movies_kb WHERE content LIKE 'heist bank robbery space alien planet' AND genre != 'Romance' AND expanded_genres NOT LIKE '%Romance%' AND rating > 7.0;
This query demonstrates the power of hybrid search:
content LIKE
performs semantic similarity searchgenre !=
andNOT LIKE
provide precise exclusionsrating >
filters by numerical metadata

2. Complex Boolean Logic with Content
This enables you to use operators like AND on semantic search results, giving you fine-grained control to combine or exclude specific concepts in your queries:
-- Action-comedies with car chase scenes SELECT * FROM movies_kb WHERE content LIKE 'car chase driving speed race' AND expanded_genres LIKE '%Action%' AND expanded_genres LIKE '%Comedy%' AND rating > 6.5;

3. Advanced Pattern Exclusion
You can filter out rows matching specific string patterns, using NOT LIKE
, so your queries return only the data that truly fits your criteria.
-- Historical dramas without war themes SELECT * FROM movies_kb WHERE content LIKE 'historical period past century era' AND expanded_genres LIKE '%Drama%' AND content NOT LIKE 'war battle soldier military' AND content NOT LIKE 'fight combat weapon' AND rating > 3.5;
4. Multi-Condition Content Filtering
This allows you to apply multiple conditions with AND
/OR
across columns or values, so you can narrow results to only those records that meet several content-based criteria at once.
-- Positive-themed movies only SELECT * FROM movies_kb WHERE content LIKE 'hope inspiration success achievement dreams' AND content NOT LIKE 'death murder crime violence' AND content NOT LIKE 'sad tragedy loss depression' AND rating > 7.0;

Advanced SQL Operations
Let's look at more advanced SQL Operations.
UNION for Thematic Combinations
You can merge results from multiple queries into a single set, letting you combine different thematic datasets or search conditions into one unified output.
-- Combine different thriller subgenres SELECT * FROM movies_kb WHERE content LIKE 'detective investigation mystery' AND rating > 7.0 UNION SELECT * FROM movies_kb WHERE content LIKE 'heist robbery theft steal' AND rating > 7.0 UNION SELECT * FROM movies_kb WHERE content LIKE 'spy secret agent undercover' AND rating > 7.0;

Complex Nested Logic
What if you want to build precise, hierarchical query filters.? You can group and layer conditions with parentheses—mixing AND
, OR
, and NOT
—to do so.
-- Multi-genre requirements with rating tiers SELECT * FROM movies_kb WHERE (content LIKE 'detective mystery investigation' AND (genre = 'Mystery' OR expanded_genres LIKE '%Thriller%')) OR (content LIKE 'romance love relationship' AND (genre = 'Romance' OR expanded_genres LIKE '%Romance%')) AND rating > 7.0;

IN/NOT IN Operations
This lets you quickly filter rows by checking whether a value matches (or excludes) any item in a specified list or subquery.
-- Adventure movies excluding family-friendly genres SELECT * FROM movies_kb WHERE content LIKE 'adventure journey quest treasure' AND genre NOT IN ('Horror', 'Romance', 'Family') AND rating > 6.5;

BETWEEN and Range Queries
You can filter results by selecting values that fall within a specified range, such as dates, numbers, or text intervals.
-- Comedy movies in specific rating range SELECT * FROM movies_kb WHERE content LIKE 'comedy funny humor laugh' AND rating BETWEEN 7.0 AND 9.0 AND expanded_genres LIKE '%Comedy%';

Technical Architecture: How It Works
MindsDB's KB algebra system operates through a sophisticated three-step process:
Step 1: Query Optimization
Converts complex NOT operations into efficient positive searches
Flattens nested condition trees for optimal execution
Separates content conditions from metadata filters
Step 2: SQL Algebra Execution
Content conditions: Processed using vector similarity search with configurable relevance thresholds
Metadata conditions: Handled by traditional database operations
Negative conditions: Converted to exclusion lists using sample-based filtering
Step 3: Post-Processing
Results combined using UNION (OR operations) or INTERSECT (AND operations)
Advanced SQL features (GROUP BY, ORDER BY) \
Type casting and data normalization handled automatically

Advanced Features
MindsDB's SQL Algebra has advanced features that allows for more in-depth querying.
Relevance Control
Rank and prioritize query results by relevance scores, ensuring the most contextually meaningful records appear first.
-- Fine-tune semantic matching precision SELECT * FROM movies_kb WHERE content LIKE 'complex philosophical deep meaning' AND relevance >= 0.6 -- Higher threshold = stricter matching AND reranking = false AND rating > 7.5;

NULL Handling
You can manage missing or undefined values using operators like IS NULL
and IS NOT NULL
, ensuring queries return accurate and predictable results.
-- Robust null value processing SELECT * FROM movies_kb WHERE content LIKE 'drama story character' AND rating IS NOT NULL AND genre IS NOT NULL;

Era-Based Analysis
Group and filter data by defined time periods (such as decades, quarters, or eras), making it easier to spot trends and patterns over time:
-- Exclude specific decades SELECT * FROM movies_kb WHERE content LIKE 'love story romance heart' AND genre = 'Romance' AND _original_doc_id NOT LIKE '% - 195%' -- Not 1950s AND _original_doc_id NOT LIKE '% - 196%' -- Not 1960s AND rating > 7.5;

Performance and Scalability
The system handles our 186,000 movie dataset with impressive performance:
Complex queries with multiple content conditions execute in seconds
Relevance filtering prevents irrelevant results from cluttering responses
Configurable reranking balances speed vs. accuracy
Vector database optimizations ensure scalability to millions of records
Real-World Applications
From personalized content discovery to faster legal research and sharper e-commerce search, Knowledge Bases with SQL Algebra unlock practical, everyday use cases that deliver measurable value. Here are a few examples:
Content Discovery: Power recommendation engines that deliver 30% more relevant results, e.g., “Find action movies like John Wick but without excessive violence” or “Sci-fi with romantic elements.”
Document Management: Cut research time in half by filtering precisely, e.g., “Find contracts mentioning IP but not patents” or “AI papers excluding deep learning focus.”
E-commerce Search: Boost product findability by 25% with nuanced queries, e.g., “Comfortable running shoes but not for marathons” or “Kitchen appliances under $200 but not blenders.”
Customer Support: Reduce ticket resolution time by 40% by targeting issues, e.g., “Product complaints about quality but not shipping” or “Feature requests for mobile but not related to payments.”
The Future of Data Querying
MindsDB's KB algebra represents a paradigm shift in how we interact with unstructured data. By making semantic search as accessible as traditional SQL, it democratizes AI-powered data exploration. Whether you're building recommendation systems, content discovery platforms, or intelligent search applications, MindsDB Knowledge Bases provide the perfect foundation.
The combination of natural language content search with precise metadata filtering opens up possibilities that were previously complex to implement and expensive to scale. With MindsDB, the future of intelligent data querying is here today. Contact our team if you'd like to see MindsDB in action.
MindsDB has unleashed a game-changing feature that bridges the gap between traditional SQL databases and modern AI-powered semantic search. The new Knowledge Base Algebra system transforms how we query unstructured data, making it as intuitive as working with traditional databases while leveraging the power of vector embeddings and LLMs.
What Makes MindsDB Knowledge Bases Special?
Imagine this: you’re sitting on mountains of data—support tickets, research papers, contracts, product reviews. Some of it is neat and structured (tables, ratings, categories). But most of it? Messy paragraphs, sprawling PDFs, emails, or notes.
You know the insights are there… but how do you ask the right questions and actually get meaningful answers?
The Struggle with Today’s Tools
Databases are powerful for structured data, but they don’t understand natural language.
Vector databases capture “semantic meaning,” but they can’t filter with precision.
LLMs are smart, but without guardrails they guess, hallucinate, or miss key context.
That leaves you with two bad options: either wrangle SQL for rigid results or trust an AI that can’t be trusted.
Sound familiar?
Traditional databases excel at structured data queries, while vector databases handle semantic similarity search. MindsDB Knowledge Bases combine both worlds, allowing you to run complex SQL queries that seamlessly blend:
Semantic content search using vector embeddings
Traditional metadata filtering using exact matches and ranges
Complex boolean logic with AND, OR, and NOT operations
All of this in one query, one system, no extra infrastructure.

Real-World Example: 186,000 Movie Database
Let's explore these capabilities using a real dataset of 186,000 movies, where each record contains:
id | content | metadata |
---|---|---|
movie_id | A bank security expert plots with a call girl to rob three safety deposit boxes... | {"genre":"Crime","rating":6.3,"expanded_genres":"Comedy, Crime, Drama"} |
Normally, searching it for something nuanced would be impossible without hours of filtering. With MindsDB Knowledge Bases and its SQL Algebra, you can seamlessly query this data.
Start with accessing MindsDB, you can do so by installing it locally via Docker, MindsDB’s Extension on Docker Desktop.
The next step would be to connect your database with the CREATE DATABASE statement.
CREATE DATABASE heroku WITH ENGINE = 'pgvector', PARAMETERS = { "host": "host.xxxxx", "port": 5432, "database": "db", "user": "user", "password": "xxxx" };
Once your data is connected, the Knowledge Base can be created with the CREATE KNOWLEDGE BASE statement.
CREATE KNOWLEDGE_BASE movies_kb USING embedding_model = { 'provider': 'openai', 'model_name' : 'text-embedding-3-small', 'api_key': 'sk-xxx' }, reranking_model = { "provider": "openai", "model_name": "gpt-4o", "api_key": "sk-xxx" }, storage=heroku.movies_kb, metadata_columns = ['genre', 'expanded_genres', 'rating'], content_columns = ['content'], id_column = 'movie_id';
The data in the connected database can now be inserted into your Knowledge Base using the INSERT INTO statement.
INSERT INTO movies_kb SELECT movie_id, genre, expanded_genres, rating, content FROM heroku.movies_kb;
You can select the Knowledge Base to see if the data was successfully embedded.
SELECT * FROM movies_kb

Now that you have created a Knowledge Base, you can now explore how to query it with SQL Algebra.
Revolutionary Query Capabilities
There are various ways that we can query our Knowledge Base based on the type of information you need. Let’s look at how simple SQL Algebra queries can obtain information we require from our Knowledge Base.
1. Hybrid Content + Metadata Search
You can combine semantic search of unstructured content with precise metadata filters (like dates, tags, or categories) to deliver more accurate, context-rich results. Let’s find movies that match semantic content AND specific criteria:
-- High-rated action movies with heist themes (no romance) SELECT * FROM movies_kb WHERE content LIKE 'heist bank robbery space alien planet' AND genre != 'Romance' AND expanded_genres NOT LIKE '%Romance%' AND rating > 7.0;
This query demonstrates the power of hybrid search:
content LIKE
performs semantic similarity searchgenre !=
andNOT LIKE
provide precise exclusionsrating >
filters by numerical metadata

2. Complex Boolean Logic with Content
This enables you to use operators like AND on semantic search results, giving you fine-grained control to combine or exclude specific concepts in your queries:
-- Action-comedies with car chase scenes SELECT * FROM movies_kb WHERE content LIKE 'car chase driving speed race' AND expanded_genres LIKE '%Action%' AND expanded_genres LIKE '%Comedy%' AND rating > 6.5;

3. Advanced Pattern Exclusion
You can filter out rows matching specific string patterns, using NOT LIKE
, so your queries return only the data that truly fits your criteria.
-- Historical dramas without war themes SELECT * FROM movies_kb WHERE content LIKE 'historical period past century era' AND expanded_genres LIKE '%Drama%' AND content NOT LIKE 'war battle soldier military' AND content NOT LIKE 'fight combat weapon' AND rating > 3.5;
4. Multi-Condition Content Filtering
This allows you to apply multiple conditions with AND
/OR
across columns or values, so you can narrow results to only those records that meet several content-based criteria at once.
-- Positive-themed movies only SELECT * FROM movies_kb WHERE content LIKE 'hope inspiration success achievement dreams' AND content NOT LIKE 'death murder crime violence' AND content NOT LIKE 'sad tragedy loss depression' AND rating > 7.0;

Advanced SQL Operations
Let's look at more advanced SQL Operations.
UNION for Thematic Combinations
You can merge results from multiple queries into a single set, letting you combine different thematic datasets or search conditions into one unified output.
-- Combine different thriller subgenres SELECT * FROM movies_kb WHERE content LIKE 'detective investigation mystery' AND rating > 7.0 UNION SELECT * FROM movies_kb WHERE content LIKE 'heist robbery theft steal' AND rating > 7.0 UNION SELECT * FROM movies_kb WHERE content LIKE 'spy secret agent undercover' AND rating > 7.0;

Complex Nested Logic
What if you want to build precise, hierarchical query filters.? You can group and layer conditions with parentheses—mixing AND
, OR
, and NOT
—to do so.
-- Multi-genre requirements with rating tiers SELECT * FROM movies_kb WHERE (content LIKE 'detective mystery investigation' AND (genre = 'Mystery' OR expanded_genres LIKE '%Thriller%')) OR (content LIKE 'romance love relationship' AND (genre = 'Romance' OR expanded_genres LIKE '%Romance%')) AND rating > 7.0;

IN/NOT IN Operations
This lets you quickly filter rows by checking whether a value matches (or excludes) any item in a specified list or subquery.
-- Adventure movies excluding family-friendly genres SELECT * FROM movies_kb WHERE content LIKE 'adventure journey quest treasure' AND genre NOT IN ('Horror', 'Romance', 'Family') AND rating > 6.5;

BETWEEN and Range Queries
You can filter results by selecting values that fall within a specified range, such as dates, numbers, or text intervals.
-- Comedy movies in specific rating range SELECT * FROM movies_kb WHERE content LIKE 'comedy funny humor laugh' AND rating BETWEEN 7.0 AND 9.0 AND expanded_genres LIKE '%Comedy%';

Technical Architecture: How It Works
MindsDB's KB algebra system operates through a sophisticated three-step process:
Step 1: Query Optimization
Converts complex NOT operations into efficient positive searches
Flattens nested condition trees for optimal execution
Separates content conditions from metadata filters
Step 2: SQL Algebra Execution
Content conditions: Processed using vector similarity search with configurable relevance thresholds
Metadata conditions: Handled by traditional database operations
Negative conditions: Converted to exclusion lists using sample-based filtering
Step 3: Post-Processing
Results combined using UNION (OR operations) or INTERSECT (AND operations)
Advanced SQL features (GROUP BY, ORDER BY) \
Type casting and data normalization handled automatically

Advanced Features
MindsDB's SQL Algebra has advanced features that allows for more in-depth querying.
Relevance Control
Rank and prioritize query results by relevance scores, ensuring the most contextually meaningful records appear first.
-- Fine-tune semantic matching precision SELECT * FROM movies_kb WHERE content LIKE 'complex philosophical deep meaning' AND relevance >= 0.6 -- Higher threshold = stricter matching AND reranking = false AND rating > 7.5;

NULL Handling
You can manage missing or undefined values using operators like IS NULL
and IS NOT NULL
, ensuring queries return accurate and predictable results.
-- Robust null value processing SELECT * FROM movies_kb WHERE content LIKE 'drama story character' AND rating IS NOT NULL AND genre IS NOT NULL;

Era-Based Analysis
Group and filter data by defined time periods (such as decades, quarters, or eras), making it easier to spot trends and patterns over time:
-- Exclude specific decades SELECT * FROM movies_kb WHERE content LIKE 'love story romance heart' AND genre = 'Romance' AND _original_doc_id NOT LIKE '% - 195%' -- Not 1950s AND _original_doc_id NOT LIKE '% - 196%' -- Not 1960s AND rating > 7.5;

Performance and Scalability
The system handles our 186,000 movie dataset with impressive performance:
Complex queries with multiple content conditions execute in seconds
Relevance filtering prevents irrelevant results from cluttering responses
Configurable reranking balances speed vs. accuracy
Vector database optimizations ensure scalability to millions of records
Real-World Applications
From personalized content discovery to faster legal research and sharper e-commerce search, Knowledge Bases with SQL Algebra unlock practical, everyday use cases that deliver measurable value. Here are a few examples:
Content Discovery: Power recommendation engines that deliver 30% more relevant results, e.g., “Find action movies like John Wick but without excessive violence” or “Sci-fi with romantic elements.”
Document Management: Cut research time in half by filtering precisely, e.g., “Find contracts mentioning IP but not patents” or “AI papers excluding deep learning focus.”
E-commerce Search: Boost product findability by 25% with nuanced queries, e.g., “Comfortable running shoes but not for marathons” or “Kitchen appliances under $200 but not blenders.”
Customer Support: Reduce ticket resolution time by 40% by targeting issues, e.g., “Product complaints about quality but not shipping” or “Feature requests for mobile but not related to payments.”
The Future of Data Querying
MindsDB's KB algebra represents a paradigm shift in how we interact with unstructured data. By making semantic search as accessible as traditional SQL, it democratizes AI-powered data exploration. Whether you're building recommendation systems, content discovery platforms, or intelligent search applications, MindsDB Knowledge Bases provide the perfect foundation.
The combination of natural language content search with precise metadata filtering opens up possibilities that were previously complex to implement and expensive to scale. With MindsDB, the future of intelligent data querying is here today. Contact our team if you'd like to see MindsDB in action.
Start Building with MindsDB Today
Power your AI strategy with the leading AI data solution.
© 2025 All rights reserved by MindsDB.
Start Building with MindsDB Today
Power your AI strategy with the leading AI data solution.
© 2025 All rights reserved by MindsDB.
Start Building with MindsDB Today
Power your AI strategy with the leading AI data solution.
© 2025 All rights reserved by MindsDB.
Start Building with MindsDB Today
Power your AI strategy with the leading AI data solution.
© 2025 All rights reserved by MindsDB.