MindsDB Knowledge Bases: Revolutionizing AI-Powered Data Queries with SQL Algebra

MindsDB Knowledge Bases: Revolutionizing AI-Powered Data Queries with SQL Algebra

Alejandro Cantu, Senior Product Manager at MindsDB

Sep 25, 2025

MindsDB has unleashed a game-changing feature that bridges the gap between traditional SQL databases and modern AI-powered semantic search. The new Knowledge Base Algebra system transforms how we query unstructured data, making it as intuitive as working with traditional databases while leveraging the power of vector embeddings and LLMs.


What Makes MindsDB Knowledge Bases Special?

Imagine this: you’re sitting on mountains of data—support tickets, research papers, contracts, product reviews. Some of it is neat and structured (tables, ratings, categories). But most of it? Messy paragraphs, sprawling PDFs, emails, or notes.


You know the insights are there… but how do you ask the right questions and actually get meaningful answers?

The Struggle with Today’s Tools

  • Databases are powerful for structured data, but they don’t understand natural language.

  • Vector databases capture “semantic meaning,” but they can’t filter with precision.

  • LLMs are smart, but without guardrails they guess, hallucinate, or miss key context.


That leaves you with two bad options: either wrangle SQL for rigid results or trust an AI that can’t be trusted.


Sound familiar?


Traditional databases excel at structured data queries, while vector databases handle semantic similarity search. MindsDB Knowledge Bases combine both worlds, allowing you to run complex SQL queries that seamlessly blend:

  • Semantic content search using vector embeddings

  • Traditional metadata filtering using exact matches and ranges

  • Complex boolean logic with AND, OR, and NOT operations


All of this in one query, one system, no extra infrastructure.


Real-World Example: 186,000 Movie Database

Let's explore these capabilities using a real dataset of 186,000 movies, where each record contains:

id

content

metadata

 movie_id

A bank security expert plots with a call girl to rob three safety deposit boxes...

{"genre":"Crime","rating":6.3,"expanded_genres":"Comedy, Crime, Drama"}


Normally, searching it for something nuanced would be impossible without hours of filtering. With MindsDB Knowledge Bases and its SQL Algebra, you can seamlessly query this data.


Start with accessing MindsDB, you can do so by installing it locally via Docker, MindsDB’s Extension on Docker Desktop.


The next step would be to connect your database with the CREATE DATABASE statement.

CREATE DATABASE heroku
WITH
    ENGINE = 'pgvector',
    PARAMETERS = {
        "host": "host.xxxxx",
        "port": 5432,
        "database": "db",
        "user": "user",
        "password": "xxxx"
    };


Once your data is connected, the Knowledge Base can be created with the CREATE KNOWLEDGE BASE statement.

CREATE KNOWLEDGE_BASE movies_kb
    USING
        embedding_model = {
        'provider': 'openai',
        'model_name' : 'text-embedding-3-small',
        'api_key': 'sk-xxx'
        },
	reranking_model = {
        "provider": "openai",
        "model_name": "gpt-4o",
        "api_key": "sk-xxx"
	 },
        storage=heroku.movies_kb,
        metadata_columns = ['genre', 'expanded_genres', 'rating'],
        content_columns = ['content'],
        id_column = 'movie_id';


The data in the connected database can now be inserted into your Knowledge Base using the INSERT INTO statement.

INSERT INTO movies_kb
SELECT movie_id, genre, expanded_genres, rating, content
FROM heroku.movies_kb;


You can select the Knowledge Base to see if the data was successfully embedded.

SELECT * FROM movies_kb


Now that you have created a Knowledge Base, you can now explore how to query it with SQL Algebra.


Revolutionary Query Capabilities

There are various ways that we can query our Knowledge Base  based on the type of information you need. Let’s look at how simple SQL Algebra queries can obtain information we require from our Knowledge Base.


1. Hybrid Content + Metadata Search

You can  combine semantic search of unstructured content with precise metadata filters (like dates, tags, or categories) to deliver more accurate, context-rich results. Let’s find movies that match semantic content AND specific criteria:

-- High-rated action movies with heist themes (no romance)
SELECT * FROM movies_kb 
WHERE content LIKE 'heist bank robbery space alien planet'
  AND genre != 'Romance' 
  AND expanded_genres NOT LIKE '%Romance%'
  AND rating > 7.0;

This query demonstrates the power of hybrid search:

  • content LIKE performs semantic similarity search

  • genre != and NOT LIKE provide precise exclusions

  • rating > filters by numerical metadata


2. Complex Boolean Logic with Content

This enables you to use operators like AND on semantic search results, giving you fine-grained control to combine or exclude specific concepts in your queries:

-- Action-comedies with car chase scenes
SELECT * FROM movies_kb 
WHERE content LIKE 'car chase driving speed race'
  AND expanded_genres LIKE '%Action%'
  AND expanded_genres LIKE '%Comedy%'
  AND rating > 6.5;


3. Advanced Pattern Exclusion

You can filter out rows matching specific string patterns, using NOT LIKE, so your queries return only the data that truly fits your criteria.

-- Historical dramas without war themes
SELECT * FROM movies_kb 
WHERE content LIKE 'historical period past century era'
  AND expanded_genres LIKE '%Drama%'
  AND content NOT LIKE 'war battle soldier military'
  AND content NOT LIKE 'fight combat weapon'
  AND rating > 3.5;


4. Multi-Condition Content Filtering

This allows you to apply multiple conditions with AND/OR across columns or values, so you can narrow results to only those records that meet several content-based criteria at once.

-- Positive-themed movies only
SELECT * FROM movies_kb 
WHERE content LIKE 'hope inspiration success achievement dreams'
  AND content NOT LIKE 'death murder crime violence'
  AND content NOT LIKE 'sad tragedy loss depression'
  AND rating > 7.0;


Advanced SQL Operations

Let's look at more advanced SQL Operations.


UNION for Thematic Combinations

You can merge results from multiple queries into a single set, letting you combine different thematic datasets or search conditions into one unified output.

-- Combine different thriller subgenres
SELECT * FROM movies_kb 
WHERE content LIKE 'detective investigation mystery' AND rating > 7.0
UNION
SELECT * FROM movies_kb 
WHERE content LIKE 'heist robbery theft steal' AND rating > 7.0
UNION
SELECT * FROM movies_kb 
WHERE content LIKE 'spy secret agent undercover' AND rating > 7.0;


Complex Nested Logic

What if you want to build precise, hierarchical query filters.? You can group and layer conditions with parentheses—mixing AND, OR, and NOT—to do so.

-- Multi-genre requirements with rating tiers
SELECT * FROM movies_kb 
WHERE (content LIKE 'detective mystery investigation' 
       AND (genre = 'Mystery' OR expanded_genres LIKE '%Thriller%'))
   OR (content LIKE 'romance love relationship'
       AND (genre = 'Romance' OR expanded_genres LIKE '%Romance%'))
  AND rating > 7.0;


IN/NOT IN Operations

This lets you quickly filter rows by checking whether a value matches (or excludes) any item in a specified list or subquery.

-- Adventure movies excluding family-friendly genres
SELECT * FROM movies_kb 
WHERE content LIKE 'adventure journey quest treasure'
  AND genre NOT IN ('Horror', 'Romance', 'Family')
  AND rating > 6.5;


BETWEEN and Range Queries

You can  filter results by selecting values that fall within a specified range, such as dates, numbers, or text intervals.

-- Comedy movies in specific rating range
SELECT * FROM movies_kb 
WHERE content LIKE 'comedy funny humor laugh'
  AND rating BETWEEN 7.0 AND 9.0
  AND expanded_genres LIKE '%Comedy%';


Technical Architecture: How It Works

MindsDB's KB algebra system operates through a sophisticated three-step process:

Step 1: Query Optimization
  • Converts complex NOT operations into efficient positive searches

  • Flattens nested condition trees for optimal execution

  • Separates content conditions from metadata filters

Step 2: SQL Algebra Execution
  • Content conditions: Processed using vector similarity search with configurable relevance thresholds

  • Metadata conditions: Handled by traditional database operations

  • Negative conditions: Converted to exclusion lists using sample-based filtering

Step 3: Post-Processing
  • Results combined using UNION (OR operations) or INTERSECT (AND operations)

  • Advanced SQL features (GROUP BY, ORDER BY) \

  • Type casting and data normalization handled automatically


Advanced Features

MindsDB's SQL Algebra has advanced features that allows for more in-depth querying.


Relevance Control

Rank and prioritize query results by relevance scores, ensuring the most contextually meaningful records appear first.

-- Fine-tune semantic matching precision
SELECT * FROM movies_kb 
WHERE content LIKE 'complex philosophical deep meaning'
  AND relevance >= 0.6  -- Higher threshold = stricter matching
  AND reranking = false
  AND rating > 7.5;

NULL Handling

You can manage missing or undefined values using operators like IS NULL and IS NOT NULL, ensuring queries return accurate and predictable results.

-- Robust null value processing
SELECT * FROM movies_kb 
WHERE content LIKE 'drama story character'
  AND rating IS NOT NULL
  AND genre IS NOT NULL;


Era-Based Analysis

Group and filter data by defined time periods (such as decades, quarters, or eras), making it easier to spot trends and patterns over time:

-- Exclude specific decades
SELECT * FROM movies_kb 
WHERE content LIKE 'love story romance heart'
  AND genre = 'Romance'
  AND _original_doc_id NOT LIKE '% - 195%'  -- Not 1950s
  AND _original_doc_id NOT LIKE '% - 196%'  -- Not 1960s
  AND rating > 7.5;


Performance and Scalability

The system handles our 186,000 movie dataset with impressive performance:

  • Complex queries with multiple content conditions execute in seconds

  • Relevance filtering prevents irrelevant results from cluttering responses

  • Configurable reranking balances speed vs. accuracy

  • Vector database optimizations ensure scalability to millions of records


Real-World Applications

From personalized content discovery to faster legal research and sharper e-commerce search, Knowledge Bases with SQL Algebra unlock practical, everyday use cases that deliver measurable value. Here are a few examples:

  1. Content Discovery: Power recommendation engines that deliver 30% more relevant results, e.g., “Find action movies like John Wick but without excessive violence” or “Sci-fi with romantic elements.”

  2. Document Management: Cut research time in half by filtering precisely, e.g., “Find contracts mentioning IP but not patents” or “AI papers excluding deep learning focus.”

  3. E-commerce Search: Boost product findability by 25% with nuanced queries, e.g., “Comfortable running shoes but not for marathons” or “Kitchen appliances under $200 but not blenders.”

  4. Customer Support: Reduce ticket resolution time by 40% by targeting issues, e.g., “Product complaints about quality but not shipping” or “Feature requests for mobile but not related to payments.”


The Future of Data Querying

MindsDB's KB algebra represents a paradigm shift in how we interact with unstructured data. By making semantic search as accessible as traditional SQL, it democratizes AI-powered data exploration. Whether you're building recommendation systems, content discovery platforms, or intelligent search applications, MindsDB Knowledge Bases provide the perfect foundation.


The combination of natural language content search with precise metadata filtering opens up possibilities that were previously complex to implement and expensive to scale. With MindsDB, the future of intelligent data querying is here today. Contact our team if you'd like to see MindsDB in action.

MindsDB has unleashed a game-changing feature that bridges the gap between traditional SQL databases and modern AI-powered semantic search. The new Knowledge Base Algebra system transforms how we query unstructured data, making it as intuitive as working with traditional databases while leveraging the power of vector embeddings and LLMs.


What Makes MindsDB Knowledge Bases Special?

Imagine this: you’re sitting on mountains of data—support tickets, research papers, contracts, product reviews. Some of it is neat and structured (tables, ratings, categories). But most of it? Messy paragraphs, sprawling PDFs, emails, or notes.


You know the insights are there… but how do you ask the right questions and actually get meaningful answers?

The Struggle with Today’s Tools

  • Databases are powerful for structured data, but they don’t understand natural language.

  • Vector databases capture “semantic meaning,” but they can’t filter with precision.

  • LLMs are smart, but without guardrails they guess, hallucinate, or miss key context.


That leaves you with two bad options: either wrangle SQL for rigid results or trust an AI that can’t be trusted.


Sound familiar?


Traditional databases excel at structured data queries, while vector databases handle semantic similarity search. MindsDB Knowledge Bases combine both worlds, allowing you to run complex SQL queries that seamlessly blend:

  • Semantic content search using vector embeddings

  • Traditional metadata filtering using exact matches and ranges

  • Complex boolean logic with AND, OR, and NOT operations


All of this in one query, one system, no extra infrastructure.


Real-World Example: 186,000 Movie Database

Let's explore these capabilities using a real dataset of 186,000 movies, where each record contains:

id

content

metadata

 movie_id

A bank security expert plots with a call girl to rob three safety deposit boxes...

{"genre":"Crime","rating":6.3,"expanded_genres":"Comedy, Crime, Drama"}


Normally, searching it for something nuanced would be impossible without hours of filtering. With MindsDB Knowledge Bases and its SQL Algebra, you can seamlessly query this data.


Start with accessing MindsDB, you can do so by installing it locally via Docker, MindsDB’s Extension on Docker Desktop.


The next step would be to connect your database with the CREATE DATABASE statement.

CREATE DATABASE heroku
WITH
    ENGINE = 'pgvector',
    PARAMETERS = {
        "host": "host.xxxxx",
        "port": 5432,
        "database": "db",
        "user": "user",
        "password": "xxxx"
    };


Once your data is connected, the Knowledge Base can be created with the CREATE KNOWLEDGE BASE statement.

CREATE KNOWLEDGE_BASE movies_kb
    USING
        embedding_model = {
        'provider': 'openai',
        'model_name' : 'text-embedding-3-small',
        'api_key': 'sk-xxx'
        },
	reranking_model = {
        "provider": "openai",
        "model_name": "gpt-4o",
        "api_key": "sk-xxx"
	 },
        storage=heroku.movies_kb,
        metadata_columns = ['genre', 'expanded_genres', 'rating'],
        content_columns = ['content'],
        id_column = 'movie_id';


The data in the connected database can now be inserted into your Knowledge Base using the INSERT INTO statement.

INSERT INTO movies_kb
SELECT movie_id, genre, expanded_genres, rating, content
FROM heroku.movies_kb;


You can select the Knowledge Base to see if the data was successfully embedded.

SELECT * FROM movies_kb


Now that you have created a Knowledge Base, you can now explore how to query it with SQL Algebra.


Revolutionary Query Capabilities

There are various ways that we can query our Knowledge Base  based on the type of information you need. Let’s look at how simple SQL Algebra queries can obtain information we require from our Knowledge Base.


1. Hybrid Content + Metadata Search

You can  combine semantic search of unstructured content with precise metadata filters (like dates, tags, or categories) to deliver more accurate, context-rich results. Let’s find movies that match semantic content AND specific criteria:

-- High-rated action movies with heist themes (no romance)
SELECT * FROM movies_kb 
WHERE content LIKE 'heist bank robbery space alien planet'
  AND genre != 'Romance' 
  AND expanded_genres NOT LIKE '%Romance%'
  AND rating > 7.0;

This query demonstrates the power of hybrid search:

  • content LIKE performs semantic similarity search

  • genre != and NOT LIKE provide precise exclusions

  • rating > filters by numerical metadata


2. Complex Boolean Logic with Content

This enables you to use operators like AND on semantic search results, giving you fine-grained control to combine or exclude specific concepts in your queries:

-- Action-comedies with car chase scenes
SELECT * FROM movies_kb 
WHERE content LIKE 'car chase driving speed race'
  AND expanded_genres LIKE '%Action%'
  AND expanded_genres LIKE '%Comedy%'
  AND rating > 6.5;


3. Advanced Pattern Exclusion

You can filter out rows matching specific string patterns, using NOT LIKE, so your queries return only the data that truly fits your criteria.

-- Historical dramas without war themes
SELECT * FROM movies_kb 
WHERE content LIKE 'historical period past century era'
  AND expanded_genres LIKE '%Drama%'
  AND content NOT LIKE 'war battle soldier military'
  AND content NOT LIKE 'fight combat weapon'
  AND rating > 3.5;


4. Multi-Condition Content Filtering

This allows you to apply multiple conditions with AND/OR across columns or values, so you can narrow results to only those records that meet several content-based criteria at once.

-- Positive-themed movies only
SELECT * FROM movies_kb 
WHERE content LIKE 'hope inspiration success achievement dreams'
  AND content NOT LIKE 'death murder crime violence'
  AND content NOT LIKE 'sad tragedy loss depression'
  AND rating > 7.0;


Advanced SQL Operations

Let's look at more advanced SQL Operations.


UNION for Thematic Combinations

You can merge results from multiple queries into a single set, letting you combine different thematic datasets or search conditions into one unified output.

-- Combine different thriller subgenres
SELECT * FROM movies_kb 
WHERE content LIKE 'detective investigation mystery' AND rating > 7.0
UNION
SELECT * FROM movies_kb 
WHERE content LIKE 'heist robbery theft steal' AND rating > 7.0
UNION
SELECT * FROM movies_kb 
WHERE content LIKE 'spy secret agent undercover' AND rating > 7.0;


Complex Nested Logic

What if you want to build precise, hierarchical query filters.? You can group and layer conditions with parentheses—mixing AND, OR, and NOT—to do so.

-- Multi-genre requirements with rating tiers
SELECT * FROM movies_kb 
WHERE (content LIKE 'detective mystery investigation' 
       AND (genre = 'Mystery' OR expanded_genres LIKE '%Thriller%'))
   OR (content LIKE 'romance love relationship'
       AND (genre = 'Romance' OR expanded_genres LIKE '%Romance%'))
  AND rating > 7.0;


IN/NOT IN Operations

This lets you quickly filter rows by checking whether a value matches (or excludes) any item in a specified list or subquery.

-- Adventure movies excluding family-friendly genres
SELECT * FROM movies_kb 
WHERE content LIKE 'adventure journey quest treasure'
  AND genre NOT IN ('Horror', 'Romance', 'Family')
  AND rating > 6.5;


BETWEEN and Range Queries

You can  filter results by selecting values that fall within a specified range, such as dates, numbers, or text intervals.

-- Comedy movies in specific rating range
SELECT * FROM movies_kb 
WHERE content LIKE 'comedy funny humor laugh'
  AND rating BETWEEN 7.0 AND 9.0
  AND expanded_genres LIKE '%Comedy%';


Technical Architecture: How It Works

MindsDB's KB algebra system operates through a sophisticated three-step process:

Step 1: Query Optimization
  • Converts complex NOT operations into efficient positive searches

  • Flattens nested condition trees for optimal execution

  • Separates content conditions from metadata filters

Step 2: SQL Algebra Execution
  • Content conditions: Processed using vector similarity search with configurable relevance thresholds

  • Metadata conditions: Handled by traditional database operations

  • Negative conditions: Converted to exclusion lists using sample-based filtering

Step 3: Post-Processing
  • Results combined using UNION (OR operations) or INTERSECT (AND operations)

  • Advanced SQL features (GROUP BY, ORDER BY) \

  • Type casting and data normalization handled automatically


Advanced Features

MindsDB's SQL Algebra has advanced features that allows for more in-depth querying.


Relevance Control

Rank and prioritize query results by relevance scores, ensuring the most contextually meaningful records appear first.

-- Fine-tune semantic matching precision
SELECT * FROM movies_kb 
WHERE content LIKE 'complex philosophical deep meaning'
  AND relevance >= 0.6  -- Higher threshold = stricter matching
  AND reranking = false
  AND rating > 7.5;

NULL Handling

You can manage missing or undefined values using operators like IS NULL and IS NOT NULL, ensuring queries return accurate and predictable results.

-- Robust null value processing
SELECT * FROM movies_kb 
WHERE content LIKE 'drama story character'
  AND rating IS NOT NULL
  AND genre IS NOT NULL;


Era-Based Analysis

Group and filter data by defined time periods (such as decades, quarters, or eras), making it easier to spot trends and patterns over time:

-- Exclude specific decades
SELECT * FROM movies_kb 
WHERE content LIKE 'love story romance heart'
  AND genre = 'Romance'
  AND _original_doc_id NOT LIKE '% - 195%'  -- Not 1950s
  AND _original_doc_id NOT LIKE '% - 196%'  -- Not 1960s
  AND rating > 7.5;


Performance and Scalability

The system handles our 186,000 movie dataset with impressive performance:

  • Complex queries with multiple content conditions execute in seconds

  • Relevance filtering prevents irrelevant results from cluttering responses

  • Configurable reranking balances speed vs. accuracy

  • Vector database optimizations ensure scalability to millions of records


Real-World Applications

From personalized content discovery to faster legal research and sharper e-commerce search, Knowledge Bases with SQL Algebra unlock practical, everyday use cases that deliver measurable value. Here are a few examples:

  1. Content Discovery: Power recommendation engines that deliver 30% more relevant results, e.g., “Find action movies like John Wick but without excessive violence” or “Sci-fi with romantic elements.”

  2. Document Management: Cut research time in half by filtering precisely, e.g., “Find contracts mentioning IP but not patents” or “AI papers excluding deep learning focus.”

  3. E-commerce Search: Boost product findability by 25% with nuanced queries, e.g., “Comfortable running shoes but not for marathons” or “Kitchen appliances under $200 but not blenders.”

  4. Customer Support: Reduce ticket resolution time by 40% by targeting issues, e.g., “Product complaints about quality but not shipping” or “Feature requests for mobile but not related to payments.”


The Future of Data Querying

MindsDB's KB algebra represents a paradigm shift in how we interact with unstructured data. By making semantic search as accessible as traditional SQL, it democratizes AI-powered data exploration. Whether you're building recommendation systems, content discovery platforms, or intelligent search applications, MindsDB Knowledge Bases provide the perfect foundation.


The combination of natural language content search with precise metadata filtering opens up possibilities that were previously complex to implement and expensive to scale. With MindsDB, the future of intelligent data querying is here today. Contact our team if you'd like to see MindsDB in action.

Start Building with MindsDB Today

Power your AI strategy with the leading AI data solution.

© 2025 All rights reserved by MindsDB.

Start Building with MindsDB Today

Power your AI strategy with the leading AI data solution.

© 2025 All rights reserved by MindsDB.

Start Building with MindsDB Today

Power your AI strategy with the leading AI data solution.

© 2025 All rights reserved by MindsDB.

Start Building with MindsDB Today

Power your AI strategy with the leading AI data solution.

© 2025 All rights reserved by MindsDB.