The Future of Data-Native & Grounded AI with MindsDB: Moving Data to Models No Longer Makes Sense

Chandre Van Der Westhuizen, Community & Marketing Co-ordinator at MindsDB

Oct 9, 2025

What do complex ETL and slow decision-making have in common? Without a proper AI implementation, both result in delayed access to critical insights, leading to missed opportunities, reduced agility, and lost revenue.

The default strategy for AI and analytics has been to move data to the model—centralizing it in warehouses, cleaning it through complex ETL pipelines, and then running models in isolation. But in an era where data is distributed, compliance is strict, and real-time insights are a necessity, this old approach is breaking down.

Enter data-native AI: a new paradigm where models go to the data. Rather than duplicating, transforming, and moving data across systems, data-native AI lets models interact with information directly where it lives.

As AI becomes vital for decision-making, a fundamental question keeps coming up: Can we trust the answers our AI gives us?

The answer hinges on a critical principle: grounding.

Grounded AI refers to models and agents that base their outputs on real, verifiable, and up-to-date data. Without this foundation, even the most advanced language models can hallucinate, mislead, or provide contextually irrelevant responses. And in high-stakes environments like finance, retail, or enterprise operations, that’s simply unacceptable.

The Case Against Moving Data

Moving data might seem like a necessary step in modern data workflows—but it’s often more harmful than helpful. Before you spin up another ETL pipeline or copy data into yet another system, consider the real costs. Here’s why relocating data can do more damage than good:

1. Latency and Freshness: By the time data is moved, cleaned, and transformed, it’s often stale. Real-time decision-making suffers as a result.

2. Cost and Complexity: ETL pipelines, data lakes, and duplicated storage come with steep infrastructure and maintenance costs.

3. Security and Compliance: Copying data increases the risk of exposure and makes it harder to enforce data governance policies. With regulations like GDPR and HIPAA, moving sensitive data unnecessarily is a liability.

4. Siloed Context: Context is often lost when data is stripped from its original environment and schema. This weakens model accuracy and relevance.

The Trust Gap in Traditional AI

Large language models are powerful, but they weren’t designed to know your business. They don’t understand the nuances of your financial reports, CRM notes, or policy documentation—unless you give them that context.

Most traditional AI workflows rely on static snapshots of data, embeddings, or manually curated inputs. This introduces a lag between what your AI "knows" and what’s actually happening. The result?

Outdated answers
Missing critical changes
Reduced trust in AI recommendations

What Is Data-Native AI?

Data-native AI flips the model by allowing AI to interact with data directly at its source—whether that's a SQL database, an API, a document repository, or an enterprise SaaS system. Instead of forcing data into a model's format, the model adapts to the native environment of the data.

Key features include:

Federated query engines that enable real-time data access
Agent-based architectures that query across systems
Security boundaries respected by default
Reduced need for data replication or transformation
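
To make the first feature concrete, here is a rough sketch of what a federated query might look like in MindsDB SQL. It assumes two data sources are already connected under the hypothetical names orders_db and crm_db; the tables, columns, and date value are placeholders as well. The join is expressed once in the query engine while the rows stay in their original systems.

```sql
-- Hypothetical names: orders_db, crm_db, and their tables and columns are
-- placeholders for two data sources already connected to MindsDB.
SELECT o.order_id,
       o.order_total,
       c.account_owner
FROM orders_db.orders AS o
JOIN crm_db.accounts AS c
  ON o.customer_id = c.customer_id
WHERE o.order_date >= '2025-01-01'
LIMIT 10;
```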

Why Real-Time Access Matters

Real-time data access ensures that AI decisions are:

Contextual: Based on the latest state of your business
Verifiable: Easy to trace back to the exact source
Accurate: Reflecting current customer behavior, market changes, or system states

In regulated industries, it also means being able to show auditors where the AI got its answers and why they were reasonable at that time.

How MindsDB Delivers Grounded AI and the Data-Native AI Advantage

MindsDB is built from the ground up to support a data-native approach. It enables grounded AI by allowing agents and LLMs to query your live structured and unstructured data sources, from SQL and NoSQL databases to PDFs, APIs, and SaaS tools. It doesn’t stop there: you can also unify your data with MindsDB’s Knowledge Bases.

With MindsDB, you can:

Seamlessly query multiple live data sources in real time
Ensure LLM outputs are grounded in current, explainable information
Eliminate the overhead of ETL processes and avoid redundant data storage
Uphold security and compliance by keeping data within trusted environments
Enable federated, real-time access to both structured and unstructured data
Leverage configurable agent tools to define accessible data and operational constraints
Deliver transparent, auditable outputs with citations, source documents, and metadata

Instead of relying on a static knowledge base, MindsDB-powered agents interact directly with the source of truth.

Use Case: Solving the Problem for E-commerce Stores with Siloed Data

Let’s take an e-commerce store with siloed data as a use case. The goal is to unify the online store’s customer, sales, order, and product data to gain insights that inform business decisions.

Problem: Moving, transforming, and duplicating data for AI creates high infrastructure and maintenance costs. Traditional pipelines rely on periodic data syncs, leading to stale insights.

Solution: MindsDB enables in-place querying, eliminating the need for complex ETL processes and redundant storage. Knowledge Bases provide real-time access to structured and unstructured data—ensuring decisions are based on the latest available information.

Prerequisites:

Access MindsDB’s GUI via Docker locally or MindsDB’s extension on Docker Desktop.
Configure your default models in the MindsDB GUI by navigating to Settings → Models.
Add your data to MindsDB by creating a database connection or uploading your files.

First, connect the store’s Postgres database, where the data is stored in separate tables, to MindsDB using the Postgres integration and the CREATE DATABASE statement. This gives you real-time access to your data in MindsDB:

CREATE DATABASE postgresql_conn
WITH ENGINE = 'postgres',
PARAMETERS = {
   "user": "demo_user",
   "password": "demo_password",
   "host": "samples.mindsdb.com",
   "port": "5432",
   "database": "demo",
   "schema": "sample_data"
};
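
Once the connection exists, a quick sanity check is to list its tables and preview a few rows. This is a minimal sketch: the table name websales_sales is taken from the INSERT statement used later in this walkthrough, and the exact output depends on your MindsDB version and the sample dataset.

```sql
-- List the tables exposed through the connection.
SHOW TABLES FROM postgresql_conn;

-- Preview a few rows; the query runs against the live Postgres database.
SELECT *
FROM postgresql_conn.websales_sales
LIMIT 5;
```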

Next, create a connection between PGVector and MindsDB; it will serve as the storage for our Knowledge Bases:

CREATE DATABASE pvec
WITH ENGINE = 'pgvector',
PARAMETERS = {
   "host": "127.0.0.1",
   "port": 5432,
   "database": "postgres",
   "user": "user",
   "password": "password",
   "distance": "cosine"
};

Let’s create a Knowledge Base using the CREATE KNOWLEDGE_BASE statement:

-- Embeddings are stored in a table in the pvec (PGVector) connection created above.
CREATE KNOWLEDGE_BASE sales_kb
USING
  storage = pvec.sales_storage,
  metadata_columns = ['product_id', 'order_id', 'customer_id', 'order_date', 'ship_date'],
  content_columns = ['category', 'sub_category', 'product_name', 'ship_mode', 'customer_name', 'segment', 'country', 'city', 'region', 'sales', 'quantity', 'discount', 'profit'];

Here is a breakdown of the parameters:

sales_kb: The name of the Knowledge Base.
storage: The storage table where the embeddings of the Knowledge Base are stored.
metadata_columns: The columns used for metadata filtering.
content_columns: The columns used for semantic search.
The id_column has not been provided, so it will be generated from the hash of the content columns.
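
To confirm that the Knowledge Base was registered with the intended settings, it can be inspected before any data is inserted. A minimal sketch, assuming a recent MindsDB release that supports these statements; the columns returned may differ between versions.

```sql
-- Inspect the stored configuration of the Knowledge Base.
DESCRIBE KNOWLEDGE_BASE sales_kb;

-- List all Knowledge Bases in the current project.
SHOW KNOWLEDGE_BASES;
```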

Now that the Knowledge Base is created, we can insert data into it. The goal is to unify multiple tables in one Knowledge Base. To do so, we will join the tables in the INSERT INTO statement:

INSERT INTO sales_kb
SELECT a.product_id, a.order_id, a.customer_id, order_date, ship_date,
       category, sub_category, product_name, ship_mode,
       customer_name, segment, country, city, region,
       sales, quantity, discount, profit
FROM postgresql_conn.websales_sales AS a
LEFT JOIN postgresql_conn.websales_customers AS b ON a.customer_id = b.customer_id
INNER JOIN postgresql_conn.websales_orders AS c ON a.order_id = c.order_id
RIGHT JOIN postgresql_conn.websales_products AS d ON a.product_id = d.product_id;
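
After the insert completes, the Knowledge Base can also be queried directly, which is a handy way to check what the agent will later retrieve. The sketch below assumes the usual Knowledge Base query form, where a condition on content triggers semantic search and a condition on a metadata column narrows the results. The search phrases and the customer_id value are made up for illustration.

```sql
-- Semantic search over the content columns.
SELECT *
FROM sales_kb
WHERE content = 'office chairs shipped with standard delivery'
LIMIT 5;

-- Semantic search combined with a metadata filter.
-- 'CUST-001' is a hypothetical customer_id, not a value from the sample data.
SELECT *
FROM sales_kb
WHERE content = 'discounted technology products'
  AND customer_id = 'CUST-001'
LIMIT 5;
```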

Using the CREATE AGENT statement, you can create a MindsDB agent on top of the Knowledge Base and then query it to gain insights:

CREATE AGENT sales_agent
USING
   data = {
        "knowledge_bases": ["mindsdb.sales_kb"]
   },
   prompt_template='
       mindsdb.sales_kb stores data about sales, products sold via e-commerce, customer information and orders placed in the online store';

Here is a breakdown of the parameters:

sales_agent: The name provided to the agent
data: This parameter holds the data linked to the agent, including knowledge bases and data sources integrated with MindsDB.
- knowledge_bases: Stores the list of knowledge bases
prompt_template: This parameter stores instructions for the agent and a description of the data. It is recommended to describe the data sources listed in the knowledge_bases and tables parameters to help the agent locate relevant data for answering questions.
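
The tables parameter mentioned above is not used in this walkthrough. As a sketch of how it might be combined with a Knowledge Base, the variant below gives an agent access to one source table as well. The agent name sales_agent_with_tables is hypothetical, the description of websales_sales is an assumption based on its name, and the statement relies on the default model configured in the prerequisites.

```sql
-- Hypothetical variant: sales_agent_with_tables is an illustrative name.
CREATE AGENT sales_agent_with_tables
USING
   data = {
        "knowledge_bases": ["mindsdb.sales_kb"],
        "tables": ["postgresql_conn.websales_sales"]
   },
   prompt_template='
       mindsdb.sales_kb stores data about sales, products, customers and orders from the online store.
       postgresql_conn.websales_sales stores the individual sales records.';
```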

The agent is now ready to be queried, and you can gain insights into shipping, customers, products, sales, and profit.

Let’s start by asking the agent how many customers belong to a specific segment:

SELECT answer
FROM sales_agent
WHERE question = 'How many customers belong to a specific segment?';

This shows the size and value of each customer group, helping businesses tailor strategies, allocate resources, and measure growth.
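
Because the source tables stay queryable, an agent’s answer can be cross-checked against the data itself. Here is a sketch of such a check, assuming the segment column lives in the websales_customers table (the walkthrough does not say which table holds it):

```sql
-- Assumption: segment and customer_id are columns of websales_customers.
SELECT segment,
       COUNT(DISTINCT customer_id) AS customer_count
FROM postgresql_conn.websales_customers
GROUP BY segment
ORDER BY customer_count DESC;
```

If the counts differ from the agent’s answer, the direct query is the one to trust, since it reads from the live source table.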

You can ask which orders drive the most profit, enabling sales teams and executives to focus on replicating high-value transactions and refining pricing or strategy for maximum impact.

SELECT answer
FROM sales_agent
WHERE question = 'Can you show me the top 5 most profitable transactions';

You can query the agent to reveal which products deliver the greatest profitability, guiding smarter decisions on pricing, promotions, and resource prioritization.

SELECT answer
FROM sales_agent
WHERE question = 'Which products have the highest profit margins?';

The agent can also help track logistics efficiency, uncover customer preferences, and manage shipping costs for better supply chain decisions.

SELECT answer
FROM sales_agent
WHERE question = 'How many orders were shipped using a specific shipping mode';

Let’s highlight overall performance patterns, which helps businesses understand growth, seasonality, and shifts in customer demand.

SELECT answer
FROM sales_agent
WHERE question = 'What are the sales trends over the past year?';

Let’s reveal the trade-off between higher sales volume and reduced profit margins, which can guide smarter discounting and pricing strategies.

SELECT answer
FROM sales_agent
WHERE question = 'How does the discount offered affect the sales and profit?';

Real-World Impact

In industries like finance, energy, retail, and enterprise software, MindsDB unlocks:

Faster insights: Real-time portfolio analysis or fraud detection
Smarter automation: AI that reacts to operational data without lag
More trust: Decisions backed by up-to-date, in-context data

Further use cases include:

Customer support: Agents reference current policy documents and CRM history to give personalized, accurate answers
Financial modeling: AI uses real-time transaction data and risk metrics to suggest actions
Compliance: Agents flag violations based on up-to-date regulatory documents and behavior logs

Final Thoughts

It used to make sense to move data to models—back when computing was scarce and data was centralized. But today, data is everywhere and speed matters. The future belongs to AI systems that go to the data, enabling real-time access without the delays of duplication or transfer. To stay competitive, AI must adapt to the data—not the other way around.

Data-native and grounded AI is faster, more secure, and better suited for the dynamic, distributed environments of modern enterprises.

With MindsDB, that future is already here. Contact our team to see our solution in action.
