Drive Compliance & Customer Insights with MindsDB Knowledge Bases + Hybrid Search

Chandre Van Der Westhuizen, Community & Marketing Co-ordinator at MindsDB

Sep 11, 2025

Enterprise software vendors operate in a demanding environment: customers expect personalized, data-driven experiences, while regulators and enterprise buyers require rigorous compliance, auditability, and security. Meanwhile, compliance teams sift through policies, contracts, and audits as product and customer success teams seek insights from records and feedback—yet these sources often sit in silos, making a single, trustworthy view hard to achieve.

MindsDB Knowledge Bases with Hybrid Search resolve this tension by unifying unstructured compliance documents with structured customer data, enabling secure, intelligent, and transparent AI features directly inside your platform. The result is a trusted, real-time view that preserves both agility and control—accelerating decisions, reducing manual review, and powering customer experiences that meet enterprise-grade standards.

You can use MindsDB both in the AI services you offer to your customers and within your own organization.

What Are Knowledge Bases in MindsDB?

MindsDB’s Knowledge Bases act like AI-native indexes of your enterprise data. They combine:

Semantic Search (AI + vector embeddings): Understands meaning and intent, not just keywords.
SQL Logic & Metadata Filters: Exact matching on fields like dates, IDs, sentiment scores, or regulatory categories.
Hybrid Search: Blends semantic similarity with keyword search to give the most relevant, compliant, and context-aware results.

In practice, this means you can index customer and compliance data together (contracts, policies, customer tickets, and product feedback) in one layer, and query it with the precision of SQL and the flexibility of AI.
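One way to picture how Hybrid Search blends semantic similarity with keyword matching is as a weighted average of two scores. The sketch below is illustrative only: the function names, the example scores, and the choice that a higher alpha favors the semantic side are assumptions, not MindsDB’s internal implementation. The queries later in this post set a hybrid_search_alpha value that plays this balancing role.

```python
def hybrid_score(semantic: float, keyword: float, alpha: float = 0.5) -> float:
    """Weighted blend of two scores in [0, 1].

    Assumption for illustration: alpha = 1 uses only the semantic score,
    alpha = 0 uses only the keyword score.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * semantic + (1.0 - alpha) * keyword


def rank(docs: dict, alpha: float) -> list:
    """Return document names ordered from highest to lowest blended score."""
    return sorted(
        docs,
        key=lambda name: hybrid_score(
            docs[name]["semantic"], docs[name]["keyword"], alpha
        ),
        reverse=True,
    )


# Hypothetical scores: the first document matches the meaning of the query,
# the second matches its exact keywords.
docs = {
    "gdpr_retention_policy": {"semantic": 0.82, "keyword": 0.40},
    "vendor_contract_2025": {"semantic": 0.35, "keyword": 0.90},
}

print(rank(docs, alpha=0.8))  # semantic-heavy blend
print(rank(docs, alpha=0.2))  # keyword-heavy blend
```

With these made-up scores, the semantic-heavy blend ranks the retention policy first, while the keyword-heavy blend ranks the contract first.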

Why Hybrid Search Matters for Enterprise Software Vendors

You can blend semantic understanding with SQL precision to deliver trusted, compliant insights inside your product, whether it is used by your customers or by your own company.

Embedded Compliance: Provide enterprise customers with built-in tools to check requirements.
Example: “Show all contracts mentioning GDPR with renewal dates in Q4 2025.”
Customer Intelligence at Scale: Unlock insights from support tickets and feedback that would otherwise remain buried.
Example: “Find all EU enterprise accounts with recurring login complaints.”
Trusted AI Features: Ground AI outputs in real customer and compliance data, avoiding the “black box” effect.
Differentiation: Compete by offering AI-driven compliance and analytics capabilities natively inside your product.

Example Use Case: Unifying Compliance Documents & Customer Data and Performing Hybrid Search

A good way to see what you can build for your customers is to build it first within your own organization, using your company’s compliance documents and customer data. Here’s how an enterprise software vendor could deliver compliance-aware customer intelligence inside their platform:

Index Compliance Docs:
Create a Knowledge Base with contracts, audit trails, and regulatory documents.
- Metadata: jurisdiction, renewal date, risk level
- Content: document text
Index Customer Data:
Create another Knowledge Base for support tickets, product feedback, or CRM records.
- Metadata: customer ID, region, account tier
- Content: feedback text, support notes

We will use MindsDB’s Knowledge Bases to index this data and run Hybrid Search queries that show how you can deliver compliance-aware customer intelligence.

Start by accessing the MindsDB GUI locally, either via Docker or via MindsDB’s extension for Docker Desktop.

Step 1: Connect Your Datasource and LLM models

MindsDB allows you to add various LLM models as reranking and embedding models in the GUI’s Settings:

Navigate to Settings.
Select the Models tab.
Add the default model that will be used for your Agent.
Add the Reranking and Embedding model that will be used for your Knowledge Base.
You can also choose to use your default model as the reranking and embedding model.

Once you have added your model, you can connect your data. MindsDB allows you to connect to over 200 datasources, including the likes of Salesforce and Intercom.

For this example, we uploaded CSV files containing the compliance documents data and the customer data through the GUI; each uploaded file is stored as a table in MindsDB. You can check out our Upload as a File documentation.

Knowledge Bases need a vector store to hold their data in the form of embeddings. Because Hybrid Search is only available for Knowledge Bases backed by a PGVector engine, you will have to connect PGVector as a database and use it for the storage table.

CREATE DATABASE pvec
WITH
   ENGINE = 'pgvector',
   PARAMETERS = {
   "host": "127.0.0.1",
   "port": 5432,
   "database": "postgres",
   "user": "postgres",
   "password": "password"
   };
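
If you script this connection rather than typing it into the GUI, you may prefer not to hard-code the password. The sketch below renders the same CREATE DATABASE statement from environment variables. The PGVECTOR_* variable names and the helper function are hypothetical, and the resulting statement still has to be submitted to MindsDB by whatever client you use.

```python
import json
import os


def render_create_pgvector(name: str, env=None) -> str:
    """Build the CREATE DATABASE statement for a PGVector connection."""
    env = os.environ if env is None else env
    params = {
        "host": env.get("PGVECTOR_HOST", "127.0.0.1"),
        "port": int(env.get("PGVECTOR_PORT", "5432")),
        "database": env.get("PGVECTOR_DB", "postgres"),
        "user": env.get("PGVECTOR_USER", "postgres"),
        # No default here: raise KeyError if the password is not set.
        "password": env["PGVECTOR_PASSWORD"],
    }
    return (
        f"CREATE DATABASE {name}\n"
        "WITH\n"
        "   ENGINE = 'pgvector',\n"
        f"   PARAMETERS = {json.dumps(params, indent=3)};"
    )


# Passing a plain dict instead of os.environ keeps the example self-contained.
print(render_create_pgvector("pvec", {"PGVECTOR_PASSWORD": "change-me"}))
```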

Step 2: Unify your Compliance Documents with Customer Data

Once your data is connected, you can create two Knowledge Bases, one for the compliance documents data and one for the customer data, using the CREATE KNOWLEDGE_BASE statement.

Let’s create the first Knowledge Base for the compliance documents data:

CREATE KNOWLEDGE_BASE compliance_kb
USING
storage = pvec.compliance_table,
 metadata_columns = ['doc_type','jurisdiction', 'regulation', 'risk_rating', 'status', 'version', 'owner_department', 'effective_date', 'last_review_date', 'renewal_date', 'document_id',
 'page_count'],
 content_columns  = ['title', 'summary_snippet', 'keywords'],
 id_column = 'customer_id';

The parameters provided to the Knowledge Base are as follows:

compliance_kb: The name given to the Knowledge Base.
storage: The connection name of the storage database, followed by the name of the storage table.
metadata_columns: The columns that will be used for metadata filtering.
content_columns: The columns that will be embedded and used for semantic search.
id_column: The column that uniquely identifies each source data row in the Knowledge Base. If not provided, it is generated from the hash of the content columns.
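
The fallback described for id_column can be pictured as follows. This is a rough sketch, assuming SHA-256 over the concatenated content columns; the function name, separator, and truncation are hypothetical, and MindsDB’s actual hashing scheme may differ.

```python
import hashlib


def fallback_row_id(row: dict, content_columns: list) -> str:
    """Derive a deterministic id from the content columns only (illustrative)."""
    # Join values with a separator that is unlikely to appear in the text,
    # so ("ab", "c") and ("a", "bc") do not produce the same input.
    joined = "\x1f".join(str(row.get(col, "")) for col in content_columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


content_columns = ["title", "summary_snippet", "keywords"]

doc_a = {
    "title": "Data Retention Policy",
    "summary_snippet": "Retention periods for EU data.",
    "keywords": "GDPR, retention",
    "customer_id": "C001",  # hypothetical value
}
# Same content, different customer: the hash-based id is identical.
doc_b = dict(doc_a, customer_id="C002")

print(fallback_row_id(doc_a, content_columns) == fallback_row_id(doc_b, content_columns))
```

Under this assumption, rows with identical content share an id regardless of their metadata, and the id carries no business meaning. That is one reason to set id_column explicitly when the id will later serve as a join key.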

Now insert your data into the compliance_kb Knowledge Base using the INSERT INTO statement.

INSERT INTO compliance_kb
SELECT
 customer_id,                 -- id (matches id_column in compliance_kb)
 title, summary_snippet, keywords,   -- content
 doc_type, jurisdiction, regulation, risk_rating, status, version, owner_department,
 effective_date, last_review_date, renewal_date, document_id, page_count  -- metadata
FROM files.compliance_documents;

Here is the data provided to the compliance_kb Knowledge Base:

document_id — Unique identifier for each compliance document.
title — Document title/name.
doc_type — Document category (e.g., policy, contract, audit report).
jurisdiction — Region/country the document applies to.
regulation — Specific regulation or standard referenced (e.g., GDPR).
risk_rating — Assessed risk level (e.g., Low/Medium/High).
status — Lifecycle state (draft, active, expired, etc.).
version — Document version number.
owner_department — Team responsible for the document.
effective_date — Date the document officially takes effect.
last_review_date — Most recent date the document was reviewed.
renewal_date — Next renewal/expiry date.
customer_id — Linked customer identifier (join key to customer table).
keywords — Tags/terms extracted or assigned to aid search.
page_count — Number of pages in the document.
summary_snippet — Short abstract/excerpt for quick preview.
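
Before inserting, it can help to confirm that every source column has exactly one role in the Knowledge Base. The check below is a small sketch using the column names from this example; the helper function is hypothetical and not part of MindsDB.

```python
# Column roles as declared in the CREATE KNOWLEDGE_BASE statement for compliance_kb.
ID_COLUMN = "customer_id"
CONTENT_COLUMNS = ["title", "summary_snippet", "keywords"]
METADATA_COLUMNS = [
    "doc_type", "jurisdiction", "regulation", "risk_rating", "status", "version",
    "owner_department", "effective_date", "last_review_date", "renewal_date",
    "document_id", "page_count",
]

# Columns of the uploaded compliance_documents file, as listed above.
SOURCE_COLUMNS = [
    "document_id", "title", "doc_type", "jurisdiction", "regulation", "risk_rating",
    "status", "version", "owner_department", "effective_date", "last_review_date",
    "renewal_date", "customer_id", "keywords", "page_count", "summary_snippet",
]


def check_column_roles(source, id_column, content, metadata):
    """Report columns that are assigned twice, misspelled, or left without a role."""
    assigned = [id_column, *content, *metadata]
    return {
        "duplicates": sorted({c for c in assigned if assigned.count(c) > 1}),
        "missing_from_source": sorted(set(assigned) - set(source)),
        "unassigned": sorted(set(source) - set(assigned)),
    }


report = check_column_roles(SOURCE_COLUMNS, ID_COLUMN, CONTENT_COLUMNS, METADATA_COLUMNS)
print(report)
```

For the columns in this example, all three lists come back empty, meaning each of the 16 source columns is used once as the id, as content, or as metadata.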

Let’s query the data inserted into the compliance_kb Knowledge Base using the SELECT statement.

SELECT * FROM compliance_kb;

Now that the compliance_kb Knowledge Base is successfully created, we can create a Knowledge Base for our customer data using the same steps as above.

CREATE KNOWLEDGE_BASE customer_data_kb
USING
storage = pvec.customer_table,
 metadata_columns = ['company_name', 'region', 'account_type',
'sla_tier', 'sentiment_score', 'last_interaction_date'],
  content_columns  = ['feedback_text'],
 id_column = 'customer_id';

Notice that id_column uses the same column, customer_id, as in the compliance_kb Knowledge Base. This is so that a JOIN can be performed between the two Knowledge Bases.

Insert the customer data into the customer_data_kb Knowledge Base:

INSERT INTO customer_data_kb
SELECT
 customer_id,                 -- id
 feedback_text,               -- content
 company_name, region, account_type, sla_tier, sentiment_score, last_interaction_date  -- metadata
FROM files.customer_data;

Here is a breakdown of the data inserted into the customer_data_kb Knowledge Base:

customer_id — Unique customer identifier; joins to compliance docs.
company_name — Customer’s organization/legal name.
region — Geographic region of the customer.
account_type — Segment classification (e.g., SMB, Mid-Market, Enterprise).
sla_tier — Support/SLA level (e.g., Bronze, Silver, Gold).
sentiment_score — Numeric sentiment (0–1; higher = more positive).
feedback_text — Latest comment/issue/feedback from the customer.
last_interaction_date — Most recent touchpoint or engagement date.

Select the customer_data_kb Knowledge Base to see the data inserted:

SELECT * FROM customer_data_kb;

Step 3: Perform Hybrid Search on the Knowledge Bases.

The Knowledge Bases can now be queried using Hybrid search.

Let’s start by querying the compliance documents data. This lets you quickly identify which compliance documents to retrieve, see their key points and summaries, and check the risk level associated with each customer.

You can surface EU policies flagged Medium/High risk within a defined date range:

-- Find EU-related policies with medium/high risk during a time window.
SELECT *
FROM compliance_kb
WHERE
  content = 'data retention policy and privacy handling'  -- semantic
  AND jurisdiction = 'EU'                                  -- metadata
  AND risk_rating IN ('Medium','High')                      -- metadata
  AND effective_date BETWEEN '2024-08-30' AND '2025-08-01'  -- range (assumes effective_date is the intended date column)
  AND relevance >= 0.5                                    -- relevance filtering
  AND hybrid_search_alpha = 0.60                           -- blend semantic + keyword
LIMIT 20;

The query returns the key points and summary snippets of the matching compliance documents for specific customers.

MindsDB Compliance Knowledge Base Hybrid Search

Let’s try to identify remediation and next-step language in high-risk documents linked to specific customers.

-- Surface remediation/next-steps language in high-risk docs linked to named customers.
SELECT *
FROM compliance_kb
WHERE
  content = 'remediation plan and corrective actions'      -- semantic
  AND risk_rating = 'High'
  AND relevance >= 0.4
  AND hybrid_search_alpha = 0.55
LIMIT 20;

Now we can go ahead and query the customer_data_kb Knowledge Base with Hybrid Search.

You can identify authentication issues among Enterprise accounts in Europe.

-- Find authentication issues among Enterprise accounts in Europe.
SELECT *
FROM customer_data_kb
WHERE
  content = 'login failures and authentication issues'   -- semantic
  AND region = 'Europe'                                  -- metadata
  AND account_type = 'Enterprise'                        -- metadata
  AND hybrid_search_alpha = 0.50
LIMIT 20;

Let’s look for support notes that mention data privacy or security concerns across all regions and account types.

-- Identify support notes about data privacy or security concerns across any region/types.
SELECT *
FROM customer_data_kb
WHERE
  content = 'privacy concerns and security incident mentions'  -- semantic
  AND relevance >= 0.55
  AND hybrid_search_alpha = 0.50
LIMIT 20;

Step 4: JOIN Separate Knowledge Bases to Perform Hybrid Search.

You can JOIN your compliance_kb and customer_data_kb Knowledge Bases to examine how customer feedback correlates with regulatory risk. To do this, JOIN the two Knowledge Bases on their id columns. Make sure the id_column of each Knowledge Base points to a column with the same name and matching values in both sources.
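
A quick way to confirm that the two sources really share join keys is to compare their id values before running the JOIN. The sketch below does this on plain Python rows; the helper name and the sample customer_id values are hypothetical.

```python
def join_key_overlap(customer_rows, compliance_rows, key="customer_id"):
    """Compare the id values of two sources that will be joined on `key`."""
    customer_ids = {str(row[key]).strip() for row in customer_rows}
    compliance_ids = {str(row[key]).strip() for row in compliance_rows}
    return {
        "matched": sorted(customer_ids & compliance_ids),
        "customers_without_docs": sorted(customer_ids - compliance_ids),
        "docs_without_customers": sorted(compliance_ids - customer_ids),
    }


# Hypothetical rows standing in for the two uploaded files.
customers = [{"customer_id": "C001"}, {"customer_id": "C002"}, {"customer_id": "C003"}]
documents = [{"customer_id": "C002"}, {"customer_id": "C003"}, {"customer_id": "C999"}]

overlap = join_key_overlap(customers, documents)
print(overlap)
```

Only the ids under "matched" can appear in the joined result; the other two lists point to rows that the JOIN will silently drop.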

Let’s execute a query that returns customers who are tied to high-risk EU compliance documents about privacy/GDPR and whose own notes mention security or privacy concerns, blending semantic relevance with metadata filters.

-- Cross-KB join: find customers linked to risky compliance docs
SELECT
 *
FROM customer_data_kb AS c
JOIN compliance_kb AS d
 ON c.id = d.id
WHERE
 d.content = 'privacy policy violations,data retention non-compliance, GDPR fines'
 AND d.jurisdiction = 'EU'
 AND d.risk_rating IN ('High')
 AND c.content = 'security or data privacy concerns'
 AND d.hybrid_search_alpha = 0.50
 AND c.hybrid_search_alpha = 0.50
LIMIT 25;

The results identify Workday as a customer that needs action.

The query also returns the title, keywords, and summary snippet of the compliance document in question, which indicate that it is high risk.

You can correlate customer complaints with SLA and contract compliance issues to pinpoint gaps and prioritize remediation.

-- Correlate customer complaints with SLA/contract compliance issues.
SELECT *
FROM customer_data_kb
JOIN compliance_kb
 ON customer_data_kb.id = compliance_kb.id
WHERE
   customer_data_kb.content = 'billing errors or contract disputes'
   AND compliance_kb.content = 'SLA obligations or service availability'
   AND compliance_kb.risk_rating = 'High'
   AND compliance_kb.hybrid_search_alpha = 0.6
   AND customer_data_kb.hybrid_search_alpha = 0.5;

We can see that companies like Mid-Market, with an SLA tier of Gold, and Zoom, with an SLA tier of Silver, have reported billing errors.

We can also identify the titles of the related compliance documents, so that they can be retrieved, along with the summary snippets that indicate the customer is at high risk.

You can determine which customer complaints map to specific regulatory requirements—such as access control under HIPAA or GDPR—so teams can prioritize compliant remediation.

-- Understand which customer complaints map to regulatory requirements (e.g., access control under HIPAA/GDPR).
SELECT *
FROM customer_data_kb
JOIN compliance_kb
 ON customer_data_kb.id = compliance_kb.id
WHERE
   customer_data_kb.content = 'login or authentication failures'
   AND compliance_kb.regulation IN ('HIPAA','GDPR')
   AND compliance_kb.hybrid_search_alpha = 0.6
   AND customer_data_kb.hybrid_search_alpha = 0.5;

Mid-Market has been identified as a customer that reported login failures.

We can also see the compliance document linked to their account, and that their SLA for HIPAA is at critical risk.

This join is useful whenever you need to turn regulatory risk into customer action. A few concrete reasons:

Regulatory impact mapping (Compliance): Identify which EU customers are associated with high-risk GDPR issues so you can prioritize audits, remediations, and policy updates.
Proactive outreach (Customer Success): Find accounts whose notes mention privacy/security concerns and who are linked to risky docs—then trigger playbooks, comms, or health-score adjustments.
Renewal & churn defense (Sales/CS): Surface at-risk Enterprise customers before renewal to address compliance blockers that could stall deals or cause churn.
Incident readiness (Security/Privacy): Quickly assemble the list of potentially impacted customers when a policy gap is discovered; drive MTTR down with targeted actions.
Executive reporting (Leadership): Provide a concise, customer-level view of GDPR exposure with recent dates, jurisdictions, and risk ratings.
Audit evidence (Legal/Compliance Ops): Demonstrate due diligence by showing how risky findings were mapped to customers and addressed.
Prioritization & capacity planning (Ops): Focus limited remediation resources on customers with both high document risk and recorded privacy issues.
Product & docs improvements (PM/Enablement): Spot recurring themes (e.g., “data retention”) to inform roadmap, docs, or training.

Bonus: Query Your Knowledge Base in Natural Language with AI Agents.

MindsDB’s AI Agents allow you to query your data using natural language. You can create an agent over the compliance_kb and customer_data_kb Knowledge Bases with the CREATE AGENT statement.

CREATE AGENT enterprise_vendor_agent
USING
 data = {
   "knowledge_bases": ["mindsdb.compliance_kb", "mindsdb.customer_data_kb"]
 },
 prompt_template = '
   The compliance_kb contains compliance documents with metadata (effective_date, renewal_date, jurisdiction, risk_rating, customer_id).
   The customer_data_kb contains customer information, notes and feedback with metadata (region, account_type, company_name).
   When answering, correlate compliance risks with affected customers and summarize actionable next steps.
 ';

Here is the breakdown of the syntax executed:

enterprise_vendor_agent: The name provided to the agent.
data: This parameter stores data connected to the agent. Here we store the compliance_kb and customer_data_kb knowledge bases.
prompt_template: Here instructions are provided to the agent, with descriptions of the data sources listed in the knowledge bases.

Let’s try to pinpoint which specific policies, if updated first, will most quickly reduce compliance risk for your highest-value (enterprise) customers. The goal is a prioritized, evidence-backed list of policies ranked by severity, number of affected top accounts, recency of findings, upcoming renewal/audit dates, and customer signals (notes about privacy/security concerns), so you know what to fix first, for whom, and why.

SELECT answer
FROM enterprise_vendor_agent
WHERE question = 'Which policies should we update first to reduce compliance risk for our top enterprise accounts?';

You can also find out which customer problems in Europe related to privacy and authentication are the most common and have the highest impact, so you can prioritize fixes and actions. Concretely, you want a ranked list of recurring issues (e.g., SSO failures, MFA friction, data-retention confusion), with frequency, severity, affected accounts/tiers, trend over time, and links to any related compliance risks, plus clear recommendations (engineering fixes, security controls, UX/doc updates, CS outreach) that will measurably reduce incidents and risk.

SELECT answer
FROM enterprise_vendor_agent
WHERE question = 'Summarize top recurring customer issues related to privacy or authentication in Europe. Provide recommended actions.';

Enterprise Value for Vendors

MindsDB can arm product, sales, success, and executive teams with security-first, AI-native capabilities that boost adoption, differentiate your platform, elevate customer satisfaction, and build buyer trust.

Product Leaders: Offer AI-native compliance and analytics features that drive adoption and renewals.
Sales Teams: Differentiate your platform by demonstrating security-first, compliance-aware AI capabilities.
Customer Success Managers: Help enterprise clients monitor risks and improve satisfaction in one workflow.
Executives: Build trust with buyers by proving your product can handle both innovation and compliance.

Conclusion

For enterprise software companies, the stakes are high: miss a compliance requirement, and you face penalties; miss customer signals, and you risk churn. MindsDB Knowledge Bases with Hybrid Search unify these critical data streams into a single, intelligent layer—delivering accurate, compliant, and actionable insights in real time.

Start exploring how MindsDB can help your teams bridge compliance and customer intelligence today by contacting our team.