
How to unlock the power of natural language processing in your database

Information is an incredibly valuable asset. 

Your databases are full of data … but extracting meaning from that text-based data is challenging. MindsDB, the leading and fastest-growing open-source applied machine learning platform in the world, makes integrating Natural Language Processing solutions into your databases seamless.

In this tutorial, you will learn how MindsDB integrates databases with pre-trained natural language models from Hugging Face, how to extract meaning from a sample database's text data, and how to convert that meaning into valuable insights with a sentiment analysis example. 

By the end of this tutorial, you will be able to deploy NLP models on your own data.

Let’s begin!

The strategy: Transformers as AI Tables

Since the publication of "Attention is all you need" — a paper authored by Google researchers back in 2017 that introduced the "Transformer" deep neural network — the machine learning community has rapidly taken this neural architecture to new heights.

Nowadays, transformers are state-of-the-art in Natural Language Processing (the domain for which they were initially devised) and in other fields like image classification, image generation, 3D scene reconstruction, robotics, and protein folding.

Most business organizations that stand to benefit from Machine Learning (ML) don't have such exciting problems, however. Usually, data is stored in a structured format in some database or data warehouse. Transformers are well suited to this, too, enabling stunning performance in use cases like text classification, summarisation, and generation.

However, deploying models into production environments can be a formidable challenge, as data scientists and machine learning engineers tend to build pipelines with research-oriented tools, which means a costly transition, in both time and money, when a model is finally ready to be used live.

One approach to simplify deployment for use cases with tabular data is to treat machine learning models as just another table in your database. This strategy can significantly speed up the data-gathering process and model lifecycle for developers and ML engineers.

To achieve this, we can use MindsDB, an open-source applied ML platform that handles the engineering required to expose these models as tables accessible from SQL. See the "AI Tables" explained video for an overview.

What NLP tasks are possible inside databases, and how it works

All MindsDB integrations are defined by what are called "handlers." For ML frameworks, in particular, these Python classes explicitly define how an AI table should be created and called with SQL statements. Depending on the ML framework, this logic can be quite involved.

For Hugging Face's transformers, the handler exposes all pre-trained models in the Hugging Face Hub that support one of the following tasks:

  • Standard text classification: there is a set of predefined labels, and the model classifies the input text into one of them. Some of the most popular use cases, like sentiment analysis, hate speech detection, or spam detection, are usually framed as standard text classification tasks.
  • Zero-shot text classification: a variant of the standard formulation where the labels are provided dynamically, without requiring any model retraining (see the sketch after this list).
  • Translation: conceptually simple – pass in text in a source language and get it translated into a target language.
  • Summarization: condense the input text without losing the original meaning.
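
As an example of the zero-shot task, here is a minimal sketch that follows the same handler syntax used later in this tutorial; the model name, input column, and candidate_labels values are illustrative assumptions:

-- Zero-shot classifier: candidate labels are supplied at creation time,
-- so the underlying model needs no retraining.
CREATE MODEL zero_shot_classifier
PREDICT topic
USING
  engine = 'huggingface',
  task = 'zero-shot-classification',
  model_name = 'facebook/bart-large-mnli',            -- illustrative pre-trained model
  input_column = 'comment',                           -- column holding the text to classify
  candidate_labels = ['travel', 'technology', 'sport']; -- labels provided dynamically
 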

Regarding input data, the handler expects to operate on a column containing text in any table from your database. The output data type will vary depending on the use case being tackled.

Each use case will have slightly different procedures for data cleaning, which are internally defined as part of the handler logic.

Whatever artifacts are downloaded from the Hugging Face Hub are saved in common storage, which helps avoid duplicating frozen weights across different MindsDB models. Once such a model has been created and it's time to generate predictions, the procedure feeds the input column into the transformer, produces an output column, and hands it over to the next stage, which joins the result according to the input SQL query.

The integration of Hugging Face with MindsDB unlocks the full potential of transformers across all supported data sources, in addition to interacting with other ML models that are also exposed as AI tables (and that may come from very different frameworks!).

Concept recap before the tutorial

NLP: Natural Language Processing.

MindsDB: open-source software that brings machine learning to databases.

Hugging Face: a library and hub of transformer models. Think of it as a collection of pre-trained machine learning models, each built with a specific task in mind.

Sentiment Analysis: the task of inferring the emotion behind a piece of text.

Prerequisites

To follow along, you can sign up for an account at cloud.mindsdb.com. Alternatively, head to docs.mindsdb.com and follow the instructions to manually set up a local instance of MindsDB via Docker or pip.

Sentiment Analysis via SQL in practice

Let's see some action. This tutorial will go through a sequence of MindsDB-flavoured SQL commands that can get you a live transformer model to generate predictions for tables in your database. Note that for this example, you will use a table from our PostgreSQL public demo instance, so let’s start by connecting the MindsDB instance to it:

CREATE DATABASE example_db
WITH ENGINE = "postgres",
PARAMETERS = {
    "user": "demo_user",
    "password": "demo_password",
    "host": "3.220.66.106",
    "port": "5432",
    "database": "demo"
    };
 
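
To confirm the new connection was registered, a quick check (assuming MindsDB's standard SHOW DATABASES statement) is:

-- example_db should now appear in the list of connected data sources
SHOW DATABASES;
 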

Now let’s check the demo data to be used in the example:

SELECT * FROM example_db.demo_data.user_comments;
 

Here is the output:

One small nuance concerns switching between projects inside MindsDB. Projects are a natural way to keep artifacts (models, views, etc.) separate according to the predictive task they solve; you can read more about them here. If you previously worked in another project or database, move back to the default MindsDB project, called mindsdb, by executing the following command:

USE mindsdb;
 
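
If you would rather keep this tutorial's artifacts in a project of their own, a minimal sketch (assuming the CREATE PROJECT statement and a hypothetical project name) is:

-- Create a dedicated project and switch into it
CREATE PROJECT nlp_demo;
USE nlp_demo;
 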

Now let's enable a Hugging Face pre-trained classification-type model to identify sentiment for each of these sentences: 

CREATE MODEL sentiment_classifier
PREDICT sentiment
USING
  engine = 'huggingface',
  task = 'text-classification',
  model_name = 'cardiffnlp/twitter-roberta-base-sentiment',
  input_column = 'comment',
  labels = ['negative','neutral','positive'];
 

In practice, this instructs MindsDB to create an “AI table” called sentiment_classifier that will use the Hugging Face integration to predict a column named sentiment.

The USING clause specifies a few additional parameters that this particular “handler” requires. The task parameter defines the behaviour of the pre-trained model, which is downloaded from the Hub using the identifier in model_name. Finally, we declare the name of the input_column and the labels this model knows how to classify.

Please note that the input_column key should match the column name from the demo table you have seen in the previous step. For example, we use the user_comments table that has a comment column, as shown above. This is required for the JOIN queries to work (see next steps).

Once the above query has started execution, we can check the status of the creation process with the following query:

SELECT * FROM models
WHERE name='sentiment_classifier';
 

It may take a while to register as complete, depending on the internet connection your MindsDB instance has access to. After all, it needs to download and store the weights of a (potentially heavy) transformer model!
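
For a more detailed report on the model, you can also describe it; a quick sketch, assuming MindsDB's DESCRIBE statement:

DESCRIBE sentiment_classifier;
 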

Once the creation is complete, the behavior is the same as with any other “AI table” – you can query it using SQL. Let’s try it on a synthetic example first:

SELECT * FROM sentiment_classifier
WHERE comment = 'It is really easy to do NLP with MindsDB';
 

In our example, the model correctly classified the sentiment as positive.

MindsDB also provides an ‘explain’ column with information about the probabilities of all sentiments, which is helpful for testing.
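
As a quick sketch of how to inspect it, assuming the handler exposes the column as sentiment_explain (the target name plus an _explain suffix):

SELECT comment, sentiment, sentiment_explain
FROM sentiment_classifier
WHERE comment = 'It is really easy to do NLP with MindsDB';
 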

Finally, let’s apply sentiment analysis to the ‘user comments’ demo table. Use the command below to JOIN this table with the model, which will add a sentiment label to every row of the table.

SELECT input.comment, model.sentiment
FROM example_db.demo_data.user_comments AS input
JOIN sentiment_classifier AS model;
 

The above query will yield the text passed as input to the transformer and the predicted class for each string. At last, we can use transformer models with minimal friction, augmenting data with predictive insights in a way that will feel straightforward to most database-savvy people.
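
If you want to reuse these predictions downstream, one option is to persist the joined query as a view; a minimal sketch, assuming MindsDB's CREATE VIEW syntax and a hypothetical view name:

-- Save the join as a reusable view inside the default mindsdb project
CREATE VIEW sentiment_results AS (
  SELECT input.comment, model.sentiment
  FROM example_db.demo_data.user_comments AS input
  JOIN mindsdb.sentiment_classifier AS model
);
 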

Conclusion

In this tutorial, you have learned how MindsDB integrates databases with pre-trained natural language models from Hugging Face, how to extract meaning from a sample database's text data, and how to convert that meaning into valuable insights with a sentiment analysis example. 

Now, you can deploy NLP models on your own data!

Get started with MindsDB at no cost today at mindsdb.com.

Finally, if MindsDB's vision to democratize ML sounds exciting, head to our community Slack, where you can get help and find people to chat with about other available data sources, ML frameworks, or about writing a handler to bring your own!
