Information is an incredibly valuable asset.
Your databases are full of text data, but extracting meaning from it is challenging. MindsDB, an open-source applied machine learning platform, makes integrating Natural Language Processing (NLP) solutions into your databases seamless.
In this tutorial, you will learn how MindsDB integrates databases with pre-trained natural language models from Hugging Face, how to extract meaning from a sample database's text data, and how to convert that meaning into valuable insights with a sentiment analysis example.
By the end of this tutorial, you will be able to deploy NLP models on your own data.
Since the publication of "Attention Is All You Need", a 2017 paper by Google researchers that introduced the Transformer deep neural network architecture, the machine learning community has rapidly taken this architecture to new heights.
Nowadays, transformers are the state of the art in Natural Language Processing (the domain for which the architecture was originally devised) and in other fields such as image classification, image generation, 3D scene reconstruction, robotics, and protein folding.
Most business organizations that stand to benefit from Machine Learning (ML) don't have such exotic problems, however. Usually, their data is stored in a structured format in a database or data warehouse. Transformers are useful here too, particularly for text columns, where they deliver strong performance on tasks like text classification, summarisation, and generation.
However, deploying models into production can be a formidable challenge. Data scientists and machine learning engineers tend to build pipelines with research-oriented tools, so moving a model into live use is a costly transition in both time and money.
One way to simplify deployment for use cases with tabular data is to treat machine learning models as tables in your database. This strategy can significantly speed up model creation and deployment for developers and ML engineers.
To achieve this, we can use MindsDB, an open-source applied ML platform that handles the engineering required to expose these models as tables in SQL. For an overview, see the “AI Tables” explainer video.
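To make the "model as a table" idea concrete, here is a minimal sketch that uses only Python's built-in sqlite3 module. It registers a toy keyword-based classifier (a hypothetical stand-in for a real sentiment model) as a SQL function, so predictions come back from an ordinary SELECT. This is an analogy for the workflow, not how MindsDB implements AI tables.

```python
import sqlite3

# Hypothetical stand-in for a real sentiment model: a tiny keyword scorer.
POSITIVE = {"love", "great", "good", "excellent", "happy"}
NEGATIVE = {"hate", "bad", "terrible", "awful", "sad"}


def toy_sentiment(text: str) -> str:
    """Label text 'positive', 'negative', or 'neutral' by counting keywords."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table of reviews and expose the model to SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, review TEXT)")
    conn.executemany(
        "INSERT INTO reviews (review) VALUES (?)",
        [
            ("I love this product, it is great!",),
            ("Terrible support and a bad experience.",),
            ("It arrived on Tuesday.",),
        ],
    )
    # Register the Python function so SQL queries can call it like a built-in.
    conn.create_function("predict_sentiment", 1, toy_sentiment)
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    query = "SELECT review, predict_sentiment(review) FROM reviews ORDER BY id"
    for review, sentiment in conn.execute(query):
        print(f"{sentiment:>8}  {review}")
```

The query itself stays plain SQL while the prediction logic lives outside the database engine, which is the property that AI tables aim to provide at much larger scale.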
All MindsDB integrations are defined by what are called "handlers." For ML frameworks, in particular, these Python classes explicitly define how an AI table should be created and called with SQL statements. Depending on the ML framework, this logic can be quite involved.
For Hugging Face's transformers, the handler exposes all pre-trained models in the Hugging Face Hub that support one of the following tasks:
As input, the handler expects a column containing text from any table in your database. The output data type varies depending on the task.
Each task needs slightly different data cleaning, which the handler defines internally.
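The handler responsibilities described above (register a model, clean the input text column, and return an output column) can be sketched as a small Python class. Everything here is hypothetical: the class name, the `create`/`predict` methods, and the keyword "model" returned by the loader are illustrations, not MindsDB's actual handler API.

```python
import re
from typing import Callable, Dict, List

Row = Dict[str, object]
TextModel = Callable[[str], str]


class ToyTextHandler:
    """Hypothetical handler: maps SQL-level "create" and "predict" requests
    onto a text model. An illustration, not MindsDB's handler interface."""

    def __init__(self, load_model: Callable[[str], TextModel]):
        self.load_model = load_model  # stands in for fetching a pre-trained model
        self.models: Dict[str, dict] = {}

    def create(self, name: str, target: str, model_name: str, input_column: str) -> None:
        # Roughly what a model-creation statement triggers: load and register.
        self.models[name] = {
            "fn": self.load_model(model_name),
            "target": target,
            "input_column": input_column,
        }

    @staticmethod
    def clean(text: str) -> str:
        # Task-specific preprocessing; here, only whitespace normalization.
        return re.sub(r"\s+", " ", text).strip()

    def predict(self, name: str, rows: List[Row]) -> List[Row]:
        # Feed the input column through the model; append the output column.
        cfg = self.models[name]
        col, target, fn = cfg["input_column"], cfg["target"], cfg["fn"]
        return [{**row, target: fn(self.clean(str(row[col])))} for row in rows]


def keyword_loader(model_name: str) -> TextModel:
    # Hypothetical loader: ignores model_name and returns a keyword classifier.
    return lambda text: "positive" if "good" in text.lower() else "negative"


if __name__ == "__main__":
    handler = ToyTextHandler(keyword_loader)
    handler.create("demo_model", target="sentiment",
                   model_name="example-org/example-model", input_column="comment")
    rows = [{"id": 1, "comment": "  This   is GOOD "}, {"id": 2, "comment": "Not my thing"}]
    for row in handler.predict("demo_model", rows):
        print(row)
```

Returning new row dictionaries, rather than mutating the input, keeps the original columns intact so a later stage can join the predictions back onto the source table.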
Model artifacts are downloaded from the Hugging Face Hub and saved in shared storage, which reduces the time needed to create subsequent models. Once a model has been created and it's time to generate predictions, the prediction step feeds the input column into the transformer, produces an output column, and hands it over to the next stage, which joins the result according to the input SQL query.
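The shared-storage behavior can be illustrated with a small file cache: the first request for a model name triggers a fetch, and later requests reuse the saved file. The `ArtifactStore` class and the `fetch` callable (standing in for a download from the Hugging Face Hub) are hypothetical; MindsDB's real storage layer is not shown here.

```python
import tempfile
from pathlib import Path
from typing import Callable


class ArtifactStore:
    """Hypothetical shared cache for model artifacts, keyed by model name."""

    def __init__(self, root: Path, fetch: Callable[[str], bytes]):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch  # stands in for a download from the model hub
        self.downloads = 0  # counts real fetches, to show the cache working

    def path_for(self, model_name: str) -> Path:
        # Hub-style names contain "/", so flatten them into a single file name.
        return self.root / model_name.replace("/", "__")

    def get(self, model_name: str) -> Path:
        path = self.path_for(model_name)
        if not path.exists():
            self.downloads += 1
            path.write_bytes(self.fetch(model_name))
        return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = ArtifactStore(Path(tmp), lambda name: f"weights for {name}".encode())
        first = store.get("example-org/example-model")   # downloads
        second = store.get("example-org/example-model")  # reuses the cached file
        print(first == second, store.downloads)  # True 1
```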
The integration of Hugging Face with MindsDB unlocks the full potential of transformers across all supported data sources, in addition to interacting with other ML models that are also exposed as AI tables (and that may come from very different frameworks!).
NLP: Natural language processing
MindsDB: Open-source software that brings machine learning to databases.
Hugging Face: The company and platform behind the Transformers library and the Hugging Face Hub. Think of the Hub as a collection of pre-trained machine learning models, each built with specific tasks in mind.
Sentiment Analysis: The task of inferring the sentiment or emotion expressed in a text.
To follow along, you can sign up for an account at cloud.mindsdb.com. Alternatively, head to docs.mindsdb.com and follow the instructions to set up a local instance of MindsDB via Docker or pip.
Let's see some action. This tutorial will go through a sequence of MindsDB-flavoured SQL commands that can get you a live transformer model to generate predictions for tables in your database. Note that for this example, you will use a table from our PostgreSQL public demo instance, so let’s start by connecting the MindsDB instance to it:
Now let’s check the demo data to be used in the example:
Here is the output:
A small note on switching between projects inside MindsDB: projects are a natural way to keep artifacts (models, views, etc.) separate according to the predictive task they solve, and the MindsDB documentation covers them in more detail. If you previously worked in another project or database, you can switch to the default project, called mindsdb, by executing the following command:
Now let's create a Hugging Face pre-trained classification model that identifies the sentiment of each of these sentences:
Once the creation is complete, the behavior is the same as with any other “AI table” – you can query it using SQL. Let’s specify some synthetic data first:
In our example, the model classified the sentiment correctly:
MindsDB also provides an ‘explain’ column with the probability assigned to each sentiment class, which is helpful for testing.
The above query returns the input text and its predicted class. With that, we can generate predictive insights from our database using transformer models with minimal friction.
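To show what per-class probabilities like those in the ‘explain’ column look like, the sketch below converts a classifier's raw scores (logits) into probabilities with a softmax and packs them into a JSON string alongside the top class. The logit values, label names, and JSON layout are assumptions for illustration; MindsDB's actual ‘explain’ format may differ.

```python
import json
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def explain_row(logits: Sequence[float], labels: Sequence[str]) -> Tuple[str, str]:
    """Return (predicted label, JSON string of per-label probabilities)."""
    if len(logits) != len(labels):
        raise ValueError("need exactly one logit per label")
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    explanation = {label: round(p, 4) for label, p in zip(labels, probs)}
    return labels[best], json.dumps(explanation)


if __name__ == "__main__":
    # Made-up scores for one sentence, in (negative, neutral, positive) order.
    label, explain = explain_row([-1.0, 0.5, 2.0], ["negative", "neutral", "positive"])
    print(label)  # positive
    print(explain)
```

The predicted class is simply the label with the highest probability, so the class and its probability breakdown always agree.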
In this tutorial, you have learned how MindsDB integrates databases with pre-trained natural language models from Hugging Face, how to extract meaning from a sample database's text data, and how to convert that meaning into valuable insights with a sentiment analysis example.
Now, you can deploy NLP models on your own data!
Get started with MindsDB at no cost today at mindsdb.com
Finally, if MindsDB's vision to democratize ML sounds exciting, head to our community Slack, where you can get help, chat with others, and even get advice on how to build your own integration!