The currently available Automated Machine Learning (AutoML) tools promise to make ML easy and affordable. But despite the attractive promises and bold statements about replacing data scientists with software, that is not happening anytime soon.
This article examines a new data-centric construct called AI Tables and its potential to make self-service ML easy for data engineers, developers, and business analysts.
Let’s get started.
To become an insight-driven organization (IDO), first and foremost, you need data and the tools to manipulate and analyze it. Another essential component is the people, i.e., data analysts or data scientists with appropriate experience. And last but not least, you need to find a way to implement insight-driven decision-making processes across your company.
The technology that lets you make the most of your data is Machine Learning. The ML flow starts by using your data to train a predictive model, which later answers your data-related questions. One of the most effective techniques in Machine Learning is Artificial Neural Networks, whose design is influenced by our current understanding of how the human brain works. Given the computing resources available nowadays, they can yield remarkably capable models trained on large amounts of data.
Nowadays, companies use various automation software and scripts to get different tasks done without human errors. Similarly, you can avoid human mistakes in your decision-making processes by basing decisions exclusively on your data.
The majority of businesses do not use AI or ML to handle their data. For example, the US Census Bureau shared that, as of 2020, fewer than 10% of US businesses had adopted Machine Learning, and those were primarily large companies.
Let’s look into the biggest obstacles that stand in the way of adopting ML.
Do you want to succeed in deploying automated machine learning at your company? AutoML tools are crucial, but remember to focus on processes, methods, and strategies as well. AutoML platforms are just tools, and most ML experts agree that they are not enough on their own.
Any ML process starts with data. It’s commonly agreed that data preparation is the most significant roadblock in an ML process. Modeling is just one piece of the whole data science pipeline, and it is the piece that AutoML tools simplify. The complete workflow still requires much effort to transform data and supply it to the models, and data preparation and transformation are the most time-consuming and least enjoyable parts of the job.
And, of course, the business data used to train ML models is updated regularly. Hence, it requires the companies to build complex ETL pipelines that utilize sophisticated tools and processes. So making the ML process continuous and real-time is a challenging task.
Assume that now we have our ML Model built, and we need to deploy it.
The classical deployment approach treats it as an application layer component as per the diagram below.
Its input is data, and its output is predictions. Business applications consume these predictions through APIs that developers use to integrate the apps with the ML tools. Sounds straightforward from the developers’ point of view, right?
As easy as this may be for developers, it is not as easy once you consider processes. Any integration with a business-critical app in a reasonably large organization is quite troublesome to maintain. Even if the company is tech-savvy, any code change request must go through specific review and testing workflows that involve multiple stakeholders. That reduces flexibility and adds complexity to the whole workflow.
It is much easier to experiment with ML-based decision-making when you have enough flexibility to test various concepts and ideas. So you would prefer something that gives you a self-service capability.
As we see above, data is the core of ML processes, with existing ML tools taking data and returning predictions, which are also data.
So now the questions arise:
Let’s analyze the abovementioned ML workflow challenges and find the solution by addressing these questions.
Challenge #1: Complex Data Integrations and ETL Pipelines
Maintaining complex data integrations and ETL pipelines between the ML Model and a database is one of the biggest challenges faced by ML processes.
SQL is the best tool for data manipulation, so we can solve this problem by bringing ML models inside the data layer, not the other way around. In other words, ML models would learn and return predictions inside the database.
Challenge #2: ML Models Integrations with Apps
Another challenge that generates an avalanche of issues is integrating models with the business applications via the APIs.
Business applications and BI tools are tightly coupled with databases. So, if the AutoML tools become a part of the database, we can make predictions using standard SQL queries. It follows that API integrations between ML models and business apps are no longer necessary, because the models reside within the database.
Solution: Embedding AutoML within the Database
Embedding AutoML tools within the database brings many benefits, such as the following:
The relatively complex diagram presented in section Integrating ML with Apps and Change Management changes into the following:
It looks simpler and makes the ML processes smooth and efficient.
Now that we know the solution to the main challenges, let’s implement it.
For that, we use a construct called AI Tables. It brings machine learning in the form of virtual tables into data platforms. Such an AI Table is created like any other database table and then exposed to applications, BI tools, and DB clients. We make predictions by simply querying the data.
AI Tables were initially developed by MindsDB and are available as open-source or as a managed cloud service. They integrate with traditional SQL and NoSQL databases and data streams like Kafka & Redis.
The concept of AI Tables enables us to perform ML processes within the database, so that all the steps of an ML process (that is, data preparation, training the model, and making predictions) take place through the database.
Training AI Tables
The user specifies a source table or view from which an AI Table learns automatically. To create an AI Table, use a single SQL command, shown in the following section.
An AI Table is a machine learning model whose features correspond to the columns of a source table. The AutoML engine automates the remaining modeling tasks. Nevertheless, experienced ML engineers can specify model parameters through a declarative syntax called JSON-AI.
Once you create an AI Table, it is ready to use. It doesn’t require any further deployment. To make a prediction, run a standard SQL query on the AI Table as if the data you ask for were already there, although it is actually generated on the fly when you query it.
You can make predictions either one by one or in batches. AI Tables can handle many complex machine learning tasks, such as multivariate time-series forecasting, anomaly detection, and more.
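To make the idea of "querying predictions as if they were data" more concrete, here is a minimal sketch using only Python's standard `sqlite3` module. It is not MindsDB and not how AI Tables are implemented: the `predict_amount` function, its made-up per-shop baselines, and the `upcoming` table are all hypothetical stand-ins. The sketch only shows that values computed on the fly can be retrieved with an ordinary SQL query, one at a time or in a batch.

```python
import sqlite3


def predict_amount(shop: str, day_index: int) -> float:
    """Hypothetical stand-in for a trained model: a fixed linear trend per shop."""
    baselines = {"north": 10.0, "south": 20.0}  # made-up numbers, not learned
    return baselines.get(shop, 0.0) + 0.5 * day_index


conn = sqlite3.connect(":memory:")
# Register the Python function so that SQL queries can call it by name.
conn.create_function("predict_amount", 2, predict_amount)

conn.execute("CREATE TABLE upcoming (shop TEXT, day_index INTEGER)")
conn.executemany(
    "INSERT INTO upcoming VALUES (?, ?)",
    [("north", 1), ("north", 2), ("south", 1)],
)

# One-by-one prediction: the value is computed when the query runs.
single = conn.execute("SELECT predict_amount('north', 4)").fetchone()[0]

# Batch prediction: one query returns a predicted value for every row.
batch = conn.execute(
    "SELECT shop, day_index, predict_amount(shop, day_index) AS predicted_amount "
    "FROM upcoming ORDER BY shop, day_index"
).fetchall()

print(single)
for row in batch:
    print(row)
```

From the point of view of the application that issues the query, nothing distinguishes the predicted column from a stored one, which is the property the AI Tables concept relies on.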
Let’s look at a real-world example. We’ll predict stock levels for a retailer, so that the right products are available at the right time and revenue improves.
One of the intricate tasks of being a retailer is to have all the products available in stock at the right time. When the demand grows, the supply must increase. Your data can take the load of handling this task. All you need is to keep track of the following information:
Let’s visualize the above data in a table:
Based on these data, and using Machine Learning processes, we can predict how many items of a given product should be in stock at a given date.
To create AI Tables that utilize your data, you must first allow MindsDB to access it by connecting your database to MindsDB. This is a straightforward task, and detailed instructions are available in the MindsDB documentation.
AI Tables are like ML models, so you must train them using past data.
Below is a simple SQL command that trains an AI Table:
Let’s analyze this query:
As a result, you can see the overall accuracy score and confidence for every prediction and estimate which columns are the most important for better results.
In databases, we often need to deal with tasks that involve multivariate time-series data with high cardinality. And if we use traditional approaches, it requires quite an effort to create such ML models. We need to group data and order it by a given time, date, or timestamp data field.
For example, we can predict the number of hammers sold by a hardware store. Here, data is grouped by the shop and product values, and a forecast is made for each distinct combination of shop and product values. But it is also crucial to know when a specific number of a given product will be sold. That brings us to the problem of creating time series models for each group.
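The manual version of this grouping can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the sample rows, the `(date, shop, product, amount)` layout, and above all the `naive_forecast` helper, which simply averages the most recent amounts and stands in for a real time-series model. The point is only to show that each distinct shop and product combination becomes its own date-ordered series that needs its own forecast.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales rows: (date of sale, shop, product, amount sold).
sales = [
    (date(2022, 1, 3), "shop_a", "hammer", 6),
    (date(2022, 1, 1), "shop_a", "hammer", 4),
    (date(2022, 1, 2), "shop_a", "hammer", 5),
    (date(2022, 1, 1), "shop_b", "hammer", 10),
    (date(2022, 1, 2), "shop_b", "hammer", 12),
    (date(2022, 1, 1), "shop_a", "nails", 100),
]


def split_into_series(rows):
    """Group rows by (shop, product) and order each group by date."""
    series = defaultdict(list)
    for day, shop, product, amount in rows:
        series[(shop, product)].append((day, amount))
    return {key: sorted(points) for key, points in series.items()}


def naive_forecast(points, window=2):
    """Placeholder forecaster: mean of the last `window` amounts in the series."""
    recent = [amount for _, amount in points[-window:]]
    return sum(recent) / len(recent)


series = split_into_series(sales)
# One forecast per distinct (shop, product) combination.
forecasts = {key: naive_forecast(points) for key, points in series.items()}

for key, value in sorted(forecasts.items()):
    print(key, value)
```

Even in this toy form, the number of series grows with every new shop and product, which is what makes high-cardinality data laborious to handle by hand.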
It may sound like a lot of work, but MindsDB provides the means to create a single ML model that is trained on all of the multivariate time-series data at once using the GROUP BY clause. Let’s see how it is done using just a single SQL command.
Now, we’ll create a predictor for our sales data.
The stock_forecaster predictor uses the sales data to predict how many items will be sold by a specific shop in the future. The data is ordered by the date of sale and grouped by the shop. So we can predict the amount value for each shop value.
Let’s make some predictions using our stock_forecaster predictor.
We can get bulk predictions for many records at once by joining our sales data table with the predictor, using the query below.
The JOIN operation adds the predicted amount to the records. So you get predictions for many rows at once.
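The shape of such a join can be illustrated with Python's standard `sqlite3` module. This is a sketch under stated assumptions, not the MindsDB query itself: `forecaster_standin` is an ordinary table filled with made-up values that plays the role of the predictor, and the column names are hypothetical. What carries over is the pattern, where each sales row comes back with a predicted amount attached.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (sale_date TEXT, shop TEXT, product TEXT);
    INSERT INTO sales VALUES
        ('2022-02-01', 'shop_a', 'hammer'),
        ('2022-02-01', 'shop_b', 'hammer'),
        ('2022-02-02', 'shop_a', 'hammer');

    -- Stand-in for the predictor: in this sketch it is a plain table
    -- holding made-up values, not a trained model.
    CREATE TABLE forecaster_standin (shop TEXT, predicted_amount REAL);
    INSERT INTO forecaster_standin VALUES ('shop_a', 5.5), ('shop_b', 11.0);
    """
)

# The join attaches a predicted amount to every matching sales row.
rows = conn.execute(
    """
    SELECT s.sale_date, s.shop, s.product, f.predicted_amount
    FROM sales AS s
    JOIN forecaster_standin AS f ON f.shop = s.shop
    ORDER BY s.sale_date, s.shop
    """
).fetchall()

for row in rows:
    print(row)
```

Because the result is a regular result set, any BI tool or DB client that can run the query can consume the predictions without extra integration code.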
If you want to learn more about analyzing and visualizing predictions in BI tools, check out this article.
The traditional approach treats ML models as standalone apps that require maintaining ETL pipelines to a database and API integrations with business applications. Even though AutoML tools make the modeling part effortless and straightforward, experienced specialists are still needed to manage the complete ML workflow.
But databases are already the best tool for data preparation. Thus, it makes more sense to bring ML to data and not the other way round.
The construct of AI Tables from MindsDB enables self-service AutoML for data practitioners and streamlines machine learning workflows, because the AutoML tool resides within the database.
Let’s summarize its benefits:
AI Tables can help companies make data-driven decisions using existing tools and personnel.