Time Series Forecasting with Nixtla’s StatsForecast and MindsDB using SQL
In this blog post, you will learn how to easily build forecasting solutions without having to write extensive machine learning code, making the AI development process faster and more accessible for developers.
You will use Nixtla’s StatsForecast engine integrated into the familiar developer environment by MindsDB - a powerful AI “middleware” platform that makes it easier than ever for #developers to build AI-powered applications.
The advantages brought to time series forecasting by the StatsForecast engine integration with MindsDB include fast and accurate implementations of models, probabilistic forecasting and confidence intervals, support for exogenous variables and static covariates, anomaly detection, and more.
Read along to follow a tutorial on forecasting monthly expenditures with code examples and demo data.
Nixtla’s StatsForecast Integration with MindsDB
Nixtla is an open-source time-series ecosystem that offers model libraries for time series forecasting. And StatsForecast is one of the libraries providing statistical and econometric models.
StatsForecast is optimized for high performance and scalability and uses classical methods such as ARIMA, rather than deep learning. Models train very quickly and generalize well, so are unlikely to overfit. They also perform well on short time series, where deep learning models may be more likely to overfit. We’ll go through an example to predict monthly expenditures of various categories for the next quarter.
Let’s Set up the MindsDB
One way is to sign up for an account at MindsDB Cloud. It is a convenient option as it doesn’t require any installation procedures. You can find the details here.
Alternatively, visit our docs and follow the instructions to manually set up a local instance of MindsDB via Docker or pip. You can also set up MindsDB on AWS following this instruction set.
Tutorial on How to Predict Monthly Expenditures
Connecting the Data
In this tutorial, we create a model to predict expenditures based on historical data using the StatsForecast engine.
We use a table from our MySQL public demo database, so let’s start by connecting MindsDB to it:
CREATE DATABASE mysql_demo_db
WITH ENGINE = 'mysql',
PARAMETERS = {
"user": "user",
"password": "MindsDBUser123!",
"host": "db-demo-data.cwoyhfn6bzs0.us-east-1.rds.amazonaws.com",
"port": "3306",
"database": "public"
};
Now that we’ve connected our database to MindsDB, let’s query the data to be used in the example:
SELECT *
FROM mysql_demo_db.historical_expenditures
LIMIT 3;
Here is the output:
month | category | expenditure |
1982-04-01 | food | 1162.6 |
1982-05-01 | food | 1150.9 |
1982-06-01 | food | 1160 |
The historical_expenditures table stores monthly expenditure data for various categories, such as food, clothing, industry, and more.
Creating an ML Engine
Before we can create a model, we should create a StatsForecast ML engine.
CREATE ML_ENGINE statsforecast
FROM statsforecast;
We can list all ML engines with this command:
SHOW ML_ENGINES;
Please make sure that the StatsForecast engine exists.
Creating a Model
Let's create a model table to predict the expenditures:
CREATE MODEL quarterly_expenditure_forecaster
FROM mysql_demo_db
(SELECT * FROM historical_expenditures)
PREDICT expenditure
ORDER BY month
GROUP BY category
HORIZON 3
USING ENGINE = 'statsforecast';
In practice, the CREATE MODEL statement triggers MindsDB to generate an AI table called quarterly_expenditure_forecaster that uses the StatsForecast integration to predict a column named expenditure. The model lives inside the default mindsdb project. In MindsDB, projects are a natural way to keep artifacts, such as models or views, separate according to what predictive task they solve. You can learn more about MindsDB projects here.
While creating time series forecasting models, we define the following parameters:
The ORDER BY clause specifies a column used to sort data. Here we use the month column to order the expenditures data.
The GROUP BY clause defines a column used to divide data into groups. The model makes independent predictions for each partition of data. Here we use the category column to group the expenditures data.
The HORIZON clause specifies how many records are to be predicted. Here we define HORIZON 3. It is equivalent to predicting expenditures for the next quarter.
Please note that the WINDOW clause is not required because StatsForecast automatically calculates the best window as part of hyperparameter tuning.
The ENGINE parameter in the USING clause specifies the ML engine used to make predictions.
Alternatively, if your data has a hierarchical structure, you may use hierarchical reconciliation to improve prediction accuracy. To do so, the below CREATE MODEL statement uses Nixtla’s HierarchicalForecast package.
Here we use the HIERARCHY parameter in the USING clause to define the column that categorizes the data.
CREATE MODEL quarterly_expenditure_forecaster
FROM mysql_demo_db
(SELECT * FROM historical_expenditures)
PREDICT expenditure
ORDER BY month
GROUP BY category
HORIZON 3
USING
ENGINE = 'statsforecast'
HIERARCHY = [‘category’];
We can check the training status with the following query:
SELECT *
FROM models
WHERE name = 'quarterly_expenditure_forecaster';
One of the pros of using the StatsForecast engine is that it is fast - it doesn’t take long until the model completes the training process.
Querying for Predictions
Once the model status is complete, the behavior is the same as with any other AI table – you can query for batch predictions by joining it with a data table:
SELECT m.month as month, m.expenditure as forecasted
FROM mindsdb.quarterly_expenditure_forecaster as m
JOIN mysql_demo_db.historical_expenditures as t
WHERE t.month > LATEST
AND t.category = 'food';
Here is the output data:
month | forecasted |
2017-10-01 00:00:00.000000 | 10256.251953125 |
2017-11-01 00:00:00.000000 | 10182.58984375 |
2017-12-01 00:00:00.000000 | 10316.259765625 |
The historical_expenditures table is used to make batch predictions. Upon joining the quarterly_expenditure_forecaster model with the historical_expenditures table, we get predictions for the next quarter as defined by the HORIZON 3 clause.
Please note that the output month column contains both the date and timestamp. This format is used by default, as the timestamp is required when dealing with the hourly frequency of data.
MindsDB provides the LATEST keyword that marks the latest training data point. In the WHERE clause, we specify the month > LATEST condition to ensure the predictions are made for data after the latest training data point.
Let’s consider our quarterly_expenditure_forecaster model. We train the model using data until the third quarter of 2017, and the predictions come for the fourth quarter of 2017 (as defined by HORIZON 3).
Let’s look at a graph that visualizes the historical and predicted data.
Leverage Timeseries Forecasting Capabilities with MindsDB
By integrating databases and Nixtla’s StatsForecast engine using MindsDB, developers can easily forecast future events based on historical data.
The StatsForecast engine offers numerous time series forecasting models optimized for high performance and scalability. Nixtla developed a library of models that can efficiently fit millions of time series.
Features provided by Nixtla’s StatsForecast engine include implementation of models, probabilistic forecasting and confidence intervals, support for exogenous variables and static covariates, and anomaly detection. If you have technical questions regarding model behavior, the best resource is Nixtla’s community Slack.
MindsDB is now the fastest-growing open-source applied machine-learning platform in the world. Its community continues to contribute to hundreds of data-source and ML-framework integrations. Stay tuned for the upcoming features - including more control over the interface parameters and fine-tuning models directly from MindsDB!
Experiment with Nixtla’s StatsForecast models within MindsDB and unlock the ML capability over your data in minutes. Remember to sign-up for a free demo account and follow the tutorials, perhaps this time using your data.
Finally, if MindsDB's vision to democratize ML sounds exciting, head to our community Slack, where you can get help and find people to chat about using other available data sources, ML frameworks, or writing a handler to bring your own!
Have fun!