Time Series Forecasting with Nixtla and MindsDB using MongoDB Query Language
In this blog post, you will learn how to easily build forecasting solutions without having to write extensive machine learning code, making the AI development process faster and more accessible for developers.
You will use Nixtla’s StatsForecast engine, which MindsDB integrates into a familiar developer environment. MindsDB is an AI “middleware” platform that makes it easier for developers to build AI-powered applications.
The advantages brought to time series forecasting by the StatsForecast engine integration with MindsDB include fast and accurate implementations of models, probabilistic forecasting and confidence intervals, support for exogenous variables and static covariates, anomaly detection, and more.
Read along to follow a tutorial on forecasting monthly expenditures with code examples and demo data.
Nixtla’s StatsForecast Integration with MindsDB
Nixtla is an open-source time series ecosystem that offers model libraries for time series forecasting. StatsForecast is one of these libraries; it provides statistical and econometric models.
StatsForecast is optimized for high performance and scalability and uses classical methods such as ARIMA rather than deep learning. Its models train very quickly and generalize well, so they are unlikely to overfit. They also perform well on short time series, where deep learning models may be more likely to overfit. We’ll go through an example that predicts monthly expenditures in various categories for the next quarter.
Let’s Set Up MindsDB
One way is to sign up for an account at MindsDB Cloud. It is a convenient option as it doesn’t require any installation procedures. You can find the details here.
Alternatively, visit our docs and follow the instructions to manually set up a local instance of MindsDB via Docker or pip. You can also set up MindsDB on AWS following this instruction set.
Tutorial on How to Predict Monthly Expenditures
Connecting the Data
In this tutorial, we create a model to predict expenditures based on historical data using the StatsForecast engine.
Before we start, visit our docs on how to connect Mongo Compass and Mongo Shell to MindsDB.
We use a collection from our Mongo public demo database, so let’s start by connecting MindsDB to it from Mongo Compass or Mongo Shell:
> use mindsdb
> db.databases.insertOne({
    'name': 'mongo_demo_db',
    'engine': 'mongodb',
    'connection_args': {
      'host': 'mongodb+srv://user:MindsDBUser123!@demo-data-mdb.trzfwvb.mongodb.net/',
      'database': 'public'
    }
  })
Now that we’ve connected our database to MindsDB, let’s query the data to be used in the example.
> use mongo_demo_db
> db.historical_expenditures.find({}).limit(3)
Here is the output:
{
  _id: '63fd2388bee7187f230f56fc',
  month: '1982-04-01',
  category: 'food',
  expenditure: '1162.6'
}
{
  _id: '63fd2388bee7187f230f56fd',
  month: '1982-05-01',
  category: 'food',
  expenditure: '1150.9'
}
{
  _id: '63fd2388bee7187f230f56fe',
  month: '1982-06-01',
  category: 'food',
  expenditure: '1160'
}
The historical_expenditures collection stores monthly expenditure data for various categories, such as food, clothing, industry, and more.
Creating an ML Engine
Before we can create a model, we should create a StatsForecast ML engine.
> use mindsdb
> db.ml_engines.insertOne({'name': "statsforecast", "handler": "statsforecast"})
We can list all ML engines with this command:
> db.ml_engines.find()
Make sure the statsforecast engine appears in the output before you continue.
Creating a Model
Let's create a model to predict the expenditures:
> use mindsdb
> db.predictors.insertOne({
    name: 'quarterly_expenditure_forecaster',
    predict: 'expenditure',
    connection: 'mongo_demo_db',
    select_data_query: 'db.historical_expenditures.find({})',
    training_options: {
      timeseries_settings: {
        order_by: ['month'],
        group_by: ['category'],
        horizon: 3
      },
      engine: 'statsforecast'
    }
  })
In practice, the insertOne method triggers MindsDB to generate an AI collection called quarterly_expenditure_forecaster that uses the StatsForecast integration to predict a field named expenditure. The model lives inside the default mindsdb project. In MindsDB, projects are a natural way to keep artifacts, such as models or views, separate according to what predictive task they solve. You can learn more about MindsDB projects here.
While creating time series forecasting models, we define the following parameters under the timeseries_settings clause:
The order_by clause specifies a field used to sort data. Here we use the month field to order the expenditures data.
The group_by clause defines a field used to divide data into groups. The model makes independent predictions for each partition of data. Here we use the category field to group the expenditures data.
The horizon clause specifies how many records are to be predicted. Here we define horizon: 3. Because the data is monthly, this is equivalent to predicting expenditures for the next quarter.
Please note that the window clause is not required because StatsForecast automatically calculates the best window as part of hyperparameter tuning.
The engine parameter in the training_options clause specifies the ML engine used to make predictions.
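The parameters described above can also be assembled programmatically before being passed to insertOne. The following is a minimal sketch, not part of MindsDB: buildPredictorSpec and its argument names are hypothetical, and it only reproduces the document shape used in this tutorial.

```javascript
// Hypothetical helper (not a MindsDB API): builds the document that this
// tutorial passes to db.predictors.insertOne().
function buildPredictorSpec({ name, target, connection, collection, orderBy, groupBy, horizon, engine }) {
  // The horizon is the number of future records to predict, so it must be a positive integer.
  if (!Number.isInteger(horizon) || horizon < 1) {
    throw new RangeError('horizon must be a positive integer');
  }
  return {
    name,
    predict: target,
    connection,
    select_data_query: `db.${collection}.find({})`,
    training_options: {
      timeseries_settings: {
        order_by: [orderBy],   // field used to sort the series
        group_by: [groupBy],   // one independent forecast per group
        horizon,               // number of records to predict
      },
      engine,
    },
  };
}

// Reproduces the model definition used in this tutorial.
const spec = buildPredictorSpec({
  name: 'quarterly_expenditure_forecaster',
  target: 'expenditure',
  connection: 'mongo_demo_db',
  collection: 'historical_expenditures',
  orderBy: 'month',
  groupBy: 'category',
  horizon: 3,
  engine: 'statsforecast',
});

console.log(JSON.stringify(spec, null, 2));
```

The resulting object has the same fields as the insertOne call shown earlier, so it could be passed to db.predictors.insertOne(spec) from a script. Validating the horizon up front is a design choice of this sketch, not a MindsDB requirement.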
Alternatively, if your data has a hierarchical structure, you may use hierarchical reconciliation to improve prediction accuracy. To do so, the insertOne call below uses Nixtla’s HierarchicalForecast package.
> use mindsdb
> db.predictors.insertOne({
    name: 'quarterly_expenditure_forecaster',
    predict: 'expenditure',
    connection: 'mongo_demo_db',
    select_data_query: 'db.historical_expenditures.find({})',
    training_options: {
      timeseries_settings: {
        order_by: ['month'],
        group_by: ['category'],
        horizon: 3
      },
      engine: 'statsforecast',
      hierarchy: ['category']
    }
  })
Here we use the hierarchy parameter in the training_options clause to define the column that categorizes the data.
We can check the training status with the following query:
> db.models.find({
    name: 'quarterly_expenditure_forecaster'
  })
One advantage of the StatsForecast engine is speed: the model completes the training process quickly.
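If you check the status from a script rather than by hand, the check can be wrapped in a small polling loop. The sketch below is an illustration under stated assumptions: trainingState and waitForModel are hypothetical helpers, fetchModel stands for whatever function returns the current model document (for example, a wrapper around the db.models.find query above), and the 'error' status value is an assumption; only 'complete' appears in this tutorial.

```javascript
// Hypothetical helper: maps a model document to a coarse training state.
// 'complete' is the status named in this tutorial; 'error' is an assumed
// failure status. Anything else is treated as still in progress.
function trainingState(modelDoc) {
  if (!modelDoc) return 'pending'; // model not visible yet
  if (modelDoc.status === 'complete') return 'done';
  if (modelDoc.status === 'error') return 'failed';
  return 'pending';
}

// Hypothetical helper: calls fetchModel repeatedly until training finishes,
// fails, or the attempt limit is reached.
async function waitForModel(fetchModel, { attempts = 30, delayMs = 2000 } = {}) {
  for (let i = 0; i < attempts; i++) {
    const state = trainingState(await fetchModel());
    if (state !== 'pending') return state;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return 'timeout';
}

// Example with hand-written documents instead of a live connection.
console.log(trainingState({ name: 'quarterly_expenditure_forecaster', status: 'training' }));
console.log(trainingState({ name: 'quarterly_expenditure_forecaster', status: 'complete' }));
```

Passing fetchModel in as a function keeps the loop independent of how you connect to MindsDB, which makes it easy to try out with stub data first.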
Querying for Predictions
Once the model status is complete, the behavior is the same as with any other AI collection: you can query for batch predictions by joining it with a data collection:
> db.quarterly_expenditure_forecaster.find({
    "collection": "mongo_demo_db.historical_expenditures",
    "query": {"category": "food"}
  }).limit(3)
Note that by default the predictions are made for month > LATEST, that is, for future months not present in the training dataset.
Here is the output data:
{
  _id: '63fd2388bee7187f230f58a5',
  month: 2017-10-01T00:00:00.000Z,
  category: 'food',
  expenditure: 10256.251953125
}
{
  _id: '63fd2388bee7187f230f58a4',
  month: 2017-11-01T00:00:00.000Z,
  category: 'food',
  expenditure: 10182.58984375
}
{
  _id: '63fd2388bee7187f230f58a3',
  month: 2017-12-01T00:00:00.000Z,
  category: 'food',
  expenditure: 10316.259765625
}
The historical_expenditures collection is used to make batch predictions. Upon joining the quarterly_expenditure_forecaster model with the historical_expenditures collection, we get predictions for the next quarter as defined by the horizon: 3 clause.
Please note that the output month field contains both the date and the time. This format is used by default, as the time component is required when dealing with hourly data.
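For monthly data the time component carries no information, so you may want to trim it when presenting results. A minimal sketch, assuming the prediction rows have already been loaded into a JavaScript array with month as a Date (the helper name toMonthlyRow and the rounding to two decimals are illustrative choices, not MindsDB behavior):

```javascript
// Hypothetical helper: keeps only the date part of the month field and rounds
// the predicted expenditure to two decimals for display.
function toMonthlyRow(row) {
  return {
    month: row.month.toISOString().slice(0, 10), // 'YYYY-MM-DD'
    category: row.category,
    expenditure: Number(row.expenditure.toFixed(2)),
  };
}

// Rows copied from the output shown in this tutorial.
const predictions = [
  { month: new Date('2017-10-01T00:00:00.000Z'), category: 'food', expenditure: 10256.251953125 },
  { month: new Date('2017-11-01T00:00:00.000Z'), category: 'food', expenditure: 10182.58984375 },
  { month: new Date('2017-12-01T00:00:00.000Z'), category: 'food', expenditure: 10316.259765625 },
];

const monthly = predictions.map(toMonthlyRow);
console.log(monthly);
```

Using toISOString keeps the conversion in UTC, so the date does not shift with the local time zone of the machine running the script.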
MindsDB provides the LATEST keyword that marks the latest training data point. The month > LATEST condition, which is applied by default in the query above, ensures the predictions are made for data after the latest training data point.
Let’s consider our quarterly_expenditure_forecaster model. We train the model using data until the third quarter of 2017, and the predictions come for the fourth quarter of 2017 (as defined by horizon: 3).
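The relationship between the latest training point and the horizon can be made concrete with a few lines of date arithmetic. This is an illustrative sketch only: forecastMonths is a hypothetical helper, and the latest training month of 2017-09-01 is inferred from the statement that training data runs until the third quarter of 2017.

```javascript
// Hypothetical helper: lists the months a monthly model with the given horizon
// would forecast, starting right after the latest training month.
function forecastMonths(latestMonth, horizon) {
  const [year, month] = latestMonth.split('-').map(Number); // month is 1-based here
  const months = [];
  for (let step = 1; step <= horizon; step++) {
    // Date.UTC takes a 0-based month index and rolls over into the next year on overflow.
    const date = new Date(Date.UTC(year, month - 1 + step, 1));
    months.push(date.toISOString().slice(0, 10));
  }
  return months;
}

// Assumed latest training month: September 2017 (end of the third quarter).
console.log(forecastMonths('2017-09-01', 3));
// → [ '2017-10-01', '2017-11-01', '2017-12-01' ]
```

These three months match the month values in the prediction output shown earlier.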
Let’s look at a graph that visualizes the historical and predicted data.
Leverage Time Series Forecasting Capabilities with MindsDB
By integrating databases and Nixtla’s StatsForecast engine using MindsDB, developers can easily forecast future events based on historical data.
The StatsForecast engine offers numerous time series forecasting models optimized for high performance and scalability. Nixtla developed a library of models that can efficiently fit millions of time series.
Features provided by Nixtla’s StatsForecast engine include fast and accurate implementations of models, probabilistic forecasting and confidence intervals, support for exogenous variables and static covariates, and anomaly detection. If you have technical questions regarding model behavior, the best resource is Nixtla’s community Slack.
MindsDB is now the fastest-growing open-source applied machine-learning platform in the world. Its community continues to contribute to hundreds of data-source and ML-framework integrations. Stay tuned for upcoming features, including more control over the interface parameters and fine-tuning models directly from MindsDB!
Experiment with Nixtla’s StatsForecast models within MindsDB and unlock ML capabilities over your data in minutes. You can access MindsDB via our local Docker installation or the MindsDB extension on Docker Desktop and follow the tutorials, perhaps this time using your own data.
Check out the webinar on Machine Learning inside MongoDB for Full-Stack developers to learn more.
Finally, if MindsDB's vision to democratize ML sounds exciting, head to our community Slack, where you can get help and find people to chat about using other available data sources, ML frameworks, or writing a handler to bring your own!
Have fun!