In this blog post, you will learn how to easily build forecasting solutions without having to write extensive machine learning code, making the AI development process faster and more accessible for developers.
You will use Nixtla’s StatsForecast engine, integrated into a familiar developer environment by MindsDB, an AI “middleware” platform that makes it easier for developers to build AI-powered applications.
The advantages brought to time series forecasting by the StatsForecast engine integration with MindsDB include fast and accurate implementations of models, probabilistic forecasting and confidence intervals, support for exogenous variables and static covariates, anomaly detection, and more.
Read along to follow a tutorial on forecasting monthly expenditures with code examples and demo data.
StatsForecast is optimized for high performance and scalability and uses classical methods such as ARIMA rather than deep learning. Models train very quickly and generalize well, so they are unlikely to overfit. They also perform well on short time series, where deep learning models may be more likely to overfit. We’ll go through an example that predicts monthly expenditures across various categories for the next quarter.
In this tutorial, we create a model to predict expenditures based on historical data using the StatsForecast engine.
We use a collection from our Mongo public demo database, so let’s start by connecting MindsDB to it from Mongo Compass or Mongo Shell:
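As a minimal sketch of this step, assuming MindsDB's Mongo API registers data sources through a `databases` collection in the `mindsdb` project: the data source name `mongo_demo_db` is an assumption, and both connection values are placeholders to be replaced with the demo database's real connection details.

```javascript
// Connection document for the demo database.
// `mongo_demo_db` is an assumed name; host and database are placeholders.
const demoDbConnection = {
  name: 'mongo_demo_db',
  engine: 'mongodb',
  connection_args: {
    host: '<mongodb-connection-string>', // placeholder, not a real host
    database: '<database-name>',         // placeholder, not a real database
  },
};

// `db` exists only inside Mongo Shell; the guard lets the snippet also run under plain Node.js.
if (typeof db !== 'undefined') {
  // Equivalent to running `use mindsdb` first (assumed MindsDB Mongo API).
  db.getSiblingDB('mindsdb').getCollection('databases').insertOne(demoDbConnection);
} else {
  console.log(JSON.stringify(demoDbConnection, null, 2));
}
```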
Now that we’ve connected our database to MindsDB, let’s query the data to be used in the example.
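A query along these lines should return a few sample documents. This is a sketch that assumes the connected data source is exposed as a database named `mongo_demo_db`; the limit of 3 is arbitrary.

```javascript
// Describes the sample read: which collection to query and how many rows to fetch.
const sampleQuery = {
  database: 'mongo_demo_db',            // assumed data source name
  collection: 'historical_expenditures',
  filter: {},                           // no filter: any documents will do
  limit: 3,                             // arbitrary sample size
};

// `db` exists only inside Mongo Shell; under plain Node.js we just print the query description.
if (typeof db !== 'undefined') {
  const rows = db.getSiblingDB(sampleQuery.database)
    .getCollection(sampleQuery.collection)
    .find(sampleQuery.filter)
    .limit(sampleQuery.limit)
    .toArray();
  printjson(rows);
} else {
  console.log(JSON.stringify(sampleQuery));
}
```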
Here is the output:
The historical_expenditures collection stores monthly expenditure data for various categories, such as food, clothing, industry, and more.
Before we can create a model, we should create a StatsForecast ML engine.
We can list all ML engines with this command:
Please make sure that the StatsForecast engine exists.
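The two engine steps above can be sketched together. This assumes MindsDB's Mongo API exposes engines through an `ml_engines` collection whose documents carry a `name` field, and that the handler is called `statsforecast`; the helper `hasEngine` is hypothetical and only illustrates the existence check.

```javascript
// Engine document; both values are assumed to be 'statsforecast'.
const engineSpec = { name: 'statsforecast', handler: 'statsforecast' };

// Returns true when a list of engine documents contains an engine with the given name.
function hasEngine(engines, name) {
  return engines.some((engine) => engine.name === name);
}

// `db` exists only inside Mongo Shell; under plain Node.js we only exercise the helper.
if (typeof db !== 'undefined') {
  const mlEngines = db.getSiblingDB('mindsdb').getCollection('ml_engines');
  // Create the engine only if it is not listed yet.
  if (!hasEngine(mlEngines.find({}).toArray(), engineSpec.name)) {
    mlEngines.insertOne(engineSpec);
  }
  // List all ML engines so we can confirm StatsForecast is among them.
  printjson(mlEngines.find({}).toArray());
} else {
  console.log(hasEngine([engineSpec], engineSpec.name));
}
```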
Let's create a model to predict the expenditures:
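A model definition for this task might look like the sketch below. The model name, the predicted field, the `timeseries_settings` and `training_options` clauses, the `engine` parameter, and `horizon: 3` come from this post; the `connection`, `select_data_query`, `order_by`, and `group_by` keys, the data source name `mongo_demo_db`, and the engine name `statsforecast` are assumptions about MindsDB's Mongo API and should be checked against its documentation.

```javascript
// Model definition passed to insertOne.
const forecasterSpec = {
  name: 'quarterly_expenditure_forecaster',                 // AI collection to create
  predict: 'expenditure',                                   // field to forecast
  connection: 'mongo_demo_db',                              // assumed data source name
  select_data_query: 'db.historical_expenditures.find({})', // assumed key: training data query
  training_options: {
    timeseries_settings: {
      order_by: 'month',    // assumed key: field that orders the series in time
      group_by: 'category', // assumed key: one series per expenditure category
      horizon: 3,           // predict three months, i.e. the next quarter
    },
    engine: 'statsforecast', // ML engine used to make predictions
  },
};

// `db` exists only inside Mongo Shell; under plain Node.js we just print the definition.
if (typeof db !== 'undefined') {
  db.getSiblingDB('mindsdb').getCollection('models').insertOne(forecasterSpec);
} else {
  console.log(JSON.stringify(forecasterSpec, null, 2));
}
```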
In practice, the insertOne method triggers MindsDB to generate an AI collection called quarterly_expenditure_forecaster that uses the StatsForecast integration to predict a field named expenditure. The model lives inside the default mindsdb project. In MindsDB, projects are a natural way to keep artifacts, such as models or views, separate according to the predictive task they solve. You can learn more about projects in the MindsDB documentation.
When creating time series forecasting models, we define the time series parameters, such as the horizon (the number of future periods to predict), under the timeseries_settings clause.
Please note that the window clause is not required because StatsForecast automatically calculates the best window as part of hyperparameter tuning.
The engine parameter in the training_options clause specifies the ML engine used to make predictions.
Alternatively, if your data has a hierarchical structure, you may use hierarchical reconciliation to improve prediction accuracy. To do so, the below insertOne method uses Nixtla’s HierarchicalForecast package.
Here we use the hierarchy parameter in the training_options clause to define the column that categorizes the data.
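A hierarchical variant could be sketched as follows. Only the `hierarchy` parameter inside `training_options` is described in this post; the model name `hierarchical_expenditure_forecaster`, the list form of `hierarchy`, and the remaining keys are assumptions carried over from a regular model definition.

```javascript
// Model definition with hierarchical reconciliation enabled.
const hierarchicalSpec = {
  name: 'hierarchical_expenditure_forecaster',              // hypothetical model name
  predict: 'expenditure',                                   // field to forecast
  connection: 'mongo_demo_db',                              // assumed data source name
  select_data_query: 'db.historical_expenditures.find({})', // assumed key: training data query
  training_options: {
    timeseries_settings: {
      order_by: 'month',    // assumed key: field that orders the series in time
      group_by: 'category', // assumed key: one series per category
      horizon: 3,           // predict the next quarter
    },
    engine: 'statsforecast',
    hierarchy: ['category'], // column that categorizes the data (list form is an assumption)
  },
};

// `db` exists only inside Mongo Shell; under plain Node.js we just print the definition.
if (typeof db !== 'undefined') {
  db.getSiblingDB('mindsdb').getCollection('models').insertOne(hierarchicalSpec);
} else {
  console.log(JSON.stringify(hierarchicalSpec, null, 2));
}
```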
We can check the training status with the following query:
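A status check might look like this sketch, assuming model documents live in the `models` collection of the `mindsdb` project and expose a `status` field that reads `complete` once training has finished. The helper `isTrainingComplete` is hypothetical.

```javascript
const modelName = 'quarterly_expenditure_forecaster';

// Returns true when the model document reports a finished training run.
function isTrainingComplete(modelDoc) {
  return Boolean(modelDoc) && modelDoc.status === 'complete';
}

// `db` exists only inside Mongo Shell; under plain Node.js we only exercise the helper.
if (typeof db !== 'undefined') {
  const matches = db.getSiblingDB('mindsdb')
    .getCollection('models')
    .find({ name: modelName })
    .toArray();
  printjson(matches);
  print(isTrainingComplete(matches[0]) ? 'training complete' : 'still training or not found');
} else {
  console.log(isTrainingComplete({ name: modelName, status: 'complete' }));
}
```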
One advantage of the StatsForecast engine is its speed: the model completes the training process quickly.
Once the model status is complete, the behavior is the same as with any other AI collection – you can query for batch predictions by joining it with a data collection:
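A batch prediction query could be sketched as below, assuming MindsDB's Mongo API joins a model with a data collection through a `collection` key in the filter, and that the data source is named `mongo_demo_db`. The `month > latest` condition discussed further down is left out, since its exact Mongo-QL form is not shown in this post.

```javascript
// Filter that names the data collection to join the model with (assumed `collection` key).
const batchPredictionFilter = {
  collection: 'mongo_demo_db.historical_expenditures',
};

// `db` exists only inside Mongo Shell; under plain Node.js we just print the filter.
if (typeof db !== 'undefined') {
  const predictions = db.getSiblingDB('mindsdb')
    .getCollection('quarterly_expenditure_forecaster')
    .find(batchPredictionFilter)
    .toArray();
  printjson(predictions);
} else {
  console.log(JSON.stringify(batchPredictionFilter));
}
```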
Here is the output data:
The historical_expenditures collection is used to make batch predictions. Upon joining the quarterly_expenditure_forecaster model with the historical_expenditures collection, we get predictions for the next quarter as defined by the horizon: 3 clause.
Please note that the output month column contains both the date and the time. This timestamp format is the default because the time component is required when dealing with hourly data.
MindsDB provides the latest keyword that marks the latest training data point. In the WHERE clause, we specify the month > latest condition to ensure the predictions are made for data after the latest training data point.
Let’s consider our quarterly_expenditure_forecaster model. We train the model using data up to the third quarter of 2017, so the predictions cover the fourth quarter of 2017 (as defined by horizon: 3).
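To make the arithmetic concrete, the sketch below lists the months a given horizon covers. It assumes monthly data whose latest training point is September 2017 (the end of the third quarter); the helper `forecastMonths` is hypothetical and only illustrates how the horizon maps to calendar months, not how MindsDB computes them.

```javascript
// Returns the `horizon` months that follow `latestMonth` (format: 'YYYY-MM').
function forecastMonths(latestMonth, horizon) {
  const [year, month] = latestMonth.split('-').map(Number);
  const months = [];
  for (let step = 1; step <= horizon; step += 1) {
    const monthsSinceJanuary = (month - 1) + step;            // zero-based offset from January of `year`
    const forecastYear = year + Math.floor(monthsSinceJanuary / 12);
    const forecastMonth = (monthsSinceJanuary % 12) + 1;      // back to 1..12
    months.push(`${forecastYear}-${String(forecastMonth).padStart(2, '0')}`);
  }
  return months;
}

// Latest training point assumed to be September 2017, with horizon: 3.
console.log(forecastMonths('2017-09', 3)); // [ '2017-10', '2017-11', '2017-12' ]
```

With a horizon of 3, the forecast covers October through December 2017, which is the fourth quarter.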
Let’s look at a graph that visualizes the historical and predicted data.
By integrating databases and Nixtla’s StatsForecast engine using MindsDB, developers can easily forecast future events based on historical data.
The StatsForecast engine offers numerous time series forecasting models optimized for high performance and scalability. Nixtla developed a library of models that can efficiently fit millions of time series.
Features provided by Nixtla’s StatsForecast engine include the implementation of models, probabilistic forecasting and confidence intervals, support for exogenous variables and static covariates, and anomaly detection. If you have technical questions regarding model behavior, the best resource is Nixtla’s community Slack.
MindsDB is now the fastest-growing open-source applied machine-learning platform in the world. Its community continues to contribute to hundreds of data-source and ML-framework integrations. Stay tuned for upcoming features, including more control over the interface parameters and fine-tuning models directly from MindsDB!
Experiment with Nixtla’s StatsForecast models within MindsDB and unlock ML capabilities over your data in minutes. Remember to sign up for a free demo account and follow the tutorials, perhaps this time using your own data.
Finally, if MindsDB's vision to democratize ML sounds exciting, head to our community Slack, where you can get help and chat with people about using other available data sources and ML frameworks, or about writing a handler to bring your own!