Real-time Machine Learning in SingleStore with MindsDB
SingleStore and MindsDB recently announced a partnership to advance machine learning innovation and provide machine learning features within the SingleStore database. SingleStore offers a number of features that enhance training performance for machine learning, in addition to supporting real-time inference. This blog post from MindsDB VP of Business Development, Erik Bovee, outlines this exciting integration.
In previous posts we have explored machine learning at the data layer, machine learning on data streams, and handling difficult machine learning problems such as large, multivariate time series. SingleStore offers a number of features and capabilities that enhance machine learning across all of these use cases. SingleStore is a high-performance database that was originally designed to run in memory, performs extremely well at scale, and is particularly well adapted to both online analytics and transactions.
Column Stats, Histograms, and Windowing Functions
Running MindsDB with SingleStore can significantly increase performance and reduce the computational requirements of training machine learning models. Training can be quite intensive, depending on the size and type of the data and the size and characteristics of the ML model.
For many machine learning applications, training is done on a subset of the data rather than the entire data set. Machine learning engineers are often tasked with extracting statistically important subsets of the data, then preparing and transforming that data for training. SingleStore has two significant features that simplify these tasks and accelerate training: data sampling with automated stats, and windowing functions.
SingleStore has statistical features (you can explore them more deeply here), including one called ‘autostats’, which gathers information used for query planning and is also extremely useful for data sampling for machine learning. Autostats provides two types of information:
Column stats, including information on the cardinality of a column (high cardinality can be a challenge in ML and is something that MindsDB handles well) and
Histograms, or range statistics, which provide information on the distribution of data in a column.
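To make the two kinds of statistics above concrete, here is a minimal Python sketch of what a cardinality count and a range histogram describe. It is purely illustrative: the function names `column_cardinality` and `equal_width_histogram` and the sample `ages` column are hypothetical, and this is not how SingleStore computes autostats internally.

```python
def column_cardinality(values):
    """Number of distinct non-null values in a column."""
    return len({v for v in values if v is not None})


def equal_width_histogram(values, num_buckets):
    """Count values per equal-width bucket between the column min and max.

    Returns a list of (bucket_start, bucket_end, count) tuples.
    """
    nums = [v for v in values if v is not None]
    lo, hi = min(nums), max(nums)
    # Fall back to a width of 1.0 when every value is identical.
    width = (hi - lo) / num_buckets or 1.0
    counts = [0] * num_buckets
    for v in nums:
        # The maximum value belongs in the last bucket, not one past it.
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return [
        (lo + i * width, lo + (i + 1) * width, counts[i])
        for i in range(num_buckets)
    ]


# Hypothetical sample column with one missing value.
ages = [22, 25, 25, 31, 38, 40, 47, 59, 60, None]
cardinality = column_cardinality(ages)
histogram = equal_width_histogram(ages, num_buckets=2)
```

A low cardinality relative to the row count suggests a categorical column, while the bucket counts give a quick picture of how the values are distributed without scanning every row again.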
Information from column stats helps MindsDB make the best automated choice of ML model or mixer to train on the data, and a histogram very quickly provides a data sample and the statistical information necessary for data preparation and transformation. With ‘autostats’ turned on in SingleStore, you reduce the time MindsDB takes to prepare data, and you generate statistics that contribute to faster training times and higher-quality trained models.
Another useful feature for more efficient training is the SingleStore ‘Window’ function.
Similar to SQL GROUP BY, a window function performs an aggregate calculation across multiple rows and returns a value. Unlike GROUP BY, the window function does not collapse the rows into a single aggregate, but keeps the individual row values while appending the relevant aggregate (often a statistical value such as min, max, or mean). Windowing is a very efficient way of taking a subsample and simultaneously generating the aggregate values required for training, and it is a great way to sample for training time series models.
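The difference between the two can be sketched in plain Python. This is only an illustration of the semantics, not SingleStore code; the `readings` rows and the helper names `group_by_mean` and `window_mean` are hypothetical. In SQL terms, the second helper mimics an average computed `OVER (PARTITION BY sensor)`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sensor readings used only for illustration.
readings = [
    {"sensor": "a", "ts": 1, "value": 10.0},
    {"sensor": "a", "ts": 2, "value": 14.0},
    {"sensor": "b", "ts": 1, "value": 3.0},
    {"sensor": "b", "ts": 2, "value": 5.0},
    {"sensor": "b", "ts": 3, "value": 7.0},
]


def group_by_mean(rows, key, field):
    """GROUP BY semantics: one aggregate per group, rows are collapsed."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[field])
    return {k: mean(v) for k, v in groups.items()}


def window_mean(rows, key, field):
    """Window semantics: every row is kept, with its group aggregate appended."""
    means = group_by_mean(rows, key, field)
    return [{**row, f"{field}_mean": means[row[key]]} for row in rows]


collapsed = group_by_mean(readings, "sensor", "value")
windowed = window_mean(readings, "sensor", "value")
```

Here `collapsed` holds one mean per sensor, while `windowed` still has all five rows, each carrying its sensor's mean alongside the original value, which is the shape a training set usually needs.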
Really, Really Fast Inference
SingleStore supports features and characteristics that allow real-time machine learning, similar to leading streaming platforms where customers need to bind a predictor to a data stream and generate a predicted stream, for instance in real-time use cases such as anomaly detection (see ‘Streaming AI Layer’ (SAIL) here). Additionally, SingleStore has several features that enhance machine learning (a full description of database ML support can be found here), including:
Built-in data pipelines for streaming data,
Tables divided into partitions, each subject to a parallel select query, for fast data-piping into ML models, and
Support for high-speed transactions for instantaneous piping into real-time predictors.
Take anomaly detection as an example, and assume that you have a time series that is constantly updated (new entries written to the database). Data from new entries could be piped from SingleStore to MindsDB, and a prediction generated immediately for each timestamp on the fly. Actual and predicted data are compared, and an anomaly is detected in real time when the actual data falls outside the predicted confidence bounds.
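The comparison step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Prediction` class, the `lower` and `upper` field names, and the sample numbers are all hypothetical, and the code does not call any MindsDB or SingleStore API. It only shows the check of an actual value against a predicted confidence bound.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical predicted confidence bounds for one timestamp."""
    timestamp: int
    lower: float
    upper: float


def is_anomaly(actual, prediction):
    """An actual value is anomalous when it falls outside the bounds."""
    return not (prediction.lower <= actual <= prediction.upper)


def detect_anomalies(actuals, predictions):
    """Return the timestamps whose actual value lies outside the bounds.

    actuals: dict mapping timestamp -> observed value.
    predictions: iterable of Prediction objects.
    Timestamps without a matching prediction are skipped.
    """
    by_ts = {p.timestamp: p for p in predictions}
    return [
        ts
        for ts, value in sorted(actuals.items())
        if ts in by_ts and is_anomaly(value, by_ts[ts])
    ]


# Hypothetical data: the value at timestamp 2 spikes above its bounds.
predictions = [
    Prediction(1, 9.0, 11.0),
    Prediction(2, 9.5, 11.5),
    Prediction(3, 10.0, 12.0),
]
actuals = {1: 10.2, 2: 15.0, 3: 10.1}
anomalies = detect_anomalies(actuals, predictions)
```

In a real deployment the bounds would come from the predictor as each new row arrives, so the same check could run per entry rather than over a batch.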
SingleStore and MindsDB - Top Performance for Real-time Machine Learning
If you are already a SingleStore user, you can quickly take advantage of existing SingleStore features to enable fast, efficient machine learning directly in your database by simply connecting SingleStore to the MindsDB service. Sign up for a free MindsDB account, and engage with the responsive community on Slack or GitHub to ask questions and share ideas.
If this article was helpful, please give us a star on GitHub.