Real-time Machine Learning on Data Streams

Opportunities and challenges of real-time machine learning

Machine intelligence can solve problems that involve large amounts of high-velocity data that escape human attempts to monitor and analyze, and can be elusive even for traditional, programmatic monitoring and analysis. Forecasting and anomaly detection on large, multivariate time-series is a good example.

One of the most interesting and potentially useful applications of machine learning is on data streams. Among streaming service and database providers, time series is one of the fastest-growing sets of use cases today, yet time series present very difficult machine learning challenges, even for most data scientists.

Time series present challenges in data access. When reading from a key-value store that contains time-stamped data, one must know the precise timestamp to access a value, or else create and manage an index of time series that are labeled and related to each other. Building that index has a development cost, and it can be tricky to maintain, especially around edge cases. Additionally, streaming data (including time series) inverts the usual database read/write pattern of limited writing and lots of reading: streaming data involves lots of writing and much less reading. Indexing and downsampling can make accessing and storing this data more manageable.
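To make the access problem concrete, here is a minimal Python sketch of the kind of index described above. It is a toy, in-memory stand-in for a key-value store, not Redis code; the class name `SeriesIndex` and the labels such as `sensor:1` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right


class SeriesIndex:
    """Toy index mapping a series label to its sorted timestamps (hypothetical helper)."""

    def __init__(self):
        self._timestamps = {}  # label -> sorted list of timestamps
        self._store = {}       # (label, timestamp) -> value; stand-in for a key-value store

    def add(self, label, timestamp, value):
        """Write a value and keep the per-series timestamp list sorted."""
        stamps = self._timestamps.setdefault(label, [])
        pos = bisect_left(stamps, timestamp)
        if pos == len(stamps) or stamps[pos] != timestamp:
            stamps.insert(pos, timestamp)
        self._store[(label, timestamp)] = value

    def range(self, label, start, end):
        """Return (timestamp, value) pairs with start <= timestamp <= end."""
        stamps = self._timestamps.get(label, [])
        lo = bisect_left(stamps, start)
        hi = bisect_right(stamps, end)
        return [(ts, self._store[(label, ts)]) for ts in stamps[lo:hi]]
```

Without the sorted timestamp list, a range query would require knowing every exact key in advance. Keeping that list correct on every write is the maintenance cost mentioned above.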

Finally, time series, which make up a large subset of most streaming data, pose a serious challenge to machine learning applications because they often have very high cardinality. Take an industrial sensor example: if you wanted to predict machine failure or maintenance requirements, you would likely be dealing with thousands of sensors. Each sensor (at each industrial facility) would produce its own time-stamped series. This would mean many thousands of individual time series, each of which would normally require its own machine learning model to be trained and launched in production.

In this article, I would like to discuss an integrated solution from Redis and MindsDB that overcomes these challenges.

Use Redis Streams for handling Time-Series data

Redis supports streaming data structures in a high-performance in-memory database, with features for collecting large volumes of high-velocity data. Redis Streams also supports many data channels between producers and consumers, which may produce and consume at different rates, as well as asynchronous communication between them. Details on using Redis Streams can be found in the Redis Streams documentation.
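As a rough illustration of the producer and consumer pattern, the sketch below uses the `xadd` and `xread` methods of the redis-py client. The helper names `publish_reading` and `read_new`, the stream name `sensors`, and the field names are assumptions made for this example, and `demo` assumes a Redis server on localhost with the `redis` Python package installed.

```python
def publish_reading(client, stream, sensor_id, value):
    """Append one reading to the stream; Redis assigns and returns the entry ID."""
    return client.xadd(stream, {"sensor_id": sensor_id, "value": value})


def read_new(client, stream, last_id="0", count=100, block_ms=None):
    """Fetch entries that arrived after last_id.

    Returns (entries, new_last_id) so each consumer can track its own
    position and read at its own pace.
    """
    response = client.xread({stream: last_id}, count=count, block=block_ms)
    entries = []
    for _name, items in response or []:
        for entry_id, fields in items:
            entries.append((entry_id, fields))
            last_id = entry_id
    return entries, last_id


def demo():
    """Run against a live server (assumed at localhost:6379)."""
    import redis  # redis-py; imported here so the helpers above stay dependency-free

    client = redis.Redis(host="localhost", port=6379, decode_responses=True)
    publish_reading(client, "sensors", "pump-7", 20.5)
    entries, last_id = read_new(client, "sensors")
    for entry_id, fields in entries:
        print(entry_id, fields)
```

Because each consumer keeps its own `last_id`, a slow consumer does not hold up a fast one, which is the asynchronous behavior described above.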

For capturing, storing, and accessing time series data, Redis offers its ‘Time Series Data Structures’, which allow high-volume inserts (lots of writing for streaming data) and low-latency reads. They also provide downsampling, queries by time period, aggregated queries, and labelling (field-value pairs), which mitigates the need to create indices from scratch. More features and details on Redis Time Series can be found here.
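Downsampling is easiest to see with a small example. The Python sketch below averages raw samples into fixed-width time buckets. It only illustrates the idea and is not the Redis Time Series implementation; the function name `downsample_avg` and the millisecond timestamps are assumptions.

```python
from collections import defaultdict


def downsample_avg(samples, bucket_ms):
    """Average (timestamp_ms, value) samples into fixed-width time buckets.

    Each output pair is (bucket_start_ms, mean of the values in that bucket),
    ordered by bucket start.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        bucket_start = ts - (ts % bucket_ms)  # align to the start of the bucket
        buckets[bucket_start].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


raw = [(0, 1.0), (500, 3.0), (1000, 10.0), (1999, 20.0), (2500, 5.0)]
print(downsample_avg(raw, 1000))  # [(0, 2.0), (1000, 15.0), (2000, 5.0)]
```

Storing one averaged point per bucket instead of every raw sample is what keeps long histories cheap to store and fast to query.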

Machine Learning at the Data Layer is more efficient

Connecting MindsDB’s machine learning platform to Redis Streams is an exemplary way of handling time series on streaming data. MindsDB provides an automated machine learning platform that is integrated with the data layer: a database or other data store, in this case Redis’s key-value store with its support for streaming and time-series data structures. The benefit of using MindsDB on any data store is the efficiency gained by exposing machine learning as a native feature of the database. You can perform forecasting or classification quickly and efficiently where the data lives, and expose this functionality to the application layer as, for example, simple database commands. The alternative is to extract the data, transform it, load it into a separate ML application, and then move the resulting predictions to yet another application. There is a good overview of the benefits of ML at the data layer here.

Bringing Machine Learning to Redis

MindsDB has integrated with Redis Streams and offers features for running machine learning predictions on real-time data. MindsDB binds an ML predictor to the Redis data stream and can train that predictor either on historical data or by listening to the stream over a defined period of time. Once trained, the MindsDB predictor can produce a second data stream containing predicted values. For example, one can train a predictor on a large series of time-stamped sensor data from an industrial facility and then create a real-time stream that gives highly accurate predictions of required maintenance or failure events. In the diagram below, the predictor trains on the input stream and then creates two new streams: a forecast, and a stream of detected anomalies (where real values fall outside a defined range around the predicted value). The anomalies might signal imminent machine failure.

Image: Machine Learning on Redis Streams
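The anomaly rule described above (a real value falling outside a defined range around the predicted value) can be sketched in a few lines of Python. This is an illustration of the rule only, not MindsDB's implementation; the function name `detect_anomalies` and the fixed `tolerance` band are assumptions.

```python
def detect_anomalies(actual, predicted, tolerance):
    """Flag readings that fall outside predicted +/- tolerance.

    actual:    iterable of (timestamp, value) pairs from the input stream
    predicted: dict mapping timestamp -> forecast value
    Returns a list of (timestamp, actual_value, predicted_value) tuples.
    """
    anomalies = []
    for ts, value in actual:
        expected = predicted.get(ts)
        if expected is None:
            continue  # no forecast for this timestamp, so nothing to compare
        if abs(value - expected) > tolerance:
            anomalies.append((ts, value, expected))
    return anomalies
```

In a streaming setup, each flagged tuple would be written to the anomaly stream while the forecast values go to the forecast stream.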

MindsDB and Redis - an effective combination for streamlined real-time forecasting

MindsDB has developed its SAIL (Streaming AI Layer) service to include large models specifically designed to handle extensive, multivariate time series. Let’s return to the example of predicting machine failure by analyzing a stream of data from multiple sensors. As noted at the beginning of the article, the traditional approach would mean an MLOps nightmare: building and deploying hundreds or thousands of individual time series models. Not anymore!

MindsDB SAIL on Redis Streams and Time Series makes it easy to perform real-time anomaly detection and forecasting on high-volume, high-velocity time series data, solving some of the most difficult problems in machine learning for business. For the example above, you would only need to train a single model to predict time-critical failure and maintenance patterns with a high degree of accuracy.

Learn more at RedisConf 2021

MindsDB is proud to partner with Redis, and will be presenting a ‘how-to’ demo and practical use cases at RedisConf 2021. Watch our session with MindsDB CEO Jorge Torres:

About the author - Erik Bovee was a founding partner at Speedinvest, an early-stage venture fund with $400M in assets under management. He led the seed round in MindsDB and has recently joined the team as Vice President of Business Development.