Machine Learning Inside the Database – Predictions at the Speed of Questions
Years ago, when I first started working on the Hubble Space Telescope project for NASA, I was a young astrophysicist, very familiar with coding, modeling, and data analysis. But up to that point, I had never had any interaction or familiarity with databases. Databases lived in the financial world, not in the world of scientific research. Of course, that reality has changed dramatically over the years – databases are everywhere, containing massive quantities of data that represent invaluable corporate knowledge.
In those early days, no scientists were querying databases and certainly not doing advanced analytics with databases. But I was taking my first steps in that journey. As the youngest member of the team, I was tasked by NASA managers to access and analyze the data in the Hubble Telescope science end-user database to answer a wide variety of questions on the fly. These ad hoc questions would include: Which science instruments were most used (camera, spectrograph, …)? What modes of the instruments were most accessed? Which objects were most requested (stars, planets, galaxies, …)? How many telescope resources were required for different observations?
I became a database expert, providing answers to Hubble and NASA senior leaders at the speed of their questions. However, when they asked me to make projections about the resource utilization for future, novel applications of the telescope’s massive capabilities (over 400 different modes of operation), I could not do that with a simple database query. I had to tell them, “I will get back to you on that.”
Then, I would go back to my office. There I would query the database for historical data (descriptive analytics, we now call that), extract the data to a file, and hand-write code to read that file and build regression models (predictive analytics, we now call that). I would then run the model against the input parameters from the leaders’ questions and “get back to them” with the answer. For each one of their questions, I had to “rinse and repeat” the same procedure, going back and forth from the database to my code, then back to the database, and so on. It was very tedious. Nevertheless, I became very good at it – so much so that I won a NASA individual achievement award for my efforts.
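For readers who have never done it, the manual loop described above might look roughly like the sketch below. It is only an illustration under stated assumptions: the `observations` table, its column names, and the four data rows are all made up for the demo, an in-memory SQLite database stands in for the real one, and a one-variable least-squares line stands in for whatever regression model was actually used.

```python
import sqlite3


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def build_demo_db():
    """Create a throwaway in-memory database with hypothetical history."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observations (exposure_minutes REAL, resource_hours REAL)"
    )
    # Made-up rows for illustration only (they follow y = 0.2 * x + 0.5).
    rows = [(10, 2.5), (20, 4.5), (30, 6.5), (40, 8.5)]
    conn.executemany("INSERT INTO observations VALUES (?, ?)", rows)
    return conn


def predict_resource_hours(conn, exposure_minutes):
    """Step 1: query history. Step 2: fit a model. Step 3: predict."""
    rows = conn.execute(
        "SELECT exposure_minutes, resource_hours FROM observations"
    ).fetchall()
    xs = [row[0] for row in rows]
    ys = [row[1] for row in rows]
    slope, intercept = fit_line(xs, ys)
    return slope * exposure_minutes + intercept


if __name__ == "__main__":
    conn = build_demo_db()
    print(predict_resource_hours(conn, 60))
```

Every new question means repeating all three steps outside the database, which is exactly the back-and-forth that made the process tedious.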
Well, that scenario plays out (in a different form, but fundamentally the same) everywhere today in businesses when leaders demand data-driven insights from huge databases to inform their business decisions and actions. Leaders ask their database users to provide not only descriptive analytics but also important predictive analytics. Unfortunately, those same employees are neither trained nor equipped to address the forecasting questions that are coming to them fast at the speed of business. Hence, business owners believe they need to hire teams of data scientists to help with this, which is not a viable solution for many small and medium-sized organizations. Fortunately, there is a better solution!
Since a significant amount of the data needed to feed machine learning models resides in databases, it would be most beneficial to empower existing database users to do machine learning with the tools that they already use and without disrupting their existing workflows. This is now possible. There is a new database power in town – machine learning (ML) in the database (DB).
With ML in the DB, non-data scientists (i.e., traditional business users of databases) can easily create machine learning models in the database and generate predictions from those models at the speed of questions (i.e., based on different scenarios; for different and new combinations of customers, products, and services; for different seasons, days of the week, or times of day; etc. – all of which remind me of my early days of answering ad hoc questions from organizational leaders on-the-fly). Now every business database user has the power to be a predictive analytics (forecasting) guru in their organization. Democratizing machine learning never looked so democratized!
What is even better is that ML in the DB can come with AutoML (Automatic Machine Learning) functionality. The AutoML feature can select not only the best data to feed into the model but also the best algorithm to use. This is essential since different algorithms are needed to address different business analytics use cases (prediction, optimization, detection, discovery), and different algorithms require different types of data input (numeric, discrete, continuous, binned, ratio, ordinal). With AutoML, businesses can automatically get predictions from their data with confidence, at higher speed and lower cost. This can be viewed as the “future of AI and ML” in business.
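To make the algorithm-selection idea concrete, here is a minimal sketch of one common approach: fit several candidate models on training data and keep the one with the lowest error on held-out data. It is not how any particular AutoML product works internally. The two candidates (a mean baseline and a straight-line fit), the function names, and the sample numbers are all assumptions chosen for illustration, and the sketch covers only algorithm selection, not the selection of input data.

```python
def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Second candidate: least-squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and actual values."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_best_model(candidates, train, valid):
    """Fit each candidate on train, score it on valid, return the winner."""
    train_x, train_y = train
    valid_x, valid_y = valid
    scores = {}
    for name, fit in candidates.items():
        model = fit(train_x, train_y)
        scores[name] = mean_squared_error(model, valid_x, valid_y)
    best_name = min(scores, key=scores.get)
    return best_name, scores


# Hypothetical candidate pool; a real system would try many more algorithms.
CANDIDATES = {"mean_baseline": fit_mean, "linear": fit_linear}

if __name__ == "__main__":
    train = ([1, 2, 3, 4], [3, 5, 7, 9])
    valid = ([5, 6], [11, 13])
    print(select_best_model(CANDIDATES, train, valid))
```

Scoring on data the models did not see during fitting is the key design choice here: it rewards the candidate that generalizes, not the one that merely memorizes the training rows.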
But it does not stop there. ML in the DB is a gift that keeps on giving. It also includes explainability as a feature. Explainable ML and AI (XAI) essentially reveals the key features in the predictive model’s input data that contribute most to the specific predictions of the model. This new capability allows businesses to understand forecasts and to deep dive into the models they have built. With XAI, the predictions no longer come from a “black box,” but each result has an explanation.
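One simple way to picture such an explanation is shown below for the special case of a linear model, where each feature's contribution to a prediction is its weight times how far the input sits from a baseline value. This is a sketch under assumptions, not the method any specific product uses: the feature names, weights, and baseline values are invented, and real XAI tooling handles far more complex models.

```python
def explain_linear_prediction(weights, intercept, baseline, row):
    """Split a linear model's prediction into per-feature contributions.

    Each contribution is weight * (value - baseline value), so the
    contributions add up to the difference between this prediction and
    the prediction for the baseline row.
    """
    contributions = {
        name: weights[name] * (row[name] - baseline[name]) for name in weights
    }
    baseline_prediction = intercept + sum(
        weights[name] * baseline[name] for name in weights
    )
    prediction = baseline_prediction + sum(contributions.values())
    # Rank features by the size of their effect, largest first.
    ranked = sorted(
        contributions.items(), key=lambda item: abs(item[1]), reverse=True
    )
    return prediction, baseline_prediction, ranked


if __name__ == "__main__":
    # Hypothetical sales model: all names and numbers are illustrative.
    weights = {"promotion": 50.0, "price": -2.0}
    baseline = {"promotion": 0.0, "price": 20.0}
    row = {"promotion": 1.0, "price": 25.0}
    print(explain_linear_prediction(weights, 100.0, baseline, row))
```

In this made-up case the ranking would say that running the promotion raised the forecast more than the price increase lowered it, which is the kind of statement that turns a bare number into an explanation.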
Many organizations now require this XAI functionality to address concerns about bias, ethics, risk, and compliance, all of which demand that models be explainable and transparent. The same explainability function also serves as insights discovery, providing insight into which inputs produce which outputs. If those inputs are features that are controllable by the business (e.g., sales promotions, marketing campaigns, rebate offers, etc.), then the XAI insights that the “ML in the DB” function provides can inform a prescriptive analytics model.
Prescriptive analytics is essentially an optimization ML model that responds to this business request: “Don’t just tell me what will happen but tell me what we can do to optimize what happens. What will produce an optimal outcome?” At this point, the traditional business database end-user has truly made great strides on a remarkable journey, from answering descriptive questions about past performance to predicting future performance to delivering insights discovery for better decisions and actions.
Business forecasting and optimization (predictive and prescriptive analytics) can now be achieved inside the database by users who are not data scientists. Business database users can easily deploy “the best” ML models (with AutoML) and understand the results (with XAI) using standard familiar database queries, without the extra steps of exporting the workload elsewhere. These users no longer need to say, “I will get back to you on that.” They now have the power both to impress and to impact the business with ML in the DB, delivering insights and value at the speed of business leaders’ questions. They can now say, “I have the answer for you right here.”
Written by: Dr. Kirk Borne
MindsDB hosted a LIVE webinar with top Big Data and AI influencer Dr. Kirk Borne and MindsDB’s CEO, Jorge Torres, on how to run machine learning inside a database on Tuesday, February 16th, 2021, at 16:00 GMT. During the event, both speakers explained the benefits of bringing machine learning to the data source and how to train & deploy ML models directly in the database using simple SQL.
If you missed the live event you can still watch the recording.