Introduction to Jobs: A Feature for Automated Machine Learning Workflows


We are excited to introduce a long-awaited new feature to our community: job scheduling. This feature enables you to automate repetitive tasks, set up regular model retraining, and manage machine learning workflows more easily. Whether you're a seasoned developer or data scientist, or just getting started with AI, job scheduling will make your life easier. So, let's dive in and explore this exciting new addition to the MindsDB platform!

In this blog post, we'll introduce the CREATE JOB syntax with all its functionalities. After that, we'll present real-world use cases of job scheduling.

Introducing the CREATE JOB Statement

You may be familiar with cron jobs in Linux or Task Scheduler in Windows - the CREATE JOB statement in MindsDB is conceptually similar. It lets you schedule the execution of your SQL code to automate your ML workflow.

Here is how to create a job in MindsDB:

CREATE JOB job_name (
   mindsdb_sql_query_1;
   mindsdb_sql_query_2
)
START <date>
END <date>
EVERY [number] <period>;

Let’s briefly analyze this syntax:

  • The name of the job follows right after the CREATE JOB keywords.

  • Within the parentheses, we can list an arbitrary number of SQL statements, separated by semicolons, to be executed by this job.

  • You can include the START clause that defines when the job should start its execution. If not provided, the job is executed right away.

  • The END clause defines when the job should end its periodic execution. If the END clause is not set, and the following EVERY clause is defined, then the job repeats forever.

  • The EVERY clause specifies the time interval between consecutive executions.

Please note that all three clauses - START, END, and EVERY - are optional. If none of them is provided, the job executes just once, at the time of its creation.
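To make these combinations concrete, here is a minimal sketch of four jobs that differ only in their scheduling clauses. The job names are made up for illustration, the model name is borrowed from the use cases later in this post, and the dates are arbitrary. The behavior noted in each comment restates the rules above; the comments themselves are for the reader and can be dropped before running the statements.

```sql
-- No scheduling clauses: runs once, as soon as the job is created.
CREATE JOB retrain_once (
   RETRAIN mindsdb.home_rentals_model
);

-- START only: runs once, at the given date.
CREATE JOB retrain_once_later (
   RETRAIN mindsdb.home_rentals_model
)
START '2023-04-01';

-- EVERY only: first run right away, then every 2 days with no end date.
CREATE JOB retrain_every_two_days (
   RETRAIN mindsdb.home_rentals_model
)
EVERY 2 days;

-- START, END, and EVERY: repeats every hour within the given window.
CREATE JOB retrain_hourly_in_window (
   RETRAIN mindsdb.home_rentals_model
)
START '2023-03-01 00:00:00'
END '2023-04-01 00:00:00'
EVERY hour;
```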

Check out our docs here to learn more about jobs.

Real-World Use Cases

In this section, we'll explore several use cases of the CREATE JOB statement in MindsDB. Jobs let you automate ML workflows, such as retraining models and saving predictions on a schedule, so that recurring tasks no longer have to be run by hand. Let's see how this feature can benefit your work.

Use Case 1: Retraining a Model and Saving Predictions

You may need to retrain your model when new training data is available or when the MindsDB version is updated. Here is how you can create a job that retrains your model regularly and saves predictions into your database table:

CREATE JOB retrain_model_and_save_predictions (

   RETRAIN mindsdb.home_rentals_model
   USING
      join_learn_process = true;

   INSERT INTO my_integration.rentals (
      SELECT m.rental_price, m.rental_price_explain
      FROM mindsdb.home_rentals_model AS m
      JOIN example_db.demo_data.home_rentals AS d
   )
)
END '2023-06-01 00:00:00'
EVERY 2 days;

We name the job retrain_model_and_save_predictions. Next, we define the SQL code to be executed by this job:

  • The RETRAIN statement retrains your model to accommodate new training data or a new MindsDB version.

  • We use the join_learn_process parameter to ensure that the next command executes after the retraining process is completed.

  • We use the nested SELECT statement to query for batch predictions that are saved into the rentals table of the my_integration database using the INSERT INTO statement.

The START clause is omitted, so the job schedules its first execution at the current system timestamp. The job executes every two days and is scheduled to finish its periodic execution on June 1st, 2023.
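Once a job like this is created, you will likely want to confirm that it is scheduled and that its runs complete. The queries below are a sketch that assumes your MindsDB version exposes jobs and jobs_history tables in the mindsdb project. The location and columns of these tables vary between versions, so treat both table names as assumptions and check the docs for your release.

```sql
-- Show the job's definition and schedule
-- (assumes a jobs table in the mindsdb project).
SELECT *
FROM mindsdb.jobs
WHERE name = 'retrain_model_and_save_predictions';

-- Show past executions of the job
-- (assumes a jobs_history table in the mindsdb project).
SELECT *
FROM mindsdb.jobs_history
WHERE name = 'retrain_model_and_save_predictions';
```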

Use Case 2: Saving Predictions

In the previous example, we saved predictions into an existing table. It is also possible to create a table on the fly:

CREATE JOB save_predictions (

   CREATE TABLE my_integration.`result_{{START_DATETIME}}` (
      SELECT m.rental_price, m.rental_price_explain
      FROM mindsdb.home_rentals_model AS m
      JOIN example_db.demo_data.home_rentals AS d
   )
)
EVERY hour;

We name the job save_predictions. Next, we define the SQL code to be executed by this job:

  • The nested SELECT statement queries for predictions.

  • The CREATE TABLE statement creates a table in the my_integration database. The table name includes the {{START_DATETIME}} variable, which is replaced by the date and time when the job starts its execution - for example, result_2023-02-14 18:47:51. This variable ensures unique table names.

Here, the START and END clauses are omitted. Therefore, the job starts its execution right away and executes every hour until manually disabled.
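As a sketch of what comes next, the statements below read one of the generated tables and then stop the job by deleting it. The table name is the example name from the text above, not one that will exist in your database, and the backtick quoting mirrors the CREATE TABLE statement in the job. We assume here that your MindsDB version supports the DROP JOB statement and that the underlying database accepts table names containing spaces and colons.

```sql
-- Read predictions from one generated table. The backticks are needed
-- because the substituted timestamp contains a space, dashes, and colons.
SELECT *
FROM my_integration.`result_2023-02-14 18:47:51`;

-- Stop the hourly job by deleting it
-- (assumes DROP JOB is available in your MindsDB version).
DROP JOB save_predictions;
```

Tables that the job has already created are not affected by dropping the job; they remain in my_integration until you remove them yourself.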

Use Case 3: Dropping a Model

You can create a one-time job to drop your model on a specified date.

CREATE JOB drop_model (

   DROP MODEL mindsdb.home_rentals_model
) 
START '2023-04-01';

We name the job drop_model. Next, we define the SQL code to be executed by this job - the DROP MODEL statement removes the home_rentals_model model from the mindsdb project. Because only the START clause is provided, the job runs once, on April 1st, 2023.

Try Your Hand at Job Scheduling in MindsDB!

Now that you've seen how the CREATE JOB statement works in MindsDB, we encourage you to give this feature a try and see how it can benefit your work.

You can use the job scheduling feature to create a chatbot that replies to messages using the underlying OpenAI model. Have a look at our Twitter chatbot implementation here.

Don't hesitate to reach out via the Slack community or GitHub if you have any questions or need assistance. We're here to support you every step of the way!