This article shows you how to use covariates, also known as external regressors, to improve AutoML forecasting models.
Covariates are additional variables outside the target time series that can improve forecasting models. For example, if you're forecasting hotel occupancy rates, knowing if it's the weekend could help predict customer behavior.
In this example, you:
1. Create a randomized time series dataset.
2. Perform basic feature engineering.
3. Store the feature values as a FeatureStore table.
4. Use the FeatureStore as covariates in an AutoML forecasting experiment.
This example uses randomly generated time series data for hotel occupancy rates in January 2024. Then, use AutoML to predict the occupancy_rate for the first day of February 2024.
Run the following code to generate the sample data.
Python
df = spark.sql("""
    SELECT explode(sequence(to_date('2024-01-01'), to_date('2024-01-31'), interval 1 day)) AS date,
           rand() AS occupancy_rate
    FROM (SELECT 1 AS id) tmp
    ORDER BY date
""")
display(df)
Feature engineering
Use the sample dataset to engineer a feature called is_weekend, a binary indicator of whether or not a date falls on a weekend.
Python
from pyspark.sql.functions import dayofweek, when

def compute_hotel_weekend_features(df):
    """Compute the is_weekend feature; returns a DataFrame with 'date' as primary key."""
    # Spark's dayofweek returns 1 for Sunday through 7 for Saturday,
    # so the weekend days are 1 and 7.
    return df.select("date").withColumn(
        "is_weekend",
        when(dayofweek("date").isin(1, 7), 1).otherwise(0)
    )

hotel_weekend_feature_df = compute_hotel_weekend_features(df)
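Spark's dayofweek numbers days from 1 (Sunday) to 7 (Saturday), which is easy to confuse with other conventions. As a quick sanity check that does not need a cluster, the following plain-Python sketch lists the Saturdays and Sundays in January 2024, which you can compare against the rows where is_weekend is 1. The helper names `spark_dayofweek` and `expected_weekend_dates` are illustrative and are not part of any Databricks API.

```python
from datetime import date, timedelta


def spark_dayofweek(d: date) -> int:
    """Mimic Spark's dayofweek numbering: 1 = Sunday, ..., 7 = Saturday."""
    # isoweekday() returns 1 = Monday, ..., 7 = Sunday.
    return d.isoweekday() % 7 + 1


def expected_weekend_dates(start: date, end: date) -> list[date]:
    """Return every Saturday and Sunday between start and end, inclusive."""
    days = (end - start).days + 1
    all_dates = [start + timedelta(days=i) for i in range(days)]
    return [d for d in all_dates if spark_dayofweek(d) in (1, 7)]


weekend_dates = expected_weekend_dates(date(2024, 1, 1), date(2024, 1, 31))
print([d.day for d in weekend_dates])
# [6, 7, 13, 14, 20, 21, 27, 28]
```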
Create the Feature Store
To use covariates with AutoML, you must use a Feature Store to join one or more covariate feature tables with the primary training data in AutoML.
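Conceptually, this join behaves roughly like a left join from the training data to the feature table on the lookup key. The following plain-Python sketch illustrates the idea with a few hand-written rows. It is not how AutoML or the Feature Store implement the join, the occupancy values are made up for illustration, and `join_covariates` is a hypothetical helper.

```python
from datetime import date

# Primary training data: one row per date with the target value (made-up numbers).
training_rows = [
    {"date": date(2024, 1, 5), "occupancy_rate": 0.61},
    {"date": date(2024, 1, 6), "occupancy_rate": 0.87},
    {"date": date(2024, 1, 7), "occupancy_rate": 0.79},
]

# Covariate feature table keyed by date (January 5, 2024 is a Friday).
weekend_features = {
    date(2024, 1, 5): {"is_weekend": 0},
    date(2024, 1, 6): {"is_weekend": 1},
    date(2024, 1, 7): {"is_weekend": 1},
}


def join_covariates(rows, features, key="date"):
    """Left-join feature columns onto each training row by the lookup key."""
    joined = []
    for row in rows:
        # Rows without a matching key keep only their original columns.
        extra = features.get(row[key], {})
        joined.append({**row, **extra})
    return joined


for row in join_covariates(training_rows, weekend_features):
    print(row)
```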
Store the DataFrame hotel_weekend_feature_df as a Feature Store table.
Python
from databricks.feature_engineering import FeatureEngineeringClient

fe = FeatureEngineeringClient()

hotel_weekend_feature_table = fe.create_table(
    name='ml.default.hotel_weekend_features',
    primary_keys=['date'],
    df=hotel_weekend_feature_df,
    description='Hotel is_weekend features table'
)
Note
This example uses the Python FeatureEngineeringClient to create and write tables. However, you can also use SQL or Delta Live Tables to create and write tables. See Work with feature tables in Unity Catalog for more options.
Use the feature_store_lookups parameter to pass the Feature Store table to AutoML. feature_store_lookups takes a list of dictionaries, each with two fields: table_name and lookup_key.
Python
hotel_weekend_feature_lookup = {
    "table_name": "ml.default.hotel_weekend_features",
    "lookup_key": ["date"]
}

feature_lookups = [hotel_weekend_feature_lookup]
Note
feature_store_lookups can contain multiple feature table lookups.
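As a sketch of what a list with more than one lookup could look like, the example below adds a second entry and checks that every entry has the two expected fields. The table `ml.default.hotel_holiday_features` and the helper `check_lookups` are hypothetical and only illustrate the shape of the list.

```python
REQUIRED_FIELDS = {"table_name", "lookup_key"}

hotel_weekend_feature_lookup = {
    "table_name": "ml.default.hotel_weekend_features",
    "lookup_key": ["date"],
}

# Hypothetical second feature table, shown only to illustrate multiple lookups.
hotel_holiday_feature_lookup = {
    "table_name": "ml.default.hotel_holiday_features",
    "lookup_key": ["date"],
}

feature_lookups = [hotel_weekend_feature_lookup, hotel_holiday_feature_lookup]


def check_lookups(lookups):
    """Raise ValueError if any lookup is missing a required field."""
    for i, lookup in enumerate(lookups):
        missing = REQUIRED_FIELDS - set(lookup)
        if missing:
            raise ValueError(f"lookup {i} is missing fields: {sorted(missing)}")
    return True


check_lookups(feature_lookups)
```

Running a check like this before starting the experiment can surface a misspelled field name early, rather than after the AutoML call has been submitted.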
Use the following code to pass feature_lookups to an AutoML experiment API call.
Python
from databricks import automl
summary = automl.forecast(
    dataset=df,
    target_col="occupancy_rate",
    time_col="date",
    frequency="d",
    horizon=1,
    timeout_minutes=30,
    identity_col=None,
    feature_store_lookups=feature_lookups
)
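With a daily frequency and a horizon of 1, the forecast covers the single day that follows the last date in the training data, which here is February 1, 2024. The following plain-Python sketch shows that date arithmetic. `forecast_dates` is an illustrative helper, not an AutoML function.

```python
from datetime import date, timedelta


def forecast_dates(last_training_date: date, horizon: int) -> list[date]:
    """Return the dates a daily forecast of the given horizon would cover."""
    return [last_training_date + timedelta(days=step) for step in range(1, horizon + 1)]


print(forecast_dates(date(2024, 1, 31), horizon=1))
# [datetime.date(2024, 2, 1)]
```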
Next steps