This article describes the AutoML Python API, which provides methods to start classification, regression, and forecasting AutoML runs. Each method call trains a set of models and generates a trial notebook for each model.
For more information on AutoML, including a low-code UI option, see What is AutoML?.
Classify
The databricks.automl.classify method configures an AutoML run to train a classification model.
Note
The max_trials parameter is deprecated in Databricks Runtime 10.4 ML and is not supported in Databricks Runtime 11.0 ML and above. Use timeout_minutes to control the duration of an AutoML run.
Python
databricks.automl.classify(
dataset: Union[pyspark.sql.DataFrame, pandas.DataFrame, pyspark.pandas.DataFrame, str],
*,
target_col: str,
primary_metric: str = "f1",
data_dir: Optional[str] = None,
experiment_dir: Optional[str] = None,
experiment_name: Optional[str] = None,
exclude_cols: Optional[List[str]] = None,
exclude_frameworks: Optional[List[str]] = None,
feature_store_lookups: Optional[List[Dict]] = None,
imputers: Optional[Dict[str, Union[str, Dict[str, Any]]]] = None,
pos_label: Optional[Union[int, bool, str]] = None,
time_col: Optional[str] = None,
split_col: Optional[str] = None,
sample_weight_col: Optional[str] = None,
max_trials: Optional[int] = None,
timeout_minutes: Optional[int] = None,
) -> AutoMLSummary
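As a minimal sketch of how the signature above might be called: the snippet passes a table name as dataset and bounds the run with timeout_minutes rather than the deprecated max_trials. The table name, column names, and timeout value are hypothetical. The import is guarded on the assumption that databricks.automl is only available on a Databricks Runtime ML cluster.

```python
# Guarded import: databricks.automl is assumed to exist only on Databricks Runtime ML.
try:
    from databricks import automl
except ImportError:
    automl = None

# Hypothetical inputs; substitute your own table and columns.
classify_kwargs = {
    "dataset": "main.default.customer_churn",  # hypothetical table name
    "target_col": "churned",                   # hypothetical label column
    "primary_metric": "f1",                    # the default shown in the signature
    "exclude_cols": ["customer_id"],           # hypothetical identifier column
    "timeout_minutes": 30,                     # used instead of the deprecated max_trials
}

summary = None
if automl is not None:
    # Trains a set of models and generates a trial notebook for each one.
    summary = automl.classify(**classify_kwargs)
```

Keeping the arguments in a dictionary is only a convenience for this sketch; passing them directly as keyword arguments is equivalent.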
Classify parameters
Regress
The databricks.automl.regress method configures an AutoML run to train a regression model. This method returns an AutoMLSummary.
Note
The max_trials parameter is deprecated in Databricks Runtime 10.4 ML and is not supported in Databricks Runtime 11.0 ML and above. Use timeout_minutes to control the duration of an AutoML run.
Python
databricks.automl.regress(
dataset: Union[pyspark.sql.DataFrame, pandas.DataFrame, pyspark.pandas.DataFrame, str],
*,
target_col: str,
primary_metric: str = "r2",
data_dir: Optional[str] = None,
experiment_dir: Optional[str] = None,
experiment_name: Optional[str] = None,
exclude_cols: Optional[List[str]] = None,
exclude_frameworks: Optional[List[str]] = None,
feature_store_lookups: Optional[List[Dict]] = None,
imputers: Optional[Dict[str, Union[str, Dict[str, Any]]]] = None,
time_col: Optional[str] = None,
split_col: Optional[str] = None,
sample_weight_col: Optional[str] = None,
max_trials: Optional[int] = None,
timeout_minutes: Optional[int] = None,
) -> AutoMLSummary
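Because max_trials is not supported on Databricks Runtime 11.0 ML and above, older call sites may need their arguments cleaned up before calling regress. The following is one possible sketch, not part of the AutoML API: drop_max_trials is a hypothetical helper, and the table name, column name, and 60-minute fallback are illustrative assumptions.

```python
from typing import Any, Dict

# Guarded import: databricks.automl is assumed to exist only on Databricks Runtime ML.
try:
    from databricks import automl
except ImportError:
    automl = None


def drop_max_trials(kwargs: Dict[str, Any], fallback_timeout_minutes: int = 60) -> Dict[str, Any]:
    """Return a copy of kwargs without max_trials (hypothetical helper).

    If max_trials was set and no timeout_minutes was given, a fallback
    timeout is added so the run stays bounded.
    """
    cleaned = dict(kwargs)  # leave the caller's dictionary untouched
    had_max_trials = cleaned.pop("max_trials", None) is not None
    if had_max_trials and cleaned.get("timeout_minutes") is None:
        cleaned["timeout_minutes"] = fallback_timeout_minutes
    return cleaned


# Hypothetical legacy arguments that still rely on max_trials.
legacy_kwargs = {
    "dataset": "main.default.house_prices",  # hypothetical table name
    "target_col": "price",                   # hypothetical target column
    "max_trials": 20,
}
regress_kwargs = drop_max_trials(legacy_kwargs)

summary = None
if automl is not None:
    summary = automl.regress(**regress_kwargs)
```

The helper does not try to translate a trial count into minutes, since the two are not directly comparable; the fallback value is a placeholder to adjust for the workload.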
Regress parameters
Forecast
The databricks.automl.forecast method configures an AutoML run for training a forecasting model. This method returns an AutoMLSummary. To use Auto-ARIMA, the time series must have a regular frequency (that is, the interval between any two points must be the same throughout the time series). The frequency must match the frequency unit specified in the API call. AutoML handles missing time steps by filling in those values with the previous value.
Python
databricks.automl.forecast(
dataset: Union[pyspark.sql.DataFrame, pandas.DataFrame, pyspark.pandas.DataFrame, str],
*,
target_col: str,
time_col: str,
primary_metric: str = "smape",
country_code: str = "US",
frequency: str = "D",
horizon: int = 1,
data_dir: Optional[str] = None,
experiment_dir: Optional[str] = None,
experiment_name: Optional[str] = None,
exclude_frameworks: Optional[List[str]] = None,
feature_store_lookups: Optional[List[Dict]] = None,
identity_col: Optional[Union[str, List[str]]] = None,
sample_weight_col: Optional[str] = None,
output_database: Optional[str] = None,
timeout_minutes: Optional[int] = None,
) -> AutoMLSummary
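The regular-frequency requirement for Auto-ARIMA can be checked before starting a run. The sketch below is one way to do that with the standard library: has_regular_frequency is a hypothetical helper, and the table name, column names, and horizon in the forecast call are illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import Sequence

# Guarded import: databricks.automl is assumed to exist only on Databricks Runtime ML.
try:
    from databricks import automl
except ImportError:
    automl = None


def has_regular_frequency(timestamps: Sequence[datetime]) -> bool:
    """True if all consecutive sorted timestamps are the same interval apart."""
    ordered = sorted(timestamps)
    intervals = {later - earlier for earlier, later in zip(ordered, ordered[1:])}
    return len(intervals) <= 1


start = datetime(2024, 1, 1)
daily = [start + timedelta(days=i) for i in range(10)]  # one point per day
with_gap = daily[:4] + daily[5:]                        # one day missing

print(has_regular_frequency(daily))     # True
print(has_regular_frequency(with_gap))  # False

summary = None
if automl is not None:
    summary = automl.forecast(
        dataset="main.default.daily_sales",  # hypothetical table name
        target_col="sales",                  # hypothetical target column
        time_col="ds",                       # hypothetical time column
        frequency="D",                       # must match the spacing of the data
        horizon=14,                          # forecast 14 periods ahead
        timeout_minutes=30,
    )
```

In practice the timestamps would come from the time column of the dataset, and for multi-series data the check would be applied to each series separately.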
Forecasting parameters
Import notebook
The databricks.automl.import_notebook method imports a notebook that has been saved as an MLflow artifact. This method returns an ImportNotebookResult.
Python
databricks.automl.import_notebook(
artifact_uri: str,
path: str,
overwrite: bool = False
) -> ImportNotebookResult
Import notebook example
Python
summary = databricks.automl.classify(...)
result = databricks.automl.import_notebook(summary.trials[5].artifact_uri, "/Users/you@yourcompany.com/path/to/directory")
print(result.path)
print(result.url)
AutoMLSummary
Summary object for an AutoML run that describes the metrics, parameters, and other details for each of the trials. You also use this object to load the model trained by a specific trial.
TrialInfo
Summary object for each individual trial. TrialInfo has a method to load the model generated for the trial.
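As a rough sketch of loading a trial's model: the text above does not name the loading method or the attribute for the best trial, so load_model() and best_trial are assumptions here, and load_trial_model is a hypothetical helper. Stand-in objects are used so the snippet runs outside Databricks.

```python
from types import SimpleNamespace


def load_trial_model(summary, trial_index=None):
    """Load the model for one trial (hypothetical helper).

    Assumes the summary exposes `trials` and `best_trial`, and that each
    trial has a `load_model()` method; check these names against your
    runtime before relying on them.
    """
    trial = summary.best_trial if trial_index is None else summary.trials[trial_index]
    return trial.load_model()


# Stand-ins for illustration only; a real run would pass the AutoMLSummary
# returned by classify, regress, or forecast.
fake_trial = SimpleNamespace(load_model=lambda: "model-for-trial-0")
fake_summary = SimpleNamespace(trials=[fake_trial], best_trial=fake_trial)

print(load_trial_model(fake_summary))  # prints model-for-trial-0
```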
ImportNotebookResult