Prophet is a forecasting model maintained by Meta. See the Prophet paper for algorithm details and the documentation for more information about the library.
Like BigQuery ML ARIMA_PLUS, Prophet attempts to decompose each time series into trend, seasonal, and holiday components, producing a forecast by aggregating the inferences of these component models. An important difference, however, is that BigQuery ML ARIMA_PLUS uses ARIMA to model the trend component, while Prophet attempts to fit a curve by using a piecewise logistic or linear model.
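To make the distinction concrete, the following sketch contrasts unbounded linear growth with saturating logistic growth. This is only an illustration of the two curve shapes Prophet can fit for the trend, not Prophet's actual implementation, and all parameter values are made up.

```python
import math

def linear_trend(t, k=0.5, m=0.0):
    """Linear growth: the trend can increase without bound."""
    return k * t + m

def logistic_trend(t, cap=100.0, k=0.5, m=10.0):
    """Logistic growth: the trend saturates at a carrying capacity `cap`."""
    return cap / (1.0 + math.exp(-k * (t - m)))

# The linear trend keeps growing; the logistic trend levels off near `cap`.
print(linear_trend(40))               # 20.0
print(round(logistic_trend(40), 2))   # ~100.0 (saturates at cap)
```

A logistic trend is the better fit when the quantity being forecast has a natural ceiling, such as market size.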
Google Cloud offers a pipeline for training a Prophet model and a pipeline for getting batch inferences from a Prophet model. Both pipelines run on Vertex AI Pipelines and are built from Google Cloud Pipeline Components (GCPC).
Integration of Prophet with Vertex AI means that you can train Prophet models and get batch inferences from them by using managed Vertex AI pipelines.
Although Prophet is a multivariate model, Vertex AI supports only a univariate version of it.
To learn about the service accounts this workflow uses, see Service accounts for Tabular Workflows.
Workflow APIs

This workflow uses the Vertex AI and Dataflow APIs.
Prophet is designed for a single time series. Vertex AI aggregates data by time series ID and trains a Prophet model for each time series. The model training pipeline performs hyperparameter tuning using grid search and Prophet's built-in backtesting logic.
To support multiple time series, the pipeline uses a Vertex AI Custom Training Job and Dataflow to train multiple Prophet models in parallel. Overall, the number of models trained is the product of the number of time series and the number of hyperparameter tuning trials.
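For example, the total number of trained models is a simple product of the series count and the grid size. The hyperparameter grid below is hypothetical (the grid the pipeline actually searches is internal to Vertex AI), although changepoint_prior_scale and seasonality_prior_scale are real Prophet hyperparameters:

```python
from itertools import product

# Hypothetical hyperparameter grid, for illustration only.
param_grid = {
    "changepoint_prior_scale": [0.001, 0.01, 0.1],
    "seasonality_prior_scale": [0.1, 1.0, 10.0],
}
trials = list(product(*param_grid.values()))  # 3 x 3 = 9 tuning trials

num_time_series = 50
total_models_trained = num_time_series * len(trials)
print(total_models_trained)  # 450
```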
The following sample code demonstrates how to run a Prophet model training pipeline:
job = aiplatform.PipelineJob(
...
template_path=train_job_spec_path,
parameter_values=train_parameter_values,
...
)
job.run(service_account=SERVICE_ACCOUNT)
The optional service_account parameter in job.run() lets you set the Vertex AI Pipelines service account to an account of your choice.
The pipeline and the parameter values are defined by the following function:
(
train_job_spec_path,
train_parameter_values,
) = utils.get_prophet_train_pipeline_and_parameters(
...
)
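The keyword arguments you pass to this function become the pipeline's parameter values. As a rough illustration, a hypothetical call might assemble values like the following; every identifier here (project, bucket, table, and column names) is a placeholder, not a value from this guide.

```python
# Hypothetical training parameter values, using the parameter names
# documented for get_prophet_train_pipeline_and_parameters.
train_parameter_values = {
    "project": "my-project",
    "location": "us-central1",
    "root_dir": "gs://my-bucket/prophet-training",
    "target_column": "sales",
    "time_column": "date",
    "time_series_identifier_column": "store_id",
    "data_granularity_unit": "day",
    "data_source_bigquery_table_path": "bq://my-project.my_dataset.sales_history",
    "forecast_horizon": 30,  # forecast 30 units of data granularity ahead
    "max_num_trials": 6,     # tuning trials per time series
}
```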
The following is a subset of the get_prophet_train_pipeline_and_parameters parameters:

- project (String): Your project ID.
- location (String): Your region.
- root_dir (String): The Cloud Storage location to store the output.
- target_column (String): The column (value) you want this model to predict.
- time_column (String): The time column. You must specify a time column, and it must have a value for every row. The time column indicates the time at which a given observation was made.
- time_series_identifier_column (String): The time series identifier column. You must specify a time series identifier column, and it must have a value for every row. Forecasting training data usually includes multiple time series, and the identifier tells Vertex AI which time series a given observation in the training data is part of. All of the rows in a given time series have the same value in the time series identifier column. Some common time series identifiers are a product ID, a store ID, or a region. It is possible to train a forecasting model on a single time series, with an identical value for all rows in the time series identifier column. However, Vertex AI is a better fit for training data that contains two or more time series. For best results, use at least 10 time series for every column you use to train the model.
- data_granularity_unit (String): The unit to use for the granularity of your training data, your forecast horizon, and your context window. Can be minute, hour, day, week, month, or year. Learn how to choose the data granularity.
- data_source_csv_filenames (String): A URI for a CSV stored in Cloud Storage.
- data_source_bigquery_table_path (String): A URI for a BigQuery table.
- forecast_horizon (Integer): The forecast horizon determines how far into the future the model forecasts the target value for each row of inference data. The forecast horizon is specified in units of data granularity. Learn more.
- optimization_objective (String): The optimization objective for the model. Learn more.
- max_num_trials (Integer): The maximum number of tuning trials to perform per time series.

Dataflow parameters
The following is a subset of the get_prophet_train_pipeline_and_parameters parameters for Dataflow customization:

- trainer_dataflow_machine_type (String): The Dataflow machine type to use for training.
- trainer_dataflow_max_num_workers (Integer): The maximum number of Dataflow workers to use for training.
- evaluation_dataflow_machine_type (String): The Dataflow machine type to use for evaluation.
- evaluation_dataflow_max_num_workers (Integer): The maximum number of Dataflow workers to use for evaluation.
- dataflow_service_account (String): A custom service account to run Dataflow jobs. You can configure the Dataflow job to use private IPs and a specific VPC subnet. This parameter overrides the default Dataflow worker service account.
Because Prophet training jobs run on Dataflow, there is an initial startup time of 5 to 7 minutes. To reduce additional runtime, you can scale up or scale out. For example, to scale up, change the machine type from n1-standard-1 to e2-highcpu-8. To scale out, increase the number of workers from 1 to 200.
The training pipeline offers the following options for splitting your data:
- Default split: Vertex AI randomly selects 80% of your data rows for the training set, 10% for the validation set, and 10% for the test set. Vertex AI uses the Time column to determine the chronological order of the data rows. Parameters: None.
- Fraction split: Vertex AI uses values you provide to partition your data into the training set, the validation set, and the test set. Vertex AI uses the Time column to determine the chronological order of the data rows. Parameters: training_fraction, validation_fraction, test_fraction.
- Timestamp split: Vertex AI uses the training_fraction, validation_fraction, and test_fraction values to partition your data into the training set, the validation set, and the test set. Vertex AI uses the timestamp_split_key column to determine the chronological order of the data rows. Parameters: training_fraction, validation_fraction, test_fraction, timestamp_split_key.
- Manual split: Vertex AI splits the data by using the TRAIN, VALIDATE, or TEST value in the predefined_split_key column. Parameters: predefined_split_key.
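As a sketch of how a chronological fraction split works, the following assumes an 80/10/10 split over rows ordered by the time column; the data and column names are made up:

```python
# 20 daily rows of synthetic data, then an 80/10/10 chronological split.
rows = [{"date": f"2024-01-{d:02d}", "sales": d * 10} for d in range(1, 21)]
rows.sort(key=lambda r: r["date"])  # chronological order via the time column

n = len(rows)
train_end = int(n * 0.8)
val_end = int(n * 0.9)

train_set = rows[:train_end]              # earliest 80% of rows
validation_set = rows[train_end:val_end]  # next 10%
test_set = rows[val_end:]                 # most recent 10%

print(len(train_set), len(validation_set), len(test_set))  # 16 2 2
```

Splitting chronologically rather than at random matters for forecasting: the model is validated on data that comes strictly after the data it was trained on.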
You define the data split parameters in get_prophet_train_pipeline_and_parameters as follows:

- predefined_split_key (String): The name of the column containing the TRAIN, VALIDATE, or TEST values. Set this value if you are using a manual (predefined) split.
- training_fraction (Float): The percentage of the data to assign to the training set. Set this value if you are using a fraction split or a timestamp split.
- validation_fraction (Float): The percentage of the data to assign to the validation set. Set this value if you are using a fraction split or a timestamp split.
- test_fraction (Float): The percentage of the data to assign to the test set. Set this value if you are using a fraction split or a timestamp split.
- timestamp_split_key (String): The name of the column containing the timestamps for the data split. Set this value if you are using a timestamp split.

Window parameters
Vertex AI generates forecast windows from the input data using a rolling window strategy. If you leave the window parameters unset, Vertex AI uses the Count strategy with a default maximum value of 100,000,000. The training pipeline offers the following rolling window strategies:

- Count: The maximum number of windows cannot exceed 100,000,000. Parameter: window_max_count.
- Stride: Vertex AI uses one out of every X input rows to generate a window, up to a maximum of 100,000,000 windows. This option is useful for seasonal or periodic inferences. For example, you can limit forecasting to a single day of the week by setting the stride length value to 7. The value can be between 1 and 1000. Parameter: window_stride_length.
- Column: You can add a column to your input data where the values are either True or False. Vertex AI generates a window for every input row where the value of the column is True. The True and False values can be set in any order, as long as the total count of True rows is less than 100,000,000. Boolean values are preferred, but string values are also accepted. String values are not case sensitive. Parameter: window_column.
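For example, the Stride strategy can be pictured as selecting every Xth row as a window. The sketch below uses row indices in place of real data:

```python
# Stride strategy sketch: one window for every X input rows. With X = 7
# over daily data, forecasting is limited to one day of the week.
window_stride_length = 7
rows = list(range(28))  # 28 consecutive daily observations (as indices)

windows = [i for i in rows if i % window_stride_length == 0]
print(windows)  # [0, 7, 14, 21]
```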
You define the window parameters in get_prophet_train_pipeline_and_parameters as follows:

- window_column (String): The name of the column with True and False values.
- window_stride_length (Integer): The value of the stride length.
- window_max_count (Integer): The maximum number of windows.

Make inferences with Prophet
The Vertex AI model training pipeline for Prophet creates one Prophet model for each time series in the data. The inference pipeline aggregates input data by time series ID and calculates the inferences separately for each time series. The pipeline then disaggregates the inference results to match the format of Vertex AI Forecasting.
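The aggregate-infer-disaggregate flow can be sketched as follows. The per-series forecast function is a stub standing in for a trained Prophet model, and all column names and values are made up:

```python
from collections import defaultdict

input_rows = [
    {"store_id": "A", "date": "2024-01-01"},
    {"store_id": "B", "date": "2024-01-01"},
    {"store_id": "A", "date": "2024-01-02"},
]

# 1. Aggregate the input rows by time series ID.
by_series = defaultdict(list)
for row in input_rows:
    by_series[row["store_id"]].append(row)

# 2. Calculate inferences separately for each series (stub forecaster).
def forecast_series(series_rows):
    return [dict(row, predicted_value=42.0) for row in series_rows]

per_series_results = {sid: forecast_series(rows) for sid, rows in by_series.items()}

# 3. Disaggregate back into a flat, row-per-inference result set.
flat_results = [row for results in per_series_results.values() for row in results]
print(len(flat_results))  # 3
```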
The following sample code demonstrates how to run a Prophet inference pipeline:
job = aiplatform.PipelineJob(
...
template_path=prediction_job_spec_path,
parameter_values=prediction_parameter_values,
...
)
job.run(...)
The pipeline and the parameter values are defined by the following function:
(
prediction_job_spec_path,
prediction_parameter_values,
) = utils.get_prophet_prediction_pipeline_and_parameters(
...
)
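As with training, the keyword arguments you pass become the pipeline's parameter values. A hypothetical set of values might look like the following; all project, table, and column names are placeholders:

```python
# Hypothetical inference parameter values, using the parameter names
# documented for get_prophet_prediction_pipeline_and_parameters.
project, location, model_id = "my-project", "us-central1", "1234567890"

prediction_parameter_values = {
    "project": project,
    "location": location,
    # model_name must follow projects/{project}/locations/{location}/models/{model}
    "model_name": f"projects/{project}/locations/{location}/models/{model_id}",
    "time_column": "date",
    "time_series_identifier_column": "store_id",
    "target_column": "sales",
    "data_source_bigquery_table_path": "bq://my-project.my_dataset.sales_future",
}
print(prediction_parameter_values["model_name"])
```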
The following is a subset of the get_prophet_prediction_pipeline_and_parameters parameters:

- project (String): Your project ID.
- location (String): Your region.
- model_name (String): The name of the Model resource. Format the string as follows: projects/{project}/locations/{location}/models/{model}.
- time_column (String): The time column. You must specify a time column, and it must have a value for every row. The time column indicates the time at which a given observation was made.
- time_series_identifier_column (String): The time series identifier column. You must specify a time series identifier column, and it must have a value for every row. Forecasting training data usually includes multiple time series, and the identifier tells Vertex AI which time series a given observation in the training data is part of. All of the rows in a given time series have the same value in the time series identifier column. Some common time series identifiers are a product ID, a store ID, or a region. It is possible to train a forecasting model on a single time series, with an identical value for all rows in the time series identifier column. However, Vertex AI is a better fit for training data that contains two or more time series. For best results, use at least 10 time series for every column you use to train the model.
- target_column (String): The column (value) you want this model to predict.
- data_source_csv_filenames (String): A URI for a CSV stored in Cloud Storage.
- data_source_bigquery_table_path (String): A URI for a BigQuery table.
- bigquery_destination_uri (String): A URI for the selected destination dataset. If this value is not set, resources are created under a new dataset in the project.
- machine_type (String): The machine type to use for batch inference.
- max_num_workers (Integer): The maximum number of workers to use for batch inference.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-15 UTC.