This page describes how to enable idle instances for a service by configuring minimum instances using the default Cloud Run autoscaling behavior. To manually scale your service, see manual scaling.
If you need more control over your service's autoscaling behavior, you can set a minimum number of instances to avoid slow container start times and reduce service latency. By default, Cloud Run scales the number of instances of a service in and out based on the number of incoming requests.
However, if your service requires reduced latency, especially when scaling from zero active instances, you can change this default behavior by specifying a minimum number of container instances to be kept warm and ready to serve requests. Refer to General development tips for more details on this optimization.
Cloud Run removes instances that are not processing requests (idle instances). With minimum instances set, Cloud Run keeps at least the minimum number of instances running, even if they're not processing requests. Active instances above the min-instances number might become idle if they are not receiving requests.
For example, if min-instances is 10 and the number of active instances is 0, then the number of idle instances is 10. When the number of active instances increases to 6, the number of idle instances decreases to 4.
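The idle-instance arithmetic in this example can be sketched in Python. The function name idle_min_instances is hypothetical. It only models the counts described above and does not call any Cloud Run API.

```python
def idle_min_instances(min_instances: int, active_instances: int) -> int:
    """Return how many of the minimum instances are idle (warm, not serving).

    Cloud Run keeps at least `min_instances` running; any of them that are not
    handling requests are idle. Active instances above the minimum are not
    counted by this sketch.
    """
    if min_instances < 0 or active_instances < 0:
        raise ValueError("instance counts must be non-negative")
    return max(min_instances - active_instances, 0)


if __name__ == "__main__":
    print(idle_min_instances(10, 0))  # 10
    print(idle_min_instances(10, 6))  # 4
```

Once active instances reach or exceed the minimum, the sketch returns 0 because none of the minimum instances are left idle.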
Note that if a service has not recently served traffic, the active instances metric can indicate that no instances are active, even if you specified one or more for minimum instances.
Minimum instances can be restarted at any time.
Billing
Instances kept running using the minimum instances feature do incur billing costs.
The following diagram shows how billing works during an instance lifecycle when you configure minimum instances for a service or revision:
Figure 1. An example instance that receives and processes three requests.
Depending on the billing settings configured, the service is billed as follows:
Request-based billing: If min instances is set to 0, you are not billed when instances are idle.
Instance-based billing: If min instances is set higher than 0, you are still billed the default rate. This option works well if you need CPU outside of requests. If min instances is set to 0, you are billed the default rate.
Since these charges are predictable, Google recommends purchasing a Committed use discount.
Apply minimum instances at the service level versus the revision level
You can configure minimum instances at the service level or at the revision level. Google recommends that you apply minimum instances at the service level and avoid combining service-level and revision-level minimum instances. Learn more about the behavior when you configure both service-level and revision-level scaling settings.
If you apply minimum instances at the revision level, the setting goes into effect when the revision is deployed. If you apply this feature at the service level, the setting goes into effect without deploying a new revision.
Revisions and minimum instances
When minimum instances are set at the service level, incoming requests are distributed to all revisions that are serving traffic, in proportion to the traffic split.
When minimum instances are set at the revision level, minimum instances are started whenever the revision is referenced in a traffic split or has a traffic tag assigned. This means that these instances are billed both when processing requests and when waiting for incoming requests.
Important: If you configure both revision-level minimum instances and traffic tags, all tagged revisions are started and then kept active, even when there are no incoming requests. To avoid incurring billing costs for tagged revisions, use service-level minimum instances or remove tags from revisions when you no longer need them.
Tagged revisions and service-level minimum instances
If a revision with an assigned tag is started, its instances count toward the service-level minimum instances if the revision is part of a traffic split.
Request routing with minimum instances
When you set minimum instances, Cloud Run distributes incoming requests evenly across all of these provisioned instances. Understanding this behavior is important for managing costs, especially with request-based billing or if you intend to maintain idle hot spare instances. To minimize costs, set the number of minimum instances to the number of instances needed to serve your typical traffic.
Required roles
To get the permissions that you need to configure and deploy Cloud Run services, ask your administrator to grant you the following IAM roles:
Cloud Run Developer (roles/run.developer) on the Cloud Run service
Service Account User (roles/iam.serviceAccountUser) on the service identity
For a list of IAM roles and permissions that are associated with Cloud Run, see Cloud Run IAM roles and Cloud Run IAM permissions. If your Cloud Run service interfaces with Google Cloud APIs, such as Cloud Client Libraries, see the service identity configuration guide. For more information about granting roles, see deployment permissions and manage access.
Configure service-level minimum instances
By default, container instances have service-level minimum instances turned off, with a setting of 0. You can change this default using the Google Cloud console, the Google Cloud CLI, or a YAML file:
Console
In the Google Cloud console, go to Cloud Run:
If you are configuring a new service, select Services from the menu, and click Deploy container to display the Create service form. Locate the Service scaling form.
If you are configuring an existing service, click the service to display its detail panel, then click Edit service-level scaling settings at the top right of the detail panel.
In the field labelled Minimum number of instances, specify the number of container instances to be kept warm, ready to receive requests.
Click Create for a new service or Deploy for an existing service.
gcloud
Update the minimum number of instances for a given service using the following command:
gcloud run services update SERVICE --min MIN-VALUE
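Replace the following:
SERVICE: the name of your service.
MIN-VALUE: the minimum number of instances for the service. Specify default to clear any minimum instance setting.
Alternatively, you can set the minimum number of instances during deployment using the command: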
gcloud run deploy --image IMAGE_URL --min MIN-VALUE
Replace the following:
IMAGE_URL: a reference to the container image, for example, us-docker.pkg.dev/cloudrun/container/hello:latest. If you use Artifact Registry, the repository REPO_NAME must already be created. The URL follows the format of LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/PATH:TAG.
MIN-VALUE: the minimum number of instances for the service. Specify default to clear any minimum instance setting.
Any configuration change leads to the creation of a new revision. Subsequent revisions will also automatically get this configuration setting unless you make explicit updates to change it.
YAML
If you are creating a new service, skip this step. If you are updating an existing service, download its YAML configuration:
gcloud run services describe SERVICE --format export > service.yaml
Update the run.googleapis.com/minScale attribute:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: SERVICE
  annotations:
    run.googleapis.com/minScale: 'MIN_INSTANCE'
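Replace the following:
SERVICE: the name of your Cloud Run service.
MIN_INSTANCE: the minimum number of instances for the service.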
Create or update the service using the following command:
gcloud run services replace service.yaml
To update service-level minimum instances for your service from code:
REST API
To update service-level minimum instances for a given service, send a PATCH HTTP request to the Cloud Run Admin API service endpoint.
For example, using curl:
curl -H "Content-Type: application/json" \
  -H "Authorization: Bearer ACCESS_TOKEN" \
  -X PATCH \
  -d '{ "scaling": { "minInstanceCount": MIN-VALUE }}' \
  "https://run.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/services/SERVICE?update_mask=scaling.minInstanceCount"
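Replace the following:
ACCESS_TOKEN: a valid access token for an account that has the IAM permissions to update a service. For example, if you are logged into gcloud, you can retrieve an access token using gcloud auth print-access-token. From within a Cloud Run container instance, you can retrieve an access token using the container instance metadata server.
MIN-VALUE: the minimum number of instances for the service.
PROJECT_ID: your Google Cloud project ID.
REGION: the region where the service is deployed.
SERVICE: the name of the service.
To view the current service-level minimum instances settings for your Cloud Run service: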
Console
In the Google Cloud console, go to Cloud Run:
Click the service you are interested in to open the Service details panel.
The current setting is shown at the upper right of the service details panel, next to Scaling.
gcloud
Use the following command:
gcloud run services describe SERVICE
Locate the value for Scaling: Auto (Min: MIN_VALUE, Max: MAX_VALUE) in the returned configuration.
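If you need to read this value in a script rather than by eye, a regular expression can pull the numbers out of the describe output. This Python sketch assumes the Scaling line appears exactly in the form quoted above. The real gcloud output may differ between versions, and the function name parse_scaling_line is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed shape of the line printed by `gcloud run services describe SERVICE`,
# taken from the documentation text; Max is treated as optional here.
_SCALING_RE = re.compile(r"Scaling:\s*Auto\s*\(Min:\s*(\d+)(?:,\s*Max:\s*(\d+))?\)")


def parse_scaling_line(describe_output: str) -> Optional[Tuple[int, Optional[int]]]:
    """Return (min, max) from the Scaling line, or None if no such line is found."""
    match = _SCALING_RE.search(describe_output)
    if match is None:
        return None
    min_value = int(match.group(1))
    max_value = int(match.group(2)) if match.group(2) is not None else None
    return min_value, max_value


if __name__ == "__main__":
    sample = "Scaling: Auto (Min: 2, Max: 100)"  # hypothetical sample output
    print(parse_scaling_line(sample))  # (2, 100)
```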
Configure revision-level minimum instances
Any configuration change leads to the creation of a new revision. Subsequent revisions will also automatically get this configuration setting unless you make explicit updates to change it.
By default, container instances have min-instances turned off, with a setting of 0. You can change this default using the Google Cloud console, the Google Cloud CLI, or a YAML file when you create a new service or deploy a new revision:
Console
In the Google Cloud console, go to Cloud Run:
Select Services from the menu, and click Deploy container to configure a new service. If you are configuring an existing service, click the service, then click Edit and deploy new revision.
If you are configuring a new service, fill out the initial service settings page, then click Container(s), Volumes, Networking, Security to expand the service configuration page.
Click the Container tab.
Click Create or Deploy.
gcloud
You can update the min-instances setting of a given service by using the following command:
gcloud run services update SERVICE --min-instances MIN-VALUE
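Replace the following:
SERVICE: the name of your service.
MIN-VALUE: the minimum number of instances for the service. Specify default to clear any minimum instance setting.
You can also set min-instances during deployment using the command: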
gcloud run deploy --image IMAGE_URL --min-instances MIN-VALUE
Replace the following:
IMAGE_URL: a reference to the container image, for example, us-docker.pkg.dev/cloudrun/container/hello:latest. If you use Artifact Registry, the repository REPO_NAME must already be created. The URL follows the format of LOCATION-docker.pkg.dev/PROJECT_ID/REPO_NAME/PATH:TAG.
MIN-VALUE: the minimum number of instances for the service. Specify default to clear any minimum instance setting.
YAML
If you are creating a new service, skip this step. If you are updating an existing service, download its YAML configuration:
gcloud run services describe SERVICE --format export > service.yaml
Update the autoscaling.knative.dev/minScale attribute:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: SERVICE
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: 'MIN-INSTANCE'
      name: REVISION
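Replace the following:
SERVICE: the name of your Cloud Run service.
MIN-INSTANCE: the minimum number of instances for the revision.
REVISION: a new revision name, or delete this line if present. If you supply a new revision name, it must meet the following criteria:
Starts with SERVICE-
Contains only lowercase letters, numbers, and -
Does not end with a -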
Create or update the service using the following command:
gcloud run services replace service.yaml
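The service-level and revision-level YAML examples put the minimum-instances annotation in different places. If you script these edits, a minimal Python sketch can set either key on a manifest that has already been parsed into a plain dict. Loading and saving service.yaml would need a third-party YAML library such as PyYAML, which is not shown here. The function name set_min_scale and the service name my-service are hypothetical.

```python
from typing import Any, Dict

# Annotation keys as they appear in the YAML examples on this page.
SERVICE_LEVEL_KEY = "run.googleapis.com/minScale"
REVISION_LEVEL_KEY = "autoscaling.knative.dev/minScale"


def set_min_scale(service: Dict[str, Any], value: int, level: str) -> Dict[str, Any]:
    """Set minimum instances in a parsed Service manifest (a plain dict).

    level="service" writes metadata.annotations (service-level setting).
    level="revision" writes spec.template.metadata.annotations (revision-level).
    Values are stored as strings, matching the quoted values in the YAML examples.
    """
    if value < 0:
        raise ValueError("minimum instances must be non-negative")
    if level == "service":
        metadata = service.setdefault("metadata", {})
        key = SERVICE_LEVEL_KEY
    elif level == "revision":
        template = service.setdefault("spec", {}).setdefault("template", {})
        metadata = template.setdefault("metadata", {})
        key = REVISION_LEVEL_KEY
    else:
        raise ValueError("level must be 'service' or 'revision'")
    metadata.setdefault("annotations", {})[key] = str(value)
    return service


if __name__ == "__main__":
    manifest = {
        "apiVersion": "serving.knative.dev/v1",
        "kind": "Service",
        "metadata": {"name": "my-service"},
    }
    set_min_scale(manifest, 3, "service")
    set_min_scale(manifest, 1, "revision")
    print(manifest["metadata"]["annotations"])  # {'run.googleapis.com/minScale': '3'}
    print(manifest["spec"]["template"]["metadata"]["annotations"])  # {'autoscaling.knative.dev/minScale': '1'}
```

Page guidance recommends using the service-level setting and not combining it with revision-level minimum instances. The sketch sets both only to show where each key lives.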
Terraform
To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.
Add the following to a google_cloud_run_v2_service resource in your Terraform configuration:
The preceding google_cloud_run_v2_service resource specifies a minimum number of instances of 1 under template.scaling. Replace 1 with your own minimum number of instances.
To view the current revision-level minimum instances settings for your Cloud Run service:
Console
In the Google Cloud console, go to Cloud Run:
Click the service you are interested in to open the Service details panel.
Click the Revisions tab.
In the details panel at the right, the Revision min. instances setting is listed under the Container tab.
gcloud
Use the following command:
gcloud run services describe SERVICE
Locate the value for Min instances: in the returned configuration.
The following sections show the service behavior when configuring minimum instances.
Use both service-level and revision-level minimum or maximum instances
The following table shows the behavior if you combine service-level minimum instances and revision-level minimum or maximum instances:
Configuration setting: Both service-level minimum instances and revision-level minimum instances are set.
Behavior: The effective value for the revision is the larger of revision-level minimum instances and service-level minimum instances.
Configuration setting: Both service-level minimum instances and revision-level maximum instances are set.
Behavior: The effective value for the revision is the smaller of revision-level maximum instances and service-level minimum instances. This holds true even if the revision-level maximum instances prevents the service from reaching the number of instances configured for service-level minimum instances.
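The two rules in this table can be expressed as a max followed by a min. In this Python sketch, the function name effective_min_instances is hypothetical, and service_min stands for the service-level minimum that applies to the revision. Applying both rules in sequence when all three values are set is an assumption, because the table only describes each pair separately.

```python
from typing import Optional


def effective_min_instances(
    service_min: int,
    revision_min: Optional[int] = None,
    revision_max: Optional[int] = None,
) -> int:
    """Effective minimum instances for one revision, following the table above."""
    effective = service_min
    if revision_min is not None:
        # Rule 1: the larger of revision-level and service-level minimum instances.
        effective = max(effective, revision_min)
    if revision_max is not None:
        # Rule 2: capped by revision-level maximum instances, even if this keeps
        # the revision below the service-level minimum.
        effective = min(effective, revision_max)
    return effective


if __name__ == "__main__":
    print(effective_min_instances(5, revision_min=8))   # 8
    print(effective_min_instances(10, revision_max=3))  # 3
```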
Use service-level minimum instances with traffic splitting
If you use traffic splitting, the service-level minimum instances are divided across the revisions based on the proportion of the traffic split. For example, if service-level minimum instances is 10, a 50/50 traffic split allocates 5 service-level minimum instances to each revision.
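The proportional division can be sketched as follows. The function name split_service_min and the revision names rev-a and rev-b are hypothetical. This page does not say how fractional shares are rounded, so the sketch leaves them unrounded instead of guessing.

```python
from typing import Dict


def split_service_min(service_min: int, traffic_percent: Dict[str, int]) -> Dict[str, float]:
    """Divide service-level minimum instances across revisions by traffic share.

    traffic_percent maps a revision name to its traffic percentage (0-100).
    Shares are returned unrounded because the rounding rule is not documented here.
    """
    if sum(traffic_percent.values()) != 100:
        raise ValueError("traffic percentages must add up to 100")
    return {rev: service_min * pct / 100 for rev, pct in traffic_percent.items()}


if __name__ == "__main__":
    print(split_service_min(10, {"rev-a": 50, "rev-b": 50}))  # {'rev-a': 5.0, 'rev-b': 5.0}
```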
The following table shows sample configuration scenarios:
Sample use case: No revision-level settings
Sample configuration: Service-level minimum instances: 10
If minimum instances is set higher than what is required for your typical traffic, many instances may become lightly active, each processing a few requests. For example, if your service generally requires 200 instances for peak load but minimum instances is configured to 600, incoming requests are spread across all 600 instances. As a result, many of these 600 instances become somewhat active, each handling a small portion of the traffic, instead of about 200 instances being highly active and the remaining 400 staying completely idle.
To minimize costs (by having higher utilization on fewer instances), set minimum instances to a value that closely aligns with the actual number of instances needed to serve your typical traffic.
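The 200-versus-600 example can be reduced to a simple utilization ratio. This is a simplified model that assumes requests are spread evenly across all provisioned instances, as described above, and that instances_needed means instances running at full utilization. The function name average_utilization is hypothetical.

```python
def average_utilization(instances_needed: int, min_instances: int) -> float:
    """Fraction of provisioned capacity in use when traffic is spread evenly.

    instances_needed: instances required to serve the load at full utilization.
    min_instances: the configured minimum. If it exceeds what is needed, the
    same load is spread over more instances and each one is less utilized.
    """
    if instances_needed < 0 or min_instances < 0:
        raise ValueError("counts must be non-negative")
    provisioned = max(instances_needed, min_instances)
    if provisioned == 0:
        return 0.0
    return instances_needed / provisioned


if __name__ == "__main__":
    print(round(average_utilization(200, 600), 2))  # 0.33
    print(average_utilization(200, 200))            # 1.0
```

Setting the minimum close to the typical need keeps this ratio near 1, which is the cost-minimizing guidance in the paragraph above.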
Additionally, when autoscaling provisions additional instances above the configured minimum instances, Cloud Run prefers to route incoming requests to the configured minimum instances first before sending requests to the autoscaled instances. With request-based billing, this preferential routing to the configured minimum instances reduces cost by filling the configured minimum instances before using the autoscaled instances. Note that this preferential routing can also lead to configured minimum instances having a higher utilization than autoscaled instances, depending on the amount of traffic.
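The preference for minimum instances can be pictured with a deliberately simplified fill-first model. In this Python sketch, both the concurrency parameter (how many concurrent requests one instance accepts) and the strict fill-first rule are assumptions for illustration. This page does not describe Cloud Run's actual routing at this level of detail. The function name fill_min_first is hypothetical.

```python
from typing import Tuple


def fill_min_first(concurrent_requests: int, min_instances: int, concurrency: int) -> Tuple[int, int]:
    """Split concurrent requests between minimum and autoscaled instances.

    Simplified model: requests fill the configured minimum instances up to
    `concurrency` each, and only the overflow goes to autoscaled instances.
    Returns (requests on minimum instances, requests on autoscaled instances).
    """
    if concurrent_requests < 0 or min_instances < 0:
        raise ValueError("counts must be non-negative")
    if concurrency <= 0:
        raise ValueError("concurrency must be positive")
    min_capacity = min_instances * concurrency
    to_min = min(concurrent_requests, min_capacity)
    return to_min, concurrent_requests - to_min


if __name__ == "__main__":
    print(fill_min_first(250, 10, 20))  # (200, 50)
```

In this model, the minimum instances are fully utilized before any autoscaled instance receives traffic, which mirrors the higher utilization of minimum instances noted above.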