This article explains what pools are, and how you can best configure them. For information on creating a pool, see Pool configuration reference.
Note: If your workload supports serverless compute, Databricks recommends using serverless compute instead of pools to take advantage of always-on, scalable compute. See Connect to serverless compute.
Pool considerations
Consider the following when creating a pool:
You can minimize instance acquisition time by creating a pool for each instance type and Databricks runtime your organization commonly uses. For example, if most data engineering clusters use instance type A, data science clusters use instance type B, and analytics clusters use instance type C, create a separate pool for each instance type.
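As a rough illustration of the one-pool-per-instance-type idea, the following sketch builds one pool definition per workload. The field names (`instance_pool_name`, `node_type_id`, `preloaded_spark_versions`) are assumed to match the Instance Pools API; the workload names, instance types, and runtime strings are placeholders, so verify everything against the Pool configuration reference before use.

```python
# Hypothetical workloads mapped to placeholder instance types and runtimes.
# Replace these with the values your organization actually uses.
WORKLOADS = {
    "data-engineering": {"node_type_id": "instance-type-A", "runtime": "runtime-A"},
    "data-science": {"node_type_id": "instance-type-B", "runtime": "runtime-B"},
    "analytics": {"node_type_id": "instance-type-C", "runtime": "runtime-C"},
}


def build_pool_specs(workloads):
    """Return one pool definition per workload (one pool per instance type)."""
    specs = []
    for name, cfg in sorted(workloads.items()):
        specs.append(
            {
                "instance_pool_name": f"{name}-pool",
                "node_type_id": cfg["node_type_id"],
                # Preloading the runtime the workload uses helps clusters start faster.
                "preloaded_spark_versions": [cfg["runtime"]],
            }
        )
    return specs


if __name__ == "__main__":
    for spec in build_pool_specs(WORKLOADS):
        print(spec)
```

Each resulting dictionary could serve as the body of a pool creation request, once the placeholder values are replaced.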
Using spot instance pools
If your driver node and worker nodes have different requirements, use different pools for each.
Databricks recommends not using spot instances for your driver node. If you use a spot pool for your worker nodes, select an on-demand pool as your Driver type.
Configure pools to use on-demand instances for jobs with short execution times and strict execution time requirements. Use on-demand instances to prevent acquired instances from being lost to a higher bidder on the spot market.
Configure pools to use spot instances for clusters that support interactive development or jobs that prioritize cost savings over reliability.
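The driver and worker recommendation above can be sketched as a cluster definition that draws its driver from an on-demand pool and its workers from a spot pool. This assumes the Clusters API fields `instance_pool_id` (workers) and `driver_instance_pool_id` (driver); the pool IDs, cluster name, and runtime string are hypothetical placeholders.

```python
def build_cluster_spec(name, spark_version, worker_pool_id, driver_pool_id, num_workers):
    """Build a cluster definition whose driver and workers use different pools."""
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "num_workers": num_workers,
        # Workers come from the (spot) worker pool.
        "instance_pool_id": worker_pool_id,
        # The driver comes from a separate (on-demand) pool.
        "driver_instance_pool_id": driver_pool_id,
    }


if __name__ == "__main__":
    spec = build_cluster_spec(
        name="nightly-etl",                    # hypothetical cluster name
        spark_version="runtime-A",             # placeholder runtime string
        worker_pool_id="spot-pool-id",         # placeholder ID of a spot pool
        driver_pool_id="on-demand-pool-id",    # placeholder ID of an on-demand pool
        num_workers=4,
    )
    print(spec)
```

Keeping the two pool IDs as separate parameters makes it harder to place the driver on spot capacity by accident.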
Tag pools to manage cost and billing
Tagging pools to the correct cost center allows you to manage cost and usage chargeback. You can use multiple custom tags to associate multiple cost centers with a pool. However, it's important to understand how tags are propagated when a cluster is created from a pool. Tags from pools propagate to the underlying cloud provider instances, but the cluster's tags do not. Apply all custom tags required for managing chargeback of the cloud provider compute cost to the pool.
Pool tags and cluster tags both propagate to Databricks billing. You can use the combination of cluster and pool tags to manage chargeback of Databricks Units.
To learn more, see Use tags to attribute and track usage.
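The propagation rules described above can be modeled in a few lines. This is only a toy model of custom tags, not Databricks code: the function names and tag values are hypothetical, default tags are ignored, and billing tags are kept as (source, key, value) tuples so that no precedence between pool and cluster tags is assumed.

```python
def cloud_instance_tags(pool_tags, cluster_tags):
    """Custom tags that reach the cloud provider instances: pool tags only."""
    return dict(pool_tags)


def databricks_billing_tags(pool_tags, cluster_tags):
    """Custom tags that reach Databricks billing: both pool and cluster tags."""
    from_pool = [("pool", k, v) for k, v in sorted(pool_tags.items())]
    from_cluster = [("cluster", k, v) for k, v in sorted(cluster_tags.items())]
    return from_pool + from_cluster


if __name__ == "__main__":
    pool_tags = {"cost_center": "1234"}       # hypothetical pool tag
    cluster_tags = {"team": "analytics"}      # hypothetical cluster tag
    print(cloud_instance_tags(pool_tags, cluster_tags))
    print(databricks_billing_tags(pool_tags, cluster_tags))
```

In this model, a tag needed for cloud provider chargeback that exists only on the cluster never appears in the first result, which is why such tags belong on the pool.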
Configure pools to control cost
You can use the following configuration options to help control the cost of pools:
To benefit fully from pools, you can pre-populate newly created pools. Set Min Idle instances to a value greater than zero in the pool configuration. Alternatively, if you're following the recommendation to set this value to zero, use a starter job to ensure that newly created pools have available instances for clusters to access.
With the starter job approach, schedule a job with flexible execution time requirements to run before jobs with stricter performance requirements or before users start using interactive clusters. After the job finishes, the instances used for the job are released back to the pool. Set the Min Idle instances setting to 0 and set the Idle Instance Auto Termination time high enough to ensure that idle instances remain available for subsequent jobs.
Using a starter job allows the pool instances to spin up, populate the pool, and remain available for downstream job or interactive clusters.
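One way to decide what "high enough" means for the starter job approach is to size the idle timeout from the gap between the end of the starter job and the next expected use of the pool. The sketch below does that arithmetic; the helper names and the safety margin are assumptions, and the output field names (`min_idle_instances`, `idle_instance_autotermination_minutes`) are assumed to match the Instance Pools API.

```python
import math
from datetime import datetime, timedelta


def min_idle_autotermination_minutes(starter_job_end, next_use_start, safety_margin_minutes=10):
    """Smallest idle timeout (in minutes) that keeps instances warm until the next use."""
    gap = next_use_start - starter_job_end
    if gap < timedelta(0):
        raise ValueError("next use must start after the starter job ends")
    return math.ceil(gap.total_seconds() / 60) + safety_margin_minutes


def starter_pool_settings(starter_job_end, next_use_start):
    """Pool settings for the starter job approach: no minimum idle instances, long idle timeout."""
    return {
        "min_idle_instances": 0,
        "idle_instance_autotermination_minutes": min_idle_autotermination_minutes(
            starter_job_end, next_use_start
        ),
    }


if __name__ == "__main__":
    # Hypothetical schedule: starter job ends at 06:40, strict jobs begin at 07:00.
    end = datetime(2024, 1, 1, 6, 40)
    start = datetime(2024, 1, 1, 7, 0)
    print(starter_pool_settings(end, start))
```

A longer timeout keeps instances available for more downstream work but also extends the period in which idle instances incur cloud provider cost, so the margin is a trade-off to tune.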