Objectives
This tutorial shows you how to install the Dataproc Jupyter component on a new cluster and then connect to the Jupyter notebook UI running on the cluster from your local browser using the Dataproc Component Gateway.
Note: Running this tutorial will incur Google Cloud charges. See Dataproc Pricing.
Costs
In this document, you use the following billable components of Google Cloud: Dataproc and Cloud Storage.
To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.
Before you begin
If you haven't already done so, create a Google Cloud project and a Cloud Storage bucket.
Setting up your project
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.
Make sure that billing is enabled for your Google Cloud project.
Enable the Dataproc, Compute Engine, and Cloud Storage APIs.
Install the Google Cloud CLI.
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
To initialize the gcloud CLI, run the following command:
gcloud init
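The project setup steps can also be run from the command line. The following is a minimal sketch, assuming the gcloud CLI is installed and `gcloud init` has already been run; the function names `run` and `setup_project`, the `DRY_RUN` variable, and the project ID you pass in are hypothetical placeholders, not part of gcloud. The `run` helper prints each command instead of executing it when `DRY_RUN=1`, so you can review the commands first.

```shell
# Sketch: select the project, show its billing status, and enable the APIs
# this tutorial uses. Source this file, then call setup_project.

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: setup_project PROJECT_ID
setup_project() {
  if [ -z "$1" ]; then
    echo "usage: setup_project PROJECT_ID" >&2
    return 1
  fi
  # Make the project the default for later gcloud commands.
  run gcloud config set project "$1"
  # Show billing information; billingEnabled should be true.
  run gcloud billing projects describe "$1"
  # Enable the Dataproc, Compute Engine, and Cloud Storage APIs.
  run gcloud services enable dataproc.googleapis.com compute.googleapis.com storage.googleapis.com --project="$1"
}
```

For example, `DRY_RUN=1 setup_project my-project-id` prints the three commands; omit `DRY_RUN=1` to run them.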
Create a Cloud Storage bucket in your project to store any notebooks you create in this tutorial.
If you create the bucket in the Google Cloud console, you can optionally add a label by entering a key and a value for your label. In the Replication settings section, you can also click Configure to open the Configure cross-bucket replication pane.
Notebooks you create in this tutorial are stored in gs://bucket-name/notebooks/jupyter.
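You can also create the bucket with the gcloud CLI. This is a minimal sketch; the function name `create_notebook_bucket`, the `run` helper, and `DRY_RUN` are hypothetical, and placing the bucket in the same region as your cluster is an assumption made here rather than a requirement stated in this tutorial. Bucket names must be globally unique.

```shell
# Sketch: create the Cloud Storage bucket that will hold your notebooks.

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: create_notebook_bucket BUCKET_NAME REGION
create_notebook_bucket() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: create_notebook_bucket BUCKET_NAME REGION" >&2
    return 1
  fi
  # Assumption: keep the bucket in the same region you will use for the cluster.
  run gcloud storage buckets create "gs://$1" --location="$2"
}
```

For example, `DRY_RUN=1 create_notebook_bucket my-notebook-bucket us-central1` prints the command without creating anything.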
Create a cluster with the Jupyter component installed.
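One way to do this from the command line is `gcloud dataproc clusters create` with the Jupyter optional component and Component Gateway enabled. The sketch below assumes default image and machine settings are acceptable; the function name `create_jupyter_cluster`, the `run` helper, and `DRY_RUN` are hypothetical. The `--bucket` flag sets the cluster's staging bucket and takes a bucket name, so the function strips a leading `gs://` if you pass one.

```shell
# Sketch: create a Dataproc cluster with Jupyter and Component Gateway,
# using your notebook bucket as the staging bucket.

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: create_jupyter_cluster CLUSTER_NAME REGION BUCKET_NAME
create_jupyter_cluster() {
  if [ "$#" -ne 3 ]; then
    echo "usage: create_jupyter_cluster CLUSTER_NAME REGION BUCKET_NAME" >&2
    return 1
  fi
  # --optional-components=JUPYTER installs the Jupyter component.
  # --enable-component-gateway exposes its web UIs through Component Gateway.
  # --bucket expects the bucket name only, so drop any gs:// prefix.
  run gcloud dataproc clusters create "$1" \
    --region="$2" \
    --optional-components=JUPYTER \
    --enable-component-gateway \
    --bucket="${3#gs://}"
}
```

Passing either `my-bucket` or `gs://my-bucket` as the third argument produces `--bucket=my-bucket`.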
Note: When creating the cluster, specify the bucket you created in Before you begin, step 2, as the Dataproc staging bucket. Specify only the bucket name, without the gs:// prefix (see Dataproc staging and temp buckets for instructions on setting the staging bucket). Your notebooks will be stored in Cloud Storage under gs://bucket-name/notebooks/jupyter.
Open the Jupyter and JupyterLab UIs
Click the Component Gateway links in the Google Cloud console to open the Jupyter notebook or JupyterLab UIs running on your cluster's master node.
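If you prefer the command line, the Component Gateway URLs are part of the cluster description. This sketch assumes the cluster was created with Component Gateway enabled and that the URLs appear under `config.endpointConfig.httpPorts`; the exact keys in that map (for example, Jupyter and JupyterLab) may vary by image version. The function name `show_gateway_urls`, the `run` helper, and `DRY_RUN` are hypothetical.

```shell
# Sketch: print the Component Gateway web UI URLs for a cluster.

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: show_gateway_urls CLUSTER_NAME REGION
show_gateway_urls() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: show_gateway_urls CLUSTER_NAME REGION" >&2
    return 1
  fi
  # Limit the YAML output to the map of web UI names to URLs.
  run gcloud dataproc clusters describe "$1" --region="$2" \
    --format="yaml(config.endpointConfig.httpPorts)"
}
```

Open the printed Jupyter or JupyterLab URL in your local browser while signed in with an account that has access to the project.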
The top-level directory displayed by your Jupyter instance is a virtual directory that lets you see the contents of either your Cloud Storage bucket or your local file system. You can choose either location by clicking on the GCS link for Cloud Storage or Local Disk for the local file system of the master node in your cluster.
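Notebooks saved through the GCS view are ordinary objects in your bucket, so you can also inspect them with the gcloud CLI. This is a minimal sketch, assuming the gs://bucket-name/notebooks/jupyter location described above; the function names `list_notebooks` and `download_notebook`, the `run` helper, `DRY_RUN`, and the notebook file name you pass in are hypothetical.

```shell
# Sketch: list and download notebooks stored in the cluster's staging bucket.

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: list_notebooks BUCKET_NAME
list_notebooks() {
  # Lists the top level of the notebooks folder (not recursive).
  run gcloud storage ls "gs://$1/notebooks/jupyter/"
}

# Usage: download_notebook BUCKET_NAME NOTEBOOK_FILE [DEST_DIR]
download_notebook() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: download_notebook BUCKET_NAME NOTEBOOK_FILE [DEST_DIR]" >&2
    return 1
  fi
  # DEST_DIR defaults to the current directory.
  run gcloud storage cp "gs://$1/notebooks/jupyter/$2" "${3:-.}"
}
```

Files saved under Local Disk live only on the master node's file system and do not appear in this listing.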
After you finish the tutorial, you can clean up the resources that you created so that they stop using quota and incurring charges. The following sections describe how to delete or turn off these resources.
Delete the project
The easiest way to eliminate billing is to delete the project that you created for the tutorial.
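To delete the project:
Caution: Deleting a project deletes everything in it. If you want to preserve URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project. If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.
In the Google Cloud console, go to the Manage resources page.
In the project list, select the project that you want to delete, and then click Delete.
In the dialog, type the project ID, and then click Shut down to delete the project.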
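The project can also be deleted with the gcloud CLI. This minimal sketch assumes you pass the ID of the project you created for this tutorial; the function name `delete_project`, the `run` helper, and `DRY_RUN` are hypothetical. `gcloud projects delete` asks for confirmation before shutting down the project.

```shell
# Sketch: shut down the tutorial project from the command line.

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: delete_project PROJECT_ID
delete_project() {
  if [ -z "$1" ]; then
    echo "usage: delete_project PROJECT_ID" >&2
    return 1
  fi
  # gcloud prompts for confirmation unless --quiet is added.
  run gcloud projects delete "$1"
}
```

Running `DRY_RUN=1 delete_project my-project-id` first is a cheap way to confirm you are targeting the right project ID.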
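If you keep the project, delete the cluster and the bucket individually instead. In the following commands, REGION is your cluster's region and BUCKET_NAME is the bucket you created.
Delete the cluster
gcloud dataproc clusters delete cluster-name \
    --region=${REGION}
Delete the bucket
gcloud storage rm gs://${BUCKET_NAME} --recursive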
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-07-02 UTC.