Showing content from https://cloud.google.com/dataproc/docs/tutorials/jupyter-notebook below:

Install and run a Jupyter notebook on a Dataproc cluster | Dataproc Documentation

Objectives

This tutorial shows you how to install the Dataproc Jupyter component on a new cluster, and then connect to the Jupyter notebook UI running on the cluster from your local browser using the Dataproc Component Gateway.

Note: Running this tutorial incurs Google Cloud charges; see Dataproc Pricing.

Costs

In this document, you use the following billable components of Google Cloud:

- Dataproc
- Compute Engine
- Cloud Storage

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

Before you begin

If you haven't already done so, create a Google Cloud project and a Cloud Storage bucket.

Setting up your project
1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
  Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.
  Go to project selector
3. Make sure that billing is enabled for your Google Cloud project.
4. Enable the Dataproc, Compute Engine, and Cloud Storage APIs.
  
  Enable the APIs
5. Install the Google Cloud CLI.
6. If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
7. To initialize the gcloud CLI, run the following command:
```
gcloud init
```
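
The API-enabling step above can also be done from a terminal. The following is a minimal sketch, not the tutorial's own procedure: it assumes the service names `dataproc.googleapis.com`, `compute.googleapis.com`, and `storage.googleapis.com` correspond to the three APIs, and `my-project-id` is a hypothetical placeholder. By default the command is only printed; set `GCLOUD=gcloud` to run it.

```shell
# Hypothetical placeholder; replace with your own project ID.
PROJECT_ID="${PROJECT_ID:-my-project-id}"

# Dry run by default: the command is printed, not executed.
# Set GCLOUD=gcloud in your environment to actually run it.
GCLOUD="${GCLOUD:-echo gcloud}"

enable_tutorial_apis() {
  # Enable the Dataproc, Compute Engine, and Cloud Storage APIs
  # (service names are an assumption, see the text above).
  $GCLOUD services enable \
    dataproc.googleapis.com \
    compute.googleapis.com \
    storage.googleapis.com \
    --project="${PROJECT_ID}"
}

enable_tutorial_apis
```
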
Create a Cloud Storage bucket in your project to store any notebooks you create in this tutorial:
1. In the Google Cloud console, go to the Cloud Storage Buckets page.
  Go to Buckets
2. Click Create.
3. On the Create a bucket page, enter your bucket information. To go to the next step, click Continue.
  1. In the Get started section, do the following:
    - Enter a globally unique name that meets the bucket naming requirements.
    - To add a bucket label, expand the Labels section, click Add label, and specify a key and a value for your label.
  2. In the Choose where to store your data section, do the following:
    1. Select a Location type.
    2. From the Location type drop-down menu, choose a location where your bucket's data is permanently stored.
      - If you select the dual-region location type, you can also choose to enable turbo replication by using the relevant checkbox.
    3. To set up cross-bucket replication, select Add cross-bucket replication via Storage Transfer Service and follow these steps:
      1. In the Bucket menu, select a bucket.
      2. In the Replication settings section, click Configure to configure settings for the replication job. The Configure cross-bucket replication pane appears.
        - To filter objects to replicate by object name prefix, enter a prefix that you want to include or exclude objects from, then click Add a prefix.
        - To set a storage class for the replicated objects, select a storage class from the Storage class menu. If you skip this step, the replicated objects use the destination bucket's storage class by default.
        - Click Done.
  3. In the Choose how to store your data section, do the following:
    1. Select a default storage class for the bucket or Autoclass for automatic storage class management of your bucket's data.
    2. To enable hierarchical namespace, in the Optimize storage for data-intensive workloads section, select Enable hierarchical namespace on this bucket. Note: You cannot enable hierarchical namespace in existing buckets.
  4. In the Choose how to control access to objects section, select whether or not your bucket enforces public access prevention, and select an access control method for your bucket's objects. Note: You cannot change the Prevent public access setting if this setting is enforced by an organization policy.
  5. In the Choose how to protect object data section, do the following:
    - Select any of the options under Data protection that you want to set for your bucket.
      - To enable soft delete, click the Soft delete policy (For data recovery) checkbox, and specify the number of days you want to retain objects after deletion.
      - To set Object Versioning, click the Object versioning (For version control) checkbox, and specify the maximum number of versions per object and the number of days after which the noncurrent versions expire.
      - To enable the retention policy on objects and buckets, click the Retention (For compliance) checkbox, and then do the following:
        - To enable Object Retention Lock, click the Enable object retention checkbox.
        - To enable Bucket Lock, click the Set bucket retention policy checkbox, and choose a unit of time and a length of time for your retention period.
    - To choose how your object data will be encrypted, expand the Data encryption section, and select a Data encryption method.
4. Click Create.
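
For a basic bucket, the console steps above have a command-line counterpart. The following is a sketch under stated assumptions: `my-notebook-bucket` and `us-central1` are hypothetical placeholders, and only a location and uniform bucket-level access are set, leaving every other option from the console flow at its default. The command is printed unless you set `GCLOUD=gcloud`.

```shell
# Hypothetical placeholders; bucket names must be globally unique.
BUCKET_NAME="${BUCKET_NAME:-my-notebook-bucket}"
REGION="${REGION:-us-central1}"

# Dry run by default: the command is printed, not executed.
# Set GCLOUD=gcloud in your environment to actually run it.
GCLOUD="${GCLOUD:-echo gcloud}"

create_notebook_bucket() {
  # Create a regional bucket with uniform bucket-level access.
  $GCLOUD storage buckets create "gs://${BUCKET_NAME}" \
    --location="${REGION}" \
    --uniform-bucket-level-access
}

create_notebook_bucket
```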

Create a cluster and install the Jupyter component

Create a cluster with the installed Jupyter component.
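
One way to do this from the command line is sketched below. It assumes the gcloud CLI is installed and initialized; `my-jupyter-cluster`, `us-central1`, and `my-notebook-bucket` are hypothetical placeholders. The `--optional-components=JUPYTER` flag installs the Jupyter component, `--enable-component-gateway` turns on the Component Gateway, and `--bucket` sets the staging bucket (name only, without the `gs://` prefix). The command is printed unless you set `GCLOUD=gcloud`.

```shell
# Hypothetical placeholders; replace with your own values.
CLUSTER_NAME="${CLUSTER_NAME:-my-jupyter-cluster}"
REGION="${REGION:-us-central1}"
BUCKET_NAME="${BUCKET_NAME:-my-notebook-bucket}"

# Dry run by default: the command is printed, not executed.
# Set GCLOUD=gcloud in your environment to actually run it.
GCLOUD="${GCLOUD:-echo gcloud}"

create_jupyter_cluster() {
  # Create a cluster with the Jupyter component and Component Gateway,
  # using the tutorial bucket as the Dataproc staging bucket.
  $GCLOUD dataproc clusters create "${CLUSTER_NAME}" \
    --region="${REGION}" \
    --optional-components=JUPYTER \
    --enable-component-gateway \
    --bucket="${BUCKET_NAME}"
}

create_jupyter_cluster
```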

Note: When creating the cluster, specify the name of the bucket you created in Before you begin, step 2 (specify only the bucket name) as the Dataproc staging bucket (see Dataproc staging and temp buckets for instructions on setting the staging bucket). Your notebooks will be stored in Cloud Storage under gs://bucket-name/notebooks/jupyter.

Open the Jupyter and JupyterLab UIs

Click the Component Gateway links in the Google Cloud console to open the Jupyter notebook or JupyterLab UIs running on your cluster's master node.

The top-level directory displayed by your Jupyter instance is a virtual directory that lets you see the contents of either your Cloud Storage bucket or your local file system. You can choose either location by clicking on the GCS link for Cloud Storage or Local Disk for the local file system of the master node in your cluster.

Click the GCS link. The Jupyter notebook web UI displays notebooks stored in your Cloud Storage bucket, including any notebooks you create in this tutorial.
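
The same information can be checked from a terminal. The sketch below is illustrative only: it assumes the Component Gateway URLs are exposed under the `config.endpointConfig.httpPorts` field of the cluster resource, and it reuses the hypothetical placeholder names `my-jupyter-cluster`, `us-central1`, and `my-notebook-bucket`. The commands are printed unless you set `GCLOUD=gcloud`.

```shell
# Hypothetical placeholders; replace with your own values.
CLUSTER_NAME="${CLUSTER_NAME:-my-jupyter-cluster}"
REGION="${REGION:-us-central1}"
BUCKET_NAME="${BUCKET_NAME:-my-notebook-bucket}"

# Dry run by default: commands are printed, not executed.
# Set GCLOUD=gcloud in your environment to actually run them.
GCLOUD="${GCLOUD:-echo gcloud}"

show_gateway_urls() {
  # Print the Component Gateway endpoints of the cluster
  # (field path is an assumption, see the text above).
  $GCLOUD dataproc clusters describe "${CLUSTER_NAME}" \
    --region="${REGION}" \
    --format="value(config.endpointConfig.httpPorts)"
}

list_notebooks() {
  # List notebooks saved under the staging bucket's Jupyter folder.
  $GCLOUD storage ls "gs://${BUCKET_NAME}/notebooks/jupyter/"
}

show_gateway_urls
list_notebooks
```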

Clean up

After you finish the tutorial, you can clean up the resources that you created so that they stop using quota and incurring charges. The following sections describe how to delete or turn off these resources.

Delete the project

The easiest way to eliminate billing is to delete the project that you created for the tutorial.

Caution: Deleting a project has the following effects:

- Everything in the project is deleted. If you used an existing project for the tasks in this document, when you delete it, you also delete any other work you've done in the project.
- Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.

To delete the project:

1. In the Google Cloud console, go to the Manage resources page.
  Go to Manage resources
2. In the project list, select the project that you want to delete, and then click Delete.
3. In the dialog, type the project ID, and then click Shut down to delete the project.
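
The project can also be deleted with the gcloud CLI. This is a minimal sketch in which `my-project-id` is a hypothetical placeholder; the caution above applies equally here. The command is printed unless you set `GCLOUD=gcloud`.

```shell
# Hypothetical placeholder; replace with the ID of the project to delete.
PROJECT_ID="${PROJECT_ID:-my-project-id}"

# Dry run by default: the command is printed, not executed.
# Set GCLOUD=gcloud in your environment to actually run it.
GCLOUD="${GCLOUD:-echo gcloud}"

delete_tutorial_project() {
  # Shut down the project and everything in it.
  $GCLOUD projects delete "${PROJECT_ID}"
}

delete_tutorial_project
```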

Delete the cluster

To delete your cluster:

```
gcloud dataproc clusters delete cluster-name \
    --region=${REGION}
```

Delete the bucket

To delete the Cloud Storage bucket you created in Before you begin, step 2, including the notebooks stored in the bucket:
```
gcloud storage rm gs://${BUCKET_NAME} --recursive
```

What's next

See the Jupyter/IPython Notebook Quick Start Guide

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-07-02 UTC.
