Showing content from https://docs.databricks.com/aws/en/discover/databricks-datasets below:

Sample datasets | Databricks Documentation

Sample datasets

Databricks provides a variety of sample datasets, and third parties make others available, that you can use in your Databricks workspace.

Unity Catalog datasets

Unity Catalog provides access to a number of sample datasets in the samples catalog. You can review these datasets in the Catalog Explorer UI and reference them directly in a notebook or in the SQL editor by using the <catalog-name>.<schema-name>.<table-name> pattern.

The nyctaxi schema (also known as a database) contains the table trips, which has details about taxi rides in New York City. The following statement returns the first 10 records in this table:

SQL

SELECT * FROM samples.nyctaxi.trips LIMIT 10
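
The same table can also be read from a Python notebook. The following is a minimal sketch, assuming it runs in a Databricks notebook where the `spark` session is predefined. The helper names `qualified_name` and `first_rows` are illustrative and not part of any Databricks API.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build a three-level name: <catalog-name>.<schema-name>.<table-name>."""
    return f"{catalog}.{schema}.{table}"


def first_rows(spark, name: str, n: int = 10):
    """Return a DataFrame with at most n rows of the named table.

    `spark` is assumed to be the SparkSession available in a notebook.
    """
    return spark.table(name).limit(n)


trips = qualified_name("samples", "nyctaxi", "trips")
# In a notebook: display(first_rows(spark, trips))
```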

The tpch schema contains data from the TPC-H Benchmark. To list the tables in this schema, run:

SQL

SHOW TABLES IN samples.tpch
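
To work with the table list programmatically, the statement can be issued from Python. This is a sketch under the assumption that `spark` is the notebook's SparkSession and that the result exposes a `tableName` column; `table_names` is a hypothetical helper.

```python
def table_names(spark, schema: str) -> list[str]:
    """Return the sorted table names that SHOW TABLES reports for a schema."""
    rows = spark.sql(f"SHOW TABLES IN {schema}").collect()
    # Assumption: each result row carries the table's name under "tableName".
    return sorted(row["tableName"] for row in rows)


# In a notebook: print(table_names(spark, "samples.tpch"))
```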

Third-party sample datasets in CSV format

Databricks has built-in tools to quickly upload third-party sample datasets as comma-separated values (CSV) files into Databricks workspaces. Some popular third-party sample datasets available in CSV format:

To use third-party sample datasets in your Databricks workspace, do the following:

  1. Follow the third-party's instructions to download the dataset as a CSV file to your local machine.
  2. Upload the CSV file from your local machine into your Databricks workspace.
  3. To work with the imported data, use Databricks SQL to query it, or use a notebook to load it as a DataFrame.
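
The notebook option in the last step might look like the following. This is a minimal sketch, assuming the uploaded file sits at a path the notebook can read and has a header row. The path shown is a placeholder, and `load_csv` is a hypothetical helper rather than a Databricks function.

```python
def load_csv(spark, path: str):
    """Read a CSV file with a header row into a DataFrame, inferring column types."""
    return (
        spark.read
        .option("header", True)       # first line holds the column names
        .option("inferSchema", True)  # scan the data to guess column types
        .csv(path)
    )


# Placeholder location; replace with the path of your uploaded file.
csv_path = "/Volumes/<catalog>/<schema>/<volume>/<file>.csv"
# In a notebook: display(load_csv(spark, csv_path))
```

Inferring the schema costs an extra pass over the file; for large files, supplying an explicit schema is usually the better choice.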

Third-party sample datasets within libraries

Some third parties include sample datasets within libraries, such as Python Package Index (PyPI) packages or Comprehensive R Archive Network (CRAN) packages. For more information, see the library provider's documentation.

Databricks datasets (databricks-datasets) mounted to DBFS

Databricks recommends against using DBFS and mounted cloud object storage for most use cases in Unity Catalog-enabled Databricks workspaces. Some sample datasets mounted to DBFS are available in Databricks.

Note

The availability and location of Databricks datasets are subject to change without notice.

Browse DBFS mounted Databricks datasets

To browse these files from a Python, Scala, or R notebook, you can use Databricks Utilities (dbutils). The following code lists all of the available Databricks datasets.

Python

display(dbutils.fs.ls('/databricks-datasets'))

Scala

display(dbutils.fs.ls("/databricks-datasets"))

R

%fs ls "/databricks-datasets"
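
To use the listing in code rather than only display it, the entries can be collected into a Python list. This is a sketch that assumes `dbutils` is the object available in a Databricks notebook and that each listed entry has a `name` attribute; `dataset_names` is a hypothetical helper.

```python
def dataset_names(dbutils, root: str = "/databricks-datasets") -> list[str]:
    """Return the sorted names of the entries directly under root."""
    # Assumption: dbutils.fs.ls yields entries exposing path, name, and size.
    return sorted(entry.name for entry in dbutils.fs.ls(root))


# In a notebook: print(dataset_names(dbutils))
```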
