
Billable usage log schema (legacy)

important

This documentation has been retired and might not be updated. The products, services, or technologies mentioned in this content are no longer supported. To view current admin documentation, see Manage your Databricks account.

note

This article includes details about the legacy usage logs, which do not record usage for all products. Databricks recommends using the billable usage system table to access and query complete usage data.

This article explains how to read and analyze the usage log data downloaded from the account console.

You can view and download billable usage directly in the account console, or by using the Account API.
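For example, the usage CSV can be downloaded programmatically. The following is a minimal sketch assuming the legacy billable usage download endpoint (GET /api/2.0/accounts/&lt;account-id&gt;/usage/download); the account ID, token, and month range are placeholders:

Python

# Sketch: download the billable usage CSV through the Account API.
# Assumes the legacy usage/download endpoint; <account-id> and the
# token value are placeholders you must supply.
import requests

account_id = "<account-id>"
token = "<account-admin-token>"

resp = requests.get(
    f"https://accounts.cloud.databricks.com/api/2.0/accounts/{account_id}/usage/download",
    headers={"Authorization": f"Bearer {token}"},
    params={"start_month": "2023-01", "end_month": "2023-03"},
)
resp.raise_for_status()

with open("usage_data.csv", "wb") as f:
    f.write(resp.content)  # the CSV file described in this article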

CSV file schema

Billing SKUs

Deprecated SKUs

The following SKUs have been deprecated:

Analyze usage data in Databricks

This section describes how to make the data in the billable usage CSV file available to Databricks for analysis.

The CSV file uses a format that is standard for commercial spreadsheet applications but requires a modification to be read by Apache Spark: quotes inside quoted fields are escaped by doubling them, so you must use option("escape", "\"") when you create the usage table in Databricks.

Total DBUs are the sum of the dbus column; see the example query after the first code block below.

Import the log using the Create Table UI

You can use the Upload files to Databricks UI to import the CSV file for analysis.

Create a Spark DataFrame

You can also use the following code to create the usage table from a path to the CSV file:

Python

# Read the downloaded usage CSV; the escape option handles doubled quotes.
df = (spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .option("escape", "\"")
    .csv("/FileStore/tables/usage_data.csv"))

df.createOrReplaceTempView("usage")
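
With the usage view registered, you can compute total DBUs as the sum of the dbus column, as noted above. A minimal sketch:

Python

# Total DBUs across the imported file(s): sum the dbus column.
total = spark.sql("SELECT SUM(dbus) AS total_dbus FROM usage").first()["total_dbus"]
print(total)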

If the file is stored in an S3 bucket (for example, when delivered through usage log delivery), the code looks like the following. You can specify a file path or a directory; if you pass a directory, all files in it are imported. The following example specifies a single file.

Python

df = (spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .option("escape", "\"")
    .format("csv")  # load() defaults to Parquet, so declare CSV explicitly
    .load("s3://<bucketname>/<pathprefix>/billable-usage/csv/workspaceId=<workspace-id>-usageMonth=<month>.csv"))

df.createOrReplaceTempView("usage")

The following example imports a directory of billable usage files:

Python

df = (spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .option("escape", "\"")
    .format("csv")  # load() defaults to Parquet, so declare CSV explicitly
    .load("s3://<bucketname>/<pathprefix>/billable-usage/csv/"))

df.createOrReplaceTempView("usage")

Create a Delta table

To create a Delta table from the DataFrame (df) in the previous example, use the following code:

Python

(df.write
    .format("delta")
    .mode("overwrite")  # replace any existing table contents
    .saveAsTable("database_name.table_name")
)
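
Once saved, the Delta table can be queried like any other table. The following sketch aggregates DBUs by SKU, assuming the sku and dbus columns from the usage schema and the placeholder table name used above:

Python

# Aggregate DBUs per SKU from the saved Delta table.
spark.sql("""
    SELECT sku, SUM(dbus) AS total_dbus
    FROM database_name.table_name
    GROUP BY sku
    ORDER BY total_dbus DESC
""").show()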

warning

The saved Delta table is not updated automatically when you add new CSV files or replace existing ones. If you need the latest data, re-run these commands before you use the Delta table.

