Python plugin - Kusto | Microsoft Learn

Applies to: ✅ Microsoft Fabric ✅ Azure Data Explorer

Azure Data Explorer

The Python plugin runs a user-defined function (UDF) using a Python script. The Python script gets tabular data as its input and produces tabular output. The plugin's runtime is hosted in sandboxes running on the cluster's nodes.

Syntax

T | evaluate [hint.distribution = (single | per_node)] [hint.remote = (auto | local)] python(output_schema, script [, script_parameters] [, external_artifacts] [, spill_to_disk])

Learn more about syntax conventions.

Parameters

- output_schema (string, required): A type literal that defines the output schema of the tabular data returned by the Python code. The format is typeof(ColumnName: ColumnType[, ...]). For example, typeof(col1:string, col2:long). To extend the input schema, use the following syntax: typeof(*, col1:string, col2:long).
- script (string, required): The valid Python script to execute. To generate multi-line strings, see Usage tips.
- script_parameters (dynamic): A property bag of name-value pairs to be passed to the Python script as the reserved kargs dictionary. For more information, see Reserved Python variables.
- hint.distribution (string): A hint for the plugin's execution to be distributed across multiple cluster nodes. The default value is single. single means a single instance of the script runs over the entire query data. per_node means that if the query before the Python block is distributed, an instance of the script runs on each node, over the data that it contains.
- hint.remote (string): This hint is only relevant for cross-cluster queries. The default value is auto. auto means the server automatically decides in which cluster the Python code is executed. Setting the value to local forces executing the Python code on the local cluster. Use it if the Python plugin is disabled on the remote cluster.
- external_artifacts (dynamic): A property bag of name and URL pairs for artifacts that are accessible from cloud storage. For more information, see Using external artifacts.
- spill_to_disk (bool): Specifies an alternative method for serializing the input table to the Python sandbox. When serializing big tables, set it to true to speed up serialization and significantly reduce sandbox memory consumption. Default is true.

Reserved Python variables

The following variables are reserved for interaction between Kusto Query Language and the Python code:

- df: The input tabular data (the values of T above), as a pandas DataFrame.
- kargs: The value of the script_parameters argument, as a Python dictionary.
- result: A pandas DataFrame created by the Python script, whose value becomes the tabular data that gets sent to the Kusto query operator that follows the plugin.
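
As a minimal sketch of how these three variables fit together, the following query doubles a column. The column name doubled and the parameter factor are illustrative, and the query assumes only that the plugin is enabled:

range x from 1 to 5 step 1
| evaluate python(
    typeof(x:long, doubled:long),   //  output schema: a fresh table rather than an extension of the input
    ```
result = df                                    # df: the input table as a pandas DataFrame
result["doubled"] = df["x"] * kargs["factor"]  # kargs: the script_parameters bag as a dict
# result: this DataFrame becomes the plugin's tabular output
    ```,
    bag_pack('factor', 2))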

Enable the plugin

The plugin is disabled by default. Before you start, review the list of prerequisites. To enable the plugin and select the version of the Python image, see Enable language extensions on your cluster.

Python sandbox image

To change the version of the Python image to a different managed image or a custom image, see Change the Python language extensions image on your cluster.

To see the list of packages for the different Python images, see Python package reference.

Use ingestion from query and update policy

Use the plugin in queries that are:

- Defined as part of an update policy, whose source table is ingested using non-streaming ingestion.
- Run as part of a command that ingests from a query, such as .set-or-append.

Example
range x from 1 to 360 step 1
| evaluate python(
//
typeof(*, fx:double),               //  Output schema: append a new fx column to original table 
```
result = df
n = df.shape[0]
g = kargs["gain"]
f = kargs["cycles"]
result["fx"] = g * np.sin(df["x"]/n*2*np.pi*f)
```
, bag_pack('gain', 100, 'cycles', 4)    //  dictionary of parameters
)
| render linechart 

Performance tips

Where the script can run independently on each partition of the data, use hint.distribution = per_node so that instances of the script run in parallel (see Parameters above).

Usage tips

To generate multi-line strings containing the Python script, enclose the script between a pair of ``` lines, as shown in the examples on this page.

Example: reading the Python script from external data
    let script = 
        externaldata(script:string)
        [h'https://kustoscriptsamples.blob.core.windows.net/samples/python/sample_script.py']
        with(format = raw);
    range x from 1 to 360 step 1
    | evaluate python(
        typeof(*, fx:double),
        toscalar(script), 
        bag_pack('gain', 100, 'cycles', 4))
    | render linechart 
Using external artifacts

External artifacts from cloud storage can be made available for the script and used at runtime.

The URLs referenced by the external artifacts property must be:

- Included in the cluster's callout policy.
- In a publicly available location, or accessible with the necessary credentials.

Note

When authenticating external artifacts using managed identities, the SandboxArtifacts usage must be defined in the cluster-level managed identity policy.

The artifacts are made available for the script to be read from a local temporary directory, .\Temp. The names provided in the property bag are used as the local file names. See Example.
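
As an illustration, a script can open an artifact under the name it was given in the property bag. The following sketch assumes a hypothetical artifact named config.json at a hypothetical storage URL; the Temp subdirectory follows the description above:

range x from 1 to 1 step 1
| evaluate python(
    typeof(*, content:string),
    ```
import os
result = df
# The artifact is saved under its bag_pack name in the local Temp directory
with open(os.path.join("Temp", "config.json")) as f:
    result["content"] = f.read()
    ```,
    external_artifacts=bag_pack('config.json', 'https://mystorage.blob.core.windows.net/artifacts/config.json'))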

For information regarding referencing external packages, see Install packages for the Python plugin.

Refreshing external artifact cache

External artifact files utilized in queries are cached on your cluster. If you make updates to your files in cloud storage and require immediate synchronization with your cluster, you can use the .clear cluster cache external-artifacts command. This command clears the cached files and ensures that subsequent queries run with the latest version of the artifacts.
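
For example, to refresh the cached copy of the sample script referenced earlier on this page (the command accepts a parenthesized list of artifact URIs):

.clear cluster cache external-artifacts ("https://kustoscriptsamples.blob.core.windows.net/samples/python/sample_script.py")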

Install packages for the Python plugin

In most use cases, you might prefer to create a custom image.

You might want to install packages yourself, for example, if you don't have permission to create a custom image or if the package is private.

Install packages as follows:

Prerequisites
  1. Create a blob container to host the packages, preferably in the same region as your cluster. For example, https://artifactswestus.blob.core.windows.net/python, assuming your cluster is in the West US region.

  2. Alter the cluster's callout policy to allow access to that location.

    .alter-merge cluster policy callout @'[ { "CalloutType": "sandbox_artifacts", "CalloutUriRegex": "artifactswestus\\.blob\\.core\\.windows\\.net/python/","CanCall": true } ]'
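
    You can then verify that the policy contains the new entry:

    .show cluster policy callout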
    
Install packages
  1. For public packages in PyPI or other channels, download the package and its dependencies:

    pip wheel [-w download-dir] package-name

    For example, pip wheel -w ./wheels Faker downloads wheel files for the Faker package and its dependencies.
    
  2. Create a zip file containing the required package and its dependencies.

  3. Upload the zip file to a blob in the artifacts location (from step 1 of the prerequisites).

  4. Call the python plugin, specifying the name of the zip file in the external_artifacts parameter, and install it from the script by calling Zipackage.install(), as in the following example.

Example using external artifacts

Install the Faker package that generates fake data.

range ID from 1 to 3 step 1 
| extend Name=''
| evaluate python(typeof(*), ```if 1:
    from sandbox_utils import Zipackage
    Zipackage.install("Faker.zip")
    from faker import Faker
    fake = Faker()
    result = df
    for i in range(df.shape[0]):
        result.loc[i, "Name"] = fake.name()
    ```,
    external_artifacts=bag_pack('Faker.zip', 'https://artifacts.blob.core.windows.net/Faker.zip;impersonate'))
ID    Name
1     Gary Tapia
2     Emma Evans
3     Ashley Bowen

For more examples of UDF functions that use the Python plugin, see the Functions library.

Microsoft Fabric

The Python plugin runs a user-defined function (UDF) using a Python script. The Python script gets tabular data as its input and produces tabular output.

Syntax

T | evaluate [hint.distribution = (single | per_node)] [hint.remote = (auto | local)] python(output_schema, script [, script_parameters] [, external_artifacts] [, spill_to_disk])

Learn more about syntax conventions.

Parameters

- output_schema (string, required): A type literal that defines the output schema of the tabular data returned by the Python code. The format is typeof(ColumnName: ColumnType[, ...]). For example, typeof(col1:string, col2:long). To extend the input schema, use the following syntax: typeof(*, col1:string, col2:long).
- script (string, required): The valid Python script to execute. To generate multi-line strings, see Usage tips.
- script_parameters (dynamic): A property bag of name-value pairs to be passed to the Python script as the reserved kargs dictionary. For more information, see Reserved Python variables.
- hint.distribution (string): A hint for the plugin's execution to be distributed across multiple sandboxes. The default value is single. single means a single instance of the script runs over the entire query data in a single sandbox. per_node means that if the query before the Python block is distributed to partitions, each partition runs in its own sandbox in parallel.
- hint.remote (string): This hint is only relevant for cross-cluster queries. The default value is auto. auto means the server automatically decides in which cluster the Python code is executed. Setting the value to local forces executing the Python code on the local cluster. Use it if the Python plugin is disabled on the remote cluster.
- external_artifacts (dynamic): A property bag of name and URL pairs for artifacts that are accessible from OneLake storage. For more information, see Using external artifacts.
- spill_to_disk (bool): Specifies an alternative method for serializing the input table to the Python sandbox. When serializing big tables, set it to true to speed up serialization and significantly reduce sandbox memory consumption. Default is true.

Reserved Python variables

The following variables are reserved for interaction between Kusto Query Language and the Python code:

- df: The input tabular data (the values of T above), as a pandas DataFrame.
- kargs: The value of the script_parameters argument, as a Python dictionary.
- result: A pandas DataFrame created by the Python script, whose value becomes the tabular data that gets sent to the Kusto query operator that follows the plugin.

Enable the plugin

The plugin is disabled by default. Before you start, enable the Python plugin in your KQL database.

Python sandbox image

To see the list of packages for the different Python images, see Python package reference.

Use ingestion from query and update policy

Use the plugin in queries that are:

- Defined as part of an update policy, whose source table is ingested using non-streaming ingestion.
- Run as part of a command that ingests from a query, such as .set-or-append.

Example
range x from 1 to 360 step 1
| evaluate python(
//
typeof(*, fx:double),               //  Output schema: append a new fx column to original table 
```
result = df
n = df.shape[0]
g = kargs["gain"]
f = kargs["cycles"]
result["fx"] = g * np.sin(df["x"]/n*2*np.pi*f)
```
, bag_pack('gain', 100, 'cycles', 4)    //  dictionary of parameters
)
| render linechart 

Performance tips

Where the script can run independently on each partition of the data, use hint.distribution = per_node so that instances of the script run in parallel (see Parameters above).

Usage tips

To generate multi-line strings containing the Python script, enclose the script between a pair of ``` lines, as shown in the examples on this page.

Example: reading the Python script from external data
    let script = 
        externaldata(script:string)
        [h'https://kustoscriptsamples.blob.core.windows.net/samples/python/sample_script.py']
        with(format = raw);
    range x from 1 to 360 step 1
    | evaluate python(
        typeof(*, fx:double),
        toscalar(script), 
        bag_pack('gain', 100, 'cycles', 4))
    | render linechart 
Using external artifacts

External artifacts from OneLake storage can be made available for the script and used at runtime.

The artifacts are made available for the script to be read from a local temporary directory, .\Temp. The names provided in the property bag are used as the local file names. See Example.

For information regarding referencing external packages, see Install packages for the Python plugin.

Refreshing external artifact cache

External artifact files utilized in queries are cached on your cluster. If you make updates to your files in cloud storage and require immediate synchronization with your cluster, you can use the .clear cluster cache external-artifacts command. This command clears the cached files and ensures that subsequent queries run with the latest version of the artifacts.

Install packages for the Python plugin

Install packages as follows:

Prerequisites

Install packages
  1. For public packages in PyPI or other channels, download the package and its dependencies:

    pip wheel [-w download-dir] package-name
    
  2. Create a zip file containing the required package and its dependencies.

  3. Upload the zip file to the lakehouse.

  4. Copy the OneLake URL (from the zipped file's properties).

  5. Call the python plugin, specifying the name of the zip file in the external_artifacts parameter, and install it from the script by calling Zipackage.install(), as in the following example.

Example using external artifacts

Install the Faker package that generates fake data.

range ID from 1 to 3 step 1 
| extend Name=''
| evaluate python(typeof(*), ```if 1:
    from sandbox_utils import Zipackage
    Zipackage.install("Faker.zip")
    from faker import Faker
    fake = Faker()
    result = df
    for i in range(df.shape[0]):
        result.loc[i, "Name"] = fake.name()
    ```,
    external_artifacts=bag_pack('Faker.zip', 'https://msit-onelake.dfs.fabric.microsoft.com/MSIT_DEMO_WS/MSIT_DEMO_LH.Lakehouse/Files/Faker.zip;impersonate'))
ID    Name
1     Gary Tapia
2     Emma Evans
3     Ashley Bowen

For more examples of UDF functions that use the Python plugin, see the Functions library.

