This article shows you how to use Python to create and manage directories and files in storage accounts that have a hierarchical namespace.
To learn about how to get, set, and update the access control lists (ACL) of directories and files, see Use Python to manage ACLs in Azure Data Lake Storage.
Package (PyPI) | Samples | API reference | Gen1 to Gen2 mapping | Give Feedback
Prerequisites
An Azure subscription. See Get Azure free trial.
A storage account that has hierarchical namespace enabled. Follow these instructions to create one.
Set up your project
This section walks you through preparing a project to work with the Azure Data Lake Storage client library for Python.
From your project directory, install packages for the Azure Data Lake Storage and Azure Identity client libraries using the pip install command. The azure-identity package is needed for passwordless connections to Azure services.
pip install azure-storage-file-datalake azure-identity
Then open your code file and add the necessary import statements. In this example, we add the following to our .py file:
import os
from azure.storage.filedatalake import (
DataLakeServiceClient,
DataLakeDirectoryClient,
FileSystemClient
)
from azure.identity import DefaultAzureCredential
Note
Multi-protocol access on Data Lake Storage enables applications to use both Blob APIs and Data Lake Storage Gen2 APIs to work with data in storage accounts with hierarchical namespace (HNS) enabled. When working with capabilities unique to Data Lake Storage Gen2, such as directory operations and ACLs, use the Data Lake Storage Gen2 APIs, as shown in this article.
When choosing which APIs to use in a given scenario, consider the workload and the needs of your application, along with the known issues and impact of HNS on workloads and applications.
Connect to the account
To work with the code examples in this article, you need to create an authorized DataLakeServiceClient instance that represents the storage account. You can authorize a DataLakeServiceClient object using Microsoft Entra ID, an account access key, or a shared access signature (SAS).
You can use the Azure identity client library for Python to authenticate your application with Microsoft Entra ID.
Create an instance of the DataLakeServiceClient class and pass in a DefaultAzureCredential object.
def get_service_client_token_credential(self, account_name: str) -> DataLakeServiceClient:
    account_url = f"https://{account_name}.dfs.core.windows.net"
    token_credential = DefaultAzureCredential()

    service_client = DataLakeServiceClient(account_url, credential=token_credential)

    return service_client
To learn more about using DefaultAzureCredential to authorize access to data, see Overview: Authenticate Python apps to Azure using the Azure SDK.
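As a quick way to confirm the credential works, you can create the client and list the account's file systems. This is a minimal sketch, not part of the library samples; the account name is a placeholder, and it assumes your signed-in identity has a data-access role such as Storage Blob Data Contributor on the account:
# Minimal connectivity check; "mystorageaccount" is a placeholder account name
account_url = "https://mystorageaccount.dfs.core.windows.net"
service_client = DataLakeServiceClient(account_url, credential=DefaultAzureCredential())

# Listing file systems confirms that the credential can reach the account
for file_system in service_client.list_file_systems():
    print(file_system.name)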
To use a shared access signature (SAS) token, provide the token as a string and initialize a DataLakeServiceClient object. If your account URL includes the SAS token, omit the credential parameter.
def get_service_client_sas(self, account_name: str, sas_token: str) -> DataLakeServiceClient:
    account_url = f"https://{account_name}.dfs.core.windows.net"

    # The SAS token string can be passed in as credential param or appended to the account URL
    service_client = DataLakeServiceClient(account_url, credential=sas_token)

    return service_client
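As the comment above notes, the SAS token can instead be appended to the account URL, in which case the credential parameter is omitted. A minimal sketch of that variant (the helper name here is ours, not from the library):
# Sketch: carry the SAS token in the URL query string instead of the credential param
def get_service_client_sas_url(self, account_name: str, sas_token: str) -> DataLakeServiceClient:
    account_url = f"https://{account_name}.dfs.core.windows.net"

    # No credential argument is needed because the token is part of the URL
    service_client = DataLakeServiceClient(f"{account_url}?{sas_token}")

    return service_client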
To learn more about generating and managing SAS tokens, see Grant limited access to Azure Storage resources using shared access signatures (SAS).
You can authorize access to data using your account access keys (Shared Key). The following code example creates a DataLakeServiceClient instance that is authorized with the account key:
def get_service_client_account_key(self, account_name: str, account_key: str) -> DataLakeServiceClient:
    account_url = f"https://{account_name}.dfs.core.windows.net"
    service_client = DataLakeServiceClient(account_url, credential=account_key)

    return service_client
Caution
Authorization with Shared Key is not recommended as it may be less secure. For optimal security, disable authorization via Shared Key for your storage account, as described in Prevent Shared Key authorization for an Azure Storage account.
Use of access keys and connection strings should be limited to initial proof of concept apps or development prototypes that don't access production or sensitive data. Otherwise, the token-based authentication classes available in the Azure SDK should always be preferred when authenticating to Azure resources.
Microsoft recommends that clients use either Microsoft Entra ID or a shared access signature (SAS) to authorize access to data in Azure Storage. For more information, see Authorize operations for data access.
Create a container
A container acts as a file system for your files. You can create a container by using the DataLakeServiceClient.create_file_system method.
The following code example creates a container and returns a FileSystemClient object for later use:
def create_file_system(self, service_client: DataLakeServiceClient, file_system_name: str) -> FileSystemClient:
    file_system_client = service_client.create_file_system(file_system=file_system_name)

    return file_system_client
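If a container with that name already exists, create_file_system raises an error. One hedged way to handle this, using the ResourceExistsError exception from azure-core, is to fall back to a client for the existing container (the helper name is ours, not from the library):
from azure.core.exceptions import ResourceExistsError

# Sketch: create the container, or get a client for it if it already exists
def create_file_system_if_not_exists(self, service_client: DataLakeServiceClient, file_system_name: str) -> FileSystemClient:
    try:
        file_system_client = service_client.create_file_system(file_system=file_system_name)
    except ResourceExistsError:
        # The container already exists, so just get a client that points to it
        file_system_client = service_client.get_file_system_client(file_system=file_system_name)

    return file_system_client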
Create a directory
You can create a directory reference in the container by using the FileSystemClient.create_directory method.
The following code example adds a directory to a container and returns a DataLakeDirectoryClient object for later use:
def create_directory(self, file_system_client: FileSystemClient, directory_name: str) -> DataLakeDirectoryClient:
    directory_client = file_system_client.create_directory(directory_name)

    return directory_client
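If the directory already exists, you don't need to create it again; you can get a DataLakeDirectoryClient for it directly. A minimal sketch (the helper name is ours):
# Sketch: get a client for a directory that already exists in the container
def get_existing_directory(self, file_system_client: FileSystemClient, directory_name: str) -> DataLakeDirectoryClient:
    directory_client = file_system_client.get_directory_client(directory_name)

    return directory_client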
Rename or move a directory
You can rename or move a directory by using the DataLakeDirectoryClient.rename_directory method.
Pass the path with the new directory name in the new_name argument. The value must have the following format: {filesystem}/{directory}/{subdirectory}.
The following code example shows how to rename a subdirectory:
def rename_directory(self, directory_client: DataLakeDirectoryClient, new_dir_name: str):
    directory_client.rename_directory(
        new_name=f"{directory_client.file_system_name}/{new_dir_name}")
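Because new_name takes a full path, the same method can also move a directory under a different parent. A hedged sketch of that variant (the helper name and parameters are ours, not from the library):
# Sketch: move a directory under a different parent by passing a full path
def move_directory(self, directory_client: DataLakeDirectoryClient, new_parent_dir: str, new_dir_name: str):
    directory_client.rename_directory(
        new_name=f"{directory_client.file_system_name}/{new_parent_dir}/{new_dir_name}")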
Upload a file to a directory
You can upload content to a new or existing file by using the DataLakeFileClient.upload_data method.
The following code example shows how to upload a file to a directory using the upload_data method:
def upload_file_to_directory(self, directory_client: DataLakeDirectoryClient, local_path: str, file_name: str):
    file_client = directory_client.get_file_client(file_name)

    with open(file=os.path.join(local_path, file_name), mode="rb") as data:
        file_client.upload_data(data, overwrite=True)
You can use this method to create and upload content to a new file, or you can set the overwrite argument to True to overwrite an existing file.
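The data argument doesn't have to be a file stream; upload_data also accepts bytes or a string. A minimal sketch that writes in-memory data to a file (the helper name is ours, not from the library):
# Sketch: upload in-memory bytes instead of reading from a local file
def upload_bytes_to_directory(self, directory_client: DataLakeDirectoryClient, file_name: str, data: bytes):
    file_client = directory_client.get_file_client(file_name)

    # overwrite=True creates the file, or replaces its contents if it already exists
    file_client.upload_data(data, overwrite=True)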
You can upload data to be appended to a file by using the DataLakeFileClient.append_data method.
The following code example shows how to append data to the end of a file using these steps:
Create a DataLakeFileClient object to represent the file resource you're working with.
Upload data to the file using the append_data method.
Complete the upload by calling the flush_data method.
def append_data_to_file(self, directory_client: DataLakeDirectoryClient, file_name: str):
    file_client = directory_client.get_file_client(file_name)
    file_size = file_client.get_file_properties().size

    data = b"Data to append to end of file"
    file_client.append_data(data, offset=file_size, length=len(data))

    file_client.flush_data(file_size + len(data))
With this method, data can only be appended to a file and the operation is limited to 4000 MiB per request.
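Because each append_data call is limited to 4000 MiB, larger payloads need to be appended across multiple requests before a single flush_data call commits the new total length. A hedged sketch of that pattern (the helper name is ours, and the chunk size is an assumption based on the stated limit):
# Sketch: append a large payload in chunks, then commit with one flush_data call
def append_large_data_to_file(self, directory_client: DataLakeDirectoryClient, file_name: str, data: bytes):
    chunk_size = 4000 * 1024 * 1024  # stay within the 4000 MiB per-request limit

    file_client = directory_client.get_file_client(file_name)
    offset = file_client.get_file_properties().size

    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        file_client.append_data(chunk, offset=offset, length=len(chunk))
        offset += len(chunk)

    # flush_data commits everything appended so far at the new total length
    file_client.flush_data(offset)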
Download from a directory
The following code example shows how to download a file from a directory to a local file using these steps:
Create a DataLakeFileClient object to represent the file you want to download.
Open a local file for writing.
Call the DataLakeFileClient.download_file method to read the file, then write the data to the local file.
def download_file_from_directory(self, directory_client: DataLakeDirectoryClient, local_path: str, file_name: str):
    file_client = directory_client.get_file_client(file_name)

    with open(file=os.path.join(local_path, file_name), mode="wb") as local_file:
        download = file_client.download_file()
        local_file.write(download.readall())
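readall loads the entire file into memory. For large files, one way to consume the StorageStreamDownloader returned by download_file incrementally is its chunks() iterator; a minimal sketch (the helper name is ours, not from the library):
# Sketch: stream a large file to disk chunk by chunk instead of buffering it all in memory
def download_large_file_from_directory(self, directory_client: DataLakeDirectoryClient, local_path: str, file_name: str):
    file_client = directory_client.get_file_client(file_name)

    with open(file=os.path.join(local_path, file_name), mode="wb") as local_file:
        download = file_client.download_file()
        for chunk in download.chunks():
            local_file.write(chunk)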
List directory contents
You can list directory contents by using the FileSystemClient.get_paths method and enumerating the result.
Enumerating the paths in the result may make multiple requests to the service while fetching the values.
The following code example prints the path of each subdirectory and file that is located in a directory:
def list_directory_contents(self, file_system_client: FileSystemClient, directory_name: str):
    paths = file_system_client.get_paths(path=directory_name)

    for path in paths:
        print(path.name + '\n')
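Each item returned by get_paths is a PathProperties object, so you can also inspect attributes such as is_directory to distinguish subdirectories from files. A minimal sketch (the helper name is ours):
# Sketch: list only the subdirectories within a directory
def list_subdirectories(self, file_system_client: FileSystemClient, directory_name: str):
    paths = file_system_client.get_paths(path=directory_name)

    for path in paths:
        if path.is_directory:
            print(path.name)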
Delete a directory
You can delete a directory by using the DataLakeDirectoryClient.delete_directory method.
The following code example shows how to delete a directory:
def delete_directory(self, directory_client: DataLakeDirectoryClient):
    directory_client.delete_directory()
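Deleting a single file works the same way through its client, using the DataLakeFileClient.delete_file method. A minimal sketch (the helper name is ours):
# Sketch: delete a single file within a directory
def delete_file_from_directory(self, directory_client: DataLakeDirectoryClient, file_name: str):
    file_client = directory_client.get_file_client(file_name)
    file_client.delete_file()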
See also
Use Python to manage ACLs in Azure Data Lake Storage