A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://github.com/Azure-Samples/functions-e2e-blob-pdf-to-text below:

Azure-Samples/functions-e2e-blob-pdf-to-text: This end-to-end JavaScript sample showcases an event-based Blob storage triggered function that converts PDF documents to text at scale.

Flex Consumption plan - PDF to text processor | Azure Functions

Processing binary files from Azure Blob Storage is a key scenario for Azure Functions. This end-to-end JavaScript sample showcases an event-based Blob storage triggered function that converts PDF documents to text at scale. It also uses managed identity and a virtual network between the function app and storage account for security best practices.

This solution creates two containers in blob storage, unprocessed-pdf and processed-text. An Event Grid-based Blob storage triggered function written in JavaScript is executed when a PDF file is added to the unprocessed-pdf container, converts the PDF to text using the PDF.js library, and saves the text to the processed-text container.

Using an Event Grid-based Blob storage trigger reduces latency by triggering your function instantly as changes occur in the container. This type of Blob storage trigger is the only type of Blob storage trigger that can be used when running in a Flex Consumption plan.

The communication between the function and the storage account happens via a system assigned managed identity, and the storage account is restricted behind a virtual network. The Azure Function uses VNet integration to reach the storage account. You can opt out of a VNet being used in the sample by setting SKIP_VNET to true in the parameters.

Important

This sample creates several resources. Make sure to delete the resource group after testing to minimize charges!

Before you can run this sample, you must have the following:

Provision the solution on Azure

To set up this sample, follow these steps:

  1. Clone this repository to your local machine.
  2. in the root folder use the Azure Developer CLI (azd) to provision a new resource group with all the resources for the sample using the the environment name you and in the location you provide.

Alternatively, you can opt-out of a VNet being used in the sample. To do so, use azd env to configure SKIP_VNET to true before running azd up:

azd env set SKIP_VNET true
azd up
Inspect the solution (optional)
  1. Once the deployment is done, inspect the new resource group. The Flex Consumption function app and plan, storage, and App Insights have been created and configured:
  2. By default the storage account will be locked to the virtual network. To be able to access the containers using the Azure Portal, you can add your client IP Address to the storage account Firewall. In the Azure Portal open the storage account created by the sample, go to Networking in the Security + networking section, and add your client IP address to the Firewall. After a minute you will be able to browse to the data storage containers. This step is not required if you turned off the VNet creation.
  3. The storage account has two extra containers in blob storage:
  4. Open the processed-text and unprocessed-pdf containers, which are empty.
  1. Using the Azure Portal or any other tool, upload PDF files to the unprocessed-pdf container. There are sample PDF files in the local data folder. For example, once all files in data folder are uploaded to the unprocessed-pdf container you should see:
  2. Browse to the processed-text folder and notice that within seconds all the uploaded PDF files have now been processed into text files by the Flex Consumption hosted function:

When you no longer need the resources created in this sample, run the following command to delete the Azure resources:

For more information on Azure Functions, Event Hubs, and VNet integration, see the following resources:


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4