Processing binary files from Azure Blob Storage is a key scenario for Azure Functions. This end-to-end JavaScript sample showcases an event-based Blob storage triggered function that converts PDF documents to text at scale. It also uses managed identity and a virtual network between the function app and storage account for security best practices.
This solution creates two containers in Blob storage, unprocessed-pdf and processed-text. An Event Grid-based Blob storage triggered function written in JavaScript runs when a PDF file is added to the unprocessed-pdf container, converts the PDF to text using the PDF.js library, and saves the text to the processed-text container.
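The conversion step described above can be sketched with a few pure helpers. This is a minimal illustration, not the sample's actual code: it assumes page text arrives as items shaped like the output of PDF.js `getTextContent()` (objects with `str` and `hasEOL`), and the names `pageTextFromItems`, `documentText`, and `outputBlobName`, along with the `.pdf` to `.txt` naming convention, are hypothetical.

```javascript
// Sketch only: pure helpers with no Azure or PDF.js dependency.

// Joins the text items of one page into a string.
// Assumes items shaped like PDF.js getTextContent() output: { str, hasEOL }.
function pageTextFromItems(items) {
  let text = "";
  for (const item of items) {
    // Entries without a string payload (for example marked-content markers) are skipped.
    if (typeof item.str !== "string") continue;
    text += item.str;
    if (item.hasEOL) text += "\n";
  }
  return text;
}

// Combines per-page item arrays into one document, with pages separated by a blank line.
function documentText(pages) {
  return pages
    .map(pageTextFromItems)
    .map((pageText) => pageText.trimEnd())
    .join("\n\n");
}

// Maps an input blob name to an output name (hypothetical convention: swap .pdf for .txt).
function outputBlobName(inputName) {
  return inputName.replace(/\.pdf$/i, "") + ".txt";
}
```

In a real function, the item arrays would come from loading the blob's bytes with PDF.js and reading each page's text content; the helpers above only cover the part that needs no external library.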
Using an Event Grid-based Blob storage trigger reduces latency by triggering your function instantly as changes occur in the container. This type of Blob storage trigger is the only type of Blob storage trigger that can be used when running in a Flex Consumption plan.
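To make the event-based flow more concrete, here is a hedged sketch of how a blob-created event identifies the affected blob. It assumes the Event Grid event schema, in which `eventType` is `Microsoft.Storage.BlobCreated` and `subject` has the form `/blobServices/default/containers/{container}/blobs/{blob}`. The helper names are hypothetical, and the blob trigger normally handles this mapping for you, so the code is for illustration only.

```javascript
// Sketch only: inspects the shape of a storage event; not part of the sample.

// Extracts the container and blob name from an Event Grid storage event subject.
// Returns null when the subject does not match the expected format.
function parseBlobSubject(subject) {
  const match = /^\/blobServices\/default\/containers\/([^/]+)\/blobs\/(.+)$/.exec(subject);
  if (!match) return null;
  return { container: match[1], blobName: match[2] };
}

// Returns true when the event reports a new PDF in the unprocessed-pdf container.
function isUnprocessedPdf(event) {
  if (event.eventType !== "Microsoft.Storage.BlobCreated") return false;
  const parsed = parseBlobSubject(event.subject || "");
  return (
    parsed !== null &&
    parsed.container === "unprocessed-pdf" &&
    /\.pdf$/i.test(parsed.blobName)
  );
}
```

Because the event is pushed to the function as soon as the blob is created, no container scan is needed to discover new files.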
The function communicates with the storage account through a system-assigned managed identity, and the storage account is restricted behind a virtual network. The function app uses VNet integration to reach the storage account. You can opt out of using a VNet in the sample by setting SKIP_VNET to true in the parameters.
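As a rough illustration of what an identity-based connection looks like, the sketch below builds the app settings that point a connection at service URIs instead of a connection string. It assumes the `<prefix>__blobServiceUri` and `<prefix>__queueServiceUri` setting names used by Azure Functions identity-based connections and the public Azure cloud endpoints. The function name, the connection prefix `PDF_STORAGE`, and the account name are hypothetical, and the sample's own infrastructure files may configure this differently.

```javascript
// Sketch only: builds app settings for an identity-based storage connection.
// Assumes public Azure cloud endpoints; prefix and account name are placeholders.
function identityConnectionSettings(prefix, accountName) {
  return {
    // No key or connection string: the managed identity authenticates against these URIs.
    [`${prefix}__blobServiceUri`]: `https://${accountName}.blob.core.windows.net`,
    [`${prefix}__queueServiceUri`]: `https://${accountName}.queue.core.windows.net`,
  };
}

// Example with hypothetical values.
const settings = identityConnectionSettings("PDF_STORAGE", "mystorageaccount");
```

The settings contain no secrets. Access is instead granted by assigning the function app's identity an appropriate role on the storage account.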
Important
This sample creates several resources. Make sure to delete the resource group after testing to minimize charges!
Before you can run this sample, you must have the following:
To set up this sample, follow these steps:
Alternatively, you can opt out of using a VNet in the sample. To do so, use azd env to set SKIP_VNET to true before running azd up:
    azd env set SKIP_VNET true
    azd up
Inspect the solution (optional)
In the storage account, select Networking in the Security + networking section, and add your client IP address to the firewall. After a minute you will be able to browse to the data storage containers. This step is not required if you turned off the VNet creation.
Browse to the storage containers. You should see the processed-text and unprocessed-pdf containers, which are empty.
Upload PDF files to the unprocessed-pdf container. There are sample PDF files in the local data folder. For example, you can upload all the files in the data folder.
Open the processed-text container and notice that within seconds all the uploaded PDF files have been processed into text files by the Flex Consumption hosted function.
When you no longer need the resources created in this sample, run the following command to delete the Azure resources:
For more information on Azure Functions, Event Grid, and VNet integration, see the following resources: