Analyze multimodal data in Python with BigQuery DataFrames (Preview)
This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.
Note: To provide feedback or request support for this feature, send an email to bq-objectref-feedback@google.com.

This tutorial shows you how to analyze multimodal data in a Python notebook by using BigQuery DataFrames classes and methods.
This tutorial uses the product catalog from the public Cymbal pet store dataset.
To upload a notebook already populated with the tasks covered in this tutorial, see BigFrames Multimodal DataFrame.
Objectives

In this tutorial, you:

- Create a multimodal DataFrame that integrates structured and unstructured data.
- Combine text and image data, and filter images based on structured columns.
- Transform images and write the transformed objects to Cloud Storage.
- Generate text and embeddings from multimodal data.
- Chunk PDF objects for further analysis.

Costs

In this document, you use the following billable components of Google Cloud: BigQuery, Cloud Storage, and Vertex AI.

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

For more information, see the pricing pages for BigQuery, Cloud Storage, and Vertex AI.
Before you begin

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

Verify that billing is enabled for your Google Cloud project.
Enable the BigQuery, BigQuery Connection, Cloud Storage, and Vertex AI APIs.
To get the permissions that you need to complete this tutorial, ask your administrator to grant you the following IAM roles:
- roles/bigquery.connectionAdmin
- roles/resourcemanager.projectIamAdmin
- roles/storage.admin
- roles/bigquery.user
- roles/bigquery.dataEditor
- roles/bigquery.objectRefAdmin
- roles/bigquery.readSessionUser
- roles/aiplatform.notebookRuntimeUser
- roles/dataform.codeCreator

For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Set up

In this section, you create the Cloud Storage bucket, connection, and notebook used in this tutorial.
Create a bucket

Create a Cloud Storage bucket for storing transformed objects:
In the Google Cloud console, go to the Buckets page.
Click Create.
On the Create a bucket page, in the Get started section, enter a globally unique name that meets the bucket name requirements.
Click Create.
Create a connection

Create a Cloud resource connection and get the connection's service account. BigQuery uses the connection to access objects in Cloud Storage.
Go to the BigQuery page.
In the Explorer pane, click Add data.
The Add data dialog opens.
In the Filter By pane, in the Data Source Type section, select Business Applications.
Alternatively, in the Search for data sources field, you can enter Vertex AI.
In the Featured data sources section, click Vertex AI.
Click the Vertex AI Models: BigQuery Federation solution card.
In the Connection type list, select Vertex AI remote models, remote functions and BigLake (Cloud Resource).
In the Connection ID field, type bigframes-default-connection.
Click Create connection.
Click Go to connection.
In the Connection info pane, copy the service account ID for use in a later step.
Grant the connection's service account the roles that it needs to access Cloud Storage and Vertex AI. You must grant these roles in the same project you created or selected in the Before you begin section.
To grant the roles, follow these steps:
Go to the IAM & Admin page.
Click Grant access.
In the New principals field, enter the service account ID that you copied earlier.
In the Select a role field, choose Cloud Storage, and then select Storage Object User.
Click Add another role.
In the Select a role field, select Vertex AI, and then select Vertex AI User.
Click Save.
Create a notebook

Create a notebook where you can run Python code:
Go to the BigQuery page.
In the tab bar of the editor pane, click the drop-down arrow next to SQL query, and then click Notebook.
In the Start with a template pane, click Close.
Click Connect > Connect to a runtime.
If you have an existing runtime, accept the default settings and click Connect. If you don't have an existing runtime, select Create new Runtime, and then click Connect.
It might take several minutes for the runtime to get set up.
Create a multimodal DataFrame that integrates structured and unstructured data by using the from_glob_path method of the Session class:
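In the notebook, create a code cell and copy the following code into it. The code cell from the original tutorial is not shown here, so this is a sketch based on the BigQuery DataFrames multimodal API; the Cloud Storage path points at the public Cymbal pet store sample data, and the column name image is an assumption.

```python
import bigframes.pandas as bpd

# Create a blob column from a wildcard path of image objects in Cloud Storage.
# The cloud-samples-data path below is assumed to hold the Cymbal pet store
# product images; replace it with your own bucket if needed.
df_image = bpd.from_glob_path(
    "gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/images/*",
    name="image",
)

# Work with a small sample, then preview the multimodal DataFrame.
df_image = df_image.head(5)
df_image
```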
Click Run.
The final call to df_image returns the images that have been added to the DataFrame. Alternatively, you could call the .display method.
Combine text and image data in the multimodal DataFrame:
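In the notebook, create a code cell and copy the following code into it. This is a sketch of the missing code cell: the author values are illustrative, and the blob accessor methods shown (content_type, size, updated) are assumed from the BigQuery DataFrames blob API.

```python
# Add a structured column alongside the unstructured image column.
# The author values are illustrative sample data.
df_image["author"] = ["alice", "bob", "bob", "alice", "bob"]

# Derive structured metadata columns from the blob column.
df_image["content_type"] = df_image["image"].blob.content_type()
df_image["size"] = df_image["image"].blob.size()
df_image["updated"] = df_image["image"].blob.updated()
df_image
```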
Click Run.
The code returns the DataFrame data.
In the notebook, create a code cell and copy the following code into it:
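The code cell itself is missing from this copy of the tutorial; the following sketch filters on the structured author column and displays the matching images, assuming the df_image DataFrame and author column created earlier.

```python
# Filter rows on a structured column, then display the matching images.
# You can also display audio and video blob types the same way.
df_image[df_image["author"] == "alice"]["image"].blob.display()
```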
Click Run.
The code returns images from the DataFrame where the author column value is alice.
Transform image data by using the image_blur, image_resize, and image_normalize methods of the Series.BlobAccessor class:
The transformed images are written to Cloud Storage.
Transform images:
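In the notebook, create a code cell and copy the following code into it. This is a sketch based on the BlobAccessor image transformation methods; the parameter values and destination folder names are illustrative, and {dst_bucket} is the placeholder described below.

```python
dst_bucket = "gs://mybucket"  # replace with the bucket you created

# Blur, resize, and normalize the images, writing results to Cloud Storage.
df_image["blurred"] = df_image["image"].blob.image_blur(
    (20, 20), dst=f"{dst_bucket}/image_blur_transformed/", engine="opencv"
)
df_image["resized"] = df_image["image"].blob.image_resize(
    (300, 200), dst=f"{dst_bucket}/image_resize_transformed/", engine="opencv"
)
df_image["normalized"] = df_image["image"].blob.image_normalize(
    alpha=50.0,
    beta=150.0,
    norm_type="minmax",
    dst=f"{dst_bucket}/image_normalize_transformed/",
    engine="opencv",
)

# Transformations can also be chained.
df_image["blur_resized"] = df_image["blurred"].blob.image_resize(
    (300, 200), dst=f"{dst_bucket}/image_blur_resize_transformed/", engine="opencv"
)
df_image
```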
Update {dst_bucket} to refer to the bucket that you created, in the format gs://mybucket.

Click Run.
The code returns the original images as well as all of their transformations.
Generate text from multimodal data by using the predict method of the GeminiTextGenerator class:
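In the notebook, create a code cell and copy the following code into it. This sketch assumes the bigframes.ml.llm module; the model name and result column name are assumptions, so use any Gemini model available to your project.

```python
from bigframes.ml import llm

# The model name is an assumption; pick a Gemini model available to you.
gemini = llm.GeminiTextGenerator(model_name="gemini-2.0-flash-001")

# Ask the same question about the first two images.
df_sample = df_image.head(2)
answer = gemini.predict(df_sample, prompt=["what item is it?", df_sample["image"]])
answer[["ml_generate_text_llm_result", "image"]]
```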
Click Run.
The code returns the first two images in df_image, along with text generated in response to the question "what item is it?" for both images.
In the notebook, create a code cell and copy the following code into it:
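The code cell itself is missing from this copy of the tutorial; the following sketch asks a different question about each image by passing a prompt column, assuming the gemini model object created in the previous step.

```python
# Ask a different question about each of the first two images.
df_sample = df_image.head(2)
df_sample["question"] = ["what item is it?", "what color is the picture?"]
answer_alt = gemini.predict(
    df_sample, prompt=[df_sample["question"], df_sample["image"]]
)
answer_alt[["ml_generate_text_llm_result", "image"]]
```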
Click Run.
The code returns the first two images in df_image, with text generated in response to the question "what item is it?" for the first image, and text generated in response to the question "what color is the picture?" for the second image.
Generate embeddings for multimodal data by using the predict method of the MultimodalEmbeddingGenerator class:
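In the notebook, create a code cell and copy the following code into it. This is a sketch assuming the MultimodalEmbeddingGenerator class in bigframes.ml.llm with its default embedding model.

```python
from bigframes.ml import llm

# Generate embeddings for the images with a multimodal embedding model.
embed_model = llm.MultimodalEmbeddingGenerator()
embeddings = embed_model.predict(df_image["image"])
embeddings
```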
Click Run.
The code returns the embeddings generated by a call to an embedding model.
Chunk PDF objects by using the pdf_chunk method of the Series.BlobAccessor class:
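In the notebook, create a code cell and copy the following code into it. This sketch loads sample PDF objects and splits them into text chunks; the documents path in the public sample bucket and the pypdf engine name are assumptions.

```python
import bigframes.pandas as bpd

# Load PDF objects from the public sample bucket (path is assumed).
df_pdf = bpd.from_glob_path(
    "gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/documents/*",
    name="pdf",
)

# Split each PDF into text chunks, then flatten the chunk arrays into rows.
df_pdf["chunked"] = df_pdf["pdf"].blob.pdf_chunk(engine="pypdf")
df_pdf["chunked"].explode()
```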
Click Run.
The code returns the chunked PDF data.
Clean up

If you want to keep the project you used for this tutorial, delete the selected resources inside the project instead of deleting the whole project. If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.