Showing content from https://python.langchain.com/docs/integrations/document_loaders/google_drive/ below:

Google Drive | 🦜️🔗 LangChain

Google Drive

Google Drive is a file storage and synchronization service developed by Google.

This notebook covers how to load documents from Google Drive. Currently, only Google Docs are supported.

Prerequisites
  1. Create a Google Cloud project or use an existing project
  2. Enable the Google Drive API
  3. Authorize credentials for desktop app
  4. pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib

🧑 Instructions for ingesting your Google Docs data

Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to an empty string ("").
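In a notebook or script, one way to do this is from Python itself, before the loader is constructed. This is a minimal sketch using only the standard library; setting the variable in your shell works equally well.

```python
import os

# Set the variable to an empty string for the current process only.
# This must run before GoogleDriveLoader is constructed.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""
```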

By default, GoogleDriveLoader expects the credentials.json file to be located at ~/.credentials/credentials.json, but this is configurable using the credentials_path keyword argument. The same applies to token.json: its default path is ~/.credentials/token.json, and it is configurable using the token_path constructor parameter.

The first time you use GoogleDriveLoader, a consent screen is displayed in your browser for user authentication. After authentication, token.json is created automatically at the provided path or the default path. If a token.json already exists at that path, you will not be prompted for authentication.
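The path handling described above can be sketched with pathlib. The helper needs_consent is hypothetical and only mirrors the rule stated in the text (no token file means a consent prompt is expected). It does not inspect whether an existing token is still valid.

```python
from pathlib import Path

# Default locations, as described in the text above.
DEFAULT_DIR = Path.home() / ".credentials"
credentials_path = DEFAULT_DIR / "credentials.json"
token_path = DEFAULT_DIR / "token.json"


def needs_consent(token_file: Path) -> bool:
    """Hypothetical helper: True when no token file exists yet."""
    return not token_file.exists()


# Keyword arguments that could be passed to the loader constructor.
loader_kwargs = {
    "credentials_path": credentials_path,
    "token_path": token_path,
}
# Usage (requires langchain-google-community and valid credentials):
# loader = GoogleDriveLoader(folder_id="1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5", **loader_kwargs)
```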

GoogleDriveLoader can load from a list of Google Docs document ids or a folder id. You can obtain your folder id and document id from the URL of the folder or document in your browser.
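As a rough illustration of reading an id off a URL, the sketch below uses a regular expression. The two URL shapes in the comments are assumptions about how Google Drive and Google Docs links usually look, and drive_id_from_url is a hypothetical helper, not part of the loader.

```python
import re

# Assumed URL shapes (not taken from this page):
#   https://drive.google.com/drive/u/0/folders/<folder_id>
#   https://docs.google.com/document/d/<document_id>/edit
_ID_PATTERN = re.compile(r"/(?:folders|d)/([A-Za-z0-9_-]+)")


def drive_id_from_url(url: str) -> str:
    """Return the id segment that follows /folders/ or /d/ in the URL."""
    match = _ID_PATTERN.search(url)
    if match is None:
        raise ValueError(f"No Google Drive id found in {url!r}")
    return match.group(1)


folder_id = drive_id_from_url(
    "https://drive.google.com/drive/u/0/folders/1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5"
)
```

The resulting folder_id is the same value used in the loader example on this page.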

%pip install --upgrade --quiet langchain-google-community[drive]

from langchain_google_community import GoogleDriveLoader

loader = GoogleDriveLoader(
    folder_id="1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5",
    token_path="/path/where/you/want/token/to/be/created/google_token.json",
    # Set recursive=True to also fetch files from subfolders.
    recursive=False,
)

When you pass a folder_id, all files of type document, sheet and pdf are loaded by default. You can modify this behaviour by passing a file_types argument:

loader = GoogleDriveLoader(
    folder_id="1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5",
    file_types=["document", "sheet"],
    recursive=False,
)

Passing in Optional File Loaders

When processing files other than Google Docs and Google Sheets, it can be helpful to pass an optional file loader to GoogleDriveLoader. If you pass in a file loader, that file loader will be used on documents that do not have a Google Docs or Google Sheets MIME type. Here is an example of how to load an Excel document from Google Drive using a file loader.

from langchain_community.document_loaders import UnstructuredFileIOLoader
from langchain_google_community import GoogleDriveLoader

file_id = "1x9WBtFPWMEAdjcJzPScRsjpjQvpSo_kz"
loader = GoogleDriveLoader(
    file_ids=[file_id],
    file_loader_cls=UnstructuredFileIOLoader,
    file_loader_kwargs={"mode": "elements"},
)

You can also process a folder with a mix of files and Google Docs/Sheets using the following pattern:

folder_id = "1asMOHY1BqBS84JcRbOag5LOJac74gpmD"
loader = GoogleDriveLoader(
    folder_id=folder_id,
    file_loader_cls=UnstructuredFileIOLoader,
    file_loader_kwargs={"mode": "elements"},
)

Extended usage

An external (unofficial) component can manage the complexity of Google Drive: langchain-googledrive. It's compatible with langchain_community.document_loaders.GoogleDriveLoader and can be used in its place.

To be compatible with containers, authentication uses the environment variable GOOGLE_ACCOUNT_FILE, which points to the credentials file (for a user or a service account).
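A minimal sketch of providing that variable from Python follows. The file location is a hypothetical placeholder. In a container the variable would more commonly be injected by the runtime, which is why setdefault is used here so an existing value is not overwritten.

```python
import os
from pathlib import Path

# Hypothetical location of the credentials file; adjust to your setup.
account_file = Path.home() / ".credentials" / "credentials.json"

# Keep any value already provided by the environment (for example by the
# container runtime); otherwise fall back to the path above.
os.environ.setdefault("GOOGLE_ACCOUNT_FILE", str(account_file))
```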

%pip install --upgrade --quiet langchain-googledrive

from langchain_googledrive.document_loaders import GoogleDriveLoader

loader = GoogleDriveLoader(
    folder_id=folder_id,
    recursive=False,
    num_results=2,
)

By default, files with any of the supported MIME types can be converted to Document.

It's possible to update or customize this. See the documentation of GDriveLoader.

However, the corresponding packages must be installed.

%pip install --upgrade --quiet unstructured

for doc in loader.load():
    print("---")
    print(doc.page_content.strip()[:60] + "...")

Loading auth Identities

Authorized identities for each file ingested by Google Drive Loader can be loaded along with metadata per Document.

from langchain_google_community import GoogleDriveLoader

loader = GoogleDriveLoader(
    folder_id=folder_id,
    load_auth=True,
)

doc = loader.load()

You can pass load_auth=True to add Google Drive document access identities to the metadata.

Loading extended metadata

Extra fields can also be fetched within the metadata of each Document:

from langchain_google_community import GoogleDriveLoader

loader = GoogleDriveLoader(
    folder_id=folder_id,
    load_extended_matadata=True,
)

doc = loader.load()

You can pass load_extended_matadata=True to add Google Drive document extended details to the metadata.

Customize the search pattern

All parameters compatible with the Google list() API can be set.

To specify a new pattern for the Google request, you can use a PromptTemplate(). The variables for the prompt can be set with kwargs in the constructor. Some pre-formatted requests are provided (use {query}, {folder_id} and/or {mime_type}).

You can customize the criteria used to select the files. A set of predefined filters is provided:

| Template | Description |
| --- | --- |
| gdrive-all-in-folder | Return all compatible files from a folder_id |
| gdrive-query | Search query in all drives |
| gdrive-by-name | Search file with name query |
| gdrive-query-in-folder | Search query in folder_id (and sub-folders if recursive=true) |
| gdrive-mime-type | Search a specific mime_type |
| gdrive-mime-type-in-folder | Search a specific mime_type in folder_id |
| gdrive-query-with-mime-type | Search query with a specific mime_type |
| gdrive-query-with-mime-type-and-folder | Search query with a specific mime_type and in folder_id |

loader = GoogleDriveLoader(
    folder_id=folder_id,
    recursive=False,
    template="gdrive-query",  # Default template to use
    query="machine learning",
    num_results=2,  # Maximum number of files to load
    supportsAllDrives=False,  # GDrive `list()` parameter
)
for doc in loader.load():
    print("---")
    print(doc.page_content.strip()[:60] + "...")

You can customize your pattern.

from langchain_core.prompts.prompt import PromptTemplate

loader = GoogleDriveLoader(
    folder_id=folder_id,
    recursive=False,
    template=PromptTemplate(
        input_variables=["query", "query_name"],
        template="fullText contains '{query}' and name contains '{query_name}' and trashed=false",
    ),
    query="machine learning",
    query_name="ML",
    num_results=2,
)
for doc in loader.load():
    print("---")
    print(doc.page_content.strip()[:60] + "...")
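To make the role of the template variables concrete, the sketch below fills the same pattern string with plain str.format. This stands in for PromptTemplate and is not what the loader runs internally; render_query is a hypothetical helper, and the loader may apply further processing before sending the request.

```python
# Same pattern string as in the PromptTemplate example above.
pattern = (
    "fullText contains '{query}' and name contains '{query_name}' "
    "and trashed=false"
)


def render_query(pattern: str, **variables: str) -> str:
    """Hypothetical helper: fill the named placeholders with str.format."""
    return pattern.format(**variables)


drive_query = render_query(pattern, query="machine learning", query_name="ML")
print(drive_query)
```

Each keyword passed to the constructor (here query and query_name) supplies one placeholder, so a missing keyword would leave the pattern impossible to render.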

Documents can be converted to Markdown format.

Set the attribute return_link to True to export links.

Modes for GSlide and GSheet

The parameter mode accepts different values:

The parameter gslide_mode accepts different values:

loader = GoogleDriveLoader(
    template="gdrive-mime-type",
    mime_type="application/vnd.google-apps.presentation",  # Only GSlide files
    gslide_mode="slide",
    num_results=2,  # Maximum number of files to load
)
for doc in loader.load():
    print("---")
    print(doc.page_content.strip()[:60] + "...")

The parameter gsheet_mode accepts different values:

loader = GoogleDriveLoader(
    template="gdrive-mime-type",
    mime_type="application/vnd.google-apps.spreadsheet",  # Only GSheet files
    gsheet_mode="elements",
    num_results=2,  # Maximum number of files to load
)
for doc in loader.load():
    print("---")
    print(doc.page_content.strip()[:60] + "...")

Advanced usage

All Google Drive files have a 'description' field in their metadata. This field can be used to store a summary of the document or other indexed tags (see the method lazy_update_description_with_summary()).

If you use mode="snippet", only the description is used for the body. Otherwise, the description is available in metadata['summary'].

Sometimes, a specific filter is needed to extract information from the filename or to select files that meet specific criteria. You can pass a filter for this.

Sometimes, many documents are returned. It's not necessary to have all documents in memory at the same time. You can use the lazy versions of the methods to get one document at a time. It's also better to use a complex query in place of a recursive search, because with recursive=True a query must be applied for each folder.

import os

loader = GoogleDriveLoader(
    gdrive_api_file=os.environ["GOOGLE_ACCOUNT_FILE"],
    num_results=2,
    template="gdrive-query",
    # Skip files whose description contains the tag "#test".
    filter=lambda search, file: "#test" not in file.get("description", ""),
    query="machine learning",
    supportsAllDrives=False,
)
for doc in loader.load():
    print("---")
    print(doc.page_content.strip()[:60] + "...")
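The filter in the example above is an ordinary two-argument callable, so its behaviour can be checked in isolation. The sketch below rewrites the lambda as a named function and applies it to hypothetical file records. The record contents are invented for illustration, and the first argument is ignored, as it is in the lambda.

```python
from typing import Any, Dict, List


def exclude_test_files(search: Any, file: Dict[str, Any]) -> bool:
    """Same rule as the lambda: keep a file unless its description has '#test'."""
    return "#test" not in file.get("description", "")


# Hypothetical file metadata records, for illustration only.
sample_files: List[Dict[str, Any]] = [
    {"name": "notes", "description": "Weekly ML notes"},
    {"name": "scratch", "description": "draft #test"},
    {"name": "untitled"},  # no description field at all
]

kept = [f["name"] for f in sample_files if exclude_test_files(None, f)]
print(kept)
```

Because file.get("description", "") falls back to an empty string, files without a description are kept and do not raise a KeyError.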
