Hyperbrowser is a platform for running and scaling headless browsers. It lets you launch and manage browser sessions at scale and provides easy-to-use solutions for any web scraping need, such as scraping a single page or crawling an entire site.
This notebook provides a quick overview for getting started with Hyperbrowser document loader.
For more information about Hyperbrowser, please visit the Hyperbrowser website, or check out the Hyperbrowser docs.
Overview

Integration details

| Class | Package | Local | Serializable | JS support |
| :--- | :--- | :---: | :---: | :---: |
| HyperbrowserLoader | langchain-hyperbrowser | ❌ | ❌ | ❌ |

Loader features

| Source | Document Lazy Loading | Native Async Support |
| :--- | :---: | :---: |
| HyperbrowserLoader | ✅ | ✅ |

Setup

To access the Hyperbrowser document loader you'll need to install the langchain-hyperbrowser integration package, create a Hyperbrowser account, and get an API key.
Head to Hyperbrowser to sign up and generate an API key. Once you've done this set the HYPERBROWSER_API_KEY environment variable:
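For example, in a notebook cell you could set the variable for the current process (the key value below is a placeholder, not a real key):

```python
import os

# Set the API key for this process; replace the placeholder with your
# actual key from the Hyperbrowser dashboard.
os.environ["HYPERBROWSER_API_KEY"] = "YOUR_API_KEY"
```

In a shared environment, prefer setting the variable outside the notebook (e.g. in your shell profile) so the key never appears in the notebook itself.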
Installation

Install langchain-hyperbrowser:
%pip install -qU langchain-hyperbrowser
Initialization
Now we can instantiate our model object and load documents:
from langchain_hyperbrowser import HyperbrowserLoader
loader = HyperbrowserLoader(
    urls="https://example.com",
    api_key="YOUR_API_KEY",
)
Load
docs = loader.load()
docs[0]
Document(metadata={'title': 'Example Domain', 'viewport': 'width=device-width, initial-scale=1', 'sourceURL': 'https://example.com'}, page_content='Example Domain\n\n# Example Domain\n\nThis domain is for use in illustrative examples in documents. You may use this\ndomain in literature without prior coordination or asking for permission.\n\n[More information...](https://www.iana.org/domains/example)')
Lazy Load
page = []
for doc in loader.lazy_load():
    page.append(doc)
    if len(page) >= 10:
        # do some paged operation on the batch of 10 documents here,
        # e.g. index them, then reset the buffer
        page = []
Advanced Usage
You can specify the operation to be performed by the loader. The default operation is `scrape`. For `scrape`, you can provide a single URL or a list of URLs to be scraped. For `crawl`, you can only provide a single URL. The `crawl` operation will crawl the provided page and its subpages and return a document for each page.
loader = HyperbrowserLoader(
    urls="https://hyperbrowser.ai", api_key="YOUR_API_KEY", operation="crawl"
)
Optional params for the loader can also be provided in the `params` argument. For more information on the supported params, visit https://docs.hyperbrowser.ai/reference/sdks/python/scrape#start-scrape-job-and-wait or https://docs.hyperbrowser.ai/reference/sdks/python/crawl#start-crawl-job-and-wait.
loader = HyperbrowserLoader(
    urls="https://example.com",
    api_key="YOUR_API_KEY",
    operation="scrape",
    params={"scrape_options": {"include_tags": ["h1", "h2", "p"]}},
)
API reference