For every attachment downloaded by the page, the page.on("download") event is emitted. All of these attachments are downloaded into a temporary folder. You can obtain the download URL, the file name and the payload stream from the Download object passed to the event.
You can specify where to persist downloaded files using the downloads_path option in browser_type.launch().
Note: Downloaded files are deleted when the browser context that produced them is closed.
Here is the simplest way to handle the file download:
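Sync API:
# Start waiting for the download
with page.expect_download() as download_info:
    # Perform the action that initiates the download
    page.get_by_text("Download file").click()
download = download_info.value
# save_as() waits for the download to finish, then copies the file
download.save_as("/path/to/save/at/" + download.suggested_filename)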
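Async API:
# Start waiting for the download
async with page.expect_download() as download_info:
    # Perform the action that initiates the download
    await page.get_by_text("Download file").click()
download = await download_info.value
# save_as() waits for the download to finish, then copies the file
await download.save_as("/path/to/save/at/" + download.suggested_filename)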
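Both snippets build the destination by concatenating a directory with download.suggested_filename, which comes from the server. The sketch below is a defensive alternative under stated assumptions: `build_save_path` is a hypothetical helper (not part of Playwright) that keeps only the final path component, creates the target directory, and appends a counter instead of overwriting an existing file. It uses only the standard library.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath


def build_save_path(directory: str, suggested_filename: str) -> Path:
    """Return a not-yet-existing path inside `directory` for a downloaded file.

    Hypothetical helper for illustration; it is not a Playwright API.
    """
    # Keep only the final component, treating both "/" and "\" as separators,
    # so a name such as "../evil.txt" cannot escape the target directory.
    name = PureWindowsPath(PurePosixPath(suggested_filename).name).name
    if name in ("", ".", ".."):
        name = "download"  # fallback when no usable file name remains

    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)

    candidate = target_dir / name
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    # Append " (1)", " (2)", ... rather than overwrite an existing file.
    while candidate.exists():
        candidate = target_dir / f"{stem} ({counter}){suffix}"
        counter += 1
    return candidate
```

With the sync API this would be used as `download.save_as(build_save_path("/path/to/save/at", download.suggested_filename))`; save_as accepts either a string or a pathlib.Path.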
Variations
If you have no idea what initiates the download, you can still handle the event:
Sync API:
page.on("download", lambda download: print(download.path()))
Async API:
async def handle_download(download):
    print(await download.path())
page.on("download", handle_download)
Note that handling the event forks the control flow and makes the script harder to follow. Your scenario might end while a file is still downloading, because the main control flow is not awaiting that operation.
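One way to keep the forked work visible to the main flow with the async API is to record a handle for each operation started in the handler and await all of them before the context is closed. The sketch below shows that pattern with plain asyncio; `PendingDownloads` and its methods are hypothetical names, not Playwright APIs, and the class itself does not depend on Playwright.

```python
import asyncio
from typing import Awaitable, List


class PendingDownloads:
    """Collects work started from event handlers so the main flow can await it.

    Hypothetical helper for illustration; it is not part of Playwright.
    """

    def __init__(self) -> None:
        self._tasks: List["asyncio.Future[object]"] = []

    def track(self, awaitable: Awaitable[object]) -> None:
        # Schedule the awaitable on the running loop and keep a handle to it.
        # Must be called while the event loop is running (e.g. from a handler).
        self._tasks.append(asyncio.ensure_future(awaitable))

    async def wait_all(self) -> list:
        # Drain repeatedly: tracked work may itself track more work.
        # Results come back in tracking order; the first failure is re-raised.
        results: list = []
        while self._tasks:
            tasks, self._tasks = self._tasks, []
            results.extend(await asyncio.gather(*tasks))
        return results
```

Assuming the async API, a handler could then be registered as `page.on("download", lambda download: pending.track(download.path()))`, and the scenario would call `await pending.wait_all()` before closing the browser context, since closing it deletes the downloaded files.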