A plugin for playwright-extra and puppeteer-extra to provide Smart Proxy Manager specific functionalities.
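QuickStart for playwright-extra

Install the plugin and its dependencies (cross-fetch is included because the sample below requires it):

    npm install playwright playwright-extra zyte-smartproxy-plugin puppeteer-extra-plugin-stealth @cliqz/adblocker-playwright cross-fetch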
Create a file sample.js with the following content and replace <SPM_APIKEY> with your SPM API key:

    // playwright-extra is a drop-in replacement for playwright,
    // it augments the installed playwright with plugin functionality
    const { chromium } = require('playwright-extra');

    // add zyte-smartproxy-plugin
    const SmartProxyPlugin = require('zyte-smartproxy-plugin');
    chromium.use(SmartProxyPlugin({
      spm_apikey: '<SPM_APIKEY>',
      static_bypass: false, // enable to save bandwidth (but may break some websites)
    }));

    // add stealth plugin and use defaults (all evasion techniques)
    const StealthPlugin = require('puppeteer-extra-plugin-stealth');
    chromium.use(StealthPlugin());

    // create adblocker to block all ads (saves bandwidth)
    const { PlaywrightBlocker } = require('@cliqz/adblocker-playwright');
    const fetch = require('cross-fetch');

    // playwright usage as normal
    (async () => {
      const adBlocker = await PlaywrightBlocker.fromPrebuiltAdsAndTracking(fetch);
      const browser = await chromium.launch({ headless: false });
      const page = await browser.newPage({ ignoreHTTPSErrors: true });

      // uncomment to enable adBlocker (saves bandwidth but may break some websites)
      // adBlocker.enableBlockingInPage(page);

      await page.goto('https://toscrape.com', { timeout: 180000 });
      await page.screenshot({ path: 'screenshot.png' });
      await browser.close();
    })();
Make sure that you can make HTTPS requests through Smart Proxy Manager by following the guide "Fetching HTTPS pages with Zyte Smart Proxy Manager".
Run sample.js using Node:

    node sample.js

QuickStart for puppeteer-extra

Install the plugin and its dependencies:

    npm install puppeteer puppeteer-extra zyte-smartproxy-plugin puppeteer-extra-plugin-stealth puppeteer-extra-plugin-adblocker
Create a file sample.js with the following content and replace <SPM_APIKEY> with your SPM API key:

    // puppeteer-extra is a drop-in replacement for puppeteer,
    // it augments the installed puppeteer with plugin functionality
    const puppeteer = require('puppeteer-extra');

    // add zyte-smartproxy-plugin
    const SmartProxyPlugin = require('zyte-smartproxy-plugin');
    puppeteer.use(SmartProxyPlugin({
      spm_apikey: '<SPM_APIKEY>',
      static_bypass: false, // enable to save bandwidth (but may break some websites)
    }));

    // add stealth plugin and use defaults (all evasion techniques)
    const StealthPlugin = require('puppeteer-extra-plugin-stealth');
    puppeteer.use(StealthPlugin());

    // uncomment to enable adblocker plugin (saves bandwidth but may break some websites)
    // const AdBlockerPlugin = require('puppeteer-extra-plugin-adblocker');
    // puppeteer.use(AdBlockerPlugin({ blockTrackers: true }));

    // puppeteer usage as normal
    (async () => {
      const browser = await puppeteer.launch({ headless: false });
      const page = await browser.newPage({ ignoreHTTPSErrors: true });
      await page.goto('https://toscrape.com', { timeout: 180000 });
      await page.screenshot({ path: 'screenshot.png' });
      await browser.close();
    })();
Make sure that you can make HTTPS requests through Smart Proxy Manager by following the guide "Fetching HTTPS pages with Zyte Smart Proxy Manager".
Run sample.js using Node:

    node sample.js

SmartProxyPlugin options

spm_apikey (default: undefined)
Zyte Smart Proxy Manager API key, which can be found in your zyte.com account.

spm_host (default: http://proxy.zyte.com:8011)
Zyte Smart Proxy Manager proxy host.

static_bypass (default: true)
When true, Zyte SmartProxy Plugin skips the proxy for static assets matched by static_bypass_regex, which saves proxy bandwidth. Pass false to send those requests through the proxy as well.

static_bypass_regex (default: /.*?\.(?:txt|json|css|less|gif|ico|jpe?g|svg|png|webp|mkv|mp4|mpe?g|webm|eot|ttf|woff2?)$/)
Regex used to select the URLs that static_bypass applies to.

headers (default: {'X-Crawlera-No-Bancheck': '1', 'X-Crawlera-Profile': 'pass', 'X-Crawlera-Cookies': 'disable'})
List of headers to be appended to requests.

spm_session_id (default: undefined)
When specified, Zyte SmartProxy Plugin uses an existing Zyte Smart Proxy Manager session; otherwise a new session is created.
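
To get a feel for which requests static_bypass would affect, the default static_bypass_regex listed above can be tried against a few URLs. This is only a sketch of the pattern itself: the helper name matchesStaticBypass and the sample URLs are hypothetical, and it assumes the pattern is tested against the full request URL, which this document does not confirm.

```javascript
// Default static_bypass_regex, copied from the options listed above.
const STATIC_BYPASS_REGEX = /.*?\.(?:txt|json|css|less|gif|ico|jpe?g|svg|png|webp|mkv|mp4|mpe?g|webm|eot|ttf|woff2?)$/;

// Hypothetical helper (not part of the plugin): reports whether a URL
// matches the given pattern. Assumes the full URL string is what gets tested.
function matchesStaticBypass(url, regex = STATIC_BYPASS_REGEX) {
  return regex.test(url);
}

// Illustrative URLs only.
const samples = [
  'https://toscrape.com/',
  'https://toscrape.com/static/main.css',
  'https://toscrape.com/fonts/site.woff2',
  'https://toscrape.com/api/items.json',
  'https://toscrape.com/img/logo.png?v=2',
];

for (const url of samples) {
  console.log(`${matchesStaticBypass(url) ? 'match   ' : 'no match'}  ${url}`);
}
```

Two things stand out from the pattern: .json is in the extension list, so JSON endpoints ending in .json would match, and the trailing $ means a URL with a query string after the extension does not match. If either behavior is unwanted, a custom static_bypass_regex can be passed instead.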
Some websites may not work with AdBlocker or static_bypass turned on (static_bypass is on by default). Try disabling them if you encounter any issues.
In headless: true mode, the values generated for some browser-specific headers differ slightly from those of a regular browser, and websites may detect this. In that case, try using 'X-Crawlera-Profile': 'desktop':

    puppeteer.use(SmartProxyPlugin({
      spm_apikey: '<SPM_APIKEY>',
      headers: {
        'X-Crawlera-No-Bancheck': '1',
        'X-Crawlera-Profile': 'desktop',
        'X-Crawlera-Cookies': 'disable',
      },
    }));

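
If the same script runs in both headed and headless mode, the header choice can be derived from a single flag. The following is a minimal sketch, not part of the plugin: buildPluginOptions is a hypothetical helper, and it spells out all three headers on the assumption that a custom headers option replaces the defaults rather than merging with them.

```javascript
// Default headers, copied from the options listed earlier in this document.
const DEFAULT_HEADERS = {
  'X-Crawlera-No-Bancheck': '1',
  'X-Crawlera-Profile': 'pass',
  'X-Crawlera-Cookies': 'disable',
};

// Hypothetical helper: builds an options object for SmartProxyPlugin,
// switching the profile header to 'desktop' when running headless.
function buildPluginOptions(apiKey, { headless = false } = {}) {
  const headers = { ...DEFAULT_HEADERS }; // copy so the defaults stay untouched
  if (headless) {
    headers['X-Crawlera-Profile'] = 'desktop';
  }
  return { spm_apikey: apiKey, headers };
}

const headless = true;
const options = buildPluginOptions('<SPM_APIKEY>', { headless });
console.log(options.headers);

// Intended usage, with the packages from the QuickStart installed:
//   puppeteer.use(SmartProxyPlugin(options));
//   const browser = await puppeteer.launch({ headless });
```

Keeping the flag in one place means the launch mode and the profile header cannot drift apart.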
--proxy-server=http://proxy.zyte.com:8011 --disable-site-isolation-trials