All websites and online applications today face risks from bot attacks. These automated programs can overload servers, scrape content, generate fake user activity, or attempt unauthorized access to sensitive data. Harmful bots can range from simple command-line scripts to full-featured automated web browsers.
On the client side, bot attacks were traditionally mitigated by CAPTCHA-style challenges, which have always been detrimental to user experience. With recent advances in machine learning, some bots can even solve CAPTCHA challenges faster and more accurately than humans.
On the server side, your default Web Application Firewall can offer some level of protection from bots.
What is a Web Application Firewall?
A Web Application Firewall (WAF) protects web applications from various online threats, including bot attacks. It is a software barrier between a web application and the internet, monitoring and filtering HTTP traffic between them. WAFs analyze incoming requests and block or allow traffic based on predefined security rules.
A WAF can detect bots using a handful of simple techniques:
These rigid rules provide a base level of protection from simple bots. But more sophisticated attackers can mimic human traffic patterns, spoof request metadata, or use proxies to change their IP address.
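To make that limitation concrete, here is a minimal sketch of static rule matching. The rule shape, the passesStaticRules function, and the sample IP addresses are hypothetical and do not represent any particular WAF's API; the user agent string is the one used later in this article.

```javascript
// Hypothetical static rules, for illustration only
const staticRules = {
  blockedIps: new Set(["203.0.113.7"]),
  blockedUserAgentSubstrings: ["HeadlessChrome", "curl"],
};

// Returns true when the request passes every static rule
function passesStaticRules(request, { blockedIps, blockedUserAgentSubstrings }) {
  if (blockedIps.has(request.ip)) return false;
  return !blockedUserAgentSubstrings.some((part) => request.userAgent.includes(part));
}

const honestBot = {
  ip: "203.0.113.8",
  userAgent:
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/110.0.5481.177 Safari/537.36",
};

// The same bot after spoofing its user agent to look like regular Chrome
const spoofedBot = {
  ...honestBot,
  userAgent: honestBot.userAgent.replace("HeadlessChrome", "Chrome"),
};

console.log(passesStaticRules(honestBot, staticRules)); // blocked by the user agent rule
console.log(passesStaticRules(spoofedBot, staticRules)); // slips through
```

A single changed header is enough to get past the user agent rule, and a fresh proxy IP would get past the blocklist in the same way.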
What is Fingerprint Bot Detection?
Fingerprint provides a robust client-side Bot Detection solution. It runs in the browser, collecting vast amounts of data that bots leak (errors, network overrides, browser attribute inconsistencies, API changes, and more) to reliably distinguish real users from headless browsers, automation tools, their derivatives, and plugins. We have covered how to use Fingerprint Bot Detection to protect data endpoints accessible from your website in the Content scraping prevention tutorial.
In this article, you will explore how to integrate Fingerprint Bot Detection into your application firewall. By dynamically blocking IP addresses linked to bot visits, you can leverage the strengths of both Fingerprint and WAF to fortify your application against bot attacks.
Integrating Fingerprint Bot Detection with your Web Application Firewall
The Content scraping prevention tutorial already covered the following steps to detect bots on the client side and deny them access to your data:
Please refer to the full article for a detailed explanation of each step.
This tutorial will build on the existing functionality and take things to the next level. First, we will save the IP address of each detected bot to a database. Then, we will build a simple dashboard where you can monitor your bot visits and manually block the associated IP addresses.
You can try the live final result on our website. The example uses Next.js, but the same principles apply to any web application. The code snippets in the article are simplified for readability, but you can find the full source code on GitHub.
1. Save the IP address of each detected bot
Use the Server API to get the Bot Detection result. You can use one of our Server SDKs:
import { FingerprintJsServerApiClient } from '@fingerprintjs/fingerprintjs-pro-server-api';

// SERVER_API_KEY is your secret API key, requestId identifies the visit to look up
const client = new FingerprintJsServerApiClient({ apiKey: SERVER_API_KEY });
const eventResponse = await client.getEvent(requestId);
const botDetection = eventResponse.products?.botd?.data;
The botDetection result has the following information available:
{
  "bot": {
    "result": "bad",
    "type": "headlessChrome"
  },
  "userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/110.0.5481.177 Safari/537.36",
  "url": "https://yourdomain.com/search",
  "ip": "61.127.217.15",
  "time": "2023-09-08T16:43:23.241Z",
  "requestId": "1234557403227.AbclEC"
}
A "good" bot might be a search engine crawler or a monitoring tool. A "bad" bot means a headless browser or automation tool is accessing the website. In that case, deny the data request and save the bot visit to a database:
if (botDetection?.bot.result === "bad") {
  res.status(403).json({
    message: "Malicious bot detected, scraping this data is not allowed.",
  });
  // Persist the visit so it can be reviewed (and its IP blocked) later
  await BotVisitDbModel.create({
    ip: botDetection.ip,
    requestId: botDetection.requestId,
    timestamp: botDetection.time,
    botResult: botDetection.bot.result,
    botType: botDetection.bot.type,
  });
  return;
}
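The check above only handles the "bad" case. As a minimal sketch, the full decision could be expressed as a small function. The decideAction name and the action labels are hypothetical, and denying requests with a missing result is an assumption you may want to relax.

```javascript
// Maps a Bot Detection result to an action. Action names are hypothetical.
function decideAction(botDetection) {
  switch (botDetection?.bot?.result) {
    case "bad":
      return "deny-and-log"; // headless browser or automation tool
    case "good":
      return "allow"; // for example a search engine crawler
    case "notDetected":
      return "allow"; // regular visitor
    default:
      return "deny"; // assumption: fail closed when there is no usable result
  }
}

console.log(decideAction({ bot: { result: "bad", type: "headlessChrome" } }));
```

Keeping the decision in one place makes it easier to change the policy later, for example to log "good" bots as well.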
2. Display detected bot visits in a dashboard (optional)
You can build an internal dashboard that shows all detected bot visits and allows you to manually block the associated IP addresses. Alternatively, you could automatically block bot IPs the moment they are detected (jump to Step 3 if you prefer).
First, build an API endpoint for retrieving bot visits:
export default async function handler(_req, res) {
  // Newest bot visits first
  const botVisits = await BotVisitDbModel.findAll({
    order: [['timestamp', 'DESC']],
  });
  res.status(200).json(botVisits);
}
Use the endpoint to display the bot visits in a table:
import { useMutation, useQuery } from 'react-query';

export default function BotVisitsPage() {
  // Load all saved bot visits
  const { data: botVisits } = useQuery('get-bot-visits', () =>
    fetch('/api/bot-firewall/get-bot-visits').then((res) => res.json()),
  );

  // Ask the server to block a single IP address
  const { mutate: blockIp } = useMutation('block-ip', async (ip) =>
    fetch('/api/bot-firewall/block-ip', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ ip }),
    }).then((res) => res.json()),
  );

  return (
    <table>
      <thead>
        <tr>
          <th>Timestamp</th>
          <th>Bot Type</th>
          <th>IP Address</th>
          <th>Action</th>
        </tr>
      </thead>
      <tbody>
        {botVisits?.map((botVisit) => {
          return (
            <tr key={botVisit.requestId}>
              <td>{botVisit.timestamp}</td>
              <td>
                {botVisit.botResult} ({botVisit.botType})
              </td>
              <td>{botVisit.ip}</td>
              <td>
                <button onClick={() => blockIp(botVisit.ip)}>Block this IP</button>
              </td>
            </tr>
          );
        })}
      </tbody>
    </table>
  );
}
3. Block bot IPs in your web application firewall
In this example, we are proxying the website through Cloudflare, using that as the web application firewall, and updating a custom ruleset via the Cloudflare API. The section below assumes basic familiarity with Cloudflare, but this approach applies to any WAF solution with API-editable access rules.
export default async function blockIp(req, res) {
  const { ip } = req.body;

  // Save (or refresh) the block in our own database
  await BlockedIpDbModel.upsert({
    ip,
    timestamp: new Date().toISOString(),
  });

  // Rebuild the whole ruleset from the current list of blocked IPs
  const blockedIps = (
    await BlockedIpDbModel.findAll({
      order: [['timestamp', 'DESC']],
    })
  ).map((blockedIp) => blockedIp.ip);
  const newRules = await buildFirewallRules(blockedIps);
  await updateRulesetUsingCloudflareAPI(newRules);

  return res.status(200).json({ result: 'success' });
}
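Because the IP address from the request body ends up inside a firewall rule expression, it is worth validating before it is stored. A minimal sketch, assuming a Node.js runtime: parseIp is a hypothetical helper built on Node's built-in net.isIP, which returns 0 for anything that is not a valid IPv4 or IPv6 literal.

```javascript
const net = require("net");

// Returns the trimmed IP if it is a valid IPv4 or IPv6 literal, otherwise null
function parseIp(value) {
  if (typeof value !== "string") return null;
  const ip = value.trim();
  return net.isIP(ip) !== 0 ? ip : null;
}

console.log(parseIp("61.127.217.15")); // valid IPv4
console.log(parseIp('1.1.1.1" "2.2.2.2')); // rejected: not a single IP literal
```

In the handler, a null result could translate to a 400 response before anything is written to the database or sent to the firewall.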
Note: In this example, we block IP addresses directly in the Cloudflare Custom Rules. We use our own database to keep track of currently blocked IPs and update the entire ruleset on every change. Alternatively, you could use Lists to store blocked IPs and update just that, bearing in mind the List's limitations around IPv6.
Cloudflare rule expressions are limited to 4096 characters, so you can fit a maximum of 84 IPv6 addresses per rule. The buildFirewallRules function splits the list of blocked IPs into multiple rules if necessary.
The free Cloudflare plan allows up to five custom rules, with more rules available on higher plans. These limitations will vary depending on your WAF provider, plan, and the chosen blocking approach (Cloudflare Lists can accommodate more IP addresses, for example).
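The figure of 84 addresses per rule can be reproduced with a little arithmetic. The sketch below assumes the expression format used in this article and a worst-case IPv6 text length of 45 characters (an address with an embedded IPv4 part written in full); the helper names are hypothetical.

```javascript
const MAX_EXPRESSION_LENGTH = 4096;
const MAX_IPV6_TEXT_LENGTH = 45; // assumption: longest textual form of an IPv6 address

// Same expression format as the rules built in this article
function buildExpression(ips) {
  const ipList = ips.map((ip) => `"${ip}"`).join(" ");
  return `http.x_forwarded_for in {${ipList}}`;
}

function maxIpsPerRule(ipLength = MAX_IPV6_TEXT_LENGTH, limit = MAX_EXPRESSION_LENGTH) {
  const overhead = buildExpression([]).length; // the fixed part around the IP list
  // Every IP costs its length plus two quotes plus one space,
  // except that the last IP has no trailing space (hence the + 1)
  return Math.floor((limit - overhead + 1) / (ipLength + 3));
}

console.log(maxIpsPerRule()); // 84
```

With the more common 39-character form, the same formula allows slightly more addresses per rule, so 84 is the safe lower bound.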
import _ from 'lodash';

// The 4096-character expression limit fits at most 84 IPv6 addresses per rule
const MAX_IPS_PER_RULE = 84;
// Number of custom rules available on the free Cloudflare plan
const MAX_RULES = 5;
const MAX_BLOCKED_IPS = MAX_IPS_PER_RULE * MAX_RULES;

export const buildFirewallRules = async (
  blockedIps,
  maxIpsPerRule = MAX_IPS_PER_RULE,
): Promise<CloudflareRule[]> => {
  // Split the IPs into groups small enough to fit into one rule expression
  const chunks = _.chunk(blockedIps, maxIpsPerRule);

  const ruleExpressions = chunks.map((chunk) => {
    const ipList = chunk.map((ip) => `"${ip}"`).join(' ');
    return `http.x_forwarded_for in {${ipList}}`;
  });

  const rules: CloudflareRule[] = ruleExpressions.map((expression, index) => ({
    action: 'block',
    description: `Block Bot IP addresses #${index + 1}`,
    expression,
  }));

  return rules;
};
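The snippet above does not show how the overall limit of blocked IPs is enforced. One possible approach, sketched here as an assumption rather than the demo's actual behavior, is to keep only the most recently blocked IPs before chunking. The chunk and planRules helpers are hypothetical, and the sketch relies on the list being sorted newest first, as in the blockIp handler.

```javascript
const MAX_IPS_PER_RULE = 84;
const MAX_RULES = 5;
const MAX_BLOCKED_IPS = MAX_IPS_PER_RULE * MAX_RULES; // 420

// Dependency-free equivalent of lodash's _.chunk
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Assumes blockedIps is sorted newest first, so slice keeps the most recent blocks
function planRules(blockedIps) {
  const kept = blockedIps.slice(0, MAX_BLOCKED_IPS);
  return chunk(kept, MAX_IPS_PER_RULE).map((ips, index) => ({
    description: `Block Bot IP addresses #${index + 1}`,
    ipCount: ips.length,
  }));
}

// 200 blocked IPs are spread over three rules: 84 + 84 + 32
const sampleIps = Array.from({ length: 200 }, (_, i) => `2001:db8::${(i + 1).toString(16)}`);
console.log(planRules(sampleIps));
```

Dropping the oldest blocks is only one option; rejecting new blocks once the limit is reached would be equally valid, depending on your needs.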
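Finally, use the Cloudflare API to update your custom ruleset with the rules compiled above. You are going to need a Cloudflare API token, your zone ID, and the ID of your custom ruleset, which the function below reads from environment variables: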
async function updateRulesetUsingCloudflareAPI(rules: CloudflareRule[]) {
  const apiToken = process.env.CLOUDFLARE_API_TOKEN ?? '';
  const zoneId = process.env.CLOUDFLARE_ZONE_ID ?? '';
  const customRulesetId = process.env.CLOUDFLARE_RULESET_ID ?? '';

  const url = `https://api.cloudflare.com/client/v4/zones/${zoneId}/rulesets/${customRulesetId}`;
  const options = {
    method: 'PUT',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiToken}`,
    },
    // The request replaces the entire ruleset with the new rules
    body: JSON.stringify({
      description: 'Custom ruleset for blocking Fingerprint-detected bot IPs',
      kind: 'root',
      name: 'default',
      phase: 'http_request_firewall_custom',
      rules,
    }),
  };

  const response = await fetch(url, options);
  if (!response.ok) {
    console.error(response.statusText, await response.json());
    throw new Error('Updating firewall ruleset failed', { cause: response.statusText });
  }
  return await response.json();
}
4. Unblock IP addresses
Unblocking an IP address works the same way as blocking one, except that you delete the IP from the table of blocked IPs instead of adding it. Then, update the Cloudflare ruleset as before.
export default async function unblockIp(req, res) {
  const { ip } = req.body;

  // Wait for the deletion to finish before rebuilding the ruleset
  await BlockedIpDbModel.destroy({
    where: {
      ip,
    },
  });

  const blockedIps = await getBlockedIps();
  const newRules = await buildFirewallRules(blockedIps);
  await updateRulesetUsingCloudflareAPI(newRules);

  return res.status(200).json({ result: 'success' });
}
5. Define a time limit for blocking IP addresses
You might want to block IP addresses only for a limited time, and then unblock them automatically. Cloudflare does not support defining a time-to-live on its custom rules. But you can create a simple cron job that runs periodically, deletes expired IP blocks, and updates the Cloudflare ruleset:
import { syncFirewallRuleset } from '../src/server/botd-firewall/cloudflareApiHelper';
import { schedule } from 'node-cron';
// Op is assumed to come from Sequelize, which the model calls in this article resemble
import { Op } from 'sequelize';

// Run every five minutes and log failures instead of leaving the promise unhandled
schedule('*/5 * * * *', () => {
  deleteOldIpBlocks().catch(console.error);
});

// Blocks expire after one hour
const IP_BLOCK_TIME_TO_LIVE_MS = 1000 * 60 * 60;

async function deleteOldIpBlocks() {
  await BlockedIpDbModel.destroy({
    where: {
      timestamp: {
        [Op.lt]: new Date(Date.now() - IP_BLOCK_TIME_TO_LIVE_MS).toISOString(),
      },
    },
  });

  const blockedIps = await getBlockedIps();
  const newRules = await buildFirewallRules(blockedIps);
  await updateRulesetUsingCloudflareAPI(newRules);
}
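The expiry rule itself is easy to check in isolation. The sketch below mirrors the comparison used in the database query as a plain function; isExpired, the sample block records, and the fixed "now" value are hypothetical and exist only for illustration.

```javascript
const IP_BLOCK_TIME_TO_LIVE_MS = 1000 * 60 * 60; // one hour
const CRON_INTERVAL_MS = 1000 * 60 * 5; // matches the '*/5 * * * *' schedule

// A block is expired once its timestamp is older than "now minus the time to live"
function isExpired(block, nowMs, ttlMs = IP_BLOCK_TIME_TO_LIVE_MS) {
  return Date.parse(block.timestamp) < nowMs - ttlMs;
}

const nowMs = Date.parse("2023-09-08T18:00:00.000Z");
const sampleBlocks = [
  { ip: "61.127.217.15", timestamp: "2023-09-08T16:43:23.241Z" }, // older than one hour
  { ip: "203.0.113.7", timestamp: "2023-09-08T17:30:00.000Z" }, // 30 minutes old
];

const stillBlocked = sampleBlocks.filter((block) => !isExpired(block, nowMs)).map((block) => block.ip);
console.log(stillBlocked); // only the 30-minute-old block remains

// Expired blocks are only removed when the job runs, so a block can live this long at most
const worstCaseLifetimeMinutes = (IP_BLOCK_TIME_TO_LIVE_MS + CRON_INTERVAL_MS) / (1000 * 60);
console.log(worstCaseLifetimeMinutes); // 65
```

In other words, with a one-hour time to live and a five-minute schedule, an IP stays blocked for somewhere between 60 and roughly 65 minutes.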
Explore the Bot Firewall Demo
We built a fully open-source Bot Firewall demo to demonstrate the concepts above. Try it to see how you can use Fingerprint Bot Detection in combination with your web application firewall to better protect yourself from malicious bots.
To prevent users from interfering with other people's demo experience, the demo only allows you to block your own IP. To try it out, visit the web scraping demo as a bot using a locally running browser automation tool like Puppeteer or Playwright.
For example, assuming you already have Node and NPM installed, you can visit the page using Playwright:
1. Create a new directory and move into it: mkdir bot-firewall-test && cd bot-firewall-test
2. Initialize a Node project: npm init -y
3. Install Playwright: npm install playwright
4. Install the Playwright browsers: npx playwright install
5. Create an index.js file like below. Note that the bot must spend enough time on the page for the Fingerprint JS agent to load and identify it.
const playwright = require("playwright");

(async () => {
  const browser = await playwright["chromium"].launch();
  const context = await browser.newContext();
  const page = await context.newPage();
  await page.goto("https://demo.fingerprint.com/web-scraping");
  // Give the Fingerprint JS agent time to load and identify the bot
  await page.waitForTimeout(3000);
  console.log(await page.getByRole("heading").first().textContent());
  await browser.close();
})();
6. Run the script with node index.js. Your bot visit will be saved to the database.
7. After blocking your IP address in the Bot Firewall demo, run node index.js again or just visit the web scraping page using your regular browser. The bot IP will be blocked from loading the page completely.
Note: If you use iCloud Private Relay, your Safari will have a different IP address than your local bot, but the bot itself is still blocked.
Feel free to jump into the GitHub repo to see the full source code. If you have any questions or want to learn more about Fingerprint Bot Detection, you can join our Discord server, get in touch with our sales team, or start a 14-day free trial.