Showing content from https://fingerprint.com/blog/bot-detection-powered-application-firewall/ below:

How to Identify and Block Bots in Your Firewall

What are bot attacks?

All websites and online applications today face risks from bot attacks. These automated programs can overload servers, scrape content, generate fake user activity, or attempt unauthorized access to sensitive data. Harmful bots can range from simple command-line scripts to full-featured automated web browsers.

On the client side, bot attacks were traditionally mitigated by CAPTCHA-style challenges, which have always been detrimental to user experience. With recent advances in machine learning, some bots can even solve CAPTCHA challenges faster and more accurately than humans.

On the server side, your default Web Application Firewall can offer some level of protection from bots.

What is a Web Application Firewall?

A Web Application Firewall (WAF) protects web applications from various online threats, including bot attacks. It is a software barrier between a web application and the internet, monitoring and filtering HTTP traffic between them. WAFs analyze incoming requests and block or allow traffic based on predefined security rules.

A WAF can detect bots using a handful of simple techniques:

Looking for unusual traffic patterns, such as a very large number of requests per second.
Looking at the request metadata, such as the User-Agent.
Comparing the IP address against a reputation database of IP ranges, countries, and data centers known to host bots.

These rigid rules provide a base level of protection from simple bots. But more sophisticated attackers can mimic human traffic patterns, spoof request metadata, or use proxies to change their IP address.
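
To make the limits of such rules concrete, here is a minimal sketch of the three checks listed above. It is not how any particular WAF is implemented; the function name, thresholds, User-Agent patterns, and deny list are all assumptions chosen for illustration.

```javascript
// Hypothetical thresholds and patterns, chosen only for illustration.
const MAX_REQUESTS_PER_SECOND = 20;
const SUSPICIOUS_USER_AGENTS = [/curl/i, /python-requests/i, /HeadlessChrome/i];

// Returns the name of the first rule a request trips, or null if it passes.
// `recentRequestTimesMs` holds arrival times (ms) of this IP's recent requests.
function matchStaticRule(request, badIps) {
  const { ip, userAgent, recentRequestTimesMs, nowMs } = request;

  // 1. Unusual traffic pattern: too many requests in the last second.
  const requestsInLastSecond = recentRequestTimesMs.filter(
    (timeMs) => timeMs <= nowMs && nowMs - timeMs < 1000,
  ).length;
  if (requestsInLastSecond > MAX_REQUESTS_PER_SECOND) return 'rate-limit';

  // 2. Request metadata: the User-Agent names a known automation tool.
  if (SUSPICIOUS_USER_AGENTS.some((pattern) => pattern.test(userAgent))) {
    return 'user-agent';
  }

  // 3. IP reputation: the address is on a (hypothetical) deny list.
  if (badIps.has(ip)) return 'ip-reputation';

  return null;
}

const badIps = new Set(['61.127.217.15']);
const honestBot = {
  ip: '203.0.113.7',
  userAgent: 'Mozilla/5.0 (X11; Linux x86_64) HeadlessChrome/110.0.5481.177 Safari/537.36',
  recentRequestTimesMs: [],
  nowMs: 1000,
};
// Same bot, but with a spoofed User-Agent and a fresh IP address.
const spoofingBot = {
  ...honestBot,
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/110.0.0.0 Safari/537.36',
};

console.log(matchStaticRule(honestBot, badIps)); // 'user-agent'
console.log(matchStaticRule(spoofingBot, badIps)); // null
```

The second call illustrates the weakness described above: once the bot changes its User-Agent and IP address, none of the static rules match.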

What is Fingerprint Bot Detection?

Fingerprint provides a robust client-side Bot Detection solution. It runs in the browser, collecting vast amounts of data that bots leak (errors, network overrides, browser attribute inconsistencies, API changes, and more) to reliably distinguish real users from headless browsers, automation tools, their derivatives, and plugins. We have covered how to use Fingerprint Bot Detection to protect data endpoints accessible from your website in the Content scraping prevention tutorial.

Fingerprint Bot Detection is much better than a WAF at detecting sophisticated bots, such as headless browsers. However, it relies on collecting signals from the browser. The bot is still able to load the page before it is detected.
A WAF, working at a server level, has limited bot detection capabilities. But it can block bots sooner — before they even load the page.

In this article, you will explore how to integrate Fingerprint Bot Detection into your application firewall. By dynamically blocking IP addresses linked to bot visits, you can leverage the strengths of both Fingerprint and WAF to fortify your application against bot attacks.

Integrating Fingerprint Bot Detection with your Web Application Firewall

The Content scraping prevention tutorial already covered the following steps to detect bots on the client side and deny them access to your data:

Sign up for Fingerprint Pro.
Install the JavaScript agent on your website.
Identify each visitor and send the corresponding identification request ID to the server.
Use the request ID to retrieve and validate the full identification event from the Server API.
Return the requested data or deny the request based on the Bot Detection result.

Please refer to the full article for a detailed explanation of each step.

This tutorial will build on the existing functionality and take things to the next level. First, we will save the IP address of each detected bot to a database. Then, we will build a simple dashboard where you can monitor your bot visits and manually block the associated IP addresses.

You can try the live final result on our website. The example uses Next.js, but the same principles apply to any web application. The code snippets in the article are simplified for readability, but you can find the full source code on GitHub.

1. Save the IP address of each detected bot

Use the Server API to get the Bot Detection result. You can use one of our Server SDKs:

import {
  FingerprintJsServerApiClient,
} from '@fingerprintjs/fingerprintjs-pro-server-api';

// SERVER_API_KEY is your secret API key; requestId was sent by the client
const client = new FingerprintJsServerApiClient({ apiKey: SERVER_API_KEY });
const eventResponse = await client.getEvent(requestId);
const botDetection = eventResponse.products?.botd?.data;

The botDetection result has the following information available:

{
  "bot": {
    "result": "bad",
    "type": "headlessChrome"
  },
  "userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/110.0.5481.177 Safari/537.36",
  "url": "https://yourdomain.com/search",
  "ip": "61.127.217.15",
  "time": "2023-09-08T16:43:23.241Z",
  "requestId": "1234557403227.AbclEC"
}

A "good" bot might be a search engine crawler or a monitoring tool. A "bad" bot means a headless browser or automation tool is accessing the website. In that case, deny the data request and save the bot visit to a database:


if (botDetection?.bot.result === "bad") {
  // Deny the data request
  res.status(403).json({
    message: "Malicious bot detected, scraping this data is not allowed.",
  });

  // Save the bot visit to the database
  await BotVisitDbModel.create({
    ip: botDetection.ip,
    requestId: botDetection.requestId,
    timestamp: botDetection.time,
    botResult: botDetection.bot.result,
    botType: botDetection.bot.type,
  });
  return;
}
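
The validation step mentioned earlier matters here, because a request ID could otherwise be replayed or borrowed from another visitor. The following is a rough sketch of checks of that kind, using the fields of the botDetection object shown above. The function name, the shape of the `request` parameter, and the freshness window are assumptions, not part of the Fingerprint SDK.

```javascript
// Hypothetical freshness window; tune it to your own latency budget.
const MAX_EVENT_AGE_MS = 3000;

// Returns the origin of a URL, or null if the URL cannot be parsed.
function originOf(url) {
  try {
    return new URL(url).origin;
  } catch {
    return null;
  }
}

// `botDetection` has the shape shown above; `request` describes the incoming
// data request as seen by your server (its shape is an assumption).
function validateBotDetectionEvent(botDetection, request) {
  if (!botDetection) return { ok: false, reason: 'missing-event' };

  // Reject replayed request IDs: the event must be recent.
  const ageMs = request.nowMs - Date.parse(botDetection.time);
  if (Number.isNaN(ageMs) || ageMs > MAX_EVENT_AGE_MS) {
    return { ok: false, reason: 'stale-event' };
  }

  // The identification must come from the same IP as the data request.
  if (botDetection.ip !== request.ip) {
    return { ok: false, reason: 'ip-mismatch' };
  }

  // The identification must have happened on your own website.
  if (originOf(botDetection.url) !== request.allowedOrigin) {
    return { ok: false, reason: 'origin-mismatch' };
  }

  return { ok: true };
}
```

Only when such a check passes would you go on to trust `botDetection.bot.result`; otherwise it is safer to deny the request outright.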

2. Display detected bot visits in a dashboard (optional)

You can build an internal dashboard that shows all detected bot visits and allows you to manually block the associated IP addresses. Alternatively, you could automatically block bot IPs the moment they are detected (jump to Step 3 if you prefer).
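
If you go the automatic route, you may want a small policy function deciding when an IP qualifies. The sketch below is one possible, slightly more conservative variant that waits for several bad visits within a time window instead of blocking on the first one. The function name, parameters, and default thresholds are assumptions; the record fields match the bot visits saved in Step 1.

```javascript
// Decides whether `ip` should be blocked automatically, based on saved bot visits.
// Defaults (3 bad visits within 10 minutes) are hypothetical.
function shouldAutoBlock(botVisits, ip, options = {}) {
  const {
    minBadVisits = 3,
    windowMs = 10 * 60 * 1000,
    nowMs = Date.now(),
  } = options;

  const recentBadVisits = botVisits.filter(
    (visit) =>
      visit.ip === ip &&
      visit.botResult === 'bad' &&
      nowMs - Date.parse(visit.timestamp) <= windowMs,
  );

  return recentBadVisits.length >= minBadVisits;
}
```

Setting `minBadVisits` to 1 reproduces the "block the moment they are detected" behavior.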

First, build an API endpoint for retrieving bot visits:


// Returns all saved bot visits, most recent first
export default async function handler(_req, res) {
  const botVisits = await BotVisitDbModel.findAll({
    order: [['timestamp', 'DESC']],
  });
  res.status(200).json(botVisits);
}

Use the endpoint to display the bot visits in a table:

import { useMutation, useQuery } from 'react-query';

export default function BotVisitsPage() {
  const { data: botVisits } = useQuery('get-bot-visits', () =>
    fetch('/api/bot-firewall/get-bot-visits').then((res) => res.json()),
  );

  const { mutate: blockIp } = useMutation('block-ip', async (ip) =>
    fetch('/api/bot-firewall/block-ip', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ ip }),
    }).then((res) => res.json()),
  );

  return (
    <table>
      <thead>
        <tr>
          <th>Timestamp</th>
          <th>Bot Type</th>
          <th>IP Address</th>
          <th>Action</th>
        </tr>
      </thead>
      <tbody>
        {botVisits?.map((botVisit) => {
          return (
            <tr key={botVisit.requestId}>
              <td>{botVisit.timestamp}</td>
              <td>
                {botVisit.botResult} ({botVisit.botType})
              </td>
              <td>{botVisit.ip}</td>
              <td>
                <button onClick={() => blockIp(botVisit.ip)}>Block this IP</button>
              </td>
            </tr>
          );
        })}
      </tbody>
    </table>
  );
}

3. Block bot IPs in your web application firewall

In this example, we are proxying the website through Cloudflare, using that as the web application firewall, and updating a custom ruleset via the Cloudflare API. The section below assumes basic familiarity with Cloudflare, but this approach applies to any WAF solution with API-editable access rules.


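export default async function blockIp(req, res) {
  const { ip } = req.body;

  // Save the blocked IP to the database
  await BlockedIpDbModel.upsert({
    ip,
    timestamp: new Date().toISOString(),
  });

  // Get all currently blocked IPs, most recent first
  const blockedIps = (
    await BlockedIpDbModel.findAll({
      order: [['timestamp', 'DESC']],
    })
  ).map((blockedIp) => blockedIp.ip);

  // Build the firewall rules from the list of blocked IPs
  const newRules = await buildFirewallRules(blockedIps);

  // Replace the Cloudflare custom ruleset with the new rules
  await updateRulesetUsingCloudflareAPI(newRules);

  return res.status(200).json({ result: 'success' });
}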

Note: In this example, we block IP addresses directly in the Cloudflare Custom Rules. We use our own database to keep track of currently blocked IPs and update the entire ruleset on every change. Alternatively, you could use Lists to store blocked IPs and update just that, bearing in mind the List's limitations around IPv6.

Cloudflare rule expressions are limited to 4096 characters, so you can fit a maximum of 84 IPv6 addresses per rule. The buildFirewallRules function splits the list of blocked IPs into multiple rules if necessary.

The free Cloudflare plan allows up to five custom rules, with more rules available on higher plans. These limitations will vary depending on your WAF provider, plan, and the chosen blocking approach (Cloudflare Lists can accommodate more IP addresses, for example).
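
The article does not spell out how the figure of 84 is derived. One reading that reproduces it is sketched below, under two assumptions: each address takes the longest textual IPv6 form of 45 characters (an IPv4-mapped address written out in full), and the rule expression has the format `http.x_forwarded_for in {"ip" "ip" ...}`. The helper names are hypothetical.

```javascript
const MAX_EXPRESSION_LENGTH = 4096;

// Builds a rule expression in the quoted, space-separated format assumed above.
function buildExpression(ips) {
  const ipList = ips.map((ip) => `"${ip}"`).join(' ');
  return `http.x_forwarded_for in {${ipList}}`;
}

// Counts how many addresses of a given text length fit into one expression.
function maxIpsPerRule(ipLength) {
  const sampleIp = 'f'.repeat(ipLength); // placeholder of the right length
  let count = 0;
  while (
    buildExpression(Array(count + 1).fill(sampleIp)).length <= MAX_EXPRESSION_LENGTH
  ) {
    count += 1;
  }
  return count;
}

console.log(maxIpsPerRule(45)); // 84 (worst-case IPv6 text form)
console.log(maxIpsPerRule(39)); // 96 (eight full 4-digit groups)
```

Sizing each rule for the worst case means a rule never exceeds the limit, whichever mix of IPv4 and IPv6 addresses it ends up holding.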



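import _ from 'lodash';

// Minimal shape of a rule as used in this article
type CloudflareRule = {
  action: string;
  description: string;
  expression: string;
};

const MAX_IPS_PER_RULE = 84;
const MAX_RULES = 5;
const MAX_BLOCKED_IPS = MAX_IPS_PER_RULE * MAX_RULES; // 420

export const buildFirewallRules = async (
  // Blocked IP addresses, most recent first
  blockedIps: string[],
  maxIpsPerRule = MAX_IPS_PER_RULE,
): Promise<CloudflareRule[]> => {
  // Split the list of blocked IPs into chunks that each fit into a single rule
  const chunks = _.chunk(blockedIps, maxIpsPerRule);

  // Build one rule expression per chunk
  const ruleExpressions = chunks.map((chunk) => {
    const ipList = chunk.map((ip) => `"${ip}"`).join(' ');
    return `http.x_forwarded_for in {${ipList}}`;
  });

  // Wrap each expression in a blocking rule
  const rules: CloudflareRule[] = ruleExpressions.map((expression, index) => ({
    action: 'block',
    description: `Block Bot IP addresses #${index + 1}`,
    expression,
  }));

  return rules;
};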
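
The simplified snippet above defines MAX_BLOCKED_IPS but never applies it. One way to enforce the cap before building rules is to keep only the most recently blocked addresses, as in this sketch. The `limitBlockedIps` helper is hypothetical, and it assumes the input list is already sorted newest first.

```javascript
const MAX_IPS_PER_RULE = 84;
const MAX_RULES = 5;
const MAX_BLOCKED_IPS = MAX_IPS_PER_RULE * MAX_RULES; // 420

// `blockedIpsNewestFirst` is assumed to be sorted by block time, newest first.
function limitBlockedIps(blockedIpsNewestFirst, maxBlockedIps = MAX_BLOCKED_IPS) {
  return {
    kept: blockedIpsNewestFirst.slice(0, maxBlockedIps),
    dropped: blockedIpsNewestFirst.slice(maxBlockedIps),
  };
}

// 500 made-up addresses from the IPv6 documentation range
const ips = Array.from({ length: 500 }, (_, i) => `2001:db8::${(i + 1).toString(16)}`);
const { kept, dropped } = limitBlockedIps(ips);
const rulesNeeded = Math.ceil(kept.length / MAX_IPS_PER_RULE);

console.log(kept.length, dropped.length, rulesNeeded); // 420 80 5
```

The `dropped` addresses could be deleted from the database or simply left unblocked until older blocks expire.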

Finally, use the Cloudflare API to update your custom ruleset with the rules compiled above. You are going to need:

Your Cloudflare API token (create one in the Cloudflare dashboard)
Your Cloudflare zone ID (find it in the Cloudflare dashboard)
Your Custom ruleset ID (find it using the Cloudflare API)

async function updateRulesetUsingCloudflareAPI(rules: CloudflareRule[]) {
  const apiToken = process.env.CLOUDFLARE_API_TOKEN ?? '';
  const zoneId = process.env.CLOUDFLARE_ZONE_ID ?? '';
  const customRulesetId = process.env.CLOUDFLARE_RULESET_ID ?? '';

  const url = `https://api.cloudflare.com/client/v4/zones/${zoneId}/rulesets/${customRulesetId}`;
  const options = {
    method: 'PUT',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiToken}`,
    },
    body: JSON.stringify({
      description: 'Custom ruleset for blocking Fingerprint-detected bot IPs',
      kind: 'root',
      name: 'default',
      phase: 'http_request_firewall_custom',
      rules,
    }),
  };

  const response = await fetch(url, options);
  if (!response.ok) {
    console.error(response.statusText, await response.json());
    throw new Error('Updating firewall ruleset failed', { cause: response.statusText });
  }

  return await response.json();
}
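
The custom ruleset ID from the list of prerequisites can be looked up by listing the zone's rulesets and picking the one attached to the custom firewall phase. The sketch below assumes Cloudflare's standard response envelope with a `result` array; the function names are hypothetical, and the lookup is split so that the selection logic can be checked without a network call.

```javascript
// Picks the ID of the ruleset attached to the custom firewall rules phase
// from a "list zone rulesets" response body. Returns null if there is none.
function findCustomRulesetId(listRulesetsResponse) {
  const rulesets = listRulesetsResponse.result ?? [];
  const customRuleset = rulesets.find(
    (ruleset) => ruleset.phase === 'http_request_firewall_custom',
  );
  return customRuleset?.id ?? null;
}

// Fetches the zone's rulesets from the Cloudflare API and extracts the ID.
async function fetchCustomRulesetId(zoneId, apiToken) {
  const url = `https://api.cloudflare.com/client/v4/zones/${zoneId}/rulesets`;
  const response = await fetch(url, {
    headers: { Authorization: `Bearer ${apiToken}` },
  });
  if (!response.ok) {
    throw new Error(`Listing rulesets failed with status ${response.status}`);
  }
  return findCustomRulesetId(await response.json());
}
```

A `null` result would suggest that no custom rules exist for the zone yet, in which case the ruleset has to be created first.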

4. Unblock IP addresses

Unblocking an IP address works the same way as blocking one, except that you delete the IP from the table of blocked IPs instead of adding it. Then, update the Cloudflare ruleset as before.

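export default async function unblockIp(req, res) {
  const { ip } = req.body;

  // Remove the IP from the database and wait for the deletion to finish,
  // so the list fetched below no longer contains it
  await BlockedIpDbModel.destroy({
    where: {
      ip,
    },
  });

  // Rebuild the rules from the remaining blocked IPs and update Cloudflare
  const blockedIps = await getBlockedIps();
  const newRules = await buildFirewallRules(blockedIps);
  await updateRulesetUsingCloudflareAPI(newRules);
  return res.status(200).json({ result: 'success' });
}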

5. Define a time limit for blocking IP addresses

You might want to block IP addresses only for a limited time, and then unblock them automatically. Cloudflare does not support defining a time-to-live on its custom rules. But you can create a simple cron job that runs periodically, deletes expired IP blocks, and updates the Cloudflare ruleset:

import { schedule } from 'node-cron';
// Query operators, assuming the database models are Sequelize models
import { Op } from 'sequelize';

// Run the cleanup every 5 minutes
schedule('*/5 * * * *', () => {
  deleteOldIpBlocks().catch(console.error);
});

// Keep each IP blocked for one hour
const IP_BLOCK_TIME_TO_LIVE_MS = 1000 * 60 * 60;

async function deleteOldIpBlocks() {
  // Delete all IP blocks older than the time to live
  await BlockedIpDbModel.destroy({
    where: {
      timestamp: {
        [Op.lt]: new Date(Date.now() - IP_BLOCK_TIME_TO_LIVE_MS).toISOString(),
      },
    },
  });

  // Rebuild the rules from the remaining blocked IPs and update Cloudflare
  const blockedIps = await getBlockedIps();
  const newRules = await buildFirewallRules(blockedIps);
  await updateRulesetUsingCloudflareAPI(newRules);
}
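
The expiry rule itself is simple enough to check in isolation. The sketch below applies the same cutoff as the database query above to an in-memory list, which can be handy for unit-testing the time-to-live logic. The `partitionIpBlocks` helper is hypothetical; it assumes each record carries an ISO 8601 `timestamp`, as saved in Step 3.

```javascript
const IP_BLOCK_TIME_TO_LIVE_MS = 1000 * 60 * 60; // one hour

// Splits IP blocks into those still active and those older than the TTL.
function partitionIpBlocks(ipBlocks, nowMs, ttlMs = IP_BLOCK_TIME_TO_LIVE_MS) {
  const cutoffMs = nowMs - ttlMs;
  const active = [];
  const expired = [];
  for (const ipBlock of ipBlocks) {
    // Strictly older than the cutoff means expired (mirrors Op.lt above)
    if (Date.parse(ipBlock.timestamp) < cutoffMs) {
      expired.push(ipBlock);
    } else {
      active.push(ipBlock);
    }
  }
  return { active, expired };
}
```

With a cron schedule of `*/5 * * * *`, a block can outlive its time to live by up to five minutes, since expired entries are only removed on the next run.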

Explore the Bot Firewall Demo

We built a fully open-source Bot Firewall demo to demonstrate the concepts above. Try it to see how you can use Fingerprint Bot Detection in combination with your web application firewall to better protect yourself from malicious bots.

To prevent users from interfering with other people's demo experience, the demo only allows you to block your own IP. To try it out, visit the web scraping demo as a bot using a locally running browser automation tool like Puppeteer or Playwright.

For example, assuming you already have Node and NPM installed, you can visit the page using Playwright:

Run mkdir bot-firewall-test && cd bot-firewall-test.
Run npm init -y.
Run npm install playwright.
Run npx playwright install.
Create an index.js file like below. Note that the bot must spend enough time on the page for the Fingerprint JS agent to load and identify it.

const playwright = require("playwright");

(async () => {
  const browser = await playwright["chromium"].launch();
  const context = await browser.newContext();
  const page = await context.newPage();
  await page.goto("https://demo.fingerprint.com/web-scraping");
  await page.waitForTimeout(3000);
  console.log(await page.getByRole("heading").first().textContent());
  await browser.close();
})();

Run node index.js. Your bot visit will be saved to the database.
Open the demo to see your bot visit.
Click Block this IP.
Run node index.js again or just visit the web scraping page using your regular browser.

The bot's IP address is now blocked, so the page will not load at all.

Note: If you use iCloud Private Relay, Safari will use a different IP address than your local bot, so the page may still load in Safari, but the bot itself remains blocked.

Feel free to jump into the GitHub repo to see the full source code. If you have any questions or want to learn more about Fingerprint Bot Detection, you can join our Discord server, get in touch with our sales team, or start a 14-day free trial.
