A Cloudflare rule to deter bots
Have you ever searched for a Cloudflare Firewall rule to slow bots down? Well, here is one. As is well known, most malicious bots ignore robots.txt directives. This rule will turn away most bots of that kind, though certainly not all of them, because that is impossible.
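How does it work? Every request is checked, and its User-Agent string is converted to lowercase. If the User-Agent contains at least one of the listed phrases, the request matches and the bot is blocked (with the rule's action set to Block). A request with an empty User-Agent also matches, as does one that claims to be Googlebot but is not recognized by Cloudflare as a verified bot (cf.client.bot).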
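To make the matching logic concrete, here is a minimal Python sketch that imitates what the expression does: lowercase the User-Agent, then run a plain substring test against a list of phrases. It is only an illustration under stated assumptions: `BOT_PHRASES` is a hypothetical, shortened subset of the rule's phrases, and `matches_bot_rule` is a hypothetical helper, not anything Cloudflare provides.

```python
# Hypothetical, shortened subset of the phrases used in the Cloudflare rule.
BOT_PHRASES = ("curl", "python", "go-http-client", "scrapy", "headlesschrome")


def matches_bot_rule(user_agent: str) -> bool:
    """Imitate the rule: an empty User-Agent, or one containing any phrase, matches."""
    ua = user_agent.lower()  # same idea as lower(http.user_agent)
    if ua == "":  # corresponds to: lower(http.user_agent) eq ""
        return True
    # Each `contains` clause is a plain substring test; the clauses are joined with `or`.
    return any(phrase in ua for phrase in BOT_PHRASES)


if __name__ == "__main__":
    samples = (
        "curl/8.5.0",
        "Scrapy/2.11 (+https://scrapy.org)",
        "",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0",
    )
    for ua in samples:
        print(repr(ua), "->", matches_bot_rule(ua))
```

Because the check is a substring test on the lowercased string, "Scrapy" and "scrapy" are treated the same, which is why the rule lists every phrase in lowercase.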
As I said, this does not stop every bot, but it does repel those that send their tool's default User-Agent string.
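The difference between a default and a custom User-Agent is easy to see with Python's standard `urllib`, which by default sends a string of the form `Python-urllib/3.x`. The sketch below reads that default from an opener built by `urllib.request.build_opener()` (no network request is made) and checks it against a hypothetical two-phrase subset of the rule; the helper names are illustrative only.

```python
import urllib.request

# Hypothetical subset of the rule's phrases that a default urllib request would hit.
PHRASES = ("python", "urllib")


def rule_matches(user_agent: str) -> bool:
    """Substring check in the spirit of the Cloudflare rule (subset only)."""
    ua = user_agent.lower()
    return any(p in ua for p in PHRASES)


def default_urllib_user_agent() -> str:
    """Return the User-Agent urllib sends when none is set explicitly."""
    opener = urllib.request.build_opener()
    # OpenerDirector keeps its default headers as a list of (name, value) pairs.
    return dict(opener.addheaders)["User-agent"]


if __name__ == "__main__":
    default_ua = default_urllib_user_agent()
    custom_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0"
    print(default_ua, "->", rule_matches(default_ua))  # default string is caught
    print(custom_ua, "->", rule_matches(custom_ua))    # a spoofed string slips through
```

The second line of output illustrates the limitation: a scraper only has to send a browser-like User-Agent to pass a rule based on User-Agent phrases.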
(lower(http.user_agent) contains "curl")
or (lower(http.user_agent) contains "java")
or (lower(http.user_agent) contains "python")
or (lower(http.user_agent) eq "")
or (lower(http.user_agent) contains "go-http-client")
or (lower(http.user_agent) contains "apache-httpclient")
or (lower(http.user_agent) contains "headlesschrome")
or (lower(http.user_agent) contains "phantomjs")
or (lower(http.user_agent) contains "axios")
or (lower(http.user_agent) contains "scrapy")
or (lower(http.user_agent) contains "urllib")
or (lower(http.user_agent) contains "puppeteer")
or (lower(http.user_agent) contains "zombie")
or (lower(http.user_agent) contains "mysuperuseragent")
or (lower(http.user_agent) contains "faraday")
or (lower(http.user_agent) contains "aiohttp")
or (lower(http.user_agent) contains "httpx")
or (lower(http.user_agent) contains "libwww-perl")
or (lower(http.user_agent) contains "httpunit")
or (lower(http.user_agent) contains "nutch")
or (lower(http.user_agent) contains "phpcrawl")
or (lower(http.user_agent) contains "mechanicalsoup")
or (lower(http.user_agent) contains "geturl")
or (lower(http.user_agent) contains "semrushbot")
or (lower(http.user_agent) contains "ahrefsbot")
or (lower(http.user_agent) contains "uptimerobot")
or (lower(http.user_agent) contains "petalbot")
or (lower(http.user_agent) contains "aspiegelbot")
or (lower(http.user_agent) contains "dotbot")
or (lower(http.user_agent) contains "leechftp")
or (lower(http.user_agent) contains "masscan")
or (lower(http.user_agent) contains "facebookscraper")
or (lower(http.user_agent) contains "majestic")
or (lower(http.user_agent) contains "linkbot")
or (lower(http.user_agent) contains "extractor")
or (lower(http.user_agent) contains "download")
or (lower(http.user_agent) contains "scrape")
or (lower(http.user_agent) contains "stats")
or (lower(http.user_agent) contains "harvest")
or (lower(http.user_agent) contains "steal")
or (lower(http.user_agent) contains "copy")
or (lower(http.user_agent) contains "take")
or (lower(http.user_agent) contains "scan")
or (lower(http.user_agent) contains "smart")
or (lower(http.user_agent) contains "stealth")
or (lower(http.user_agent) contains "fastify")
or (lower(http.user_agent) contains "bypass")
or (lower(http.user_agent) contains "payload")
or (lower(http.user_agent) contains "scrapingbee")
or (lower(http.user_agent) contains "scraping")
or (lower(http.user_agent) contains "node.js")
or (lower(http.user_agent) contains "wordpress")
or (lower(http.user_agent) contains "infobot")
or (lower(http.user_agent) contains "grapeshotcrawler")
or (lower(http.user_agent) contains "googlebot" and not cf.client.bot)
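The last clause is different from the others: it targets requests that claim to be Googlebot but are not in Cloudflare's list of verified bots. The Python sketch below evaluates a reduced version of the whole rule. It is a sketch under assumptions: `verified_bot` is a hypothetical boolean standing in for `cf.client.bot`, which only Cloudflare can compute, and `PHRASES` is a hypothetical subset that keeps the short phrases "smart" and "take" on purpose.

```python
# Hypothetical subset of the rule's phrases; "smart" and "take" are kept
# deliberately to show how short substrings behave.
PHRASES = ("curl", "python", "scrapy", "smart", "take")


def rule_matches(user_agent: str, verified_bot: bool) -> bool:
    """Evaluate a reduced version of the rule.

    `verified_bot` stands in for Cloudflare's cf.client.bot field, which
    this sketch cannot compute on its own and has to be given.
    """
    ua = user_agent.lower()
    if ua == "" or any(p in ua for p in PHRASES):
        return True
    # Last clause: claims to be Googlebot but is not a verified bot.
    return "googlebot" in ua and not verified_bot


if __name__ == "__main__":
    googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    tv_ua = ("Mozilla/5.0 (SMART-TV; Linux; Tizen 5.0) AppleWebKit/537.36 "
             "(KHTML, like Gecko) SamsungBrowser/2.2 Chrome/63.0.3239.84 TV Safari/537.36")
    print("real Googlebot:", rule_matches(googlebot_ua, verified_bot=True))
    print("fake Googlebot:", rule_matches(googlebot_ua, verified_bot=False))
    print("smart-TV browser:", rule_matches(tv_ua, verified_bot=False))
```

The third case shows a side effect of substring matching: a smart-TV browser whose User-Agent contains "SMART-TV" matches the phrase "smart" even though a person is using it. It may be worth running the rule with a non-blocking action and reviewing what it catches in your own traffic before switching it to Block.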
All information presented here is the author's personal opinion. If anything is unclear, please get in touch by email: admin@artefaktas.eu
Artefaktas.eu is licensed under CC BY-NC-ND 4.0