The bot-detect plugin identifies web crawlers and prevents them from crawling websites.
Configuration Fields
| Name | Type | Requirement | Default Value | Description |
|------|------|-------------|---------------|-------------|
| allow | array of string | Optional | - | Regular expressions matched against the User-Agent request header; the request is allowed if any of them matches |
| deny | array of string | Optional | - | Regular expressions matched against the User-Agent request header; the request is blocked if any of them matches |
| blocked_code | number | Optional | 403 | The HTTP status code returned when a request is blocked |
| blocked_message | string | Optional | - | The HTTP response body returned when a request is blocked |
When neither the allow field nor the deny field is configured, the default crawler-detection logic is executed. Configuring the allow field releases requests that the default logic would otherwise classify as crawlers; configuring the deny field adds extra detection rules on top of the default logic.
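For illustration only, both fields can be set in the same configuration. The patterns below are assumptions chosen for this sketch, not values shipped with the plugin:

```yaml
# Sketch: release one client while adding an extra crawler pattern.
allow:
  - ".*my-trusted-client.*"   # hypothetical client to release even if other checks flag it
deny:
  - "bad-bot.*"               # hypothetical User-Agent pattern to treat as a crawler
```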
The default set of crawler judgment regular expressions is as follows:
Configuration Samples
Release Requests that would otherwise Hit the Crawler Rules
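A minimal sketch of such an allow configuration, assuming the goal is to release Go's default HTTP client (whose User-Agent contains Go-http-client), could look like this:

```yaml
# Sketch: release requests whose User-Agent matches Go's default HTTP client.
# The regular expression is an illustrative assumption.
allow:
  - ".*Go-http-client.*"
```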
Without this configuration, requests sent by Go's default HTTP library would be treated as crawler traffic and denied.
Add Crawler Judgement
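The deny field extends the default detection. In the sketch below, the User-Agent pattern, status code, and response body are illustrative assumptions:

```yaml
# Sketch: add an extra crawler pattern and customize the blocked response.
deny:
  - "spd-tools.*"                           # hypothetical pattern used for this example
blocked_code: 429                           # optional; default is 403
blocked_message: "blocked by bot-detect"    # optional response body
```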
According to this configuration, the following requests will be denied:
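For instance, requests such as the following (hypothetical User-Agent values that match the pattern in the sketch above) would be blocked:

```bash
curl http://example.com -H 'User-Agent: spd-tools/1.1'
curl http://example.com -H 'User-Agent: spd-tools'
```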
Only Enabled for Specific Routes or Domains
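A sketch of such a fine-grained configuration using the _rules_ field (the nested allow and deny patterns are illustrative assumptions; the route and domain names come from the explanation below):

```yaml
# Sketch: enable the plugin with different settings per route and per domain.
_rules_:
  # Rule 1: applied when the route name matches.
  - _match_route_:
      - route-a
      - route-b
    allow:
      - ".*Go-http-client.*"
  # Rule 2: applied when the request domain matches.
  - _match_domain_:
      - "*.example.com"
      - test.com
    deny:
      - "spd-tools.*"
```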
In the _match_route_ rule sample, route-a and route-b are the route names specified when the gateway routes were created; when the name of the current route matches one of these entries, that rule is applied. In the _match_domain_ rule sample, *.example.com and test.com are the domains used to match the request; when the current request's domain matches one of these entries, that rule is applied. Rules are evaluated in the order they appear in the _rules_ field: the first matching rule is applied and all remaining rules are ignored.