Commit 0e5cae8

Merge pull request #125 from ByteInternet/update-429-docs
Improve ratelimiting documentation
2 parents 7cef83e + 007a73a commit 0e5cae8

3 files changed

+41
-37
lines changed

docs/best-practices/performance/how-to-enable-pagespeed-booster.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -99,7 +99,7 @@ The steps include:
9999
- Configure SSL and DNS
100100
- Configuring Varnish
101101
- Modifying your Varnish VCL configuration
102-
- Add the user agent **PSB**to the [allowlist for the ratelimiter](../../hypernode-platform/nginx/how-to-resolve-rate-limited-requests-429-too-many-requests.md#whitelisting-additional-user-agents) in `~/nginx/http.ratelimit` file
102+
- Add the user agent **PSB** to the [allowlist for the ratelimiter](../../hypernode-platform/nginx/how-to-resolve-rate-limited-requests-429-too-many-requests.md#allowlisting-additional-user-agents) in the `~/nginx/http.ratelimit` file
103103
- Turn off ESI Block Parsing
104104
- Add PageSpeed Booster as Flush Target
105105

docs/hypernode-platform/nginx/how-to-resolve-rate-limited-requests-429-too-many-requests.md

Lines changed: 39 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -27,28 +27,30 @@ On Hypernode we currently differentiate between two rate limiting methods and th
2727
- Rate limiting based on User Agents and requests per second (zone `bots`)
2828
- Rate limiting based on requests per IP address (zone `zoneperip`)
2929

30-
Both methods are implemented using [this module](http://nginx.org/en/docs/http/ngx_http_limit_req_module.html)
30+
Both methods are implemented using [Nginx's limit_req module](http://nginx.org/en/docs/http/ngx_http_limit_req_module.html).
3131

3232
### Determining the Applied Rate Limiting Method
3333

3434
You can quickly determine which rate limiting method caused a request to be rejected with a 429, because each time one of the rate limiting methods is hit, a message is logged in the Nginx error log.
3535

36-
To do so you first look up the request in the access logs, which can be done using the hypernode-parse-nginx-logs (**pnl**) command: `pnl --today --fields time,status,remote_addr,request,ua --filter status=429`
36+
To look for rate limiting messages in the error log, you can run the following command:
3737

38-
Copy the IP address from the output generated by this command and look up the corresponding log entry in the aforementioned Nginx error log with `cat /var/log/nginx/error.log | grep "1.2.3.4"`
39-
40-
These entries look as follows:
38+
```console
39+
$ grep limiting.requests /var/log/nginx/error.log
40+
2020/06/07 13:33:37 [error] limiting requests, excess: 0.072 by zone "bots", client: 203.0.113.104, server: example.hypernode.io, request: "GET /api/ HTTP/2.0", host: "example.hypernode.io"
41+
2020/06/07 13:33:37 [error] limiting connections by zone "zoneperip", client: 198.51.100.69, server: example.hypernode.io, request: "POST /admin/ HTTP/2.0", host: "example.hypernode.io"
42+
```
4143

4244
A log entry where rate limit is applied to user-agents and requests per second (based on the `bots` zone):
4345

44-
```nginx
45-
2016/08/15 18:25:54 [error] 11372#11372: *45252 limiting requests, excess: 0.586 by zone "bots", client: 1.2.3.4, server: , request: "GET /azie/flip.html HTTP/1.1", host: "www.example.nl"
46+
```
47+
2020/06/07 13:33:37 [error] limiting requests, excess: 0.072 by zone "bots", client: 203.0.113.104, server: example.hypernode.io, request: "GET /api/ HTTP/2.0", host: "example.hypernode.io"
4648
```
4749

4850
A log entry where the rate limit is applied per IP address (based on the `zoneperip` zone):
4951

5052
```
51-
2016/08/12 10:23:39 [error] 25118#25118: *24362 limiting connections by zone "zoneperip", client: 1.2.3.4, server: , request: "GET /index.php/admin/abcdef/ HTTP/1.1", host: "www.example.nl", referrer: "http://example.nl/index.php/admin/abcdef/"
53+
2020/06/07 13:33:37 [error] limiting connections by zone "zoneperip", client: 198.51.100.69, server: example.hypernode.io, request: "POST /admin/ HTTP/2.0", host: "example.hypernode.io"
5254
```
5355

5456
**Note: Per IP rate limiting only applies to requests handled by PHP and not to the static content.**
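To see at a glance which zone is responsible for most rejections, the error log entries shown above can be tallied per zone. The following is a minimal sketch, not part of the Hypernode tooling: the function name `count_by_zone` is hypothetical, and it assumes the log lines have the `by zone "<name>"` format shown in the examples above.

```shell
# Hypothetical helper: reads Nginx error log lines on stdin and prints
# "<count> <zone>" for every rate limiting zone that appears in them.
count_by_zone() {
    grep -E 'limiting (requests|connections)' \
        | sed -E 's/.*by zone "([^"]+)".*/\1/' \
        | sort \
        | uniq -c \
        | awk '{print $1, $2}'
}

# Example usage (path taken from the command shown earlier):
#   count_by_zone < /var/log/nginx/error.log
```

A high count for `bots` points at User Agent based limiting, while a high count for `zoneperip` points at the per IP mechanism.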
@@ -68,35 +70,41 @@ Some bots are default exempt from rate limitings, like Google, Bing, and several
6870
```nginx
6971
map $http_user_agent $limit_bots {
7072
default '';
71-
~*(google|bing|heartbeat|uptimerobot|shoppimon|facebookexternal|monitis.com|Zend_Http_Client|magereport.com|SendCloud|Adyen|contentkingapp|GuzzleHttp|Mollie) '';
72-
~*(http|crawler|spider|bot|search|Wget/|Python-urllib|PHPCrawl|bGenius|MauiBot) 'bot';
73-
}
74-
73+
~*(google|bing|heartbeat|uptimerobot|shoppimon|facebookexternal|monitis.com|Zend_Http_Client|magereport.com|SendCloud/|Adyen|ForusP|contentkingapp|node-fetch|Hipex) '';
74+
~*(http|crawler|spider|bot|search|Wget|Python-urllib|PHPCrawl|bGenius|MauiBot|aspiegel) 'bot';
75+
}
7576
```
7677

7778
**Note: do not remove the heartbeat entry, as this will break the monitoring of your Hypernode!**
7879

7980
As you can see, this sorts all visitors into two groups:
8081

81-
- On the first (whitelist) line, you find the keywords that are exempt from the rate liming, like: google’, ‘bing’, ‘heartbeat, or ‘monitis.com
82-
- On the second (blacklist) line, you will find the keyword for generic and abusive bots and crawlers, which will always be rate limited, like crawler, spider, bot
82+
- On the first line, the allowlist, you find the keywords that are exempt from rate limiting, like `google`, `bing`, `heartbeat`, or `magereport.com`.
83+
- The second line contains keywords for generic and abusive bots and crawlers, which can trigger the ratelimiter, like `crawler`, `spider`, or `bot`.
8384

8485
The keywords are separated by `|` characters since it is a regular expression.
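To preview how a given User Agent would be sorted before editing the map, the two lines can be approximated with `grep`. This is only a rough sketch: the function name `classify_user_agent` and the labels it prints are hypothetical, the patterns are copied from the map above, and `grep -E` is used as a stand-in for the regex engine Nginx uses. It relies on Nginx's `map` picking the first matching regular expression in order of appearance, which is why the allowlist is checked first.

```shell
# Patterns copied from the two regex lines of the map shown above.
ALLOWLIST_RE='google|bing|heartbeat|uptimerobot|shoppimon|facebookexternal|monitis.com|Zend_Http_Client|magereport.com|SendCloud/|Adyen|ForusP|contentkingapp|node-fetch|Hipex'
BOT_RE='http|crawler|spider|bot|search|Wget|Python-urllib|PHPCrawl|bGenius|MauiBot|aspiegel'

# Hypothetical helper: prints "exempt", "bot" or "default" for a User Agent.
# The first matching line wins; -i mirrors the case-insensitive ~* operator.
classify_user_agent() {
    if printf '%s\n' "$1" | grep -Eiq "($ALLOWLIST_RE)"; then
        echo 'exempt'     # allowlist line: $limit_bots stays ''
    elif printf '%s\n' "$1" | grep -Eiq "($BOT_RE)"; then
        echo 'bot'        # second line: $limit_bots becomes 'bot'
    else
        echo 'default'    # no line matched: $limit_bots stays ''
    fi
}

classify_user_agent 'SpecialSnowflakeCrawler 3.1.4'   # prints "bot"
classify_user_agent 'Googlebot/2.1'                   # prints "exempt"
```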
8586

86-
### Whitelisting Additional User Agents
87+
### Allowlisting Additional User Agents
88+
89+
To extend the allowlist, first determine which user agent you wish to add. Use the access log files to see which bots get blocked and which user agent they identify themselves with. To find the user agent, you can use the following command:
90+
91+
```console
92+
$ pnl --today --fields time,status,remote_addr,request,user_agent --filter status=429
93+
2020-06-07T13:33:37+00:00 429 203.0.113.104 GET /api/ HTTP/2.0 SpecialSnowflakeCrawler 3.1.4
94+
2020-06-07T13:35:37+00:00 429 203.0.113.104 GET /api/ HTTP/2.0 SpecialSnowflakeCrawler 3.1.4
95+
```
8796

88-
To extend the whitelist, first determine what user agent you wish to add. Use the access log files to see what bots get blocked and which user agent identification it uses. Say the bot we want to add has the User Agent `SpecialSnowflakeCrawler 3.1.4`. Which contains the word ‘crawler’, so it matches the second regular expression and is labeled as a bot. Since the whitelist line overrules the blacklist line, the best way to allow this bot is to add their user agent to the whitelist instead of removing ‘crawler’ from the blacklist:
97+
In the example above you can see that a bot with the User Agent `SpecialSnowflakeCrawler 3.1.4` triggered the ratelimiter. As it contains the word `crawler`, it matches the second regular expression and is labeled as a bot. Since the allowlist line overrules the denylist line, the best way to allow this bot is to add its user agent to the allowlist instead of removing `crawler` from the denylist:
8998

9099
```nginx
91100
map $http_user_agent $limit_bots {
92101
default '';
93-
~*(specialsnowflakecrawler|google|bing|heartbeat|uptimerobot|shoppimon|facebookexternal|monitis.com|Zend_Http_Client|magereport.com|SendCloud|Adyen|contentkingapp|GuzzleHttp) '';
94-
~*(http|crawler|spider|bot|search|Wget/|Python-urllib|PHPCrawl|bGenius|MauiBot) 'bot';
102+
~*(specialsnowflakecrawler|google|bing|heartbeat|uptimerobot|shoppimon|facebookexternal|monitis.com|Zend_Http_Client|magereport.com|SendCloud/|Adyen|ForusP|contentkingapp|node-fetch|Hipex) '';
103+
~*(http|crawler|spider|bot|search|Wget|Python-urllib|PHPCrawl|bGenius|MauiBot|aspiegel) 'bot';
95104
}
96-
97105
```
98106

99-
Instead of adding the complete User Agent to the regex, it’s often better to limit it to just an identifying keyword, as shown above. The reason behind this is that the string is evaluated as a Regular Expression, which means that extra care needs to be taken when adding anything other than alphanumeric characters.
107+
Instead of adding the complete User Agent to the regex, it’s often better to limit it to just an identifying keyword, as shown above. The reason behind this is that the string is evaluated as a Regular Expression, which means that extra care needs to be taken when adding anything other than alphanumeric characters. Also, because user agents may change slightly over time, an entry that matches the complete string may stop matching, after which the bot would no longer be allowlisted.
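The effect of non-alphanumeric characters can be checked by hand. The sketch below uses `grep -E` as an approximation of the regex matching Nginx performs; the helper name `ua_matches` is hypothetical. It shows that an unescaped `.` matches any character, so a keyword such as `monitis.com` is broader than it looks unless the dot is escaped.

```shell
# Hypothetical helper: succeeds when the pattern ($1) matches the
# User Agent string ($2), ignoring case.
ua_matches() {
    printf '%s\n' "$2" | grep -Eiq "$1"
}

# An unescaped dot matches any single character:
ua_matches 'monitis.com' 'monitisXcom' && echo 'unescaped dot: match'

# An escaped dot only matches a literal dot:
ua_matches 'monitis\.com' 'monitisXcom' || echo 'escaped dot: no match'
```

A plain alphanumeric keyword avoids this class of surprise altogether.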
100108

101109
### Known Rate Limited Plugins and Service Provider
102110

@@ -115,50 +123,47 @@ Besides the above-known plugins that will hit the blacklisted keyword, `http.rat
115123

116124
To prevent a single IP from using all the FPM workers available simultaneously, leaving no workers available for other visitors, we implemented a per IP rate limit mechanism. This mechanism sets a maximum amount of PHP-FPM workers that can be used by one IP to 20. This way, one single IP address cannot deplete all the available FPM workers, leaving other visitors with an error page or a non-responding site.
117125

118-
**Please note:** if [Hypernode Managed Vhosts](hypernode-managed-vhosts.md) is enabled, only add the `http.conn_ratelimit` file in the Nginx root. Don't add it to the specific vhost as well, as these files will cancel each other out.
126+
**Please note:** if [Hypernode Managed Vhosts](hypernode-managed-vhosts.md) is enabled, only add the `http.ratelimit` file in the Nginx root. Don't add it to the specific vhost as well, as this may cause conflicts.
119127

120128
### Exclude IP Addresses from the per IP Rate Limiting
121129

122-
In some cases, it might be necessary to exclude specific IP addresses from the per IP rate limiting. If you wish to exclude an IP address, you can do so by creating a config file called `/data/web/nginx/http.conn_ratelimit` with the following content:
130+
In some cases, it might be necessary to exclude specific IP addresses from the per IP rate limiting. If you wish to exclude an IP address, you can do so by creating a config file called `/data/web/nginx/http.ratelimit` with the following content:
123131

124132
```nginx
125133
geo $conn_limit_map {
126134
default $remote_addr;
127-
1.2.3.4 '';
135+
198.51.100.69 '';
128136
}
129-
130137
```
131138

132-
In this example, we have excluded the IP address **1.2.3.4** by setting an empty value in the form of `''`.
139+
In this example, we have excluded the IP address **198.51.100.69** by setting an empty value in the form of `''`.
133140

134-
In addition to whitelisting one single IP address, it is also possible to whitelist a whole range of IP addresses. You can do this by using the so-called CIDR notation (e.g., 10.0.0.0/24 to whitelist all IP addresses within the range 10.0.0.0 to 10.0.0.255). In that case, you can use the following snippet in `/data/web/nginx/http.conn_ratelimit` instead:
141+
In addition to excluding a single IP address, it is also possible to exclude a whole range of IP addresses. You can do this by using the so-called CIDR notation (e.g., 198.51.100.0/24 to allow all IP addresses within the range 198.51.100.0 to 198.51.100.255). In that case, you can use the following snippet in `/data/web/nginx/http.ratelimit` instead:
135142

136143
```nginx
137144
geo $conn_limit_map {
138145
default $remote_addr;
139-
10.0.0.0/24 '';
146+
198.51.100.0/24 '';
140147
}
141-
142148
```
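Nginx's `geo` block does the range matching itself, but before adding a range it can be useful to double-check by hand which addresses a CIDR block covers. The following is a small sketch for IPv4 only; the function names `ip_to_int` and `in_cidr` are hypothetical and not part of any Hypernode tooling.

```shell
# Hypothetical helper: converts a dotted IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address on "." into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Hypothetical helper: succeeds when address $1 lies inside CIDR range $2.
in_cidr() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")    # part before the slash
    bits=${2#*/}                  # prefix length after the slash
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

in_cidr 198.51.100.69 198.51.100.0/24 && echo 'inside the range'
in_cidr 198.51.101.1 198.51.100.0/24 || echo 'outside the range'
```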
143149

144150
### Disable per IP Rate Limiting
145151

146152
When your shop performance is very poor, it’s possible all your FPM workers are busy just serving regular traffic. Handling a request takes so much time that all workers are continuously depleted by a small number of visitors. We highly recommend optimizing your shop for speed and a temporary upgrade to a bigger plan if this situation arises. Disabling the rate limit will not fix this problem but only change the error message from a `Too many requests` error to a timeout error.
147153

148-
For debugging purposes, however, it could be helpful to disable the per-IP connection limit for all IP’s. With the following snippet in `/data/web/nginx/http.conn_ratelimit` , it is possible to altogether disable IP based rate limiting:
154+
For debugging purposes, however, it could be helpful to disable the per-IP connection limit for all IPs. With the following snippet in `/data/web/nginx/http.ratelimit`, it is possible to disable IP based rate limiting altogether:
149155

150156
```nginx
151157
geo $conn_limit_map {
152158
default '';
153159
}
154-
155160
```
156161

157162
**Warning: Only use this setting for debugging purposes! Using this setting on production Hypernodes is highly discouraged, as your shop can be easily taken offline by a single IP using slow and/or flood attacks.**
158163

159164
### Exclude Specific URLs from the per IP Rate Limiting Mechanism
160165

161-
To exclude specific URLs from being rate-limited you can create a file `/data/web/nginx/before_redir.ratelimit_exclude` with the following content (this could also be done in a http.\* file):
166+
To exclude specific URLs from being rate-limited you can create a file `/data/web/nginx/server.ratelimit` with the following content:
162167

163168
```nginx
164169
set $ratelimit_request_url "$remote_addr";
@@ -169,18 +174,18 @@ if ($request_uri ~ ^\/(.*)\/rest\/V1\/example-call\/(.*) ) {
169174
if ($request_uri ~ ^\/elasticsearch.php$ ) {
170175
set $ratelimit_request_url '';
171176
}
172-
173177
```
174178

175-
In the example above, the URLs `*/rest/V1/example-call/*` and `/elasticsearch.php` are the ones that have to be excluded. You can now use the `$ratelimit_request` variable in the file `/data/web/nginx/http.conn_ratelimit` (see the example below) to exclude these URLs from the rate limiter and make sure that bots and crawlers will still be rate limited based on their User Agent.
179+
In the example above, the URLs `*/rest/V1/example-call/*` and `/elasticsearch.php` are the ones that have to be excluded. You now have to use the `$ratelimit_request_url` variable as the default value in the file `/data/web/nginx/http.ratelimit` (see below) to exclude these URLs from the rate limiter and make sure that bots and crawlers will still be rate limited based on their User Agent.
176180

177181
```nginx
178182
geo $conn_limit_map {
179183
default $ratelimit_request_url;
180184
}
181-
182185
```
183186

187+
You can also combine this with a regular allowlist and exclude IP addresses as described above.
188+
184189
### How to Serve a Custom Static Error Page to Rate Limited IP Addresses
185190

186191
If you would like to, you may serve a custom error page to IP addresses that are rate limited. Simply create a static HTML file in `/data/web/public` with any content that you wish to show to these rate-limited IP addresses. Furthermore, you need to create an Nginx configuration file called `/data/web/nginx/server.custom_429` as well. The content of this file should be as follows:
@@ -191,7 +196,6 @@ location = /ratelimited.html {
191196
root /data/web/public;
192197
internal;
193198
}
194-
195199
```
196200

197201
This snippet will serve a custom static file called `ratelimited.html` to IP addresses that are using too many PHP workers.

docs/troubleshooting/performance/how-to-implement-pagespeed-booster.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -64,7 +64,7 @@ To setup PageSpeed Booster on your development environment you'll need a unique
6464
`hypernode-manage-vhosts psb.example.com --https --force-https --varnish`
6565
1. Now point the DNS to the PageSpeed Booster instance with the records you got at the PageSpeed Booster page in your Control Panel.
6666
1. Make sure Varnish is enabled on the server: hypernode-systemctl settings varnish_enabled.
67-
1. Add the user agent **PSB**to the [allowlist for the ratelimiter](../../hypernode-platform/nginx/how-to-resolve-rate-limited-requests-429-too-many-requests.md#whitelisting-additional-user-agents) in\*\*~/nginx/http.ratelimit\*\* file.
67+
1. Add the user agent **PSB** to the [allowlist for the ratelimiter](../../hypernode-platform/nginx/how-to-resolve-rate-limited-requests-429-too-many-requests.md#allowlisting-additional-user-agents) in the **~/nginx/http.ratelimit** file.
6868
1. Disable the [basic-authentication](../../hypernode-platform/nginx/basic-authentication-on-hypernode-development-plans.md#disable-the-basic-authentication) on the development Hypernode.
6969

7070
### Configuring Varnish
