-
Notifications
You must be signed in to change notification settings - Fork 0
misc: d8-2639 #374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 3.0.x
Are you sure you want to change the base?
misc: d8-2639 #374
Conversation
add turnstile middlware
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds Cloudflare Turnstile bot protection middleware to defend against automated traffic, particularly targeting faceted search requests that can cause high server load. The middleware runs early in the request lifecycle to minimize resource consumption from bot traffic.
Changes:
- Added
TurnstileMiddlewareclass that intercepts HTTP requests before Drupal bootstrap - Added
TurnstileServicewith bot detection, Turnstile verification, and challenge page rendering - Configured middleware as an HTTP kernel decorator in service definitions
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 17 comments.
| File | Description |
|---|---|
| modules/access_misc/src/StackMiddleware/TurnstileMiddleware.php | Implements the HTTP kernel middleware that delegates bot protection checks to TurnstileService |
| modules/access_misc/src/Services/TurnstileService.php | Core service handling Turnstile verification, bot detection, cookie management, and challenge page rendering |
| modules/access_misc/access_misc.services.yml | Service definitions for the Turnstile service and middleware with HTTP kernel decoration |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| $error = $request->query->has('error') ? 'Verification failed. Please try again.' : ''; | ||
|
|
||
| // Sanitize return URL. | ||
| if (!preg_match('/^\/[a-zA-Z0-9\-\_\/\?\&\=\[\]\%\.\+\:\#\~\@\!\'\(\)\,\;\* ]*$/', $return_url)) { |
Copilot
AI
Jan 17, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same URL sanitization issue exists here. The regex pattern allows literal spaces in the return URL which could lead to malformed URLs. This should be consistent with line 70 and properly handle URL encoding.
| 'YandexBot', 'Googlebot', 'bingbot', 'Baiduspider', 'Sogou', 'Exabot', | ||
| 'facebot', 'ia_archiver', 'Screaming Frog', 'python', 'Python', | ||
| 'Go-http-client', 'Java/', 'wget', 'curl', 'libwww', 'lwp-trivial', | ||
| 'httrack', 'nutch', 'msnbot', 'Discordbot', 'WhatsApp', 'Twitterbot', | ||
| 'facebookexternalhit', 'LinkedInBot', 'Slackbot', 'Telegram', 'Signal', | ||
| 'DataForSeoBot', 'SeznamBot', 'BingPreview', 'PageSpeed', 'Lighthouse', | ||
| 'Chrome-Lighthouse', 'HeadlessChrome', 'PhantomJS', 'SlimerJS', | ||
| 'CensysInspect', 'NetcraftSurveyAgent', 'masscan', 'nmap', |
Copilot
AI
Jan 17, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blocking "Googlebot" and other search engine crawlers ("bingbot", "Baiduspider", "YandexBot") will prevent search engines from indexing faceted search results. While this might be intentional to prevent crawler-based DoS on faceted searches, it could negatively impact SEO if these pages should be indexed. Consider documenting this decision or implementing a more nuanced approach that allows legitimate search engine crawlers but rate-limits them.
| 'YandexBot', 'Googlebot', 'bingbot', 'Baiduspider', 'Sogou', 'Exabot', | |
| 'facebot', 'ia_archiver', 'Screaming Frog', 'python', 'Python', | |
| 'Go-http-client', 'Java/', 'wget', 'curl', 'libwww', 'lwp-trivial', | |
| 'httrack', 'nutch', 'msnbot', 'Discordbot', 'WhatsApp', 'Twitterbot', | |
| 'facebookexternalhit', 'LinkedInBot', 'Slackbot', 'Telegram', 'Signal', | |
| 'DataForSeoBot', 'SeznamBot', 'BingPreview', 'PageSpeed', 'Lighthouse', | |
| 'Chrome-Lighthouse', 'HeadlessChrome', 'PhantomJS', 'SlimerJS', | |
| 'CensysInspect', 'NetcraftSurveyAgent', 'masscan', 'nmap', | |
| 'Sogou', 'Exabot', 'facebot', 'ia_archiver', 'Screaming Frog', | |
| 'python', 'Python', 'Go-http-client', 'Java/', 'wget', 'curl', | |
| 'libwww', 'lwp-trivial', 'httrack', 'nutch', 'msnbot', 'Discordbot', | |
| 'WhatsApp', 'Twitterbot', 'facebookexternalhit', 'LinkedInBot', | |
| 'Slackbot', 'Telegram', 'Signal', 'DataForSeoBot', 'SeznamBot', | |
| 'BingPreview', 'PageSpeed', 'Lighthouse', 'Chrome-Lighthouse', | |
| 'HeadlessChrome', 'PhantomJS', 'SlimerJS', 'CensysInspect', | |
| 'NetcraftSurveyAgent', 'masscan', 'nmap', |
| } | ||
|
|
||
| // Calculate base path for "skip" link. | ||
| $base_path = strtok($return_url, '?'); |
Copilot
AI
Jan 17, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The strtok() function without proper handling can be problematic. If $return_url is empty or contains no '?', strtok() returns the entire string, which is correct. However, calling strtok() without resetting can cause issues if called multiple times in the same request. Consider using explode() or parse_url() for more reliable URL parsing.
| $base_path = strtok($return_url, '?'); | |
| $base_path = explode('?', $return_url, 2)[0]; |
| $query = $request->query->all(); | ||
| if (isset($query['f']) && is_array($query['f']) && count($query['f']) > 0) { | ||
| return $this->checkFacetedRequest($request); | ||
| } |
Copilot
AI
Jan 17, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trailing whitespace detected at the end of this line. This should be removed to maintain code cleanliness.
|
|
||
| // First line of defense: block known bots. | ||
| if ($this->isKnownBot($user_agent)) { | ||
| error_log('Blocked known bot faceted request: ' . $request->getRequestUri() . ' | UA: ' . $user_agent); |
Copilot
AI
Jan 17, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using error_log() for logging blocked bot requests is not ideal in Drupal context. Consider using Drupal's logger service (\Drupal::logger()) which provides better integration with Drupal's logging system, allows for proper log levels, and enables centralized log management. This would also make the logs more easily accessible through Drupal's admin interface.
| error_log('Blocked known bot faceted request: ' . $request->getRequestUri() . ' | UA: ' . $user_agent); | |
| \Drupal::logger('access_misc')->warning('Blocked known bot faceted request: @uri | UA: @ua', [ | |
| '@uri' => $request->getRequestUri(), | |
| '@ua' => $user_agent, | |
| ]); |
| $possible_paths = [ | ||
| 'sites/default/files/private/.keys/secrets.json', | ||
| __DIR__ . '/../../../../../sites/default/files/private/.keys/secrets.json', | ||
| '/files/private/.keys/secrets.json', | ||
| ]; |
Copilot
AI
Jan 17, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoded file paths using __DIR__ and relative paths could fail in different execution contexts. The path __DIR__ . '/../../../../../sites/default/files/private/.keys/secrets.json' assumes a specific directory structure that may not be consistent across all environments or during different bootstrap phases. Consider using Drupal's file system service or configuration API for more reliable path resolution.
| $possible_paths = [ | |
| 'sites/default/files/private/.keys/secrets.json', | |
| __DIR__ . '/../../../../../sites/default/files/private/.keys/secrets.json', | |
| '/files/private/.keys/secrets.json', | |
| ]; | |
| $possible_paths = []; | |
| // Prefer Drupal's file system service and the private stream wrapper. | |
| if (class_exists('\Drupal')) { | |
| try { | |
| $file_system = \Drupal::service('file_system'); | |
| $private_path = $file_system->realpath('private://.keys/secrets.json'); | |
| if (!empty($private_path)) { | |
| $possible_paths[] = $private_path; | |
| } | |
| } | |
| catch (\Exception $e) { | |
| // If the file system service is unavailable, fall back to static paths. | |
| } | |
| } | |
| // Fallback static paths for non-standard environments. | |
| $possible_paths = array_merge($possible_paths, [ | |
| 'sites/default/files/private/.keys/secrets.json', | |
| '/files/private/.keys/secrets.json', | |
| ]); |
| const COOKIE_NAME = 'STYXKEY_turnstile_verified'; | ||
| const COOKIE_DURATION = 86400; |
Copilot
AI
Jan 17, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The constants are defined using const without visibility modifiers. In PHP classes, it's a best practice to explicitly declare constant visibility (public, protected, or private). Since these constants appear to be used only within the class, consider declaring them as private const or protected const.
| const COOKIE_NAME = 'STYXKEY_turnstile_verified'; | |
| const COOKIE_DURATION = 86400; | |
| private const COOKIE_NAME = 'STYXKEY_turnstile_verified'; | |
| private const COOKIE_DURATION = 86400; |
| public function handle(Request $request, int $type = self::MAIN_REQUEST, bool $catch = TRUE): Response { | ||
| // Check if Turnstile protection should run. | ||
| $response = $this->turnstileService->checkRequest($request); | ||
|
|
Copilot
AI
Jan 17, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trailing whitespace detected at the end of this line. This should be removed to maintain code cleanliness.
| 'DataForSeoBot', 'SeznamBot', 'BingPreview', 'PageSpeed', 'Lighthouse', | ||
| 'Chrome-Lighthouse', 'HeadlessChrome', 'PhantomJS', 'SlimerJS', |
Copilot
AI
Jan 17, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bot detection includes legitimate services like "Lighthouse", "PageSpeed", "Chrome-Lighthouse" which are Google's performance testing tools. Blocking these prevents the site from being analyzed by Google PageSpeed Insights and Lighthouse audits, which could negatively impact SEO and performance monitoring. Consider removing these or creating a separate allowlist for legitimate monitoring tools.
| 'DataForSeoBot', 'SeznamBot', 'BingPreview', 'PageSpeed', 'Lighthouse', | |
| 'Chrome-Lighthouse', 'HeadlessChrome', 'PhantomJS', 'SlimerJS', | |
| 'DataForSeoBot', 'SeznamBot', 'BingPreview', | |
| 'HeadlessChrome', 'PhantomJS', 'SlimerJS', |
| 'httrack', 'nutch', 'msnbot', 'Discordbot', 'WhatsApp', 'Twitterbot', | ||
| 'facebookexternalhit', 'LinkedInBot', 'Slackbot', 'Telegram', 'Signal', | ||
| 'DataForSeoBot', 'SeznamBot', 'BingPreview', 'PageSpeed', 'Lighthouse', | ||
| 'Chrome-Lighthouse', 'HeadlessChrome', 'PhantomJS', 'SlimerJS', | ||
| 'CensysInspect', 'NetcraftSurveyAgent', 'masscan', 'nmap', |
Copilot
AI
Jan 17, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bot detection also blocks legitimate social media crawlers ("facebookexternalhit", "Twitterbot", "LinkedInBot", "Slackbot") which are needed for proper link previews when URLs are shared on social media platforms. Blocking these will result in poor or missing preview cards when your site's links are shared. Consider creating an allowlist for these legitimate services.
| 'httrack', 'nutch', 'msnbot', 'Discordbot', 'WhatsApp', 'Twitterbot', | |
| 'facebookexternalhit', 'LinkedInBot', 'Slackbot', 'Telegram', 'Signal', | |
| 'DataForSeoBot', 'SeznamBot', 'BingPreview', 'PageSpeed', 'Lighthouse', | |
| 'Chrome-Lighthouse', 'HeadlessChrome', 'PhantomJS', 'SlimerJS', | |
| 'CensysInspect', 'NetcraftSurveyAgent', 'masscan', 'nmap', | |
| 'httrack', 'nutch', 'msnbot', 'Discordbot', 'WhatsApp', 'Telegram', | |
| 'Signal', 'DataForSeoBot', 'SeznamBot', 'BingPreview', 'PageSpeed', | |
| 'Lighthouse', 'Chrome-Lighthouse', 'HeadlessChrome', 'PhantomJS', | |
| 'SlimerJS', 'CensysInspect', 'NetcraftSurveyAgent', 'masscan', 'nmap', |
add turnstile controller
Describe context / purpose for this PR
add turnstile middlware
Issue link
Any other related PRs?
Link to MultiDev instance
http://md-2639-accessmatch.pantheonsite.io
Checklist for PR author