Skip to content

Conversation

@protitude
Copy link
Collaborator

@protitude protitude commented Jan 17, 2026

Describe context / purpose for this PR

add turnstile middlware

Issue link

Any other related PRs?

Link to MultiDev instance

http://md-2639-accessmatch.pantheonsite.io

Checklist for PR author

  • I have checked that the PR is ready to be merged
  • I have reviewed the DIFF and checked that the changes are as expected
  • I have assigned myself or someone else to review the PR

add turnstile middlware
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds Cloudflare Turnstile bot protection middleware to defend against automated traffic, particularly targeting faceted search requests that can cause high server load. The middleware runs early in the request lifecycle to minimize resource consumption from bot traffic.

Changes:

  • Added TurnstileMiddleware class that intercepts HTTP requests before Drupal bootstrap
  • Added TurnstileService with bot detection, Turnstile verification, and challenge page rendering
  • Configured middleware as an HTTP kernel decorator in service definitions

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 17 comments.

File Description
modules/access_misc/src/StackMiddleware/TurnstileMiddleware.php Implements the HTTP kernel middleware that delegates bot protection checks to TurnstileService
modules/access_misc/src/Services/TurnstileService.php Core service handling Turnstile verification, bot detection, cookie management, and challenge page rendering
modules/access_misc/access_misc.services.yml Service definitions for the Turnstile service and middleware with HTTP kernel decoration

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

$error = $request->query->has('error') ? 'Verification failed. Please try again.' : '';

// Sanitize return URL.
if (!preg_match('/^\/[a-zA-Z0-9\-\_\/\?\&\=\[\]\%\.\+\:\#\~\@\!\'\(\)\,\;\* ]*$/', $return_url)) {
Copy link

Copilot AI Jan 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same URL sanitization issue exists here. The regex pattern allows literal spaces in the return URL which could lead to malformed URLs. This should be consistent with line 70 and properly handle URL encoding.

Copilot uses AI. Check for mistakes.
Comment on lines +262 to +269
'YandexBot', 'Googlebot', 'bingbot', 'Baiduspider', 'Sogou', 'Exabot',
'facebot', 'ia_archiver', 'Screaming Frog', 'python', 'Python',
'Go-http-client', 'Java/', 'wget', 'curl', 'libwww', 'lwp-trivial',
'httrack', 'nutch', 'msnbot', 'Discordbot', 'WhatsApp', 'Twitterbot',
'facebookexternalhit', 'LinkedInBot', 'Slackbot', 'Telegram', 'Signal',
'DataForSeoBot', 'SeznamBot', 'BingPreview', 'PageSpeed', 'Lighthouse',
'Chrome-Lighthouse', 'HeadlessChrome', 'PhantomJS', 'SlimerJS',
'CensysInspect', 'NetcraftSurveyAgent', 'masscan', 'nmap',
Copy link

Copilot AI Jan 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking "Googlebot" and other search engine crawlers ("bingbot", "Baiduspider", "YandexBot") will prevent search engines from indexing faceted search results. While this might be intentional to prevent crawler-based DoS on faceted searches, it could negatively impact SEO if these pages should be indexed. Consider documenting this decision or implementing a more nuanced approach that allows legitimate search engine crawlers but rate-limits them.

Suggested change
'YandexBot', 'Googlebot', 'bingbot', 'Baiduspider', 'Sogou', 'Exabot',
'facebot', 'ia_archiver', 'Screaming Frog', 'python', 'Python',
'Go-http-client', 'Java/', 'wget', 'curl', 'libwww', 'lwp-trivial',
'httrack', 'nutch', 'msnbot', 'Discordbot', 'WhatsApp', 'Twitterbot',
'facebookexternalhit', 'LinkedInBot', 'Slackbot', 'Telegram', 'Signal',
'DataForSeoBot', 'SeznamBot', 'BingPreview', 'PageSpeed', 'Lighthouse',
'Chrome-Lighthouse', 'HeadlessChrome', 'PhantomJS', 'SlimerJS',
'CensysInspect', 'NetcraftSurveyAgent', 'masscan', 'nmap',
'Sogou', 'Exabot', 'facebot', 'ia_archiver', 'Screaming Frog',
'python', 'Python', 'Go-http-client', 'Java/', 'wget', 'curl',
'libwww', 'lwp-trivial', 'httrack', 'nutch', 'msnbot', 'Discordbot',
'WhatsApp', 'Twitterbot', 'facebookexternalhit', 'LinkedInBot',
'Slackbot', 'Telegram', 'Signal', 'DataForSeoBot', 'SeznamBot',
'BingPreview', 'PageSpeed', 'Lighthouse', 'Chrome-Lighthouse',
'HeadlessChrome', 'PhantomJS', 'SlimerJS', 'CensysInspect',
'NetcraftSurveyAgent', 'masscan', 'nmap',

Copilot uses AI. Check for mistakes.
}

// Calculate base path for "skip" link.
$base_path = strtok($return_url, '?');
Copy link

Copilot AI Jan 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The strtok() function without proper handling can be problematic. If $return_url is empty or contains no '?', strtok() returns the entire string, which is correct. However, calling strtok() without resetting can cause issues if called multiple times in the same request. Consider using explode() or parse_url() for more reliable URL parsing.

Suggested change
$base_path = strtok($return_url, '?');
$base_path = explode('?', $return_url, 2)[0];

Copilot uses AI. Check for mistakes.
$query = $request->query->all();
if (isset($query['f']) && is_array($query['f']) && count($query['f']) > 0) {
return $this->checkFacetedRequest($request);
}
Copy link

Copilot AI Jan 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trailing whitespace detected at the end of this line. This should be removed to maintain code cleanliness.

Copilot uses AI. Check for mistakes.

// First line of defense: block known bots.
if ($this->isKnownBot($user_agent)) {
error_log('Blocked known bot faceted request: ' . $request->getRequestUri() . ' | UA: ' . $user_agent);
Copy link

Copilot AI Jan 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using error_log() for logging blocked bot requests is not ideal in Drupal context. Consider using Drupal's logger service (\Drupal::logger()) which provides better integration with Drupal's logging system, allows for proper log levels, and enables centralized log management. This would also make the logs more easily accessible through Drupal's admin interface.

Suggested change
error_log('Blocked known bot faceted request: ' . $request->getRequestUri() . ' | UA: ' . $user_agent);
\Drupal::logger('access_misc')->warning('Blocked known bot faceted request: @uri | UA: @ua', [
'@uri' => $request->getRequestUri(),
'@ua' => $user_agent,
]);

Copilot uses AI. Check for mistakes.
Comment on lines +226 to +230
$possible_paths = [
'sites/default/files/private/.keys/secrets.json',
__DIR__ . '/../../../../../sites/default/files/private/.keys/secrets.json',
'/files/private/.keys/secrets.json',
];
Copy link

Copilot AI Jan 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hardcoded file paths using __DIR__ and relative paths could fail in different execution contexts. The path __DIR__ . '/../../../../../sites/default/files/private/.keys/secrets.json' assumes a specific directory structure that may not be consistent across all environments or during different bootstrap phases. Consider using Drupal's file system service or configuration API for more reliable path resolution.

Suggested change
$possible_paths = [
'sites/default/files/private/.keys/secrets.json',
__DIR__ . '/../../../../../sites/default/files/private/.keys/secrets.json',
'/files/private/.keys/secrets.json',
];
$possible_paths = [];
// Prefer Drupal's file system service and the private stream wrapper.
if (class_exists('\Drupal')) {
try {
$file_system = \Drupal::service('file_system');
$private_path = $file_system->realpath('private://.keys/secrets.json');
if (!empty($private_path)) {
$possible_paths[] = $private_path;
}
}
catch (\Exception $e) {
// If the file system service is unavailable, fall back to static paths.
}
}
// Fallback static paths for non-standard environments.
$possible_paths = array_merge($possible_paths, [
'sites/default/files/private/.keys/secrets.json',
'/files/private/.keys/secrets.json',
]);

Copilot uses AI. Check for mistakes.
Comment on lines +14 to +15
const COOKIE_NAME = 'STYXKEY_turnstile_verified';
const COOKIE_DURATION = 86400;
Copy link

Copilot AI Jan 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constants are defined using const without visibility modifiers. In PHP classes, it's a best practice to explicitly declare constant visibility (public, protected, or private). Since these constants appear to be used only within the class, consider declaring them as private const or protected const.

Suggested change
const COOKIE_NAME = 'STYXKEY_turnstile_verified';
const COOKIE_DURATION = 86400;
private const COOKIE_NAME = 'STYXKEY_turnstile_verified';
private const COOKIE_DURATION = 86400;

Copilot uses AI. Check for mistakes.
public function handle(Request $request, int $type = self::MAIN_REQUEST, bool $catch = TRUE): Response {
// Check if Turnstile protection should run.
$response = $this->turnstileService->checkRequest($request);

Copy link

Copilot AI Jan 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trailing whitespace detected at the end of this line. This should be removed to maintain code cleanliness.

Suggested change

Copilot uses AI. Check for mistakes.
Comment on lines +267 to +268
'DataForSeoBot', 'SeznamBot', 'BingPreview', 'PageSpeed', 'Lighthouse',
'Chrome-Lighthouse', 'HeadlessChrome', 'PhantomJS', 'SlimerJS',
Copy link

Copilot AI Jan 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bot detection includes legitimate services like "Lighthouse", "PageSpeed", "Chrome-Lighthouse" which are Google's performance testing tools. Blocking these prevents the site from being analyzed by Google PageSpeed Insights and Lighthouse audits, which could negatively impact SEO and performance monitoring. Consider removing these or creating a separate allowlist for legitimate monitoring tools.

Suggested change
'DataForSeoBot', 'SeznamBot', 'BingPreview', 'PageSpeed', 'Lighthouse',
'Chrome-Lighthouse', 'HeadlessChrome', 'PhantomJS', 'SlimerJS',
'DataForSeoBot', 'SeznamBot', 'BingPreview',
'HeadlessChrome', 'PhantomJS', 'SlimerJS',

Copilot uses AI. Check for mistakes.
Comment on lines +265 to +269
'httrack', 'nutch', 'msnbot', 'Discordbot', 'WhatsApp', 'Twitterbot',
'facebookexternalhit', 'LinkedInBot', 'Slackbot', 'Telegram', 'Signal',
'DataForSeoBot', 'SeznamBot', 'BingPreview', 'PageSpeed', 'Lighthouse',
'Chrome-Lighthouse', 'HeadlessChrome', 'PhantomJS', 'SlimerJS',
'CensysInspect', 'NetcraftSurveyAgent', 'masscan', 'nmap',
Copy link

Copilot AI Jan 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bot detection also blocks legitimate social media crawlers ("facebookexternalhit", "Twitterbot", "LinkedInBot", "Slackbot") which are needed for proper link previews when URLs are shared on social media platforms. Blocking these will result in poor or missing preview cards when your site's links are shared. Consider creating an allowlist for these legitimate services.

Suggested change
'httrack', 'nutch', 'msnbot', 'Discordbot', 'WhatsApp', 'Twitterbot',
'facebookexternalhit', 'LinkedInBot', 'Slackbot', 'Telegram', 'Signal',
'DataForSeoBot', 'SeznamBot', 'BingPreview', 'PageSpeed', 'Lighthouse',
'Chrome-Lighthouse', 'HeadlessChrome', 'PhantomJS', 'SlimerJS',
'CensysInspect', 'NetcraftSurveyAgent', 'masscan', 'nmap',
'httrack', 'nutch', 'msnbot', 'Discordbot', 'WhatsApp', 'Telegram',
'Signal', 'DataForSeoBot', 'SeznamBot', 'BingPreview', 'PageSpeed',
'Lighthouse', 'Chrome-Lighthouse', 'HeadlessChrome', 'PhantomJS',
'SlimerJS', 'CensysInspect', 'NetcraftSurveyAgent', 'masscan', 'nmap',

Copilot uses AI. Check for mistakes.
add turnstile controller
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants