
Skip fnmatch for URLs longer than FILENAME_MAX #505

Merged
freekmurze merged 2 commits into spatie:main from mattiasgeniar:fix-fnmatch-long-url on May 18, 2026

Conversation

@mattiasgeniar
Contributor

fnmatch() raises a warning when the string being matched exceeds the platform's FILENAME_MAX (4096 on Linux, 1024 on macOS/BSD). Under a framework like Laravel, which converts warnings into ErrorException, this kills the crawl job with "fnmatch(): Filename exceeds the maximum allowed length of 4096 characters" whenever a discovered URL is unusually long and alwaysCrawl() or neverCrawl() patterns are set.
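
Python's warnings machinery offers a close analogue of this failure mode: warnings.simplefilter("error") promotes warnings to exceptions, much as Laravel's error handler promotes PHP warnings to ErrorException. The sketch below is a hypothetical emulation, not PHP itself. php_style_fnmatch and FILENAME_LIMIT are illustrative names, the limit is the Linux value quoted above, and the sketch assumes the warning fires once the string reaches that length.

```python
import warnings
from fnmatch import fnmatchcase

# Linux FILENAME_MAX as quoted in the PR description.
FILENAME_LIMIT = 4096


def php_style_fnmatch(pattern: str, name: str) -> bool:
    """Hypothetical emulation of a length-guarded fnmatch().

    Assumption: the warning fires once the name reaches the limit; the
    function then warns and returns False instead of matching.
    """
    if len(name) >= FILENAME_LIMIT:
        warnings.warn(
            "fnmatch(): Filename exceeds the maximum allowed length of "
            f"{FILENAME_LIMIT} characters",
            RuntimeWarning,
        )
        return False
    return fnmatchcase(name, pattern)


long_url = "https://example.com/?q=" + "a" * 5000

# With warnings silenced, the call just returns False.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    print(php_style_fnmatch("https://example.com/*", long_url))  # False

# With warnings promoted to exceptions (the analogue of Laravel's handler),
# the same call raises and would abort the surrounding job.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        php_style_fnmatch("https://example.com/*", long_url)
    except RuntimeWarning as exc:
        print("crawl job aborted:", exc)
```

The second block is the situation the PR describes: nothing about the URL is malformed, yet the promoted warning ends the job.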

Guard matchesAlwaysCrawl() and matchesNeverCrawl() with a length check at 1024 characters, the smallest FILENAME_MAX among the supported platforms, so that overly long URLs short-circuit to false and fall through to the normal CrawlProfile::shouldCrawl() check downstream. Real URLs almost never exceed this length, and a URL too long to glob can't meaningfully match a pattern anyway.
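
A minimal Python sketch of the guard, assuming only the behavior described above. The names matches_any, matches_always_crawl, matches_never_crawl and MAX_GLOB_LENGTH are hypothetical stand-ins for the PHP methods, and fnmatch.fnmatchcase substitutes for PHP's fnmatch(); this is not the library's actual code.

```python
from fnmatch import fnmatchcase

# Assumed cap from the PR text: 1024, the smallest FILENAME_MAX quoted
# (macOS/BSD). Hypothetical constant name.
MAX_GLOB_LENGTH = 1024


def matches_any(url: str, patterns: list[str]) -> bool:
    # Too-long URLs short-circuit to False without ever reaching the glob
    # matcher. ">=" also keeps a URL of exactly the cap away from it; the PR
    # does not say which comparison the real guard uses.
    if len(url) >= MAX_GLOB_LENGTH:
        return False
    return any(fnmatchcase(url, pattern) for pattern in patterns)


def matches_always_crawl(url: str, always_patterns: list[str]) -> bool:
    # Stand-in for matchesAlwaysCrawl().
    return matches_any(url, always_patterns)


def matches_never_crawl(url: str, never_patterns: list[str]) -> bool:
    # Stand-in for matchesNeverCrawl().
    return matches_any(url, never_patterns)
```

Because both checks return False for an overly long URL, it is neither force-crawled nor force-skipped; with patterns = ["https://example.com/blog/*"], a short blog URL matches, while a 2,000-character one does not and is left to the normal profile check.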

Two regression tests added in AlwaysNeverCrawlTest.php cover both methods.

mattiasgeniar and others added 2 commits May 18, 2026 14:40
fnmatch() raises a warning when the haystack exceeds the platform's
FILENAME_MAX (4096 on Linux, 1024 on macOS/BSD), which under Laravel's
error handler bubbles up as an ErrorException and kills the crawl job.
Guard matchesAlwaysCrawl()/matchesNeverCrawl() with a length check so
overly-long discovered URLs short-circuit to false and fall through to
the normal CrawlProfile check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@freekmurze freekmurze merged commit 0c0b6e4 into spatie:main May 18, 2026
10 checks passed