Crawler blacklist, show URLs when crawling #113

Merged
VincentBean merged 4 commits into develop from feature/crawler-ignore-list
Mar 8, 2026

Conversation

@VincentBean
Member

No description provided.

Contributor

Copilot AI left a comment

Pull request overview

Adds configurable URL blacklisting for the crawler (with platform presets) and a UI table to show discovered/crawled URLs while a crawl is running.

Changes:

  • Add “Advanced” crawler form section with platform preset selector + URL blacklist textarea.
  • Apply URL blacklist filtering when queueing newly discovered links.
  • Add a Livewire table to show URLs (and crawled status) during crawling.
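
The filtering step in the second bullet could look roughly like the sketch below. It is an illustration only, not the code from this pull request: the function names `toDelimitedRegex` and `filterBlacklistedUrls` are hypothetical, and it assumes blacklist entries are bare patterns without delimiters. Each pattern is tested on its own rather than being joined into one large expression, so a single malformed entry cannot break the others.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: wrap a bare pattern in "~" delimiters for preg_match.
 * Assumption: entries do not already contain an escaped "~"; such an entry
 * would fail to compile and is then skipped by the caller.
 */
function toDelimitedRegex(string $pattern): string
{
    return '~' . str_replace('~', '\~', $pattern) . '~i';
}

/**
 * Return only the URLs that match none of the blacklist patterns.
 *
 * @param list<string> $urls     Newly discovered links.
 * @param list<string> $patterns One bare regex per entry (assumed format).
 * @return list<string>
 */
function filterBlacklistedUrls(array $urls, array $patterns): array
{
    $allowed = [];

    foreach ($urls as $url) {
        $blocked = false;

        foreach ($patterns as $pattern) {
            $pattern = trim($pattern);
            if ($pattern === '') {
                continue; // Ignore empty lines from the textarea.
            }

            // preg_match returns false for an invalid pattern; "@" hides the
            // compile warning so a bad entry is simply skipped.
            if (@preg_match(toDelimitedRegex($pattern), $url) === 1) {
                $blocked = true;
                break;
            }
        }

        if (!$blocked) {
            $allowed[] = $url;
        }
    }

    return $allowed;
}
```

Silently skipping invalid patterns is a design choice for the sketch; validating the input up front would surface the problem to the user earlier.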

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `resources/views/components/form/textarea.blade.php` | New reusable textarea form component (supports Livewire binding and optional Alpine `x-ref`). |
| `packages/crawler/src/Validation/ValidRegexLines.php` | Adds a validation rule for “one regex per line” blacklist input. |
| `packages/crawler/src/ServiceProvider.php` | Registers the new `crawler-crawled-urls-table` Livewire component. |
| `packages/crawler/src/Livewire/Tables/CrawledUrlsTable.php` | Implements the table for displaying URLs and crawled status with a filter. |
| `packages/crawler/src/Livewire/Forms/CrawlerForm.php` | Adds `url_blacklist` form field + validation rule (but currently conflicts with persistence). |
| `packages/crawler/src/Livewire/CrawlerForm.php` | Loads/saves blacklist into crawler settings (but currently will also persist a non-column field). |
| `packages/crawler/src/Actions/CrawlUrl.php` | Filters queued links using the blacklist (currently contains a fatal type-hint issue and invalid regex composition). |
| `packages/crawler/resources/views/livewire/crawler-form.blade.php` | Adds UI for presets + advanced URL blacklist input. |
| `packages/crawler/resources/views/crawler/index.blade.php` | Shows the new crawled-URLs table when crawler state is Crawling. |
| `packages/crawler/config/crawler.php` | Adds platform blacklist presets (Magento, WordPress, Joomla, Drupal). |
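
For the “one regex per line” check described for `ValidRegexLines.php`, a minimal sketch in plain PHP might look like the following. It is not the rule from this pull request and does not implement Laravel's validation interface. The function name `invalidRegexLines` is hypothetical, and it assumes each line is a bare pattern without delimiters. It reports the 1-based line numbers that fail to compile, which a real rule could turn into a validation message.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical checker: return the 1-based numbers of lines that are not
 * valid regular expressions. Blank lines are ignored.
 *
 * @return list<int>
 */
function invalidRegexLines(string $input): array
{
    $invalid = [];

    // \R matches any newline sequence (\n, \r\n, \r).
    $lines = preg_split('/\R/', $input) ?: [];

    foreach ($lines as $index => $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }

        // Assumption: patterns are bare, so wrap them in "~" delimiters and
        // escape any literal "~" inside the pattern.
        $regex = '~' . str_replace('~', '\~', $line) . '~';

        // preg_match returns false (not 0) when the pattern cannot compile;
        // "@" suppresses the accompanying warning.
        if (@preg_match($regex, '') === false) {
            $invalid[] = $index + 1;
        }
    }

    return $invalid;
}
```

Compiling each line separately keeps the error report specific to the offending line, which is harder to do once patterns have been concatenated.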

@VincentBean VincentBean merged commit 50a9ccd into develop Mar 8, 2026
17 checks passed
@VincentBean VincentBean deleted the feature/crawler-ignore-list branch March 13, 2026 20:09