🕷️ Multi-Site Web Crawler with Fulltext & Port Scanning

A Python-based crawler that searches pages for keywords, extracts their full text, and optionally port-scans each matched host, across multiple domains in a single run.

🚀 Features

🔍 Multi-Site, Multi-Keyword Search
Crawl multiple starting URLs and search for multiple terms in parallel.
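As a rough illustration of the idea (not the code in webcrawler.py), a parallel multi-keyword search could be sketched with a thread pool. The names `find_keywords` and `search_sites`, and the injectable `fetch` callable, are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def find_keywords(text, keywords):
    """Return the keywords that occur in text, case-insensitively."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


def search_sites(urls, keywords, fetch, max_workers=8):
    """Fetch every URL in a thread pool and map it to its matched keywords.

    fetch is a hypothetical callable that takes a URL and returns page text;
    passing it in keeps the sketch independent of any HTTP library.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so texts line up with urls.
        texts = list(pool.map(fetch, urls))
    return {url: find_keywords(text, keywords) for url, text in zip(urls, texts)}
```

In a real run, `fetch` would wrap an HTTP client call; here it can be any function, which also makes the matching logic easy to try on canned text.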
🧠 Fulltext Capture
Stores all visible page text for later NLP or forensic analysis.
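One possible way to capture visible text, shown here only as a minimal sketch using the standard library's `html.parser`, is to collect text nodes while skipping tags whose content browsers do not render. The class and function names are hypothetical, and the project itself may use a different parser.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping content that browsers do not render."""

    HIDDEN_TAGS = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0  # > 0 while inside a hidden tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside hidden tags.
        if self._hidden_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_visible_text(html):
    """Return the rendered-looking text of an HTML string, space-joined."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.text()
```

This sketch ignores CSS-based hiding (for example `display: none`), so it approximates "visible" rather than reproducing what a browser shows.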
📊 CSV Export
All metadata (titles, URLs, IPs, keywords, fulltext, and open ports) is exported to a .csv file.
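A minimal sketch of such an export with Python's `csv` module follows. The column names and the `;` separator for multi-valued fields are assumptions for illustration; the actual layout written by the crawler may differ.

```python
import csv

# Hypothetical column names, chosen to mirror the metadata listed above.
FIELDS = ["title", "url", "ip", "keywords", "fulltext", "open_ports"]


def write_results(rows, handle):
    """Write crawl results to an open text file handle as CSV.

    Each row is a dict keyed by FIELDS, where "keywords" and "open_ports"
    are lists that get flattened into ';'-separated strings.
    """
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        flat = dict(row)
        flat["keywords"] = ";".join(row["keywords"])
        flat["open_ports"] = ";".join(str(port) for port in row["open_ports"])
        writer.writerow(flat)
```

When writing to disk, open the file with `newline=""` (for example `open("results.csv", "w", newline="", encoding="utf-8")`) so the `csv` module controls line endings; fulltext containing commas or line breaks is quoted automatically.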
🔌 Optional Custom Port Scan
Scan user-defined ports on matched IP addresses to discover open services.
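For illustration, a basic TCP connect scan over a user-supplied port list might look like the sketch below. `scan_ports` is a hypothetical helper, not the project's function, and it assumes IPv4 addresses and plain sequential probing.

```python
import socket


def scan_ports(host, ports, timeout=1.0):
    """Return the ports from the given list that accept a TCP connection."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an error code otherwise,
            # so closed or filtered ports do not raise an exception.
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports
```

A call such as `scan_ports("192.0.2.10", [22, 80, 443])` would return whichever of those ports answered. Sequential probing keeps the sketch short; with many hosts or ports, each probe can take up to `timeout` seconds.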
🖥️ GUI & CLI Interfaces
- Use crawler_gui.py for a browser-based interface (Streamlit)
- Use crawler_cli.py for headless jobs or automation
🐳 Docker & Portainer Ready
Ships with a Dockerfile and docker-compose.yml, so it can be deployed locally, in the cloud, or on a container orchestration platform.

Repository files:
- Dockerfile
- README.md
- crawler_cli.py
- crawler_gui.py
- docker-compose.yml
- readme.txt
- requirements.txt
- webcrawler.py

JoranJix/website-crawler
