JoranJix/website-crawler

🕷️ Multi-Site Web Crawler with Fulltext & Port Scanning

A Python-based crawler that performs keyword discovery, fulltext extraction, and optional per-host port scanning across multiple domains at once.

🚀 Features

  • 🔍 Multi-Site, Multi-Keyword Search
    Crawl multiple starting URLs and search for multiple terms in parallel.

  • 🧠 Fulltext Capture
    Stores all visible page text for later NLP or forensic analysis.

  • 📊 CSV Export
    Exports all metadata (titles, URLs, IP addresses, keywords, fulltext, and open ports) to a .csv file.

  • 🔌 Optional Custom Port Scan
    Scan user-defined ports on matched IP addresses to discover open services.

  • 🖥️ GUI & CLI

    • crawler_gui.py: browser-based interface built with Streamlit
    • crawler_cli.py: command-line interface for headless jobs and automation

  • 🐳 Docker & Portainer Ready
    Ships with a Dockerfile and docker-compose.yml for deployment locally, in the cloud, or on container orchestration platforms.
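The multi-site, multi-keyword search can be sketched with the standard library alone. This is not the project's code: `crawl_site`, `crawl_many`, `find_keywords`, and the injected `fetch` callable (assumed to return a page's visible text plus its raw link targets) are hypothetical names. The breadth-first, same-host strategy, the page limit, and the case-insensitive substring matching are likewise assumptions about how such a crawler is typically scoped. Each start URL becomes one task on a shared thread pool.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple
from urllib.parse import urljoin, urlparse

# Hypothetical fetch contract: url -> (visible page text, list of href values).
Fetch = Callable[[str], Tuple[str, List[str]]]


def find_keywords(text: str, keywords: Iterable[str]) -> List[str]:
    """Return the keywords found in text (case-insensitive substring match)."""
    lowered = text.casefold()
    return [kw for kw in keywords if kw.casefold() in lowered]


def crawl_site(start_url: str, keywords: List[str], fetch: Fetch,
               max_pages: int = 50) -> List[dict]:
    """Breadth-first crawl that stays on the start URL's host."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    hits: List[dict] = []
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        try:
            text, hrefs = fetch(url)
        except Exception:
            continue  # skip pages that fail to load instead of aborting the site
        matched = find_keywords(text, keywords)
        if matched:
            hits.append({"url": url, "keywords": matched})
        for href in hrefs:
            # Resolve relative links and drop #fragments before de-duplicating.
            link = urljoin(url, href).split("#", 1)[0]
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return hits


def crawl_many(start_urls: List[str], keywords: List[str], fetch: Fetch,
               workers: int = 4) -> Dict[str, List[dict]]:
    """Crawl all start URLs concurrently; each site is one pool task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {u: pool.submit(crawl_site, u, keywords, fetch) for u in start_urls}
        return {u: f.result() for u, f in futures.items()}
```

Injecting `fetch` keeps the crawl logic testable offline with a dictionary of fake pages; in a real run it would wrap an HTTP client and an HTML parser.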
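For the fulltext capture, one plausible approach is to strip markup and drop content a browser never renders as text, such as scripts and stylesheets. The sketch below uses Python's built-in `html.parser`; `VisibleTextParser` and `extract_fulltext` are hypothetical names, the set of hidden tags is an assumption, and the project itself may rely on a different parsing library.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class VisibleTextParser(HTMLParser):
    """Collect the <title> and an approximation of the rendered page text."""

    # Elements whose contents are not rendered as page text (assumed list).
    HIDDEN = {"script", "style", "noscript", "template"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; and similar entities
        self._hidden_depth = 0
        self._in_title = False
        self.title = ""
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.HIDDEN:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.HIDDEN and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._hidden_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Join text nodes and collapse every run of whitespace to one space.
        return " ".join(" ".join(self._chunks).split())


def extract_fulltext(html: str) -> Tuple[str, str]:
    """Return (title, visible text) for an HTML document."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.title.split()), parser.text()
```

Because text nodes are joined with spaces, inline markup such as `<b>Hel</b>lo` yields `Hel lo`; a more careful extractor would only insert separators at block-level elements.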
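The CSV export maps naturally onto `csv.DictWriter`. The column names in `FIELDS`, the hypothetical `write_results` helper, and the choice to join list values (keywords, open ports) with semicolons are assumptions for illustration, not the project's actual file layout. The `csv` module quotes any field containing commas, quotes, or newlines, so raw fulltext can be stored without extra escaping.

```python
import csv
from typing import Iterable, TextIO

# Hypothetical column layout; the project's real header may differ.
FIELDS = ["title", "url", "ip", "keywords", "open_ports", "fulltext"]


def write_results(rows: Iterable[dict], out: TextIO) -> None:
    """Write one CSV row per crawled page; list values are joined with ';'."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        record = dict(row)
        record["keywords"] = ";".join(row.get("keywords", []))
        record["open_ports"] = ";".join(str(p) for p in row.get("open_ports", []))
        writer.writerow(record)  # keys missing from the row are written as ""
```

When writing to disk, open the file with `newline=""` as the `csv` documentation recommends, e.g. `open("results.csv", "w", newline="", encoding="utf-8")`.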
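The optional port scan could work by resolving each matched host to an IPv4 address and then attempting a TCP connection to every user-supplied port, recording the ones that accept. This is a plain TCP connect scan using only the `socket` module; `resolve_ip`, `is_open`, `scan_ports`, the one-second timeout, and the worker count are hypothetical choices, not the project's documented behavior.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List
from urllib.parse import urlparse


def resolve_ip(url: str) -> str:
    """Resolve the host part of a URL to an IPv4 address."""
    host = urlparse(url).hostname
    if host is None:
        raise ValueError(f"no host in {url!r}")
    return socket.gethostbyname(host)


def is_open(ip: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to ip:port completes within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def scan_ports(ip: str, ports: Iterable[int], timeout: float = 1.0,
               workers: int = 32) -> List[int]:
    """Check the given ports concurrently and return the open ones, sorted."""
    ports = list(ports)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        states = pool.map(lambda p: is_open(ip, p, timeout), ports)
        return sorted(p for p, ok in zip(ports, states) if ok)
```

A connect scan completes the full TCP handshake, so it shows up in the target's connection logs, and it cannot detect UDP-only services.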

About

Multi-Site Keyword Crawler with Fulltext & Port Scanning
