PyRobots is a web reconnaissance tool written in Python 3 that analyzes a target website's `robots.txt` file and extracts potentially sensitive paths from it. It supports both direct downloading of explicitly listed paths and a brute-force discovery mechanism that uncovers hidden directories and resources that may be restricted from crawlers.
- Downloads and parses the `robots.txt` file from the target host.
- Identifies and extracts `Disallow`, `Allow`, `Sitemap`, and `Crawl-delay` directives.
- Supports two scanning modes:
  - Quick Scan: Downloads explicitly listed paths from `robots.txt`.
  - Directory Scan: Performs recursive brute-force enumeration using customizable wordlists for filenames and file extensions.
- Respects `Crawl-delay` directives where specified.
- Multithreaded for performance, with intelligent throttling and retry mechanisms.
- Supports user-agent aware parsing (e.g., separate rules for `*`, `Googlebot`, etc.).
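To illustrate what user-agent aware parsing of these directives can look like, here is a minimal sketch. It is not taken from the PyRobots source: the function name `parse_robots` and the shape of the returned data are assumptions made for this example.

```python
from collections import defaultdict


def parse_robots(text):
    """Group robots.txt directives by user-agent (hypothetical helper)."""
    groups = defaultdict(lambda: {"allow": [], "disallow": [], "crawl-delay": None})
    sitemaps = []
    agents = []       # user-agents the current group applies to
    in_rules = False  # True once a rule line has followed the User-agent lines

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
            groups[value]  # make sure the group exists even if it has no rules
        elif field == "sitemap":
            sitemaps.append(value)  # sitemaps are global, not tied to a group
        elif field in ("allow", "disallow"):
            in_rules = True
            if value:  # an empty Disallow means nothing is disallowed
                for agent in agents:
                    groups[agent][field].append(value)
        elif field == "crawl-delay":
            in_rules = True
            try:
                delay = float(value)
            except ValueError:
                continue  # ignore malformed delays
            for agent in agents:
                groups[agent]["crawl-delay"] = delay

    return dict(groups), sitemaps
```

Calling `parse_robots(text)` returns a dictionary keyed by user-agent plus a list of sitemap URLs, so rules for `*` and `Googlebot` can be inspected separately.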
Mode | Description |
---|---|
Quick | Parses robots.txt and downloads all entries listed under Disallow and Allow directives. |
Directory | Extends Quick mode by performing brute-force discovery against extracted directories using a combination of filename and extension wordlists. |
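
As a rough illustration of Quick mode, the sketch below turns paths extracted from `Disallow` and `Allow` directives into absolute URLs ready for download. The function name `quick_scan_urls` is hypothetical, and skipping wildcard patterns (`*`, `$`) is an assumption of this example, not documented PyRobots behavior.

```python
from urllib.parse import urljoin


def quick_scan_urls(base_url, paths):
    """Turn robots.txt paths into absolute URLs (hypothetical helper)."""
    urls = []
    seen = set()
    for path in paths:
        # Assumption: entries with wildcards are patterns, not concrete paths,
        # so there is nothing specific to download for them.
        if "*" in path or "$" in path:
            continue
        url = urljoin(base_url, path)
        if url not in seen:  # keep the first occurrence, preserve order
            seen.add(url)
            urls.append(url)
    return urls
```

For example, `quick_scan_urls("https://example.com", ["/admin/", "/tmp/*.bak", "/backup.zip"])` yields the URLs for `/admin/` and `/backup.zip` and drops the wildcard entry.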
All results are saved in the following directory structure:
output/<domain>/
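
The README only states that results go under `output/<domain>/`. One plausible way to map a downloaded URL onto that layout is sketched below; the helper name `output_path`, the `index.html` fallback for directory URLs, and the mirroring of the URL path are all assumptions of this example.

```python
from pathlib import Path
from urllib.parse import urlparse


def output_path(url, root="output"):
    """Map a URL to a file path under output/<domain>/ (hypothetical helper)."""
    parsed = urlparse(url)
    domain = parsed.hostname or "unknown"
    # Drop empty, "." and ".." components so a crafted path cannot
    # escape the output/<domain>/ directory.
    parts = [p for p in parsed.path.split("/") if p not in ("", ".", "..")]
    # Assumption: directory-style URLs are stored as index.html.
    if not parts or parsed.path.endswith("/"):
        parts.append("index.html")
    return Path(root, domain, *parts)
```

With this sketch, `https://example.com/admin/config.php` would be written to `output/example.com/admin/config.php`.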
cd pyRobots
python3 -m pip install -r requirements.txt
chmod +x pyRobots.py
python3 pyRobots.py
You will be prompted to enter a target URL. The tool will then:
- Retrieve and parse the robots.txt file.
- Execute the selected scan mode.
- Optionally download exposed or disallowed resources.
- Present an interactive prompt to initiate directory brute-forcing.
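
The retrieval steps above, combined with `Crawl-delay` handling and retries, could be sketched roughly as follows. This is a single-threaded illustration, not the tool's actual implementation: `fetch_with_retry`, its parameters, and the exponential backoff policy are assumptions. The `fetch` callable is injected so the logic can be exercised without network access.

```python
import time


def fetch_with_retry(url, fetch, crawl_delay=0.0, retries=3, backoff=1.0,
                     sleep=time.sleep):
    """Call fetch(url), honoring a crawl delay and retrying on errors.

    Hypothetical helper: `fetch` is any callable that returns the response
    body or raises OSError on failure.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        # Wait the crawl delay before every request; after a failure,
        # add an exponentially growing backoff on top of it.
        wait = crawl_delay + (backoff * 2 ** (attempt - 1) if attempt else 0.0)
        if wait > 0:
            sleep(wait)
        try:
            return fetch(url)
        except OSError as exc:  # urllib.error.URLError is an OSError subclass
            last_error = exc
    raise last_error
```

A caller could pass something like `lambda u: urllib.request.urlopen(u, timeout=10).read()` as `fetch`, with `crawl_delay` set to the value parsed from `robots.txt`.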
- Python 3.8+
- Internet access (for retrieving target robots.txt and resources)
- Dependencies listed in requirements.txt
Install required packages with:
pip install -r requirements.txt
Two customizable wordlists are used for brute-forcing:
- wordlists/common.txt — common directory or file names.
- wordlists/extensions.txt — common file extensions (e.g., .php, .bak, .old).
You may modify or extend these wordlists to suit your needs.
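
To show how the two wordlists can combine during brute-forcing, here is a small sketch that generates candidate URLs for one directory. The helper names `load_wordlist` and `candidate_urls` are hypothetical, and trying each bare name before its extension variants is an assumption of this example. Recursion into discovered directories is not shown.

```python
def load_wordlist(path):
    """Read a wordlist file, one entry per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def candidate_urls(directory_url, names, extensions):
    """Yield candidate URLs for one directory (hypothetical helper)."""
    base = directory_url.rstrip("/") + "/"
    for name in names:
        # The bare name may be a subdirectory or an extensionless file.
        yield base + name
        for ext in extensions:
            # Accept extensions written as ".php" or "php".
            yield base + name + "." + ext.lstrip(".")
```

For instance, with names loaded from `wordlists/common.txt` and extensions from `wordlists/extensions.txt`, a name `config` and extension `.php` produce the candidates `<directory>/config` and `<directory>/config.php`.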
This tool is intended strictly for educational and authorized security testing purposes. Using it against systems you are not authorized to test may be illegal. The author assumes no responsibility for misuse.
MIT License. See LICENSE file for details.