Confidence-aware OSINT CLI for collecting, correlating, and exporting open-source digital footprint signals.
OSINT Tracker is a Python command-line tool that helps you collect and organize publicly available intelligence indicators from multiple sources.
It was built as an educational, practical project to show how OSINT workflows can be made safer and more consistent through:
- clear status labels (`FOUND`, `NOT_FOUND`, `BLOCKED`, `UNKNOWN`)
- explicit confidence levels (`HIGH`, `MEDIUM`, `LOW`)
- structured reporting and correlation insights
- runtime guardrails for request volume and input safety
Real-world use cases include:
- beginner-friendly OSINT learning and cybersecurity training
- quick digital footprint checks during triage
- generating repeatable JSON reports for small investigations
Only implemented features are listed below.
- Username search across GitHub, Reddit, Twitter, and Instagram. Uses a conservative `HEAD -> GET` strategy with platform-specific content markers and canonical URL path checks.
- Confidence-aware account status model. Each username/social result returns status + confidence instead of only true/false.
- Offline email analysis. Validates and normalizes email input, identifies the provider, detects the local-part pattern, classifies personal vs. generic usage, and labels local-part length.
- Live IP lookup via `ipwho.is`. Returns country/city/ISP/coordinates when available, with retries and structured failure reasons (for example `rate_limited`, `request_failed`).
- Social scan built from username results. Reuses username signals and converts them to social availability states (`available`, `not available`, `blocked`, `unknown`).
- Local metadata extraction. Reads file name, path, size, MIME type, created/modified timestamps, and SHA-256 hash.
- Cross-source correlation engine. Produces weighted, human-readable insights and an `Overall confidence` label, while treating social data as `DERIVED` (supporting, not independent proof).
- JSON report export. Saves a stable schema to both `output/report.json` and `output/reports/result.json`.
- Runtime safety controls. Includes input length limits, a cap on network operations per run, a shared HTTP request budget, and optional pacing delays.
- Language: Python
- Runtime library: `requests`
- Testing: `pytest`
- Dev tooling: `black`, `flake8`
- Standard library modules used heavily: `argparse`, `concurrent.futures`, `ipaddress`, `pathlib`, `hashlib`, `json`, `datetime`, `threading`, `mimetypes`
```text
osint-tracker/
├── main.py
├── requirements.txt
├── requirements-dev.txt
├── project.md
├── src/
│   ├── __init__.py
│   ├── core/
│   ├── modules/
│   │   ├── __init__.py
│   │   ├── username_search.py
│   │   ├── email_lookup.py
│   │   ├── ip_lookup.py
│   │   ├── social_scan.py
│   │   └── metadata_extractor.py
│   └── utils/
│       ├── __init__.py
│       ├── request_context.py
│       ├── correlation.py
│       └── formatter.py
├── tests/
│   ├── test_cli.py
│   ├── test_username_search.py
│   ├── test_email_lookup.py
│   ├── test_ip_lookup.py
│   ├── test_correlation.py
│   └── test_report_schema.py
└── output/
    ├── report.json
    └── reports/
        └── result.json
```
Folder guide for beginners:
- `main.py` -> CLI entry point. Parses flags, runs modules, handles errors, prints results, and exports reports.
- `src/modules/` -> Core feature modules (username, email, IP, social, metadata).
- `src/utils/request_context.py` -> Shared request budget and delay logic used by network modules.
- `src/utils/correlation.py` -> Combines multi-source outputs into readable investigative insights.
- `src/utils/formatter.py` -> Formats terminal output and builds/exports JSON report payloads.
- `src/core/` -> Present in the project structure but currently empty.
- `tests/` -> Automated tests for CLI behavior, module logic, correlation rules, and report schema consistency.
- `output/` -> Generated report artifacts.
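The local metadata extraction feature maps almost directly onto the standard library. A minimal sketch follows; the function name and exact field names are assumptions, not the project's actual schema:

```python
import hashlib
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def extract_metadata(path_str: str) -> dict:
    """Collect basic local-file metadata: name, path, size, MIME, mtime, SHA-256."""
    path = Path(path_str)
    stat = path.stat()
    mime, _ = mimetypes.guess_type(path.name)
    return {
        "name": path.name,
        "path": str(path.resolve()),
        "size_bytes": stat.st_size,
        "mime_type": mime,
        "modified_utc": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
    }
```

Because everything here is stdlib, this module works fully offline, unlike the username and IP modules.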
```shell
git clone https://github.com/urvalkheni/osint-tracker.git
cd osint-tracker
python -m pip install -r requirements.txt
python main.py
```

What each step does:

- `git clone ...` downloads the project.
- `cd osint-tracker` moves into the project folder.
- `python -m pip install -r requirements.txt` installs the runtime dependency (`requests`).
- `python main.py` runs the CLI (with no flags, it shows help).
Optional (development tools):

```shell
python -m pip install -r requirements-dev.txt
```

Run tests:

```shell
pytest -q
```

Check where a username appears:

```shell
python main.py --username octocat
```

Analyze an email offline (no network call):

```shell
python main.py --email demo.user@gmail.com
```

Look up IP geolocation/network details:

```shell
python main.py --ip 8.8.8.8
```

Run a social scan from username-based signals:

```shell
python main.py --social octocat
```

Extract local file metadata:

```shell
python main.py --metadata project.md
```

Run multiple sources and export a report:

```shell
python main.py --username octocat --email demo.user@gmail.com --ip 8.8.8.8 --output
```

Useful CLI flags:
```text
--username USERNAME    Search username across configured platforms
--email EMAIL          Run offline email analysis
--ip IP_ADDRESS        Run IP intelligence lookup
--social USERNAME      Build social scan from username results
--metadata FILE_PATH   Extract local file metadata
--output               Export JSON report files
-v, --verbose          Show detailed errors (stack trace)
```
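This flag surface can be reproduced with a compact `argparse` setup. The sketch below is a hedged approximation of such a parser, not the project's actual `main.py`:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of the CLI surface described above; help strings are paraphrased."""
    parser = argparse.ArgumentParser(prog="osint-tracker")
    parser.add_argument("--username", help="Search username across configured platforms")
    parser.add_argument("--email", help="Run offline email analysis")
    parser.add_argument("--ip", help="Run IP intelligence lookup")
    parser.add_argument("--social", help="Build social scan from username results")
    parser.add_argument("--metadata", help="Extract local file metadata")
    # Boolean switches: present -> True, absent -> False.
    parser.add_argument("--output", action="store_true", help="Export JSON report files")
    parser.add_argument("-v", "--verbose", action="store_true", help="Show detailed errors")
    return parser
```

Usage: `build_parser().parse_args(["--username", "octocat", "--output"])` yields a namespace with `username="octocat"`, `output=True`, and every unused flag set to `None` or `False`.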
Example terminal output (real format, sample values):
[+] Username Analysis
Target: octocat
GitHub -> FOUND | confidence=HIGH | HTTP 200 | HEAD->GET (https://github.com/octocat)
Reddit -> NOT_FOUND | confidence=HIGH | HTTP 404 | HEAD (https://www.reddit.com/user/octocat)
Twitter -> BLOCKED | confidence=MEDIUM | HTTP 200 | HEAD->GET | Blocked by platform (https://twitter.com/octocat)
Instagram -> UNKNOWN | confidence=LOW | HTTP 200 | HEAD->GET | Unable to determine account status (https://www.instagram.com/octocat)
Note: Result may be affected by platform restrictions or anti-bot protections
[+] Correlation Summary
- Username found on GitHub
- Social source quality: DERIVED
- Overall confidence: Medium
Example exported JSON shape (trimmed):
```json
{
  "username": {
    "query": "octocat",
    "original_query": "octocat",
    "normalized": false,
    "normalization_reason": null,
    "results": []
  },
  "email": {
    "email": "demo.user@gmail.com",
    "provider": "Google",
    "type": "personal",
    "confidence": "LOW"
  },
  "ip": {
    "ip": "8.8.8.8",
    "status": "SUCCESS"
  },
  "social": {
    "query": "octocat",
    "original_query": "octocat",
    "normalized": false,
    "normalization_reason": null,
    "results": []
  },
  "metadata": {
    "query": "project.md",
    "result": {
      "sha256": "..."
    }
  },
  "correlation": [
    "Overall confidence: Medium"
  ],
  "execution": {
    "timestamp_utc": "2026-...",
    "inputs_used": {
      "username": "octocat",
      "output": true
    }
  }
}
```

High-level flow:
- Input: You provide one or more CLI flags (`--username`, `--email`, `--ip`, `--social`, `--metadata`).
- Validation and safety checks: The app validates input types/length and enforces run-level safety limits.
- Module execution: Selected modules run and return structured dictionaries/lists.
- Formatting: Results are printed in consistent CLI sections.
- Correlation (only when at least 2 sources are provided): Cross-source insights are generated with source-quality and confidence weighting.
- Output export (optional): With `--output`, report JSON is written to `output/report.json` and `output/reports/result.json`.
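The export step can be sketched with the standard library. The payload shape follows the trimmed example earlier; the function name and `base_dir` parameter are assumptions for illustration, but the two target paths match the ones the project documents:

```python
import json
from pathlib import Path


def export_report(payload: dict, base_dir: str = "output") -> list:
    """Write the same report payload to both documented report locations."""
    base = Path(base_dir)
    targets = [base / "report.json", base / "reports" / "result.json"]
    for target in targets:
        # Create output/ and output/reports/ on first run.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(json.dumps(payload, indent=2, sort_keys=True))
    return targets
```

Writing the identical payload to two locations keeps `output/report.json` as the stable "latest run" artifact while `output/reports/result.json` can serve tooling that expects a nested layout.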
- Username and IP modules depend on external services and live HTTP behavior.
- Platform HTML/behavior changes can reduce detection quality.
- Anti-bot pages, login walls, and CAPTCHAs can produce `BLOCKED` or `UNKNOWN` results.
- Email analysis is heuristic and intentionally labeled low confidence.
- Social scan is derived from username results and is not an independent confirmation source.
- Correlation output is guidance for investigation, not legal proof of identity.
Runtime safety environment variables:
- `OSINT_MAX_TARGET_INPUT_LENGTH` (default `256`)
- `OSINT_MAX_NETWORK_OPERATIONS_PER_RUN` (default `3`)
- `OSINT_MAX_HTTP_REQUESTS_PER_RUN` (default `30`)
- `OSINT_REQUEST_DELAY_SEC` (default `0`)
- `OSINT_MODULE_DELAY_SECONDS` (default `0`)
This project demonstrates practical skills in:
- CLI application design with `argparse`
- HTTP data collection with retries, timeouts, and error contracts
- confidence-aware heuristic analysis
- multi-source data correlation and signal weighting
- structured report schema design and JSON export
- defensive coding and safety controls for network tooling
- test-driven validation with mocked external dependencies
- Add optional asynchronous request mode for faster large checks.
- Expand platform adapters with versioned marker profiles.
- Add CSV export and configurable output locations.
- Add richer metadata extraction for common document formats.
This tool is for educational and authorized security research only. Use it only on targets and data sources you are legally allowed to investigate.