urvalkheni/osint-tracker

OSINT Tracker

Confidence-aware OSINT CLI for collecting, correlating, and exporting open-source digital footprint signals.

🚀 Overview

OSINT Tracker is a Python command-line tool that helps you collect and organize publicly available intelligence indicators from multiple sources.

It was built as an educational, practical project to show how OSINT workflows can be made safer and more consistent through:

  • clear status labels (FOUND, NOT_FOUND, BLOCKED, UNKNOWN)
  • explicit confidence levels (HIGH, MEDIUM, LOW)
  • structured reporting and correlation insights
  • runtime guardrails for request volume and input safety

Real-world use cases include:

  • beginner-friendly OSINT learning and cybersecurity training
  • quick digital footprint checks during triage
  • generating repeatable JSON reports for small investigations

🔥 Features

Only implemented features are listed below.

  • Username search across GitHub, Reddit, Twitter, and Instagram. Uses a conservative HEAD -> GET strategy with platform-specific content markers and canonical URL path checks.
  • Confidence-aware account status model. Each username/social result returns status + confidence instead of only true/false.
  • Offline email analysis. Validates and normalizes email input, identifies provider, detects local-part pattern, classifies personal vs generic usage, and labels local-part length.
  • Live IP lookup via ipwho.is. Returns country/city/ISP/coordinates when available, with retries and structured failure reasons (for example rate_limited, request_failed).
  • Social scan built from username results. Reuses username signals and converts them to social availability states (available, not available, blocked, unknown).
  • Local metadata extraction. Reads file name, path, size, MIME type, created/modified timestamps, and SHA-256 hash.
  • Cross-source correlation engine. Produces weighted, human-readable insights and an Overall confidence label, while treating social data as DERIVED (supporting, not independent proof).
  • JSON report export. Saves a stable schema to both output/report.json and output/reports/result.json.
  • Runtime safety controls. Includes input length limits, max network operations per run, shared HTTP request budget, and optional pacing delays.
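
The status and confidence labels can be illustrated with a minimal sketch. This is not the project's actual code: the name classify_response, the marker arguments, and the mapping from HTTP codes to labels are assumptions chosen to mirror the labels listed in the Overview.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AccountResult:
    status: str      # FOUND, NOT_FOUND, BLOCKED, or UNKNOWN
    confidence: str  # HIGH, MEDIUM, or LOW


def classify_response(
    status_code: int,
    body: Optional[str] = None,
    found_markers: Sequence[str] = (),
    blocked_markers: Sequence[str] = (),
) -> AccountResult:
    """Map one HTTP response to a status/confidence pair (hypothetical rules)."""
    if status_code == 404:
        # A clean 404 is treated as a strong negative signal.
        return AccountResult("NOT_FOUND", "HIGH")
    if status_code in (401, 403, 429):
        # Auth walls and rate limits say nothing about the account itself.
        return AccountResult("BLOCKED", "MEDIUM")
    if status_code == 200 and body is not None:
        text = body.lower()
        if any(marker.lower() in text for marker in blocked_markers):
            return AccountResult("BLOCKED", "MEDIUM")
        if any(marker.lower() in text for marker in found_markers):
            return AccountResult("FOUND", "HIGH")
    # Anything else is ambiguous, so report it as such instead of guessing.
    return AccountResult("UNKNOWN", "LOW")
```

Returning UNKNOWN with LOW confidence for ambiguous responses, rather than forcing a true/false answer, is the point of the model: the caller can see how much weight a result deserves.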

🛠 Tech Stack

  • Language: Python
  • Runtime library: requests
  • Testing: pytest
  • Dev tooling: black, flake8
  • Python standard library used heavily: argparse, concurrent.futures, ipaddress, pathlib, hashlib, json, datetime, threading, mimetypes

📂 Project Structure

osint-tracker/
├── main.py
├── requirements.txt
├── requirements-dev.txt
├── project.md
├── src/
│   ├── __init__.py
│   ├── core/
│   ├── modules/
│   │   ├── __init__.py
│   │   ├── username_search.py
│   │   ├── email_lookup.py
│   │   ├── ip_lookup.py
│   │   ├── social_scan.py
│   │   └── metadata_extractor.py
│   └── utils/
│       ├── __init__.py
│       ├── request_context.py
│       ├── correlation.py
│       └── formatter.py
├── tests/
│   ├── test_cli.py
│   ├── test_username_search.py
│   ├── test_email_lookup.py
│   ├── test_ip_lookup.py
│   ├── test_correlation.py
│   └── test_report_schema.py
└── output/
    ├── report.json
    └── reports/
        └── result.json

Folder guide for beginners:

  • main.py -> CLI entry point. Parses flags, runs modules, handles errors, prints results, and exports reports.
  • src/modules/ -> Core feature modules (username, email, IP, social, metadata).
  • src/utils/request_context.py -> Shared request budget and delay logic used by network modules.
  • src/utils/correlation.py -> Combines multi-source outputs into readable investigative insights.
  • src/utils/formatter.py -> Formats terminal output and builds/exports JSON report payloads.
  • src/core/ -> Present in the project structure but currently empty.
  • tests/ -> Automated tests for CLI behavior, module logic, correlation rules, and report schema consistency.
  • output/ -> Generated report artifacts.

⚙️ Setup & Installation

git clone https://github.com/urvalkheni/osint-tracker.git
cd osint-tracker
python -m pip install -r requirements.txt
python main.py

What each step does:

  1. git clone ... downloads the project.
  2. cd osint-tracker moves into the project folder.
  3. python -m pip install -r requirements.txt installs the runtime dependency (requests).
  4. python main.py runs the CLI (with no flags, it shows help).

Optional (development tools):

python -m pip install -r requirements-dev.txt

Run tests:

pytest -q

🧪 Usage Examples

Check where a username appears:

python main.py --username octocat

Analyze an email offline (no network call):

python main.py --email demo.user@gmail.com
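
A rough idea of what offline email analysis can look like is sketched below. The function name analyze_email, the provider table, and the list of generic local parts are illustrative assumptions, not the contents of src/modules/email_lookup.py. The output keys follow the email section of the exported JSON.

```python
import re

# Hypothetical lookup tables; the real module may use different data.
PROVIDERS = {
    "gmail.com": "Google",
    "googlemail.com": "Google",
    "outlook.com": "Microsoft",
    "yahoo.com": "Yahoo",
}
GENERIC_LOCAL_PARTS = {"admin", "info", "support", "contact", "sales"}

# Deliberately loose syntax check: one "@", no whitespace, a dot in the domain.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def analyze_email(raw: str) -> dict:
    """Normalize an address and derive heuristic labels without any network call."""
    email = raw.strip().lower()
    if not EMAIL_RE.match(email):
        raise ValueError(f"not a valid email address: {raw!r}")
    local_part, domain = email.rsplit("@", 1)
    return {
        "email": email,
        "provider": PROVIDERS.get(domain, "Unknown"),
        "type": "generic" if local_part in GENERIC_LOCAL_PARTS else "personal",
        # Heuristics only, so the result is always labeled low confidence.
        "confidence": "LOW",
    }
```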

Look up IP geolocation/network details:

python main.py --ip 8.8.8.8
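
Before spending a network request, an IP target can be checked locally with the standard-library ipaddress module. The sketch below assumes that non-public addresses are rejected up front. That policy, the function name validate_ip_target, and the reason strings are all hypothetical and may not match the project's behavior.

```python
import ipaddress


def validate_ip_target(raw: str) -> dict:
    """Check an IP string locally and explain why it was rejected, if it was."""
    candidate = raw.strip()
    try:
        address = ipaddress.ip_address(candidate)
    except ValueError:
        # Hypothetical reason label, in the same spirit as rate_limited.
        return {"ip": candidate, "valid": False, "reason": "invalid_ip"}
    if not address.is_global:
        # Private, loopback, and reserved ranges have no public geolocation.
        return {"ip": str(address), "valid": False, "reason": "non_public_ip"}
    return {"ip": str(address), "valid": True, "reason": None}
```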

Run social scan from username-based signals:

python main.py --social octocat

Extract local file metadata:

python main.py --metadata project.md
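
The fields this command reports can be gathered with the standard library alone. The following is a minimal sketch under assumptions: the function name extract_metadata and the dictionary keys are made up for illustration, and the creation timestamp is left out because its meaning differs between operating systems.

```python
import hashlib
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def extract_metadata(file_path: str) -> dict:
    """Collect basic metadata and a SHA-256 digest for a local file."""
    path = Path(file_path).resolve()
    stat = path.stat()

    # Hash in chunks so large files are not loaded into memory at once.
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)

    # guess_type works from the file extension only; it does not read the file.
    mime_type, _encoding = mimetypes.guess_type(path.name)
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "name": path.name,
        "path": str(path),
        "size_bytes": stat.st_size,
        "mime_type": mime_type or "unknown",
        "modified_utc": modified.isoformat(),
        "sha256": digest.hexdigest(),
    }
```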

Run multiple sources and export a report:

python main.py --username octocat --email demo.user@gmail.com --ip 8.8.8.8 --output

Useful CLI flags:

--username USERNAME   Search username across configured platforms
--email EMAIL         Run offline email analysis
--ip IP_ADDRESS       Run IP intelligence lookup
--social USERNAME     Build social scan from username results
--metadata FILE_PATH  Extract local file metadata
--output              Export JSON report files
-v, --verbose         Show detailed errors (stack trace)
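
For readers new to argparse, the flags listed here could be declared roughly as follows. This is a sketch, not a copy of main.py: the function name build_parser and the description string are assumptions, while the flag names and help text come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the CLI flags listed in this README (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Confidence-aware OSINT CLI",
    )
    parser.add_argument("--username", metavar="USERNAME",
                        help="Search username across configured platforms")
    parser.add_argument("--email", metavar="EMAIL",
                        help="Run offline email analysis")
    parser.add_argument("--ip", metavar="IP_ADDRESS",
                        help="Run IP intelligence lookup")
    parser.add_argument("--social", metavar="USERNAME",
                        help="Build social scan from username results")
    parser.add_argument("--metadata", metavar="FILE_PATH",
                        help="Extract local file metadata")
    # Boolean switches: present means True, absent means False.
    parser.add_argument("--output", action="store_true",
                        help="Export JSON report files")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Show detailed errors (stack trace)")
    return parser
```

Calling build_parser().parse_args(["--username", "octocat", "--output"]) yields a namespace with username set to "octocat", output set to True, and every unused target flag left as None.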

📊 Example Output

Example terminal output (real format, sample values):

[+] Username Analysis

Target: octocat

GitHub     -> FOUND | confidence=HIGH | HTTP 200 | HEAD->GET (https://github.com/octocat)
Reddit     -> NOT_FOUND | confidence=HIGH | HTTP 404 | HEAD (https://www.reddit.com/user/octocat)
Twitter    -> BLOCKED | confidence=MEDIUM | HTTP 200 | HEAD->GET | Blocked by platform (https://twitter.com/octocat)
Instagram  -> UNKNOWN | confidence=LOW | HTTP 200 | HEAD->GET | Unable to determine account status (https://www.instagram.com/octocat)

Note: Result may be affected by platform restrictions or anti-bot protections

[+] Correlation Summary

- Username found on GitHub
- Social source quality: DERIVED
- Overall confidence: Medium

Example exported JSON shape (trimmed):

{
    "username": {
        "query": "octocat",
        "original_query": "octocat",
        "normalized": false,
        "normalization_reason": null,
        "results": []
    },
    "email": {
        "email": "demo.user@gmail.com",
        "provider": "Google",
        "type": "personal",
        "confidence": "LOW"
    },
    "ip": {
        "ip": "8.8.8.8",
        "status": "SUCCESS"
    },
    "social": {
        "query": "octocat",
        "original_query": "octocat",
        "normalized": false,
        "normalization_reason": null,
        "results": []
    },
    "metadata": {
        "query": "project.md",
        "result": {
            "sha256": "..."
        }
    },
    "correlation": [
        "Overall confidence: Medium"
    ],
    "execution": {
        "timestamp_utc": "2026-...",
        "inputs_used": {
            "username": "octocat",
            "output": true
        }
    }
}
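
Writing one report to both documented locations takes only a few lines. In the sketch below, the function name export_report and the base_dir parameter are assumptions; only the two relative paths come from this README.

```python
import json
from pathlib import Path


def export_report(report: dict, base_dir: str = "output") -> list:
    """Write the same JSON payload to report.json and reports/result.json."""
    base = Path(base_dir)
    targets = [base / "report.json", base / "reports" / "result.json"]
    # Serialize once so both files are guaranteed to be byte-identical.
    payload = json.dumps(report, indent=4)
    for target in targets:
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(payload, encoding="utf-8")
    return targets
```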

🧠 How It Works

High-level flow:

  1. Input: You provide one or more CLI flags (--username, --email, --ip, --social, --metadata).
  2. Validation and safety checks: The app validates input types/length and enforces run-level safety limits.
  3. Module execution: Selected modules run and return structured dictionaries/lists.
  4. Formatting: Results are printed in consistent CLI sections.
  5. Correlation (only when at least 2 sources are provided): Cross-source insights are generated with source-quality and confidence weighting.
  6. Output export (optional): With --output, report JSON is written to output/report.json and output/reports/result.json.
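
Step 5 can be pictured with a deliberately simplified sketch. The weights, thresholds, and the function name correlate below are invented for illustration and are not taken from src/utils/correlation.py. The sketch only shows the two rules stated in this README: correlation needs at least two sources, and social data counts as DERIVED rather than as independent evidence.

```python
# Hypothetical weights: social contributes nothing because it is derived
# from the username results and would otherwise be counted twice.
SOURCE_WEIGHTS = {"username": 2, "ip": 2, "metadata": 2, "email": 1, "social": 0}


def correlate(results: dict) -> list:
    """Return human-readable insights for the sources that produced data."""
    provided = [name for name, value in results.items() if value]
    if len(provided) < 2:
        # A single source has nothing to be correlated against.
        return []

    insights = []
    if "social" in provided:
        insights.append("Social source quality: DERIVED")

    score = sum(SOURCE_WEIGHTS.get(name, 0) for name in provided)
    if score >= 5:
        label = "High"
    elif score >= 3:
        label = "Medium"
    else:
        label = "Low"
    insights.append(f"Overall confidence: {label}")
    return insights
```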

🔐 Limitations / Notes

  • Username and IP modules depend on external services and live HTTP behavior.
  • Platform HTML/behavior changes can reduce detection quality.
  • Anti-bot pages, login walls, and CAPTCHA can produce BLOCKED or UNKNOWN.
  • Email analysis is heuristic and intentionally labeled low confidence.
  • Social scan is derived from username results and is not an independent confirmation source.
  • Correlation output is guidance for investigation, not legal proof of identity.

Runtime safety environment variables:

  • OSINT_MAX_TARGET_INPUT_LENGTH (default 256)
  • OSINT_MAX_NETWORK_OPERATIONS_PER_RUN (default 3)
  • OSINT_MAX_HTTP_REQUESTS_PER_RUN (default 30)
  • OSINT_REQUEST_DELAY_SEC (default 0)
  • OSINT_MODULE_DELAY_SECONDS (default 0)

🎯 Learning Outcomes

This project demonstrates practical skills in:

  • CLI application design with argparse
  • HTTP data collection with retries, timeouts, and error contracts
  • confidence-aware heuristic analysis
  • multi-source data correlation and signal weighting
  • structured report schema design and JSON export
  • defensive coding and safety controls for network tooling
  • test-driven validation with mocked external dependencies

🚀 Future Improvements

  • Add optional asynchronous request mode for faster large checks.
  • Expand platform adapters with versioned marker profiles.
  • Add CSV export and configurable output locations.
  • Add richer metadata extraction for common document formats.

⚠️ Disclaimer

This tool is for educational and authorized security research only. Use it only on targets and data sources you are legally allowed to investigate.
