The HEARTH scripts directory contains a comprehensive suite of Python tools for processing, validating, and managing threat hunting content. The codebase has been completely refactored with enterprise-grade features including error handling, logging, caching, validation, and testing.
- Centralized configuration with environment variable overrides
- JSON-based configuration files
- Singleton pattern for global access
- Type-safe configuration with dataclasses
```python
from config_manager import get_config

config = get_config().config
print(config.base_directory)
```

- Structured logging with multiple handlers
- File and console output with different formats
- Configurable log levels
- Singleton logger instance
```python
from logger_config import get_logger

logger = get_logger()
logger.info("Processing hunt files")
```

- Comprehensive validation for hunt data
- URL, file path, and format validation
- MITRE ATT&CK tactic validation
- Security-focused input sanitization
```python
from validators import HuntValidator

validator = HuntValidator()
validator.validate_hunt_id('H001', 'Flames')
```

- File-based and memory caching
- TTL (Time To Live) support
- File modification detection
- Decorator-based caching
```python
from cache_manager import cached

@cached(ttl=3600)
def expensive_operation(data):
    return process_data(data)
```

- Object-oriented hunt file processing
- Multiple export formats (JSON, JavaScript)
- Comprehensive error handling
- Statistics generation
```python
from hunt_parser import HuntProcessor

processor = HuntProcessor()
hunts = processor.process_all_hunts()
```

Shared utilities for markdown parsing, file discovery, and data extraction.
Custom exception classes for different error types:

- `FileProcessingError`
- `MarkdownParsingError`
- `ValidationError`
- `ConfigurationError`
- `AIAnalysisError`
- Comprehensive exception hierarchy
- Graceful error recovery
- Detailed error logging with context
- User-friendly error messages
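The exception classes named above might be organized as in the following sketch; a shared `HearthError` base class and the subclassing layout are assumptions for illustration, not confirmed details of the HEARTH codebase:

```python
# Hypothetical sketch of the exception hierarchy; the HearthError
# base class is an assumption made for this illustration.
class HearthError(Exception):
    """Base class for all HEARTH script errors."""

class FileProcessingError(HearthError):
    """Raised when a hunt file cannot be read or written."""

class MarkdownParsingError(FileProcessingError):
    """Raised when markdown content cannot be parsed."""

class ValidationError(HearthError):
    """Raised when hunt data fails validation."""

class ConfigurationError(HearthError):
    """Raised when configuration is missing or invalid."""

class AIAnalysisError(HearthError):
    """Raised when an AI-powered analysis step fails."""

# A single except clause can then catch any script error:
try:
    raise ValidationError("bad hunt id")
except HearthError as err:
    print(type(err).__name__)  # → ValidationError
```

A common base class lets callers distinguish expected script failures from genuine bugs with one `except HearthError` clause.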
- Multi-level caching (memory + disk)
- Lazy loading of resources
- Efficient data structures
- Debounced operations
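Debounced operations, as listed above, could be implemented with a small decorator; this is an illustrative sketch (the `debounced` name and interval are assumptions, not part of the HEARTH API):

```python
import functools
import time

def debounced(interval_seconds):
    """Skip calls that arrive within interval_seconds of the last run."""
    def decorator(func):
        last_run = 0.0
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal last_run
            now = time.monotonic()
            if now - last_run < interval_seconds:
                return None  # dropped: too soon after the previous call
            last_run = now
            return func(*args, **kwargs)
        return wrapper
    return decorator

# Hypothetical usage: avoid rebuilding an index on every file event.
@debounced(0.5)
def rebuild_index():
    return "rebuilt"
```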
- Input sanitization and validation
- Path traversal protection
- URL validation
- Secure file handling
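Path traversal protection typically means resolving a user-supplied path and rejecting anything that escapes the base directory. A minimal sketch (the `safe_resolve` helper is an assumption, not the actual HEARTH function):

```python
from pathlib import Path

def safe_resolve(base_dir, user_path):
    """Resolve user_path inside base_dir, rejecting path traversal."""
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()
    try:
        # relative_to raises ValueError if candidate is outside base
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate
```

For example, `safe_resolve("/hunts", "../etc/passwd")` raises, while `safe_resolve("/hunts", "Flames/H001.md")` returns the resolved path.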
- Environment-based configuration
- JSON configuration files
- Runtime configuration updates
- Type-safe configuration objects
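Environment-based overrides of a dataclass config could work as in this sketch; the field names and environment variables are taken from this README, but the override logic is an assumption:

```python
import os
from dataclasses import dataclass

@dataclass
class HearthConfig:
    base_directory: str = "."
    output_directory: str = "."
    max_hunts_for_comparison: int = 10

def load_config():
    """Build a config, letting environment variables override defaults."""
    cfg = HearthConfig()
    if "HEARTH_BASE_DIR" in os.environ:
        cfg.base_directory = os.environ["HEARTH_BASE_DIR"]
    if "HEARTH_OUTPUT_DIR" in os.environ:
        cfg.output_directory = os.environ["HEARTH_OUTPUT_DIR"]
    if "HEARTH_MAX_COMPARISON_HUNTS" in os.environ:
        cfg.max_hunts_for_comparison = int(os.environ["HEARTH_MAX_COMPARISON_HUNTS"])
    return cfg
```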
- Structured logging
- Performance metrics
- Cache statistics
- Processing statistics
- Comprehensive unit tests
- Integration tests
- Mock-based testing
- Test coverage for all components
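A test in the suite might look like the sketch below. The stub validator and its id patterns are invented for illustration so the example runs on its own; the real `HuntValidator` lives in `scripts/validators.py`:

```python
import re
import unittest

# Stand-in validator so the sketch is self-contained; the id
# patterns here are assumptions, not HEARTH's actual rules.
class StubHuntValidator:
    ID_PATTERNS = {"Flames": r"H\d{3}"}

    def validate_hunt_id(self, hunt_id, category):
        pattern = self.ID_PATTERNS.get(category)
        return bool(pattern and re.fullmatch(pattern, hunt_id))

class TestHuntValidator(unittest.TestCase):
    def test_valid_flames_id(self):
        self.assertTrue(StubHuntValidator().validate_hunt_id("H001", "Flames"))

    def test_malformed_id_rejected(self):
        self.assertFalse(StubHuntValidator().validate_hunt_id("H1", "Flames"))
```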
```python
from hunt_parser import HuntProcessor
from logger_config import get_logger

logger = get_logger()
processor = HuntProcessor()

try:
    # Process all hunt files
    hunts = processor.process_all_hunts()

    # Export to JavaScript format
    processor.export_hunts(hunts)

    # Generate statistics
    stats = processor.generate_statistics(hunts)
    processor.print_statistics(stats)
except Exception as error:
    logger.error(f"Processing failed: {error}")
```

```python
from config_manager import get_config

config_manager = get_config()

# Update configuration
config_manager.update_config(
    base_directory="/custom/path",
    max_hunts_for_comparison=20
)

# Save configuration
config_manager.save_config("custom_config.json")
```

```python
from validators import HuntValidator

validator = HuntValidator()

# Validate hunt data
hunt_data = {
    'id': 'H001',
    'category': 'Flames',
    'title': 'Test Hunt',
    'tactic': 'Execution'
}
validated_data = validator.validate_hunt_data(hunt_data)
```

```python
from cache_manager import cached, get_cache_manager

cache = get_cache_manager()

# Manual caching
cache.set('key', data, file_path='source.md')
cached_data = cache.get('key')

# Decorator caching
@cached(ttl=1800)
def process_file(file_path):
    return expensive_processing(file_path)
```

The original hunt parsing script (`parse_hunts.py`) is maintained for backward compatibility.
Enhanced object-oriented hunt parser with the full feature set:

```shell
python3 scripts/hunt_parser.py
```

Generates the contributor leaderboard from hunt submissions:

```shell
python3 scripts/generate_leaderboard.py
```

AI-powered duplicate detection for hunt submissions.

Comprehensive test suite for all components:

```shell
python3 scripts/test_runner.py
```

```shell
# Base configuration
export HEARTH_BASE_DIR="/path/to/hearth"
export HEARTH_OUTPUT_DIR="/path/to/output"

# Processing settings
export HEARTH_MAX_COMPARISON_HUNTS=15
export HEARTH_SIMILARITY_THRESHOLD=0.8

# AI settings
export OPENAI_MODEL="gpt-4"
export OPENAI_API_KEY="your-api-key"

# GitHub settings
export GITHUB_REPO_URL="https://github.com/your/repo"
export GITHUB_BRANCH="main"
```

```json
{
  "base_directory": ".",
  "hunt_directories": ["Flames", "Embers", "Alchemy"],
  "output_directory": ".",
  "max_hunts_for_comparison": 10,
  "similarity_threshold": 0.7,
  "hunts_data_filename": "hunts-data.js",
  "contributors_filename": "Keepers/Contributors.md"
}
```

Run the complete test suite:

```shell
python3 scripts/test_runner.py
```

Run specific test categories:

```shell
python3 -m unittest scripts.test_runner.TestHuntValidator
python3 -m unittest scripts.test_runner.TestCacheManager
```

```python
from cache_manager import get_cache_manager

cache = get_cache_manager()
stats = cache.get_cache_stats()
print(f"Cache entries: {stats['memory_entries']}")
print(f"Cache size: {stats['total_size_bytes']} bytes")
```

```python
from hunt_parser import HuntProcessor

processor = HuntProcessor()
hunts = processor.process_all_hunts()
stats = processor.generate_statistics(hunts)
print(f"Total hunts: {stats['total_hunts']}")
print(f"Categories: {stats['category_counts']}")
```

- Always use try/except blocks for file operations
- Log errors with context using the centralized logger
- Use specific exception types for different error conditions
- Provide meaningful error messages for users
- Implement graceful degradation when possible
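The error-handling practices above can be combined in one pattern: catch the low-level error, log it with context, and raise a specific exception type with a user-meaningful message. This sketch is self-contained, so it uses the stdlib logger and a stand-in `FileProcessingError` rather than the HEARTH modules:

```python
import logging

logger = logging.getLogger("hearth")

class FileProcessingError(Exception):
    """Stand-in for the HEARTH exception class of the same name."""

def load_hunt_file(path):
    """Read a hunt file, converting low-level OS errors into a
    specific exception with logged context."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except OSError as err:
        # Log with context, then raise the specific error type,
        # chaining the original cause for debugging.
        logger.error("Failed to read hunt file %s: %s", path, err)
        raise FileProcessingError(f"cannot read {path}") from err
```

Callers see a single, meaningful `FileProcessingError`, while the log and the chained `__cause__` preserve the original OS-level details.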
- Follow the established architecture patterns
- Add comprehensive tests for new features
- Use type hints for all function signatures
- Document new configuration options
- Update this README for new features
If you're migrating from the original scripts:

1. Replace `parse_hunts.py` usage:

   ```python
   # Old
   from parse_hunts import main
   main()

   # New
   from hunt_parser import HuntProcessor
   processor = HuntProcessor()
   processor.process_all_hunts()
   ```

2. Update configuration:
   - Move hardcoded values to configuration files
   - Use environment variables for sensitive data

3. Add error handling:
   - Wrap operations in try/except blocks
   - Use the centralized logger

4. Enable caching:
   - Add `@cached` decorators to expensive functions
   - Use `cache.get`/`cache.set` for manual caching
- Import Errors:
  - Ensure the scripts directory is on the Python path
  - Check for missing dependencies
- File Permission Errors:
  - Verify read/write permissions on directories
  - Check cache directory permissions
- Configuration Issues:
  - Validate JSON configuration syntax
  - Check environment variable names
- Performance Issues:
  - Monitor cache hit rates
  - Check log files for bottlenecks
  - Use profiling tools for optimization
Enable debug logging:

```python
import logging

logging.getLogger('hearth').setLevel(logging.DEBUG)
```

- Python 3.8+
- pathlib (built-in)
- json (built-in)
- re (built-in)
- typing (built-in)
- dataclasses (built-in)
- unittest (built-in)
Optional:
- openai (for AI analysis)
- python-dotenv (for environment variables)
This documentation reflects the enhanced architecture and provides comprehensive guidance for using the improved HEARTH scripts system.