Intelligent SETI Signal Analysis — Rust-Accelerated Processing with Machine Learning Classification
The Search for Extraterrestrial Intelligence generates terabytes of radio observation data, yet the standard analysis tool — turboSETI — is a pure-Python, batch-only pipeline with no machine learning, no RFI learning, and no way to distinguish a drifting ET signal from a drifting satellite. MitraSETI replaces this bottleneck with a Rust-powered de-Doppler engine (up to 45x faster on real Breakthrough Listen data), a CNN + Transformer classifier that automatically rejects RFI and flags anomalies, and a streaming observation mode that can run multi-day campaigns unattended — complete with desktop and web interfaces for real-time monitoring.
- 45x faster processing on million-channel observations via parallel Rust de-Doppler search with rayon
- Two-stage ML classification — rule-based filtering eliminates 99%+ obvious RFI; CNN+Transformer inference runs only on survivors
- Out-of-distribution detection — ensemble of MSP, Energy, and Spectral distance methods flags anomalous signals
- 9-class signal taxonomy — NARROWBAND_DRIFTING, NARROWBAND_STATIONARY, BROADBAND, PULSED, CHIRP, RFI_TERRESTRIAL, RFI_SATELLITE, NOISE, CANDIDATE_ET
- Streaming observation mode — continuous multi-day campaigns with auto-training, self-correcting thresholds, and daily HTML reports
- Catalog cross-matching — SIMBAD, NVSS, FIRST, and ATNF Pulsar catalogs with 24-hour caching
- AstroLens integration — optical + radio cross-reference for multi-modal discovery
- Desktop + Web UI — PyQt5 desktop app and FastAPI + Jinja2 web interface with animated starfield and glass theme
- Format support — Sigproc
.filand HDF5.h5(Breakthrough Listen format) with blimpy fallback
Measured on real Breakthrough Listen observation data. No synthetic files, no cherry-picked runs.
| Dataset | File | Channels | Time Steps | MitraSETI | turboSETI | Speedup |
|---|---|---|---|---|---|---|
| Voyager-1 | .h5 (48 MB) |
1,048,576 | 16 | 0.06 s | 2.53 s | 45x |
| TRAPPIST-1 | .fil (14 MB) |
65,536 | — | 0.43–0.54 s | 0.08–0.64 s | comparable |
Detection comparison on Voyager-1:
| Tool | Detections | Notes |
|---|---|---|
| turboSETI | 3 hits, SNR 245.7 | Raw hit list, no classification |
| MitraSETI | 1 candidate | ML-classified, OOD-scored, RFI-filtered |
turboSETI reports raw hits. MitraSETI reports classified candidates — signals that survived rule-based filtering, CNN+Transformer inference, and out-of-distribution analysis. One high-confidence candidate is more actionable than three unscreened hits.
Benchmarks run on an Apple Silicon workstation. Rust core compiled with
--release. turboSETI 2.x via pip.
┌──────────────────────────────┐
│ Filterbank / HDF5 Input │
│ (.fil / .h5 files) │
└──────────────┬───────────────┘
│
▼
┌───────────────────────────────────────┐
│ RUST CORE (mitraseti-core) │
│ │
│ ┌─────────────┐ ┌────────────────┐ │
│ │ Filterbank │ │ De-Doppler │ │
│ │ Reader │──│ Engine │ │
│ │ .fil / .h5 │ │ rayon ∥ │ │
│ └─────────────┘ └───────┬────────┘ │
│ │ │
│ ┌────────▼────────┐ │
│ │ RFI Filter │ │
│ │ known-band, │ │
│ │ zero-drift, │ │
│ │ broadband, │ │
│ │ persistence │ │
│ └────────┬────────┘ │
└──────────────────────────┼────────────┘
│
┌────────────────────▼────────────────────┐
│ PYTHON ML LAYER │
│ │
│ Stage 1: Rule-Based Filter (all hits) │
│ │ │
│ ▼ survivors only │
│ Stage 2: CNN+Transformer Inference │
│ + OOD Detection │
└────────────────────┬───────────────────┘
│
┌─────────────────────────┼─────────────────────────┐
│ │ │
▼ ▼ ▼
┌────────────────┐ ┌────────────────┐ ┌────────────────┐
│ Catalog │ │ AstroLens │ │ Signal │
│ Cross-Match │ │ Optical │ │ Export │
│ SIMBAD/NVSS/ │ │ Cross-Ref │ │ JSON / DB │
│ FIRST/Pulsar │ │ │ │ │
└───────┬────────┘ └───────┬────────┘ └───────┬────────┘
└────────────────────────┼────────────────────────┘
│
┌────────────▼────────────┐
│ UI LAYER │
│ Desktop (PyQt5) │
│ Web (FastAPI + Jinja2) │
│ WebSocket live stream │
└─────────────────────────┘
| Stage | Component | What It Does | Technology |
|---|---|---|---|
| 1. Ingest | FilterbankReader |
Reads .fil (Sigproc) and .h5 (HDF5/BL) files; handles 3D data, large-file subsetting, blimpy fallback |
Rust (core/src/filterbank.rs), h5py, blimpy |
| 2. De-Doppler | DedopplerEngine |
Brute-force drift-rate search across all channels; finds narrowband signals drifting due to relative acceleration | Rust (core/src/dedoppler.rs), rayon parallelism |
| 3. RFI Rejection | RFIFilter |
Eliminates known-band interference, zero-drift signals, broadband events; persistence scoring across observations | Rust (core/src/rfi_filter.rs) |
| 4. Clustering | _cluster_hits |
Deduplicates nearby hits in frequency/drift space; keeps highest-SNR per cluster | Python (pipeline.py) |
| 5a. Rule Filter | Stage 1 classifier | Fast pass over all signals: checks drift rate range, SNR thresholds, boundary artifacts; eliminates 99%+ obvious RFI | Python (pipeline.py) |
| 5b. ML Inference | Stage 2 classifier | CNN+Transformer on surviving candidates; produces 9-class probabilities, confidence, RFI probability | PyTorch (inference/signal_classifier.py) |
| 5c. OOD Detection | RadioOODDetector |
Ensemble of MSP + Energy + Spectral distance; flags signals outside training distribution as anomalies | PyTorch (inference/ood_detector.py) |
| 6. Features | FeatureExtractor |
Extracts SNR, drift rate, bandwidth, spectral index, kurtosis, skewness per signal | NumPy/SciPy (inference/feature_extractor.py) |
| 7. Catalog | RadioCatalogQuery |
Cross-references coordinates against SIMBAD, NVSS, FIRST, ATNF Pulsar catalogs | astroquery (catalog/radio_catalogs.py) |
| 8. Storage | SignalDB |
Async SQLite storage for signals, observations, and candidates | aiosqlite (api/database.py) |
The mitraseti-core crate (core/) is not a thin wrapper — it implements the compute-intensive stages of the pipeline in Rust for concrete, measurable gains:
Parallelism with rayon. The de-Doppler search distributes drift-rate trials across all available CPU cores. On a 16-core machine processing a million-channel observation, this alone accounts for the 45x speedup over turboSETI's single-threaded Python loop.
Memory safety without garbage collection. Filterbank files routinely exceed 1 GB. Rust's ownership model ensures zero-copy data handling and deterministic memory usage — no GC pauses during long processing runs.
Zero-copy Python interop via PyO3. The Rust core exposes DedopplerEngine, RFIFilter, FilterbankReader, and data types directly to Python through PyO3 bindings. NumPy arrays pass through without serialization overhead.
FFT via rustfft. Spectral analysis uses the pure-Rust rustfft crate, avoiding C library dependencies while maintaining competitive throughput.
ndarray for tensor operations. The Rust equivalent of NumPy, providing familiar n-dimensional array semantics with SIMD-friendly memory layout.
# core/Cargo.toml — key dependencies
rayon = "1.10" # data-parallel iterators
ndarray = "0.16" # n-dimensional arrays
rustfft = "6.2" # FFT engine
pyo3 = "0.22" # Python bindings
hdf5 = "0.8" # HDF5 support (optional feature)MitraSETI's classifier (inference/signal_classifier.py) uses a SpectralCNNBackbone for frequency-domain feature extraction combined with a 2-layer, 4-head Transformer encoder for temporal pattern recognition:
Input spectrogram (256 freq × 64 time)
│
▼
Conv1d backbone (spectral features along frequency axis)
│
▼
Transformer encoder (2 layers, 4 heads — temporal patterns)
│
▼
MLP classification head → 9 classes
Signal classes:
| Class | Description |
|---|---|
NARROWBAND_DRIFTING |
Narrowband signal with non-zero drift rate — the primary ET signature |
NARROWBAND_STATIONARY |
Narrowband, zero drift — likely local RFI |
BROADBAND |
Wideband emission |
PULSED |
Periodic pulsed signal |
CHIRP |
Frequency-swept signal |
RFI_TERRESTRIAL |
Terrestrial radio frequency interference |
RFI_SATELLITE |
Satellite downlink interference |
NOISE |
Background noise, no signal present |
CANDIDATE_ET |
Passes all filters — requires human review |
The two-stage design is critical for performance on real data where >99% of detections are RFI:
-
Stage 1 (rule-based, all signals): Checks drift rate range (0.05–10.0 Hz/s), SNR thresholds, drift boundary artifacts, and zero-drift patterns. Runs in microseconds per signal. Eliminates the vast majority.
-
Stage 2 (ML, survivors only): Extracts 256×64 spectrograms around each candidate frequency, runs batch CNN+Transformer inference, and applies OOD detection. Only invoked on the small fraction that pass Stage 1.
Signals that don't match any of the 9 training classes are flagged rather than forced into the nearest bucket. The OOD detector uses an ensemble of three methods:
- Maximum Softmax Probability (MSP): Low max probability → likely OOD
- Energy Score: Free energy of the logit vector
- Spectral Distance: Distance from training distribution in feature space
| File | Size | Description |
|---|---|---|
signal_classifier_v1.pt |
~3 MB | CNN+Transformer weights trained on Breakthrough Listen data |
ood_calibration.json |
~5 MB | OOD detector calibration thresholds |
Inference supports MPS (Apple Silicon), CUDA (NVIDIA), and CPU backends automatically.
- Python 3.10+
- Rust 1.70+ (install via rustup)
- maturin (for building the Rust extension)
macOS
git clone https://github.com/deepfieldlabs/MitraSETI.git
cd MitraSETI
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Build the Rust core (requires Rust toolchain)
pip install maturin
maturin develop --release
# Install MitraSETI in editable mode
pip install -e .Linux
# Install system dependencies (Debian/Ubuntu)
sudo apt-get update
sudo apt-get install -y python3-dev libhdf5-dev pkg-config
git clone https://github.com/deepfieldlabs/MitraSETI.git
cd MitraSETI
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Build the Rust core
pip install maturin
maturin develop --release
pip install -e .Windows
git clone https://github.com/deepfieldlabs/MitraSETI.git
cd MitraSETI
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
# Build the Rust core (requires Rust toolchain from rustup.rs)
pip install maturin
maturin develop --release
pip install -e .python -c "import mitraseti_core; print('Rust core loaded')"
python -c "from inference.signal_classifier import SignalClassifier; print('ML layer ready')"MitraSETI includes a FastAPI-powered web interface with interactive signal browsing, sky map, reports, and catalog cross-reference lookups. The web UI runs standalone — no separate API process is needed.
# Start the web server
cd MitraSETI
python -m web.app --port 9090
# Open in browser
open http://localhost:9090Pages: Dashboard · Waterfall Viewer · Signal Gallery · RFI Dashboard · Sky Map · Streaming Monitor · Reports · About
The REST API (for programmatic access) is also available:
uvicorn api.main:app --host 0.0.0.0 --port 8000
# Swagger docs: http://localhost:8000/docsThe PyQt5 desktop application provides a native interface with waterfall viewer, signal gallery, sky map, RFI dashboard, and streaming observation panel.
python main.py

MitraSETI desktop app — real-time signal processing and ML classification

Waterfall Viewer — spectrogram display with drift-rate overlay and zoom controls

Signal Gallery — browse detections with ML classification and confidence scores

Sky Radar — interactive celestial map with radio + optical source overlay
The desktop app includes:
- Waterfall Viewer — zoom, pan, drift line overlay, ON/OFF comparison
- Signal Gallery — browse detections with ML classification badges and confidence scores
- Sky Map Panel — interactive sky map with AstroLens optical overlay
- RFI Dashboard — real-time rejection statistics
- Streaming Panel — control and monitor continuous observation runs
Process filterbank files directly from the command line:
# Process a single file
python pipeline.py observation.fil
# Process multiple files
python pipeline.py data/*.fil data/*.h5
# Specify model weights and output
python pipeline.py observation.h5 \
--model models/signal_classifier_v1.pt \
--ood-cal models/ood_calibration.json \
--json-output results.json
# Process with custom database
python pipeline.py observation.fil --db results.sqliteExample output:
============================================================
File: Voyager1.h5
Status: success
Raw hits: 847
Filtered: 23
RFI: 19
Candidates: 1
Anomalies: 0
Time: 0.060s
Top candidates:
freq=8419921066.000000 Hz drift=0.3928 Hz/s SNR=245.7 class=narrowband_drifting conf=0.982
# Build and start all services
docker-compose up --build
# Run in background
docker-compose up -d
# Process files via the containerized API
curl -X POST http://localhost:8000/process \
-F "file=@observation.fil" \
-F "source_name=Voyager-1"The Docker setup includes the Rust core pre-compiled with HDF5 support, all Python dependencies, and the web server.
MitraSETI's streaming observation engine runs multi-day campaigns unattended, continuously processing filterbank files as they arrive.
# Run a 7-day continuous observation
python scripts/streaming_observation.py --days 7
# Aggressive mode (lower SNR thresholds, more candidates)
python scripts/streaming_observation.py --days 3 --mode aggressive
# Generate a report from existing data without processing
python scripts/streaming_observation.py --report-only
# Reset state and start fresh
python scripts/streaming_observation.py --reset- Auto-training: Initial model training at 5 files, fine-tuning every 500 processing cycles
- Self-correcting: Adjusts SNR thresholds based on candidate rates — too many candidates triggers stricter filtering, too few relaxes it
- ON-OFF cadence analysis: ABACAD pattern RFI rejection for source/reference observation pairs
- Daily HTML reports: Charts, candidate rankings, processing statistics, and performance metrics saved to
mitraseti_artifacts/streaming_reports/ - State persistence: Saves/resumes across restarts via JSON state files
- Health monitoring: Tracks API status, disk space, and model state
MitraSETI exposes a REST API via FastAPI. Key endpoints:
| Method | Endpoint | Description |
|---|---|---|
GET |
/health |
System health: GPU status, models loaded, disk space |
POST |
/process |
Upload and process a .fil / .h5 file through the full pipeline |
GET |
/signals |
List detected signals with filters (SNR, drift rate, classification, RFI score) |
GET |
/signals/{id} |
Get a single signal by ID |
PATCH |
/signals/{id} |
Update signal classification, verification status, or notes |
GET |
/candidates |
List signals promoted to ET candidate status, ordered by SNR |
GET |
/stats |
Aggregate processing statistics across all observations |
GET |
/catalog/crossref |
Cross-reference coordinates against SIMBAD, NVSS, FIRST, ATNF Pulsar catalogs |
GET |
/astrolens/crossref |
Check for AstroLens optical anomalies near a radio position |
WS |
/ws/live |
WebSocket stream for real-time signal updates during processing |
# Upload a file for processing
curl -X POST http://localhost:8000/process \
-F "file=@data/voyager1.h5" \
-F "source_name=Voyager-1" \
-F "ra=286.86" \
-F "dec=12.17"
# List candidates with SNR > 20
curl "http://localhost:8000/signals?min_snr=20&is_candidate=true"
# Cross-reference a detection against radio catalogs
curl "http://localhost:8000/catalog/crossref?ra=286.86&dec=12.17&radius_arcmin=5"MitraSETI is not a fork of turboSETI — it is a ground-up reimagination of the SETI signal analysis pipeline, built for the era of machine learning and massive data volumes.
| Capability | turboSETI | MitraSETI |
|---|---|---|
| Core language | Pure Python | Rust core + Python ML layer |
| Parallelism | Single-threaded | Multi-core via rayon (all CPU cores) |
| Processing speed | 2.53 s (Voyager-1 H5) | 0.06 s (45x faster) |
| ML classification | None | CNN + Transformer (9 classes) |
| RFI handling | Manual SNR threshold | Learned rejection + rule-based + persistence scoring |
| Out-of-distribution | None | Ensemble OOD detection (MSP + Energy + Spectral) |
| Signal taxonomy | Hit / no hit | 9 classes with confidence scores |
| Streaming mode | Batch only | Multi-day continuous with auto-training |
| Self-correcting | No | Adjusts thresholds based on candidate rates |
| ON-OFF cadence | External script | Built-in ABACAD pattern analysis |
| Optical cross-ref | None | AstroLens integration |
| Catalog matching | Basic | SIMBAD, NVSS, FIRST, ATNF Pulsar (24h cache) |
| Web interface | None | FastAPI + Jinja2 + WebSocket live stream |
| Desktop app | None | PyQt5 with waterfall viewer, sky map, gallery |
| API | None | REST API with Swagger docs |
| Daily reports | None | Auto-generated HTML with charts |
The intelligence layer is the key difference. turboSETI finds signals. MitraSETI understands them — automatically classifying, scoring, cross-referencing, and prioritizing so researchers focus on what matters.
MitraSETI includes first-class integration with AstroLens, enabling a unique optical + radio cross-reference workflow:
- Detect narrowband signals in radio filterbank data with MitraSETI
- Cross-reference signal coordinates against AstroLens optical catalog via the
/astrolens/crossrefAPI - Overlay optical imagery on the interactive sky map in the desktop app
- Correlate radio detections with known optical sources, transients, and anomalies
- Flag signals near stars, galaxies, or optical events that lack obvious radio counterparts
This multi-modal approach opens discovery pathways that single-wavelength analysis cannot achieve. A narrowband drifting signal coincident with an optically anomalous star is far more interesting than one coincident with a known satellite.
v0.2.0 Roadmap:
| Feature | Description |
|---|---|
| Taylor tree de-Doppler | O(N log N) algorithm replacing brute-force O(N²) — order-of-magnitude speedup on large files |
| CLI tool | Standalone mitraseti command for headless operation on HPC clusters |
| REST API extensions | Batch upload, observation scheduling, webhook notifications |
| Pre-trained model zoo | Models trained on GBT, Parkes, MeerKAT, and ATA datasets |
| GPU-accelerated de-Doppler | CUDA and Metal compute shaders for the de-Doppler search |
| Real-time SDR input | Direct ingestion from software-defined radios (RTL-SDR, USRP) |
| Cloud deployment | AWS / GCP Terraform modules for scalable processing |
Contributions are welcome. Whether it's a bug fix, new feature, or documentation improvement:
- Fork the repository
- Create a feature branch (
git checkout -b feature/taylor-tree-dedoppler) - Commit your changes (
git commit -m 'Implement Taylor tree de-Doppler') - Push to the branch (
git push origin feature/taylor-tree-dedoppler) - Open a Pull Request
Please make sure to:
- Write tests for new features (
pytestwith the existing test suite intests/) - Follow the existing code style (enforced via
ruff) - Update documentation as needed
See CONTRIBUTING.md for detailed guidelines.
This project is licensed under the MIT License — see the LICENSE file for details.
Created by Saman Tabatabaeian · Deep Field Labs
Built for the search for extraterrestrial intelligence.
If you find this useful, please star the repo — it helps the project reach more researchers.


