U-Probe: Universal Agentic Probe Design Platform



U-Probe is a universal and agentic probe design platform tailored for imaging-based spatial-omics. It overcomes the architectural limitations of existing tools and lowers the expertise barrier by combining an innovative declarative configuration system with LLM-based AI agents.

Whether you are designing standard probes for established protocols or developing entirely novel architectures, U-Probe provides a comprehensive, automated workflow from target sequence extraction to rigorous thermodynamic filtering.

Features

  • Universal Probe Architecture: Features a declarative configuration system and a directed acyclic graph (DAG)-based assembly engine. Design arbitrary, complex multi-part probe structures (e.g., DNA-FISH, smFISH, MERFISH, seqFISH, π-FISH, MiP-seq, RCA) without modifying any source code.
  • Agentic AI Workflow: Integrates hierarchical LLM-based AI agents (via PantheonOS) for conversational design. Users can describe experimental goals in plain language or provide scRNA-seq data, and the agents will autonomously construct configurations, select parameters, and execute the design pipeline.
  • Comprehensive Quality Filtering: Automatically computes and filters candidates based on GC content, melting temperature, secondary structure stability (via ViennaRNA), off-target mapping (via Bowtie2), and k-mer frequency (via Jellyfish).
  • End-to-End Automation: Streamlines the entire process from target sequence extraction to advanced post-processing, including overlap removal and equal spacing for tiling designs.
  • Extensible API & Web UI: Offers a clean Python API for programmatic integration, a command-line interface, and an interactive web server for visual workflow management and result analysis.

Installation

U-Probe can be easily installed via pip. We recommend using a virtual environment (like conda or venv).

Install via pip (Recommended)

pip install uprobe

Install from source (For development)

git clone https://github.com/UFISH-Team/U-Probe.git
cd U-Probe
pip install -e .

Using conda environment

git clone https://github.com/UFISH-Team/U-Probe.git
cd U-Probe
conda env create -f environments.yaml
conda activate uprobe
pip install .

Usage Guide

U-Probe provides three main entry points: Command Line Interface (CLI), Python API, and the HTTP/Web UI.

For source installs, install the package and runtime dependencies before running examples:

pip install -e .

Most workflow commands require two YAML files:

  1. genomes.yaml: genome paths such as FASTA and GTF.
  2. protocol.yaml: probe design parameters.

1. Command Line Interface (CLI)

The CLI is well suited to running tasks quickly from a terminal or from shell scripts.

🤖 AI Smart Assistant

U-Probe includes an interactive AI Agent powered by Pantheon. You can design probes through natural language conversation without writing YAML files manually.

uprobe agent

CLI agent sessions set UPROBE_OUTPUT_DIR automatically. Newly generated files are written under:

<workspace>/outputs/agent/<user>/<session>/agent_runs/<run>/

Use uprobe agent --force after upgrading if you need to refresh the installed Pantheon team template.

🌐 Start Web Server

U-Probe now comes with a built-in web server and UI for an intuitive visual experience.

# Start in development mode (default)
uprobe server --host 127.0.0.1 --port 8000

# Start in production mode with multiple workers
uprobe server --env production --host 0.0.0.0 --port 8000 --workers 4

🌟 Complete Workflow (Recommended)

To run the entire pipeline from genome index construction to final probe generation in one go:

uprobe run -p protocol.yaml -g genomes.yaml -o ./results --threads 10

Useful flags:

uprobe run -p protocol.yaml -g genomes.yaml -o ./results --continue-invalid --raw --threads 10

🔧 Step-by-Step Execution

For advanced users who need intermediate results or custom workflows, you can execute each step individually:

# 1. Build genome index
uprobe build-index -p protocol.yaml -g genomes.yaml -t 10

# 2. Validate target genes against the GTF file
uprobe validate-targets -p protocol.yaml -g genomes.yaml --continue-invalid

# 3. Extract target region sequences
uprobe generate-targets -p protocol.yaml -g genomes.yaml -o ./results --continue-invalid

# 4. Construct initial probes from target sequences
uprobe construct-probes -p protocol.yaml -g genomes.yaml --targets ./results/target_sequences.csv -o ./results

# 5. Post-process probes (requires target + probe columns)
uprobe post-process -p protocol.yaml -g genomes.yaml --probes ./results/constructed_probes_combined.csv -o ./results

# 6. Generate HTML analysis report
uprobe generate-report -p protocol.yaml -g genomes.yaml --probes ./results/probes_*.csv -o ./results
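When the individual steps are chained from a script, it helps to stop at the first failing step. The sketch below is one possible driver, not part of U-Probe: it only reuses the commands and flags shown above, and the names `STEPS` and `run_steps` are hypothetical. It defaults to a dry run that prints each command. The report step is left out because its `probes_*.csv` argument relies on shell glob expansion.

```python
import subprocess

COMMON = ["-p", "protocol.yaml", "-g", "genomes.yaml"]
OUT = "./results"

# Steps 1-5 from the listing above, expressed as argument lists.
STEPS = [
    ["uprobe", "build-index", *COMMON, "-t", "10"],
    ["uprobe", "validate-targets", *COMMON, "--continue-invalid"],
    ["uprobe", "generate-targets", *COMMON, "-o", OUT, "--continue-invalid"],
    ["uprobe", "construct-probes", *COMMON,
     "--targets", f"{OUT}/target_sequences.csv", "-o", OUT],
    ["uprobe", "post-process", *COMMON,
     "--probes", f"{OUT}/constructed_probes_combined.csv", "-o", OUT],
]


def run_steps(steps, dry_run=True):
    """Print each command; when dry_run is False, also execute it.

    subprocess.run(..., check=True) raises CalledProcessError on a
    non-zero exit code, so later steps never run after a failure.
    Returns the number of steps processed.
    """
    for cmd in steps:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return len(steps)


run_steps(STEPS)  # dry run: prints the five commands without executing them
```

Calling `run_steps(STEPS, dry_run=False)` executes the steps and requires `uprobe` to be on the PATH.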

🧬 Generate Barcodes

uprobe generate-barcodes \
  --strategy max_orthogonality \
  --name barcodes \
  --num-barcodes 16 \
  --length 8 \
  --alphabet ACT \
  --output ./results/barcodes

This writes two files to the output directory: barcodes.csv (with a sequence column) and barcodes.txt.
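For intuition about what an orthogonality-oriented strategy aims for, the sketch below greedily selects barcodes over a restricted alphabet so that every pair differs in at least a minimum number of positions (Hamming distance). This is an illustrative assumption, not U-Probe's actual max_orthogonality algorithm; `hamming`, `greedy_barcodes`, and the `min_distance` parameter are hypothetical names.

```python
import itertools
import random


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_barcodes(num_barcodes, length, alphabet, min_distance, seed=0):
    """Greedily pick barcodes whose pairwise Hamming distance is >= min_distance.

    Hypothetical helper for illustration; not part of the uprobe package.
    """
    rng = random.Random(seed)
    # Enumerate every sequence of the given length, then visit them in random order.
    candidates = ["".join(p) for p in itertools.product(alphabet, repeat=length)]
    rng.shuffle(candidates)
    chosen = []
    for cand in candidates:
        if all(hamming(cand, c) >= min_distance for c in chosen):
            chosen.append(cand)
            if len(chosen) == num_barcodes:
                return chosen
    raise ValueError("not enough barcodes found; lower min_distance")


barcodes = greedy_barcodes(num_barcodes=16, length=8, alphabet="ACT", min_distance=3)
for bc in barcodes:
    print(bc)
```

Enumerating all candidates is only practical for short barcodes (here 3^8 = 6561 sequences); longer barcodes would need sampling instead.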

2. Python API (Ideal for Backend Integration)

If you are developing a web backend or data analysis pipeline, we recommend using UProbeAPI directly. It returns pandas DataFrames, which makes downstream processing straightforward.

from pathlib import Path

import pandas as pd
import uprobe

# Initialize the API with both configuration files and an output directory
api = uprobe.UProbeAPI(
    protocol_config=Path("protocol.yaml"),
    genomes_config=Path("genomes.yaml"),
    output_dir=Path("./results")
)

# --- Method 1: Complete Workflow ---
df_final = api.run_workflow(threads=10)

# --- Method 2: Step-by-Step Execution (alternative to Method 1) ---
api.build_genome_index(threads=10)
api.validate_targets()
df_targets = api.generate_target_seqs()
df_probes = api.construct_probes(df_targets)

# Post-processing requires the target and probe columns side by side
df_combined = pd.concat([df_targets, df_probes], axis=1)
df_final = api.post_process_probes(df_combined)

# Generate HTML report
reports = api.generate_report(df_final)
print(reports["html_reports"])

# Quick barcode generation
barcodes = api.quick_generate_barcodes(num_barcodes=16, length=8, alphabet="ACT")

Configuration Details

U-Probe relies on two main configuration files to run. Here is a breakdown of how to structure them.

1. genomes.yaml

This file maps a genome name to its corresponding file paths. It tells U-Probe where to find the reference genome and annotation files, and which aligner indices to build.

# Example genomes.yaml
GRCh38:
  fasta: "/path/to/GRCh38.fasta"
  gtf: "/path/to/GRCh38.gtf"
  align_index:
    - bowtie2
    - blast
  jellyfish: false  
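Because a run fails late if a reference file is missing, a quick pre-flight check on the genome entry can save time. The sketch below works on a plain dict shaped like the example above (what a YAML parser would return); `check_genome_entry` is a hypothetical helper, and treating both `fasta` and `gtf` as required is an assumption rather than documented U-Probe behaviour.

```python
from pathlib import Path

# The example entry above, as the dict a YAML parser would produce.
genomes = {
    "GRCh38": {
        "fasta": "/path/to/GRCh38.fasta",
        "gtf": "/path/to/GRCh38.gtf",
        "align_index": ["bowtie2", "blast"],
        "jellyfish": False,
    }
}


def check_genome_entry(genomes, name, required=("fasta", "gtf")):
    """Return a list of problems for one genome entry (empty if none found)."""
    if name not in genomes:
        return [f"genome '{name}' is not defined"]
    entry = genomes[name]
    problems = []
    for key in required:
        if key not in entry:
            problems.append(f"{name}: missing '{key}'")
        elif not Path(entry[key]).is_file():
            problems.append(f"{name}: {key} file not found: {entry[key]}")
    return problems


for problem in check_genome_entry(genomes, "GRCh38"):
    print(problem)
```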

2. protocol.yaml

This is the core configuration file that defines all parameters for a specific probe design run. It is highly customizable.

# Example protocol.yaml snippet
name: my_rna_probe_design
genome: GRCh38
targets:
  - CD4

extracts:
  target_region:
    source: exon  # genome / exon / CDS / UTR
    length: 30
    overlap: 15

# Define how genes map to specific barcodes
encoding:
  CD4:
    BC1: ACGAGCCTTCCA
    BC2: CGGTAATGGACT

# Define the structure of your probes
probes:
  probe_1:
    template: "{part1}{part2}"
    parts:
      part1:
        expr: "rc(target_region[0:20])"
      part2:
        template: "CC{barcode1}TGCGTCTATTT{barcode2}TAGTGGAGCCT"
        parts:
          barcode1:
            expr: "encoding[target]['BC1']"
          barcode2:
            expr: "encoding[target]['BC2']"
  probe_2:
    template: "{part1}AGGCTCCACTA"
    parts:
      part1:
        expr: "rc(target_region[-10:])"

# Optional. If omitted or empty, the CLI auto-generates attributes,
# post_process, and summary based on source mode and probe structure.
attributes:
  target_gc:
    target: target_region
    type: gc_content
  target_tm:
    target: target_region
    type: annealing_temperature
  target_fold:
    target: target_region
    type: fold_score

post_process:
  filters:
    target_tm:
      condition: target_tm >= 50 & target_tm <= 90
  sorts:
    is_ascending:
      - target_gc
    is_descending:
      - target_fold

summary:
  report_name: rna_report   # rna_report / dna_report
  attributes:
    - target_gc
    - target_tm
    - target_fold
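To make the probes section concrete, the sketch below resolves probe_1 and probe_2 from the example for one toy target sequence. It is an illustration of the idea, not U-Probe's assembly engine: `rc`, `resolve`, and the 30-nt `target_region` value are assumptions, Python's `eval` stands in for the real expression evaluator, and only nested parts are handled (no references between probes).

```python
def rc(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def resolve(node, context):
    """Resolve a probe or part: evaluate `expr`, or fill `template` from `parts`."""
    if "expr" in node:
        # eval with empty builtins is used for brevity only; it is not a sandbox.
        return eval(node["expr"], {"__builtins__": {}}, context)
    parts = {name: resolve(child, context)
             for name, child in node.get("parts", {}).items()}
    return node["template"].format(**parts)


# The probes section of the example, as a parsed dict.
probes_cfg = {
    "probe_1": {
        "template": "{part1}{part2}",
        "parts": {
            "part1": {"expr": "rc(target_region[0:20])"},
            "part2": {
                "template": "CC{barcode1}TGCGTCTATTT{barcode2}TAGTGGAGCCT",
                "parts": {
                    "barcode1": {"expr": "encoding[target]['BC1']"},
                    "barcode2": {"expr": "encoding[target]['BC2']"},
                },
            },
        },
    },
    "probe_2": {
        "template": "{part1}AGGCTCCACTA",
        "parts": {"part1": {"expr": "rc(target_region[-10:])"}},
    },
}

context = {
    "rc": rc,
    "target": "CD4",
    "encoding": {"CD4": {"BC1": "ACGAGCCTTCCA", "BC2": "CGGTAATGGACT"}},
    "target_region": "ATGGCCAAGCTTGGATCCGAATTCCTGCAG",  # toy 30-nt window
}

probes = {name: resolve(cfg, context) for name, cfg in probes_cfg.items()}
for name, seq in probes.items():
    print(name, seq)
```

Each part is resolved before the template that uses it, which is the dependency ordering a DAG-based engine generalises to parts and probes that reference one another.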

Key Sections in protocol.yaml:

  • name & genome: Basic metadata. The genome must match a key in your genomes.yaml.
  • targets: A list of target gene names or IDs you want to design probes for.
  • extracts: Parameters for extracting target sequences. The source can be genome, exon, CDS, or UTR. You can also define the sliding window length and overlap.
  • encoding: Mapping of specific genes to custom barcodes or identifiers (e.g., assigning BC1 and BC2 to CD4).
  • probes: The core of the design, powered by a Directed Acyclic Graph (DAG) architecture. This allows for complex, modular probe construction where parts and probes can reference each other.
    • template: Construct sequences using placeholders (e.g., {part1}{part2}).
    • expr: Apply Python-like expressions. You can use built-in functions like rc() (reverse complement), slice sequences ([0:20]), or fetch from the encoding map (encoding[target]['BC1']).
    • DAG References: Because of the DAG structure, subsequent probes or parts can dynamically reference the sequences of previously defined probes/parts in their expressions.
  • attributes: Define the biochemical or physical properties you want to calculate for specific targets or probe parts. Available attribute types include:
    • gc_content: Calculates the GC ratio of the sequence.
    • annealing_temperature: Calculates the melting temperature (Tm) using Primer3.
    • fold_score: Calculates the RNA folding minimum free energy (MFE) using ViennaRNA (lower is more stable).
    • self_match: Calculates the potential for self-dimerization.
    • mapped_sites: Aligns the sequence to the genome (via Bowtie2) and counts off-target mapped sites.
    • mapped_genes: Counts the number of unique genes the sequence aligns to (via Bowtie2).
    • kmer_count: Counts k-mer occurrences in the genome (via Jellyfish) to evaluate specificity.
  • post_process: Define strict filters (e.g., Tm ranges) based on the calculated attributes, and sorts to rank the best probes (ascending or descending).
  • remove_overlap: Optional post-processing step for controlling spacing between probes. Put it under post_process when you need it, e.g. post_process: { remove_overlap: { location_interval: 0 } }.
  • summary: Define the report_name (e.g., rna_report or dna_report) and the specific attributes you want to visualize and output in the final report.
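The extracts, attributes, and post_process sections can be pictured as a small pipeline: cut windows, compute a property per window, then filter and sort. The sketch below shows this for GC content only. It assumes that consecutive windows share `overlap` bases (so the step is `length - overlap`); `sliding_windows`, `gc_content`, and the GC thresholds are hypothetical and not taken from U-Probe.

```python
def sliding_windows(seq, length, overlap):
    """Split seq into windows of `length`; neighbours share `overlap` bases."""
    step = length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than length")
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]


def gc_content(seq):
    """Fraction of G and C bases in seq, between 0.0 and 1.0."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Toy 60-nt sequence; real input would be the extracted target sequence.
sequence = "ATGGCCAAGCTTGGATCCGAATTCCTGCAG" * 2

rows = [
    {"target_region": w, "target_gc": gc_content(w)}
    for w in sliding_windows(sequence, length=30, overlap=15)
]

# Keep windows inside a GC range, then sort ascending by GC,
# mirroring a filter condition and an is_ascending sort.
kept = sorted(
    (r for r in rows if 0.40 <= r["target_gc"] <= 0.65),
    key=lambda r: r["target_gc"],
)
for r in kept:
    print(r["target_region"], round(r["target_gc"], 3))
```

With length 30 and overlap 15, windows start at positions 0, 15, and 30 of the 60-nt sequence.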

For more detailed examples and advanced configurations, see the YAML files under tests/data/.

Community & Support

Citation

If you use U-Probe in your research, please cite our paper (or software):

@software{uprobe2026,
  title={U-Probe: Universal Probe Design Tool},
  author={Zhang, Qian and Xu, Weize and Cai, Huaiyuan},
  year={2025},
  url={https://github.com/UFISH-Team/U-Probe},
  version={1.0.0}
}

(Note: Update the citation format with actual journal details once published.)

License

U-Probe is released under the MIT License. See the LICENSE file for details.

Acknowledgments

We thank the bioinformatics community for valuable feedback during development, and the authors of the following tools that U-Probe integrates:

  • PantheonOS - The multi-agent framework powering our AI agent design
  • Bowtie2 - Fast and memory-efficient sequence alignment
  • BLAST+ - Sequence similarity search
  • MMseqs2 - Ultra-fast and sensitive sequence search and clustering
  • Jellyfish - Fast k-mer counting
  • ViennaRNA - RNA secondary structure prediction
  • Primer3 - Primer and probe design algorithms
  • FastAPI & Vue.js - Powering our interactive Web UI
