A Rust library for resolving the strand of origin of RNA-seq reads from their SAM bitwise flags and library preparation protocol. If you write custom BAM-processing code and need to know which strand a fragment comes from, this crate gives you a single function call — LibType::get_strand(flag) — instead of reimplementing the flag logic yourself every time.
The crate also provides named constants for all SAM flag bits (SamFlag::PAIRED, SamFlag::READ_REVERSE, …) and a general-purpose flag-testing helper (check_flag), both of which are useful on their own when manipulating BAM records in Rust.
A command-line tool is included as well. It walks a BAM file and writes a custom SF (Strand Fragment) tag to each read, with the value +, -, or . (no strand information). This can be convenient for quick inspection in a genome browser or for piping into downstream scripts, but the main value of this project is the library itself.
The --LibType flag follows the same naming convention used by Salmon and Kallisto. The first letter(s) describe the relative orientation of the mates; the suffix indicates which read carries the sense strand.
| LibType | Layout | Paired | Description |
|---|---|---|---|
| frFirstStrand | fr (inward) | yes | First-read is on the reverse strand of the RNA (e.g. dUTP, Illumina TruSeq Stranded) |
| frSecondStrand | fr (inward) | yes | First-read is on the sense strand of the RNA (e.g. Ligation method) |
| fFirstStrand | f (forward only) | no | Single-end, read is reverse-complemented relative to the RNA |
| fSecondStrand | f (forward only) | no | Single-end, read matches the sense strand of the RNA |
| ffFirstStrand | ff (same direction) | yes | Both mates on the same strand; first-read reverse = sense |
| ffSecondStrand | ff (same direction) | yes | Both mates on the same strand; first-read forward = sense |
| rfFirstStrand | rf (outward) | yes | Outward-facing mates; first-read forward, mate reverse = sense |
| rfSecondStrand | rf (outward) | yes | Outward-facing mates; reversed assignment |
| rFirstStrand | r (reverse only) | no | Single-end reverse; read reversed = sense |
| rSecondStrand | r (reverse only) | no | Single-end reverse; reversed assignment |
| Unstranded | — | no | Always assigns . (no strand information) |
| PairedUnstranded | — | yes | Always assigns .; requires both mates mapped |
If you are unsure which type your data uses, frFirstStrand (dUTP / Illumina TruSeq Stranded) is by far the most common for recent Illumina paired-end RNA-seq.
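To make the frFirstStrand row concrete, here is a minimal sketch of how a paired-end flag could be mapped to a fragment strand under that protocol. It is not the crate's implementation: the function name fr_first_strand and the local constants are hypothetical, the bit values come from the SAM specification, and the crate's own get_strand may apply additional checks. Note that a flag alone cannot distinguish inward-facing from outward-facing mates, so the sketch only verifies that the two mates are on opposite strands.

```rust
// Hypothetical stand-alone sketch; not taken from the crate's source.
// Bit values follow the SAM specification.
const PAIRED: u16 = 0x1;
const READ_UNMAPPED: u16 = 0x4;
const MATE_UNMAPPED: u16 = 0x8;
const READ_REVERSE: u16 = 0x10;
const MATE_REVERSE: u16 = 0x20;
const FIRST_IN_PAIR: u16 = 0x40;
const SECOND_IN_PAIR: u16 = 0x80;

/// Returns '+' or '-' for a pair whose first read is antisense to the RNA,
/// or None if the flag does not fit a paired, opposite-strand layout.
fn fr_first_strand(flag: u16) -> Option<char> {
    // Both mates must be present and mapped.
    if (flag & PAIRED) == 0 || (flag & (READ_UNMAPPED | MATE_UNMAPPED)) != 0 {
        return None;
    }
    let read_rev = (flag & READ_REVERSE) != 0;
    let mate_rev = (flag & MATE_REVERSE) != 0;
    // Exactly one of the two mates must be on the reverse strand.
    if read_rev == mate_rev {
        return None;
    }
    let first = (flag & FIRST_IN_PAIR) != 0;
    let second = (flag & SECOND_IN_PAIR) != 0;
    match (first, second) {
        // First read on the reverse strand means the RNA is on '+'.
        (true, false) => Some(if read_rev { '+' } else { '-' }),
        // Second read on the forward strand means the RNA is on '+'.
        (false, true) => Some(if read_rev { '-' } else { '+' }),
        // Neither or both of the first/second bits set: ambiguous.
        _ => None,
    }
}

fn main() {
    for flag in [83u16, 163, 99, 147, 16] {
        println!("{}: {:?}", flag, fr_first_strand(flag));
    }
}
```

Flags 83 and 163 are the two mates of a fragment from the + strand, 99 and 147 the two mates of a fragment from the - strand, and 16 is an unpaired read that this layout cannot resolve.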
- Install Rust: https://www.rust-lang.org/tools/install
- Clone this repository.
- Inside the BamStrandSpecifier folder, run: cargo build --release
- The binary will be at BamStrandSpecifier/target/release/strand_specifier.
Add the crate to your Cargo.toml, then:
use std::str::FromStr;
use strand_specifier_lib::{LibType, Strand, check_flag, SamFlag};

// Parse a library type from a string
let current_lib = "frFirstStrand";
let libtype = match LibType::from_str(current_lib) {
    Ok(lib) => lib,
    Err(_) => panic!("Incorrect library type: {}", current_lib),
};

// Get the fragment strand for a given SAM flag
// (163 = paired, properly paired, mate reverse, second in pair)
let samflag: u16 = 163;
let strand = match libtype.get_strand(samflag) {
    Some(s) => s,
    None => Strand::NA,
};

// Low-level flag checking:
// check_flag(flag, must_be_set, must_not_be_set)
//
// Check that PROPERLY_PAIRED is set AND NOT_PRIMARY_ALN is not set
// (82 = properly paired, read reverse, first in pair)
assert!(check_flag(82, SamFlag::PROPERLY_PAIRED, SamFlag::NOT_PRIMARY_ALN));

// You can combine multiple conditions with +
// (19 = paired, properly paired, read reverse)
assert!(check_flag(
    19,
    SamFlag::PAIRED + SamFlag::PROPERLY_PAIRED,
    SamFlag::NOT_PRIMARY_ALN
));
- SamFlag: A struct with named constants for every SAM flag bit (e.g. SamFlag::PAIRED, SamFlag::READ_REVERSE). Using these instead of raw integers makes flag logic self-documenting. All constants are u16 and evaluated at compile time, so there is no runtime cost.
- Strand: An enum with variants Plus, Minus, and NA. Implements Display (+, -, .) and can be constructed from &str.
- LibType: An enum representing the library preparation protocol. Its method get_strand(flag: u16) -> Option<Strand> returns the fragment strand for a given SAM flag, or None if the flag combination is not consistent with the library type. Whenever the read is paired, the orientation of the mate is used alongside the read's own orientation, which makes the assignment more robust than relying on the read alone.
- check_flag(flag, must_be_set, must_not_be_set) -> bool: A low-level helper that tests whether all bits in must_be_set are present in flag and all bits in must_not_be_set are absent. You can also use the SamFlag constants for better readability:
  // 19 = PAIRED + PROPERLY_PAIRED + READ_REVERSE
  assert!(check_flag(19, SamFlag::PAIRED + SamFlag::READ_REVERSE, SamFlag::NOT_PRIMARY_ALN));
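The semantics described for check_flag can be sketched with two bitwise tests. The block below is a hypothetical re-implementation for illustration only (the name check_flag_sketch and the local constants are assumptions, with bit values taken from the SAM specification); the crate's own function may differ in its details.

```rust
// Hypothetical re-implementation for illustration; not the crate's source.
// Bit values follow the SAM specification.
const PAIRED: u16 = 0x1;
const PROPERLY_PAIRED: u16 = 0x2;
const READ_REVERSE: u16 = 0x10;
const NOT_PRIMARY_ALN: u16 = 0x100;

/// True when every bit of `must_be_set` is present in `flag`
/// and no bit of `must_not_be_set` is present.
fn check_flag_sketch(flag: u16, must_be_set: u16, must_not_be_set: u16) -> bool {
    (flag & must_be_set) == must_be_set && (flag & must_not_be_set) == 0
}

fn main() {
    // 82 = 64 + 16 + 2: PROPERLY_PAIRED is set, NOT_PRIMARY_ALN is not.
    assert!(check_flag_sketch(82, PROPERLY_PAIRED, NOT_PRIMARY_ALN));
    // 19 = 16 + 2 + 1: both PAIRED and READ_REVERSE are set.
    assert!(check_flag_sketch(19, PAIRED | READ_REVERSE, NOT_PRIMARY_ALN));
    // 18 = 16 + 2: the PAIRED bit is missing, so the check fails.
    assert!(!check_flag_sketch(18, PAIRED | READ_REVERSE, NOT_PRIMARY_ALN));
    // 275 = 256 + 19: NOT_PRIMARY_ALN is set, so the check fails.
    assert!(!check_flag_sketch(275, PAIRED, NOT_PRIMARY_ALN));
    println!("all checks passed");
}
```

The sketch combines masks with | rather than +. For distinct bits the two give the same value, but | stays correct if the same bit is accidentally listed twice, whereas + would carry into a neighbouring bit.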
A CLI wrapper is provided for convenience. It reads a BAM, assigns an SF tag to every read, and writes a new BAM:
strand_specifier \
--input <InputBam> \
--output <OutputBam> \
--LibType <LibraryType>
--LibType is optional and defaults to frFirstStrand. See the table above for accepted values.
- The input must be a valid BAM file.
- For paired-end library types (fr*, ff*, rf*, PairedUnstranded), reads should be paired. Unpaired reads or reads whose mate is unmapped will receive no SF tag (the function returns None).
- The BAM does not need to be sorted or indexed.
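As a rough illustration of the pairing requirement above, the following sketch tests the two documented conditions (the read is paired and its mate is mapped) on a handful of flags. The helper name has_mapped_mate is hypothetical and the bit values come from the SAM specification; the tool itself may check more than this.

```rust
// Hypothetical pre-check for paired-end library types; not the tool's code.
// Bit values follow the SAM specification.
const PAIRED: u16 = 0x1;
const MATE_UNMAPPED: u16 = 0x8;

/// True when the read is flagged as paired and its mate is mapped.
fn has_mapped_mate(flag: u16) -> bool {
    (flag & PAIRED) != 0 && (flag & MATE_UNMAPPED) == 0
}

fn main() {
    // 83 and 163: a mapped pair. 73: paired but mate unmapped. 16: unpaired.
    let flags: [u16; 4] = [83, 163, 73, 16];
    let eligible = flags.iter().filter(|&&f| has_mapped_mate(f)).count();
    println!("{} of {} reads pass the pairing pre-check", eligible, flags.len());
}
```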
If you find a bug, have a question, or want to contribute test data for a library type that has not been validated yet, please open a GitHub Issue.
MIT