Function noLZSS::factorize_multiple_dna_w_rc
Defined in File factorizer.cpp
Function Documentation
-
std::vector<Factor> noLZSS::factorize_multiple_dna_w_rc(std::string_view text)
Factorizes a DNA text containing multiple sequences into noLZSS factors, with reverse complement awareness, and returns the factors as a vector.
This is the main user-facing function for in-memory DNA factorization with multiple sequences and reverse complement. It performs noLZSS factorization and returns all factors in a vector.
Performs non-overlapping Lempel-Ziv-Storer-Szymanski factorization on a DNA text containing multiple sequences, considering both forward and reverse complement matches. This is particularly useful for genomic data, where reverse complement patterns across multiple sequences are biologically significant.
See also
factorize_file_multiple_dna_w_rc() for file-based factorization
Note
Factors are returned in order of appearance in the text
Note
The returned factors are non-overlapping and cover the entire input
Note
Reverse complement matches are encoded with RC_MASK in the ref field
- Parameters:
text – Input DNA text string with multiple sequences and sentinels
- Returns:
Vector of Factor objects representing the factorization
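The notes on RC_MASK and on coverage can be illustrated with a short, self-contained sketch. It does not call the library. The `Factor` layout (`start`, `length`, `ref`), the value chosen for `RC_MASK` (the top bit of a 64-bit `ref`), and the helper names `is_reverse_complement`, `ref_position`, `covers_input` and `debug_check` are all assumptions made for illustration; consult the library headers for the real definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the library's Factor type. Only the existence of a
// "ref" field is stated in the notes; the other fields are assumed.
struct Factor {
    std::uint64_t start;   // assumed: position where the factor begins
    std::uint64_t length;  // assumed: number of characters covered
    std::uint64_t ref;     // reference field, may carry the RC_MASK flag
};

// Assumed value: the highest bit of ref flags a reverse complement match.
constexpr std::uint64_t RC_MASK = 1ULL << 63;

// True if the factor is flagged as a reverse complement match.
bool is_reverse_complement(const Factor& f) {
    return (f.ref & RC_MASK) != 0;
}

// The ref value with the reverse complement flag cleared.
std::uint64_t ref_position(const Factor& f) {
    return f.ref & ~RC_MASK;
}

// True if the factors appear in order, do not overlap, and together cover
// exactly the range [0, text_length).
bool covers_input(const std::vector<Factor>& factors,
                  std::uint64_t text_length) {
    std::uint64_t next = 0;
    for (const Factor& f : factors) {
        if (f.start != next || f.length == 0) {
            return false;
        }
        next += f.length;
    }
    return next == text_length;
}

// Debug-build check that a result satisfies the coverage property.
void debug_check(const std::vector<Factor>& factors,
                 std::uint64_t text_length) {
    assert(covers_input(factors, text_length));
}
```

With the real library, the vector returned by factorize_multiple_dna_w_rc() would take the place of a hand-built factor list, and `text.size()` would be passed as the length to check against.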