Template Function noLZSS::nolzss_multiple_dna_w_rc
Defined in File factorizer.cpp
Function Documentation
template<class Sink>
static size_t noLZSS::nolzss_multiple_dna_w_rc(const std::string &S, Sink &&sink)
Core noLZSS factorization algorithm implementation with reverse complement awareness for multiple DNA sequences.
Implements the non-overlapping Lempel-Ziv-Storer-Szymanski factorization using a compressed suffix tree, extended to handle multiple DNA sequences with reverse complement matches. The algorithm takes a concatenated string S of multiple sequences with sentinels and their reverse complements, builds a suffix tree over S, and finds the longest previous factor (either forward or reverse complement) for each position in the original sequences, emitting factors through a sink.
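The reverse complement that the description relies on can be illustrated with a minimal sketch. The helper names `complement` and `reverse_complement` are hypothetical and are not taken from factorizer.cpp; the sketch assumes uppercase A/C/G/T input and does not show how noLZSS places sentinels or lays out the concatenated string S.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>

// Hypothetical helper: complement of a single uppercase nucleotide.
inline char complement(char base) {
    switch (base) {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        default: throw std::invalid_argument("unexpected nucleotide");
    }
}

// Hypothetical helper: reverse the sequence, then complement every base.
inline std::string reverse_complement(const std::string &seq) {
    std::string out(seq.rbegin(), seq.rend());
    std::transform(out.begin(), out.end(), out.begin(), complement);
    return out;
}
```

Applying `reverse_complement` twice returns the original sequence, which is why a match found in the reverse complement part of the text can be mapped back to a position on the forward strand.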
Note
This is the core algorithm for factorizing multiple DNA sequences; all public multiple-DNA functions are built on it.
Note
Factors are passed to the sink as they are computed, so the caller decides whether to store them; this allows memory-efficient processing.
Note
All factors are emitted, including the last one
Note
Reverse complement matches are encoded with the RC_MASK in the ref field
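As a rough sketch of how a caller might read the flag described in the last note: the `Factor` layout and the value of `RC_MASK` below are assumptions made for illustration (the flag is assumed to be the most significant bit of a 64-bit `ref`), and the helper names are hypothetical. The real definitions live in the noLZSS headers and should be used instead.

```cpp
#include <cstdint>

// Hypothetical stand-in for the library's Factor type (assumed layout).
struct Factor {
    std::uint64_t start;   // start position of the factor in the text
    std::uint64_t length;  // length of the factor
    std::uint64_t ref;     // reference position, possibly tagged with RC_MASK
};

// Assumption: the reverse complement flag is the top bit of ref.
constexpr std::uint64_t RC_MASK = 1ULL << 63;

// True when the factor refers to a reverse complement match.
constexpr bool is_reverse_complement(const Factor &f) {
    return (f.ref & RC_MASK) != 0;
}

// Reference position with the flag bit cleared.
constexpr std::uint64_t reference_position(const Factor &f) {
    return f.ref & ~RC_MASK;
}
```

Keeping the flag inside `ref` means a factor stays three fixed-width integers, at the cost of one bit of addressable reference range.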
- Template Parameters:
Sink – Callable type that accepts Factor objects (e.g., lambda, function)
- Parameters:
S – Input concatenated DNA text string with sentinels and reverse complements
sink – Callable that receives each computed factor
- Returns:
Number of factors emitted
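The calling convention documented above (a templated sink that receives each factor, with the emitted count as the return value) can be sketched as follows. Because the documented function is `static` in factorizer.cpp, the sketch uses a hypothetical stand-in, `emit_fixed_chunks`, which simply cuts a range into fixed-size pieces; it is not the factorization algorithm, and the `Factor` layout is assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the library's Factor type (assumed layout).
struct Factor {
    std::uint64_t start;
    std::uint64_t length;
    std::uint64_t ref;
};

// Hypothetical emitter with the same shape as the documented function:
// it calls sink once per factor and returns how many factors it emitted.
// It only splits [0, n) into chunks; chunk must be greater than zero.
template <class Sink>
std::size_t emit_fixed_chunks(std::size_t n, std::size_t chunk, Sink &&sink) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; i += chunk) {
        const std::size_t len = (n - i < chunk) ? (n - i) : chunk;
        sink(Factor{static_cast<std::uint64_t>(i),
                    static_cast<std::uint64_t>(len), 0});
        ++count;
    }
    return count;
}

// Sink that stores every factor: memory grows with the number of factors.
inline std::vector<Factor> collect_factors(std::size_t n, std::size_t chunk) {
    std::vector<Factor> out;
    emit_fixed_chunks(n, chunk, [&out](const Factor &f) { out.push_back(f); });
    return out;
}

// Sink that keeps only a running total: constant memory.
inline std::uint64_t total_length(std::size_t n, std::size_t chunk) {
    std::uint64_t total = 0;
    emit_fixed_chunks(n, chunk, [&total](const Factor &f) { total += f.length; });
    return total;
}
```

The same emitter serves both callers; only the lambda changes, which is the practical benefit of taking the sink as a template parameter rather than returning a container.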