Template Function noLZSS::nolzss_dna_w_rc
Defined in File factorizer.cpp
Function Documentation
template<class Sink>
static size_t noLZSS::nolzss_dna_w_rc(const std::string &T, Sink &&sink)
Core noLZSS factorization algorithm implementation with reverse complement awareness for DNA.
Implements the non-overlapping Lempel-Ziv-Storer-Szymanski factorization using a compressed suffix tree, extended to handle DNA sequences with reverse complement matches. The algorithm constructs a combined string S = T '$' rc(T) '#', where rc(T) is the reverse complement of T, and builds a suffix tree over S. For each position in the original text T, it finds the longest previous factor (either forward or reverse complement) and emits the resulting factors through a sink.
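The construction of the combined string S can be sketched as follows. This is a minimal illustration, not the library's code: the helper names `reverse_complement` and `build_combined` are hypothetical, and the sketch assumes the input contains only uppercase A, C, G, and T.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: reverse complement of an uppercase A/C/G/T string.
std::string reverse_complement(const std::string& t) {
    std::string rc(t.rbegin(), t.rend());  // reversed copy of t
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default: throw std::invalid_argument("expected A, C, G, or T");
        }
    }
    return rc;
}

// Hypothetical helper: builds S = T '$' rc(T) '#' as described above.
std::string build_combined(const std::string& t) {
    return t + '$' + reverse_complement(t) + '#';
}
```

For example, with T = "AAC" the reverse complement is "GTT", so this sketch yields S = "AAC$GTT#".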
Note
This is the core algorithm for DNA-aware factorization that all public DNA functions use.
Note
The sink pattern allows for memory-efficient processing: each factor is passed to the sink as it is computed, so the caller decides whether to store it.
Note
All factors are emitted, including the last one
Note
Reverse complement matches are encoded with the RC_MASK in the ref field
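One common way to carry such a flag inside a reference value is to reserve a single bit for it. The sketch below assumes a 64-bit ref whose top bit serves as the flag. The value given to `RC_MASK` here is an assumption made for illustration, and the helper functions are hypothetical. Consult the library headers for the actual definition.

```cpp
#include <cstdint>

// Assumed value: the top bit of a 64-bit ref marks a reverse complement match.
constexpr std::uint64_t RC_MASK = 1ULL << 63;

// Hypothetical helper: pack a reference position and the RC flag into one ref.
constexpr std::uint64_t encode_ref(std::uint64_t pos, bool rc) {
    return rc ? (pos | RC_MASK) : pos;
}

// Hypothetical helper: true if the ref denotes a reverse complement match.
constexpr bool is_reverse_complement(std::uint64_t ref) {
    return (ref & RC_MASK) != 0;
}

// Hypothetical helper: recover the position with the flag bit cleared.
constexpr std::uint64_t ref_position(std::uint64_t ref) {
    return ref & ~RC_MASK;
}
```

Under this assumed layout, a consumer would test the flag first and then clear it before using the ref as a position.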
- Template Parameters:
Sink – Callable type that accepts Factor objects (e.g., lambda, function)
- Parameters:
T – Input DNA text string
sink – Callable that receives each computed factor
- Returns:
Number of factors emitted
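The sink parameter and the returned count can be illustrated with a stand-in producer that has the same shape as the signature above. Everything in this sketch is an assumption: the `Factor` layout is hypothetical, and `emit_trivial_factors` is not the noLZSS algorithm. It emits one length-1 factor per character only so that the calling pattern can be shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical Factor layout; the real struct may differ.
struct Factor {
    std::uint64_t start;
    std::uint64_t length;
    std::uint64_t ref;
};

// Stand-in producer with the documented shape: text in, factors to the sink,
// count returned. It is NOT the noLZSS algorithm.
template <class Sink>
std::size_t emit_trivial_factors(const std::string& T, Sink&& sink) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < T.size(); ++i) {
        sink(Factor{i, 1, i});  // one length-1 factor per character
        ++count;
    }
    return count;
}

// A sink that stores every factor.
std::vector<Factor> collect_factors(const std::string& T) {
    std::vector<Factor> out;
    emit_trivial_factors(T, [&out](const Factor& f) { out.push_back(f); });
    return out;
}

// A sink that discards factors; only the returned count is used.
std::size_t count_factors(const std::string& T) {
    return emit_trivial_factors(T, [](const Factor&) {});
}
```

The two wrappers show why the sink design is flexible: the same producer serves a caller that needs every factor and a caller that only needs the count, and the second one allocates nothing per factor.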