Template Function noLZSS::detail::nolzss_multiple_dna_w_rc
Defined in File factorizer_core.hpp
Function Documentation
template<class Sink>
size_t noLZSS::detail::nolzss_multiple_dna_w_rc(const std::string &S, Sink &&sink, size_t start_pos)
Core noLZSS factorization algorithm implementation with reverse-complement awareness for multiple DNA sequences.
Implements the non-overlapping Lempel-Ziv-Storer-Szymanski (noLZSS) factorization using a compressed suffix tree, extended to handle multiple DNA sequences with reverse-complement matches. The algorithm takes a concatenated string S containing the input sequences, separated by sentinels, together with their reverse complements. It builds a suffix tree over S and then, starting at start_pos, repeatedly finds the longest previous factor (either forward or reverse complement) at the current position in the original sequences, emits it through sink, and advances past it.
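To make the greedy loop concrete, here is a brute-force sketch. It is not the library's implementation: it replaces the compressed suffix tree with a direct scan, works on a single forward text T rather than the concatenated S, and relies on several assumptions: a hypothetical Factor layout, a hypothetical RC_MASK value (the top bit of ref), a hypothetical literal convention for positions with no earlier match, and the rule that an RC source must end before the current position. Only the overall shape (find the longest forward and RC candidates, pick one, emit it through sink, advance by its length) is meant to mirror the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical factor layout for this sketch; the real noLZSS Factor may differ.
struct Factor {
    std::uint64_t start;   // position where the factor begins
    std::uint64_t length;  // number of characters covered
    std::uint64_t ref;     // source position; RC matches carry RC_MASK
};

// Hypothetical flag value: the top bit of ref marks a reverse-complement match.
constexpr std::uint64_t RC_MASK = std::uint64_t{1} << 63;

inline char complement(char c) {
    switch (c) {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        default:  return 'N';
    }
}

// Brute-force stand-in for the suffix-tree search (roughly cubic in the worst
// case; illustration only). Operates on plain forward text T, not on S.
template <class Sink>
std::size_t naive_factorize_w_rc(const std::string& T, Sink&& sink, std::size_t start_pos = 0) {
    assert(start_pos <= T.size());
    const std::size_t n = T.size();
    std::size_t count = 0;
    std::size_t i = start_pos;
    while (i < n) {
        // Forward candidate: longest T[j..j+len) == T[i..i+len) with j + len <= i.
        // Strict '>' keeps the smallest start among equal lengths.
        std::size_t fwd_len = 0, fwd_start = 0;
        for (std::size_t j = 0; j < i; ++j) {
            std::size_t len = 0;
            while (j + len < i && i + len < n && T[j + len] == T[i + len]) ++len;
            if (len > fwd_len) { fwd_len = len; fwd_start = j; }
        }
        // RC candidate: the reverse complement of a source ending at e < i
        // matches T[i..i+len). Strict '>' keeps the smallest end.
        std::size_t rc_len = 0, rc_end = 0;
        for (std::size_t e = 0; e < i; ++e) {
            std::size_t len = 0;
            while (len <= e && i + len < n && T[i + len] == complement(T[e - len])) ++len;
            if (len > rc_len) { rc_len = len; rc_end = e; }
        }
        Factor f;
        if (fwd_len == 0 && rc_len == 0) {
            f = Factor{i, 1, i};                   // hypothetical literal convention
        } else if (fwd_len >= rc_len) {
            f = Factor{i, fwd_len, fwd_start};     // equal lengths prefer forward
        } else {
            f = Factor{i, rc_len, rc_end | RC_MASK};
        }
        sink(f);        // hand the factor to the caller immediately
        ++count;
        i += f.length;  // greedy: continue right after this factor
    }
    return count;
}
```

For T = "ACGTACGT" this sketch emits four factors; the last one covers "ACGT", which matches both forward at position 0 and as the reverse complement of that same region, and the forward candidate wins the tie. Calling it with start_pos = 4 emits only that last factor, because positions before start_pos still serve as match sources.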
Note
This is the core algorithm used by all public factorization functions for multiple DNA sequences.
Note
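The sink pattern allows memory-efficient processing: each factor is passed to sink as soon as it is computed, so the caller does not need to store the full list of factors.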
Note
All factors are emitted, including the last one.
Note
Reverse-complement matches are marked with RC_MASK in the ref field.
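A sketch of how such a flag bit could be set and read back, assuming (hypothetically) that RC_MASK is the most significant bit of a 64-bit ref. The actual value and width of RC_MASK are not given on this page, and the helper names encode_ref, is_rc_ref, and ref_position are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: RC_MASK taken to be the top bit of a 64-bit ref.
constexpr std::uint64_t RC_MASK = std::uint64_t{1} << 63;

// Build a ref: the plain source position for forward matches,
// position | RC_MASK for reverse-complement matches.
inline std::uint64_t encode_ref(std::uint64_t pos, bool is_rc) {
    assert(pos < RC_MASK);  // the position must not collide with the flag bit
    return is_rc ? (pos | RC_MASK) : pos;
}

// True if the ref carries the reverse-complement flag.
inline bool is_rc_ref(std::uint64_t ref) { return (ref & RC_MASK) != 0; }

// Strip the flag to recover the source position.
inline std::uint64_t ref_position(std::uint64_t ref) { return ref & ~RC_MASK; }
```

Keeping the flag in a bit the positions never reach lets forward and RC factors share one ref field without a separate type tag.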
Note
start_pos allows factorization to begin at a specific position, which is useful for reference+target factorization.
Note
Tie-breaking: forward and RC candidates are tracked independently during the tree walk; their true longest-common-prefix (LCP) lengths are then computed and compared. If the true lengths are equal, the forward candidate is preferred. Among candidates of the same type, the one with the earliest position (smallest start for forward, smallest end for RC) wins.
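The tie-breaking rule can be expressed as a small selection function. This is a sketch under assumptions: Candidate, Choice, best_of, and choose are hypothetical names, each candidate is assumed to already carry its true LCP, and pos is read as the start for forward candidates and the end for RC candidates, as the note describes. Within a type, the longer candidate wins and the earlier position breaks ties; across types, forward wins when lengths are equal.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical candidate record: pos is the start (forward) or end (RC).
struct Candidate {
    std::size_t pos;
    std::size_t lcp;  // true LCP length with the current suffix
};

struct Choice {
    bool is_rc;
    std::size_t pos;
    std::size_t length;
};

// Best candidate of one type: longest true LCP, then earliest position.
std::optional<Candidate> best_of(const std::vector<Candidate>& cands) {
    std::optional<Candidate> best;
    for (const Candidate& c : cands) {
        if (!best || c.lcp > best->lcp || (c.lcp == best->lcp && c.pos < best->pos)) {
            best = c;
        }
    }
    return best;
}

// Compare the per-type winners; equal lengths go to the forward candidate.
std::optional<Choice> choose(const std::vector<Candidate>& fwd,
                             const std::vector<Candidate>& rc) {
    std::optional<Candidate> f = best_of(fwd);
    std::optional<Candidate> r = best_of(rc);
    if (!f && !r) return std::nullopt;
    if (f && (!r || f->lcp >= r->lcp)) return Choice{false, f->pos, f->lcp};
    return Choice{true, r->pos, r->lcp};
}
```

Comparing true LCPs only after each type has picked its own winner keeps the two searches independent, which matches the note's description of tracking forward and RC candidates separately.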
- Template Parameters:
Sink – Callable type that accepts Factor objects (e.g., a lambda or function object)
- Parameters:
S – Concatenated DNA text: the input sequences, separated by sentinels, together with their reverse complements
sink – Callable that receives each computed factor
start_pos – Starting position for factorization (default: 0)
- Returns:
Number of factors emitted
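The parameter S is described only as the input sequences, separated by sentinels, together with their reverse complements. The sketch below builds one plausible layout: forward sequences first, then the reverse complements in reverse order, each followed by a unique sentinel. The layout, the sentinel values (control characters 1, 2, 3, ...), and the function names are assumptions for illustration; the layout the library actually prepares may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

inline char complement_base(char c) {
    switch (c) {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        default:  return 'N';
    }
}

// Reverse the sequence and complement each base.
std::string reverse_complement(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (auto it = s.rbegin(); it != s.rend(); ++it) out.push_back(complement_base(*it));
    return out;
}

// Hypothetical layout: seq1 $1 seq2 $2 ... rc(seqk) $(k+1) ... rc(seq1) $(2k),
// with sentinels $i taken as control characters assumed absent from DNA input.
std::string build_concatenated_text(const std::vector<std::string>& seqs) {
    assert(2 * seqs.size() < 128);  // keep every sentinel in the low control range
    std::string S;
    char sentinel = 1;
    for (const std::string& s : seqs) {
        S += s;
        S.push_back(sentinel++);
    }
    for (auto it = seqs.rbegin(); it != seqs.rend(); ++it) {
        S += reverse_complement(*it);
        S.push_back(sentinel++);
    }
    return S;
}
```

Unique sentinels keep any match from spanning a sequence boundary, since no sentinel character occurs twice in S.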