Template Function noLZSS::nolzss_multiple_dna_w_rc

Function Documentation

template<class Sink>
static size_t noLZSS::nolzss_multiple_dna_w_rc(const std::string &S, Sink &&sink)

Core noLZSS factorization algorithm implementation with reverse complement awareness for multiple DNA sequences.

Implements the non-overlapping Lempel-Ziv-Storer-Szymanski factorization using a compressed suffix tree, extended to handle multiple DNA sequences with reverse complement matches. The input S is a single string that concatenates the sequences, with sentinels, together with their reverse complements. The algorithm builds a suffix tree over S and, for each position in the original sequences, finds the longest previous factor (either forward or reverse complement), emitting each factor through the sink.
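To make "reverse complement" concrete, here is a minimal sketch of how one could compute it for an uppercase DNA string. The helper name `reverse_complement` is hypothetical and not part of noLZSS; the exact layout of S (sentinel characters, ordering of the sequences) is not shown here because this page does not specify it.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper for illustration only (not a noLZSS function).
// Reverses the string and swaps A<->T and C<->G; any other character
// is left unchanged.
std::string reverse_complement(const std::string& dna) {
    std::string rc(dna.rbegin(), dna.rend());
    std::transform(rc.begin(), rc.end(), rc.begin(), [](char c) {
        switch (c) {
            case 'A': return 'T';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'T': return 'A';
            default:  return c;
        }
    });
    return rc;
}
```

A reverse complement match means that a substring of one sequence equals the reverse complement of an earlier region, which is why the reverse complements are included in S.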

Note

This is the core algorithm for factorizing multiple DNA sequences; all public multiple-DNA functions use it.

Note

The sink pattern allows for memory-efficient processing: each factor is passed to the sink as it is emitted, so the caller decides whether to store it.

Note

All factors are emitted, including the last one.

Note

Reverse complement matches are encoded with RC_MASK in the ref field of the factor.
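As an illustration of how a consumer might read such a flag, the sketch below assumes that RC_MASK is a single high bit of a 64-bit ref value. That is an assumption made for this example: the real constant and field width are defined in the noLZSS headers and may differ. The names `kAssumedRcMask`, `DecodedRef`, and `decode_ref` are hypothetical.

```cpp
#include <cstdint>

// ASSUMPTION: illustrative stand-in for RC_MASK. Check the noLZSS headers
// for the actual value before relying on it.
constexpr std::uint64_t kAssumedRcMask = 1ULL << 63;

// Hypothetical result type: the reference with the flag cleared, plus the flag.
struct DecodedRef {
    std::uint64_t position;
    bool is_reverse_complement;
};

// Splits a ref value into its position bits and the reverse complement flag.
constexpr DecodedRef decode_ref(std::uint64_t ref) {
    return DecodedRef{ref & ~kAssumedRcMask, (ref & kAssumedRcMask) != 0};
}
```

With the real constant substituted, a sink could apply the same split to each factor it receives to tell forward matches from reverse complement matches.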

Template Parameters:

Sink – Callable type that accepts Factor objects (e.g., lambda, function)

Parameters:
  • S – Input concatenated DNA text string with sentinels and reverse complements

  • sink – Callable that receives each computed factor

Returns:

Number of factors emitted
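The following sketch shows the sink pattern described above with two kinds of sink: one that stores every factor and one that keeps only a running total. It is self-contained, so it uses a stand-in function, `stand_in_factorize`, with the same shape as the documented signature; the stand-in does not compute a real factorization. The `Factor` layout shown is also an assumption, since only the ref field is named on this page.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// ASSUMED layout for illustration; only `ref` is named in the documentation.
struct Factor {
    std::uint64_t start;
    std::uint64_t length;
    std::uint64_t ref;
};

// Stand-in with the same shape as the documented function. It is NOT the
// noLZSS algorithm: it simply emits one length-1 factor per character.
template <class Sink>
std::size_t stand_in_factorize(const std::string& S, Sink&& sink) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < S.size(); ++i) {
        sink(Factor{i, 1, i});
        ++count;
    }
    return count;  // number of factors emitted
}

// Sink that stores every factor in a vector.
std::vector<Factor> collect_factors(const std::string& S) {
    std::vector<Factor> out;
    stand_in_factorize(S, [&out](const Factor& f) { out.push_back(f); });
    return out;
}

// Sink that accumulates a total, so no factor is kept in memory.
std::uint64_t total_factor_length(const std::string& S) {
    std::uint64_t total = 0;
    stand_in_factorize(S, [&total](const Factor& f) { total += f.length; });
    return total;
}
```

Because the sink is a template parameter, any callable that accepts a Factor works, including a lambda that writes factors straight to a file.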