Template Function noLZSS::nolzss_dna_w_rc

Function Documentation

template<class Sink>
static size_t noLZSS::nolzss_dna_w_rc(const std::string &T, Sink &&sink)

Core noLZSS factorization algorithm implementation with reverse complement awareness for DNA.

Implements the non-overlapping Lempel-Ziv-Storer-Szymanski (noLZSS) factorization using a compressed suffix tree, extended so that a factor may match either an earlier forward occurrence or an earlier reverse complement occurrence. The algorithm constructs the combined string S = T '$' rc(T) '#', where rc(T) is the reverse complement of T, and builds a suffix tree over S. It then finds the longest previous factor (forward or reverse complement) for each position in the original text T and emits the factors through the sink.
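As a minimal sketch of the combined-string construction described above, the following C++ builds S = T '$' rc(T) '#' for an uppercase A/C/G/T input. The helper names reverse_complement and build_combined are hypothetical and not part of the noLZSS API. Passing characters other than A, C, G, T through unchanged is an assumption; the library's own handling may differ.

```cpp
#include <string>

// Hypothetical helper (not a noLZSS function): reverse complement of an
// uppercase DNA string. Characters other than A/C/G/T are left unchanged,
// which is an assumption made for this sketch.
std::string reverse_complement(const std::string &t) {
    std::string rc(t.rbegin(), t.rend());  // reverse first
    for (char &c : rc) {                   // then complement each base
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default: break;
        }
    }
    return rc;
}

// Hypothetical helper: builds S = T '$' rc(T) '#' as described in the text.
std::string build_combined(const std::string &t) {
    return t + '$' + reverse_complement(t) + '#';
}
```

Under these assumptions, build_combined("AAC") yields AAC$GTT#. A match that starts inside the rc(T) part of S corresponds to a reverse complement match against T.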

Note

This is the core DNA-aware factorization routine used by all public DNA functions.

Note

The sink pattern allows memory-efficient processing: each factor is handed to the sink as it is computed, so the caller decides whether to store it.

Note

All factors are emitted, including the last one.

Note

Reverse complement matches are flagged by setting RC_MASK in the factor's ref field.
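The flag encoding mentioned in the last note can be sketched as follows. This is an illustration only: the constant kRcMask and its value (the most significant bit of a 64-bit reference) are assumptions for this example, and the helper names are hypothetical. Consult the library headers for the actual definition of RC_MASK.

```cpp
#include <cstdint>

// Assumption for this sketch: the reverse complement flag is the most
// significant bit of a 64-bit reference. The real RC_MASK may differ.
constexpr std::uint64_t kRcMask = 1ULL << 63;

// Hypothetical helper: marks a reference position as a reverse complement match.
constexpr std::uint64_t mark_rc(std::uint64_t pos) { return pos | kRcMask; }

// Hypothetical helper: true if the flag bit is set in ref.
constexpr bool is_rc(std::uint64_t ref) { return (ref & kRcMask) != 0; }

// Hypothetical helper: the reference position with the flag bit cleared.
constexpr std::uint64_t ref_position(std::uint64_t ref) { return ref & ~kRcMask; }
```

A consumer would typically test the flag first and strip it before using the reference as a position.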

Template Parameters:

Sink – Callable type that accepts Factor objects (e.g., lambda, function)

Parameters:
  • T – Input DNA text string

  • sink – Callable that receives each computed factor

Returns:

Number of factors emitted
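The calling convention documented above (a callable sink plus a returned factor count) can be illustrated with a self-contained sketch. DemoFactor and emit_single_char_factors are hypothetical stand-ins, not library types or functions: the field names are assumed, and the producer emits one length-1 factor per character, which is not the real factorization. Only the shape of the interface is meant to carry over.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the library's Factor type; field names are assumed.
struct DemoFactor {
    std::uint64_t start;
    std::uint64_t length;
    std::uint64_t ref;
};

// Hypothetical producer with the same shape as the documented function:
// it calls the sink once per factor and returns the number emitted.
// It emits trivial length-1 factors, NOT a real noLZSS factorization.
template <class Sink>
std::size_t emit_single_char_factors(const std::string &T, Sink &&sink) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < T.size(); ++i) {
        const std::uint64_t pos = static_cast<std::uint64_t>(i);
        sink(DemoFactor{pos, 1, pos});
        ++count;
    }
    return count;
}

// Sink that stores every factor in a vector.
std::vector<DemoFactor> collect_factors(const std::string &T) {
    std::vector<DemoFactor> out;
    emit_single_char_factors(T, [&out](const DemoFactor &f) { out.push_back(f); });
    return out;
}

// Sink that only accumulates a statistic, so no factor list is kept in memory.
std::uint64_t total_factor_length(const std::string &T) {
    std::uint64_t total = 0;
    emit_single_char_factors(T, [&total](const DemoFactor &f) { total += f.length; });
    return total;
}
```

The second sink shows why the pattern can save memory: a caller that only needs a count or a running total never has to materialize the full list of factors.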