Function noLZSS::factorize_multiple_dna_w_rc

Function Documentation

std::vector<Factor> noLZSS::factorize_multiple_dna_w_rc(std::string_view text)

Factorizes an in-memory DNA text containing multiple sequences into noLZSS factors, with reverse complement awareness, and returns the factors as a vector.

This is the main user-facing function for in-memory factorization of DNA text that contains multiple sequences, with reverse complement matching enabled.

It performs non-overlapping Lempel-Ziv-Storer-Szymanski (noLZSS) factorization, considering both forward and reverse complement matches. This is particularly useful for genomic data, where reverse complement patterns are biologically significant across multiple sequences.
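To make the term "reverse complement" concrete, the sketch below computes it for a plain A/C/G/T string. It only illustrates the concept and is not taken from the library; the helper names complement and reverse_complement are hypothetical, and any character other than A, C, G, or T is assumed to pass through unchanged.

```cpp
#include <algorithm>
#include <string>
#include <string_view>

// Hypothetical helper: map one nucleotide to its complement.
// Characters other than A, C, G, T are returned unchanged (an assumption).
char complement(char base) {
    switch (base) {
        case 'A': return 'T';
        case 'T': return 'A';
        case 'C': return 'G';
        case 'G': return 'C';
        default:  return base;
    }
}

// Hypothetical helper: reverse the sequence, then complement each base.
std::string reverse_complement(std::string_view seq) {
    std::string out(seq.rbegin(), seq.rend());
    std::transform(out.begin(), out.end(), out.begin(), complement);
    return out;
}
```

For example, the reverse complement of AACG is CGTT, so a later occurrence of CGTT can be described as a reverse complement match against an earlier AACG.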

See also

factorize_file_multiple_dna_w_rc() for file-based factorization

Note

Factors are returned in order of appearance in the text

Note

The returned factors are non-overlapping and cover the entire input
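The coverage property in this note can be checked with a short loop. The sketch below uses a stand-in Factor struct, because this page only names the ref field; the start and length members, their meaning as an offset and a size within the input text, and the helper name covers_input are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <vector>

// Stand-in for noLZSS::Factor. Only `ref` is named in this documentation;
// `start` and `length` are assumed for the purpose of this sketch.
struct Factor {
    std::uint64_t start;   // assumed: offset of the factor in the input text
    std::uint64_t length;  // assumed: number of characters the factor spans
    std::uint64_t ref;     // reference position, possibly flagged with RC_MASK
};

// Hypothetical check: factors must be contiguous, non-empty, and end exactly
// at text_length, which means they do not overlap and leave no gaps.
bool covers_input(const std::vector<Factor>& factors, std::uint64_t text_length) {
    std::uint64_t next = 0;
    for (const Factor& f : factors) {
        if (f.length == 0 || f.start != next) {
            return false;
        }
        next += f.length;
    }
    return next == text_length;
}
```

A check like this may be useful in tests that call the function on small inputs, provided the assumed field layout matches the actual Factor definition.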

Note

Reverse complement matches are encoded with RC_MASK in the ref field
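A caller will typically need to separate the reverse complement flag from the reference position. The sketch below shows one way to do that, assuming RC_MASK is a single flag bit. The value used here (the top bit of a 64-bit integer) and the names kAssumedRcMask, is_reverse_complement, and reference_position are illustrative assumptions; real code should use the library's own RC_MASK constant.

```cpp
#include <cstdint>

// Assumption: the flag is a single bit. The top bit is used for illustration;
// substitute the library's RC_MASK constant in real code.
constexpr std::uint64_t kAssumedRcMask = std::uint64_t{1} << 63;

// True when the flag bit is set in a factor's ref value.
constexpr bool is_reverse_complement(std::uint64_t ref) {
    return (ref & kAssumedRcMask) != 0;
}

// The ref value with the flag bit cleared.
constexpr std::uint64_t reference_position(std::uint64_t ref) {
    return ref & ~kAssumedRcMask;
}
```

Comparing ref values directly without clearing the flag would treat a forward match and a reverse complement match at the same position as different references, so masking first is usually the safer habit.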

Parameters:
  • text – Input DNA text string with multiple sequences and sentinels

Returns:

Vector of Factor objects containing all factors from the factorization