Function noLZSS::factorize_fasta_multiple_dna_w_rc

Function Documentation

FastaFactorizationResult noLZSS::factorize_fasta_multiple_dna_w_rc(const std::string &fasta_path)

Factorizes multiple DNA sequences from a FASTA file with reverse complement awareness.

Reads a FASTA file containing DNA sequences, parses them into individual sequences, prepares them for factorization using prepare_multiple_dna_sequences_w_rc(), and then performs noLZSS factorization with reverse complement awareness.

Note

Only A, C, T, G nucleotides are allowed (case insensitive)

Note

Sequences are converted to uppercase before factorization

Note

Reverse complement matches are supported during factorization

Note

Nucleotide validation is performed by prepare_multiple_dna_sequences_w_rc()
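The notes above can be illustrated with a minimal sketch of case normalization, nucleotide validation, and reverse complementation. The helper names normalize_dna and reverse_complement are hypothetical and are not part of noLZSS; the library's own preparation step, prepare_multiple_dna_sequences_w_rc(), may be implemented differently.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a noLZSS function): uppercases a sequence and
// rejects any character other than A, C, G, T.
std::string normalize_dna(const std::string &seq) {
    std::string out;
    out.reserve(seq.size());
    for (unsigned char ch : seq) {
        const char up = static_cast<char>(std::toupper(ch));
        if (up != 'A' && up != 'C' && up != 'G' && up != 'T') {
            throw std::invalid_argument("invalid nucleotide in sequence");
        }
        out.push_back(up);
    }
    return out;
}

// Hypothetical helper (not a noLZSS function): returns the reverse
// complement of an already normalized (uppercase, ACGT-only) sequence.
std::string reverse_complement(const std::string &seq) {
    std::string out;
    out.reserve(seq.size());
    for (auto it = seq.rbegin(); it != seq.rend(); ++it) {
        switch (*it) {
            case 'A': out.push_back('T'); break;
            case 'C': out.push_back('G'); break;
            case 'G': out.push_back('C'); break;
            case 'T': out.push_back('A'); break;
            default:
                throw std::invalid_argument("invalid nucleotide in sequence");
        }
    }
    return out;
}
```

Under these assumptions, a mixed-case input such as "acgT" is accepted and normalized to uppercase, while any character outside the four nucleotides raises std::invalid_argument, matching the exception type documented below.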

Parameters:

fasta_path – Path to the FASTA file containing DNA sequences

Throws:
  • std::runtime_error – If the FASTA file cannot be opened or contains no valid sequences

  • std::invalid_argument – If the FASTA file contains too many sequences (more than 125) or invalid nucleotides are found

Returns:

FastaFactorizationResult containing factors and sentinel factor indices