Function noLZSS::factorize_fasta_multiple_dna_no_rc

Function Documentation

FastaFactorizationResult noLZSS::factorize_fasta_multiple_dna_no_rc(const std::string &fasta_path)

Factorizes multiple DNA sequences from a FASTA file without reverse complement awareness.

Reads a FASTA file containing DNA sequences, parses them into individual sequences, prepares them for factorization using prepare_multiple_dna_sequences_no_rc(), and then performs noLZSS factorization without reverse complement awareness.

Note

Only A, C, T, G nucleotides are allowed (case insensitive)

Note

Sequences are converted to uppercase before factorization

Note

Reverse complement matches are NOT supported during factorization

Note

Nucleotide validation is performed by prepare_multiple_dna_sequences_no_rc()
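The rules in the notes above can be illustrated with a small standalone sketch. It is not the body of prepare_multiple_dna_sequences_no_rc(); the helper name normalize_dna_sequences and its error messages are hypothetical, and only the documented behavior (A/C/G/T only, case-insensitive input, uppercase output, at most 250 sequences) is taken from this page.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of noLZSS): applies the documented input rules
// to already-parsed sequences and returns uppercase copies.
std::vector<std::string> normalize_dna_sequences(const std::vector<std::string>& sequences) {
    constexpr std::size_t kMaxSequences = 250;  // limit stated in the Throws section
    if (sequences.size() > kMaxSequences) {
        throw std::invalid_argument("too many sequences (more than 250)");
    }

    std::vector<std::string> normalized;
    normalized.reserve(sequences.size());
    for (const std::string& seq : sequences) {
        std::string upper;
        upper.reserve(seq.size());
        for (char c : seq) {
            // Case-insensitive check: convert first, then validate.
            const char u = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
            if (u != 'A' && u != 'C' && u != 'G' && u != 'T') {
                throw std::invalid_argument(std::string("invalid nucleotide: ") + c);
            }
            upper.push_back(u);
        }
        normalized.push_back(std::move(upper));
    }
    return normalized;
}
```

Under these assumptions, an input such as "acgt" is accepted and becomes "ACGT", while a sequence containing an ambiguity code such as N is rejected with std::invalid_argument.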

Parameters:

fasta_path – Path to the FASTA file containing DNA sequences

Throws:
  • std::runtime_error – If the FASTA file cannot be opened or contains no valid sequences

  • std::invalid_argument – If the FASTA file contains more than 250 sequences or a sequence contains invalid nucleotides
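A caller may want to treat the two exception types differently, since one points at the file and the other at its contents. The sketch below shows one possible pattern. The names FactorizeStatus and run_factorization are hypothetical and not part of noLZSS; the factorization function is passed in as a callable so the example stays self-contained, and the returned result is discarded.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical status codes for illustration only.
enum class FactorizeStatus { Ok, FileProblem, BadInput };

// Calls `factorize(fasta_path)` and maps the documented exception types to a status.
// In real code, `factorize` would be noLZSS::factorize_fasta_multiple_dna_no_rc.
template <typename Factorize>
FactorizeStatus run_factorization(Factorize&& factorize, const std::string& fasta_path) {
    try {
        factorize(fasta_path);  // result ignored in this sketch
        return FactorizeStatus::Ok;
    } catch (const std::invalid_argument&) {
        // More than 250 sequences, or invalid nucleotides.
        return FactorizeStatus::BadInput;
    } catch (const std::runtime_error&) {
        // File could not be opened, or it contains no valid sequences.
        return FactorizeStatus::FileProblem;
    }
}
```

Because std::invalid_argument and std::runtime_error sit on separate branches of the standard exception hierarchy, the order of the two catch clauses does not affect which one is selected.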

Returns:

FastaFactorizationResult containing factors and sentinel factor indices