Function noLZSS::count_factors_fasta_dna_no_rc_per_sequence

Function Documentation

FastaPerSequenceCountResult noLZSS::count_factors_fasta_dna_no_rc_per_sequence(const std::string &fasta_path, FastaDnaSanitizationMode sanitization_mode)

Counts per-sequence factors from DNA factorization without reverse complement.

Reads a FASTA file and factorizes each sequence independently without reverse complement awareness, returning per-sequence counts along with sequence metadata and the aggregate total.

Note

Memory-efficient: factors are counted without being stored.

Note

Only the nucleotides A, C, G, and T are allowed (case-insensitive).
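The nucleotide rule above can be illustrated with a minimal sketch. The helper name normalize_dna is hypothetical and not part of noLZSS. The sketch assumes the strict case (uppercase the input, reject anything other than A, C, G, T) and does not model how sanitization_mode may alter that behavior.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a noLZSS function): uppercases a DNA string and
// throws std::invalid_argument on any character other than A, C, G, or T.
std::string normalize_dna(const std::string &seq) {
    std::string out;
    out.reserve(seq.size());
    for (unsigned char ch : seq) {
        const char up = static_cast<char>(std::toupper(ch));
        if (up != 'A' && up != 'C' && up != 'G' && up != 'T') {
            throw std::invalid_argument(
                std::string("invalid nucleotide: ") + static_cast<char>(ch));
        }
        out.push_back(up);
    }
    return out;
}
```

Under these assumptions, mixed-case input such as "acGT" is accepted and normalized, while an ambiguity code such as N is rejected with the same exception type that this function documents.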

Parameters:

  • fasta_path – Path to the FASTA file containing DNA sequences

  • sanitization_mode – Sanitization mode applied to the DNA input (see FastaDnaSanitizationMode)

Throws:
  • std::runtime_error – If the FASTA file cannot be opened or contains no valid sequences

  • std::invalid_argument – If invalid nucleotides are found

Returns:

Result containing sequence IDs, per-sequence counts, and the total count