Function noLZSS::write_factors_binary_file_fasta_dna_no_rc_per_sequence
Defined in File fasta_processor.cpp
Function Documentation
size_t noLZSS::write_factors_binary_file_fasta_dna_no_rc_per_sequence(const std::string &fasta_path, const std::string &out_dir, FastaDnaSanitizationMode sanitization_mode)
Writes the factors of each DNA sequence in a FASTA file, factorized without reverse-complement matching, to a separate binary file per sequence.
Reads a FASTA file, factorizes each sequence independently without reverse complement awareness, and writes each sequence’s factors to a separate binary output file. File names include the sequence ID.
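The per-sequence behaviour starts by splitting the FASTA input into records, one per header. The sketch below shows that step only; it is not the noLZSS implementation. `FastaRecord` and `read_fasta_records` are hypothetical names, and taking the sequence ID as the header text up to the first whitespace is an assumption, not something this page specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record type; not part of the noLZSS API.
struct FastaRecord {
    std::string id;        // header text up to the first whitespace (assumption)
    std::string sequence;  // all sequence lines of the record, concatenated
};

// Minimal sketch: split a FASTA stream into records so that each
// sequence can later be factorized independently.
std::vector<FastaRecord> read_fasta_records(std::istream& in) {
    std::vector<FastaRecord> records;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty()) continue;
        if (line[0] == '>') {
            // Assumed convention: ID ends at the first space or tab.
            std::string header = line.substr(1);
            std::size_t end = header.find_first_of(" \t");
            records.push_back({header.substr(0, end), ""});
        } else {
            if (records.empty())
                throw std::runtime_error("sequence data before first FASTA header");
            records.back().sequence += line;
        }
    }
    if (records.empty())
        throw std::runtime_error("no sequences found in FASTA input");
    return records;
}
```

In practice the stream would be a `std::ifstream` opened on `fasta_path`; an `std::istringstream` works equally well for experimenting.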
Note
Creates a separate binary file for each sequence: <out_dir>/<seq_id>.bin
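The naming scheme `<out_dir>/<seq_id>.bin` can be expressed with `std::filesystem`. This is an illustrative sketch; `factor_file_path` is a hypothetical helper, not a noLZSS function.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper (not part of noLZSS): builds <out_dir>/<seq_id>.bin.
std::filesystem::path factor_file_path(const std::string& out_dir,
                                       const std::string& seq_id) {
    std::filesystem::path p(out_dir);
    p /= seq_id + ".bin";  // one file per sequence, named after its ID
    return p;
}
```

The sketch neither creates `out_dir` nor escapes IDs; a sequence ID containing a path separator would produce a nested path. How the real function handles either case is not described on this page.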
Note
Each binary file contains the factors followed by a metadata footer
Note
Only A, C, T, G nucleotides are allowed (case insensitive)
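A strict reading of this rule accepts A, C, G and T in either case and rejects everything else with `std::invalid_argument`, matching the exception listed under Throws. The sketch below assumes strict validation; other `FastaDnaSanitizationMode` values may behave differently. `normalize_dna_strict` is a hypothetical name, and uppercasing the output is the sketch's own choice.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>

// Strict check: only A, C, G, T (any case) are accepted.
// Assumption: this mirrors a strict sanitization mode only.
std::string normalize_dna_strict(const std::string& seq) {
    std::string out;
    out.reserve(seq.size());
    for (char c : seq) {
        char u = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        if (u != 'A' && u != 'C' && u != 'G' && u != 'T')
            throw std::invalid_argument(std::string("invalid nucleotide: ") + c);
        out.push_back(u);  // store in uppercase (sketch's choice)
    }
    return out;
}
```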
Note
Reverse complement matches are NOT supported during factorization
Warning
Ensure sufficient disk space for the output files
- Parameters:
fasta_path – Path to input FASTA file containing DNA sequences
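out_dir – Path to output directory where binary factor files will be written
sanitization_mode – Sanitization mode (FastaDnaSanitizationMode) applied to the input DNA sequences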
- Throws:
std::runtime_error – If the FASTA file cannot be opened or contains no valid sequences
std::invalid_argument – If invalid nucleotides are found
- Returns:
Total number of factors written across all sequences
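The return value is the sum of the per-sequence factor counts. The sketch below shows only that accumulation. The factorize-and-write step is abstracted as a hypothetical callback (`WriteSequenceFactors`), because the real internals are not documented here; `write_all_sequences` is likewise a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical callback: factorizes one sequence, writes its file, and
// returns the number of factors written. Stands in for the real internals.
using WriteSequenceFactors =
    std::function<std::size_t(const std::string& id, const std::string& seq)>;

// Processes each (id, sequence) pair independently and sums the counts.
std::size_t write_all_sequences(
    const std::vector<std::pair<std::string, std::string>>& records,
    const WriteSequenceFactors& write_one) {
    std::size_t total = 0;
    for (const auto& [id, seq] : records) {
        total += write_one(id, seq);  // each sequence handled on its own
    }
    return total;  // total across all sequences
}
```

Because each sequence is factorized independently, factors in one output file never refer to another sequence, and the total is a plain sum.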