Function noLZSS::parallel_write_factors_binary_file_fasta_dna_no_rc_per_sequence
Defined in File parallel_fasta_processor.cpp
Function Documentation
size_t noLZSS::parallel_write_factors_binary_file_fasta_dna_no_rc_per_sequence(const std::string &fasta_path, const std::string &out_dir, size_t num_threads = 0, FastaDnaSanitizationMode sanitization_mode = FastaDnaSanitizationMode::RemoveAmbiguous)
Parallel version of write_factors_binary_file_fasta_dna_no_rc_per_sequence.
Reads a FASTA file, factorizes each sequence independently and in parallel, without reverse complement awareness, and writes the factors of each sequence to a separate binary file.
Note
Each sequence is factorized independently in parallel
Note
Creates a separate binary file for each sequence: <out_dir>/<seq_id>.bin
Note
Binary format per file: factors + metadata footer
Note
Only the nucleotides A, C, T, and G are allowed (case-insensitive)
Note
Reverse complement matches are NOT supported during factorization
Warning
Ensure sufficient disk space for the output files
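The notes above on output naming and the allowed alphabet can be illustrated with a minimal sketch. The helpers `factor_file_path` and `is_plain_dna` are hypothetical names introduced for illustration only; they are not part of noLZSS, and the library's actual validation and sanitization logic may differ.

```cpp
#include <cctype>
#include <filesystem>
#include <string>

// Hypothetical helper (not part of noLZSS): builds <out_dir>/<seq_id>.bin,
// following the naming scheme described in the notes.
std::filesystem::path factor_file_path(const std::string& out_dir,
                                       const std::string& seq_id) {
    return std::filesystem::path(out_dir) / (seq_id + ".bin");
}

// Hypothetical helper (not part of noLZSS): returns true if every character
// is one of A, C, G, T, compared case-insensitively.
bool is_plain_dna(const std::string& seq) {
    for (unsigned char c : seq) {
        switch (std::toupper(c)) {
            case 'A': case 'C': case 'G': case 'T':
                break;          // allowed nucleotide
            default:
                return false;   // anything else is rejected in this sketch
        }
    }
    return true;
}
```

Under these assumptions, a sequence with ID `chr1` and output directory `out` maps to `out/chr1.bin`, and a sequence containing `N` fails the check.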
- Parameters:
fasta_path – Path to input FASTA file containing DNA sequences
out_dir – Path to output directory where binary factor files will be written
num_threads – Number of threads to use (0 = auto-detect based on sequence count)
sanitization_mode – Sanitization mode applied to the DNA input (default: FastaDnaSanitizationMode::RemoveAmbiguous)
- Throws:
std::runtime_error – If FASTA file cannot be opened or contains no valid sequences
std::invalid_argument – If invalid nucleotides are found
- Returns:
Total number of factors written across all sequences
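
As a rough illustration of the per-sequence parallelism and the summed return value, the following sketch distributes sequences across threads and adds up per-sequence counts. Everything here is an assumption made for illustration: `resolve_thread_count`, `count_factors_placeholder`, and `total_factors_parallel` are hypothetical names, the auto-detect rule (hardware concurrency capped by sequence count) is a guess at what "auto-detect based on sequence count" could mean, and the placeholder counts runs of identical characters rather than performing a real noLZSS factorization.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical rule for num_threads == 0: use the hardware concurrency,
// but never more threads than there are sequences.
std::size_t resolve_thread_count(std::size_t requested, std::size_t num_sequences) {
    if (num_sequences == 0) return 1;
    std::size_t n = requested;
    if (n == 0) {
        n = std::thread::hardware_concurrency();  // may be 0 if unknown
        if (n == 0) n = 1;
    }
    return std::min(n, num_sequences);
}

// Placeholder standing in for the real factorizer: counts one "factor"
// per run of identical characters. This is NOT the noLZSS algorithm.
std::size_t count_factors_placeholder(const std::string& seq) {
    std::size_t runs = 0;
    for (std::size_t i = 0; i < seq.size(); ++i) {
        if (i == 0 || seq[i] != seq[i - 1]) ++runs;
    }
    return runs;
}

// Processes each sequence independently; thread t handles sequences
// t, t + n_threads, t + 2 * n_threads, and so on. Each thread writes only
// to its own slots in `counts`, so no locking is needed.
std::size_t total_factors_parallel(const std::vector<std::string>& seqs,
                                   std::size_t num_threads = 0) {
    const std::size_t n_threads = resolve_thread_count(num_threads, seqs.size());
    std::vector<std::size_t> counts(seqs.size(), 0);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < n_threads; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < seqs.size(); i += n_threads) {
                counts[i] = count_factors_placeholder(seqs[i]);
            }
        });
    }
    for (auto& w : workers) w.join();

    // The return value is the sum over all sequences.
    std::size_t total = 0;
    for (std::size_t c : counts) total += c;
    return total;
}
```

Because sequences share no state in this sketch, the total is the same regardless of the thread count; only the wall-clock time changes.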