Function noLZSS::parallel_write_factors_binary_file_fasta_multiple_dna_no_rc
Defined in File parallel_fasta_processor.cpp
Function Documentation
size_t noLZSS::parallel_write_factors_binary_file_fasta_multiple_dna_no_rc(const std::string &fasta_path, const std::string &out_path, size_t num_threads = 0, FastaDnaSanitizationMode sanitization_mode = FastaDnaSanitizationMode::RemoveAmbiguous)
Parallel version of write_factors_binary_file_fasta_multiple_dna_no_rc.
Reads a FASTA file containing DNA sequences, prepares them for factorization without reverse-complement awareness, and factorizes them in parallel, writing the results and their metadata to a binary output file.
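Reading a multi-sequence FASTA file comes down to splitting the input into records, each with a header ID and a sequence that may wrap over several lines. The sketch below illustrates that step only. It is not the library's parser. `FastaRecord` and `read_fasta_records` are hypothetical names, and taking the ID as the header text up to the first whitespace is an assumption.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type: one FASTA entry (header ID + sequence).
struct FastaRecord {
    std::string id;
    std::string sequence;
};

// Hypothetical helper (not part of noLZSS): splits a multi-FASTA stream into records.
// Assumption: the ID is the header text after '>' up to the first space or tab.
std::vector<FastaRecord> read_fasta_records(std::istream& in) {
    std::vector<FastaRecord> records;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF line endings
        if (line.empty()) continue;                                 // skip blank lines
        if (line[0] == '>') {
            std::string header = line.substr(1);
            std::size_t end = header.find_first_of(" \t");          // npos keeps the whole header
            records.push_back({header.substr(0, end), ""});
        } else if (!records.empty()) {
            records.back().sequence += line;                        // join wrapped sequence lines
        }
        // Sequence lines before the first header are ignored in this sketch.
    }
    return records;
}
```

A caller would pass an opened `std::ifstream` (or, in a test, a `std::istringstream`). Validation and any nucleotide sanitization are deliberately left out of this step.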
Note
The binary format includes factors, sequence IDs, sentinel indices, and footer metadata
Note
Only the nucleotides A, C, G, and T are allowed (case-insensitive)
Note
This function overwrites the output file if it exists
Note
Reverse complement matches are NOT supported during factorization
Note
For single-threaded execution (num_threads=1), no temporary files are created
Warning
Ensure sufficient disk space for the output file and temporary files
- Parameters:
fasta_path – Path to input FASTA file containing DNA sequences
out_path – Path to output file where binary factors will be written
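num_threads – Number of threads to use (0 = auto-detect based on input size)
sanitization_mode – Sanitization mode applied to the DNA sequences read from the FASTA file (default: FastaDnaSanitizationMode::RemoveAmbiguous)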
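The `num_threads` convention (0 means auto-detect, any other value is used as given) can be illustrated with a small resolver. This is a sketch under stated assumptions. The library's actual auto-detection heuristic is not documented here. `resolve_thread_count` and the 1 MiB per-thread threshold `kMinBytesPerThread` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>

// Hypothetical minimum amount of input per worker (assumption, not the library's value).
constexpr std::size_t kMinBytesPerThread = std::size_t{1} << 20;  // 1 MiB

// Hypothetical helper: maps a requested thread count (0 = auto) to a concrete one.
std::size_t resolve_thread_count(std::size_t requested, std::size_t input_bytes) {
    if (requested != 0) return requested;                    // an explicit request is used as-is
    std::size_t hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = 1;                                     // hardware_concurrency() may report 0
    std::size_t by_size = input_bytes / kMinBytesPerThread;  // small inputs get fewer threads
    return std::max<std::size_t>(1, std::min(hw, by_size));  // never fewer than one thread
}
```

Scaling the auto-detected count with input size keeps small files on a single thread. According to the Notes, that single-threaded path creates no temporary files.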
- Throws:
std::runtime_error – If FASTA file cannot be opened or contains no valid sequences
std::invalid_argument – If the FASTA file contains more than 250 sequences or an invalid nucleotide is found
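The documented error conditions can be mirrored in a small validation routine. This sketch models only the strict rule from the Notes: A, C, G, and T are accepted case-insensitively and everything else is rejected. It does not model `FastaDnaSanitizationMode`, such as `RemoveAmbiguous`, which may change how ambiguous bases are handled. `validate_dna_sequences` is a hypothetical name.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Documented limit on the number of sequences in one FASTA file.
constexpr std::size_t kMaxSequences = 250;

// Hypothetical helper mirroring the documented checks; uppercases sequences in place.
// Does NOT model FastaDnaSanitizationMode (e.g. RemoveAmbiguous).
void validate_dna_sequences(std::vector<std::string>& sequences) {
    if (sequences.empty()) {
        throw std::runtime_error("no valid sequences found");
    }
    if (sequences.size() > kMaxSequences) {
        throw std::invalid_argument("too many sequences (max 250)");
    }
    for (std::string& seq : sequences) {
        for (char& c : seq) {
            // Case-insensitive: normalize to uppercase before checking.
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
            if (c != 'A' && c != 'C' && c != 'G' && c != 'T') {
                throw std::invalid_argument(std::string("invalid nucleotide: ") + c);
            }
        }
    }
}
```

Checking the sequence count before scanning the bases lets an oversized file fail fast without touching every character.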
- Returns:
Number of factors written to the output file