Function noLZSS::parallel_write_factors_binary_file_fasta_multiple_dna_no_rc

Function Documentation

size_t noLZSS::parallel_write_factors_binary_file_fasta_multiple_dna_no_rc(const std::string &fasta_path, const std::string &out_path, size_t num_threads = 0, FastaDnaSanitizationMode sanitization_mode = FastaDnaSanitizationMode::RemoveAmbiguous)

Parallel version of write_factors_binary_file_fasta_multiple_dna_no_rc.

Reads a FASTA file containing DNA sequences, prepares them for factorization without reverse-complement awareness, and factorizes them in parallel, writing the results to a binary output file with metadata.

Note

Binary format includes factors, sequence IDs, sentinel indices, and footer metadata

Note

Only the nucleotides A, C, G, and T are allowed (case-insensitive)

Note

This function overwrites the output file if it exists

Note

Reverse complement matches are NOT supported during factorization

Note

For single-threaded execution (num_threads=1), no temporary files are created

Warning

Ensure sufficient disk space for the output file and temporary files

Parameters:
  • fasta_path – Path to input FASTA file containing DNA sequences

  • out_path – Path to output file where binary factors will be written

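  • num_threads – Number of threads to use (0 = auto-detect based on input size)

  • sanitization_mode – How the input DNA sequences are sanitized before factorization (default: FastaDnaSanitizationMode::RemoveAmbiguous)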

Throws:
  • std::runtime_error – If the FASTA file cannot be opened or contains no valid sequences

  • std::invalid_argument – If the FASTA file contains too many sequences (>250) or invalid nucleotides are found

Returns:

Number of factors written to the output file