Function noLZSS::prepare_multiple_dna_sequences_no_rc
Defined in File factorizer.cpp
Function Documentation
-
PreparedSequenceResult noLZSS::prepare_multiple_dna_sequences_no_rc(const std::vector<std::string> &sequences)
Prepares multiple DNA sequences for factorization without reverse complement and tracks sentinel positions.
Takes multiple DNA sequences, concatenates them with unique sentinels, and tracks sentinel positions. Unlike prepare_multiple_dna_sequences_w_rc(), this function does not append reverse complements. The output format is S = T1!T2@T3$, with a distinct sentinel after each sequence.
Note
Sentinels are byte values in the range 1-251; 0 and the ASCII codes for A (65), C (67), G (71), and T (84) are never used.
Note
The function validates that all sequences contain only valid DNA nucleotides
Note
Input sequences may be lowercase or uppercase; the output is always uppercase.
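The behavior described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the names SketchResult, is_reserved, and sketch_prepare_no_rc are hypothetical. The sketch assumes that sentinels are assigned in ascending order starting at 1 and that positions are zero-based, neither of which is stated in this documentation. For simplicity it reports both error cases as std::invalid_argument.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for PreparedSequenceResult; field names follow the documentation.
struct SketchResult {
    std::string prepared_string;
    std::size_t original_length = 0;
    std::vector<std::size_t> sentinel_positions;
};

// True for byte values the documentation excludes from the sentinel range.
inline bool is_reserved(unsigned char b) {
    return b == 0 || b == 'A' || b == 'C' || b == 'G' || b == 'T';
}

// Illustrative re-implementation under the stated assumptions; not the library's code.
inline SketchResult sketch_prepare_no_rc(const std::vector<std::string>& sequences) {
    SketchResult result;
    unsigned int next = 1;  // next candidate sentinel byte (assumed to ascend from 1)
    for (const std::string& seq : sequences) {
        for (char ch : seq) {
            // Normalize to uppercase, then validate against the DNA alphabet.
            char up = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
            if (up != 'A' && up != 'C' && up != 'G' && up != 'T') {
                throw std::invalid_argument("invalid nucleotide in input");
            }
            result.prepared_string.push_back(up);
        }
        // Skip byte values that collide with nucleotide codes.
        while (next <= 251 && is_reserved(static_cast<unsigned char>(next))) {
            ++next;
        }
        if (next > 251) {
            throw std::invalid_argument("too many sequences");
        }
        // Record where the sentinel lands, then append it.
        result.sentinel_positions.push_back(result.prepared_string.size());
        result.prepared_string.push_back(static_cast<char>(next));
        ++next;
    }
    result.original_length = result.prepared_string.size();
    return result;
}
```

Under these assumptions, the inputs "acgt" and "GG" yield an 8-byte string: ACGT, byte 1, GG, byte 2, with sentinel positions 4 and 7.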
- Parameters:
sequences – Vector of DNA sequence strings (should contain only A, C, T, G)
- Throws:
std::invalid_argument – If more than 250 sequences are given or invalid nucleotides are found
std::runtime_error – If sequences contain invalid characters
- Returns:
PreparedSequenceResult containing:
prepared_string: The formatted string with sequences and sentinels
original_length: Total length of the concatenated string (same as prepared_string.length())
sentinel_positions: Positions of all sentinels in the prepared string
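One way to use the returned fields is to recover the individual sequences from the prepared string. The sketch below is only an illustration: split_by_sentinels is a hypothetical helper, not part of noLZSS. It assumes that sentinel_positions holds zero-based indices in ascending order and that one sentinel follows each sequence, as the format S = T1!T2@T3$ suggests.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits a prepared string back into its sequences.
// Assumes zero-based, ascending sentinel positions, one sentinel after each sequence.
inline std::vector<std::string> split_by_sentinels(
        const std::string& prepared,
        const std::vector<std::size_t>& sentinel_positions) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    for (std::size_t pos : sentinel_positions) {
        // Copy the bytes between the previous sentinel and this one.
        parts.push_back(prepared.substr(start, pos - start));
        start = pos + 1;  // skip the sentinel byte itself
    }
    return parts;
}
```

Because the sentinel values never coincide with A, C, G, or T, the positions alone are enough to separate the sequences; the sentinel bytes themselves do not need to be inspected.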