Function noLZSS::prepare_multiple_dna_sequences_w_rc
Defined in File factorizer.cpp
Function Documentation
PreparedSequenceResult noLZSS::prepare_multiple_dna_sequences_w_rc(const std::vector<std::string> &sequences)
Prepares multiple DNA sequences for factorization with reverse complement awareness and tracks sentinel positions.
Takes multiple DNA sequences, concatenates them with unique sentinels, appends their reverse complements with unique sentinels, and tracks sentinel positions. The output format is compatible with nolzss_multiple_dna_w_rc(): S = T1!T2@T3$rt(T3)rt(T2)^rt(T1)&, where rt(Ti) denotes the reverse complement of Ti and the punctuation characters stand for sentinels.
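The layout described above can be illustrated with a minimal sketch. This is not the noLZSS implementation: `reverse_complement` and `sketch_layout` are hypothetical helpers, the sentinel characters are supplied by the caller for readability, and the sketch assumes that every reverse complement is terminated by its own sentinel.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of noLZSS): reverse complement of an
// uppercase DNA string. Throws on anything other than A, C, G, T.
std::string reverse_complement(const std::string &seq) {
    std::string rc(seq.rbegin(), seq.rend());  // reversed copy
    for (char &c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default: throw std::invalid_argument("invalid nucleotide");
        }
    }
    return rc;
}

// Hypothetical layout sketch: forward sequences in input order, each followed
// by a sentinel, then reverse complements in reverse input order, each
// followed by a sentinel. Sentinels are consumed left to right.
std::string sketch_layout(const std::vector<std::string> &seqs,
                          const std::string &sentinels) {
    if (sentinels.size() < 2 * seqs.size()) {
        throw std::invalid_argument("not enough sentinels");
    }
    std::string out;
    std::size_t next = 0;
    for (const auto &s : seqs) {
        out += s;
        out += sentinels[next++];
    }
    for (auto it = seqs.rbegin(); it != seqs.rend(); ++it) {
        out += reverse_complement(*it);
        out += sentinels[next++];
    }
    return out;
}
```

For example, `sketch_layout({"AC", "GT"}, "!@$%")` yields `AC!GT@AC$GT%`: the two forward sequences, then the reverse complement of `GT` (which is `AC`), then the reverse complement of `AC` (which is `GT`).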
Note
Sentinels avoid 0, A(65), C(67), G(71), and T(84). Lowercase nucleotide codes are safe as sentinels because the output is always uppercase.
Note
The function validates that all sequences contain only valid DNA nucleotides.
Note
Input sequences can be lowercase or uppercase; output is always uppercase.
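The notes above can be sketched as two small helpers. Both are hypothetical and only illustrate the stated constraints: `sketch_sentinels` assumes sentinels are single byte values taken in ascending order from 1, and `sketch_normalize` assumes validation happens after uppercasing. The library may choose sentinels or validate differently.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical: return `count` distinct sentinel byte values, skipping 0 and
// the uppercase nucleotide codes A(65), C(67), G(71), T(84).
std::vector<unsigned char> sketch_sentinels(std::size_t count) {
    std::vector<unsigned char> out;
    for (int v = 1; v <= 255 && out.size() < count; ++v) {
        if (v == 'A' || v == 'C' || v == 'G' || v == 'T') continue;
        out.push_back(static_cast<unsigned char>(v));
    }
    if (out.size() < count) {
        throw std::invalid_argument("too many sentinels requested");
    }
    return out;
}

// Hypothetical: uppercase a sequence and reject anything that is not
// A, C, G, or T after uppercasing.
std::string sketch_normalize(const std::string &seq) {
    std::string out;
    out.reserve(seq.size());
    for (unsigned char c : seq) {
        const char u = static_cast<char>(std::toupper(c));
        if (u != 'A' && u != 'C' && u != 'G' && u != 'T') {
            throw std::invalid_argument("invalid nucleotide");
        }
        out.push_back(u);
    }
    return out;
}
```

Under these assumptions a single byte offers 255 - 4 = 251 usable sentinel values. If each sequence consumes two sentinels (one for the forward copy and one for its reverse complement), that would allow at most 125 sequences.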
- Parameters:
sequences – Vector of DNA sequence strings (should contain only A, C, T, G)
- Throws:
std::invalid_argument – If there are too many sequences (>125) or invalid nucleotides are found
std::runtime_error – If sequences contain invalid characters
- Returns:
Pair containing: (concatenated_string, original_length)
concatenated_string: The formatted string with sequences and reverse complements
original_length: Length of the original sequences part with sentinels (before reverse complements)
- Returns:
PreparedSequenceResult containing:
prepared_string: The formatted string with sequences and reverse complements
original_length: Length of the original sequences part (before reverse complements)
sentinel_positions: Positions of all sentinels in the prepared string
- Returns:
PreparedSequenceResult containing:
prepared_string: The formatted string with sequences and reverse complements
original_length: Length of the original sequences part (before reverse complements)
sentinel_positions: Positions of all sentinels in the prepared string
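To show how the three returned fields relate, here is a minimal sketch that recovers the forward sequences from a result. `SketchResult` is a hypothetical mirror of the documented fields, and its member types are assumptions. The sketch further assumes that sentinel positions are 0-based and ascending, and that `original_length` counts the forward sequences together with their sentinels.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical mirror of the documented fields; the real
// PreparedSequenceResult may use different types.
struct SketchResult {
    std::string prepared_string;
    std::size_t original_length;
    std::vector<std::size_t> sentinel_positions;
};

// Split the forward part of prepared_string at every sentinel position that
// lies before original_length. Assumes positions are 0-based and ascending.
std::vector<std::string> forward_sequences(const SketchResult &r) {
    std::vector<std::string> out;
    std::size_t start = 0;
    for (std::size_t pos : r.sentinel_positions) {
        if (pos >= r.original_length) break;  // reverse-complement part begins
        out.push_back(r.prepared_string.substr(start, pos - start));
        start = pos + 1;  // skip the sentinel itself
    }
    return out;
}
```

With `prepared_string = "AC!GT@AC$GT%"`, `original_length = 6`, and `sentinel_positions = {2, 5, 8, 11}`, the helper returns `AC` and `GT`.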