Function noLZSS::prepare_multiple_dna_sequences_w_rc

Function Documentation

PreparedSequenceResult noLZSS::prepare_multiple_dna_sequences_w_rc(const std::vector<std::string> &sequences)

Prepares multiple DNA sequences for factorization with reverse complement awareness.

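Takes multiple DNA sequences, concatenates them with unique sentinels, appends their reverse complements (in reverse sequence order) with unique sentinels, and records the position of every sentinel. The output format is compatible with nolzss_multiple_dna_w_rc(): S = T1!T2@T3$rt(T3)%rt(T2)^rt(T1)&, where rt(Ti) is the reverse complement of Ti and each punctuation character stands for a distinct sentinel.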

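The layout described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `LayoutSketch`, `reverse_complement_sketch`, and `build_layout_sketch` are hypothetical names, the field types are assumed, and the sketch assumes sentinels are handed out as consecutive byte values starting at 1 (skipping A, C, G, T) and that original_length counts the forward sentinels.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for PreparedSequenceResult; the field types are assumed.
struct LayoutSketch {
    std::string prepared_string;
    std::size_t original_length = 0;
    std::vector<std::size_t> sentinel_positions;
};

// Reverse complement of an uppercase DNA string; rejects anything but A, C, G, T.
inline std::string reverse_complement_sketch(const std::string& s) {
    std::string out(s.rbegin(), s.rend());
    for (char& c : out) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default: throw std::invalid_argument("invalid nucleotide");
        }
    }
    return out;
}

inline LayoutSketch build_layout_sketch(const std::vector<std::string>& sequences) {
    if (sequences.size() > 125) {
        throw std::invalid_argument("too many sequences");
    }
    LayoutSketch r;
    unsigned char sentinel = 0;
    // Appends one segment followed by the next unused sentinel byte
    // (assumed order: 1, 2, 3, ... skipping the uppercase nucleotide codes).
    auto append_segment = [&](const std::string& segment) {
        r.prepared_string += segment;
        do {
            ++sentinel;
        } while (sentinel == 'A' || sentinel == 'C' || sentinel == 'G' || sentinel == 'T');
        r.sentinel_positions.push_back(r.prepared_string.size());
        r.prepared_string.push_back(static_cast<char>(sentinel));
    };

    // Forward half: T1 s1 T2 s2 ... Tn sn, uppercased.
    std::vector<std::string> upper;
    upper.reserve(sequences.size());
    for (const std::string& seq : sequences) {
        std::string u(seq);
        std::transform(u.begin(), u.end(), u.begin(),
                       [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
        append_segment(u);
        upper.push_back(u);
    }
    r.original_length = r.prepared_string.size();

    // Reverse-complement half, last sequence first: rt(Tn) ... rt(T1).
    for (auto it = upper.rbegin(); it != upper.rend(); ++it) {
        append_segment(reverse_complement_sketch(*it));
    }
    return r;
}
```

Under these assumptions, the inputs "ACG" and "tt" yield the segments ACG, TT, AA, CGT, each followed by its own sentinel byte.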

Note

Sentinels range from 1-251, avoiding 0, A(65), C(67), G(71), T(84)

Note

The function validates that all sequences contain only valid DNA nucleotides

Note

Sentinels avoid 0 and the uppercase nucleotides A (65), C (67), G (71), T (84); lowercase nucleotides are safe as sentinels

Note

Input sequences can be lowercase or uppercase, output is always uppercase

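The notes above can be illustrated with a small sketch of input normalization and sentinel selection. The helper names `normalize_dna_sketch` and `available_sentinels_sketch` are hypothetical, the exception type is an assumption, and the sketch assumes sentinels are single byte values. Under that assumption, excluding 0 and the four uppercase nucleotide codes from 0..255 leaves 251 candidates, which is consistent with the documented figure; since every sequence needs one sentinel for itself and one for its reverse complement, it would also be consistent with a cap of 125 sequences.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

// Uppercases a DNA string and rejects anything other than A, C, G, T.
// The exception type used here is an assumption for illustration.
inline std::string normalize_dna_sketch(const std::string& seq) {
    std::string out;
    out.reserve(seq.size());
    for (unsigned char c : seq) {
        const char u = static_cast<char>(std::toupper(c));
        if (u != 'A' && u != 'C' && u != 'G' && u != 'T') {
            throw std::invalid_argument("invalid nucleotide in sequence");
        }
        out.push_back(u);
    }
    return out;
}

// Lists every byte value usable as a sentinel under the documented exclusions.
inline std::vector<unsigned char> available_sentinels_sketch() {
    std::vector<unsigned char> values;
    for (int v = 1; v <= 255; ++v) {  // 0 is excluded by starting at 1
        if (v == 'A' || v == 'C' || v == 'G' || v == 'T') {
            continue;  // uppercase nucleotides appear in the output
        }
        values.push_back(static_cast<unsigned char>(v));
    }
    return values;
}
```

Because the normalized output never contains lowercase letters, the values for a, c, g, t remain in the candidate list.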

Parameters:
  • sequences – Vector of DNA sequence strings (each may contain only A, C, G, T, in upper or lower case)

Throws:
  • std::invalid_argument – If too many sequences (>125) or invalid nucleotides found

  • std::runtime_error – If sequences contain invalid characters

Returns:

PreparedSequenceResult containing:

  • prepared_string: The formatted string with sequences and reverse complements

  • original_length: Length of the original sequences part, including its sentinels (before reverse complements)

  • sentinel_positions: Positions of all sentinels in the prepared string

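As a rough illustration of how the returned fields relate to each other, the sketch below splits a prepared string back into its segments. `PreparedSketch` is a hypothetical mirror of the documented fields (the real member types may differ), and the sketch assumes sentinel positions are 0-based and stored in ascending order.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical mirror of the documented fields; the actual types may differ.
struct PreparedSketch {
    std::string prepared_string;
    std::size_t original_length;
    std::vector<std::size_t> sentinel_positions;
};

// Splits the prepared string into the segments that precede each sentinel.
// Assumes 0-based, ascending sentinel positions.
inline std::vector<std::string> split_segments_sketch(const PreparedSketch& p) {
    std::vector<std::string> segments;
    std::size_t start = 0;
    for (std::size_t pos : p.sentinel_positions) {
        segments.push_back(p.prepared_string.substr(start, pos - start));
        start = pos + 1;  // skip over the sentinel itself
    }
    return segments;
}

// Counts the sentinels that fall inside the forward (original) part.
inline std::size_t forward_count_sketch(const PreparedSketch& p) {
    std::size_t n = 0;
    for (std::size_t pos : p.sentinel_positions) {
        if (pos < p.original_length) {
            ++n;
        }
    }
    return n;
}
```

With these assumptions, the first `forward_count_sketch(p)` segments are the uppercased inputs and the remaining segments are their reverse complements in reverse order.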