Function noLZSS::factorize_w_reference_file
Defined in File factorizer.cpp
Function Documentation
-
size_t noLZSS::factorize_w_reference_file(const std::string &reference_seq, const std::string &target_seq, const std::string &out_path)
Factorizes a target sequence using a reference sequence and writes factors to a binary file (general version).
This is the file-output version of factorize_w_reference(). It performs general reference-based factorization (no reverse complement matching) and writes the results directly to a binary file in the noLZSS factor format, followed by a metadata footer.
The output file format:
Factors: Binary array of Factor structs (24 bytes each: start, length, reference)
Footer: Metadata including factor count, sequence count (2), sentinel count (1)
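As a rough illustration of the factor section described above, the sketch below models one factor as three 64-bit unsigned integers. That width is an assumption that merely matches the documented 24-byte size; the field types, byte order, and the names FactorRecord and read_factor_records are hypothetical and not taken from the library. Because the footer layout is not specified here, the caller supplies the factor count (for example, the function's return value) instead of parsing the footer.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical mirror of the on-disk factor record. Assumption: three
// 64-bit unsigned fields in the documented order (start, length, reference).
struct FactorRecord {
    std::uint64_t start;
    std::uint64_t length;
    std::uint64_t reference;
};
static_assert(sizeof(FactorRecord) == 24, "expected 24-byte factor records");

// Reads `count` records from the beginning of the file at `path`.
// The footer is not parsed; any bytes after the requested records are ignored.
std::vector<FactorRecord> read_factor_records(const std::string& path,
                                              std::size_t count) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    std::vector<FactorRecord> records(count);
    if (count > 0) {
        const auto bytes =
            static_cast<std::streamsize>(count * sizeof(FactorRecord));
        in.read(reinterpret_cast<char*>(records.data()), bytes);
        if (in.gcount() != bytes) {
            throw std::runtime_error("file is shorter than expected");
        }
    }
    return records;
}
```

Reading raw structs like this only works when the reader and writer agree on endianness and field layout, so treat it as a starting point to check against the library's actual format definition.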
Use cases:
Processing large non-DNA sequences without storing all factors in memory
Saving factorization results for later analysis
Comparing general text documents with a reference
The function concatenates the reference and target sequences with a sentinel separator between them (ref@target), performs noLZSS factorization starting at the position where the target sequence begins, and writes the resulting factors to a binary file. It is suitable for general text or amino acid sequences.
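The concatenation step can be pictured with the sketch below. It assumes a single '\x01' sentinel byte between the two sequences, as the sentinel warning on this page suggests; CombinedInput and combine_with_sentinel are hypothetical names used only for illustration and are not part of the library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type: the combined text plus the offset of the target.
struct CombinedInput {
    std::string text;          // reference + sentinel + target
    std::size_t target_start;  // index of the first target character in `text`
};

// Assumption: exactly one '\x01' sentinel byte separates the two sequences.
CombinedInput combine_with_sentinel(const std::string& reference_seq,
                                    const std::string& target_seq) {
    CombinedInput combined;
    combined.text.reserve(reference_seq.size() + 1 + target_seq.size());
    combined.text += reference_seq;
    combined.text += '\x01';
    combined.text += target_seq;
    // The target begins right after the reference and the sentinel.
    combined.target_start = reference_seq.size() + 1;
    return combined;
}
```

Under this assumed layout, a factor start position can be converted to a target-relative offset by subtracting target_start.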
See also
factorize_w_reference() for in-memory version
See also
factorize_dna_w_reference_seq_file() for DNA-specific version with reverse complement
Note
Output file includes footer with num_sequences=2, num_sentinels=1
Note
The file is written with buffered I/O (1 MB buffer) for performance
Note
The reference field in factors points to positions in the combined string
Note
Factor start positions are absolute positions in the combined reference+target string
Note
No reverse complement matching is performed; the function is intended for general text or amino acid sequences, not DNA
Note
Binary format follows the same structure as other factorization binary outputs
Warning
The sentinel character '\x01' (ASCII 1) must not appear in either input sequence, because it is used internally to separate the reference and target sequences
Warning
This function overwrites the output file if it exists
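Since the sentinel warning puts the burden on the caller, a pre-check along the following lines may be useful before calling the function. This is a minimal sketch: is_sentinel_free is a hypothetical helper, not a library function, and whether the library performs its own validation is not stated here.

```cpp
#include <string>

// Hypothetical pre-check: returns true when `seq` does not contain the
// sentinel byte that is reserved for separating reference and target.
bool is_sentinel_free(const std::string& seq, char sentinel = '\x01') {
    return seq.find(sentinel) == std::string::npos;
}
```

A caller would typically run this on both reference_seq and target_seq and refuse to proceed if either check fails.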
- Parameters:
reference_seq – Reference sequence (any text)
target_seq – Target sequence to factorize (any text)
out_path – Path to the output binary file (overwritten if it already exists)
- Throws:
std::runtime_error – If the output file cannot be created
- Returns:
Number of factors written to the output file
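The documented exception and return value suggest a call-site pattern like the sketch below. To keep the example compilable without the library, the factorizer is passed in as a callable with the documented signature; try_factorize_to_file is a hypothetical wrapper, not part of noLZSS.

```cpp
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: invokes any callable with the documented signature
// (reference_seq, target_seq, out_path) -> size_t. On success it stores the
// factor count in `num_factors` and returns true; on std::runtime_error it
// logs the message and returns false.
template <typename Factorizer>
bool try_factorize_to_file(Factorizer&& factorize,
                           const std::string& reference_seq,
                           const std::string& target_seq,
                           const std::string& out_path,
                           std::size_t& num_factors) {
    try {
        num_factors = factorize(reference_seq, target_seq, out_path);
        return true;
    } catch (const std::runtime_error& e) {
        std::cerr << "factorization failed: " << e.what() << '\n';
        return false;
    }
}
```

With the library linked, the callable would presumably be a lambda that forwards to noLZSS::factorize_w_reference_file.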