Python API Reference
Core Functions
Core Python wrappers for noLZSS C++ functionality.
This module provides enhanced Python wrappers around the C++ factorization functions, adding input validation, error handling, and convenience features.
- noLZSS.core.factorize(data: str | bytes, validate: bool = True) → List[Tuple[int, int, int]]
Factorize a string or bytes object into LZ factors.
- Parameters:
data – Input string or bytes to factorize
validate – Whether to perform input validation (default: True)
- Returns:
List of (position, length, ref) tuples representing the factorization
- Raises:
ValueError – If input is invalid (empty, etc.)
TypeError – If input type is not supported
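To make the (position, length, ref) tuples concrete, here is a minimal pure-Python sketch of a non-overlapping LZ factorization. It is a quadratic-time illustration, not the noLZSS C++ algorithm. The function name `naive_nolz_factorize`, the tie-breaking rule, and the convention that a literal factor refers to its own position are assumptions made for this example, and the library's output (for instance around sentinel handling) may differ.

```python
from typing import List, Tuple


def naive_nolz_factorize(data: bytes) -> List[Tuple[int, int, int]]:
    """Quadratic-time reference sketch; NOT the noLZSS implementation."""
    factors: List[Tuple[int, int, int]] = []
    i, n = 0, len(data)
    while i < n:
        # Assumption: a factor with no earlier match refers to itself.
        best_len, best_ref = 0, i
        # Try every earlier start j; the copied source must end before i,
        # which is what makes the factorization non-overlapping.
        for j in range(i):
            k = 0
            while i + k < n and j + k < i and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_ref = k, j
        if best_len == 0:
            # No earlier occurrence: emit a single literal symbol.
            best_len = 1
        factors.append((i, best_len, best_ref))
        i += best_len
    return factors


print(naive_nolz_factorize(b"abab"))  # [(0, 1, 0), (1, 1, 1), (2, 2, 0)]
```

On `b"aaaa"` the sketch yields three factors rather than two, because the second factor may not read past its own start position.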
- noLZSS.core.factorize_file(filepath: str | Path, reserve_hint: int = 0) → List[Tuple[int, int, int]]
Factorize the contents of a file into LZ factors.
- Parameters:
filepath – Path to the input file
reserve_hint – Optional hint for reserving space in output vector (0 = no hint)
- Returns:
List of (position, length, ref) tuples representing the factorization
- Raises:
FileNotFoundError – If the file doesn’t exist
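Since the result is a list of (position, length, ref) tuples, a caller may want a quick sanity check that the factors cover the file contiguously. The helper below is a hypothetical sketch, not part of noLZSS. It assumes that `position` is the start offset of each factor in the input and that no sentinel symbol is counted in the lengths.

```python
from typing import Iterable, Tuple


def factors_tile_input(
    factors: Iterable[Tuple[int, int, int]], input_size: int
) -> bool:
    """Return True if the factors cover [0, input_size) with no gaps or overlaps."""
    expected_pos = 0
    for position, length, _ref in factors:
        # Each factor must start where the previous one ended and be non-empty.
        if position != expected_pos or length <= 0:
            return False
        expected_pos += length
    return expected_pos == input_size


print(factors_tile_input([(0, 1, 0), (1, 1, 1), (2, 2, 0)], 4))  # True
```

For a file, `input_size` could be taken from `os.path.getsize(filepath)`, again under the assumption that the library does not append extra symbols.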
- noLZSS.core.count_factors(data: str | bytes, validate: bool = True) → int
Count the number of factors in a string without computing the full factorization.
- Parameters:
data – Input string or bytes to analyze
validate – Whether to perform input validation (default: True)
- Returns:
Number of factors in the factorization
- Raises:
ValueError – If input is invalid
TypeError – If input type is not supported
- noLZSS.core.count_factors_file(filepath: str | Path, validate: bool = True) → int
Count the number of factors in a file without computing the full factorization.
- Parameters:
filepath – Path to the input file
validate – Whether to perform input validation (default: True)
- Returns:
Number of factors in the factorization
- Raises:
FileNotFoundError – If the file doesn’t exist
ValueError – If file contents are invalid
- noLZSS.core.write_factors_binary_file(data: str | bytes, output_filepath: str | Path) → None
Factorize input and write the factors to a binary file.
- Parameters:
data – Input string or bytes to factorize
output_filepath – Path of the binary file to write the factors to
- Raises:
ValueError – If input is invalid
TypeError – If input type is not supported
OSError – If unable to write to output file
- noLZSS.core.factorize_with_info(data: str | bytes, validate: bool = True) → dict
Factorize input and return both factors and additional information.
- Parameters:
data – Input string or bytes to factorize
validate – Whether to perform input validation (default: True)
- Returns:
Dictionary containing:
'factors': List of (position, length, ref) tuples
'alphabet_info': Alphabet analysis results
'input_size': Size of input data
'num_factors': Number of factors
- Return type:
dict
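The documented keys can be combined into simple summary statistics. The helper below is a hypothetical sketch that only consumes a dictionary of the documented shape. The sample dictionary is hand-written for illustration and is not real library output.

```python
from typing import Any, Dict


def summarize_info(info: Dict[str, Any]) -> Dict[str, float]:
    """Derive simple statistics from a factorize_with_info-style dictionary."""
    num_factors = info["num_factors"]
    input_size = info["input_size"]
    # Longest factor length; 0 if the factor list is empty.
    longest = max((length for _pos, length, _ref in info["factors"]), default=0)
    return {
        "avg_factor_length": input_size / num_factors if num_factors else 0.0,
        "longest_factor": longest,
    }


# Hand-built sample with the documented keys (not produced by noLZSS).
sample = {
    "factors": [(0, 1, 0), (1, 1, 1), (2, 2, 0)],
    "alphabet_info": {},
    "input_size": 4,
    "num_factors": 3,
}
print(summarize_info(sample))
```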
Utilities
Utility functions for input validation, alphabet analysis, file I/O helpers, and visualization.
This module provides reusable utilities for the noLZSS package, including input validation, sentinel handling, alphabet analysis, binary file I/O, and plotting functions.
- exception noLZSS.utils.NoLZSSError
Bases:
Exception
Base exception for noLZSS-related errors.
- exception noLZSS.utils.InvalidInputError
Bases:
NoLZSSError
Raised when input data is invalid for factorization.
- noLZSS.utils.validate_input(data: str | bytes) → bytes
Validate and normalize input data for factorization.
- Parameters:
data – Input string or bytes to validate
- Returns:
Normalized bytes data
- Raises:
InvalidInputError – If input is invalid
TypeError – If input type is not supported
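The validate-then-normalize contract described above can be sketched in a few lines. This is not the library's code. The stand-in exception classes mirror the documented hierarchy, and the UTF-8 encoding of strings and the rejection of empty input are assumptions, since the exact rules are not spelled out here.

```python
from typing import Union


class NoLZSSErrorSketch(Exception):
    """Local stand-in mirroring noLZSS.utils.NoLZSSError (hypothetical)."""


class InvalidInputErrorSketch(NoLZSSErrorSketch):
    """Local stand-in mirroring noLZSS.utils.InvalidInputError (hypothetical)."""


def validate_input_sketch(data: Union[str, bytes]) -> bytes:
    """Normalize str/bytes to bytes, rejecting unsupported or empty input."""
    if isinstance(data, str):
        # Assumption: strings are encoded as UTF-8; the library may differ.
        data = data.encode("utf-8")
    elif not isinstance(data, bytes):
        raise TypeError(f"expected str or bytes, got {type(data).__name__}")
    if not data:
        # Assumption: empty input counts as invalid.
        raise InvalidInputErrorSketch("input must not be empty")
    return data


try:
    validate_input_sketch("")
except NoLZSSErrorSketch as exc:
    # Catching the base class also catches the more specific subclass.
    print(type(exc).__name__, exc)
```

Because the invalid-input error derives from the base error, a single `except` clause on the base class handles both.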
- noLZSS.utils.analyze_alphabet(data: str | bytes) → Dict[str, Any]
Analyze the alphabet of input data.
- Parameters:
data – Input string or bytes to analyze
- Returns:
Dictionary containing the alphabet analysis:
'size': Number of unique characters/bytes
'characters': Set of unique characters/bytes
'distribution': Counter of character/byte frequencies
'entropy': Shannon entropy of the data
'most_common': List of (char, count) tuples for most frequent characters
- Return type:
Dict[str, Any]
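The listed keys can be reproduced with the standard library, which may help when interpreting the result. The sketch below is not the library's implementation. It assumes the entropy is measured in bits (logarithm base 2), and the `top_n` cutoff for the most common symbols is a hypothetical parameter.

```python
import math
from collections import Counter
from typing import Any, Dict, Union


def analyze_alphabet_sketch(data: Union[str, bytes], top_n: int = 10) -> Dict[str, Any]:
    """Compute alphabet statistics with the same keys as documented above."""
    # For bytes, the Counter keys are integers in the range 0..255.
    distribution = Counter(data)
    total = sum(distribution.values())
    # Shannon entropy H = sum over symbols of p * log2(1 / p), in bits.
    entropy = sum(
        (count / total) * math.log2(total / count)
        for count in distribution.values()
    )
    return {
        "size": len(distribution),
        "characters": set(distribution),
        "distribution": distribution,
        "entropy": entropy,
        "most_common": distribution.most_common(top_n),
    }


info = analyze_alphabet_sketch("aabb")
print(info["size"], info["entropy"])  # 2 1.0
```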
- noLZSS.utils.read_factors_binary_file(filepath: str | Path) → List[Tuple[int, int, int]]
Read factors from a binary file written by write_factors_binary_file.
- Parameters:
filepath – Path to the binary factors file
- Returns:
List of (position, length, ref) tuples
- Raises:
NoLZSSError – If file cannot be read or has invalid format
- noLZSS.utils.plot_factor_lengths(factors_or_file: List[Tuple[int, int, int]] | str | Path, save_path: str | Path | None = None, show_plot: bool = True) → None
Plot the cumulative factor lengths vs factor index.
Creates a scatter plot with the cumulative sum of factor lengths on the x-axis and the factor index (number of factors so far) on the y-axis.
- Parameters:
factors_or_file – Either a list of (position, length, ref) tuples or path to binary factors file
save_path – Optional path to save the plot image (e.g., ‘plot.png’)
show_plot – Whether to display the plot (default: True)
- Raises:
NoLZSSError – If binary file cannot be read
TypeError – If input type is invalid
ValueError – If no factors to plot
- Warns:
UserWarning – If matplotlib is not installed (function returns gracefully)
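The plotted coordinates can be derived from a factor list without matplotlib. The helper below is a hypothetical sketch of that computation. It assumes the factor index starts at 1, which the description above does not state explicitly.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def cumulative_length_points(
    factors: Sequence[Tuple[int, int, int]]
) -> List[Tuple[int, int]]:
    """Return (cumulative_length, factor_index) pairs, one per factor."""
    lengths = [length for _pos, length, _ref in factors]
    # Assumption: factor indices are 1-based.
    return [(cum, idx) for idx, cum in enumerate(accumulate(lengths), start=1)]


print(cumulative_length_points([(0, 1, 0), (1, 1, 1), (2, 2, 0)]))
# [(1, 1), (2, 2), (4, 3)]
```

A curve that climbs slowly along the y-axis indicates long factors, that is, highly repetitive input.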
Main Package
noLZSS: Non-overlapping Lempel-Ziv-Storer-Szymanski factorization.
A high-performance Python package with C++ core for computing non-overlapping LZ factorizations of strings and files.
Exception Classes
NoLZSSError
InvalidInputError
- class noLZSS.InvalidInputError
Bases:
NoLZSSError
Raised when input data is invalid for factorization.