Benchmarks
mzmlpy vs pymzml
Comparison of mzmlpy and pymzml 2.6.0 across common mzML parsing operations.
Both libraries use default settings: pymzml with build_index_from_scratch=False and skip_chromatogram=True; mzmlpy with default constructor arguments.
A warmup phase primes both libraries before timing so that one-time costs (OBO ontology loading, OS page cache) don't skew results.
Methodology
Each benchmark is run 10 times and summarized as mean ± standard deviation; the tables below list only the mean. Both libraries are verified to produce identical m/z and intensity arrays before timing begins.
- Startup: open the file and build the spectrum index.
- Iterate (no decode): iterate all spectra, accessing only id and ms_level.
- Iterate (decode): iterate all spectra, decoding m/z and intensity arrays.
- Metadata: iterate all spectra, accessing scan time, TIC, and precursor info.
- Random access: seek to 10 random spectra by index/ID and decode arrays.
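The warmup-then-repeat procedure described above can be sketched with the standard library alone. This is a minimal illustration, not the code in benchmarks/bench_vs_pymzml.py: the helper name `time_benchmark` and its parameters are hypothetical, and the workload is a stand-in for opening and iterating an mzML file.

```python
import statistics
import time
from typing import Callable


def time_benchmark(
    fn: Callable[[], object], repeats: int = 10, warmup: int = 1
) -> tuple[float, float]:
    """Return (mean, standard deviation) of wall-clock seconds over `repeats` runs.

    Hypothetical helper for illustration; not part of mzmlpy or pymzml.
    """
    # Untimed warmup runs absorb one-time costs such as ontology loading
    # and populating the OS page cache.
    for _ in range(warmup):
        fn()

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)

    # The sample standard deviation needs at least two data points.
    stdev = statistics.stdev(timings) if len(timings) > 1 else 0.0
    return statistics.mean(timings), stdev


if __name__ == "__main__":
    # Stand-in workload; a real benchmark would parse an mzML file here.
    mean, stdev = time_benchmark(lambda: sum(range(100_000)), repeats=10)
    print(f"{mean:.4f}s ± {stdev:.4f}s")
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock, which matters when individual runs take only a few milliseconds as in the tables below.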
zlib-compressed DDA file (10 spectra, 527 KB)
| Benchmark | mzmlpy | pymzml | Ratio |
|---|---|---|---|
| Startup | 0.0017s | 0.0021s | 1.2x faster |
| Iterate (no decode) | 0.0059s | 0.0074s | 1.3x faster |
| Iterate (decode) | 0.0071s | 0.0087s | 1.2x faster |
| Metadata | 0.0069s | 0.0077s | 1.1x faster |
| Random access | 0.0047s | 0.0053s | 1.1x faster |
Bruker timsTOF file with ion mobility (10 spectra, 6.7 MB)
| Benchmark | mzmlpy | pymzml | Ratio |
|---|---|---|---|
| Startup | 0.012s | 0.092s | 8.0x faster |
| Iterate (no decode) | 0.040s | 0.221s | 5.5x faster |
| Iterate (decode) | 0.039s | 0.228s | 5.8x faster |
| Metadata | 0.042s | 0.226s | 5.4x faster |
| Random access | 0.012s | 0.110s | 9.2x faster |
The gap widens on the larger, more complex file. The Bruker file contains ion mobility data and richer XML metadata, and there mzmlpy is roughly 5x to 9x faster than pymzml across the five benchmarks.
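To make the Ratio column concrete, the sketch below assumes it is the pymzml mean divided by the mzmlpy mean, formatted to one decimal place. The function name `speedup` is hypothetical and is not taken from the benchmark script.

```python
def speedup(mzmlpy_seconds: float, pymzml_seconds: float) -> str:
    """Format how many times faster mzmlpy is than pymzml.

    Assumes the ratio is pymzml time divided by mzmlpy time.
    """
    ratio = pymzml_seconds / mzmlpy_seconds
    return f"{ratio:.1f}x faster"


# Rounded means from the Bruker timsTOF table.
print(speedup(0.040, 0.221))  # Iterate (no decode)
print(speedup(0.012, 0.110))  # Random access
```

Ratios recomputed from the rounded means can differ slightly from the published column. For example, Startup on the Bruker file gives 7.7x from the rounded values against 8.0x in the table, presumably because the table ratios were computed from unrounded timings.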
Running the benchmark
pip install pymzml # required dependency for comparison
# Default file
uv run python benchmarks/bench_vs_pymzml.py
# Custom file with more repeats
uv run python benchmarks/bench_vs_pymzml.py --file path/to/file.mzML --repeats 10
See benchmarks/bench_vs_pymzml.py for the full source.
Gzip mode comparison
For .mzML.gz files, the gzip_mode parameter controls how the compressed file is accessed.
Benchmarked on a 33,535-spectrum DDA file (cold start, with rapidgzip):
| Mode | Startup | Iterate (500 spectra) | Random access (5 reads) |
|---|---|---|---|
| plain .mzML | 0.042s | 0.087s | 0.001s |
| in_memory=True | 1.499s | 0.362s | 0.002s |
| gzip_mode="extract" | 0.957s | 0.083s | 0.001s |
| gzip_mode="indexed" | 6.850s | 0.135s | 0.074s |
| gzip_mode="stream" | 0.089s | 0.155s | 22.8s |
"extract" pays a one-time decompression cost then matches plain .mzML speed.
"indexed" startup includes building the gzip seek index and mzML offset index on first open — both are cached alongside the file, so subsequent opens are fast.
"stream" is sequential-only — random access requires re-scanning from the start.
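The trade-off between "stream" and "extract" can be illustrated with Python's standard gzip module. This is a sketch of the general technique under stated assumptions, not mzmlpy's implementation: the function names are hypothetical, and the "indexed" mode (which relies on a cached seek index) is not shown.

```python
import gzip
import os
import shutil
import tempfile


def read_at_stream(gz_path: str, offset: int, size: int) -> bytes:
    """Sequential-only access: every read decompresses from the file start."""
    with gzip.open(gz_path, "rb") as fh:
        # GzipFile emulates seek() by decompressing and discarding bytes,
        # so the cost grows with the offset.
        fh.seek(offset)
        return fh.read(size)


def extract_once(gz_path: str) -> str:
    """Pay the decompression cost once; return the path of a plain temp file."""
    fd, plain_path = tempfile.mkstemp(suffix=".mzML")
    with os.fdopen(fd, "wb") as dst, gzip.open(gz_path, "rb") as src:
        shutil.copyfileobj(src, dst)
    return plain_path


def read_at_plain(plain_path: str, offset: int, size: int) -> bytes:
    """True random access: the OS seeks directly to the requested offset."""
    with open(plain_path, "rb") as fh:
        fh.seek(offset)
        return fh.read(size)
```

Both read functions return the same bytes for the same uncompressed offset. The difference is cost: the streamed read repeats the decompression work on each call, while the extracted copy pays it once up front. That is consistent with the pattern in the table, where "extract" has a slower startup but fast random access and "stream" shows the reverse.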
See benchmarks/bench_gzip_modes.py for the full source.