Mzml Reader

mzmlpy.run.Mzml

Mzml(
    file: str | Path | Any,
    build_index_from_scratch: bool = False,
    gzip_mode: Literal[
        "extract", "indexed", "stream"
    ] = "extract",
    in_memory: bool = True,
    extract_dir: str | Path | None = None,
    spectrum_id_regex: str | None = None,
    chromatogram_id_regex: str | None = None,
)

Reader for mzML files.

Data is loaded lazily: only the sections of the XML file that are requested get parsed, and the data and properties of each object are parsed only when accessed. Use Mzml as a context manager to ensure proper file handling. The spectra and chromatograms properties return lookup objects that support iteration, indexing, and ID-based access.
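A minimal usage sketch of the context-manager pattern described above. It assumes the class is importable as mzmlpy.run.Mzml (the path shown at the top of this page); the helper name count_spectra is hypothetical and not part of the library.

```python
from __future__ import annotations

from pathlib import Path


def count_spectra(path: str | Path) -> int:
    """Count the spectra in an mzML file by iterating the spectra lookup.

    Hypothetical helper for illustration only; it is not part of mzmlpy.
    """
    # Imported lazily so this sketch can be loaded without mzmlpy installed.
    from mzmlpy.run import Mzml

    # The context manager takes care of the underlying file handle.
    with Mzml(path) as reader:
        # The lookup object supports iteration; per this page it also
        # supports indexing and ID-based access (reader.spectra[...]).
        return sum(1 for _ in reader.spectra)
```

Counting inside the with block keeps all access to the reader within the lifetime of the open file, which seems the safest choice for lazily loaded objects.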

Parameters:

file (str | Path | Any), required
    Path to the mzML file (str or Path) or a file-like object.

build_index_from_scratch (bool), default: False
    Build the index from scratch instead of using an existing index.

gzip_mode (Literal['extract', 'indexed', 'stream']), default: 'extract'
    Strategy for reading gzip-compressed (.mzML.gz) files:

      • "extract" (default): Decompress to a temporary file on disk, then use standard random-access reading.
      • "indexed": Use the rapidgzip library for seekable access to the compressed file without extracting to disk. Requires pip install mzmlpy[rapidgzip].
      • "stream": Stream the file sequentially without building an index. Individual spectrum access re-scans the file from the beginning each time.

in_memory (bool), default: True
    Load the entire file into memory for faster access.

extract_dir (str | Path | None), default: None
    Directory to store extracted .mzML files when using gzip_mode='extract'. If None (default), a system temp directory is used (<tmpdir>/mzmlpy/). Set this to a custom path to manage extracted files yourself, which is useful for batch processing where you want to extract all files to one directory and clean up afterward.

spectrum_id_regex (str | None), default: None
    Optional regex applied to spectrum IDs to create a secondary lookup key. The first capture group (or full match if no groups) becomes the simplified key. For example, r"scan=(\d+)" lets you look up spectra by scan number (reader.spectra["19"]) instead of the full native ID ("scan=19").

chromatogram_id_regex (str | None), default: None
    Optional regex applied to chromatogram IDs to create a secondary lookup key. Works identically to spectrum_id_regex but for chromatograms.
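The rule stated for spectrum_id_regex and chromatogram_id_regex (first capture group, or the full match if the pattern has no groups) can be sketched with the standard re module. This is an illustration of the described behavior, not the library's actual code: the function names simplified_key and build_secondary_index are hypothetical, and the use of re.search (rather than re.match or re.fullmatch) is an assumption.

```python
from __future__ import annotations

import re
from collections.abc import Iterable


def simplified_key(pattern: str, native_id: str) -> str | None:
    """Derive a secondary lookup key from a native ID, or None if no match."""
    # Assumption: the pattern may match anywhere in the ID (re.search).
    match = re.search(pattern, native_id)
    if match is None:
        return None
    # First capture group if the pattern defines one, else the full match.
    return match.group(1) if match.re.groups else match.group(0)


def build_secondary_index(pattern: str, native_ids: Iterable[str]) -> dict[str, str]:
    """Map each simplified key back to the full native ID it came from."""
    index: dict[str, str] = {}
    for native_id in native_ids:
        key = simplified_key(pattern, native_id)
        if key is not None:
            index[key] = native_id
    return index


index = build_secondary_index(r"scan=(\d+)", ["scan=18", "scan=19"])
print(index["19"])  # prints "scan=19"
```

If two native IDs produce the same simplified key, this sketch keeps the last one; how the library resolves such collisions is not stated on this page.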

Initialize Mzml and parse metadata.

Source code in src/mzmlpy/run.py
def __init__(
    self,
    file: str | Path | Any,
    build_index_from_scratch: bool = False,
    gzip_mode: Literal["extract", "indexed", "stream"] = "extract",
    in_memory: bool = True,
    extract_dir: str | Path | None = None,
    spectrum_id_regex: str | None = None,
    chromatogram_id_regex: str | None = None,
) -> None:
    """Initialize Mzml and parse metadata."""
    self._spectrum_id_regex = spectrum_id_regex
    self._chromatogram_id_regex = chromatogram_id_regex
    self._path: Path | None = None
    file_interface_arg: Any

    if isinstance(file, str | Path):
        self._path = Path(file)
        # Use string representation for internal helpers that expect paths
        path_str = str(self._path)
        self._encoding = _determine_file_encoding(path_str)
        file_interface_arg = path_str
    else:
        # File-like object
        if hasattr(file, "name"):
            self._path = Path(file.name)
        self._encoding = _guess_encoding(file)
        file_interface_arg = file

    # Open file
    self._file_object: FileInterface = FileInterface(
        path=file_interface_arg,
        encoding=self._encoding,
        build_index_from_scratch=build_index_from_scratch,
        gzip_mode=gzip_mode,
        in_memory=in_memory,
        extract_dir=str(extract_dir) if extract_dir is not None else None,
    )

    # Parse metadata
    self._root, self.iter, builder = self._parse_metadata()
    # Extract parsed content
    self._content: _MzMLContent = builder.build()
    self.obo_version = builder.obo_version

file_path property

file_path: Path | None

Access the file path as a Path object if available.

file_name property

file_name: str

Access the file name as a string.

spectra property

spectra: SpectrumLookup

Access spectra lookup.

chromatograms property

chromatograms: ChromatogramLookup

Access chromatograms lookup.

TIC property

TIC: Chromatogram | None

Access the Total Ion Chromatogram (TIC).

id property

id: str

Access mzML id.

version property

version: str

Access mzML version.

cvs property

cvs: dict[str, CVElement]

Access controlled vocabularies.

file_description property

file_description: FileDescription | None

Access file description.

referenceable_param_groups property

referenceable_param_groups: dict[
    str, ReferenceableParamGroup
]

Access referenceable parameter groups.

softwares property

softwares: dict[str, Software]

Access software list.

instrument_configurations property

instrument_configurations: dict[
    str, InstrumentConfiguration
]

Access instrument configurations.

data_processes property

data_processes: dict[str, DataProcessing]

Access data processing steps.

samples property

samples: dict[str, Sample]

Access sample list.

scan_settings property

scan_settings: dict[str, ScanSetting]

Access scan settings.

run property

run: Run | None

Access run information.