Mzml Reader

mzmlpy.run.Mzml

Mzml(
    file: str | Path | Any,
    build_index_from_scratch: bool = False,
    gzip_mode: Literal[
        "extract", "indexed", "stream"
    ] = "extract",
    in_memory: bool = True,
    extract_dir: str | Path | None = None,
    spectrum_id_regex: str | None = None,
    chromatogram_id_regex: str | None = None,
)

Reader for mzML files.

Data is loaded lazily: only the sections of the XML file that are needed are parsed, and the data and properties of each object are parsed when first accessed. Use the reader as a context manager to ensure proper file handling. The `spectra` and `chromatograms` properties return lookup objects that support iteration, indexing, and ID-based access.
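
The access patterns described above can be sketched as follows. This is a minimal sketch, not taken from the mzmlpy sources: `describe_spectra` and `main` are hypothetical helpers, and the code assumes only what this page states (context-manager support, plus integer indexing and iteration on `spectra`).

```python
from typing import Any, Tuple


def describe_spectra(reader: Any) -> Tuple[Any, int]:
    """Return the first spectrum and the number of spectra found by iteration.

    Hypothetical helper: relies only on `reader.spectra` supporting
    integer indexing and iteration.
    """
    first = reader.spectra[0]               # positional access
    count = sum(1 for _ in reader.spectra)  # sequential iteration
    return first, count


def main(path: str) -> None:
    # Imported here so the helper above can be loaded without mzmlpy installed.
    from mzmlpy.run import Mzml

    # The context manager takes care of file handling.
    with Mzml(path) as reader:
        first, count = describe_spectra(reader)
        print(f"{reader.file_name}: {count} spectra; first = {first!r}")
```

Calling `main` with the path to an mzML file would print the file name, the spectrum count, and the first spectrum object. Whether the lookup object can be iterated more than once cheaply is not stated on this page, so treat the counting step as illustrative.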

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `file` | `str \| Path \| Any` | Path to the mzML file (str or Path) or a file-like object. | required |
| `build_index_from_scratch` | `bool` | Build the index from scratch instead of using an existing index. | `False` |
| `gzip_mode` | `Literal['extract', 'indexed', 'stream']` | Strategy for reading gzip-compressed (`.mzML.gz`) files. `"extract"` (default): decompress to a temporary file on disk, then use standard random-access reading. `"indexed"`: use the `rapidgzip` library for seekable access to the compressed file without extracting to disk; requires `pip install mzmlpy[rapidgzip]`. `"stream"`: stream the file sequentially without building an index; individual spectrum access re-scans the file from the beginning each time. | `'extract'` |
| `in_memory` | `bool` | Load the entire file into memory for faster access. | `True` |
| `extract_dir` | `str \| Path \| None` | Directory to store extracted `.mzML` files when using `gzip_mode='extract'`. If `None` (default), a system temp directory is used (`<tmpdir>/mzmlpy/`). Set this to a custom path to manage extracted files yourself, which is useful for batch processing where you want to extract all files to one directory and clean up afterward. | `None` |
| `spectrum_id_regex` | `str \| None` | Optional regex applied to spectrum IDs to create a secondary lookup key. The first capture group (or the full match if there are no groups) becomes the simplified key. For example, `r"scan=(\d+)"` lets you look up spectra by scan number (`reader.spectra["19"]`) instead of the full native ID (`"scan=19"`). | `None` |
| `chromatogram_id_regex` | `str \| None` | Optional regex applied to chromatogram IDs to create a secondary lookup key. Works identically to `spectrum_id_regex` but for chromatograms. | `None` |
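
The key-derivation rule for `spectrum_id_regex` and `chromatogram_id_regex` (first capture group, or the full match when the pattern has no groups) can be illustrated with the standard `re` module. This is a sketch of the rule as described, not the library's implementation: `simplified_key` is a hypothetical name, and the use of `re.search` (rather than `re.match` or `re.fullmatch`) is an assumption.

```python
import re
from typing import Optional


def simplified_key(native_id: str, pattern: str) -> Optional[str]:
    """Derive a secondary lookup key from a native ID.

    Assumption: the pattern is applied with re.search; the library
    may apply it differently.
    """
    match = re.search(pattern, native_id)
    if match is None:
        return None  # the pattern does not apply to this ID
    # First capture group if the pattern defines one, else the full match.
    return match.group(1) if match.re.groups >= 1 else match.group(0)


print(simplified_key("scan=19", r"scan=(\d+)"))  # 19
print(simplified_key("scan=19", r"\d+"))         # 19
```

Testing a pattern this way before passing it to the reader is a cheap check that it produces the keys you expect for your vendor's native ID format.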

Initialize Mzml and parse metadata.
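
Constructor arguments such as `gzip_mode` can be chosen at runtime. The sketch below falls back to `"extract"` when the optional dependency is missing. `pick_gzip_mode` and `open_reader` are hypothetical helpers, and the check assumes that the `mzmlpy[rapidgzip]` extra installs a module importable as `rapidgzip`.

```python
import importlib.util
from typing import Literal


def pick_gzip_mode(prefer_indexed: bool = True) -> Literal["extract", "indexed"]:
    """Return "indexed" when rapidgzip is importable, otherwise "extract".

    Assumption: the optional extra provides a top-level module
    named `rapidgzip`.
    """
    if prefer_indexed and importlib.util.find_spec("rapidgzip") is not None:
        return "indexed"
    return "extract"


def open_reader(path: str):
    # Imported here so pick_gzip_mode can be used without mzmlpy installed.
    from mzmlpy.run import Mzml

    return Mzml(path, gzip_mode=pick_gzip_mode())
```

The `"stream"` mode is left out of the fallback on purpose, since this page notes that it re-scans the file for each individual spectrum access.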

Source code in src/mzmlpy/run.py

def __init__(
    self,
    file: str | Path | Any,
    build_index_from_scratch: bool = False,
    gzip_mode: Literal["extract", "indexed", "stream"] = "extract",
    in_memory: bool = True,
    extract_dir: str | Path | None = None,
    spectrum_id_regex: str | None = None,
    chromatogram_id_regex: str | None = None,
) -> None:
    """Initialize Mzml and parse metadata."""
    self._spectrum_id_regex = spectrum_id_regex
    self._chromatogram_id_regex = chromatogram_id_regex
    self._path: Path | None = None
    file_interface_arg: Any

    if isinstance(file, str | Path):
        self._path = Path(file)
        # Use string representation for internal helpers that expect paths
        path_str = str(self._path)
        self._encoding = _determine_file_encoding(path_str)
        file_interface_arg = path_str
    else:
        # File-like object
        if hasattr(file, "name"):
            self._path = Path(file.name)
        self._encoding = _guess_encoding(file)
        file_interface_arg = file

    # Open file
    self._file_object: FileInterface = FileInterface(
        path=file_interface_arg,
        encoding=self._encoding,
        build_index_from_scratch=build_index_from_scratch,
        gzip_mode=gzip_mode,
        in_memory=in_memory,
        extract_dir=str(extract_dir) if extract_dir is not None else None,
    )

    # Parse metadata
    self._root, self.iter, builder = self._parse_metadata()
    # Extract parsed content
    self._content: _MzMLContent = builder.build()
    self.obo_version = builder.obo_version

file_path `property`

file_path: Path | None

Access the file path as a Path object if available.
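
The "if available" qualifier follows from the constructor source shown above: a path is recorded for `str` and `Path` inputs, and for file-like objects only when they expose a `name` attribute. The standalone sketch below mirrors that branch. `resolve_path` is a hypothetical function, and it assumes `file_path` returns the `_path` value set in the constructor.

```python
import io
from pathlib import Path
from typing import Any, Optional


def resolve_path(file: Any) -> Optional[Path]:
    """Mirror the path handling of the constructor shown above."""
    if isinstance(file, (str, Path)):
        return Path(file)
    # File-like object: a path is known only if it carries a name.
    if hasattr(file, "name"):
        return Path(file.name)
    return None


print(resolve_path("run01.mzML"))     # run01.mzML
print(resolve_path(io.BytesIO(b"")))  # None
```

In-memory buffers such as `io.BytesIO` have no `name`, so code that consumes `file_path` should handle `None`.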

file_name `property`

file_name: str

Access the file name as a string.

spectra `property`

spectra: SpectrumLookup

Access spectra lookup.

chromatograms `property`

chromatograms: ChromatogramLookup

Access chromatograms lookup.

TIC `property`

TIC: Chromatogram | None

Access the Total Ion Chromatogram (TIC).

id `property`

id: str

Access mzML id.

version `property`

version: str

Access mzML version.

cvs `property`

cvs: dict[str, CVElement]

Access controlled vocabularies.

file_description `property`

file_description: FileDescription | None

Access file description.

referenceable_param_groups `property`

referenceable_param_groups: dict[
    str, ReferenceableParamGroup
]

Access referenceable parameter groups.

softwares `property`

softwares: dict[str, Software]

Access software list.

instrument_configurations `property`

instrument_configurations: dict[
    str, InstrumentConfiguration
]

Access instrument configurations.

data_processes `property`

data_processes: dict[str, DataProcessing]

Access data processing steps.

samples `property`

samples: dict[str, Sample]

Access sample list.

scan_settings `property`

scan_settings: dict[str, ScanSetting]

Access scan settings.

run `property`

run: Run | None

Access run information.
