Spectrum and MsnSpectrum
SpectrumType
class SpectrumType(StrEnum):
    CENTROID = "centroid"
    PROFILE = "profile"
    DECONVOLUTED = "deconvoluted"
SpectrumType tags what stage the data is in. Several methods check or set this flag to prevent out-of-order operations (e.g., calling .decharge() on a non-deconvoluted spectrum raises ValueError).
Peak
@dataclass(frozen=True, slots=True)
class Peak:
    mz: float
    intensity: float
    charge: int | None = None
    im: float | None = None
    score: float | None = None
A frozen dataclass representing a single detected peak. charge, im, and score are optional. Peak objects are returned by .peaks, .top_peaks(), .get_peak(), and .get_peaks(). They are immutable copies of the peak data, not references into the underlying arrays, so they cannot be used to modify the spectrum.
score holds the isotopic profile score (0–1) assigned during deconvolution, or None for peaks that have not been through deconvolution.
>>> peak = Peak(mz=500.1, intensity=1e5, charge=2)
>>> repr(peak)
'Peak(mz=500.1000, int=1.00e+05, z=2)'
Charge conventions
| charge value | Meaning |
|---|---|
| > 0 | Peak belongs to an assigned isotope cluster with that charge state |
| -1 | Singleton: no isotope neighbours found at any tested charge |
| 0 | After .decharge(): neutral mass, charge state no longer tracked |
Spectrum
@dataclass(slots=True)
class Spectrum:
    mz: NDArray[np.float64]
    intensity: NDArray[np.float64]
    charge: NDArray[np.int32] | None = None
    im: NDArray[np.float64] | None = None
    score: NDArray[np.float64] | None = None
    spectrum_type: SpectrumType | str | None = None
    denoised: str | None = None
    normalized: str | None = None
The central data structure. mz and intensity must have the same length. charge, im, and score must also match that length when provided.
Fields:
| Field | Type | Description |
|---|---|---|
| mz | NDArray[np.float64] | Peak m/z values, sorted ascending |
| intensity | NDArray[np.float64] | Parallel peak intensities |
| charge | NDArray[np.int32] \| None | Charge state per peak. None before deconvolution |
| im | NDArray[np.float64] \| None | Ion mobility per peak. None if not acquired |
| score | NDArray[np.float64] \| None | Per-peak isotopic profile score (0–1). Populated after deconvolute(); None otherwise. Singletons have score=0.0. |
| spectrum_type | SpectrumType \| str \| None | Stage tag: CENTROID, PROFILE, or DECONVOLUTED |
| denoised | str \| None | Name of the denoising method applied, or None |
| normalized | str \| None | Name of the normalization method applied, or None |
Validation rules enforced in __post_init__:
- len(charge) == len(mz) when charge is not None
- len(im) == len(mz) when im is not None
- len(score) == len(mz) when score is not None
- A charge array may only be present when spectrum_type == DECONVOLUTED
import numpy as np
from spxtacular import Spectrum
spec = Spectrum(
mz=np.array([500.1, 800.2, 1200.5], dtype=np.float64),
intensity=np.array([1e5, 2e5, 9e4], dtype=np.float64),
)
print(spec)
# Spectrum(n_peaks=3, type=None, denoised=None, normalized=None)
Peak access
peaks property
Returns all peaks as a list of Peak objects. Iterates the full spectrum; prefer numpy operations on .mz / .intensity for performance on large spectra.
top_peaks
def top_peaks(
self,
n: int,
by: Literal["intensity", "mz", "charge", "im"] = "intensity",
reverse: bool = True,
) -> list[Peak]
Returns the top n peaks sorted by the chosen attribute.
| Parameter | Description |
|---|---|
| n | Number of peaks to return |
| by | Sort key: "intensity" (default), "mz", "charge", "im" |
| reverse | True (default) returns highest values first |
"charge" requires a charge array to be present; "im" requires an ion mobility array. Both raise ValueError otherwise.
# Five most intense peaks
top5 = spec.top_peaks(5)
# Lowest-mz three peaks
low_mz = spec.top_peaks(3, by="mz", reverse=False)
Peak finding
has_peak
def has_peak(
self,
target_mz: float,
mz_tol: float = 0.01,
mz_tol_type: Literal["Da", "ppm"] = "Da",
target_charge: int | None = None,
target_im: float | None = None,
im_tol: float = 0.01,
) -> bool
Returns True if at least one peak matches all supplied criteria.
spec.has_peak(500.1, mz_tol=0.02)
spec.has_peak(500.1, mz_tol=10, mz_tol_type="ppm", target_charge=2)
get_peak
def get_peak(
self,
target_mz: float,
mz_tol: float = 0.01,
mz_tol_type: Literal["Da", "ppm"] = "Da",
target_charge: int | None = None,
target_im: float | None = None,
im_tol: float = 0.01,
collision: Literal["largest", "closest"] = "largest",
) -> Peak | None
Returns a single matching peak, or None if no match is found. When multiple peaks fall within tolerance, collision="largest" picks the most intense; collision="closest" picks the nearest in m/z.
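As a rough sketch of how the tolerance window and the collision rule interact, the pure-Python function below mimics the m/z part of the lookup. The names tolerance_da and pick_peak are hypothetical, and taking ppm relative to target_mz is an assumption; the library may compute it differently.

```python
def tolerance_da(target_mz, mz_tol, mz_tol_type):
    """Convert a tolerance to Da. Assumption: ppm is relative to target_mz."""
    if mz_tol_type == "ppm":
        return target_mz * mz_tol / 1e6
    return mz_tol


def pick_peak(mz, intensity, target_mz, mz_tol=0.01, mz_tol_type="Da",
              collision="largest"):
    """Return (mz, intensity) of the chosen match, or None if nothing matches."""
    tol = tolerance_da(target_mz, mz_tol, mz_tol_type)
    hits = [i for i, m in enumerate(mz) if abs(m - target_mz) <= tol]
    if not hits:
        return None
    if collision == "largest":
        best = max(hits, key=lambda i: intensity[i])  # most intense candidate
    else:  # "closest"
        best = min(hits, key=lambda i: abs(mz[i] - target_mz))  # nearest in m/z
    return mz[best], intensity[best]


mz = [500.095, 500.101, 800.2]
intensity = [3e5, 1e5, 2e5]
print(pick_peak(mz, intensity, 500.1))                       # picks the most intense hit
print(pick_peak(mz, intensity, 500.1, collision="closest"))  # picks the nearest hit
```

With two candidates inside the 0.01 Da window, the two collision modes return different peaks, which is the case the parameter exists for.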
get_peaks
def get_peaks(
self,
target_mz: float,
mz_tol: float = 0.01,
mz_tol_type: Literal["Da", "ppm"] = "Da",
target_charge: int | None = None,
target_im: float | None = None,
im_tol: float = 0.01,
) -> list[Peak]
Returns all peaks matching the criteria (may be empty).
Filtering and processing
All processing methods accept inplace: bool = False. When inplace=False (the default) a new Spectrum is returned, leaving the original unchanged and allowing method chaining.
filter
def filter(
self,
min_mz: float | None = None,
max_mz: float | None = None,
min_intensity: float | None = None,
max_intensity: float | None = None,
min_charge: int | None = None,
max_charge: int | None = None,
min_im: float | None = None,
max_im: float | None = None,
top_n: int | None = None,
inplace: bool = False,
) -> Self
Removes peaks outside the given bounds. All parameters are optional and combinable. top_n is applied last — after all range filters — keeping the top_n most intense survivors.
Charge, ion mobility, and score filters are silently ignored if the spectrum lacks those arrays.
Score filter parameters (not shown in the signature above):
| Parameter | Type | Description |
|---|---|---|
| min_score | float \| None | Keep peaks with score >= this value. Only effective when score array is present. |
| max_score | float \| None | Keep peaks with score <= this value. Only effective when score array is present. |
# Keep peaks between 200 and 1500 Da with intensity >= 1000
filtered = spec.filter(min_mz=200, max_mz=1500, min_intensity=1000)
# Keep only the 50 most intense peaks after m/z filtering
filtered = spec.filter(min_mz=200, top_n=50)
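The ordering described above (range filters first, top_n last) can be illustrated with a small pure-Python sketch. filter_peaks is a hypothetical helper that covers only a few of the parameters, and treating the m/z bounds as inclusive is an assumption.

```python
def filter_peaks(mz, intensity, min_mz=None, max_mz=None,
                 min_intensity=None, top_n=None):
    """Apply range filters, then keep the top_n most intense survivors."""
    keep = [
        i for i in range(len(mz))
        if (min_mz is None or mz[i] >= min_mz)
        and (max_mz is None or mz[i] <= max_mz)
        and (min_intensity is None or intensity[i] >= min_intensity)
    ]
    if top_n is not None:
        # top_n sees only peaks that survived the range filters.
        keep = sorted(keep, key=lambda i: intensity[i], reverse=True)[:top_n]
        keep.sort()  # restore the original (ascending m/z) order
    return [mz[i] for i in keep], [intensity[i] for i in keep]


mz = [100.0, 300.0, 600.0, 900.0, 1600.0]
intensity = [9e5, 500.0, 2e4, 8e4, 7e5]
print(filter_peaks(mz, intensity, min_mz=200, top_n=2))
```

The peak at m/z 100 is the most intense overall, yet it never reaches the top_n step because the min_mz filter removes it first.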
normalize
def normalize(
self,
method: Literal["max", "tic", "median"] = "max",
inplace: bool = False,
) -> Self
Scales all intensities so that the chosen reference equals 1.0.
| method | Normalization factor |
|---|---|
| "max" (default) | Most intense peak |
| "tic" | Total ion current (sum of all intensities) |
| "median" | Median intensity |
Calling normalize on an already-normalized spectrum emits a UserWarning and returns self unchanged.
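The three reference values amount to a single division per peak. A minimal sketch on a plain list follows; normalize_intensities is a hypothetical name and the code is not the library's implementation.

```python
from statistics import median


def normalize_intensities(intensity, method="max"):
    """Scale intensities so that the chosen reference value becomes 1.0."""
    if method == "max":
        ref = max(intensity)      # most intense peak
    elif method == "tic":
        ref = sum(intensity)      # total ion current
    elif method == "median":
        ref = median(intensity)   # median intensity
    else:
        raise ValueError(f"unknown method: {method!r}")
    return [x / ref for x in intensity]


values = [1e5, 2e5, 9e4]
print(normalize_intensities(values))          # largest value becomes 1.0
print(normalize_intensities(values, "tic"))   # values sum to 1.0
```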
denoise
def denoise(
self,
method: Literal["mad", "percentile", "histogram", "baseline", "iterative_median"]
| float | int = "mad",
inplace: bool = False,
) -> Self
Removes peaks below an estimated noise threshold. Peaks at or above the threshold are kept.
| method | Threshold strategy |
|---|---|
| "mad" (default) | median + 3 × 1.4826 × MAD |
| "percentile" | 5th percentile of intensities |
| "histogram" | Mode of 100-bin histogram + 3 σ (FWHM-derived) |
| "baseline" | Mean + 3 σ of the bottom 25th percentile |
| "iterative_median" | Iteratively refines median/MAD estimate over 3 passes |
| float or int | Used directly as the absolute threshold |
Calling denoise on an already-denoised spectrum emits a UserWarning and returns self unchanged.
spec.denoise() # MAD (robust, recommended for most spectra)
spec.denoise("histogram") # histogram mode estimate
spec.denoise(5000.0) # fixed absolute threshold
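For reference, the default "mad" threshold from the table can be computed by hand. The sketch below follows the stated formula (median + 3 × 1.4826 × MAD) on a plain list; mad_threshold is a hypothetical helper name.

```python
from statistics import median


def mad_threshold(intensity):
    """Noise threshold: median + 3 * 1.4826 * MAD."""
    med = median(intensity)
    # MAD = median absolute deviation from the median.
    mad = median([abs(x - med) for x in intensity])
    return med + 3 * 1.4826 * mad


intensity = [10.0, 12.0, 11.0, 9.0, 13.0, 500.0, 800.0]
threshold = mad_threshold(intensity)
kept = [x for x in intensity if x >= threshold]  # peaks at or above the threshold stay
print(threshold, kept)
```

The factor 1.4826 rescales the MAD so that it estimates the standard deviation for normally distributed noise, which makes the threshold roughly "median + 3 σ" while staying robust to a few very intense peaks.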
centroid
Converts a profile-mode spectrum to centroid mode using vectorized Gaussian fitting. Detects local maxima, fits a Gaussian to each triplet of points, and returns sub-bin peak positions. Ion mobility data is preserved at the apex value.
Calling this on an already-centroided spectrum emits a UserWarning and returns self unchanged.
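One common way to fit a Gaussian to a triplet of points is three-point interpolation on log intensities, where a Gaussian becomes a parabola. The sketch below shows that technique for a single triplet; whether the library uses exactly this formula is an assumption, and gaussian_apex is a hypothetical name.

```python
import math


def gaussian_apex(x, y):
    """Estimate apex position and height from three equally spaced samples.

    x: three ascending positions; y: three positive intensities, with y[1]
    a strict local maximum.
    """
    l0, l1, l2 = (math.log(v) for v in y)
    denom = l0 - 2 * l1 + l2           # negative at a local maximum
    delta = 0.5 * (l0 - l2) / denom    # apex offset in units of the bin width
    step = x[1] - x[0]
    apex_mz = x[1] + delta * step
    apex_int = math.exp(l1 - 0.125 * (l0 - l2) ** 2 / denom)
    return apex_mz, apex_int


# Sample a known Gaussian (centre 500.003, sigma 0.01, height 1000) on a 0.01 grid.
xs = [499.99, 500.00, 500.01]
ys = [1000 * math.exp(-((v - 500.003) ** 2) / (2 * 0.01 ** 2)) for v in xs]
print(gaussian_apex(xs, ys))
```

For noise-free samples of a true Gaussian the centre and height are recovered to within floating-point error, which is what "sub-bin peak positions" refers to.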
merge
def merge(
self,
mz_tolerance: float = 0.01,
mz_tolerance_type: Literal["ppm", "da"] = "da",
im_tolerance: float = 0.05,
im_tolerance_type: Literal["relative", "absolute"] = "relative",
inplace: bool = False,
) -> Self
Merges nearby peaks using a greedy intensity-ordered strategy. Peaks are processed from most to least intense; each unused neighbour within the tolerance window is merged into the current peak. The merged peak carries the intensity-weighted average m/z (and ion mobility if present) and the summed intensity. Charge arrays are preserved — only peaks with matching charge are merged together.
merged = spec.merge(mz_tolerance=0.02, mz_tolerance_type="da")
merged = spec.merge(mz_tolerance=5, mz_tolerance_type="ppm")
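The greedy strategy described above can be sketched in a few lines. This simplified version handles m/z and intensity only (no ion mobility window and no charge matching), takes ppm relative to the current peak's m/z as an assumption, and assumes strictly positive intensities; greedy_merge is a hypothetical name.

```python
def greedy_merge(mz, intensity, mz_tolerance=0.01, mz_tolerance_type="da"):
    """Merge nearby peaks, visiting peaks from most to least intense."""
    order = sorted(range(len(mz)), key=lambda i: intensity[i], reverse=True)
    used = [False] * len(mz)
    merged = []
    for i in order:
        if used[i]:
            continue
        tol = mz[i] * mz_tolerance / 1e6 if mz_tolerance_type == "ppm" else mz_tolerance
        # Collect every unused peak inside the window (this includes peak i itself).
        group = [j for j in range(len(mz))
                 if not used[j] and abs(mz[j] - mz[i]) <= tol]
        for j in group:
            used[j] = True
        total = sum(intensity[j] for j in group)
        avg_mz = sum(mz[j] * intensity[j] for j in group) / total  # intensity-weighted
        merged.append((avg_mz, total))
    merged.sort()  # back to ascending m/z
    return merged


print(greedy_merge([100.00, 100.01, 100.05, 200.0], [10.0, 30.0, 5.0, 20.0],
                   mz_tolerance=0.02))
```

Processing in intensity order means the strongest peak in a crowded region claims its neighbours first, so weaker peaks cannot pull a merged centroid away from it.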
deconvolute
def deconvolute(
self,
tolerance: float = 50,
tolerance_type: Literal["ppm", "da"] = "ppm",
charge_range: tuple[int, int] = (1, 3),
intensity: Literal["base", "total"] = "total",
max_dpeaks: int = 2000,
inplace: bool = False,
) -> Self
Assigns each peak to an isotope cluster and records the charge state. Returns a spectrum with spectrum_type=DECONVOLUTED and a populated charge array.
| Parameter | Description |
|---|---|
| tolerance | Peak matching tolerance (default 50 ppm) |
| tolerance_type | "ppm" (default) or "da" |
| charge_range | (min_charge, max_charge) inclusive; default (1, 3) |
| intensity | "total" (default) sums the whole cluster; "base" uses only the monoisotopic peak |
| max_dpeaks | Maximum output peaks (default 2000) |
| min_intensity | float \| "min". Absolute intensity floor for isotope detectability. The sentinel "min" (default) uses the spectrum's own minimum intensity as the S/N floor. |
| min_score | float. Clusters whose best isotopic profile score falls below this threshold are recorded as singletons. Default 0.0 accepts all clusters. |
After deconvolution the charge array follows the charge conventions table: > 0 for assigned clusters, -1 for singletons.
See Deconvolution for a detailed walkthrough.
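To give a feel for what "isotope neighbours at a tested charge" means, the sketch below checks whether a peak has an M+1 neighbour at the spacing expected for each charge in charge_range. It is a deliberately simplified illustration, not the library's algorithm: it ignores intensities and profile scoring, tries higher charges first (an assumption), and guess_charge is a hypothetical name.

```python
ISOTOPE_SPACING = 1.0033548  # Da, mass difference between 13C and 12C


def guess_charge(mz, index, charge_range=(1, 3), tolerance=50.0,
                 tolerance_type="ppm"):
    """Return a charge whose M+1 neighbour is present, or -1 for a singleton."""
    base = mz[index]
    lo, hi = charge_range
    for z in range(hi, lo - 1, -1):  # assumption: prefer the highest matching charge
        expected = base + ISOTOPE_SPACING / z
        tol = expected * tolerance / 1e6 if tolerance_type == "ppm" else tolerance
        if any(abs(m - expected) <= tol for m in mz):
            return z
    return -1


mz = [500.0, 500.5017, 501.0034, 700.0]
print(guess_charge(mz, 0))  # neighbour spaced ~0.5 Da apart
print(guess_charge(mz, 3))  # no neighbour at any tested charge
```

Isotope peaks of a charge-z ion are spaced about 1.00335 / z apart in m/z, which is why the charge can be read off the spacing and why a peak with no such neighbour ends up with charge -1.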
decharge
Converts deconvoluted m/z values to neutral monoisotopic masses using neutral_mass = (mz × charge) - (charge × proton_mass). Singletons (charge == -1) are dropped. The resulting charge array is set to all zeros (meaning "charge unknown / neutral mass").
Raises ValueError if the spectrum is not in DECONVOLUTED state.
The score array is propagated through decharge(): each surviving neutral-mass peak retains the score of its charged precursor.
neutral = decon.decharge()
# neutral.mz now contains neutral masses sorted ascending
# neutral.charge is all zeros
# neutral.score carries through from the deconvoluted spectrum
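The conversion itself is a one-line formula per peak. A minimal sketch on plain lists is shown here; the proton mass constant is an approximate value supplied for illustration, and to_neutral_masses is a hypothetical name.

```python
PROTON_MASS = 1.007276466  # Da, approximate; the library's constant may differ slightly


def to_neutral_masses(mz, charge):
    """neutral_mass = mz * charge - charge * PROTON_MASS; singletons are dropped."""
    masses = []
    for m, z in zip(mz, charge):
        if z == -1:  # singleton: charge unknown, cannot be decharged
            continue
        masses.append(m * z - z * PROTON_MASS)
    return sorted(masses)  # neutral masses in ascending order


print(to_neutral_masses([500.5, 300.2, 1001.0], [2, -1, 1]))
```

Note that a doubly charged peak at a lower m/z can map to a neutral mass close to that of a singly charged peak at a higher m/z, which is why the result is re-sorted.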
update
Low-level helper to create a new Spectrum with arbitrary fields replaced. Prefer the named methods above for normal use.
Compression
compress
def compress(
self,
url_safe: bool = False,
mz_precision: int | None = None,
intensity_precision: int | None = None,
im_precision: int | None = None,
compression: str = "gzip",
) -> str
Serialises the spectrum to a compact ASCII string. m/z values are delta-encoded; intensities and ion mobility use raw float32 hex encoding. The result is compressed with gzip, zlib, or brotli (requires pip install brotli) and then base85-encoded (default) or base64 URL-safe encoded when url_safe=True.
Optional *_precision parameters round the corresponding arrays before encoding, reducing compressed size at the cost of numeric precision.
Spectrum.from_compressed
Round-trips a string produced by .compress() back to a Spectrum.
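Delta encoding works well for m/z because the values are sorted, so consecutive differences are small and compress better than the raw values. The sketch below demonstrates the general idea with the standard library only. The byte layout (fixed-point 64-bit integers), the use of zlib, and the function names are all assumptions made for illustration; this is not the library's wire format and the strings are not interchangeable with .compress() output.

```python
import base64
import struct
import zlib


def encode_mz(mz, precision=4):
    """Delta-encode rounded m/z values, compress, and base85-encode."""
    scale = 10 ** precision
    ints = [round(m * scale) for m in mz]
    # First entry is the value itself, the rest are differences to the previous one.
    deltas = [b - a for a, b in zip([0] + ints, ints)]
    raw = struct.pack(f"<{len(deltas)}q", *deltas)
    return base64.b85encode(zlib.compress(raw)).decode("ascii")


def decode_mz(text, precision=4):
    """Invert encode_mz by decompressing and cumulatively summing the deltas."""
    raw = zlib.decompress(base64.b85decode(text))
    deltas = struct.unpack(f"<{len(raw) // 8}q", raw)
    scale = 10 ** precision
    values, acc = [], 0
    for d in deltas:
        acc += d
        values.append(acc / scale)
    return values


text = encode_mz([500.1, 800.2, 1200.5])
print(text)
print(decode_mz(text))
```

Rounding to a fixed number of decimals before encoding plays the same role as the *_precision parameters: smaller output in exchange for a bounded loss of precision.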
Visualization
plot
def plot(
self,
title: str | None = None,
show_charges: bool = True,
show_scores: bool = True,
**layout_kwargs,
) -> Figure
Returns a Plotly Figure (stick plot). Requires plotly (pip install plotly).
| Parameter | Description |
|---|---|
| title | Plot title |
| show_charges | Colour sticks by charge state when a charge array is present |
| show_scores | Annotate scored peaks with their score value when a score array is present |
See Visualization for mirror_plot() and annotate_spectrum().
plot_table
Returns a pandas.DataFrame with one row per peak. Each row contains both the raw peak data (mz, intensity, charge, score, im) and all visual properties (color, linewidth, opacity, series, label, label_size, label_font, label_color, label_yshift, label_xanchor, hover). Modify the table freely, then render it with plot_from_table().
tbl = decon.plot_table()
tbl.loc[tbl["charge"] == 2, "color"] = "red"
tbl.loc[tbl["intensity"] > 1e5, "linewidth"] = 2.0
fig = plot_from_table(tbl, title="Custom plot")
fig.show()
annot_plot_table
def annot_plot_table(
self,
fragments,
mz_tol: float = 0.02,
mz_tol_type: Literal["Da", "ppm"] = "Da",
peak_selection: Literal["closest", "largest", "all"] = "closest",
include_sequence: bool = False,
) -> pd.DataFrame
Like plot_table() but matched peaks are coloured by ion series and labelled with their fragment identifier. Unmatched peaks are grey. Modify the returned table and call plot_from_table() to render.
tbl = spec.annot_plot_table(fragments, mz_tol=10, mz_tol_type="ppm")
tbl.loc[tbl["label"] != "", "label_size"] = 14
fig = plot_from_table(tbl, title="Annotated")
fig.show()
See Visualization — Plot table API for full column reference.
MsnSpectrum
MsnSpectrum extends Spectrum with instrument-level metadata. It is what the readers (DReader, MzmlReader) yield. All Spectrum methods are available unchanged.
@dataclass(slots=True, kw_only=True)
class MsnSpectrum(Spectrum):
    # Scan identification
    scan_number: int | None = None
    ms_level: int | None = None
    native_id: str | None = None
    # Timing
    rt: float | None = None  # retention time, seconds
    injection_time: float | None = None  # ion accumulation time, ms
    # Acquisition windows
    mz_range: tuple[float, float] | None = None
    im_range: tuple[float, float] | None = None
    im_type: str | None = None  # e.g. "1/K0", "drift_time_ms"
    # Instrument settings
    polarity: Literal["positive", "negative"] | None = None
    # Optional metadata
    resolution: float | None = None
    analyzer: str | None = None  # e.g. "TOF", "FTMS", "ITMS"
    ramp_time: float | None = None
    collision_energy: float | None = None
    activation_type: str | None = None
    precursors: list[TargetIon] | None = None
TargetIon
@dataclass(frozen=True, slots=True, kw_only=True)
class TargetIon(Peak):
    is_monoisotopic: bool | None
Represents a precursor ion selected for fragmentation. Stored in MsnSpectrum.precursors.