Centroiding
timsTOF raw data is profile-like: the binary file stores one intensity value per scan per m/z index, spread across hundreds of mobility bins. Centroiding collapses that cloud of raw measurements into a compact list of peaks — each with a single m/z, intensity, and ion mobility value.
tdfpy provides two centroiding functions:
- get_centroided_spectrum: high-level. Reads a full frame from disk, applies optional noise filtering, and returns centroided peaks in one call.
- merge_peaks: low-level. Centroids pre-assembled NumPy arrays of m/z, intensity, and ion mobility values. Use it when you already have the raw arrays or need fine-grained control.
In practice, most workflows should call .centroid() directly on a Frame, DiaWindow, or
PrmTransition object — that method delegates to get_centroided_spectrum internally.
Numba JIT backend
When Numba is installed (it is included in the default
tdfpy dependencies), the core clustering loop runs as a JIT-compiled native function
(_merge_peaks_numba_kernel). This is typically 5–20× faster than the pure-Python
fallback for large frames. The backend is selected automatically:
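```python
from tdfpy import merge_peaks

# mz, intensity, im: equal-length 1-D NumPy arrays of raw peaks

# Numba used if available (default)
peaks = merge_peaks(mz, intensity, im)

# Force the Python fallback (useful for debugging or environments without Numba)
peaks = merge_peaks(mz, intensity, im, use_numba=False)
```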
The first call after import triggers Numba's JIT compilation — expect a few seconds of overhead. Subsequent calls use the cached compiled kernel.
get_centroided_spectrum
Reads frame frame_id from the open TimsData connection, converts m/z indices to
m/z values, assembles the raw peak arrays, optionally filters noise, and runs centroiding.
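"Assembling the raw peak arrays" amounts to flattening per-scan data into three parallel arrays, one entry per raw measurement. The sketch below assumes each scan arrives as a pair of (m/z indices, intensities). It uses toy linear calibrations (toy_index_to_mz, toy_scan_to_im, both hypothetical) in place of the instrument calibration that tdfpy applies through the TimsData connection.

```python
from typing import Callable, List, Sequence, Tuple

# One (m/z indices, intensities) pair per mobility scan, in scan order.
ScanData = Tuple[Sequence[int], Sequence[float]]


def assemble_raw_peaks(
    scans: Sequence[ScanData],
    index_to_mz: Callable[[int], float],
    scan_to_im: Callable[[int], float],
) -> Tuple[List[float], List[float], List[float]]:
    """Flatten per-scan peaks into three parallel lists: m/z, intensity, mobility."""
    mz: List[float] = []
    intensity: List[float] = []
    im: List[float] = []
    for scan_number, (indices, values) in enumerate(scans):
        mobility = scan_to_im(scan_number)  # all peaks in one scan share a mobility value
        for idx, value in zip(indices, values):
            mz.append(index_to_mz(idx))
            intensity.append(float(value))
            im.append(mobility)
    return mz, intensity, im


# Toy linear calibrations for illustration only (NOT the instrument's conversion).
def toy_index_to_mz(idx: int) -> float:
    return 100.0 + 0.01 * idx


def toy_scan_to_im(scan: int) -> float:
    return 1.5 - 0.001 * scan


scans = [([10, 11], [5.0, 7.0]), ([], []), ([10], [3.0])]  # scan 1 is empty
mz, intensity, im = assemble_raw_peaks(scans, toy_index_to_mz, toy_scan_to_im)
```

Empty scans simply contribute nothing. The output length equals the total number of raw measurements in the frame, which is what the centroider consumes.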
```python
from tdfpy import (
    ChargeStateRegion,
    MadThreshold,
    VerticalNoiseFilter,
    WatershedCentroider,
    get_centroided_spectrum,
    timsdata_connect,
)

with timsdata_connect("experiment.d") as td:
    # Default: 1/K0 ion mobility, 8 ppm m/z tolerance
    peaks = get_centroided_spectrum(td, frame_id=1)
    print(peaks.shape)  # (N, 3); columns: [m/z, intensity, 1/K0]

    # Tighter tolerances, CCS instead of 1/K0
    peaks = get_centroided_spectrum(
        td,
        frame_id=1,
        ion_mobility_type="ccs",
        mz_tolerance=5.0,
        im_tolerance=0.03,
    )

    # Noise filtering before centroiding (string shorthand)
    peaks = get_centroided_spectrum(td, frame_id=1, noise="mad")

    # Hard intensity threshold
    peaks = get_centroided_spectrum(td, frame_id=1, noise=500.0)

    # Composed pipeline + region exclusion + tuned filter
    peaks = get_centroided_spectrum(
        td,
        frame_id=1,
        exclude=ChargeStateRegion(),
        noise=[VerticalNoiseFilter(min_streak_scans=5), MadThreshold(k=3)],
    )

    # Watershed centroider (integer-index space, no float-m/z binning)
    peaks = get_centroided_spectrum(
        td,
        frame_id=1,
        centroid=WatershedCentroider(attach_scan_half_width=10, attach_mz_idx_half_width=3),
    )
```
The noise= parameter accepts a string shorthand ("mad", "percentile",
"histogram", "baseline", "iterative_median"), a numeric absolute
intensity threshold, or any NoiseFilter instance or list of instances;
see Noise filters for the full hierarchy. The exclude= parameter
accepts a ChargeStateRegion, and the centroid= parameter
swaps the centroiding algorithm (see
Pipeline → Centroiders).
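Because noise= accepts several kinds of value, an implementation has to normalize them into one chain of filters. The sketch below shows one way to do this dispatch; it is not tdfpy's code. The registry key "median" and the functions keep_above_median, normalize_noise and apply_noise are hypothetical. The sketch also assumes that filters in a list run in order, each seeing only the points that survived the previous one.

```python
import statistics
from typing import Callable, Dict, List, Sequence

# A filter maps intensities to a keep-mask of the same length.
Filter = Callable[[Sequence[float]], List[bool]]


def absolute_threshold(cutoff: float) -> Filter:
    def apply(intensity: Sequence[float]) -> List[bool]:
        return [v >= cutoff for v in intensity]

    return apply


def keep_above_median(intensity: Sequence[float]) -> List[bool]:
    if not intensity:
        return []
    med = statistics.median(intensity)
    return [v > med for v in intensity]


# Hypothetical shorthand registry; "median" is NOT one of tdfpy's shorthand names.
SHORTHAND: Dict[str, Filter] = {"median": keep_above_median}


def normalize_noise(spec) -> List[Filter]:
    """Turn None / str / number / callable / list-of-those into a list of filters."""
    if spec is None:
        return []
    if isinstance(spec, str):
        return [SHORTHAND[spec]]
    if isinstance(spec, (int, float)):
        return [absolute_threshold(float(spec))]
    if callable(spec):
        return [spec]
    filters: List[Filter] = []
    for item in spec:  # a list or tuple: normalize each element in order
        filters.extend(normalize_noise(item))
    return filters


def apply_noise(intensity: Sequence[float], spec) -> List[int]:
    """Return indices of points that survive every filter, applied in sequence."""
    survivors = list(range(len(intensity)))
    for f in normalize_noise(spec):
        mask = f([intensity[i] for i in survivors])
        survivors = [i for i, keep in zip(survivors, mask) if keep]
    return survivors
```

For example, apply_noise([1.0, 5.0, 10.0, 20.0], ["median", 15.0]) first keeps the two points above the median (7.5) and then applies the absolute cutoff, leaving only index 3. Under sequential application, order matters: a statistical estimator placed after a hard threshold sees a different distribution.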
tdfpy.get_centroided_spectrum
```python
get_centroided_spectrum(
    td: TimsData,
    frame_id: int,
    *,
    scan_range: tuple[int, int] | None = None,
    exclude: ChargeStateRegion | None = None,
    smooth: Smooth | None = None,
    noise: NoiseSpec = None,
    ion_mobility_type: Literal[
        "ook0", "ccs", "voltage"
    ] = "ook0",
    centroid: Centroider | None = None
) -> np.ndarray
```
Extract a centroided spectrum for a single frame.

Thin orchestrator over the tdfpy.pipeline ops. Threads a
RawSpectrum through optional scan-range restriction, region exclusion,
intensity smoothing, and noise filtering, then hands it to the centroider,
which decides whether to operate in integer index space (e.g.
WatershedCentroider) or after float conversion (e.g. MergePeaksCentroider).

The default centroider is MergePeaksCentroider. Pass
smooth=Smooth(...) for a position-preserving box-sum/mean smoothing
pre-step; WatershedCentroider additionally has its own
seed-stabilising smoother via its smooth_*_half_width fields.

Returns an (N, 3) array of [mz, intensity, ion_mobility] centroids.
Source code in src/tdfpy/centroiding.py (lines 571–623)
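The orchestration described in the docstring can be sketched as a fixed sequence of optional stages, where None means "skip". This is a simplified illustration, not tdfpy's implementation. ToySpectrum is a hypothetical stand-in for RawSpectrum, each stage is a plain callable, scan_range is assumed to be half-open, and the index-space versus float-space distinction is not modelled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ToySpectrum:
    """Hypothetical stand-in for RawSpectrum: parallel per-point lists."""

    scan: List[int]
    mz: List[float]
    intensity: List[float]

    def keep(self, mask: List[bool]) -> "ToySpectrum":
        def pick(values):
            return [v for v, m in zip(values, mask) if m]

        return ToySpectrum(pick(self.scan), pick(self.mz), pick(self.intensity))


Step = Callable[[ToySpectrum], ToySpectrum]


def orchestrate(
    spectrum: ToySpectrum,
    centroid: Callable[[ToySpectrum], list],
    scan_range: Optional[Tuple[int, int]] = None,
    exclude: Optional[Step] = None,
    smooth: Optional[Step] = None,
    noise: Optional[Step] = None,
) -> list:
    """Thread a spectrum through optional steps in a fixed order, then centroid it."""
    if scan_range is not None:
        lo, hi = scan_range  # treated as half-open [lo, hi) in this sketch
        spectrum = spectrum.keep([lo <= s < hi for s in spectrum.scan])
    for step in (exclude, smooth, noise):
        if step is not None:  # None means "skip this stage"
            spectrum = step(spectrum)
    return centroid(spectrum)


# Toy stages for illustration only
def drop_below_10(s: ToySpectrum) -> ToySpectrum:
    return s.keep([v >= 10.0 for v in s.intensity])


def as_pairs(s: ToySpectrum) -> list:
    return list(zip(s.mz, s.intensity))
```

Keeping every stage optional and the order fixed means each keyword argument can be tested in isolation. It also means the centroider always receives data that has already been restricted, excluded, smoothed and filtered.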
merge_peaks
Centroids pre-assembled arrays. The algorithm is a greedy intensity-ordered scan: starting from the highest-intensity raw peak, every neighbouring peak within the m/z and ion mobility tolerances is merged into a single centroid via intensity-weighted averaging. Merged peaks are marked as used and skipped in subsequent iterations.
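The greedy scan described above can be sketched in pure Python. This is an illustration of the technique, not tdfpy's kernel, and it makes several assumptions:

- the neighbour search is a naive O(n²) loop;
- tolerances are treated as ± windows around the anchor;
- the output intensity is the summed intensity of the merged points;
- points in a cluster that fails min_peaks are still marked as used.

m/z and mobility are intensity-weighted averages, as the text states.

```python
from typing import List, Sequence, Tuple

Centroid = Tuple[float, float, float]  # (m/z, intensity, ion mobility)


def greedy_merge(
    mz: Sequence[float],
    intensity: Sequence[float],
    im: Sequence[float],
    mz_tol_ppm: float = 8.0,
    im_tol_rel: float = 0.1,
    min_peaks: int = 3,
) -> List[Centroid]:
    """Greedy, intensity-ordered clustering of raw points (assumes positive intensities)."""
    n = len(mz)
    order = sorted(range(n), key=lambda i: intensity[i], reverse=True)
    used = [False] * n
    centroids: List[Centroid] = []
    for anchor in order:
        if used[anchor]:
            continue  # already absorbed into a brighter centroid
        mz_win = mz[anchor] * mz_tol_ppm * 1e-6
        im_win = im[anchor] * im_tol_rel
        members = [
            j
            for j in range(n)
            if not used[j]
            and abs(mz[j] - mz[anchor]) <= mz_win
            and abs(im[j] - im[anchor]) <= im_win
        ]  # always contains the anchor itself
        for j in members:
            used[j] = True  # merged points are skipped in later iterations
        if len(members) < min_peaks:
            continue  # too little support: drop this cluster
        total = sum(intensity[j] for j in members)
        centroids.append(
            (
                sum(mz[j] * intensity[j] for j in members) / total,  # weighted m/z
                total,
                sum(im[j] * intensity[j] for j in members) / total,  # weighted mobility
            )
        )
    return centroids
```

Processing the brightest points first means each centroid is anchored on a local maximum. Weaker shoulders are absorbed before they can seed clusters of their own.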
| Parameter | Default | Notes |
|---|---|---|
| mz_tolerance | 8.0 | Width of the m/z matching window |
| mz_tolerance_type | "ppm" | "ppm" or "da" |
| im_tolerance | 0.1 | Width of the ion mobility window |
| im_tolerance_type | "relative" | "relative" (fraction of 1/K0) or "absolute" |
| min_peaks | 3 | Raw peaks required to form a centroid; set to 0 or 1 to keep all |
| max_peaks | None | Cap on output peaks (highest-intensity first) |
| use_numba | True | Set to False to force the Python fallback |
```python
import numpy as np
from tdfpy import merge_peaks

mz = np.array([500.001, 500.002, 700.005, 700.006, 700.007])
inten = np.array([8000.0, 4000.0, 6000.0, 5000.0, 3000.0])
im = np.array([0.85, 0.85, 0.92, 0.92, 0.92])

peaks = merge_peaks(mz, inten, im, mz_tolerance=10.0, min_peaks=2)
print(peaks)
# shape (2, 3): two centroided peaks, columns [m/z, intensity, 1/K0]
```
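The tolerance arithmetic behind this example works out as follows. A ppm tolerance scales with m/z: 10 ppm at m/z 500 is 500 × 10 × 10⁻⁶ = 0.005 Da, so 500.001 and 500.002 fall in one window. A relative ion mobility tolerance scales with 1/K0: 0.1 at 0.85 is 0.085. The helpers below (mz_window, im_window and within, all hypothetical) assume each tolerance is a symmetric ± half-width around the anchor. tdfpy's table describes it as a "width", so its exact convention may differ. Under this assumption, the 0.85 and 0.92 mobilities would overlap, and it is the m/z gap that separates the two groups.

```python
def mz_window(anchor_mz: float, tolerance: float, kind: str = "ppm") -> float:
    """Half-width of the m/z window in Da around an anchor (assumed symmetric)."""
    if kind == "ppm":
        return anchor_mz * tolerance * 1e-6  # ppm scales with m/z
    if kind == "da":
        return tolerance  # fixed width in daltons
    raise ValueError(f"unknown m/z tolerance type: {kind!r}")


def im_window(anchor_im: float, tolerance: float, kind: str = "relative") -> float:
    """Half-width of the ion mobility window around an anchor (assumed symmetric)."""
    if kind == "relative":
        return anchor_im * tolerance  # fraction of the anchor's 1/K0
    if kind == "absolute":
        return tolerance
    raise ValueError(f"unknown ion mobility tolerance type: {kind!r}")


def within(anchor_mz: float, anchor_im: float, mz: float, im: float,
           mz_tol: float = 8.0, im_tol: float = 0.1) -> bool:
    """True if (mz, im) lies inside both windows of the anchor (ppm / relative)."""
    return (abs(mz - anchor_mz) <= mz_window(anchor_mz, mz_tol)
            and abs(im - anchor_im) <= im_window(anchor_im, im_tol))
```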
Noise filtering vs min_peaks
The noise= parameter (available on get_centroided_spectrum,
.centroid(), and get_raw_peaks) chains noise filters before the
centroider runs — intensity thresholds, the
vertical-IM streak filter, or any
combination. Intensity-based estimators have a fundamental limitation:
they can't distinguish low-abundance real signal from electronic noise.
Methods like MadThreshold are anchored to the median of the
intensity distribution — if your sample has sparse signal, the
threshold can rise above legitimate low-abundance peaks.
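A small numeric illustration of this limitation follows. The cutoff is computed as median + k × MAD using the unscaled median absolute deviation, with k = 3 to mirror MadThreshold(k=3). tdfpy's exact estimator may differ, for example by applying a 1.4826 scale factor. The toy intensities are made up: mostly noise, one weak peak assumed to be real (400), and one strong peak (5000).

```python
import statistics
from typing import List, Sequence


def mad_threshold(intensity: Sequence[float], k: float = 3.0) -> float:
    """median + k * MAD, using the raw (unscaled) median absolute deviation."""
    med = statistics.median(intensity)
    mad = statistics.median([abs(v - med) for v in intensity])
    return med + k * mad


# Mostly noise, plus one weak real peak (400) and one strong peak (5000).
intensities: List[float] = [100.0, 120.0, 150.0, 180.0, 200.0,
                            250.0, 300.0, 350.0, 400.0, 5000.0]
cutoff = mad_threshold(intensities)  # median 225, MAD 90 -> 225 + 3 * 90
survivors = [v for v in intensities if v > cutoff]
```

The cutoff lands at 495, set almost entirely by the spread of the noise population, so the weak peak at 400 is discarded along with the noise.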
A more reliable strategy is to increase min_peaks instead:
```python
# Prefer: raise min_peaks to filter noise without discarding low-abundance signal
peaks = merge_peaks(mz, intensity, im, min_peaks=5)
# Noise arises from single scans; real peaks appear across multiple scans.
# min_peaks=5 means a centroid must be supported by at least 5 raw measurements.
```
Because electronic noise typically manifests as a singleton in a single
scan, requiring several supporting raw peaks is a structural filter —
it targets the origin of noise rather than its intensity. The
VerticalNoiseFilter extends this idea
to the IM axis, requiring peaks to appear as vertical streaks across
consecutive mobility scans.
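The streak idea can be sketched as follows: keep a raw point only if its m/z index recurs in enough consecutive mobility scans. This is an illustration of the concept, not tdfpy's VerticalNoiseFilter. It borrows the parameter name min_streak_scans from the example above and requires an exact index match, whereas a production filter would probably tolerate a little m/z index jitter. consecutive_runs and streak_mask are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def consecutive_runs(sorted_scans: Sequence[int]) -> List[List[int]]:
    """Split sorted scan numbers into runs of consecutive integers."""
    runs: List[List[int]] = []
    for s in sorted_scans:
        if runs and s == runs[-1][-1] + 1:
            runs[-1].append(s)
        else:
            runs.append([s])
    return runs


def streak_mask(
    scans: Sequence[int], mz_idx: Sequence[int], min_streak_scans: int = 3
) -> List[bool]:
    """Keep a point only if its m/z index recurs in >= min_streak_scans consecutive scans."""
    scans_per_idx: Dict[int, Set[int]] = defaultdict(set)
    for scan, idx in zip(scans, mz_idx):
        scans_per_idx[idx].add(scan)

    supported: Set[Tuple[int, int]] = set()  # (m/z index, scan) pairs inside a long run
    for idx, scan_set in scans_per_idx.items():
        for run in consecutive_runs(sorted(scan_set)):
            if len(run) >= min_streak_scans:
                supported.update((idx, s) for s in run)

    return [(idx, scan) in supported for scan, idx in zip(scans, mz_idx)]


# Index 500: a 4-scan streak (10-13) plus an isolated hit at scan 20.
# Index 700: only two consecutive scans, too short to count as a streak.
mask = streak_mask([10, 11, 12, 13, 20, 30, 31],
                   [500, 500, 500, 500, 500, 700, 700])
```

The isolated hit at scan 20 is dropped even though it shares an m/z index with a genuine streak. This is the structural behaviour the paragraph describes: singletons are removed regardless of their intensity.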
Use intensity-based noise= filters only if you have a calibrated
threshold or a method validated for your acquisition; always verify
against noise=None first.
tdfpy.merge_peaks
```python
merge_peaks(
    mz_array: np.ndarray,
    intensity_array: np.ndarray,
    ion_mobility_array: np.ndarray,
    mz_tolerance: float = 8.0,
    mz_tolerance_type: Literal["ppm", "da"] = "ppm",
    im_tolerance: float = 0.1,
    im_tolerance_type: Literal[
        "relative", "absolute"
    ] = "relative",
    min_peaks: int = 3,
    max_peaks: int | None = None,
    peak_noise_filter: bool = False,
    peak_noise_window: float = 0.1,
    peak_noise_end_fraction: float = 0.1,
    use_numba: bool = True,
) -> np.ndarray
```
Centroid profile-like peaks using m/z and ion mobility tolerances.
This function implements a greedy clustering algorithm that centroids raw peaks (similar to profile mode data) within specified m/z and ion mobility windows. Peaks are processed in descending order of intensity, and nearby peaks are combined using intensity-weighted averaging to produce centroided peaks.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| mz_array | np.ndarray | Array of m/z values from raw/profile-like data | required |
| intensity_array | np.ndarray | Array of intensity values | required |
| ion_mobility_array | np.ndarray | Array of ion mobility values (1/K0 or CCS) | required |
| mz_tolerance | float | Tolerance for m/z matching during centroiding | 8.0 |
| mz_tolerance_type | Literal['ppm', 'da'] | Type of m/z tolerance: "ppm" or "da" (daltons) | 'ppm' |
| im_tolerance | float | Tolerance for ion mobility matching during centroiding | 0.1 |
| im_tolerance_type | Literal['relative', 'absolute'] | Type of ion mobility tolerance: "relative" or "absolute" | 'relative' |
| min_peaks | int | Minimum number of nearby raw peaks required to form a centroid. Set to 0 or 1 to keep all peaks (no filtering). | 3 |
| max_peaks | int \| None | Maximum number of centroided peaks to return (keeps highest intensity) | None |
| peak_noise_filter | bool | If True, after each centroid is formed suppress raw points within ± | False |
| peak_noise_window | float | Half-width in Da on each side of the anchor m/z over which the peak-noise ramp is applied. Defaults to | 0.1 |
| peak_noise_end_fraction | float | Fraction of the anchor's raw intensity used as the suppression threshold at | 0.1 |
Returns:

| Type | Description |
|---|---|
| np.ndarray | Array of shape (N, 3) containing centroided peaks. Columns are: [mz, intensity, ion_mobility] |
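Example

```python
import numpy as np
from tdfpy import merge_peaks

mz = np.array([100.0, 100.001, 200.0])
intensity = np.array([1000.0, 500.0, 2000.0])
im = np.array([0.8, 0.8, 0.9])

# 100.001 sits exactly 10 ppm from 100.0, so use a wider window to merge them
# reliably. min_peaks=1 keeps small clusters; with the default (3), neither
# group here has enough raw points and every centroid would be filtered out.
peaks = merge_peaks(
    mz, intensity, im, mz_tolerance=20, mz_tolerance_type="ppm", min_peaks=1
)
```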
Source code in src/tdfpy/centroiding.py (lines 215–304)