Centroiding

timsTOF raw data is profile-like: the binary file stores one intensity value per scan per m/z index, spread across hundreds of mobility bins. Centroiding collapses that cloud of raw measurements into a compact list of peaks — each with a single m/z, intensity, and ion mobility value.

tdfpy provides two centroiding functions:

  • get_centroided_spectrum — high-level: reads a full frame from disk, applies optional noise filtering, and returns centroided peaks in one call.
  • merge_peaks — low-level: centroids pre-assembled NumPy arrays of m/z, intensity, and ion mobility values. Use this when you already have the raw arrays or need fine-grained control.

In practice, most workflows should call .centroid() directly on a Frame, DiaWindow, or PrmTransition object — that method delegates to get_centroided_spectrum internally.

Numba JIT backend

When Numba is installed (it is included in the default tdfpy dependencies), the core clustering loop runs as a JIT-compiled native function (_merge_peaks_numba_kernel). This is typically 5–20× faster than the pure-Python fallback for large frames. The backend is selected automatically:

# Numba used if available (default)
peaks = merge_peaks(mz, intensity, im)

# Force the Python fallback (useful for debugging or environments without Numba)
peaks = merge_peaks(mz, intensity, im, use_numba=False)

The first call after import triggers Numba's JIT compilation — expect a few seconds of overhead. Subsequent calls use the cached compiled kernel.
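
The selection logic can be pictured with a small stand-alone sketch. This is not tdfpy's code: the toy function and the names HAS_NUMBA and sum_of_squares are invented for illustration. Only the general pattern mirrors the description above: try the import, fall back to pure Python, and let the caller opt out with use_numba=False.

```python
import time

try:
    from numba import njit  # optional dependency
    HAS_NUMBA = True
except ImportError:
    HAS_NUMBA = False


def _sum_of_squares_python(n):
    """Pure-Python reference loop (stand-in for a real clustering kernel)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# Wrap the same function with Numba when it is importable.
# Compilation itself is deferred until the first call.
_sum_of_squares_jit = njit(_sum_of_squares_python) if HAS_NUMBA else None


def sum_of_squares(n, use_numba=True):
    """Dispatch to the JIT kernel when available and requested."""
    if HAS_NUMBA and use_numba:
        return _sum_of_squares_jit(n)
    return _sum_of_squares_python(n)


if __name__ == "__main__":
    # With Numba installed, the first call includes compilation time;
    # the second call reuses the compiled kernel.
    for label in ("first call", "second call"):
        start = time.perf_counter()
        sum_of_squares(1000)
        print(label, time.perf_counter() - start)
```

Timing the first and second call separately, as in the last lines, is a quick way to see how much of a measured runtime is compilation rather than computation.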


get_centroided_spectrum

Reads frame frame_id from the open TimsData connection, converts m/z indices to m/z values, assembles the raw peak arrays, optionally filters noise, and runs centroiding.

from tdfpy import timsdata_connect, get_centroided_spectrum

with timsdata_connect("experiment.d") as td:
    # Default: 1/K0 ion mobility, 8 ppm m/z tolerance
    peaks = get_centroided_spectrum(td, frame_id=1)
    print(peaks.shape)   # (N, 3) — columns: [m/z, intensity, 1/K0]

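    # CCS instead of 1/K0. The m/z and ion mobility tolerances are not
    # arguments of get_centroided_spectrum itself (see the signature below);
    # configure them on the centroider passed as centroid=.
    peaks = get_centroided_spectrum(
        td,
        frame_id=1,
        ion_mobility_type="ccs",
    )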

    # Noise filtering before centroiding (string shorthand)
    peaks = get_centroided_spectrum(td, frame_id=1, noise="mad")

    # Hard intensity threshold
    peaks = get_centroided_spectrum(td, frame_id=1, noise=500.0)

    # Composed pipeline + region exclusion + tuned filter
    from tdfpy import ChargeStateRegion, MadThreshold, VerticalNoiseFilter
    peaks = get_centroided_spectrum(
        td, frame_id=1,
        exclude=ChargeStateRegion(),
        noise=[VerticalNoiseFilter(min_streak_scans=5), MadThreshold(k=3)],
    )

    # Watershed centroider (integer-index space, no float-m/z binning)
    from tdfpy import WatershedCentroider
    peaks = get_centroided_spectrum(
        td, frame_id=1,
        centroid=WatershedCentroider(attach_scan_half_width=10, attach_mz_idx_half_width=3),
    )

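The noise= parameter accepts a string shorthand ("mad", "percentile", "histogram", "baseline", "iterative_median"), a numeric absolute threshold, or any NoiseFilter instance or list of instances; see Noise filters for the full hierarchy. The exclude= parameter accepts a ChargeStateRegion. The centroid= parameter swaps the centroiding algorithm; see Pipeline → Centroiders. Two further keyword arguments appear in the signature below: scan_range= restricts processing to a (begin, end) range of scans, and smooth= applies an intensity-smoothing pre-step before noise filtering.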

tdfpy.get_centroided_spectrum

get_centroided_spectrum(
    td: TimsData,
    frame_id: int,
    *,
    scan_range: tuple[int, int] | None = None,
    exclude: ChargeStateRegion | None = None,
    smooth: Smooth | None = None,
    noise: NoiseSpec = None,
    ion_mobility_type: Literal[
        "ook0", "ccs", "voltage"
    ] = "ook0",
    centroid: Centroider | None = None
) -> np.ndarray

Extract a centroided spectrum for a single frame.

Thin orchestrator over the tdfpy.pipeline ops. Threads a RawSpectrum through optional scan-range restriction, region exclusion, intensity smoothing, and noise filtering, then hands it to the centroider, which decides whether to operate in integer index space (e.g. WatershedCentroider) or after float conversion (e.g. MergePeaksCentroider).

Default centroider is MergePeaksCentroider. Pass smooth=Smooth(...) for a position-preserving box-sum/mean smoothing pre-step; the WatershedCentroider additionally has its own seed-stabilising smoother via its smooth_*_half_width fields.

Returns an (N, 3) array of [mz, intensity, ion_mobility] centroids.

Source code in src/tdfpy/centroiding.py
def get_centroided_spectrum(
    td: TimsData,
    frame_id: int,
    *,
    scan_range: tuple[int, int] | None = None,
    exclude: ChargeStateRegion | None = None,
    smooth: Smooth | None = None,
    noise: NoiseSpec = None,
    ion_mobility_type: Literal["ook0", "ccs", "voltage"] = "ook0",
    centroid: Centroider | None = None,
) -> np.ndarray:
    """Extract a centroided spectrum for a single frame.

    Thin orchestrator over the :mod:`tdfpy.pipeline` ops. Threads a
    :class:`~tdfpy.pipeline.RawSpectrum` through optional scan-range
    restriction, region exclusion, intensity smoothing, and noise filtering,
    then hands it to the centroider — which decides whether to operate in
    integer index space (e.g. :class:`~tdfpy.pipeline.WatershedCentroider`) or
    after float conversion (e.g. :class:`~tdfpy.pipeline.MergePeaksCentroider`).

    Default centroider is :class:`~tdfpy.pipeline.MergePeaksCentroider`. Pass
    ``smooth=Smooth(...)`` for a position-preserving box-sum/mean smoothing
    pre-step; the :class:`~tdfpy.pipeline.WatershedCentroider` additionally
    has its own seed-stabilising smoother via its ``smooth_*_half_width`` fields.

    Returns an ``(N, 3)`` array of ``[mz, intensity, ion_mobility]`` centroids.
    """
    spectrum = read_spectrum(td, frame_id)
    if scan_range is not None:
        spectrum = subset_scans(
            spectrum, scan_num_begin=scan_range[0], scan_num_end=scan_range[1]
        )
    if exclude is not None:
        spectrum = exclude_region(spectrum, exclude, td=td, frame_id=frame_id)
    if smooth is not None:
        spectrum = smooth.apply(spectrum)
    filters = coerce_filters(noise)
    if filters:
        spectrum = apply_noise(spectrum, filters, td=td, frame_id=frame_id)

    if spectrum.empty:
        logger.warning("Frame %d has 0 peaks, returning empty spectrum", frame_id)
        return np.empty((0, 3), dtype=np.float64)

    centroider = centroid if centroid is not None else MergePeaksCentroider()
    centroids = centroider(
        spectrum, td, frame_id, ion_mobility_type=ion_mobility_type
    )
    logger.info(
        "Centroided frame %d: %d raw → %d centroids",
        frame_id, len(spectrum), len(centroids),
    )
    return centroids

merge_peaks

Centroids pre-assembled arrays. The algorithm is a greedy intensity-ordered scan: starting from the highest-intensity raw peak, every neighbouring peak within the m/z and ion mobility tolerances is merged into a single centroid via intensity-weighted averaging. Merged peaks are marked as used and skipped in subsequent iterations.
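
The greedy scan described above can be sketched in a few lines of pure Python. This is a simplified illustration, not the library's implementation. It assumes that the tolerances act as ± half-widths around the anchor peak, that the centroid intensity is the sum of its members, and that members of a rejected cluster stay marked as used; the function name greedy_centroid is hypothetical.

```python
def greedy_centroid(mz, intensity, im, mz_tol_ppm=8.0, im_tol_rel=0.1, min_peaks=3):
    """Toy greedy centroider over plain Python lists (illustration only)."""
    # Visit raw peaks from most to least intense.
    order = sorted(range(len(mz)), key=lambda i: intensity[i], reverse=True)
    used = [False] * len(mz)
    centroids = []

    for anchor in order:
        if used[anchor]:
            continue
        # Assumption: tolerances are half-widths centred on the anchor.
        mz_window = mz[anchor] * mz_tol_ppm * 1e-6
        im_window = im[anchor] * im_tol_rel
        members = [
            j for j in range(len(mz))
            if not used[j]
            and abs(mz[j] - mz[anchor]) <= mz_window
            and abs(im[j] - im[anchor]) <= im_window
        ]
        # Assumption: merged peaks are consumed even if the cluster is rejected.
        for j in members:
            used[j] = True
        if len(members) < min_peaks:
            continue

        total = sum(intensity[j] for j in members)
        centroids.append((
            sum(mz[j] * intensity[j] for j in members) / total,  # weighted m/z
            total,                                               # summed intensity
            sum(im[j] * intensity[j] for j in members) / total,  # weighted mobility
        ))
    return centroids


mz = [500.001, 500.002, 700.005, 700.006, 700.007]
inten = [8000.0, 4000.0, 6000.0, 5000.0, 3000.0]
im = [0.85, 0.85, 0.92, 0.92, 0.92]
centroids = greedy_centroid(mz, inten, im, mz_tol_ppm=10.0, min_peaks=2)
```

The quadratic neighbour search keeps the sketch short; a production implementation would normally sort by m/z and search only the relevant slice.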

Parameter           Default      Notes
mz_tolerance        8.0          Width of the m/z matching window
mz_tolerance_type   "ppm"        "ppm" or "da"
im_tolerance        0.1          Width of the ion mobility window
im_tolerance_type   "relative"   "relative" (fraction of 1/K0) or "absolute"
min_peaks           3            Raw peaks required to form a centroid; set to 0 or 1 to keep all
max_peaks           None         Cap on output peaks (highest-intensity first)
use_numba           True         Set to False to force the Python fallback

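
To make the tolerance types concrete, the following sketch converts each setting into an absolute window. It assumes the tolerance is applied as a ± half-width around a peak, which the table does not state explicitly; the helper names are hypothetical and not part of tdfpy.

```python
def mz_half_width(mz, tolerance=8.0, tolerance_type="ppm"):
    """Absolute m/z window (in Da) implied by a tolerance setting."""
    if tolerance_type == "ppm":
        return mz * tolerance * 1e-6  # parts per million scale with m/z
    if tolerance_type == "da":
        return tolerance              # fixed width, independent of m/z
    raise ValueError(f"unknown m/z tolerance type: {tolerance_type!r}")


def im_half_width(im, tolerance=0.1, tolerance_type="relative"):
    """Absolute ion mobility window implied by a tolerance setting."""
    if tolerance_type == "relative":
        return im * tolerance         # fraction of the mobility value
    if tolerance_type == "absolute":
        return tolerance
    raise ValueError(f"unknown ion mobility tolerance type: {tolerance_type!r}")


# 8 ppm is about 0.004 Da at m/z 500 and about 0.012 Da at m/z 1500.
low = mz_half_width(500.0)
high = mz_half_width(1500.0)
# A relative tolerance of 0.1 at 1/K0 = 0.85 is about 0.085.
mobility = im_half_width(0.85)
```

Because a ppm window widens with m/z, a single ppm setting behaves consistently across the mass range, whereas a "da" window is comparatively loose at low m/z and tight at high m/z.
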
import numpy as np
from tdfpy import merge_peaks

mz  = np.array([500.001, 500.002, 700.005, 700.006, 700.007])
inten = np.array([8000.0,  4000.0,  6000.0,  5000.0,  3000.0])
im  = np.array([0.85,     0.85,    0.92,    0.92,    0.92])

peaks = merge_peaks(mz, inten, im, mz_tolerance=10.0, min_peaks=2)
print(peaks)
# shape (2, 3): two centroided peaks, columns [m/z, intensity, 1/K0]

Noise filtering vs min_peaks

The noise= parameter (available on get_centroided_spectrum, .centroid(), and get_raw_peaks) chains noise filters before the centroider runs — intensity thresholds, the vertical-IM streak filter, or any combination. Intensity-based estimators have a fundamental limitation: they can't distinguish low-abundance real signal from electronic noise. Methods like MadThreshold are anchored to the median of the intensity distribution — if your sample has sparse signal, the threshold can rise above legitimate low-abundance peaks.

A more reliable strategy is to increase min_peaks instead:

# Prefer: raise min_peaks to filter noise without discarding low-abundance signal
peaks = merge_peaks(mz, intensity, im, min_peaks=5)

# Noise arises from single scans; real peaks appear across multiple scans.
# min_peaks=5 means a centroid must be supported by at least 5 raw measurements.

Because electronic noise typically manifests as a singleton in a single scan, requiring several supporting raw peaks is a structural filter — it targets the origin of noise rather than its intensity. The VerticalNoiseFilter extends this idea to the IM axis, requiring peaks to appear as vertical streaks across consecutive mobility scans.

Use intensity-based noise= filters only if you have a calibrated threshold or a method validated for your acquisition; always verify against noise=None first.
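
The difference between the two strategies can be shown on synthetic data. The sketch below is illustrative only. The threshold formula (median + k × 1.4826 × MAD) is a common convention and an assumption here, not necessarily what MadThreshold computes, and the per-point support count is a simplified stand-in for min_peaks.

```python
from statistics import median

# Synthetic raw points as (m/z, intensity) pairs.
noise = [(400.0 + 3.1 * i, 80.0 + (i * 7) % 50) for i in range(40)]  # singletons
weak = [(600.0000, 95.0), (600.0010, 110.0), (599.9990, 120.0),
        (600.0005, 105.0), (600.0002, 90.0)]      # real but low-abundance
bright = [(800.0000, 5000.0), (800.0010, 8000.0), (799.9990, 9000.0),
          (800.0005, 7000.0), (800.0002, 4000.0)]  # real and abundant
points = noise + weak + bright


def mad_threshold(intensities, k=3.0):
    """Assumed form of a MAD-based cut-off: median + k * 1.4826 * MAD."""
    med = median(intensities)
    mad = median(abs(x - med) for x in intensities)
    return med + k * 1.4826 * mad


def support(points, index, ppm=10.0):
    """Number of raw points (including itself) within +/- ppm of a point."""
    mz0 = points[index][0]
    window = mz0 * ppm * 1e-6
    return sum(1 for mz, _ in points if abs(mz - mz0) <= window)


threshold = mad_threshold([inten for _, inten in points])
kept_by_intensity = [p for p in points if p[1] >= threshold]
kept_by_support = [p for i, p in enumerate(points) if support(points, i) >= 5]
```

In this data set the weak peak has the same intensity range as the noise, so the intensity cut-off removes it along with the noise, while the support criterion keeps it because five raw points agree on its m/z.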

tdfpy.merge_peaks

merge_peaks(
    mz_array: np.ndarray,
    intensity_array: np.ndarray,
    ion_mobility_array: np.ndarray,
    mz_tolerance: float = 8.0,
    mz_tolerance_type: Literal["ppm", "da"] = "ppm",
    im_tolerance: float = 0.1,
    im_tolerance_type: Literal[
        "relative", "absolute"
    ] = "relative",
    min_peaks: int = 3,
    max_peaks: int | None = None,
    peak_noise_filter: bool = False,
    peak_noise_window: float = 0.1,
    peak_noise_end_fraction: float = 0.1,
    use_numba: bool = True,
) -> np.ndarray

Centroid profile-like peaks using m/z and ion mobility tolerances.

This function implements a greedy clustering algorithm that centroids raw peaks (similar to profile mode data) within specified m/z and ion mobility windows. Peaks are processed in descending order of intensity, and nearby peaks are combined using intensity-weighted averaging to produce centroided peaks.

Parameters:

  • mz_array (np.ndarray, required): Array of m/z values from raw/profile-like data.
  • intensity_array (np.ndarray, required): Array of intensity values.
  • ion_mobility_array (np.ndarray, required): Array of ion mobility values (1/K0 or CCS).
  • mz_tolerance (float, default 8.0): Tolerance for m/z matching during centroiding.
  • mz_tolerance_type (Literal['ppm', 'da'], default 'ppm'): Type of m/z tolerance, "ppm" or "da" (daltons).
  • im_tolerance (float, default 0.1): Tolerance for ion mobility matching during centroiding.
  • im_tolerance_type (Literal['relative', 'absolute'], default 'relative'): Type of ion mobility tolerance, "relative" or "absolute".
  • min_peaks (int, default 3): Minimum number of nearby raw peaks required to form a centroid. Set to 0 or 1 to keep all peaks (no filtering).
  • max_peaks (int | None, default None): Maximum number of centroided peaks to return (keeps highest intensity).
  • peak_noise_filter (bool, default False): If True, after each centroid is formed suppress raw points within ±peak_noise_window Da of the anchor m/z and inside the centroid's IM window whose intensity falls below a linear threshold that decays from the anchor point's raw intensity at zero distance to anchor * peak_noise_end_fraction at the window edge. This kills TOF satellite/ringing noise around bright peaks without eliminating nearby real peaks that exceed the ramp. Comparison is point-to-point against the raw anchor intensity (not the summed centroid).
  • peak_noise_window (float, default 0.1): Half-width in Da on each side of the anchor m/z over which the peak-noise ramp is applied.
  • peak_noise_end_fraction (float, default 0.1): Fraction of the anchor's raw intensity used as the suppression threshold at peak_noise_window distance (0.1 means 10%).

Returns:

  • np.ndarray: Array of shape (N, 3) containing centroided peaks. Columns are [mz, intensity, ion_mobility].

Example
mz = np.array([100.0, 100.001, 200.0])
intensity = np.array([1000.0, 500.0, 2000.0])
im = np.array([0.8, 0.8, 0.9])
peaks = merge_peaks(mz, intensity, im, mz_tolerance=10, mz_tolerance_type="ppm")
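
The linear ramp behind peak_noise_filter can be written out as a small formula. The sketch below follows the parameter description (threshold equal to the anchor intensity at zero distance, falling to anchor × peak_noise_end_fraction at the window edge). It ignores the ion mobility window for brevity, and the function names are hypothetical rather than part of tdfpy.

```python
def ramp_threshold(distance_da, anchor_intensity, window=0.1, end_fraction=0.1):
    """Suppression threshold at a given m/z distance from the anchor.

    Returns None outside the window, where the ramp does not apply.
    """
    distance = abs(distance_da)
    if distance > window:
        return None
    # Linear decay from 1.0 (at the anchor) to end_fraction (at the edge).
    return anchor_intensity * (1.0 - (1.0 - end_fraction) * distance / window)


def is_suppressed(point_mz, point_intensity, anchor_mz, anchor_intensity,
                  window=0.1, end_fraction=0.1):
    """True if a raw point falls under the ramp around the anchor."""
    threshold = ramp_threshold(point_mz - anchor_mz, anchor_intensity,
                               window, end_fraction)
    return threshold is not None and point_intensity < threshold


# Halfway to the window edge the threshold is about 55% of the anchor.
halfway = ramp_threshold(0.05, 10000.0)
weak_satellite = is_suppressed(500.05, 3000.0, 500.0, 10000.0)  # under the ramp
strong_neighbour = is_suppressed(500.05, 6000.0, 500.0, 10000.0)  # above the ramp
```

A point 0.05 Da from a 10000-count anchor is therefore suppressed at 3000 counts but survives at 6000 counts, which is how the filter removes ringing without discarding a genuine neighbouring peak.
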
Source code in src/tdfpy/centroiding.py
def merge_peaks(
    mz_array: np.ndarray,
    intensity_array: np.ndarray,
    ion_mobility_array: np.ndarray,
    mz_tolerance: float = 8.0,
    mz_tolerance_type: Literal["ppm", "da"] = "ppm",
    im_tolerance: float = 0.1,
    im_tolerance_type: Literal["relative", "absolute"] = "relative",
    min_peaks: int = 3,
    max_peaks: int | None = None,
    peak_noise_filter: bool = False,
    peak_noise_window: float = 0.1,
    peak_noise_end_fraction: float = 0.1,
    use_numba: bool = True,
) -> np.ndarray:
    """Centroid profile-like peaks using m/z and ion mobility tolerances.

    This function implements a greedy clustering algorithm that centroids raw peaks
    (similar to profile mode data) within specified m/z and ion mobility windows.
    Peaks are processed in descending order of intensity, and nearby peaks are
    combined using intensity-weighted averaging to produce centroided peaks.

    Args:
        mz_array: Array of m/z values from raw/profile-like data
        intensity_array: Array of intensity values
        ion_mobility_array: Array of ion mobility values (1/K0 or CCS)
        mz_tolerance: Tolerance for m/z matching during centroiding
        mz_tolerance_type: Type of m/z tolerance - "ppm" or "da" (daltons)
        im_tolerance: Tolerance for ion mobility matching during centroiding
        im_tolerance_type: Type of ion mobility tolerance - "relative" or "absolute"
        min_peaks: Minimum number of nearby raw peaks required to form a centroid.
                  Set to 0 or 1 to keep all peaks (no filtering).
        max_peaks: Maximum number of centroided peaks to return (keeps highest intensity)
        peak_noise_filter: If True, after each centroid is formed suppress raw
            points within ±``peak_noise_window`` Da of the anchor m/z and inside
            the centroid's IM window whose intensity falls below a linear
            threshold that decays from the **anchor point's raw intensity** at
            zero distance to ``anchor * peak_noise_end_fraction`` at the window
            edge. This kills TOF satellite/ringing noise around bright peaks
            without eliminating nearby real peaks that exceed the ramp.
            Comparison is point-to-point against the raw anchor intensity (not
            the summed centroid). Defaults to ``False``.
        peak_noise_window: Half-width in Da on each side of the anchor m/z over
            which the peak-noise ramp is applied. Defaults to ``0.1`` Da.
        peak_noise_end_fraction: Fraction of the anchor's raw intensity used as
            the suppression threshold at ``peak_noise_window`` distance.
            Defaults to ``0.1`` (10%).

    Returns:
        np.ndarray: Array of shape (N, 3) containing centroided peaks.
                   Columns are: [mz, intensity, ion_mobility]

    Example:
        ```python
        mz = np.array([100.0, 100.001, 200.0])
        intensity = np.array([1000.0, 500.0, 2000.0])
        im = np.array([0.8, 0.8, 0.9])
        peaks = merge_peaks(mz, intensity, im, mz_tolerance=10, mz_tolerance_type="ppm")
        ```
    """
    # Use Numba implementation if available
    if _HAS_NUMBA and use_numba:
        return _merge_peaks_numba(
            mz_array, intensity_array, ion_mobility_array,
            mz_tolerance=mz_tolerance,
            mz_tolerance_type=mz_tolerance_type,
            im_tolerance=im_tolerance,
            im_tolerance_type=im_tolerance_type,
            min_peaks=min_peaks,
            max_peaks=max_peaks,
            peak_noise_filter=peak_noise_filter,
            peak_noise_window=peak_noise_window,
            peak_noise_end_fraction=peak_noise_end_fraction,
        )

    # Fallback to Python implementation
    return _merge_peaks_python(
        mz_array,
        intensity_array,
        ion_mobility_array,
        mz_tolerance,
        mz_tolerance_type,
        im_tolerance,
        im_tolerance_type,
        min_peaks,
        max_peaks,
        peak_noise_filter,
        peak_noise_window,
        peak_noise_end_fraction,
    )