Centroiding

timsTOF raw data is profile-like: the binary file stores one intensity value per scan per m/z index, spread across hundreds of mobility bins. Centroiding collapses that cloud of raw measurements into a compact list of peaks — each with a single m/z, intensity, and ion mobility value.

tdfpy provides two centroiding functions:

  • get_centroided_spectrum (high-level): reads a full frame from disk, centroids it, optionally filters noise on the resulting peaks, and returns them in one call.
  • merge_peaks (low-level): centroids pre-assembled NumPy arrays of m/z, intensity, and ion mobility values. Use this when you already have the raw arrays or need fine-grained control.

In practice, most workflows should call .centroid() directly on a Frame, DiaWindow, or PrmTransition object — that method delegates to get_centroided_spectrum internally.

Numba JIT backend

When Numba is installed (it is included in the default tdfpy dependencies), the core clustering loop runs as a JIT-compiled native function (_merge_peaks_numba_kernel). This is typically 5–20× faster than the pure-Python fallback for large frames. The backend is selected automatically:

# Numba used if available (default)
peaks = merge_peaks(mz, intensity, im)

# Force the Python fallback (useful for debugging or environments without Numba)
peaks = merge_peaks(mz, intensity, im, use_numba=False)

The first call after import triggers Numba's JIT compilation — expect a few seconds of overhead. Subsequent calls use the cached compiled kernel.
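
The automatic selection described above can be pictured with a small sketch. This is not tdfpy's code: numba_available and choose_backend are hypothetical names, and checking availability through importlib.util.find_spec is an assumption about how such a flag could be set. Only the rule itself (Numba is used when it is installed and use_numba is true) comes from the text.

```python
from importlib.util import find_spec


def numba_available() -> bool:
    """Return True if a top-level `numba` package can be located (hypothetical helper)."""
    return find_spec("numba") is not None


def choose_backend(use_numba: bool = True) -> str:
    """Mirror the documented rule: Numba only when installed AND requested."""
    if use_numba and numba_available():
        return "numba"
    return "python"


print(choose_backend())                 # "numba" or "python", depending on the environment
print(choose_backend(use_numba=False))  # python
```

Keeping the decision in one place means callers only pass a boolean and never need to import Numba themselves.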


get_centroided_spectrum

Reads frame frame_id from the open TimsData connection, converts m/z indices to m/z values, assembles the raw peak arrays, runs centroiding, and optionally filters noise on the centroided intensities.

from tdfpy import timsdata_connect, get_centroided_spectrum

with timsdata_connect("experiment.d") as td:
    # Default: 1/K0 ion mobility, 8 ppm m/z tolerance
    peaks = get_centroided_spectrum(td, frame_id=1)
    print(peaks.shape)   # (N, 3) — columns: [m/z, intensity, 1/K0]

    # Tighter tolerances, CCS instead of 1/K0
    peaks = get_centroided_spectrum(
        td,
        frame_id=1,
        ion_mobility_type="ccs",
        mz_tolerance=5.0,
        im_tolerance=0.03,
    )

    # Noise filtering (the threshold is applied to the centroided intensities)
    peaks = get_centroided_spectrum(
        td,
        frame_id=1,
        noise_filter="mad",   # median absolute deviation
    )

    # Hard intensity threshold
    peaks = get_centroided_spectrum(td, frame_id=1, noise_filter=500.0)

Available noise_filter options: "mad", "percentile", "histogram", "baseline", "iterative_median", or any float/int as a direct threshold. Pass None to skip filtering (the default).
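
To give a feel for what a "mad" style threshold does, here is a minimal sketch of one textbook formulation: median plus k times the scaled median absolute deviation. The formula, the factor k=3.0, and the name mad_threshold are assumptions for illustration; tdfpy's own estimator may differ. The toy intensities are invented.

```python
from statistics import median


def mad_threshold(intensities, k=3.0):
    """Noise threshold as median + k * scaled MAD (an assumed, textbook formulation)."""
    center = median(intensities)
    mad = median(abs(x - center) for x in intensities)
    # 1.4826 rescales the MAD to a standard-deviation estimate for Gaussian noise.
    return center + k * 1.4826 * mad


# Invented centroid intensities: mostly low values plus one strong peak.
intensities = [10.0, 12.0, 11.0, 13.0, 9.0, 500.0, 12.0, 10.0]
threshold = mad_threshold(intensities)
kept = [x for x in intensities if x >= threshold]
print(round(threshold, 2), kept)  # 18.17 [500.0]
```

Because the threshold is derived from the median of the data itself, it moves with the overall intensity distribution rather than with any physical noise floor.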

tdfpy.get_centroided_spectrum

get_centroided_spectrum(
    td: TimsData,
    frame_id: int,
    spectrum_index: int | None = None,
    ion_mobility_type: Literal[
        "ook0", "ccs", "voltage"
    ] = "ook0",
    mz_tolerance: float = 8.0,
    mz_tolerance_type: Literal["ppm", "da"] = "ppm",
    im_tolerance: float = 0.1,
    im_tolerance_type: Literal[
        "relative", "absolute"
    ] = "relative",
    min_peaks: int = 3,
    max_peaks: int | None = None,
    noise_filter: None | (
        Literal[
            "mad",
            "percentile",
            "histogram",
            "baseline",
            "iterative_median",
        ]
        | float
        | int
    ) = None,
    use_numba: bool = True,
) -> np.ndarray

Extract a centroided MS1 spectrum for a single frame.

This function reads raw profile-like scans from the frame, converts indices to m/z values, collects all raw peaks with their ion mobility values, and applies peak centroiding based on m/z and ion mobility tolerances to produce a centroided spectrum.

Parameters:

Name Type Description Default
td TimsData

TimsData instance connected to the analysis directory

required
frame_id int

Frame ID to extract

required
spectrum_index int | None

Optional index for this spectrum (defaults to frame_id)

None
ion_mobility_type Literal['ook0', 'ccs', 'voltage']

Type of ion mobility to calculate and include for each peak:
  • "ook0": 1/K0 (reciprocal reduced mobility) [default]
  • "ccs": Collision Cross Section in Ų (requires charge state estimation)

'ook0'
mz_tolerance float

Tolerance for m/z matching during centroiding

8.0
mz_tolerance_type Literal['ppm', 'da']

Type of m/z tolerance - "ppm" or "da" (daltons)

'ppm'
im_tolerance float

Tolerance for ion mobility matching during centroiding

0.1
im_tolerance_type Literal['relative', 'absolute']

Type of ion mobility tolerance - "relative" or "absolute"

'relative'
min_peaks int

Minimum number of nearby raw peaks required to form a centroid (0 or 1 keeps all)

3
max_peaks int | None

Maximum number of centroided peaks to return

None
noise_filter None | (Literal['mad', 'percentile', 'histogram', 'baseline', 'iterative_median'] | float | int)

Noise filtering method. In the implementation shown below, the threshold is applied to the centroided intensities, after merging. Options:
  • None: No noise filtering (default)
  • "mad": Median Absolute Deviation method
  • "percentile": 75th percentile threshold
  • "histogram": Histogram mode-based estimation
  • "baseline": Bottom quartile statistics
  • "iterative_median": Iterative median filtering
  • float/int: Direct intensity threshold value

None

Returns:

Type Description
np.ndarray

np.ndarray: Array of shape (N, 3) containing centroided peaks. Columns are: [mz, intensity, ion_mobility]

Raises:

Type Description
ValueError

If the frame_id doesn't exist or is not an MS1 frame

RuntimeError

If the TimsData connection is not open

Example
from tdfpy import timsdata_connect, get_centroided_spectrum

with timsdata_connect('path/to/data.d') as td:
    # Get centroided spectrum with 1/K0 (default)
    peaks = get_centroided_spectrum(td, frame_id=1)
    print(f"Found {len(peaks)} centroided peaks")

    # Get spectrum with CCS values
    spectrum = get_centroided_spectrum(td, frame_id=1, ion_mobility_type="ccs")

    # Custom centroiding tolerances
    spectrum = get_centroided_spectrum(
        td, frame_id=1, mz_tolerance=10, im_tolerance=0.1
    )

    # With noise filtering
    spectrum = get_centroided_spectrum(td, frame_id=1, noise_filter="mad")

    # With custom noise threshold
    spectrum = get_centroided_spectrum(td, frame_id=1, noise_filter=1000.0)
Source code in src/tdfpy/centroiding.py
def get_centroided_spectrum(
    td: TimsData,
    frame_id: int,
    spectrum_index: int | None = None,
    ion_mobility_type: Literal["ook0", "ccs", "voltage"] = "ook0",
    mz_tolerance: float = 8.0,
    mz_tolerance_type: Literal["ppm", "da"] = "ppm",
    im_tolerance: float = 0.1,
    im_tolerance_type: Literal["relative", "absolute"] = "relative",
    min_peaks: int = 3,
    max_peaks: int | None = None,
    noise_filter: None
    | (
        Literal["mad", "percentile", "histogram", "baseline", "iterative_median"]
        | float
        | int
    ) = None,
    use_numba: bool = True,
) -> np.ndarray:
    """Extract a centroided MS1 spectrum for a single frame.

    This function reads raw profile-like scans from the frame, converts indices to m/z values,
    collects all raw peaks with their ion mobility values, and applies peak centroiding
    based on m/z and ion mobility tolerances to produce a centroided spectrum.

    Args:
        td: TimsData instance connected to the analysis directory
        frame_id: Frame ID to extract
        spectrum_index: Optional index for this spectrum (defaults to frame_id)
        ion_mobility_type: Type of ion mobility to calculate and include for each peak
                          - "ook0": 1/K0 (reciprocal reduced mobility) [default]
                          - "ccs": Collision Cross Section in Ų (requires charge state estimation)
        mz_tolerance: Tolerance for m/z matching during centroiding
        mz_tolerance_type: Type of m/z tolerance - "ppm" or "da" (daltons)
        im_tolerance: Tolerance for ion mobility matching during centroiding
        im_tolerance_type: Type of ion mobility tolerance - "relative" or "absolute"
        min_peaks: Minimum number of nearby raw peaks required to form a centroid (0 or 1 keeps all)
        max_peaks: Maximum number of centroided peaks to return
        noise_filter: Noise filtering method, applied to centroided intensities after merging. Options:
                     - None: No noise filtering (default)
                     - "mad": Median Absolute Deviation method
                     - "percentile": 75th percentile threshold
                     - "histogram": Histogram mode-based estimation
                     - "baseline": Bottom quartile statistics
                     - "iterative_median": Iterative median filtering
                     - float/int: Direct intensity threshold value

    Returns:
        np.ndarray: Array of shape (N, 3) containing centroided peaks.
                   Columns are: [mz, intensity, ion_mobility]

    Raises:
        ValueError: If the frame_id doesn't exist or is not an MS1 frame
        RuntimeError: If the TimsData connection is not open

    Example:
        ```python
        with timsdata_connect('path/to/data.d') as td:
            # Get centroided spectrum with 1/K0 (default)
            peaks = get_centroided_spectrum(td, frame_id=1)
            print(f"Found {len(peaks)} centroided peaks")

            # Get spectrum with CCS values
            spectrum = get_centroided_spectrum(td, frame_id=1, ion_mobility_type="ccs")

            # Custom centroiding tolerances
            spectrum = get_centroided_spectrum(
                td, frame_id=1, mz_tolerance=10, im_tolerance=0.1
            )

            # With noise filtering
            spectrum = get_centroided_spectrum(td, frame_id=1, noise_filter="mad")

            # With custom noise threshold
            spectrum = get_centroided_spectrum(td, frame_id=1, noise_filter=1000.0)
        ```
    """
    logger.debug(
        "Extracting MS1 spectrum for frame_id=%d, noise_filter=%s",
        frame_id,
        noise_filter,
    )

    raw = get_raw_peaks(td, frame_id, ion_mobility_type=ion_mobility_type)
    if len(raw) == 0:
        logger.warning("Frame %d has 0 peaks, returning empty spectrum", frame_id)
        return raw

    mz_array = raw[:, 0]
    intensity_array = raw[:, 1]
    ion_mobility_array = raw[:, 2]
    total_peaks = len(raw)

    cursor = td.conn.cursor()  # type: ignore[union-attr]
    cursor.execute("SELECT Time FROM Frames WHERE Id = ?", (frame_id,))
    retention_time_min = cursor.fetchone()[0] / 60.0

    # Apply peak centroiding
    logger.debug("Starting peak centroiding algorithm")
    peaks = merge_peaks(
        mz_array=mz_array,
        intensity_array=intensity_array,
        ion_mobility_array=ion_mobility_array,
        mz_tolerance=mz_tolerance,
        mz_tolerance_type=mz_tolerance_type,
        im_tolerance=im_tolerance,
        im_tolerance_type=im_tolerance_type,
        min_peaks=min_peaks,
        max_peaks=max_peaks,
        use_numba=use_numba,
    )

    # Apply noise filter on centroided intensities — after merging so the
    # summed cluster intensity is compared against the threshold, not individual
    # raw hit intensities.
    if noise_filter is not None and len(peaks) > 0:
        logger.debug("Applying noise filter to centroided peaks: %s", noise_filter)
        threshold = estimate_noise_level(peaks[:, 1], method=noise_filter)
        before = len(peaks)
        peaks = peaks[peaks[:, 1] >= threshold]
        logger.info(
            "Noise filter removed %d centroided peaks below threshold %.2f (%d -> %d)",
            before - len(peaks), threshold, before, len(peaks),
        )

    # Apply max_peaks limit if specified (post-centroiding)
    if max_peaks and len(peaks) > max_peaks:
        logger.debug("Applying max_peaks filter: %d -> %d", len(peaks), max_peaks)
        sort_indices = np.argsort(peaks[:, 1])[::-1][:max_peaks]
        peaks = peaks[sort_indices]

    logger.info(
        "Extracted centroided MS1 spectrum: frame_id=%d, RT=%.2f min, centroided_peaks=%d, raw_peaks=%d, ion_mobility_type=%s",
        frame_id,
        retention_time_min,
        len(peaks),
        total_peaks,
        ion_mobility_type,
    )

    return peaks
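
The two post-centroiding steps in the listing above (threshold on centroid intensity, then keep the max_peaks most intense) can be reproduced on plain tuples. This is a dependency-free sketch, not the library's code path: postprocess is a hypothetical helper, the threshold is passed in directly rather than estimated, and the centroid values are invented.

```python
def postprocess(peaks, threshold=None, max_peaks=None):
    """Filter (mz, intensity, ion_mobility) tuples, then keep the most intense ones."""
    if threshold is not None:
        peaks = [p for p in peaks if p[1] >= threshold]
    if max_peaks and len(peaks) > max_peaks:
        # Sort by intensity, highest first, and truncate.
        peaks = sorted(peaks, key=lambda p: p[1], reverse=True)[:max_peaks]
    return peaks


# Invented centroids: (m/z, intensity, 1/K0)
centroids = [
    (400.20, 150.0, 0.80),
    (500.25, 9000.0, 0.85),
    (610.30, 40.0, 0.90),
    (700.35, 3000.0, 0.92),
]
print(postprocess(centroids, threshold=100.0, max_peaks=2))
# [(500.25, 9000.0, 0.85), (700.35, 3000.0, 0.92)]
```

As in the listing, truncation returns peaks ordered by descending intensity rather than by m/z, so downstream code that expects m/z order would need to sort again.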

merge_peaks

Centroids pre-assembled arrays. The algorithm is a greedy intensity-ordered scan: starting from the highest-intensity raw peak, every neighbouring peak within the m/z and ion mobility tolerances is merged into a single centroid via intensity-weighted averaging. Merged peaks are marked as used and skipped in subsequent iterations.
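
The greedy scan described above can be sketched in a few lines of plain Python. This is an illustration of the technique, not tdfpy's kernel. Several details are assumptions: the window is taken as symmetric around the seed peak and scaled by the seed's own m/z and mobility, the centroid intensity is the sum of its members, and peaks in an under-supported cluster are consumed rather than returned to the pool. The function name greedy_centroid and its parameter names are hypothetical.

```python
def greedy_centroid(mz, intensity, im, mz_tol_ppm=8.0, im_tol_rel=0.1, min_peaks=3):
    """Greedy intensity-ordered centroiding on parallel lists (illustrative only)."""
    order = sorted(range(len(mz)), key=lambda i: intensity[i], reverse=True)
    used = [False] * len(mz)
    centroids = []
    for seed in order:
        if used[seed]:
            continue
        # Assumed window: symmetric around the seed, scaled by the seed's values.
        mz_win = mz[seed] * mz_tol_ppm * 1e-6
        im_win = im[seed] * im_tol_rel
        members = [
            j for j in range(len(mz))
            if not used[j]
            and abs(mz[j] - mz[seed]) <= mz_win
            and abs(im[j] - im[seed]) <= im_win
        ]
        for j in members:
            used[j] = True
        if len(members) < min_peaks:
            continue  # too little support: drop the cluster
        total = sum(intensity[j] for j in members)
        centroids.append((
            sum(mz[j] * intensity[j] for j in members) / total,  # weighted m/z
            total,                                               # summed intensity
            sum(im[j] * intensity[j] for j in members) / total,  # weighted mobility
        ))
    return centroids


mz = [500.001, 500.002, 700.005, 700.006, 700.007, 900.000]
intensity = [8000.0, 4000.0, 6000.0, 5000.0, 3000.0, 7000.0]
im = [0.85, 0.85, 0.92, 0.92, 0.92, 1.10]

for m, i, k in greedy_centroid(mz, intensity, im, mz_tol_ppm=10.0, min_peaks=2):
    print(f"{m:.4f} {i:.0f} {k:.2f}")
# 500.0013 12000 0.85
# 700.0058 14000 0.92
```

The isolated hit at m/z 900 is intense but has no neighbours, so it fails the min_peaks=2 requirement and is dropped. The quadratic neighbour search keeps the sketch short; it is not meant to reflect the performance of a real implementation.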

Parameter | Default | Notes
--- | --- | ---
mz_tolerance | 8.0 | Width of the m/z matching window
mz_tolerance_type | "ppm" | "ppm" or "da"
im_tolerance | 0.1 | Width of the ion mobility window
im_tolerance_type | "relative" | "relative" (fraction of 1/K0) or "absolute"
min_peaks | 3 | Raw peaks required to form a centroid; set to 0 or 1 to keep all
max_peaks | None | Cap on output peaks (highest-intensity first)
use_numba | True | Set to False to force the Python fallback

import numpy as np
from tdfpy import merge_peaks

mz  = np.array([500.001, 500.002, 700.005, 700.006, 700.007])
inten = np.array([8000.0,  4000.0,  6000.0,  5000.0,  3000.0])
im  = np.array([0.85,     0.85,    0.92,    0.92,    0.92])

peaks = merge_peaks(mz, inten, im, mz_tolerance=10.0, min_peaks=2)
print(peaks)
# shape (2, 3): two centroided peaks, columns [m/z, intensity, 1/K0]
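
For intuition about the tolerance parameters, the following sketch converts them into absolute units at a given peak. The helper names mz_window_da and im_window are hypothetical, and the text does not state whether tdfpy treats the tolerance as a half-width or a full width, so the numbers below are just the tolerance expressed in Da or in mobility units.

```python
def mz_window_da(mz, tolerance, tolerance_type="ppm"):
    """Absolute m/z tolerance in Da for a peak at `mz`."""
    if tolerance_type == "ppm":
        return mz * tolerance * 1e-6
    if tolerance_type == "da":
        return tolerance
    raise ValueError(f"unknown m/z tolerance type: {tolerance_type!r}")


def im_window(im, tolerance, tolerance_type="relative"):
    """Absolute ion mobility tolerance for a peak at mobility `im`."""
    if tolerance_type == "relative":
        return im * tolerance
    if tolerance_type == "absolute":
        return tolerance
    raise ValueError(f"unknown mobility tolerance type: {tolerance_type!r}")


print(f"{mz_window_da(500.0, 8.0):.4f} Da")   # 0.0040 Da
print(f"{mz_window_da(1000.0, 8.0):.4f} Da")  # 0.0080 Da
print(f"{im_window(0.85, 0.1):.3f}")          # 0.085
```

A ppm tolerance widens with m/z, which is why the same setting of 8 ppm covers twice as many Da at m/z 1000 as at m/z 500.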

Noise filtering vs min_peaks

The noise_filter parameter (available on both get_centroided_spectrum and .centroid()) estimates a threshold from the intensity distribution and discards centroids below it. While convenient, intensity-based noise estimation has a fundamental limitation: it cannot distinguish low-abundance real signal from electronic noise, and will discard both equally. Methods such as "mad" are anchored to the median of all centroid intensities — if your sample has sparse signal, the threshold can rise above legitimate low-abundance peaks.

A more reliable strategy is to increase min_peaks instead:

# Prefer: raise min_peaks to filter noise without discarding low-abundance signal
peaks = merge_peaks(mz, intensity, im, min_peaks=5)

# Noise arises from single scans; real peaks appear across multiple scans.
# min_peaks=5 means a centroid must be supported by at least 5 raw measurements.

Because electronic noise typically manifests as a singleton in a single scan, requiring several supporting raw peaks is a structural filter — it targets the origin of noise rather than its intensity. Low-abundance real peaks that appear consistently across scans will survive, whereas noise will not.
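
The contrast between the two filters can be made concrete with a toy comparison. The cluster contents, the threshold of 280, and the helper names below are invented for illustration and do not come from tdfpy.

```python
# Each cluster is the list of raw intensities that fell into one m/z / mobility window.
clusters = {
    "low-abundance peptide": [40.0, 55.0, 60.0, 50.0, 45.0],  # 5 scans, weak signal
    "noise spike": [300.0],                                    # 1 scan, strong hit
}


def keep_by_intensity(cluster, threshold):
    """Intensity filter: keep if the summed cluster intensity reaches the threshold."""
    return sum(cluster) >= threshold


def keep_by_support(cluster, min_peaks):
    """Structural filter: keep if enough raw measurements support the cluster."""
    return len(cluster) >= min_peaks


for name, raw in clusters.items():
    print(name, keep_by_intensity(raw, 280.0), keep_by_support(raw, 5))
# low-abundance peptide False True
# noise spike True False
```

With these invented numbers the intensity threshold makes exactly the wrong call on both clusters, while the support count separates them correctly.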

Use noise_filter only if you have a calibrated threshold or a specific method validated for your acquisition settings, and always verify against a noise_filter=None baseline first.

tdfpy.merge_peaks

merge_peaks(
    mz_array: np.ndarray,
    intensity_array: np.ndarray,
    ion_mobility_array: np.ndarray,
    mz_tolerance: float = 8.0,
    mz_tolerance_type: Literal["ppm", "da"] = "ppm",
    im_tolerance: float = 0.1,
    im_tolerance_type: Literal[
        "relative", "absolute"
    ] = "relative",
    min_peaks: int = 3,
    max_peaks: int | None = None,
    use_numba: bool = True,
) -> np.ndarray

Centroid profile-like peaks using m/z and ion mobility tolerances.

This function implements a greedy clustering algorithm that centroids raw peaks (similar to profile mode data) within specified m/z and ion mobility windows. Peaks are processed in descending order of intensity, and nearby peaks are combined using intensity-weighted averaging to produce centroided peaks.

Parameters:

Name Type Description Default
mz_array np.ndarray

Array of m/z values from raw/profile-like data

required
intensity_array np.ndarray

Array of intensity values

required
ion_mobility_array np.ndarray

Array of ion mobility values (1/K0 or CCS)

required
mz_tolerance float

Tolerance for m/z matching during centroiding

8.0
mz_tolerance_type Literal['ppm', 'da']

Type of m/z tolerance - "ppm" or "da" (daltons)

'ppm'
im_tolerance float

Tolerance for ion mobility matching during centroiding

0.1
im_tolerance_type Literal['relative', 'absolute']

Type of ion mobility tolerance - "relative" or "absolute"

'relative'
min_peaks int

Minimum number of nearby raw peaks required to form a centroid. Set to 0 or 1 to keep all peaks (no filtering).

3
max_peaks int | None

Maximum number of centroided peaks to return (keeps highest intensity)

None

Returns:

Type Description
np.ndarray

np.ndarray: Array of shape (N, 3) containing centroided peaks. Columns are: [mz, intensity, ion_mobility]

Example
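import numpy as np
from tdfpy import merge_peaks

mz = np.array([100.0, 100.001, 200.0])
intensity = np.array([1000.0, 500.0, 2000.0])
im = np.array([0.8, 0.8, 0.9])
# With the default min_peaks=3 these one- and two-peak clusters would not meet the support requirement
peaks = merge_peaks(mz, intensity, im, mz_tolerance=10, mz_tolerance_type="ppm", min_peaks=1)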
Source code in src/tdfpy/centroiding.py
def merge_peaks(
    mz_array: np.ndarray,
    intensity_array: np.ndarray,
    ion_mobility_array: np.ndarray,
    mz_tolerance: float = 8.0,
    mz_tolerance_type: Literal["ppm", "da"] = "ppm",
    im_tolerance: float = 0.1,
    im_tolerance_type: Literal["relative", "absolute"] = "relative",
    min_peaks: int = 3,
    max_peaks: int | None = None,
    use_numba: bool = True,
) -> np.ndarray:
    """Centroid profile-like peaks using m/z and ion mobility tolerances.

    This function implements a greedy clustering algorithm that centroids raw peaks
    (similar to profile mode data) within specified m/z and ion mobility windows.
    Peaks are processed in descending order of intensity, and nearby peaks are
    combined using intensity-weighted averaging to produce centroided peaks.

    Args:
        mz_array: Array of m/z values from raw/profile-like data
        intensity_array: Array of intensity values
        ion_mobility_array: Array of ion mobility values (1/K0 or CCS)
        mz_tolerance: Tolerance for m/z matching during centroiding
        mz_tolerance_type: Type of m/z tolerance - "ppm" or "da" (daltons)
        im_tolerance: Tolerance for ion mobility matching during centroiding
        im_tolerance_type: Type of ion mobility tolerance - "relative" or "absolute"
        min_peaks: Minimum number of nearby raw peaks required to form a centroid.
                  Set to 0 or 1 to keep all peaks (no filtering).
        max_peaks: Maximum number of centroided peaks to return (keeps highest intensity)

    Returns:
        np.ndarray: Array of shape (N, 3) containing centroided peaks.
                   Columns are: [mz, intensity, ion_mobility]

    Example:
        ```python
        mz = np.array([100.0, 100.001, 200.0])
        intensity = np.array([1000.0, 500.0, 2000.0])
        im = np.array([0.8, 0.8, 0.9])
        peaks = merge_peaks(mz, intensity, im, mz_tolerance=10, mz_tolerance_type="ppm")
        ```
    """
    # Use Numba implementation if available
    if _HAS_NUMBA and use_numba:
        return _merge_peaks_numba(
            mz_array, intensity_array, ion_mobility_array,
            mz_tolerance=mz_tolerance,
            mz_tolerance_type=mz_tolerance_type,
            im_tolerance=im_tolerance,
            im_tolerance_type=im_tolerance_type,
            min_peaks=min_peaks,
            max_peaks=max_peaks,
        )

    # Fallback to Python implementation
    return _merge_peaks_python(
        mz_array,
        intensity_array,
        ion_mobility_array,
        mz_tolerance,
        mz_tolerance_type,
        im_tolerance,
        im_tolerance_type,
        min_peaks,
        max_peaks,
    )