Centroiding

timsTOF raw data is profile-like: the binary file stores one intensity value per scan per m/z index, spread across hundreds of mobility bins. Centroiding collapses that cloud of raw measurements into a compact list of peaks — each with a single m/z, intensity, and ion mobility value.

tdfpy provides two centroiding functions:

get_centroided_spectrum (high-level): reads a full frame from disk, centroids it, optionally filters noise on the centroided intensities, and returns the peaks in one call.
merge_peaks (low-level): centroids pre-assembled NumPy arrays of m/z, intensity, and ion mobility values. Use this when you already have the raw arrays or need fine-grained control.

In practice, most workflows should call .centroid() directly on a Frame, DiaWindow, or PrmTransition object — that method delegates to get_centroided_spectrum internally.

Numba JIT backend

When Numba is installed (it is included in the default tdfpy dependencies), the core clustering loop runs as a JIT-compiled native function (_merge_peaks_numba_kernel). This is typically 5–20× faster than the pure-Python fallback for large frames. The backend is selected automatically:

from tdfpy import merge_peaks

# mz, intensity, im: 1-D NumPy arrays of equal length
# Numba used if available (default)
peaks = merge_peaks(mz, intensity, im)

# Force the Python fallback (useful for debugging or environments without Numba)
peaks = merge_peaks(mz, intensity, im, use_numba=False)

The first call after import triggers Numba's JIT compilation — expect a few seconds of overhead. Subsequent calls use the cached compiled kernel.
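The automatic backend selection described above follows a common optional-dependency pattern. The sketch below is an illustration only: `weighted_mean`, `_weighted_mean_py`, and `HAVE_NUMBA` are hypothetical names that are not part of tdfpy, and the real kernel is considerably more involved.

```python
import numpy as np

# Try to import Numba; fall back to plain Python when it is missing.
try:
    from numba import njit
    HAVE_NUMBA = True
except ImportError:
    HAVE_NUMBA = False


def _weighted_mean_py(values, weights):
    """Intensity-weighted mean written as an explicit loop (JIT-friendly)."""
    total = 0.0
    weight_sum = 0.0
    for i in range(values.shape[0]):
        total += values[i] * weights[i]
        weight_sum += weights[i]
    return total / weight_sum


# Compile the same function with Numba when possible.
_weighted_mean_jit = njit(_weighted_mean_py) if HAVE_NUMBA else None


def weighted_mean(values, weights, use_numba=True):
    """Dispatch to the JIT kernel if requested and available."""
    if use_numba and HAVE_NUMBA:
        return _weighted_mean_jit(values, weights)
    return _weighted_mean_py(values, weights)


mz = np.array([500.001, 500.002])
inten = np.array([8000.0, 4000.0])
fast = weighted_mean(mz, inten)                    # JIT path if Numba is present
slow = weighted_mean(mz, inten, use_numba=False)   # always pure Python
```

Both calls should agree to within floating-point rounding; only the first JIT call pays the compilation cost.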

`get_centroided_spectrum`

Reads frame frame_id from the open TimsData connection, converts m/z indices to m/z values, assembles the raw peak arrays, runs centroiding, and optionally filters noise on the centroided intensities.

from tdfpy import timsdata_connect, get_centroided_spectrum

with timsdata_connect("experiment.d") as td:
    # Default: 1/K0 ion mobility, 8 ppm m/z tolerance
    peaks = get_centroided_spectrum(td, frame_id=1)
    print(peaks.shape)   # (N, 3) — columns: [m/z, intensity, 1/K0]

    # Tighter tolerances, CCS instead of 1/K0
    peaks = get_centroided_spectrum(
        td,
        frame_id=1,
        ion_mobility_type="ccs",
        mz_tolerance=5.0,
        im_tolerance=0.03,
    )

    # Noise filtering (applied to the centroided intensities)
    peaks = get_centroided_spectrum(
        td,
        frame_id=1,
        noise_filter="mad",   # median absolute deviation
    )

    # Hard intensity threshold
    peaks = get_centroided_spectrum(td, frame_id=1, noise_filter=500.0)

Available noise_filter options: "mad", "percentile", "histogram", "baseline", "iterative_median", or any float/int as a direct threshold. Pass None to skip filtering (the default).

tdfpy.get_centroided_spectrum

get_centroided_spectrum(
    td: TimsData,
    frame_id: int,
    spectrum_index: int | None = None,
    ion_mobility_type: Literal[
        "ook0", "ccs", "voltage"
    ] = "ook0",
    mz_tolerance: float = 8.0,
    mz_tolerance_type: Literal["ppm", "da"] = "ppm",
    im_tolerance: float = 0.1,
    im_tolerance_type: Literal[
        "relative", "absolute"
    ] = "relative",
    min_peaks: int = 3,
    max_peaks: int | None = None,
    noise_filter: None | (
        Literal[
            "mad",
            "percentile",
            "histogram",
            "baseline",
            "iterative_median",
        ]
        | float
        | int
    ) = None,
    use_numba: bool = True,
) -> np.ndarray

Extract a centroided MS1 spectrum for a single frame.

This function reads raw profile-like scans from the frame, converts indices to m/z values, collects all raw peaks with their ion mobility values, and applies peak centroiding based on m/z and ion mobility tolerances to produce a centroided spectrum.

Parameters:

Name	Type	Description	Default
`td`	`TimsData`	TimsData instance connected to the analysis directory	required
`frame_id`	`int`	Frame ID to extract	required
`spectrum_index`	`int \| None`	Optional index for this spectrum (defaults to frame_id)	`None`
`ion_mobility_type`	`Literal['ook0', 'ccs', 'voltage']`	Type of ion mobility to calculate and include for each peak - "ook0": 1/K0 (reciprocal reduced mobility) [default] - "ccs": Collision Cross Section in Ų (requires charge state estimation)	`'ook0'`
`mz_tolerance`	`float`	Tolerance for m/z matching during centroiding	`8.0`
`mz_tolerance_type`	`Literal['ppm', 'da']`	Type of m/z tolerance - "ppm" or "da" (daltons)	`'ppm'`
`im_tolerance`	`float`	Tolerance for ion mobility matching during centroiding	`0.1`
`im_tolerance_type`	`Literal['relative', 'absolute']`	Type of ion mobility tolerance - "relative" or "absolute"	`'relative'`
`min_peaks`	`int`	Minimum number of nearby raw peaks required to form a centroid (0 or 1 keeps all)	`3`
`max_peaks`	`int \| None`	Maximum number of centroided peaks to return	`None`
`noise_filter`	`None \| (Literal['mad', 'percentile', 'histogram', 'baseline', 'iterative_median'] \| float \| int)`	Noise filtering method applied to the centroided intensities after merging. Options: - None: No noise filtering (default) - "mad": Median Absolute Deviation method - "percentile": 75th percentile threshold - "histogram": Histogram mode-based estimation - "baseline": Bottom quartile statistics - "iterative_median": Iterative median filtering - float/int: Direct intensity threshold value	`None`
`use_numba`	`bool`	Use the Numba JIT kernel when available; set to False to force the pure-Python fallback	`True`

Returns:

Type	Description
`np.ndarray`	Array of shape (N, 3) containing centroided peaks. Columns are: [mz, intensity, ion_mobility]

Raises:

Type	Description
`ValueError`	If the frame_id doesn't exist or is not an MS1 frame
`RuntimeError`	If the TimsData connection is not open

Example

from tdfpy import timsdata_connect, get_centroided_spectrum

with timsdata_connect('path/to/data.d') as td:
    # Get centroided spectrum with 1/K0 (default)
    peaks = get_centroided_spectrum(td, frame_id=1)
    print(f"Found {len(peaks)} centroided peaks")

    # Get spectrum with CCS values
    spectrum = get_centroided_spectrum(td, frame_id=1, ion_mobility_type="ccs")

    # Custom centroiding tolerances
    spectrum = get_centroided_spectrum(
        td, frame_id=1, mz_tolerance=10, im_tolerance=0.1
    )

    # With noise filtering
    spectrum = get_centroided_spectrum(td, frame_id=1, noise_filter="mad")

    # With custom noise threshold
    spectrum = get_centroided_spectrum(td, frame_id=1, noise_filter=1000.0)
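Because the result is a plain (N, 3) NumPy array, ordinary indexing is enough for most post-processing. The snippet below is a small sketch using a made-up stand-in array rather than real instrument data; the variable names are illustrative only.

```python
import numpy as np

# Stand-in for the (N, 3) array returned by get_centroided_spectrum.
# Columns: [m/z, intensity, ion mobility]; the values are invented.
peaks = np.array([
    [700.0058, 14000.0, 0.92],
    [500.0013, 12000.0, 0.85],
    [350.2000,   300.0, 0.70],
])

# Unpack the three columns into separate 1-D arrays.
mz, intensity, ion_mobility = peaks.T

# Keep the two most intense peaks, then order them by m/z.
top = peaks[np.argsort(intensity)[::-1][:2]]
top_by_mz = top[np.argsort(top[:, 0])]

# Restrict to an m/z range with a boolean mask.
in_range = peaks[(mz >= 400.0) & (mz <= 800.0)]
```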

Source code in src/tdfpy/centroiding.py

def get_centroided_spectrum(
    td: TimsData,
    frame_id: int,
    spectrum_index: int | None = None,
    ion_mobility_type: Literal["ook0", "ccs", "voltage"] = "ook0",
    mz_tolerance: float = 8.0,
    mz_tolerance_type: Literal["ppm", "da"] = "ppm",
    im_tolerance: float = 0.1,
    im_tolerance_type: Literal["relative", "absolute"] = "relative",
    min_peaks: int = 3,
    max_peaks: int | None = None,
    noise_filter: None
    | (
        Literal["mad", "percentile", "histogram", "baseline", "iterative_median"]
        | float
        | int
    ) = None,
    use_numba: bool = True,
) -> np.ndarray:
    """Extract a centroided MS1 spectrum for a single frame.

    This function reads raw profile-like scans from the frame, converts indices to m/z values,
    collects all raw peaks with their ion mobility values, and applies peak centroiding
    based on m/z and ion mobility tolerances to produce a centroided spectrum.

    Args:
        td: TimsData instance connected to the analysis directory
        frame_id: Frame ID to extract
        spectrum_index: Optional index for this spectrum (defaults to frame_id)
        ion_mobility_type: Type of ion mobility to calculate and include for each peak
                          - "ook0": 1/K0 (reciprocal reduced mobility) [default]
                          - "ccs": Collision Cross Section in Ų (requires charge state estimation)
        mz_tolerance: Tolerance for m/z matching during centroiding
        mz_tolerance_type: Type of m/z tolerance - "ppm" or "da" (daltons)
        im_tolerance: Tolerance for ion mobility matching during centroiding
        im_tolerance_type: Type of ion mobility tolerance - "relative" or "absolute"
        min_peaks: Minimum number of nearby raw peaks required to form a centroid (0 or 1 keeps all)
        max_peaks: Maximum number of centroided peaks to return
        noise_filter: Noise filtering method applied to the centroided intensities
                     after merging. Options:
                     - None: No noise filtering (default)
                     - "mad": Median Absolute Deviation method
                     - "percentile": 75th percentile threshold
                     - "histogram": Histogram mode-based estimation
                     - "baseline": Bottom quartile statistics
                     - "iterative_median": Iterative median filtering
                     - float/int: Direct intensity threshold value
        use_numba: Use the Numba JIT kernel when available (False forces the Python fallback)

    Returns:
        np.ndarray: Array of shape (N, 3) containing centroided peaks.
                   Columns are: [mz, intensity, ion_mobility]

    Raises:
        ValueError: If the frame_id doesn't exist or is not an MS1 frame
        RuntimeError: If the TimsData connection is not open

    Example:
        ```python
        with timsdata_connect('path/to/data.d') as td:
            # Get centroided spectrum with 1/K0 (default)
            peaks = get_centroided_spectrum(td, frame_id=1)
            print(f"Found {len(peaks)} centroided peaks")

            # Get spectrum with CCS values
            spectrum = get_centroided_spectrum(td, frame_id=1, ion_mobility_type="ccs")

            # Custom centroiding tolerances
            spectrum = get_centroided_spectrum(
                td, frame_id=1, mz_tolerance=10, im_tolerance=0.1
            )

            # With noise filtering
            spectrum = get_centroided_spectrum(td, frame_id=1, noise_filter="mad")

            # With custom noise threshold
            spectrum = get_centroided_spectrum(td, frame_id=1, noise_filter=1000.0)
        ```
    """
    logger.debug(
        "Extracting MS1 spectrum for frame_id=%d, noise_filter=%s",
        frame_id,
        noise_filter,
    )

    raw = get_raw_peaks(td, frame_id, ion_mobility_type=ion_mobility_type)
    if len(raw) == 0:
        logger.warning("Frame %d has 0 peaks, returning empty spectrum", frame_id)
        return raw

    mz_array = raw[:, 0]
    intensity_array = raw[:, 1]
    ion_mobility_array = raw[:, 2]
    total_peaks = len(raw)

    cursor = td.conn.cursor()  # type: ignore[union-attr]
    cursor.execute("SELECT Time FROM Frames WHERE Id = ?", (frame_id,))
    retention_time_min = cursor.fetchone()[0] / 60.0

    # Apply peak centroiding
    logger.debug("Starting peak centroiding algorithm")
    peaks = merge_peaks(
        mz_array=mz_array,
        intensity_array=intensity_array,
        ion_mobility_array=ion_mobility_array,
        mz_tolerance=mz_tolerance,
        mz_tolerance_type=mz_tolerance_type,
        im_tolerance=im_tolerance,
        im_tolerance_type=im_tolerance_type,
        min_peaks=min_peaks,
        max_peaks=max_peaks,
        use_numba=use_numba,
    )

    # Apply noise filter on centroided intensities — after merging so the
    # summed cluster intensity is compared against the threshold, not individual
    # raw hit intensities.
    if noise_filter is not None and len(peaks) > 0:
        logger.debug("Applying noise filter to centroided peaks: %s", noise_filter)
        threshold = estimate_noise_level(peaks[:, 1], method=noise_filter)
        before = len(peaks)
        peaks = peaks[peaks[:, 1] >= threshold]
        logger.info(
            "Noise filter removed %d centroided peaks below threshold %.2f (%d → %d)",
            before - len(peaks), threshold, before, len(peaks),
        )

    # Apply max_peaks limit if specified (post-centroiding)
    if max_peaks and len(peaks) > max_peaks:
        logger.debug("Applying max_peaks filter: %d → %d", len(peaks), max_peaks)
        sort_indices = np.argsort(peaks[:, 1])[::-1][:max_peaks]
        peaks = peaks[sort_indices]

    logger.info(
        "Extracted centroided MS1 spectrum: frame_id=%d, RT=%.2f min, centroided_peaks=%d, raw_peaks=%d, ion_mobility_type=%s",
        frame_id,
        retention_time_min,
        len(peaks),
        total_peaks,
        ion_mobility_type,
    )

    return peaks

`merge_peaks`

Centroids pre-assembled arrays. The algorithm is a greedy intensity-ordered scan: starting from the highest-intensity raw peak, every neighbouring peak within the m/z and ion mobility tolerances is merged into a single centroid via intensity-weighted averaging. Merged peaks are marked as used and skipped in subsequent iterations.

Parameter	Default	Notes
`mz_tolerance`	`8.0`	Width of the m/z matching window
`mz_tolerance_type`	`"ppm"`	`"ppm"` or `"da"`
`im_tolerance`	`0.1`	Width of the ion mobility window
`im_tolerance_type`	`"relative"`	`"relative"` (fraction of 1/K0) or `"absolute"`
`min_peaks`	`3`	Raw peaks required to form a centroid; set to `0` or `1` to keep all
`max_peaks`	`None`	Cap on output peaks (highest-intensity first)
`use_numba`	`True`	Set to `False` to force the Python fallback
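To make the tolerance types in the table concrete, the sketch below converts each setting into a window around a seed peak. It assumes the tolerance is a half-width applied symmetrically around the seed, which the table does not state explicitly; `mz_half_width` and `im_half_width` are hypothetical helpers, not tdfpy functions.

```python
def mz_half_width(mz, tolerance, tolerance_type="ppm"):
    """Half-width of the m/z window in daltons (assumed symmetric)."""
    if tolerance_type == "ppm":
        # Parts per million scale with the m/z value itself.
        return mz * tolerance * 1e-6
    if tolerance_type == "da":
        return tolerance
    raise ValueError(f"unknown m/z tolerance type: {tolerance_type!r}")


def im_half_width(im, tolerance, tolerance_type="relative"):
    """Half-width of the ion mobility window (assumed symmetric)."""
    if tolerance_type == "relative":
        # A relative tolerance is a fraction of the seed's mobility value.
        return im * tolerance
    if tolerance_type == "absolute":
        return tolerance
    raise ValueError(f"unknown mobility tolerance type: {tolerance_type!r}")


mz_win = mz_half_width(500.0, 8.0)            # about 0.004 Da at m/z 500
im_win = im_half_width(0.85, 0.1)             # about 0.085 in 1/K0 units
da_win = mz_half_width(500.0, 0.01, "da")     # fixed 0.01 Da regardless of m/z
```

Under this reading, a ppm window widens with m/z while a dalton window stays constant, which is why ppm is usually the more natural choice across a wide mass range.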

import numpy as np
from tdfpy import merge_peaks

mz  = np.array([500.001, 500.002, 700.005, 700.006, 700.007])
inten = np.array([8000.0,  4000.0,  6000.0,  5000.0,  3000.0])
im  = np.array([0.85,     0.85,    0.92,    0.92,    0.92])

peaks = merge_peaks(mz, inten, im, mz_tolerance=10.0, min_peaks=2)
print(peaks)
# shape (2, 3): two centroided peaks, columns [m/z, intensity, 1/K0]
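The greedy intensity-ordered scan described above can be sketched in plain NumPy. This is a simplified illustration, not the tdfpy implementation: `greedy_centroid` is a hypothetical name, tolerances are assumed to be symmetric half-widths (ppm for m/z, relative for mobility), the centroid intensity is taken as the sum over the cluster, and rejected clusters are assumed to consume their members.

```python
import numpy as np


def greedy_centroid(mz, intensity, im, mz_tol_ppm=8.0, im_tol_rel=0.1, min_peaks=3):
    """Greedy clustering of raw hits, seeded from the most intense hit."""
    order = np.argsort(intensity)[::-1]      # indices, highest intensity first
    used = np.zeros(len(mz), dtype=bool)     # hits already assigned to a cluster
    centroids = []
    for seed in order:
        if used[seed]:
            continue
        mz_half = mz[seed] * mz_tol_ppm * 1e-6
        im_half = im[seed] * im_tol_rel
        # Unused hits inside both windows join the seed's cluster.
        members = (
            ~used
            & (np.abs(mz - mz[seed]) <= mz_half)
            & (np.abs(im - im[seed]) <= im_half)
        )
        used |= members
        if members.sum() < min_peaks:
            continue                          # too little support: drop cluster
        w = intensity[members]
        centroids.append((
            np.average(mz[members], weights=w),   # intensity-weighted m/z
            w.sum(),                              # summed cluster intensity
            np.average(im[members], weights=w),   # intensity-weighted mobility
        ))
    return np.array(centroids, dtype=float).reshape(-1, 3)


mz = np.array([500.001, 500.002, 700.005, 700.006, 700.007])
inten = np.array([8000.0, 4000.0, 6000.0, 5000.0, 3000.0])
im = np.array([0.85, 0.85, 0.92, 0.92, 0.92])

sketch_peaks = greedy_centroid(mz, inten, im, mz_tol_ppm=10.0, min_peaks=2)
```

With the same inputs as the example above, the sketch also yields two clusters, returned in the order their seeds were visited rather than sorted by m/z.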

Noise filtering vs `min_peaks`

The noise_filter parameter (available on both get_centroided_spectrum and .centroid()) estimates a threshold from the intensity distribution and discards centroids below it. While convenient, intensity-based noise estimation has a fundamental limitation: it cannot distinguish low-abundance real signal from electronic noise, and will discard both equally. Methods such as "mad" are anchored to the median of all centroid intensities — if your sample has sparse signal, the threshold can rise above legitimate low-abundance peaks.
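The effect of a median-anchored threshold can be seen on a toy intensity list. The formula below (median plus a multiple of the scaled MAD) is one common formulation and is only an assumption here; the exact estimator tdfpy uses for "mad" is not shown in this section, and `mad_threshold` and the factor `k` are hypothetical.

```python
import numpy as np


def mad_threshold(intensities, k=3.0):
    """Median + k * scaled MAD; a generic estimator, not tdfpy's own."""
    med = np.median(intensities)
    mad = np.median(np.abs(intensities - med))
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    return med + k * 1.4826 * mad


# Invented centroid intensities: most peaks sit near 1000, one is weak, one strong.
centroid_intensity = np.array(
    [1000.0, 1100.0, 950.0, 1050.0, 980.0, 1020.0, 120.0, 5000.0]
)

threshold = mad_threshold(centroid_intensity)
kept = centroid_intensity[centroid_intensity >= threshold]
```

Because the threshold lands above the bulk of the distribution, the weak peak at 120 is removed together with every mid-intensity centroid, regardless of whether any of them are real signal.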

A more reliable strategy is to increase min_peaks instead:

# Prefer: raise min_peaks to filter noise without discarding low-abundance signal
peaks = merge_peaks(mz, intensity, im, min_peaks=5)

# Noise arises from single scans; real peaks appear across multiple scans.
# min_peaks=5 means a centroid must be supported by at least 5 raw measurements.

Because electronic noise typically manifests as a singleton in a single scan, requiring several supporting raw peaks is a structural filter — it targets the origin of noise rather than its intensity. Low-abundance real peaks that appear consistently across scans will survive, whereas noise will not.
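A toy comparison makes the difference between the two filters visible. The helper below is a deliberately simplified stand-in for centroiding (one-dimensional grouping by m/z gaps, ignoring ion mobility); `support_filter` and all values are hypothetical.

```python
import numpy as np


def support_filter(mz, intensity, mz_tol_da, min_peaks):
    """Group hits separated by small m/z gaps; keep groups with enough members."""
    order = np.argsort(mz)
    mz, intensity = mz[order], intensity[order]
    # A new group starts wherever the gap to the previous hit exceeds the tolerance.
    breaks = np.flatnonzero(np.diff(mz) > mz_tol_da) + 1
    groups = np.split(np.arange(len(mz)), breaks)
    rows = []
    for g in groups:
        if len(g) >= min_peaks:
            w = intensity[g]
            rows.append((np.average(mz[g], weights=w), w.sum()))
    return np.array(rows, dtype=float).reshape(-1, 2)


# A weak real peak seen in five scans, plus one intense noise singleton.
mz = np.array([500.0000, 500.0005, 500.0010, 500.0015, 500.0020, 620.3])
intensity = np.array([50.0, 50.0, 50.0, 50.0, 50.0, 900.0])

# Structural filter: require five supporting raw hits.
structural = support_filter(mz, intensity, mz_tol_da=0.005, min_peaks=5)

# Intensity filter: keep every group, then apply a hard threshold of 500.
all_groups = support_filter(mz, intensity, mz_tol_da=0.005, min_peaks=1)
by_intensity = all_groups[all_groups[:, 1] >= 500.0]
```

In this constructed case the support requirement keeps the weak peak near m/z 500 and drops the singleton, while the intensity threshold does the opposite.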

Use noise_filter only if you have a calibrated threshold or a specific method validated for your acquisition settings, and always verify against a noise_filter=None baseline first.

tdfpy.merge_peaks

merge_peaks(
    mz_array: np.ndarray,
    intensity_array: np.ndarray,
    ion_mobility_array: np.ndarray,
    mz_tolerance: float = 8.0,
    mz_tolerance_type: Literal["ppm", "da"] = "ppm",
    im_tolerance: float = 0.1,
    im_tolerance_type: Literal[
        "relative", "absolute"
    ] = "relative",
    min_peaks: int = 3,
    max_peaks: int | None = None,
    use_numba: bool = True,
) -> np.ndarray

Centroid profile-like peaks using m/z and ion mobility tolerances.

This function implements a greedy clustering algorithm that centroids raw peaks (similar to profile mode data) within specified m/z and ion mobility windows. Peaks are processed in descending order of intensity, and nearby peaks are combined using intensity-weighted averaging to produce centroided peaks.