Centroiding

Functions for extracting and centroiding timsTOF mass spectra. Both functions use a Numba-accelerated implementation when available (controlled by `use_numba`), with an automatic pure-Python fallback.

tdfpy.get_centroided_spectrum

get_centroided_spectrum(
    td: TimsData,
    frame_id: int,
    spectrum_index: int | None = None,
    ion_mobility_type: Literal[
        "ook0", "ccs", "voltage"
    ] = "ook0",
    mz_tolerance: float = 8.0,
    mz_tolerance_type: Literal["ppm", "da"] = "ppm",
    im_tolerance: float = 0.05,
    im_tolerance_type: Literal[
        "relative", "absolute"
    ] = "relative",
    min_peaks: int = 3,
    max_peaks: int | None = None,
    noise_filter: None | (
        Literal[
            "mad",
            "percentile",
            "histogram",
            "baseline",
            "iterative_median",
        ]
        | float
        | int
    ) = None,
    use_numba: bool = True,
) -> np.ndarray

Extract a centroided MS1 spectrum for a single frame.

This function reads raw profile-like scans from the frame, converts indices to m/z values, collects all raw peaks with their ion mobility values, and applies peak centroiding based on m/z and ion mobility tolerances to produce a centroided spectrum.
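The collection step described above can be pictured with a small sketch on synthetic data. The per-scan tuples stand in for the output of `td.readScans`, and the linear `index_to_mz` helper is a hypothetical placeholder for `td.indexToMz`, not the instrument calibration.

```python
import numpy as np


def index_to_mz(index_array):
    # Hypothetical linear calibration standing in for td.indexToMz.
    return 100.0 + 0.01 * np.asarray(index_array, dtype=np.float64)


def flatten_scans(scans, ion_mobility_per_scan):
    """Flatten per-scan (index, intensity) pairs into three parallel arrays."""
    total = sum(len(idx) for idx, _ in scans)
    mz = np.empty(total, dtype=np.float64)
    intensity = np.empty(total, dtype=np.float64)
    mobility = np.empty(total, dtype=np.float64)

    offset = 0
    for scan_index, (idx, scan_intensity) in enumerate(scans):
        n = len(idx)
        if n == 0:
            continue  # empty scans contribute nothing
        mz[offset:offset + n] = index_to_mz(idx)
        intensity[offset:offset + n] = scan_intensity
        # Every raw peak in a scan shares that scan's ion mobility value.
        mobility[offset:offset + n] = ion_mobility_per_scan[scan_index]
        offset += n
    return mz, intensity, mobility


scans = [
    (np.array([0, 10]), np.array([50, 80])),
    (np.array([], dtype=np.int64), np.array([], dtype=np.int64)),
    (np.array([10]), np.array([120])),
]
ion_mobility_per_scan = np.array([1.20, 1.19, 1.18])
mz, intensity, mobility = flatten_scans(scans, ion_mobility_per_scan)
```

The three flat arrays are the shape of input that `merge_peaks` expects.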

Parameters:

Name	Type	Description	Default
`td`	`TimsData`	TimsData instance connected to the analysis directory	required
`frame_id`	`int`	Frame ID to extract	required
`spectrum_index`	`int \| None`	Optional index for this spectrum (defaults to frame_id)	`None`
`ion_mobility_type`	`Literal['ook0', 'ccs', 'voltage']`	Type of ion mobility value reported for each peak. Options: - "ook0": 1/K0 (reciprocal reduced mobility) [default] - "ccs": Collision Cross Section in Å², computed from 1/K0 assuming charge +1 - "voltage": TIMS voltage	`'ook0'`
`mz_tolerance`	`float`	Tolerance for m/z matching during centroiding	`8.0`
`mz_tolerance_type`	`Literal['ppm', 'da']`	Type of m/z tolerance - "ppm" or "da" (daltons)	`'ppm'`
`im_tolerance`	`float`	Tolerance for ion mobility matching during centroiding	`0.05`
`im_tolerance_type`	`Literal['relative', 'absolute']`	Type of ion mobility tolerance - "relative" or "absolute"	`'relative'`
`min_peaks`	`int`	Minimum number of nearby raw peaks required to form a centroid (0 or 1 keeps all)	`3`
`max_peaks`	`int \| None`	Maximum number of centroided peaks to return	`None`
`noise_filter`	`None \| (Literal['mad', 'percentile', 'histogram', 'baseline', 'iterative_median'] \| float \| int)`	Noise filtering method to apply before centroiding. Options: - None: No noise filtering (default) - "mad": Median Absolute Deviation method - "percentile": 75th percentile threshold - "histogram": Histogram mode-based estimation - "baseline": Bottom quartile statistics - "iterative_median": Iterative median filtering - float/int: Direct intensity threshold value	`None`
`use_numba`	`bool`	Use the Numba-accelerated centroiding implementation when Numba is installed; otherwise, or when `False`, the pure-Python implementation runs	`True`
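To give a feel for what a Median Absolute Deviation cutoff looks like, here is a minimal sketch. The function name `mad_threshold`, the `median + k * MAD` form, and the multiplier `k=3.0` are all assumptions made for illustration; the library's `estimate_noise_level` may compute its threshold differently.

```python
import numpy as np


def mad_threshold(intensity, k=3.0):
    """Return median + k * MAD as a robust intensity cutoff (illustrative only)."""
    intensity = np.asarray(intensity, dtype=np.float64)
    median = np.median(intensity)
    mad = np.median(np.abs(intensity - median))
    return median + k * mad


intensity = np.array([10.0, 12.0, 11.0, 9.0, 13.0, 500.0, 800.0])
threshold = mad_threshold(intensity)

# Same masking pattern as the source: keep peaks at or above the threshold.
keep = intensity >= threshold
```

Because the median and MAD are insensitive to a few very large values, the two strong signals here do not inflate the cutoff the way a mean and standard deviation would.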

Returns:

Type	Description
`np.ndarray`	Array of shape (N, 3) containing centroided peaks. Columns are: [mz, intensity, ion_mobility], with ion_mobility expressed in the units selected by `ion_mobility_type`
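The returned array is a plain NumPy array, so the usual slicing idioms apply. The sketch below uses a synthetic array with the documented column layout rather than real instrument data.

```python
import numpy as np

# Synthetic stand-in for the (N, 3) array: [mz, intensity, ion_mobility].
peaks = np.array([
    [500.25, 1200.0, 0.95],
    [300.10, 4500.0, 0.80],
    [700.40, 300.0, 1.10],
])

mz, intensity, ion_mobility = peaks[:, 0], peaks[:, 1], peaks[:, 2]

# Order peaks by ascending m/z.
by_mz = peaks[np.argsort(mz)]

# Keep only peaks inside an ion mobility window.
in_window = peaks[(ion_mobility >= 0.75) & (ion_mobility <= 1.0)]

# Row of the most intense peak.
base_peak = peaks[np.argmax(intensity)]
```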

Raises:

Type	Description
`ValueError`	If the frame_id doesn't exist in the Frames table
`RuntimeError`	If the TimsData connection is not open

Example

with timsdata_connect('path/to/data.d') as td:
    # Get centroided spectrum with 1/K0 (default)
    peaks = get_centroided_spectrum(td, frame_id=1)
    print(f"Found {len(peaks)} centroided peaks")

    # Get spectrum with CCS values
    spectrum = get_centroided_spectrum(td, frame_id=1, ion_mobility_type="ccs")

    # Custom centroiding tolerances
    spectrum = get_centroided_spectrum(
        td, frame_id=1, mz_tolerance=10, im_tolerance=0.1
    )

    # With noise filtering
    spectrum = get_centroided_spectrum(td, frame_id=1, noise_filter="mad")

    # With custom noise threshold
    spectrum = get_centroided_spectrum(td, frame_id=1, noise_filter=1000.0)

Source code in src/tdfpy/centroiding.py

def get_centroided_spectrum(
    td: TimsData,
    frame_id: int,
    spectrum_index: int | None = None,
    ion_mobility_type: Literal["ook0", "ccs", "voltage"] = "ook0",
    mz_tolerance: float = 8.0,
    mz_tolerance_type: Literal["ppm", "da"] = "ppm",
    im_tolerance: float = 0.05,
    im_tolerance_type: Literal["relative", "absolute"] = "relative",
    min_peaks: int = 3,
    max_peaks: int | None = None,
    noise_filter: None
    | (
        Literal["mad", "percentile", "histogram", "baseline", "iterative_median"]
        | float
        | int
    ) = None,
    use_numba: bool = True,
) -> np.ndarray:
    """Extract a centroided MS1 spectrum for a single frame.

    This function reads raw profile-like scans from the frame, converts indices to m/z values,
    collects all raw peaks with their ion mobility values, and applies peak centroiding
    based on m/z and ion mobility tolerances to produce a centroided spectrum.

    Args:
        td: TimsData instance connected to the analysis directory
        frame_id: Frame ID to extract
        spectrum_index: Optional index for this spectrum (defaults to frame_id)
        ion_mobility_type: Type of ion mobility to calculate and include for each peak
                          - "ook0": 1/K0 (reciprocal reduced mobility) [default]
                          - "ccs": Collision Cross Section in Å² (computed assuming charge +1)
                          - "voltage": TIMS voltage
        mz_tolerance: Tolerance for m/z matching during centroiding
        mz_tolerance_type: Type of m/z tolerance - "ppm" or "da" (daltons)
        im_tolerance: Tolerance for ion mobility matching during centroiding
        im_tolerance_type: Type of ion mobility tolerance - "relative" or "absolute"
        min_peaks: Minimum number of nearby raw peaks required to form a centroid (0 or 1 keeps all)
        max_peaks: Maximum number of centroided peaks to return
        noise_filter: Noise filtering method to apply before centroiding. Options:
                     - None: No noise filtering (default)
                     - "mad": Median Absolute Deviation method
                     - "percentile": 75th percentile threshold
                     - "histogram": Histogram mode-based estimation
                     - "baseline": Bottom quartile statistics
                     - "iterative_median": Iterative median filtering
                     - float/int: Direct intensity threshold value
        use_numba: Use the Numba-accelerated centroiding implementation when available
                  (falls back to pure Python otherwise)

    Returns:
        np.ndarray: Array of shape (N, 3) containing centroided peaks.
                   Columns are: [mz, intensity, ion_mobility]

    Raises:
        ValueError: If the frame_id doesn't exist in the Frames table
        RuntimeError: If the TimsData connection is not open

    Example:
        ```python
        with timsdata_connect('path/to/data.d') as td:
            # Get centroided spectrum with 1/K0 (default)
            peaks = get_centroided_spectrum(td, frame_id=1)
            print(f"Found {len(peaks)} centroided peaks")

            # Get spectrum with CCS values
            spectrum = get_centroided_spectrum(td, frame_id=1, ion_mobility_type="ccs")

            # Custom centroiding tolerances
            spectrum = get_centroided_spectrum(
                td, frame_id=1, mz_tolerance=10, im_tolerance=0.1
            )

            # With noise filtering
            spectrum = get_centroided_spectrum(td, frame_id=1, noise_filter="mad")

            # With custom noise threshold
            spectrum = get_centroided_spectrum(td, frame_id=1, noise_filter=1000.0)
        ```
    """
    logger.debug(
        "Extracting MS1 spectrum for frame_id=%d, noise_filter=%s",
        frame_id,
        noise_filter,
    )

    if td.conn is None:
        logger.error("TimsData connection is not open")
        raise RuntimeError("TimsData connection is not open")

    # Get frame metadata from the database
    cursor = td.conn.cursor()
    cursor.execute(
        "SELECT Time, NumScans, MsMsType FROM Frames WHERE Id = ?", (frame_id,)
    )
    result = cursor.fetchone()

    if result is None:
        logger.error("Frame %d not found in database", frame_id)
        raise ValueError(f"Frame {frame_id} not found in database")

    retention_time_sec, num_scans, msms_type = result
    logger.debug(
        "Frame %d metadata: RT=%.2fs, NumScans=%d, MsMsType=%d",
        frame_id,
        retention_time_sec,
        num_scans,
        msms_type,
    )

    # if msms_type != 0:
    #    logger.error("Frame %d is not an MS1 frame (MsMsType=%d)", frame_id, msms_type)
    #    raise ValueError(f"Frame {frame_id} is not an MS1 frame (MsMsType={msms_type})")

    retention_time_min = retention_time_sec / 60.0

    if num_scans == 0:
        logger.warning("Frame %d has 0 scans, returning empty spectrum", frame_id)
        return np.empty((0, 3), dtype=np.float64)

    # Pre-compute ion mobility values for each scan (always required)
    logger.debug(
        "Computing %s ion mobility values for %d scans", ion_mobility_type, num_scans
    )
    ion_mobility = td.scanNumToOneOverK0(frame_id, np.arange(0, num_scans))  # type: ignore[call-arg]

    # Read all scans at once
    logger.debug("Reading %d scans from frame %d", num_scans, frame_id)
    results = td.readScans(frame_id, 0, num_scans)

    # Count raw peaks across all scans so the arrays can be pre-allocated exactly
    total_peaks = sum(len(idx) for idx, _ in results)
    logger.debug(
        "Frame %d contains %d total raw peaks across %d scans",
        frame_id,
        total_peaks,
        num_scans,
    )

    if total_peaks == 0:
        logger.warning("Frame %d has 0 peaks, returning empty spectrum", frame_id)
        return np.empty((0, 3), dtype=np.float64)

    logger.debug("Pre-allocating arrays for %d peaks", total_peaks)
    mz_array = np.empty(total_peaks, dtype=np.float64)
    intensity_array = np.empty(total_peaks, dtype=np.float64)
    ion_mobility_array = np.empty(total_peaks, dtype=np.float64)

    # Collect all peaks from all scans
    offset = 0
    logger.debug("Starting scan iteration and m/z conversion (profile-like raw data)")
    for scan_index, (index_array, intensity_scan) in enumerate(results):
        n_peaks = len(index_array)
        if n_peaks == 0:
            continue

        # Convert indices to m/z in batch
        mz_values = td.indexToMz(frame_id, index_array)

        # Fill pre-allocated arrays
        mz_array[offset : offset + n_peaks] = mz_values
        intensity_array[offset : offset + n_peaks] = intensity_scan
        ion_mobility_array[offset : offset + n_peaks] = ion_mobility[scan_index]
        offset += n_peaks

    # Trim arrays to actual size
    mz_array = mz_array[:offset]
    intensity_array = intensity_array[:offset]
    ion_mobility_array = ion_mobility_array[:offset]
    logger.debug("Collected %d raw profile-like peaks from all scans", offset)

    # Apply noise filtering if requested
    if noise_filter is not None:
        logger.debug("Applying noise filter: %s", noise_filter)
        noise_threshold = estimate_noise_level(intensity_array, method=noise_filter)
        noise_mask = intensity_array >= noise_threshold

        mz_array = mz_array[noise_mask]
        intensity_array = intensity_array[noise_mask]
        ion_mobility_array = ion_mobility_array[noise_mask]

        filtered_count = offset - len(intensity_array)
        logger.info(
            "Noise filtering complete: removed %d peaks below threshold %.2f (%d → %d peaks, %.1f%% removed)",
            filtered_count,
            noise_threshold,
            offset,
            len(intensity_array),
            filtered_count / offset * 100,
        )

    # Convert to CCS if requested
    if ion_mobility_type == "ccs":
        logger.debug("Converting 1/K0 to CCS values (assuming charge +1)")
        # Charge is fixed at +1; no charge state estimation is performed here
        ccs_array = np.array(
            [
                oneOverK0ToCCSforMz(ook0, 1, mz)
                for ook0, mz in zip(ion_mobility_array, mz_array)
            ],
            dtype=np.float64,
        )
        ion_mobility_array = ccs_array
        logger.debug("Completed CCS conversion")

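    if ion_mobility_type == "voltage":
        logger.debug("Converting 1/K0 to voltage values")
        # scanNumToVoltage expects scan numbers, so map the 1/K0 values back
        # to scan numbers first (assumes TimsData exposes oneOverK0ToScanNum,
        # as Bruker's timsdata wrapper does)
        scan_numbers = td.oneOverK0ToScanNum(frame_id, ion_mobility_array)
        ion_mobility_array = td.scanNumToVoltage(frame_id, scan_numbers)
        logger.debug("Completed voltage conversion")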

    # Apply peak centroiding
    logger.debug("Starting peak centroiding algorithm")
    peaks = merge_peaks(
        mz_array=mz_array,
        intensity_array=intensity_array,
        ion_mobility_array=ion_mobility_array,
        mz_tolerance=mz_tolerance,
        mz_tolerance_type=mz_tolerance_type,
        im_tolerance=im_tolerance,
        im_tolerance_type=im_tolerance_type,
        min_peaks=min_peaks,
        max_peaks=max_peaks,
        use_numba=use_numba,
    )

    # Apply max_peaks limit if specified (post-centroiding)
    if max_peaks and len(peaks) > max_peaks:
        logger.debug("Applying max_peaks filter: %d → %d", len(peaks), max_peaks)
        # Sort by intensity (column 1) and take top N
        # argsort is ascending, so we take from the end [::-1]
        sort_indices = np.argsort(peaks[:, 1])[::-1][:max_peaks]
        peaks = peaks[sort_indices]

    logger.info(
        "Extracted centroided MS1 spectrum: frame_id=%d, RT=%.2f min, centroided_peaks=%d, raw_peaks=%d, ion_mobility_type=%s",
        frame_id,
        retention_time_min,
        len(peaks),
        total_peaks,
        ion_mobility_type,
    )

    return peaks

tdfpy.merge_peaks

merge_peaks(
    mz_array: np.ndarray,
    intensity_array: np.ndarray,
    ion_mobility_array: np.ndarray,
    mz_tolerance: float = 8.0,
    mz_tolerance_type: Literal["ppm", "da"] = "ppm",
    im_tolerance: float = 0.05,
    im_tolerance_type: Literal[
        "relative", "absolute"
    ] = "relative",
    min_peaks: int = 3,
    max_peaks: int | None = None,
    use_numba: bool = True,
) -> np.ndarray

Centroid profile-like peaks using m/z and ion mobility tolerances.

This function implements a greedy clustering algorithm that centroids raw peaks (similar to profile mode data) within specified m/z and ion mobility windows. Peaks are processed in descending order of intensity, and nearby peaks are combined using intensity-weighted averaging to produce centroided peaks.
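The greedy, intensity-ordered clustering described above can be sketched in a few lines of NumPy. This is an illustration of the general technique, not the library's implementation: the name `greedy_centroid`, the choice to sum member intensities, the window definitions (ppm of the seed m/z, fraction of the seed mobility), and the handling of clusters smaller than `min_peaks` are all assumptions. It also assumes strictly positive intensities.

```python
import numpy as np


def greedy_centroid(mz, intensity, im, mz_tol_ppm=8.0, im_tol_rel=0.05, min_peaks=1):
    """Greedy intensity-ordered clustering (illustrative sketch only)."""
    mz = np.asarray(mz, dtype=np.float64)
    intensity = np.asarray(intensity, dtype=np.float64)
    im = np.asarray(im, dtype=np.float64)

    used = np.zeros(len(mz), dtype=bool)
    centroids = []

    # Visit candidate seeds from most to least intense.
    for seed in np.argsort(intensity)[::-1]:
        if used[seed]:
            continue
        # Assumed window definitions, centred on the seed peak.
        mz_window = mz[seed] * mz_tol_ppm * 1e-6
        im_window = im[seed] * im_tol_rel
        members = (
            ~used
            & (np.abs(mz - mz[seed]) <= mz_window)
            & (np.abs(im - im[seed]) <= im_window)
        )
        used |= members  # each raw peak joins at most one cluster
        if members.sum() < min_peaks:
            continue  # too few supporting raw peaks; drop the cluster
        weights = intensity[members]
        total = weights.sum()
        centroids.append((
            np.dot(mz[members], weights) / total,  # intensity-weighted m/z
            total,                                 # summed intensity
            np.dot(im[members], weights) / total,  # intensity-weighted mobility
        ))

    return np.array(centroids, dtype=np.float64).reshape(-1, 3)


mz = np.array([100.0, 100.0005, 200.0])
intensity = np.array([1000.0, 500.0, 2000.0])
im = np.array([0.8, 0.8, 0.9])

merged = greedy_centroid(mz, intensity, im, mz_tol_ppm=10.0)
strict = greedy_centroid(mz, intensity, im, mz_tol_ppm=10.0, min_peaks=2)
```

With these inputs the two peaks near m/z 100 collapse into one centroid and the peak at m/z 200 stays on its own; raising `min_peaks` to 2 discards the isolated peak.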

Parameters:

Name	Type	Description	Default
`mz_array`	`np.ndarray`	Array of m/z values from raw/profile-like data	required
`intensity_array`	`np.ndarray`	Array of intensity values	required
`ion_mobility_array`	`np.ndarray`	Array of ion mobility values (1/K0, CCS, or voltage)	required
`mz_tolerance`	`float`	Tolerance for m/z matching during centroiding	`8.0`
`mz_tolerance_type`	`Literal['ppm', 'da']`	Type of m/z tolerance - "ppm" or "da" (daltons)	`'ppm'`
`im_tolerance`	`float`	Tolerance for ion mobility matching during centroiding	`0.05`
`im_tolerance_type`	`Literal['relative', 'absolute']`	Type of ion mobility tolerance - "relative" or "absolute"	`'relative'`
`min_peaks`	`int`	Minimum number of nearby raw peaks required to form a centroid. Set to 0 or 1 to keep all peaks (no filtering).	`3`
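`max_peaks`	`int \| None`	Maximum number of centroided peaks to return (keeps highest intensity)	`None`
`use_numba`	`bool`	Use the Numba-accelerated implementation when Numba is installed; otherwise, or when `False`, the pure-Python implementation runs	`True`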
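The two tolerance types can be read as different ways of turning a single number into a window half-width. The helpers below are hypothetical and encode one plausible interpretation: "ppm" scales with the m/z value, "da" is a fixed width, "relative" is a fraction of the ion mobility value, and "absolute" is a fixed width in mobility units. The library's exact definitions are not confirmed here.

```python
def mz_window(mz, tolerance, tolerance_type="ppm"):
    """Half-width of the m/z matching window, in daltons (assumed semantics)."""
    if tolerance_type == "ppm":
        return mz * tolerance * 1e-6
    if tolerance_type == "da":
        return tolerance
    raise ValueError(f"Unknown mz_tolerance_type: {tolerance_type!r}")


def im_window(im, tolerance, tolerance_type="relative"):
    """Half-width of the ion mobility matching window (assumed semantics)."""
    if tolerance_type == "relative":
        return im * tolerance
    if tolerance_type == "absolute":
        return tolerance
    raise ValueError(f"Unknown im_tolerance_type: {tolerance_type!r}")


# With the defaults, a peak at m/z 1000 and 1/K0 = 1.0 would match neighbours
# within roughly 0.008 Da and 0.05 mobility units under this interpretation.
default_mz_window = mz_window(1000.0, 8.0)
default_im_window = im_window(1.0, 0.05)
```

Under this reading a ppm tolerance widens with m/z, while a dalton tolerance stays constant across the mass range.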

Returns:

Type	Description
`np.ndarray`	Array of shape (N, 3) containing centroided peaks. Columns are: [mz, intensity, ion_mobility]
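The `max_peaks` limit amounts to keeping the N most intense rows of the (N, 3) array. A minimal sketch of that selection, mirroring the `argsort` pattern used in the source of `get_centroided_spectrum`; the helper name `keep_top_n` is hypothetical.

```python
import numpy as np


def keep_top_n(peaks, max_peaks):
    """Keep the max_peaks most intense rows of an (N, 3) peak array."""
    if max_peaks is None or len(peaks) <= max_peaks:
        return peaks
    # argsort is ascending, so reverse it and take the first max_peaks indices.
    order = np.argsort(peaks[:, 1])[::-1][:max_peaks]
    return peaks[order]


peaks = np.array([
    [100.0, 1500.0, 0.80],
    [200.0, 2000.0, 0.90],
    [300.0, 250.0, 1.00],
])
top_two = keep_top_n(peaks, 2)
```

The selected rows come back in descending intensity order, so re-sort by the m/z column if downstream code expects ascending m/z.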

Example

mz = np.array([100.0, 100.001, 200.0])
intensity = np.array([1000.0, 500.0, 2000.0])
im = np.array([0.8, 0.8, 0.9])
peaks = merge_peaks(mz, intensity, im, mz_tolerance=10, mz_tolerance_type="ppm")

Source code in src/tdfpy/centroiding.py

def merge_peaks(
    mz_array: np.ndarray,
    intensity_array: np.ndarray,
    ion_mobility_array: np.ndarray,
    mz_tolerance: float = 8.0,
    mz_tolerance_type: Literal["ppm", "da"] = "ppm",
    im_tolerance: float = 0.05,
    im_tolerance_type: Literal["relative", "absolute"] = "relative",
    min_peaks: int = 3,
    max_peaks: int | None = None,
    use_numba: bool = True,
) -> np.ndarray:
    """Centroid profile-like peaks using m/z and ion mobility tolerances.

    This function implements a greedy clustering algorithm that centroids raw peaks
    (similar to profile mode data) within specified m/z and ion mobility windows.
    Peaks are processed in descending order of intensity, and nearby peaks are
    combined using intensity-weighted averaging to produce centroided peaks.

    Args:
        mz_array: Array of m/z values from raw/profile-like data
        intensity_array: Array of intensity values
        ion_mobility_array: Array of ion mobility values (1/K0, CCS, or voltage)
        mz_tolerance: Tolerance for m/z matching during centroiding
        mz_tolerance_type: Type of m/z tolerance - "ppm" or "da" (daltons)
        im_tolerance: Tolerance for ion mobility matching during centroiding
        im_tolerance_type: Type of ion mobility tolerance - "relative" or "absolute"
        min_peaks: Minimum number of nearby raw peaks required to form a centroid.
                  Set to 0 or 1 to keep all peaks (no filtering).
        max_peaks: Maximum number of centroided peaks to return (keeps highest intensity)
        use_numba: Use the Numba-accelerated implementation when available
                  (falls back to pure Python otherwise)

    Returns:
        np.ndarray: Array of shape (N, 3) containing centroided peaks.
                   Columns are: [mz, intensity, ion_mobility]

    Example:
        ```python
        mz = np.array([100.0, 100.001, 200.0])
        intensity = np.array([1000.0, 500.0, 2000.0])
        im = np.array([0.8, 0.8, 0.9])
        peaks = merge_peaks(mz, intensity, im, mz_tolerance=10, mz_tolerance_type="ppm")
        ```
    """
    # Use Numba implementation if available
    if _HAS_NUMBA and use_numba:
        return _merge_peaks_numba(
            mz_array, intensity_array, ion_mobility_array,
            mz_tolerance=mz_tolerance,
            mz_tolerance_type=mz_tolerance_type,
            im_tolerance=im_tolerance,
            im_tolerance_type=im_tolerance_type,
            min_peaks=min_peaks,
            max_peaks=max_peaks,
        )

    # Fallback to Python implementation
    return _merge_peaks_python(
        mz_array,
        intensity_array,
        ion_mobility_array,
        mz_tolerance,
        mz_tolerance_type,
        im_tolerance,
        im_tolerance_type,
        min_peaks,
        max_peaks,
    )