Centroiding
timsTOF raw data is profile-like: the binary file stores one intensity value per scan per m/z index, spread across hundreds of mobility bins. Centroiding collapses that cloud of raw measurements into a compact list of peaks — each with a single m/z, intensity, and ion mobility value.
tdfpy provides two centroiding functions:
- get_centroided_spectrum (high-level): reads a full frame from disk, applies optional noise filtering, and returns centroided peaks in one call.
- merge_peaks (low-level): centroids pre-assembled NumPy arrays of m/z, intensity, and ion mobility values. Use this when you already have the raw arrays or need fine-grained control.
In practice, most workflows should call .centroid() directly on a Frame, DiaWindow, or
PrmTransition object — that method delegates to get_centroided_spectrum internally.
Numba JIT backend
When Numba is installed (it is included in the default
tdfpy dependencies), the core clustering loop runs as a JIT-compiled native function
(_merge_peaks_numba_kernel). This is typically 5–20× faster than the pure-Python
fallback for large frames. The backend is selected automatically:
# Numba used if available (default)
peaks = merge_peaks(mz, intensity, im)
# Force the Python fallback (useful for debugging or environments without Numba)
peaks = merge_peaks(mz, intensity, im, use_numba=False)
The first call after import triggers Numba's JIT compilation — expect a few seconds of overhead. Subsequent calls use the cached compiled kernel.
get_centroided_spectrum
Reads frame frame_id from the open TimsData connection, converts m/z indices to
m/z values, assembles the raw peak arrays, optionally filters noise, and runs centroiding.
from tdfpy import timsdata_connect, get_centroided_spectrum

with timsdata_connect("experiment.d") as td:
    # Default: 1/K0 ion mobility, 8 ppm m/z tolerance
    peaks = get_centroided_spectrum(td, frame_id=1)
    print(peaks.shape)  # (N, 3); columns: [m/z, intensity, 1/K0]

    # Tighter tolerances, CCS instead of 1/K0
    peaks = get_centroided_spectrum(
        td,
        frame_id=1,
        ion_mobility_type="ccs",
        mz_tolerance=5.0,
        im_tolerance=0.03,
    )

    # Noise filtering before centroiding
    peaks = get_centroided_spectrum(
        td,
        frame_id=1,
        noise_filter="mad",  # median absolute deviation
    )

    # Hard intensity threshold
    peaks = get_centroided_spectrum(td, frame_id=1, noise_filter=500.0)
Available noise_filter options: "mad", "percentile", "histogram", "baseline",
"iterative_median", or any float/int as a direct threshold. Pass None to skip
filtering (the default).
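As a rough illustration of what an intensity-based estimator such as "mad" does, the sketch below derives a threshold from the median and the median absolute deviation of a list of intensities. The helper name mad_threshold, the multiplier k=3.0, and the 1.4826 scaling constant are assumptions chosen for illustration; this page does not state the formula tdfpy uses internally, and the intensities are made-up numbers.

```python
from statistics import median


def mad_threshold(intensities, k=3.0):
    """Hypothetical helper: median + k * scaled MAD (not tdfpy's actual formula)."""
    med = median(intensities)
    # Median absolute deviation: the median distance of each value from the median.
    mad = median(abs(x - med) for x in intensities)
    # 1.4826 makes the MAD comparable to a standard deviation for Gaussian noise.
    return med + k * 1.4826 * mad


# Made-up raw intensities: five low values and two strong peaks.
raw = [90.0, 100.0, 110.0, 95.0, 105.0, 5000.0, 7000.0]
threshold = mad_threshold(raw)
kept = [x for x in raw if x >= threshold]
print(round(threshold, 3), kept)
```

Because the threshold is derived from the bulk of the intensity distribution, anything close to that bulk is removed regardless of whether it is signal or noise.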
tdfpy.get_centroided_spectrum
get_centroided_spectrum(
    td: TimsData,
    frame_id: int,
    spectrum_index: int | None = None,
    ion_mobility_type: Literal["ook0", "ccs", "voltage"] = "ook0",
    mz_tolerance: float = 8.0,
    mz_tolerance_type: Literal["ppm", "da"] = "ppm",
    im_tolerance: float = 0.1,
    im_tolerance_type: Literal["relative", "absolute"] = "relative",
    min_peaks: int = 3,
    max_peaks: int | None = None,
    noise_filter: None | (
        Literal["mad", "percentile", "histogram", "baseline", "iterative_median"]
        | float
        | int
    ) = None,
    use_numba: bool = True,
) -> np.ndarray
Extract a centroided MS1 spectrum for a single frame.
This function reads raw profile-like scans from the frame, converts indices to m/z values, collects all raw peaks with their ion mobility values, and applies peak centroiding based on m/z and ion mobility tolerances to produce a centroided spectrum.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| td | TimsData | TimsData instance connected to the analysis directory | required |
| frame_id | int | Frame ID to extract | required |
| spectrum_index | int \| None | Optional index for this spectrum (defaults to frame_id) | None |
| ion_mobility_type | Literal['ook0', 'ccs', 'voltage'] | Type of ion mobility to calculate and include for each peak. "ook0": 1/K0 (reciprocal reduced mobility) [default]; "ccs": Collision Cross Section in Ų (requires charge state estimation) | 'ook0' |
| mz_tolerance | float | Tolerance for m/z matching during centroiding | 8.0 |
| mz_tolerance_type | Literal['ppm', 'da'] | Type of m/z tolerance: "ppm" or "da" (daltons) | 'ppm' |
| im_tolerance | float | Tolerance for ion mobility matching during centroiding | 0.1 |
| im_tolerance_type | Literal['relative', 'absolute'] | Type of ion mobility tolerance: "relative" or "absolute" | 'relative' |
| min_peaks | int | Minimum number of nearby raw peaks required to form a centroid (0 or 1 keeps all) | 3 |
| max_peaks | int \| None | Maximum number of centroided peaks to return | None |
| noise_filter | None \| (Literal['mad', 'percentile', 'histogram', 'baseline', 'iterative_median'] \| float \| int) | Noise filtering method to apply before centroiding. None: no noise filtering (default); "mad": Median Absolute Deviation method; "percentile": 75th percentile threshold; "histogram": histogram mode-based estimation; "baseline": bottom quartile statistics; "iterative_median": iterative median filtering; float/int: direct intensity threshold value | None |
Returns:
| Type | Description |
|---|---|
| np.ndarray | Array of shape (N, 3) containing centroided peaks. Columns are: [mz, intensity, ion_mobility] |
Raises:
| Type | Description |
|---|---|
| ValueError | If the frame_id doesn't exist or is not an MS1 frame |
| RuntimeError | If the TimsData connection is not open |
Example
from tdfpy import timsdata_connect, get_centroided_spectrum

with timsdata_connect('path/to/data.d') as td:
    # Get centroided spectrum with 1/K0 (default)
    peaks = get_centroided_spectrum(td, frame_id=1)
    print(f"Found {len(peaks)} centroided peaks")

    # Get spectrum with CCS values
    spectrum = get_centroided_spectrum(td, frame_id=1, ion_mobility_type="ccs")

    # Custom centroiding tolerances
    spectrum = get_centroided_spectrum(
        td, frame_id=1, mz_tolerance=10, im_tolerance=0.1
    )

    # With noise filtering
    spectrum = get_centroided_spectrum(td, frame_id=1, noise_filter="mad")

    # With custom noise threshold
    spectrum = get_centroided_spectrum(td, frame_id=1, noise_filter=1000.0)
Source code in src/tdfpy/centroiding.py
merge_peaks
Centroids pre-assembled arrays. The algorithm is a greedy intensity-ordered scan: starting from the highest-intensity raw peak, every neighbouring peak within the m/z and ion mobility tolerances is merged into a single centroid via intensity-weighted averaging. Merged peaks are marked as used and skipped in subsequent iterations.
| Parameter | Default | Notes |
|---|---|---|
| mz_tolerance | 8.0 | Width of the m/z matching window |
| mz_tolerance_type | "ppm" | "ppm" or "da" |
| im_tolerance | 0.1 | Width of the ion mobility window |
| im_tolerance_type | "relative" | "relative" (fraction of 1/K0) or "absolute" |
| min_peaks | 3 | Raw peaks required to form a centroid; set to 0 or 1 to keep all |
| max_peaks | None | Cap on output peaks (highest-intensity first) |
| use_numba | True | Set to False to force the Python fallback |
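To make the tolerance types in the table concrete, the sketch below converts each setting into an absolute distance around a reference peak. The helper names mz_window and im_window are hypothetical, and treating the tolerance as a symmetric distance on either side of the reference peak is an assumption; the table only states that "relative" is a fraction of 1/K0.

```python
def mz_window(mz, tolerance, tolerance_type="ppm"):
    """Hypothetical helper: absolute m/z tolerance (in Da) around `mz`."""
    if tolerance_type == "ppm":
        # Parts per million of the reference m/z.
        return mz * tolerance * 1e-6
    if tolerance_type == "da":
        return tolerance
    raise ValueError(f"unknown m/z tolerance type: {tolerance_type!r}")


def im_window(im, tolerance, tolerance_type="relative"):
    """Hypothetical helper: absolute ion mobility tolerance around `im`."""
    if tolerance_type == "relative":
        # Fraction of the reference ion mobility value.
        return im * tolerance
    if tolerance_type == "absolute":
        return tolerance
    raise ValueError(f"unknown ion mobility tolerance type: {tolerance_type!r}")


# With the defaults, a peak at m/z 500 and 1/K0 = 0.85 would match neighbours
# within about 0.004 Da and about 0.085 1/K0 units.
print(f"{mz_window(500.0, 8.0):.4f}")
print(f"{im_window(0.85, 0.1):.4f}")
```

Under this reading, a ppm window grows with m/z, so the same setting is wider in Da at m/z 1500 than at m/z 500.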
import numpy as np
from tdfpy import merge_peaks
mz = np.array([500.001, 500.002, 700.005, 700.006, 700.007])
inten = np.array([8000.0, 4000.0, 6000.0, 5000.0, 3000.0])
im = np.array([0.85, 0.85, 0.92, 0.92, 0.92])
peaks = merge_peaks(mz, inten, im, mz_tolerance=10.0, min_peaks=2)
print(peaks)
# shape (2, 3): two centroided peaks, columns [m/z, intensity, 1/K0]
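The greedy, intensity-ordered scan described above can be sketched in plain Python. This is an illustration of the technique, not tdfpy's implementation: the function name greedy_merge is hypothetical, it uses lists and a quadratic neighbour search instead of NumPy and Numba to stay dependency-free, and it assumes that the centroid intensity is the sum of the merged intensities and that clusters smaller than min_peaks are consumed and dropped.

```python
def greedy_merge(mz, intensity, im, mz_tol_ppm=8.0, im_tol_rel=0.1, min_peaks=3):
    """Illustrative greedy centroiding over parallel lists of raw peaks."""
    # Visit raw peaks from the most intense to the least intense.
    order = sorted(range(len(mz)), key=lambda i: intensity[i], reverse=True)
    used = [False] * len(mz)
    centroids = []
    for seed in order:
        if used[seed]:
            continue
        # Tolerances are evaluated relative to the current seed peak.
        mz_tol = mz[seed] * mz_tol_ppm * 1e-6
        im_tol = im[seed] * im_tol_rel
        members = [
            j for j in range(len(mz))
            if not used[j]
            and abs(mz[j] - mz[seed]) <= mz_tol
            and abs(im[j] - im[seed]) <= im_tol
        ]
        # Merged peaks are marked as used and skipped in later iterations.
        for j in members:
            used[j] = True
        if len(members) < min_peaks:
            continue  # assumption: under-supported clusters are discarded
        total = sum(intensity[j] for j in members)
        centroids.append((
            sum(mz[j] * intensity[j] for j in members) / total,  # weighted m/z
            total,                                               # assumed: summed intensity
            sum(im[j] * intensity[j] for j in members) / total,  # weighted ion mobility
        ))
    return centroids


mz = [500.001, 500.002, 700.005, 700.006, 700.007]
inten = [8000.0, 4000.0, 6000.0, 5000.0, 3000.0]
im = [0.85, 0.85, 0.92, 0.92, 0.92]
centroids = greedy_merge(mz, inten, im, mz_tol_ppm=10.0, min_peaks=2)
print(len(centroids))  # 2
```

Seeding each cluster from the most intense unused peak means the strongest measurement anchors the window, so weaker shoulders are absorbed into it rather than forming centroids of their own.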
Noise filtering vs min_peaks
The noise_filter parameter (available on both get_centroided_spectrum and .centroid())
estimates a threshold from the intensity distribution and discards raw peaks below it before centroiding.
While convenient, intensity-based noise estimation has a fundamental limitation: it cannot
distinguish low-abundance real signal from electronic noise, and will discard both equally.
Methods such as "mad" are anchored to the median of the intensity distribution, so if your
sample has sparse signal, the threshold can rise above legitimate low-abundance peaks.
A more reliable strategy is to increase min_peaks instead:
# Prefer: raise min_peaks to filter noise without discarding low-abundance signal
peaks = merge_peaks(mz, intensity, im, min_peaks=5)
# Noise arises from single scans; real peaks appear across multiple scans.
# min_peaks=5 means a centroid must be supported by at least 5 raw measurements.
Because electronic noise typically manifests as a singleton in a single scan, requiring several supporting raw peaks is a structural filter — it targets the origin of noise rather than its intensity. Low-abundance real peaks that appear consistently across scans will survive, whereas noise will not.
Use noise_filter only if you have a calibrated threshold or a specific method validated
for your acquisition settings, and always verify against a noise_filter=None baseline first.
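The difference between the two strategies can be seen in a toy comparison. The sketch below assumes the raw peaks have already been grouped into clusters by m/z and ion mobility; the cluster names, the intensities, and both helper functions are made up for illustration and are not part of tdfpy.

```python
# Hypothetical clusters of raw peak intensities, one list per m/z / ion mobility window.
clusters = {
    "abundant peptide": [900.0, 1500.0, 2100.0, 1400.0, 800.0],
    "low-abundance peptide": [60.0, 85.0, 110.0, 90.0, 55.0],
    "noise spike": [400.0],
}


def keep_by_threshold(clusters, threshold):
    """Intensity filter: a cluster survives if any raw peak reaches the threshold."""
    return [name for name, peaks in clusters.items()
            if any(p >= threshold for p in peaks)]


def keep_by_support(clusters, min_peaks):
    """Structural filter: a cluster survives if enough raw peaks support it."""
    return [name for name, peaks in clusters.items() if len(peaks) >= min_peaks]


print(keep_by_threshold(clusters, 300.0))
# ['abundant peptide', 'noise spike']
print(keep_by_support(clusters, 5))
# ['abundant peptide', 'low-abundance peptide']
```

In this toy case the intensity threshold keeps the singleton spike and loses the weak but consistently observed peak, while the support requirement does the opposite.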
tdfpy.merge_peaks
merge_peaks(
    mz_array: np.ndarray,
    intensity_array: np.ndarray,
    ion_mobility_array: np.ndarray,
    mz_tolerance: float = 8.0,
    mz_tolerance_type: Literal["ppm", "da"] = "ppm",
    im_tolerance: float = 0.1,
    im_tolerance_type: Literal["relative", "absolute"] = "relative",
    min_peaks: int = 3,
    max_peaks: int | None = None,
    use_numba: bool = True,
) -> np.ndarray
Centroid profile-like peaks using m/z and ion mobility tolerances.
This function implements a greedy clustering algorithm that centroids raw peaks (similar to profile mode data) within specified m/z and ion mobility windows. Peaks are processed in descending order of intensity, and nearby peaks are combined using intensity-weighted averaging to produce centroided peaks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mz_array | np.ndarray | Array of m/z values from raw/profile-like data | required |
| intensity_array | np.ndarray | Array of intensity values | required |
| ion_mobility_array | np.ndarray | Array of ion mobility values (1/K0 or CCS) | required |
| mz_tolerance | float | Tolerance for m/z matching during centroiding | 8.0 |
| mz_tolerance_type | Literal['ppm', 'da'] | Type of m/z tolerance: "ppm" or "da" (daltons) | 'ppm' |
| im_tolerance | float | Tolerance for ion mobility matching during centroiding | 0.1 |
| im_tolerance_type | Literal['relative', 'absolute'] | Type of ion mobility tolerance: "relative" or "absolute" | 'relative' |
| min_peaks | int | Minimum number of nearby raw peaks required to form a centroid. Set to 0 or 1 to keep all peaks (no filtering). | 3 |
| max_peaks | int \| None | Maximum number of centroided peaks to return (keeps highest intensity) | None |
Returns:
| Type | Description |
|---|---|
| np.ndarray | Array of shape (N, 3) containing centroided peaks. Columns are: [mz, intensity, ion_mobility] |
Example
import numpy as np
from tdfpy import merge_peaks

mz = np.array([100.0, 100.001, 200.0])
intensity = np.array([1000.0, 500.0, 2000.0])
im = np.array([0.8, 0.8, 0.9])
peaks = merge_peaks(mz, intensity, im, mz_tolerance=10, mz_tolerance_type="ppm")
Source code in src/tdfpy/centroiding.py