Noise filters
Composable noise filters live in tdfpy.noise. A pipeline of filters is
applied in order; each takes raw (scan_indices, mz_indices, intensities)
and returns a boolean keep-mask. The filters are frozen dataclasses, so they
are hashable (suitable for caching) and can be adjusted with dataclasses.replace.
from tdfpy import MadThreshold, VerticalNoiseFilter, get_raw_peaks

peaks = get_raw_peaks(
    td, frame_id,
    noise=[
        VerticalNoiseFilter(min_streak_scans=5, num_iterations=2),
        MadThreshold(k=3),
    ],
)
User-facing APIs (get_raw_peaks, get_centroided_spectrum,
Frame.raw_peaks, etc.) also accept string and numeric shorthands for
terseness: noise="mad", noise="iterative_median", noise=500.0, etc. See
coerce_filters for the accepted forms.
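The hashability and dataclasses.replace behaviour described above comes from the standard library's frozen dataclasses. A minimal sketch, using a hypothetical stand-in class (DemoThreshold is not part of tdfpy) so that it runs without the library:

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class DemoThreshold:
    """Hypothetical stand-in for a filter; not a tdfpy class."""

    k: float = 3.0
    scale: float = 1.4826


base = DemoThreshold()
stricter = replace(base, k=5.0)  # new instance; `base` is left unchanged

# Equal field values hash equally, so a tuple of filters can act as a cache key.
cache = {(base,): "result for k=3"}
hit = cache[(DemoThreshold(k=3.0),)]

# Frozen instances reject in-place mutation.
try:
    base.k = 10.0
    mutated = True
except FrozenInstanceError:
    mutated = False
```

The same pattern should apply to any filter that is declared as a frozen dataclass: derive a variant with replace rather than mutating a shared instance.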
Base class & coercion
tdfpy.NoiseFilter
Bases: ABC
Base class for raw-peak noise filters.
Subclasses are typically frozen dataclasses with their tunable knobs as
fields. They implement a single method, keep_mask, which
returns a boolean array of length len(intensities) indicating which
points to keep.
Filters operate on integer indices (TOF index + scan number) and raw intensity. Conversion to m/z and 1/K0 happens later in the pipeline so filters never need to do per-point unit conversions themselves.
keep_mask
abstractmethod
keep_mask(
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    td: "TimsData",
    frame_id: int
) -> np.ndarray
Return a boolean keep-mask of length len(intensities).
Source code in src/tdfpy/noise/__init__.py
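As an illustration of the keep_mask contract, the sketch below defines a custom filter against a hypothetical stand-in base class (DemoNoiseFilter) that mirrors the documented signature. ScanRangeFilter is invented for this example, and plain lists are used in place of np.ndarray to keep it dependency-free.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class DemoNoiseFilter(ABC):
    """Hypothetical stand-in mirroring the documented keep_mask signature."""

    @abstractmethod
    def keep_mask(self, scan_indices, mz_indices, intensities, *,
                  num_scans, td, frame_id):
        """Return one boolean per input point."""


@dataclass(frozen=True)
class ScanRangeFilter(DemoNoiseFilter):
    """Hypothetical filter: keep points whose scan lies in [first, last]."""

    first: int = 0
    last: int = 10

    def keep_mask(self, scan_indices, mz_indices, intensities, *,
                  num_scans, td, frame_id):
        # Only the integer scan index is consulted; no unit conversion needed.
        return [self.first <= s <= self.last for s in scan_indices]


mask = ScanRangeFilter(first=2, last=4).keep_mask(
    [1, 2, 3, 5],              # scan indices
    [100, 101, 102, 103],      # TOF (m/z) indices
    [10.0, 20.0, 30.0, 40.0],  # raw intensities
    num_scans=6, td=None, frame_id=1,
)
```

The mask has one entry per input point, which is the only structural requirement the contract states.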
tdfpy.coerce_filters
coerce_filters(spec: NoiseSpec) -> tuple[NoiseFilter, ...]
Normalize a user-facing noise spec to a tuple of filter instances.
Accepts:
- None → empty tuple (no filtering)
- a single NoiseFilter instance → one-element tuple
- a list/tuple of any of the above → flattened tuple
- a string from "mad" | "percentile" | "histogram" | "baseline" | "iterative_median" → an IntensityThreshold subclass with defaults
- a numeric (float/int) → AbsoluteThreshold
Strings and numerics are how existing call sites stay terse; the tuple
output is hashable for caching (e.g. Streamlit @cache_data).
Source code in src/tdfpy/noise/__init__.py
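The normalization rules listed above can be sketched roughly as follows. This is not the library's implementation: DemoAbsolute, DemoMad, BY_NAME and coerce_demo are hypothetical stand-ins, only one string name is wired up, and accepting strings or numbers nested inside a list is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoAbsolute:
    """Hypothetical stand-in for a fixed intensity floor."""

    value: float = 0.0


@dataclass(frozen=True)
class DemoMad:
    """Hypothetical stand-in for a MAD-based threshold."""

    k: float = 3.0


# Hypothetical name table; only "mad" is mapped in this sketch.
BY_NAME = {"mad": DemoMad}


def coerce_demo(spec):
    """Normalize None / filter / name / number / sequence to a flat tuple."""
    if spec is None:
        return ()
    if isinstance(spec, (DemoAbsolute, DemoMad)):
        return (spec,)
    if isinstance(spec, str):  # checked before sequences: str is iterable too
        return (BY_NAME[spec](),)  # raises KeyError for unknown names
    if isinstance(spec, (int, float)):
        return (DemoAbsolute(float(spec)),)
    if isinstance(spec, (list, tuple)):
        # Flatten by coercing each element and concatenating the results.
        return tuple(f for item in spec for f in coerce_demo(item))
    raise TypeError(f"unsupported noise spec: {spec!r}")


pipeline = coerce_demo([DemoMad(k=2.0), [DemoAbsolute(1.0)], "mad", 5])
```

Because every element is a frozen dataclass and the container is a tuple, the result can be hashed and used as a cache key.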
Intensity-threshold filters
Each subclass exposes the knobs of its estimator as dataclass fields.
tdfpy.IntensityThreshold
dataclass
IntensityThreshold()
Bases: NoiseFilter
Drop points whose intensity is below a computed threshold.
Subclasses implement compute_threshold to derive the threshold
from the intensity distribution. The keep-mask is then the simple
intensities >= threshold comparison.
compute_threshold
abstractmethod
compute_threshold(intensities: np.ndarray) -> float
Return the intensity floor for this estimator.
Source code in src/tdfpy/noise/intensity.py
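The split between compute_threshold and the keep-mask can be sketched as a template-method pattern. The classes below are hypothetical stand-ins (MeanFloor is not a tdfpy estimator), keep_mask is reduced to the intensities argument for brevity, and lists replace np.ndarray.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class DemoIntensityThreshold(ABC):
    """Hypothetical stand-in for the base class described above."""

    @abstractmethod
    def compute_threshold(self, intensities):
        """Return the intensity floor."""

    def keep_mask(self, intensities):
        # Simplified signature: the index arrays play no role in a pure
        # intensity cut, so they are omitted from this sketch.
        floor = self.compute_threshold(intensities)
        return [x >= floor for x in intensities]


@dataclass(frozen=True)
class MeanFloor(DemoIntensityThreshold):
    """Hypothetical estimator: a fraction of the mean intensity."""

    fraction: float = 0.5

    def compute_threshold(self, intensities):
        return self.fraction * sum(intensities) / len(intensities)


values = [1.0, 2.0, 3.0, 10.0]
estimator = MeanFloor()
floor = estimator.compute_threshold(values)  # 0.5 * mean = 2.0
mask = estimator.keep_mask(values)           # 2.0 itself is kept (>=)
```

Note that the comparison is inclusive, so a point exactly at the floor survives.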
tdfpy.AbsoluteThreshold
dataclass
AbsoluteThreshold(value: float = 0.0)
tdfpy.MadThreshold
dataclass
MadThreshold(k: float = 3.0, scale: float = 1.4826)
Bases: IntensityThreshold
Median Absolute Deviation threshold: median + k · scale · MAD.
scale = 1.4826 makes MAD a consistent estimator of the standard
deviation for a Gaussian distribution.
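The formula above can be written out in a few lines. This is a pure-Python sketch of the stated expression using the statistics module, not the library's code; the function name mad_threshold is chosen for illustration.

```python
from statistics import median


def mad_threshold(intensities, k=3.0, scale=1.4826):
    """median + k * scale * MAD, with MAD = median(|x - median(x)|)."""
    center = median(intensities)
    mad = median(abs(x - center) for x in intensities)
    return center + k * scale * mad


values = [1.0, 2.0, 3.0, 4.0, 100.0]
threshold = mad_threshold(values)  # median 3, MAD 1 -> 3 + 3 * 1.4826
mask = [x >= threshold for x in values]
```

Because both the centre and the spread are medians, the single outlier at 100 barely moves the threshold, which is the reason to prefer MAD over a mean and standard deviation here.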
tdfpy.PercentileThreshold
dataclass
PercentileThreshold(q: float = 75.0)
tdfpy.HistogramThreshold
dataclass
HistogramThreshold(bins: int = 100, k: float = 3.0)
Bases: IntensityThreshold
Mode-of-histogram noise floor + k standard deviations.
Bins the intensities into bins equal-width bins, takes the modal
bin as the noise mode, estimates noise std from the FWHM around it,
and returns mode + k · std.
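A rough sketch of this estimator follows. Several details are assumptions of the sketch rather than documented behaviour: the mode is taken as the centre of the modal bin, the FWHM is the width of the contiguous bins around it whose counts stay at or above half the peak count, and the standard deviation is derived with the Gaussian relation std = FWHM / (2·sqrt(2·ln 2)).

```python
import math


def histogram_threshold(intensities, bins=100, k=3.0):
    """Modal-bin noise floor plus k standard deviations (sketch)."""
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        return lo  # degenerate case: all values identical
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in intensities:
        # The maximum value lands in the last bin rather than past it.
        counts[min(int((x - lo) / width), bins - 1)] += 1

    peak = max(range(bins), key=counts.__getitem__)
    half = counts[peak] / 2.0

    # Walk outwards from the modal bin while counts stay at or above half max.
    left = peak
    while left > 0 and counts[left - 1] >= half:
        left -= 1
    right = peak
    while right < bins - 1 and counts[right + 1] >= half:
        right += 1

    mode = lo + (peak + 0.5) * width
    fwhm = (right - left + 1) * width
    std = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))  # Gaussian FWHM -> sigma
    return mode + k * std


values = [1.0] * 10 + [2.0] * 4 + [10.0]
threshold = histogram_threshold(values, bins=9)
mask = [x >= threshold for x in values]
```

With these values the modal bin holds the ten points at 1.0, the neighbouring bin falls below half maximum, and only the point at 10.0 clears the threshold.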
tdfpy.BaselineThreshold
dataclass
BaselineThreshold(q: float = 25.0, k: float = 3.0)
tdfpy.IterativeMedianThreshold
dataclass
IterativeMedianThreshold(
    passes: int = 3,
    inner_k: float = 2.0,
    final_k: float = 3.0,
    scale: float = 1.4826,
    min_remaining: int = 100,
)
Bases: IntensityThreshold
Iteratively trim peaks above median + inner_k · scale · MAD.
Repeats up to passes times (or until fewer than min_remaining
points are left). The final threshold is median + final_k · std
of the surviving distribution.
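The trimming loop can be sketched as below. Two points are assumptions, since the description does not pin them down: a pass that would leave fewer than min_remaining points is discarded and the loop stops, and the final std is the population standard deviation.

```python
from statistics import median, pstdev


def iterative_median_threshold(intensities, passes=3, inner_k=2.0,
                               final_k=3.0, scale=1.4826, min_remaining=100):
    """Trim high outliers with a MAD cut, then return median + final_k * std."""
    survivors = list(intensities)
    for _ in range(passes):
        center = median(survivors)
        mad = median(abs(x - center) for x in survivors)
        cutoff = center + inner_k * scale * mad
        trimmed = [x for x in survivors if x <= cutoff]
        # Assumed stopping rule: never trim below the minimum population.
        if len(trimmed) < min_remaining:
            break
        survivors = trimmed
    return median(survivors) + final_k * pstdev(survivors)


# 100 noise-like points around 11 plus five bright peaks at 1000.
values = [10.0] * 50 + [12.0] * 50 + [1000.0] * 5
threshold = iterative_median_threshold(values, min_remaining=10)
untrimmed = median(values) + 3.0 * pstdev(values)
```

The first pass removes the five bright peaks, so the final standard deviation reflects only the noise population. The untrimmed value shows how far the bright peaks would otherwise inflate the threshold.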
Structural filters
tdfpy.VerticalNoiseFilter
dataclass
VerticalNoiseFilter(
    mz_idx_half_width: int = 3,
    min_streak_scans: int = 5,
    max_gap_scans: int = 1,
    min_streak_intensity: float = 50.0,
    num_iterations: int = 2,
)
Bases: NoiseFilter
Keep points belonging to vertical streaks in (scan, TOF_index) space.
A real ion produces an intensity streak along the ion-mobility axis at roughly the same TOF index across many consecutive scans. Noise tends to be isolated single hits or short streaks. This filter walks each TOF index, builds the IM intensity profile in a small m/z window, finds gap-closed runs of occupied scans, and keeps points whose scan falls inside a run that's long enough and intense enough.
Iterated passes (num_iterations > 1) feed each pass the survivors
of the previous one — points that only just survived because they sat
next to barely-thick noise get dropped on a later pass once that noise
is gone.
See apps/ALGORITHM.md for the full write-up.
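The run-finding step for a single TOF column can be sketched as follows. This is a simplified illustration rather than the algorithm in structural.py: it ignores the m/z window and the min_streak_intensity criterion, and it assumes run length is measured as the scan span including bridged gaps. The helper names are hypothetical.

```python
def gap_closed_runs(occupied_scans, max_gap_scans=1):
    """Group scan numbers into runs, bridging up to max_gap_scans empty scans."""
    runs = []
    for scan in sorted(set(occupied_scans)):
        # A gap of g empty scans means neighbouring occupied scans differ by g + 1.
        if runs and scan - runs[-1][1] <= max_gap_scans + 1:
            runs[-1][1] = scan
        else:
            runs.append([scan, scan])
    return [tuple(r) for r in runs]


def kept_scans(occupied_scans, min_streak_scans=5, max_gap_scans=1):
    """Scans that fall inside a run spanning at least min_streak_scans scans."""
    keep = set()
    for start, end in gap_closed_runs(occupied_scans, max_gap_scans):
        if end - start + 1 >= min_streak_scans:
            keep.update(s for s in occupied_scans if start <= s <= end)
    return keep


scans = [3, 4, 6, 7, 8, 20, 40, 41]
runs = gap_closed_runs(scans)
kept = kept_scans(scans)
```

Here the single missing scan at 5 is bridged, giving one run from 3 to 8 that passes the length test, while the isolated hit at 20 and the two-scan run at 40 to 41 are rejected.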
keep_mask
keep_mask(
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    td: "TimsData",
    frame_id: int
) -> np.ndarray
Source code in src/tdfpy/noise/structural.py
run
run(
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    diagnostics: bool = False
) -> "np.ndarray | VerticalNoiseDiagnostics"
Run the filter on raw arrays.
When diagnostics is False (default), returns the keep-mask only.
When True, returns a VerticalNoiseDiagnostics carrying the mask
and per-pass telemetry — used by the timsTOF viewer's IM-filter page.
Source code in src/tdfpy/noise/structural.py
tdfpy.noise.VerticalNoiseDiagnostics
dataclass
VerticalNoiseDiagnostics(
    keep_point_mask: np.ndarray,
    num_columns_evaluated: int,
    num_columns_with_kept_runs: int,
    num_kept_points: int,
    feature_span_intensities: np.ndarray,
    per_pass_kept: list[int] = list(),
)
Diagnostics from a single or iterated pass of VerticalNoiseFilter.
Fields are populated from the final pass when num_iterations > 1,
except for per_pass_kept, which traces all passes.
tdfpy.HorizontalHaloFilter
dataclass
HorizontalHaloFilter(
    peak_fraction: float = 0.15,
    mz_idx_half_width: int = 100,
    scan_half_width: int = 2,
)
Bases: NoiseFilter
Remove the weak m/z halo flanking bright peaks — left/right only.
High-intensity ions are flanked by a halo of weak peaks — likely from charge interactions within the fragment-ion cloud or detector effects — that are not resolvable to high-precision m/z values. A real ion forms a vertical streak along the ion-mobility axis (the same TOF index across many consecutive mobility scans), so this filter only removes peaks to the left and right (offset in TOF/m-z index) of a bright neighbour and never above or below (offset in ion mobility at the same index).
For each peak it computes a local reference intensity — the maximum
intensity in the surrounding box (±scan_half_width scans,
±mz_idx_half_width TOF indices) excluding the peak's own m/z column
— and drops the peak if its intensity falls below peak_fraction of
that reference. Excluding the peak's own column is what guarantees the
vertical streak is never used against it: a bright peak directly above or
below sits in the same column and so can never raise the threshold; only a
genuine left/right neighbour can. A peak with no off-column neighbours in
its box is always kept. Set scan_half_width=0 for strictly per-row
(same ion-mobility scan) behaviour.
Operates entirely in integer (scan, TOF index) space — no unit
conversion. The box max is JIT-compiled with Numba, with a pure-Python
fallback.
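The rule described above can be sketched with a brute-force O(n²) loop. This is an illustration of the stated logic only, not the Numba box-max implementation; the function name halo_keep_mask is hypothetical and lists stand in for np.ndarray.

```python
def halo_keep_mask(scan_indices, mz_indices, intensities,
                   peak_fraction=0.15, mz_idx_half_width=100,
                   scan_half_width=2):
    """Drop points much weaker than the brightest off-column neighbour."""
    n = len(intensities)
    keep = [True] * n
    for i in range(n):
        reference = None
        for j in range(n):
            if mz_indices[j] == mz_indices[i]:
                continue  # own column (including the point itself) never counts
            in_box = (abs(scan_indices[j] - scan_indices[i]) <= scan_half_width
                      and abs(mz_indices[j] - mz_indices[i]) <= mz_idx_half_width)
            if in_box and (reference is None or intensities[j] > reference):
                reference = intensities[j]
        # No off-column neighbour in the box: the point is always kept.
        if reference is not None and intensities[i] < peak_fraction * reference:
            keep[i] = False
    return keep


scans = [10, 11, 12, 11, 11]
mz_idx = [500, 500, 500, 510, 5000]
inten = [1000.0, 900.0, 50.0, 20.0, 5.0]
mask = halo_keep_mask(scans, mz_idx, inten)
```

In this example the weak point at the bottom of the bright column (intensity 50) survives because same-column neighbours are excluded, the flanking point at index 510 is dropped, and the distant point at index 5000 has no neighbour in its box and is kept.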
keep_mask
keep_mask(
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    td: "TimsData",
    frame_id: int
) -> np.ndarray
Source code in src/tdfpy/noise/structural.py