Noise filters

Composable noise filters live in tdfpy.noise. A pipeline of filters is applied in order; each filter takes the raw (scan_indices, mz_indices, intensities) arrays and returns a boolean keep-mask. The filters are frozen dataclasses, which makes them hashable (suitable as cache keys) and easy to adjust with dataclasses.replace.

from tdfpy import MadThreshold, VerticalNoiseFilter, get_raw_peaks

peaks = get_raw_peaks(
    td, frame_id,
    noise=[
        VerticalNoiseFilter(min_streak_scans=5, num_iterations=2),
        MadThreshold(k=3),
    ],
)

User-facing APIs (get_raw_peaks, get_centroided_spectrum, Frame.raw_peaks, etc.) also accept shorthand for terseness: a filter name such as noise="mad" or noise="iterative_median", or a bare number such as noise=500.0 for an absolute intensity floor. See coerce_filters for the accepted forms.
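The hashing and dataclasses.replace behaviour mentioned above comes from the standard library rather than from tdfpy itself. A minimal sketch, using a hypothetical ExampleThreshold stand-in (not a tdfpy class) so that it runs without the library installed:

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ExampleThreshold:
    """Hypothetical stand-in for a tdfpy filter; not part of the library."""

    k: float = 3.0
    scale: float = 1.4826


base = ExampleThreshold()
stricter = replace(base, k=5.0)  # new instance; `base` is left untouched

# Equal field values give equal hashes, so a tuple of filters works as a cache key.
cache = {(base,): "peaks for k=3"}
hit = cache.get((ExampleThreshold(k=3.0),))

# Frozen instances reject in-place mutation.
try:
    base.k = 10.0
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because replace returns a new object, a tweaked filter never invalidates results cached under the original one.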


Base class & coercion

tdfpy.NoiseFilter

Bases: ABC

Base class for raw-peak noise filters.

Subclasses are typically frozen dataclasses with their tunable knobs as fields. They implement a single method, keep_mask, which returns a boolean array of length len(intensities) indicating which points to keep.

Filters operate on integer indices (TOF index + scan number) and raw intensity. Conversion to m/z and 1/K0 happens later in the pipeline so filters never need to do per-point unit conversions themselves.
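The filter contract and the in-order pipeline can be sketched roughly as follows. Everything here is a stand-in: SketchFilter uses a reduced signature (no num_scans, td or frame_id), MinIntensity and ScanRange are hypothetical filters, and apply_pipeline assumes that each filter sees only the survivors of the previous one, which the text above does not spell out.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class SketchFilter(ABC):
    """Stand-in for tdfpy.NoiseFilter with a reduced signature (assumption)."""

    @abstractmethod
    def keep_mask(self, scan_indices, mz_indices, intensities):
        """Return one bool per input point."""


@dataclass(frozen=True)
class MinIntensity(SketchFilter):
    floor: float = 0.0

    def keep_mask(self, scan_indices, mz_indices, intensities):
        return [value >= self.floor for value in intensities]


@dataclass(frozen=True)
class ScanRange(SketchFilter):
    first: int = 0
    last: int = 10**9

    def keep_mask(self, scan_indices, mz_indices, intensities):
        return [self.first <= scan <= self.last for scan in scan_indices]


def apply_pipeline(filters, scan_indices, mz_indices, intensities):
    """Apply filters in order; each one sees only the previous survivors."""
    active = list(range(len(intensities)))  # positions still alive
    for flt in filters:
        mask = flt.keep_mask(
            [scan_indices[i] for i in active],
            [mz_indices[i] for i in active],
            [intensities[i] for i in active],
        )
        active = [i for i, keep in zip(active, mask) if keep]
    kept = set(active)
    # Map the surviving positions back to a full-length mask.
    return [i in kept for i in range(len(intensities))]


scans = [1, 2, 3, 4, 5]
tofs = [100, 100, 101, 250, 250]
ints = [5.0, 80.0, 120.0, 3.0, 60.0]
final_mask = apply_pipeline([MinIntensity(50.0), ScanRange(2, 4)], scans, tofs, ints)
```

Only integer indices and raw intensities are passed around, in line with the note that unit conversion happens later.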

keep_mask abstractmethod

keep_mask(
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    td: "TimsData",
    frame_id: int
) -> np.ndarray

Return a boolean keep-mask of length len(intensities).

Source code in src/tdfpy/noise/__init__.py
@abstractmethod
def keep_mask(
    self,
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    td: "TimsData",
    frame_id: int,
) -> np.ndarray:
    """Return a boolean keep-mask of length ``len(intensities)``."""

tdfpy.coerce_filters

coerce_filters(spec: NoiseSpec) -> tuple[NoiseFilter, ...]

Normalize a user-facing noise spec to a tuple of filter instances.

Accepts:

  • None → empty tuple (no filtering)
  • a single NoiseFilter instance → one-element tuple
  • a string from "mad" | "percentile" | "histogram" | "baseline" | "iterative_median" → an instance of the matching IntensityThreshold subclass with default settings
  • a numeric (float / int, but not bool) → AbsoluteThreshold with that value
  • a list/tuple of any of the above (nesting allowed) → flattened tuple

Strings and numerics are how existing call sites stay terse; the tuple output is hashable for caching (e.g. Streamlit @cache_data).

Source code in src/tdfpy/noise/__init__.py
def coerce_filters(spec: NoiseSpec) -> tuple[NoiseFilter, ...]:
    """Normalize a user-facing noise spec to a tuple of filter instances.

    Accepts:

    - ``None`` → empty tuple (no filtering)
    - a single :class:`NoiseFilter` instance → one-element tuple
    - a list/tuple of any of the above → flattened tuple
    - a string from ``"mad" | "percentile" | "histogram" | "baseline" |
      "iterative_median"`` → an :class:`IntensityThreshold` subclass with
      defaults
    - a numeric (``float`` / ``int``) → :class:`AbsoluteThreshold`

    Strings and numerics are how existing call sites stay terse; the tuple
    output is hashable for caching (e.g. Streamlit ``@cache_data``).
    """
    if spec is None:
        return ()
    if isinstance(spec, NoiseFilter):
        return (spec,)
    if isinstance(spec, str):
        try:
            cls = _STRING_ALIASES[spec]
        except KeyError as exc:
            raise ValueError(
                f"Unknown noise filter name {spec!r}. "
                f"Valid names: {sorted(_STRING_ALIASES)}"
            ) from exc
        return (cls(),)
    if isinstance(spec, (int, float)) and not isinstance(spec, bool):
        return (AbsoluteThreshold(value=float(spec)),)
    if isinstance(spec, (list, tuple)):
        out: list[NoiseFilter] = []
        for item in spec:
            out.extend(coerce_filters(item))
        return tuple(out)
    raise TypeError(
        f"Cannot coerce {type(spec).__name__} to a noise filter. "
        "Expected NoiseFilter, str, float, list/tuple, or None."
    )
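The branch order above has two consequences that are easy to miss: nested lists are flattened recursively (and None entries vanish), and bool is rejected even though it is a subclass of int. A simplified mirror of those branches, using the stand-in classes Named and Floor rather than the real filters, and without the name validation of the real function:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Named:
    """Stand-in for a name-selected filter such as MadThreshold."""

    name: str


@dataclass(frozen=True)
class Floor:
    """Stand-in for AbsoluteThreshold."""

    value: float


def coerce_sketch(spec):
    """Simplified mirror of the branches above, using stand-in classes."""
    if spec is None:
        return ()
    if isinstance(spec, (Named, Floor)):
        return (spec,)
    if isinstance(spec, str):
        return (Named(spec),)
    # bool is a subclass of int, so it has to be excluded explicitly.
    if isinstance(spec, (int, float)) and not isinstance(spec, bool):
        return (Floor(float(spec)),)
    if isinstance(spec, (list, tuple)):
        return tuple(f for item in spec for f in coerce_sketch(item))
    raise TypeError(f"cannot coerce {type(spec).__name__}")


# Nested containers flatten into one tuple; the None entry contributes nothing.
flat = coerce_sketch(["mad", [500, None], (Floor(10.0),)])
```

Here flat holds three filters in their original order, and the resulting tuple is hashable because every element is a frozen dataclass.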

Intensity-threshold filters

Each subclass exposes the knobs of its estimator as dataclass fields.

tdfpy.IntensityThreshold dataclass

IntensityThreshold()

Bases: NoiseFilter

Drop points whose intensity is below a computed threshold.

Subclasses implement compute_threshold to derive the threshold from the intensity distribution. The keep-mask is then simply intensities >= threshold.
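This division of labour is a template-method pattern: the base class owns the comparison and subclasses supply only the number. A rough sketch with a stand-in base class (ThresholdSketch) and a hypothetical estimator (MeanMultiple), neither of which is part of tdfpy:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from statistics import fmean


class ThresholdSketch(ABC):
    """Stand-in for IntensityThreshold; the real base class may differ."""

    @abstractmethod
    def compute_threshold(self, intensities):
        """Return the intensity floor."""

    def keep_mask(self, intensities):
        threshold = self.compute_threshold(intensities)
        return [value >= threshold for value in intensities]


@dataclass(frozen=True)
class MeanMultiple(ThresholdSketch):
    """Hypothetical estimator: keep points at or above factor * mean."""

    factor: float = 1.0

    def compute_threshold(self, intensities):
        return self.factor * fmean(intensities)


# The mean of these four values is 4.0, so only the last point survives.
mask = MeanMultiple(factor=1.0).keep_mask([1.0, 2.0, 3.0, 10.0])
```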

compute_threshold abstractmethod

compute_threshold(intensities: np.ndarray) -> float

Return the intensity floor for this estimator.

Source code in src/tdfpy/noise/intensity.py
@abstractmethod
def compute_threshold(self, intensities: np.ndarray) -> float:
    """Return the intensity floor for this estimator."""

tdfpy.AbsoluteThreshold dataclass

AbsoluteThreshold(value: float = 0.0)

Bases: IntensityThreshold

Constant intensity floor; the intensity distribution is ignored and value is used as the threshold.

tdfpy.MadThreshold dataclass

MadThreshold(k: float = 3.0, scale: float = 1.4826)

Bases: IntensityThreshold

Median Absolute Deviation threshold: median + k · scale · MAD.

scale = 1.4826 makes MAD a consistent estimator of the standard deviation for a Gaussian distribution.
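As a worked example of the formula, assuming the usual definition MAD = median(|x - median(x)|) (the library's exact implementation is not shown here and may differ in detail):

```python
from statistics import median


def mad_threshold(intensities, k=3.0, scale=1.4826):
    """median + k * scale * MAD, with MAD = median(|x - median(x)|)."""
    center = median(intensities)
    mad = median(abs(value - center) for value in intensities)
    return center + k * scale * mad


intensities = [10, 12, 11, 13, 12, 500]
threshold = mad_threshold(intensities)  # median 12, MAD 1 -> 12 + 3 * 1.4826
kept = [value for value in intensities if value >= threshold]
```

The single bright point barely moves the median or the MAD, which is why this estimator tolerates a few strong peaks in a mostly noisy frame.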

tdfpy.PercentileThreshold dataclass

PercentileThreshold(q: float = 75.0)

Bases: IntensityThreshold

Drop everything below the q-th percentile of intensities.
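A small sketch of a percentile floor. The interpolation rule is an assumption: linear interpolation between order statistics is used here, and the library may compute its percentile differently.

```python
import math


def percentile(values, q):
    """q-th percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


intensities = [4, 9, 1, 7, 3, 8, 2, 6, 5]
threshold = percentile(intensities, 75.0)
kept = [value for value in intensities if value >= threshold]
```

Unlike the MAD estimator, this always discards roughly the same fraction of points regardless of how noisy the frame is.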

tdfpy.HistogramThreshold dataclass

HistogramThreshold(bins: int = 100, k: float = 3.0)

Bases: IntensityThreshold

Mode-of-histogram noise floor + k standard deviations.

Divides the intensities into equal-width histogram bins (their number is set by bins), takes the modal bin as the noise mode, estimates the noise std from the FWHM around it, and returns mode + k · std.
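One way these steps could look is sketched below. Several details are assumptions not stated above: the FWHM is measured in whole bins by walking outwards from the modal bin, the mode is taken as the centre of that bin, and the FWHM is converted to a standard deviation with the Gaussian relation FWHM = 2·sqrt(2·ln 2)·σ.

```python
import math


def histogram_threshold(intensities, bins=100, k=3.0):
    """Sketch of mode + k * std, with std derived from the histogram FWHM."""
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        return float(lo)  # degenerate case: all values identical
    width = (hi - lo) / bins
    counts = [0] * bins
    for value in intensities:
        index = min(int((value - lo) / width), bins - 1)  # clamp the maximum
        counts[index] += 1

    mode_bin = counts.index(max(counts))
    half_max = counts[mode_bin] / 2.0

    # Walk outwards from the modal bin while counts stay at or above half max.
    left = mode_bin
    while left > 0 and counts[left - 1] >= half_max:
        left -= 1
    right = mode_bin
    while right < bins - 1 and counts[right + 1] >= half_max:
        right += 1

    fwhm = (right - left + 1) * width
    std = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))  # Gaussian FWHM -> sigma
    mode = lo + (mode_bin + 0.5) * width  # centre of the modal bin
    return mode + k * std


noise = [10] * 50 + [11] * 30 + [9] * 30 + [12] * 10
threshold = histogram_threshold(noise + [1000])
```

With a single extreme value stretching the range, all the noise lands in one wide bin, so the result depends strongly on the bin count; that sensitivity is inherent to equal-width binning.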

tdfpy.BaselineThreshold dataclass

BaselineThreshold(q: float = 25.0, k: float = 3.0)

Bases: IntensityThreshold

Low-intensity baseline: mean + k · std of the lowest q % of intensities (the bottom quartile with the default q = 25).
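A minimal sketch of this estimator. How the lowest q % is selected and which standard deviation is used are assumptions: here the lowest ceil(n·q/100) sorted values are taken and the population standard deviation is applied.

```python
import math
from statistics import fmean, pstdev


def baseline_threshold(intensities, q=25.0, k=3.0):
    """mean + k * std of the lowest q % of the values (selection is assumed)."""
    ordered = sorted(intensities)
    count = max(1, math.ceil(len(ordered) * q / 100.0))  # at least one point
    lowest = ordered[:count]
    return fmean(lowest) + k * pstdev(lowest)


intensities = [1, 2, 3, 4, 100, 200, 300, 400]
threshold = baseline_threshold(intensities, q=50.0)  # baseline = [1, 2, 3, 4]
kept = [value for value in intensities if value >= threshold]
```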

tdfpy.IterativeMedianThreshold dataclass

IterativeMedianThreshold(
    passes: int = 3,
    inner_k: float = 2.0,
    final_k: float = 3.0,
    scale: float = 1.4826,
    min_remaining: int = 100,
)

Bases: IntensityThreshold

Iteratively trim peaks above median + inner_k · scale · MAD.

Repeats up to passes times (or until fewer than min_remaining points are left). The final threshold is median + final_k · std of the surviving distribution.
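The trimming loop can be sketched as below. Two points are assumptions: a pass that would leave fewer than min_remaining points is discarded rather than applied, and the final std is the population standard deviation of the survivors.

```python
from statistics import median, pstdev


def iterative_median_threshold(
    intensities, passes=3, inner_k=2.0, final_k=3.0, scale=1.4826, min_remaining=100
):
    """Trim bright points repeatedly, then return median + final_k * std."""
    survivors = list(intensities)
    for _ in range(passes):
        center = median(survivors)
        mad = median(abs(value - center) for value in survivors)
        cutoff = center + inner_k * scale * mad
        trimmed = [value for value in survivors if value <= cutoff]
        if len(trimmed) < min_remaining:
            break  # assumption: keep the last population that was large enough
        survivors = trimmed
    return median(survivors) + final_k * pstdev(survivors)


# 200 noise points cycling through 10..14, plus three bright signal points.
noise = [10 + (i % 5) for i in range(200)]
threshold = iterative_median_threshold(noise + [500, 600, 700])
```

The bright points are removed on the first pass, so the final spread reflects the noise alone instead of being inflated by signal.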


Structural filters

tdfpy.VerticalNoiseFilter dataclass

VerticalNoiseFilter(
    mz_idx_half_width: int = 3,
    min_streak_scans: int = 5,
    max_gap_scans: int = 1,
    min_streak_intensity: float = 50.0,
    num_iterations: int = 2,
)

Bases: NoiseFilter

Keep points belonging to vertical streaks in (scan, TOF_index) space.

A real ion produces an intensity streak along the ion-mobility axis at roughly the same TOF index across many consecutive scans. Noise tends to be isolated single hits or short streaks. This filter walks each TOF index, builds the IM intensity profile in a small m/z window, finds gap-closed runs of occupied scans, and keeps points whose scan falls inside a run that's long enough and intense enough.

Iterated passes (num_iterations > 1) feed each pass the survivors of the previous one — points that only just survived because they sat next to barely-thick noise get dropped on a later pass once that noise is gone.

See apps/ALGORITHM.md for the full write-up.
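The run-finding step for a single TOF column might look like the sketch below. It is not the library's implementation: it assumes a gap is bridged when at most max_gap_scans scans are missing between two occupied scans, and that run length is measured as the scan span including bridged gaps. The intensity test is left out because its exact definition is not given above.

```python
def gap_closed_runs(occupied_scans, max_gap_scans=1):
    """Group occupied scans into runs, bridging gaps of up to max_gap_scans."""
    runs = []
    for scan in sorted(set(occupied_scans)):
        if runs and scan - runs[-1][1] - 1 <= max_gap_scans:
            runs[-1][1] = scan  # extend the current run across the gap
        else:
            runs.append([scan, scan])  # start a new run
    return [(first, last) for first, last in runs]


def long_runs(occupied_scans, min_streak_scans=5, max_gap_scans=1):
    """Keep only runs spanning at least min_streak_scans scans."""
    return [
        (first, last)
        for first, last in gap_closed_runs(occupied_scans, max_gap_scans)
        if last - first + 1 >= min_streak_scans
    ]


column_scans = [3, 4, 5, 7, 8, 9, 40, 41, 100]
runs = gap_closed_runs(column_scans)
kept_runs = long_runs(column_scans)
```

In this column the missing scan 6 is bridged, giving one long streak from scan 3 to 9, while the pair at 40 and 41 and the single hit at 100 are too short to survive.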

keep_mask

keep_mask(
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    td: "TimsData",
    frame_id: int
) -> np.ndarray
Source code in src/tdfpy/noise/structural.py
def keep_mask(
    self,
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    td: "TimsData",
    frame_id: int,
) -> np.ndarray:
    return self.run(
        scan_indices, mz_indices, intensities, num_scans=num_scans,
        diagnostics=False,
    )

run

run(
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    diagnostics: bool = False
) -> "np.ndarray | VerticalNoiseDiagnostics"

Run the filter on raw arrays.

When diagnostics is False (the default), returns only the keep-mask. When True, returns a VerticalNoiseDiagnostics carrying the mask and per-pass telemetry, which the timsTOF viewer's IM-filter page uses.

Source code in src/tdfpy/noise/structural.py
def run(
    self,
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    diagnostics: bool = False,
) -> "np.ndarray | VerticalNoiseDiagnostics":
    """Run the filter on raw arrays.

    When ``diagnostics`` is False (default) returns the keep-mask only.
    When True returns a :class:`VerticalNoiseDiagnostics` carrying the mask
    and per-pass telemetry — used by the timsTOF viewer's IM-filter page.
    """
    n = scan_indices.size
    if n == 0:
        empty = np.zeros(0, dtype=bool)
        if not diagnostics:
            return empty
        return VerticalNoiseDiagnostics(
            keep_point_mask=empty,
            num_columns_evaluated=0,
            num_columns_with_kept_runs=0,
            num_kept_points=0,
            feature_span_intensities=np.zeros(0, dtype=np.float64),
            per_pass_kept=[0],
        )

    cumulative = np.ones(n, dtype=bool)
    per_pass_kept = [n]
    last_n_cols = 0
    last_n_cols_kept = 0
    last_span_intensities = np.zeros(0, dtype=np.float64)

    for _ in range(int(self.num_iterations)):
        active = np.nonzero(cumulative)[0]
        if active.size == 0:
            break
        mask, n_cols, n_cols_kept, spans = _single_pass_filter(
            scan_indices[cumulative],
            mz_indices[cumulative],
            intensities[cumulative],
            num_scans,
            mz_idx_half_width=self.mz_idx_half_width,
            min_streak_scans=self.min_streak_scans,
            max_gap_scans=self.max_gap_scans,
            min_streak_intensity=self.min_streak_intensity,
            collect_span_intensities=diagnostics,
        )
        kept = active[mask]
        cumulative = np.zeros(n, dtype=bool)
        cumulative[kept] = True
        per_pass_kept.append(int(cumulative.sum()))
        last_n_cols = n_cols
        last_n_cols_kept = n_cols_kept
        last_span_intensities = spans
        if not cumulative.any():
            break

    if not diagnostics:
        return cumulative
    return VerticalNoiseDiagnostics(
        keep_point_mask=cumulative,
        num_columns_evaluated=last_n_cols,
        num_columns_with_kept_runs=last_n_cols_kept,
        num_kept_points=int(cumulative.sum()),
        feature_span_intensities=last_span_intensities,
        per_pass_kept=per_pass_kept,
    )

tdfpy.noise.VerticalNoiseDiagnostics dataclass

VerticalNoiseDiagnostics(
    keep_point_mask: np.ndarray,
    num_columns_evaluated: int,
    num_columns_with_kept_runs: int,
    num_kept_points: int,
    feature_span_intensities: np.ndarray,
    per_pass_kept: list[int] = list(),
)

Diagnostics from a single or iterated pass of VerticalNoiseFilter.

Fields are populated from the final pass when num_iterations > 1, except for per_pass_kept, which traces all passes: its first entry is the input point count, followed by the number of surviving points after each pass.

tdfpy.HorizontalHaloFilter dataclass

HorizontalHaloFilter(
    peak_fraction: float = 0.15,
    mz_idx_half_width: int = 100,
    scan_half_width: int = 2,
)

Bases: NoiseFilter

Remove the weak m/z halo flanking bright peaks — left/right only.

High-intensity ions are flanked by a halo of weak peaks — likely from charge interactions within the fragment-ion cloud or detector effects — that are not resolvable to high-precision m/z values. A real ion forms a vertical streak along the ion-mobility axis (the same TOF index across many consecutive mobility scans), so this filter only removes peaks to the left and right (offset in TOF/m-z index) of a bright neighbour and never above or below (offset in ion mobility at the same index).

For each peak it computes a local reference intensity — the maximum intensity in the surrounding box (±scan_half_width scans, ±mz_idx_half_width TOF indices) excluding the peak's own m/z column — and drops the peak if its intensity falls below peak_fraction of that reference. Excluding the peak's own column is what guarantees the vertical streak is never used against it: a bright peak directly above or below sits in the same column and so can never raise the threshold; only a genuine left/right neighbour can. A peak with no off-column neighbours in its box is always kept. Set scan_half_width=0 for strictly per-row (same ion-mobility scan) behaviour.

Operates entirely in integer (scan, TOF index) space — no unit conversion. The box max is JIT-compiled with Numba, with a pure-Python fallback.
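The rule can be written as a brute-force reference, useful for checking intuition on small inputs. This O(n²) sketch is an illustration only, not the Numba-compiled box max the library uses, and halo_keep_mask is a hypothetical name:

```python
def halo_keep_mask(scans, tofs, intensities,
                   peak_fraction=0.15, mz_idx_half_width=100, scan_half_width=2):
    """Brute-force version of the off-column box-max rule described above."""
    n = len(intensities)
    keep = []
    for i in range(n):
        reference = 0.0  # stays 0 when the box has no off-column neighbour
        for j in range(n):
            if tofs[j] == tofs[i]:
                continue  # same column (including the peak itself): never counts
            if abs(scans[j] - scans[i]) > scan_half_width:
                continue
            if abs(tofs[j] - tofs[i]) > mz_idx_half_width:
                continue
            reference = max(reference, intensities[j])
        keep.append(intensities[i] >= peak_fraction * reference)
    return keep


scans = [10, 10, 11, 50]
tofs = [1000, 1005, 1000, 5000]
ints = [1000.0, 50.0, 20.0, 5.0]
mask = halo_keep_mask(scans, tofs, ints)
```

The weak point at TOF index 1005 sits beside the bright peak and is dropped, while the weak point at scan 11 shares the bright peak's column and is kept, as is the isolated point with no neighbours.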

keep_mask

keep_mask(
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    td: "TimsData",
    frame_id: int
) -> np.ndarray
Source code in src/tdfpy/noise/structural.py
def keep_mask(
    self,
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    td: "TimsData",
    frame_id: int,
) -> np.ndarray:
    n = intensities.size
    if n == 0:
        return np.ones(0, dtype=bool)
    # Sort by (scan, mz_idx) so each scan is a contiguous, m/z-sorted block.
    order = np.lexsort((mz_indices, scan_indices))
    scan_s = np.ascontiguousarray(scan_indices[order], dtype=np.int64)
    mz_s = np.ascontiguousarray(mz_indices[order], dtype=np.int64)
    int_s = np.ascontiguousarray(intensities[order], dtype=np.float64)

    block_start = np.zeros(num_scans, dtype=np.int64)
    block_len = np.zeros(num_scans, dtype=np.int64)
    uniq, starts, counts = np.unique(
        scan_s, return_index=True, return_counts=True
    )
    block_start[uniq] = starts
    block_len[uniq] = counts

    ref_sorted = _offcol_box_max(
        scan_s, mz_s, int_s, block_start, block_len,
        int(self.scan_half_width), int(self.mz_idx_half_width),
    )
    ref = np.empty(n, dtype=np.float64)
    ref[order] = ref_sorted
    # Keep unless strictly below the fraction of the left/right max.
    # Peaks with no off-column neighbours (ref == 0) are always kept.
    return intensities >= self.peak_fraction * ref