Noise filters

Composable noise filters live in tdfpy.noise. A pipeline of filters is applied in order; each takes raw (scan_indices, mz_indices, intensities) and returns a boolean keep-mask. Because the filters are frozen dataclasses, they are hashable (suitable as cache keys) and can be adjusted with dataclasses.replace.

from tdfpy import MadThreshold, VerticalNoiseFilter, get_raw_peaks

peaks = get_raw_peaks(
    td, frame_id,
    noise=[
        VerticalNoiseFilter(min_streak_scans=5, num_iterations=2),
        MadThreshold(k=3),
    ],
)
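
As a rough illustration of "applied in order", the sketch below chains keep-masks so that each filter only sees the survivors of the previous one. Whether tdfpy composes masks exactly this way is an assumption; apply_pipeline and the two toy filters are hypothetical names, and the example uses plain NumPy rather than tdfpy itself.

```python
import numpy as np

def apply_pipeline(filters, scan_indices, mz_indices, intensities):
    """Apply mask-returning callables in order (hypothetical helper, not tdfpy API).

    Returns a boolean mask over the ORIGINAL points.
    """
    keep = np.ones(intensities.size, dtype=bool)
    for f in filters:
        active = np.nonzero(keep)[0]      # original positions still alive
        if active.size == 0:
            break
        sub = f(scan_indices[active], mz_indices[active], intensities[active])
        keep[active[~sub]] = False        # map rejections back to original positions
    return keep

# Toy stand-in filters (assumptions for illustration only).
def above_10(scan, mz, inten):
    return inten >= 10.0

def above_median(scan, mz, inten):
    return inten >= np.median(inten)

scan = np.array([0, 0, 1, 1, 2])
mz = np.array([5, 6, 5, 6, 5])
inten = np.array([1.0, 20.0, 30.0, 5.0, 40.0])
mask = apply_pipeline([above_10, above_median], scan, mz, inten)
```

Note that the order matters here: the median in the second filter is computed only over the points the first filter kept.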

User-facing APIs (get_raw_peaks, get_centroided_spectrum, Frame.raw_peaks, etc.) also accept a string or numeric shorthand for terseness: noise="mad", noise="iterative_median", noise=500.0, etc. See coerce_filters for the accepted forms.

Base class & coercion

tdfpy.NoiseFilter

Bases: ABC

Base class for raw-peak noise filters.

Subclasses are typically frozen dataclasses with their tunable knobs as fields. They implement a single method, keep_mask, which returns a boolean array of length len(intensities) indicating which points to keep.

Filters operate on integer indices (TOF index + scan number) and raw intensity. Conversion to m/z and 1/K0 happens later in the pipeline so filters never need to do per-point unit conversions themselves.
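
A minimal sketch of what such a subclass might look like. To stay runnable without tdfpy installed, it uses a stand-in base class (NoiseFilterSketch) that mirrors the keep_mask signature documented below; ScanRangeFilter and its fields are hypothetical and not part of the library.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, replace

import numpy as np

class NoiseFilterSketch(ABC):
    """Stand-in for tdfpy.NoiseFilter so this example runs on its own."""

    @abstractmethod
    def keep_mask(self, scan_indices, mz_indices, intensities, *,
                  num_scans, td, frame_id):
        ...

@dataclass(frozen=True)
class ScanRangeFilter(NoiseFilterSketch):
    """Hypothetical filter: keep points whose scan is in [first_scan, last_scan]."""
    first_scan: int = 0
    last_scan: int = 10

    def keep_mask(self, scan_indices, mz_indices, intensities, *,
                  num_scans, td, frame_id):
        # Works purely on integer scan numbers; no unit conversion needed.
        return (scan_indices >= self.first_scan) & (scan_indices <= self.last_scan)

f = ScanRangeFilter(first_scan=2, last_scan=4)
g = replace(f, last_scan=3)      # frozen dataclasses are tweaked by copying
cache = {f: "cached result"}     # frozen + eq makes instances hashable
mask = g.keep_mask(
    np.array([1, 2, 3, 4]), np.array([7, 7, 7, 7]), np.ones(4),
    num_scans=5, td=None, frame_id=1,
)
```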

keep_mask `abstractmethod`

keep_mask(
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    td: "TimsData",
    frame_id: int
) -> np.ndarray

Return a boolean keep-mask of length len(intensities).

Source code in src/tdfpy/noise/__init__.py

@abstractmethod
def keep_mask(
    self,
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    td: "TimsData",
    frame_id: int,
) -> np.ndarray:
    """Return a boolean keep-mask of length ``len(intensities)``."""

tdfpy.coerce_filters

coerce_filters(spec: NoiseSpec) -> tuple[NoiseFilter, ...]

Normalize a user-facing noise spec to a tuple of filter instances.

Accepts:

None → empty tuple (no filtering)
a single NoiseFilter instance → one-element tuple
a string from "mad" | "percentile" | "histogram" | "baseline" | "iterative_median" → an IntensityThreshold subclass with defaults
a numeric (float / int, but not bool) → AbsoluteThreshold
a list/tuple of any of the above → flattened tuple

Strings and numerics are how existing call sites stay terse; the tuple output is hashable for caching (e.g. Streamlit @cache_data).

Source code in src/tdfpy/noise/__init__.py

def coerce_filters(spec: NoiseSpec) -> tuple[NoiseFilter, ...]:
    """Normalize a user-facing noise spec to a tuple of filter instances.

    Accepts:

    - ``None`` → empty tuple (no filtering)
    - a single :class:`NoiseFilter` instance → one-element tuple
    - a string from ``"mad" | "percentile" | "histogram" | "baseline" |
      "iterative_median"`` → an :class:`IntensityThreshold` subclass with
      defaults
    - a numeric (``float`` / ``int``, but not ``bool``) → :class:`AbsoluteThreshold`
    - a list/tuple of any of the above → flattened tuple

    Strings and numerics are how existing call sites stay terse; the tuple
    output is hashable for caching (e.g. Streamlit ``@cache_data``).
    """
    if spec is None:
        return ()
    if isinstance(spec, NoiseFilter):
        return (spec,)
    if isinstance(spec, str):
        try:
            cls = _STRING_ALIASES[spec]
        except KeyError as exc:
            raise ValueError(
                f"Unknown noise filter name {spec!r}. "
                f"Valid names: {sorted(_STRING_ALIASES)}"
            ) from exc
        return (cls(),)
    if isinstance(spec, (int, float)) and not isinstance(spec, bool):
        return (AbsoluteThreshold(value=float(spec)),)
    if isinstance(spec, (list, tuple)):
        out: list[NoiseFilter] = []
        for item in spec:
            out.extend(coerce_filters(item))
        return tuple(out)
    raise TypeError(
        f"Cannot coerce {type(spec).__name__} to a noise filter. "
        "Expected NoiseFilter, str, float, list/tuple, or None."
    )

Intensity-threshold filters

Each subclass exposes the knobs of its estimator as dataclass fields.

tdfpy.IntensityThreshold `dataclass`

IntensityThreshold()

Bases: NoiseFilter

Drop points whose intensity is below a computed threshold.

Subclasses implement compute_threshold to derive the threshold from the intensity distribution. The keep-mask is then the simple intensities >= threshold comparison.

compute_threshold `abstractmethod`

compute_threshold(intensities: np.ndarray) -> float

Return the intensity floor for this estimator.

Source code in src/tdfpy/noise/intensity.py

@abstractmethod
def compute_threshold(self, intensities: np.ndarray) -> float:
    """Return the intensity floor for this estimator."""

tdfpy.AbsoluteThreshold `dataclass`

AbsoluteThreshold(value: float = 0.0)

Bases: IntensityThreshold

Constant intensity floor; no estimator is involved.

tdfpy.MadThreshold `dataclass`

MadThreshold(k: float = 3.0, scale: float = 1.4826)

Bases: IntensityThreshold

Median Absolute Deviation threshold: median + k · scale · MAD.

scale = 1.4826 makes MAD a consistent estimator of the standard deviation for a Gaussian distribution.
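
The formula above can be sketched in a few lines of NumPy. This is an illustration of median + k · scale · MAD, not the library's own code; mad_threshold is a hypothetical free function standing in for MadThreshold.compute_threshold.

```python
import numpy as np

def mad_threshold(intensities, k=3.0, scale=1.4826):
    """median + k * scale * MAD (hypothetical helper, not tdfpy API)."""
    x = np.asarray(intensities, dtype=np.float64)
    med = np.median(x)
    mad = np.median(np.abs(x - med))   # median absolute deviation from the median
    return float(med + k * scale * mad)

x = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 500.0])
thr = mad_threshold(x)   # median 12.5, MAD 1.5
keep = x >= thr          # same comparison IntensityThreshold describes
```

Because both the median and the MAD are robust to outliers, the single bright point at 500 barely moves the threshold, which is the reason for preferring MAD over a plain standard deviation here.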

tdfpy.PercentileThreshold `dataclass`

PercentileThreshold(q: float = 75.0)

Bases: IntensityThreshold

Drop everything below the q-th percentile of intensities.

tdfpy.HistogramThreshold `dataclass`

HistogramThreshold(bins: int = 100, k: float = 3.0)

Bases: IntensityThreshold

Mode-of-histogram noise floor + k standard deviations.

Histograms the intensities into equal-width bins (their number is set by bins), takes the modal bin as the noise mode, estimates the noise std from the FWHM around it, and returns mode + k · std.
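
One way the mode/FWHM estimate could be realised is sketched below. The details are assumptions: the FWHM is measured by walking outward from the modal bin while counts stay at or above half the peak count, and it is converted to a standard deviation with the Gaussian relation FWHM = 2·sqrt(2·ln 2)·std. histogram_threshold is a hypothetical name, and the library may handle edges and ties differently.

```python
import numpy as np

def histogram_threshold(intensities, bins=100, k=3.0):
    """Mode-of-histogram floor + k * std (illustrative sketch, not tdfpy code)."""
    x = np.asarray(intensities, dtype=np.float64)
    counts, edges = np.histogram(x, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    peak = int(np.argmax(counts))          # modal bin = assumed noise mode
    half = counts[peak] / 2.0
    left = peak
    while left > 0 and counts[left - 1] >= half:
        left -= 1
    right = peak
    while right < counts.size - 1 and counts[right + 1] >= half:
        right += 1
    fwhm = edges[right + 1] - edges[left]  # width of the at-least-half-max region
    std = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return float(centers[peak] + k * std)

rng = np.random.default_rng(0)
noise = rng.normal(100.0, 10.0, 20000)     # synthetic noise floor
signal = np.full(50, 1000.0)               # a few bright synthetic peaks
thr = histogram_threshold(np.concatenate([noise, signal]))
```

With only 100 bins spanning the whole intensity range, the FWHM is resolved to a few bin widths at best, so the estimate is coarse when a handful of very bright points stretch the range.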

tdfpy.BaselineThreshold `dataclass`

BaselineThreshold(q: float = 25.0, k: float = 3.0)

Bases: IntensityThreshold

Bottom-quartile baseline: mean + k · std of the lowest q %.
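
A minimal sketch of this estimator, assuming the "lowest q %" means all points at or below the q-th percentile and that std is the population standard deviation (NumPy's default). baseline_threshold is a hypothetical helper name.

```python
import numpy as np

def baseline_threshold(intensities, q=25.0, k=3.0):
    """mean + k * std of the lowest q % of intensities (illustrative sketch)."""
    x = np.asarray(intensities, dtype=np.float64)
    cutoff = np.percentile(x, q)
    low = x[x <= cutoff]               # assumed definition of "lowest q %"
    return float(low.mean() + k * low.std())

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0])
thr = baseline_threshold(x)            # baseline points are 1, 2, 3
keep = x >= thr
```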

tdfpy.IterativeMedianThreshold `dataclass`

IterativeMedianThreshold(
    passes: int = 3,
    inner_k: float = 2.0,
    final_k: float = 3.0,
    scale: float = 1.4826,
    min_remaining: int = 100,
)

Bases: IntensityThreshold

Iteratively trim peaks above median + inner_k · scale · MAD.

Repeats up to passes times (or until fewer than min_remaining points are left). The final threshold is median + final_k · std of the surviving distribution.
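
The trimming loop can be sketched as follows. The stopping rule is an assumption: here a pass is abandoned, and the previous survivor set kept, if it would leave fewer than min_remaining points. iterative_median_threshold is a hypothetical free function, not the library's implementation.

```python
import numpy as np

def iterative_median_threshold(intensities, passes=3, inner_k=2.0,
                               final_k=3.0, scale=1.4826, min_remaining=100):
    """Trim above median + inner_k*scale*MAD, then return median + final_k*std."""
    x = np.asarray(intensities, dtype=np.float64)
    for _ in range(passes):
        med = np.median(x)
        mad = np.median(np.abs(x - med))
        survivors = x[x <= med + inner_k * scale * mad]
        if survivors.size < min_remaining:
            break                      # assumed rule: too few points, stop trimming
        x = survivors
    return float(np.median(x) + final_k * np.std(x))

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(100.0, 10.0, 5000), np.full(200, 1000.0)])
thr = iterative_median_threshold(data)
naive = float(np.median(data) + 3.0 * np.std(data))   # no trimming, for contrast
```

On this synthetic data the untrimmed standard deviation is inflated by the bright points, so the naive threshold lands far above the noise floor while the trimmed one stays close to it.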

Structural filters

tdfpy.VerticalNoiseFilter `dataclass`

VerticalNoiseFilter(
    mz_idx_half_width: int = 3,
    min_streak_scans: int = 5,
    max_gap_scans: int = 1,
    min_streak_intensity: float = 50.0,
    num_iterations: int = 2,
)

Bases: NoiseFilter

Keep points belonging to vertical streaks in (scan, TOF_index) space.

A real ion produces an intensity streak along the ion-mobility axis at roughly the same TOF index across many consecutive scans. Noise tends to be isolated single hits or short streaks. This filter walks each TOF index, builds the IM intensity profile in a small m/z window, finds gap-closed runs of occupied scans, and keeps points whose scan falls inside a run that's long enough and intense enough.

Iterated passes (num_iterations > 1) feed each pass the survivors of the previous one — points that only just survived because they sat next to barely-thick noise get dropped on a later pass once that noise is gone.

See apps/ALGORITHM.md for the full write-up.
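
The "gap-closed runs of occupied scans" step can be illustrated for a single TOF column. This sketch covers only the run-finding and the length criterion; the m/z window and the min_streak_intensity test are left out. It assumes that a run's length is its full scan span including bridged gaps. gap_closed_runs and keep_long_runs are hypothetical helper names.

```python
import numpy as np

def gap_closed_runs(occupied_scans, max_gap_scans=1):
    """Group scan numbers into runs, bridging up to max_gap_scans empty scans."""
    scans = np.unique(occupied_scans)          # sorted, de-duplicated
    if scans.size == 0:
        return []
    # A new run starts where the jump to the next occupied scan is too large.
    breaks = np.nonzero(np.diff(scans) > max_gap_scans + 1)[0]
    starts = np.concatenate(([0], breaks + 1))
    ends = np.concatenate((breaks, [scans.size - 1]))
    return [(int(scans[s]), int(scans[e])) for s, e in zip(starts, ends)]

def keep_long_runs(occupied_scans, min_streak_scans=5, max_gap_scans=1):
    """Keep runs whose scan span (inclusive) is at least min_streak_scans."""
    return [(a, b) for a, b in gap_closed_runs(occupied_scans, max_gap_scans)
            if b - a + 1 >= min_streak_scans]

scans = [3, 4, 5, 7, 8, 20, 21, 40]
runs = gap_closed_runs(scans)   # scan 6 is a one-scan gap and gets bridged
kept = keep_long_runs(scans)
```

Points would then be kept if their scan falls inside one of the surviving (start, end) spans.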

keep_mask

keep_mask(
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    td: "TimsData",
    frame_id: int
) -> np.ndarray

Source code in src/tdfpy/noise/structural.py

def keep_mask(
    self,
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    td: "TimsData",
    frame_id: int,
) -> np.ndarray:
    return self.run(
        scan_indices, mz_indices, intensities, num_scans=num_scans,
        diagnostics=False,
    )

run

run(
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    diagnostics: Literal[False] = ...
) -> np.ndarray

run(
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    diagnostics: Literal[True]
) -> "VerticalNoiseDiagnostics"

run(
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    diagnostics: bool = False
) -> "np.ndarray | VerticalNoiseDiagnostics"

Run the filter on raw arrays.

When diagnostics is False (the default), returns the keep-mask only. When True, returns a VerticalNoiseDiagnostics carrying the mask and per-pass telemetry, as used by the timsTOF viewer's IM-filter page.

Source code in src/tdfpy/noise/structural.py

def run(
    self,
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    diagnostics: bool = False,
) -> "np.ndarray | VerticalNoiseDiagnostics":
    """Run the filter on raw arrays.

    When ``diagnostics`` is False (default) returns the keep-mask only.
    When True returns a :class:`VerticalNoiseDiagnostics` carrying the mask
    and per-pass telemetry — used by the timsTOF viewer's IM-filter page.
    """
    n = scan_indices.size
    if n == 0:
        empty = np.zeros(0, dtype=bool)
        if not diagnostics:
            return empty
        return VerticalNoiseDiagnostics(
            keep_point_mask=empty,
            num_columns_evaluated=0,
            num_columns_with_kept_runs=0,
            num_kept_points=0,
            feature_span_intensities=np.zeros(0, dtype=np.float64),
            per_pass_kept=[0],
        )

    cumulative = np.ones(n, dtype=bool)
    per_pass_kept = [n]
    last_n_cols = 0
    last_n_cols_kept = 0
    last_span_intensities = np.zeros(0, dtype=np.float64)

    for _ in range(int(self.num_iterations)):
        active = np.nonzero(cumulative)[0]
        if active.size == 0:
            break
        mask, n_cols, n_cols_kept, spans = _single_pass_filter(
            scan_indices[cumulative],
            mz_indices[cumulative],
            intensities[cumulative],
            num_scans,
            mz_idx_half_width=self.mz_idx_half_width,
            min_streak_scans=self.min_streak_scans,
            max_gap_scans=self.max_gap_scans,
            min_streak_intensity=self.min_streak_intensity,
            collect_span_intensities=diagnostics,
        )
        kept = active[mask]
        cumulative = np.zeros(n, dtype=bool)
        cumulative[kept] = True
        per_pass_kept.append(int(cumulative.sum()))
        last_n_cols = n_cols
        last_n_cols_kept = n_cols_kept
        last_span_intensities = spans
        if not cumulative.any():
            break

    n_kept = int(cumulative.sum())
    logger.debug(
        "VerticalNoiseFilter: kept %d/%d points over %d pass(es) (per-pass: %s).",
        n_kept,
        n,
        len(per_pass_kept) - 1,
        per_pass_kept,
    )
    if n_kept == 0:
        logger.warning(
            "VerticalNoiseFilter: removed ALL %d points. The streak thresholds "
            "may be too strict (min_streak_scans=%d, min_streak_intensity=%.1f).",
            n,
            self.min_streak_scans,
            self.min_streak_intensity,
        )

    if not diagnostics:
        return cumulative
    return VerticalNoiseDiagnostics(
        keep_point_mask=cumulative,
        num_columns_evaluated=last_n_cols,
        num_columns_with_kept_runs=last_n_cols_kept,
        num_kept_points=int(cumulative.sum()),
        feature_span_intensities=last_span_intensities,
        per_pass_kept=per_pass_kept,
    )

tdfpy.noise.VerticalNoiseDiagnostics `dataclass`

VerticalNoiseDiagnostics(
    keep_point_mask: np.ndarray,
    num_columns_evaluated: int,
    num_columns_with_kept_runs: int,
    num_kept_points: int,
    feature_span_intensities: np.ndarray,
    per_pass_kept: list[int] = list(),
)

Diagnostics from a single or iterated pass of VerticalNoiseFilter.

Fields are populated from the final pass when num_iterations > 1, except for per_pass_kept, which traces all passes: it starts with the input point count and gains one survivor count per pass.

tdfpy.HorizontalHaloFilter `dataclass`

HorizontalHaloFilter(
    peak_fraction: float = 0.15,
    mz_idx_half_width: int = 100,
    scan_half_width: int = 2,
)

Bases: NoiseFilter

Remove the weak m/z halo flanking bright peaks — left/right only.

High-intensity ions are flanked by a halo of weak peaks — likely from charge interactions within the fragment-ion cloud or detector effects — that are not resolvable to high-precision m/z values. A real ion forms a vertical streak along the ion-mobility axis (the same TOF index across many consecutive mobility scans), so this filter only removes peaks to the left and right (offset in TOF/m-z index) of a bright neighbour and never above or below (offset in ion mobility at the same index).

For each peak it computes a local reference intensity — the maximum intensity in the surrounding box (±scan_half_width scans, ±mz_idx_half_width TOF indices) excluding the peak's own m/z column — and drops the peak if its intensity falls below peak_fraction of that reference. Excluding the peak's own column is what guarantees the vertical streak is never used against it: a bright peak directly above or below sits in the same column and so can never raise the threshold; only a genuine left/right neighbour can. A peak with no off-column neighbours in its box is always kept. Set scan_half_width=0 for strictly per-row (same ion-mobility scan) behaviour.

Operates entirely in integer (scan, TOF index) space — no unit conversion. The box max is JIT-compiled with Numba, with a pure-Python fallback.
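
A brute-force reference version of the rule described above, useful for checking intuition on small inputs. It is O(n²) and is not the JIT-compiled box max the library uses; halo_keep_mask is a hypothetical helper name.

```python
import numpy as np

def halo_keep_mask(scan, mz_idx, inten, peak_fraction=0.15,
                   mz_idx_half_width=100, scan_half_width=2):
    """Keep a point unless it is below peak_fraction of its off-column box max."""
    scan = np.asarray(scan)
    mz_idx = np.asarray(mz_idx)
    inten = np.asarray(inten, dtype=np.float64)
    ref = np.zeros(inten.size)             # 0 means "no off-column neighbour"
    for i in range(inten.size):
        in_box = (
            (np.abs(scan - scan[i]) <= scan_half_width)
            & (np.abs(mz_idx - mz_idx[i]) <= mz_idx_half_width)
            & (mz_idx != mz_idx[i])        # exclude the point's own m/z column
        )
        if in_box.any():
            ref[i] = inten[in_box].max()
    return inten >= peak_fraction * ref

#                 bright column (500)   halo  far   isolated
scan  = np.array([10,     11,    12,    11,   11,   50])
mz    = np.array([500,    500,   500,   503,  900,  500])
inten = np.array([1000.0, 50.0,  900.0, 40.0, 30.0, 5.0])
mask = halo_keep_mask(scan, mz, inten)
```

The weak point at (scan 11, index 500) survives even though it sits between two bright points, because they share its column; the point at index 503 is dropped because its off-column reference is 1000 and 40 is below 0.15 × 1000.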

keep_mask

keep_mask(
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    td: "TimsData",
    frame_id: int
) -> np.ndarray

Source code in src/tdfpy/noise/structural.py

def keep_mask(
    self,
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    td: "TimsData",
    frame_id: int,
) -> np.ndarray:
    n = intensities.size
    if n == 0:
        return np.ones(0, dtype=bool)
    # Sort by (scan, mz_idx) so each scan is a contiguous, m/z-sorted block.
    order = np.lexsort((mz_indices, scan_indices))
    scan_s = np.ascontiguousarray(scan_indices[order], dtype=np.int64)
    mz_s = np.ascontiguousarray(mz_indices[order], dtype=np.int64)
    int_s = np.ascontiguousarray(intensities[order], dtype=np.float64)

    block_start = np.zeros(num_scans, dtype=np.int64)
    block_len = np.zeros(num_scans, dtype=np.int64)
    uniq, starts, counts = np.unique(
        scan_s, return_index=True, return_counts=True
    )
    block_start[uniq] = starts
    block_len[uniq] = counts

    ref_sorted = _offcol_box_max(
        scan_s, mz_s, int_s, block_start, block_len,
        int(self.scan_half_width), int(self.mz_idx_half_width),
    )
    ref = np.empty(n, dtype=np.float64)
    ref[order] = ref_sorted
    # Keep unless strictly below the fraction of the left/right max.
    # Peaks with no off-column neighbours (ref == 0) are always kept.
    return intensities >= self.peak_fraction * ref

Precursor-space gates

Acquisition-aware MS1-only gates that drop signal the instrument never schedules for fragmentation (so it can never become an identification). Each reads the relevant region from analysis.tdf, converts it once to per-scan integer TOF-index intervals via the run calibration, and tests membership with a vectorised binary search. Both are no-ops (keep everything) when the run carries no region, so they are safe to include unconditionally.

from tdfpy import SelectionPolygonGate, DiaMs1WindowGate, MadThreshold, get_raw_peaks

# ddaPASEF: keep only MS1 inside the PASEF selection polygon, then denoise.
peaks = get_raw_peaks(td, frame_id, noise=[SelectionPolygonGate(), MadThreshold(k=3)])

# diaPASEF: keep only MS1 inside the union of isolation windows.
peaks = get_raw_peaks(td, frame_id, noise=[DiaMs1WindowGate()])
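
The "vectorised binary search" membership test can be sketched with np.searchsorted. As a simplification, this uses one set of sorted, disjoint closed intervals rather than a separate set per scan, and the interval values are made up; in_intervals is a hypothetical helper, not a tdfpy function.

```python
import numpy as np

def in_intervals(values, starts, ends):
    """True where a value lies in any closed interval [starts[i], ends[i]].

    Assumes the intervals are sorted by start and do not overlap.
    """
    values = np.asarray(values)
    # Index of the last interval that starts at or before each value.
    idx = np.searchsorted(starts, values, side="right") - 1
    valid = idx >= 0                       # values before the first interval
    inside = np.zeros(values.shape, dtype=bool)
    inside[valid] = values[valid] <= ends[idx[valid]]
    return inside

starts = np.array([100, 400, 900])         # made-up TOF-index intervals
ends = np.array([200, 450, 950])
tof = np.array([50, 100, 150, 200, 201, 425, 899, 950, 1000])
mask = in_intervals(tof, starts, ends)
```

Each lookup costs O(log m) in the number of intervals, and the whole array is tested in one call with no Python-level loop over points.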

tdfpy.SelectionPolygonGate `dataclass`

SelectionPolygonGate(
    mz_pad: float = 5.0, im_pad: float = 0.05
)

Bases: NoiseFilter

Keep only MS1 points inside the ddaPASEF selection polygon.

Reads the run's IMS PolygonFilter and drops MS1 signal outside it — data the instrument never schedules as a precursor. No-op (keeps everything) when the run stores no polygon, or when the run is diaPASEF (there the same property stores window-placement quads, not one selection ring, so it is not used).

Parameters (both in physical units, widening the kept region per side so an edge precursor keeps its isotopic envelope / mobility spread rather than being clipped at a hard polygon boundary):

mz_pad: m/z padding in Da (default 5.0).
im_pad: 1/K0 padding (default 0.05).

keep_mask

keep_mask(
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    td: "TimsData",
    frame_id: int
) -> np.ndarray

Source code in src/tdfpy/noise/gates.py

def keep_mask(
    self,
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    td: "TimsData",
    frame_id: int,
) -> np.ndarray:
    noop = _ms1_only_noop(td, frame_id, intensities, "SelectionPolygonGate")
    if noop is not None:
        return noop
    key = ("polygon", num_scans, self.mz_pad, self.im_pad)
    gate = _cached(
        td, key, lambda: _build_polygon_gate(td, frame_id, num_scans, self)
    )
    if gate is None:
        logger.debug(
            "SelectionPolygonGate: no usable polygon for this run; keeping all "
            "%d points.",
            intensities.size,
        )
        return np.ones(intensities.size, dtype=bool)
    return gate.keep_mask(scan_indices, mz_indices)

tdfpy.DiaMs1WindowGate `dataclass`

DiaMs1WindowGate(mz_pad: float = 5.0, im_pad: float = 0.05)

Bases: NoiseFilter

Keep only MS1 points inside the union of diaPASEF isolation windows.

Drops MS1 signal that falls in no isolation window, i.e. precursors the method never isolates. No-op (keeps everything) on ddaPASEF runs or when no windows are defined.

Parameters (physical-unit padding per side, converted once via the run calibration, so an edge precursor keeps its isotopes / mobility spread):

mz_pad: m/z padding in Da (default 5.0).
im_pad: 1/K0 padding (default 0.05).

keep_mask

keep_mask(
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    td: "TimsData",
    frame_id: int
) -> np.ndarray

Source code in src/tdfpy/noise/gates.py

def keep_mask(
    self,
    scan_indices: np.ndarray,
    mz_indices: np.ndarray,
    intensities: np.ndarray,
    *,
    num_scans: int,
    td: "TimsData",
    frame_id: int,
) -> np.ndarray:
    noop = _ms1_only_noop(td, frame_id, intensities, "DiaMs1WindowGate")
    if noop is not None:
        return noop
    key = ("dia_ms1", num_scans, self.mz_pad, self.im_pad)
    gate = _cached(
        td, key, lambda: _build_dia_ms1_gate(td, frame_id, num_scans, self)
    )
    if gate is None:
        logger.debug(
            "DiaMs1WindowGate: no isolation windows for this run (ddaPASEF?); "
            "keeping all %d points.",
            intensities.size,
        )
        return np.ones(intensities.size, dtype=bool)
    return gate.keep_mask(scan_indices, mz_indices)
