Baseline.BuffitBaseline

Baseline.BuffitBaseline.py

Buffer-fit (buffer-frame polyfit) baseline.

Concept and origin

This method was proposed by GitHub Copilot (Claude Sonnet 4.5) during an interactive session on 2026-03-11, as a direct improvement over the Legacy Percentile Method (LPM).

The key insight (quoted from the session):

“LPM’s core premise is correct: buffer frames have lower intensity than protein frames. But LPM classifies buffer frames implicitly, per q-row, under low per-row SNR. The fix is to lift the classification one level up: M.sum(axis=0) collapses all q-rows into one curve, gaining √N in SNR. In that summed elution the buffer region is unambiguously flat and low. Classify buffer frames once from this high-SNR oracle, then apply that single mask to every q-row for the polyfit.”

In short: the classification question (“is frame j buffer or protein?”) is a column-level (global) question; LPM answered it row-by-row (local). Buffit answers it once, using the full-matrix aggregate, and reuses that answer across all q-rows.
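The idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the library's implementation: the helper names buffer_mask_from_sum and linear_baselines are hypothetical, and the rule "lowest 10% of the summed curve's range" is only one plausible reading of a fixed threshold.

```python
import numpy as np

def buffer_mask_from_sum(M, fraction=0.10):
    """Classify frames once from the summed elution (hypothetical helper).

    M has shape (n_q, n_frames). Frames whose summed intensity lies within
    the lowest `fraction` of the curve's range are labelled buffer.
    The range-fraction rule is an assumption made for this sketch.
    """
    total = M.sum(axis=0)                      # collapse all q-rows into one curve
    lo, hi = total.min(), total.max()
    return total <= lo + fraction * (hi - lo)

def linear_baselines(M, mask):
    """Fit a line to the masked frames of every q-row, evaluate on all frames."""
    x = np.arange(M.shape[1])
    # np.polyfit accepts a 2-D y of shape (n_points, n_rows) and fits each column.
    slope, intercept = np.polyfit(x[mask], M[:, mask].T, deg=1)
    return slope[:, None] * x[None, :] + intercept[:, None]

# Synthetic SEC-SAXS-like matrix: one Gaussian peak on a slowly drifting baseline.
rng = np.random.default_rng(0)
n_q, n_frames = 50, 200
x = np.arange(n_frames)
peak = np.exp(-0.5 * ((x - 100) / 10.0) ** 2)
scale = np.linspace(1.0, 0.1, n_q)[:, None]    # signal weakens with q-row index
drift = 1e-4 * x[None, :] + 0.05
M = scale * peak[None, :] + drift + rng.normal(0.0, 0.01, (n_q, n_frames))

mask = buffer_mask_from_sum(M)                 # one global classification
B = linear_baselines(M, mask)                  # same mask reused for every q-row
```

The mask is computed once from the aggregate and then shared, so noisy high-q rows inherit a classification they could not have produced reliably on their own.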

Performance (7 SEC-SAXS datasets, 2026-03-11/12)

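=================================  ===========================
Method                             mean positive_ratio (range)
=================================  ===========================
Old library (p_final fixed 10%)    0.899
Adaptive p_final / legacy sim      0.716
Buffit (threshold = 0.10)          0.578 (single dataset)
Buffit (fixed 10%, all 7)          0.529 – 0.600
arpls                              0.552 – 0.590
Buffit (Otsu, recommended)         0.496 – 0.522
Ideal (perfect baseline)           ~0.5
=================================  ===========================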

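The Otsu threshold wins on all 7 tested datasets, and its positive_ratio range is the closest to the ideal value of about 0.5.

Note: the XrData default baseline method is 'linear'; activate buffit via ssd.set_baseline_method('buffit') or xr.get_baseline2d(method='buffit').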
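For readers unfamiliar with Otsu's method, the following sketch shows how a threshold could be derived from the summed elution curve instead of a fixed percentage. It is a generic histogram-based Otsu implementation written for illustration; the function name otsu_threshold and the bin count are assumptions, and the library's own code may differ.

```python
import numpy as np

def otsu_threshold(values, nbins=256):
    """Return the split that maximises between-class variance (Otsu's method)."""
    hist, edges = np.histogram(values, bins=nbins)
    hist = hist.astype(float)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist)                 # count in the low class for each split
    w1 = w0[-1] - w0                     # count in the high class
    s0 = np.cumsum(hist * centers)       # running intensity sum of the low class
    valid = (w0 > 0) & (w1 > 0)          # both classes must be non-empty
    m0 = s0[valid] / w0[valid]
    m1 = (s0[-1] - s0[valid]) / w1[valid]
    between = w0[valid] * w1[valid] * (m0 - m1) ** 2
    # Use the right edge of the last low-class bin as the threshold.
    return edges[1:][valid][np.argmax(between)]

# Noise-free summed elution: flat buffer level plus one Gaussian peak.
x = np.arange(200)
total = 3.0 + 30.0 * np.exp(-0.5 * ((x - 100) / 10.0) ** 2)

t = otsu_threshold(total)
buffer_mask = total < t                  # True where the frame is labelled buffer
```

Because the threshold adapts to the actual intensity distribution, no per-dataset tuning of a percentage is needed, which is consistent with the table above where the Otsu variant has the tightest range.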

References

Implemented in molass-library issues #24 (buffit), #25 (Otsu threshold), and #26 (XrData default).

compute_buffit_baseline(x, y, return_also_params=False, **kwargs)

Compute a linear baseline anchored on buffer frames only.

Parameters:
  • x (array-like) – Frame indices (jv).

  • y (array-like) – Intensity elution profile at one q-value.

  • return_also_params (bool, optional) – If True, return (baseline, params_dict).

  • **kwargs (dict) – Optional buffer_mask (bool array, shape == y.shape), pre-computed once from the summed elution by SsMatrixData. If buffer_mask is absent or has fewer than 2 True entries, the function falls back to a full-frame linear fit. Passing threshold without buffer_mask has no effect and emits a UserWarning.

Returns:

  • baseline (ndarray)

  • params (dict (only if return_also_params is True)) – Keys: slope, intercept, n_buffer.
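The documented behaviour can be mirrored in a small stand-alone function. This is a sketch written from the parameter description above, not the library source: the name buffit_baseline_sketch is hypothetical, and reporting n_buffer as the number of frames actually used in the fit (including the fallback case) is an assumption.

```python
import warnings
import numpy as np

def buffit_baseline_sketch(x, y, return_also_params=False, **kwargs):
    """Linear baseline fitted on buffer frames only (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = kwargs.get("buffer_mask")

    # threshold is meaningless here without a pre-computed mask.
    if mask is None and "threshold" in kwargs:
        warnings.warn("threshold has no effect without buffer_mask", UserWarning)

    if mask is None or np.count_nonzero(mask) < 2:
        fit = np.ones(y.shape, dtype=bool)     # fallback: full-frame linear fit
    else:
        fit = np.asarray(mask, dtype=bool)

    slope, intercept = np.polyfit(x[fit], y[fit], 1)
    baseline = slope * x + intercept
    if return_also_params:
        # Assumption: n_buffer counts the frames that entered the fit.
        params = {"slope": slope, "intercept": intercept, "n_buffer": int(fit.sum())}
        return baseline, params
    return baseline

# One q-row: linear drift plus a peak centred on frame 50.
x = np.arange(100)
y = 0.01 * x + 1.0 + 5.0 * np.exp(-0.5 * ((x - 50) / 5.0) ** 2)
mask = np.abs(x - 50) > 25                     # stand-in for the shared buffer mask

baseline, params = buffit_baseline_sketch(x, y, return_also_params=True,
                                          buffer_mask=mask)
fallback = buffit_baseline_sketch(x, y)        # no mask: fit uses every frame
```

With the mask, the fit recovers the underlying drift; without it, the peak pulls the fitted line upward, which is the failure mode the buffer mask is meant to avoid.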