Preprocessing

The preprocess() function applies the preprocessing pipeline defined in a FitOptions to a GrowthData and returns a new GrowthData. The input is never modified.

from pykinbiont import preprocess, FitOptions, GrowthData

Smoothing

# `data` is an existing GrowthData instance
opts = FitOptions(smooth=True, smooth_method="rolling_avg", smooth_pt_avg=5)
smoothed = preprocess(data, opts)

print(f"Original min: {data.curves.min():.4f}")
print(f"Smoothed min: {smoothed.curves.min():.4f}")

Supported smoothing methods:

smooth_method    Notes
"lowess"         Locally weighted scatterplot smoothing
"rolling_avg"    Rolling mean with window smooth_pt_avg
"gaussian"       Gaussian kernel, bandwidth = gaussian_h_mult × median_spacing
"boxcar"         Uniform boxcar filter with window boxcar_window
"none"           Pass-through (same as smooth=False)
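
To make the "rolling_avg" and "gaussian" rows concrete, here is a minimal standard-library sketch of both ideas. It is not pykinbiont's implementation: the helper names, the edge handling (only full windows are kept, so the rolling output is shorter than the input), and the default h_mult value are all assumptions made for illustration.

```python
import math
from statistics import median


def rolling_avg(values, window):
    """Mean of each full run of `window` consecutive points (hypothetical helper)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def gaussian_smooth(times, values, h_mult=2.0):
    """Gaussian kernel-weighted average at every time point (hypothetical helper).

    The bandwidth follows the rule quoted in the table:
    bandwidth = h_mult * median spacing between consecutive time points.
    """
    spacing = median(b - a for a, b in zip(times, times[1:]))
    h = h_mult * spacing
    smoothed = []
    for t0 in times:
        weights = [math.exp(-0.5 * ((t - t0) / h) ** 2) for t in times]
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / sum(weights))
    return smoothed


times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
od = [0.10, 0.14, 0.11, 0.19, 0.22, 0.30]
print(rolling_avg(od, 3))          # 4 values: one per full 3-point window
print(gaussian_smooth(times, od))  # 6 values: one per original time point
```

A wider window or a larger bandwidth multiplier removes more noise but also flattens real features such as the onset of exponential growth, so it is worth comparing the smoothed and raw curves before fitting.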

Blank subtraction

blank_od = 0.015  # measured from blank wells

opts = FitOptions(
    blank_subtraction=True,
    blank_value=blank_od,
    correct_negatives=True,
    negative_method="thr_correction",
    negative_threshold=0.001,
)
subtracted = preprocess(data, opts)

After subtraction, some values may fall below zero because of noise in the blank wells. Set correct_negatives=True and choose how they are handled with negative_method:

  • "remove" — removes time points where OD ≤ 0

  • "thr_correction" — replaces values below negative_threshold with negative_threshold

  • "blank_correction" — adds back the blank mean to floor-clamp values
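
The first two strategies can be sketched in a few lines of plain Python. This is an illustration of the behaviour described in the list, under the assumption that each curve is a simple list of OD readings; the function names are hypothetical and are not part of pykinbiont.

```python
def subtract_blank(od, blank):
    """Subtract a scalar blank reading from every OD value."""
    return [v - blank for v in od]


def correct_remove(times, od):
    """'remove'-style correction: drop every time point where OD <= 0."""
    kept = [(t, v) for t, v in zip(times, od) if v > 0]
    return [t for t, _ in kept], [v for _, v in kept]


def correct_threshold(od, threshold):
    """'thr_correction'-style: raise values below `threshold` to `threshold`."""
    return [max(v, threshold) for v in od]


times = [0.0, 1.0, 2.0, 3.0]
raw = [0.012, 0.015, 0.020, 0.050]
subtracted = subtract_blank(raw, 0.015)  # first value is now negative

print(correct_remove(times, subtracted))      # shorter curve, positive values only
print(correct_threshold(subtracted, 0.001))   # same length, floored at 0.001
```

Removing points changes the length of the time axis, while thresholding keeps every time point, which matters if downstream steps expect all curves to share the same sampling.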

Clustering

Clustering groups curves by shape (z-normalised k-means) and attaches cluster assignments to the returned GrowthData.

opts = FitOptions(cluster=True, n_clusters=3, kmeans_seed=42)
clustered = preprocess(data, opts)

for label, cid in zip(clustered.labels, clustered.clusters):
    print(f"  {label:12s}  →  cluster {cid}")

print(f"WCSS: {clustered.wcss:.4f}")
print(f"Centroid matrix shape: {clustered.centroids.shape}")
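
The two ideas behind this output, z-normalisation and the within-cluster sum of squares (WCSS), can be illustrated with the standard library alone. The sketch below assumes each curve is a list of OD values and uses the population standard deviation; both helpers are hypothetical, and pykinbiont may normalise or compute WCSS differently.

```python
from statistics import mean, pstdev


def z_normalise(curve):
    """Rescale a curve to zero mean and unit standard deviation."""
    mu = mean(curve)
    sigma = pstdev(curve)
    if sigma == 0:
        # A perfectly flat curve has no shape to normalise.
        return [0.0 for _ in curve]
    return [(v - mu) / sigma for v in curve]


def wcss(curves, assignments, centroids):
    """Sum of squared distances from each curve to its assigned centroid."""
    total = 0.0
    for curve, cid in zip(curves, assignments):
        total += sum((v - c) ** 2 for v, c in zip(curve, centroids[cid]))
    return total


low = [0.1, 0.2, 0.4, 0.8]
high = [1.0, 2.0, 4.0, 8.0]  # same shape as `low`, ten times the amplitude

# After z-normalisation the two curves are numerically almost identical,
# so a shape-based clustering would treat them as the same pattern.
print(z_normalise(low))
print(z_normalise(high))
```

This is why curves that differ only in final OD can land in the same cluster, and why a lower WCSS for a given number of clusters indicates tighter groups.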

Clustering then fitting

Because passing cluster=True to fit() skips model fitting entirely, the recommended pattern is to call preprocess() first to cluster the curves, then call fit() separately:

from pykinbiont import preprocess, fit, FitOptions, ModelSpec, LogLinModel

# Step 1: cluster
opts_cluster = FitOptions(cluster=True, n_clusters=3)
clustered = preprocess(data, opts_cluster)

cluster_assignments = dict(zip(clustered.labels, map(int, clustered.clusters)))

# Step 2: fit (no cluster flag here)
spec = ModelSpec(models=[LogLinModel()], params=[[]])
opts_fit = FitOptions(smooth=True, smooth_method="rolling_avg")
results = fit(data, spec, opts_fit)

# Attach cluster info manually
df = results.to_dataframe()
df["cluster"] = df["label"].map(cluster_assignments)
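
Once a label-to-cluster mapping like cluster_assignments exists, it is often handy to invert it and see which wells fall into each cluster. The sketch below uses only the standard library; the helper name and the example well labels are made up for illustration.

```python
from collections import defaultdict


def labels_by_cluster(cluster_assignments):
    """Invert a {label: cluster_id} mapping into {cluster_id: [labels]}."""
    groups = defaultdict(list)
    for label, cid in cluster_assignments.items():
        groups[cid].append(label)
    # Sort cluster ids and labels so the output is stable across runs.
    return {cid: sorted(labels) for cid, labels in sorted(groups.items())}


# Hypothetical mapping of the same form as cluster_assignments above
example = {"A1": 0, "A2": 2, "B1": 0, "B2": 1}

for cid, labels in labels_by_cluster(example).items():
    print(f"cluster {cid}: {', '.join(labels)}")
```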

Full pipeline

opts = FitOptions(
    smooth=True,
    smooth_method="rolling_avg",
    smooth_pt_avg=5,
    blank_subtraction=True,
    blank_value=0.015,
    correct_negatives=True,
    negative_method="thr_correction",
    negative_threshold=0.001,
    cut_stationary_phase=True,
)
preprocessed = preprocess(data, opts)

The pipeline runs in the order: blank subtraction → negative correction → scattering correction → smoothing → stationary-phase trimming.
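
A fixed execution order means the order in which options are written in FitOptions does not affect the result. The sketch below illustrates that idea with a small dispatcher. The step names are descriptive labels rather than pykinbiont identifiers, and skipping steps that are not enabled is an assumption.

```python
# Canonical order, as stated above (labels are illustrative only).
PIPELINE_ORDER = [
    "blank_subtraction",
    "negative_correction",
    "scattering_correction",
    "smoothing",
    "stationary_phase_trimming",
]


def run_pipeline(values, steps):
    """Apply the enabled steps in PIPELINE_ORDER.

    `steps` maps a step name to a callable; steps that are absent are skipped.
    Returns the processed values and the names of the steps actually applied.
    """
    applied = []
    for name in PIPELINE_ORDER:
        func = steps.get(name)
        if func is None:
            continue
        values = func(values)
        applied.append(name)
    return values, applied


# Steps are deliberately listed out of order; execution order is unaffected.
steps = {
    "negative_correction": lambda v: [max(x, 0.001) for x in v],
    "blank_subtraction": lambda v: [x - 0.015 for x in v],
}

result, applied = run_pipeline([0.010, 0.020, 0.050], steps)
print(applied)  # blank subtraction runs before negative correction
print(result)
```

Running negative correction after blank subtraction is what makes the threshold meaningful: the first reading only becomes negative once the blank has been removed.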