Preprocessing
The preprocess() function applies the preprocessing pipeline defined in a FitOptions to a
GrowthData and returns a new GrowthData. The input is never modified.
from pykinbiont import preprocess, FitOptions, GrowthData
Smoothing
opts = FitOptions(smooth=True, smooth_method="rolling_avg", smooth_pt_avg=5)
smoothed = preprocess(data, opts)
print(f"Original min: {data.curves.min():.4f}")
print(f"Smoothed min: {smoothed.curves.min():.4f}")
Supported smoothing methods:

| smooth_method | Notes |
|---|---|
| | Locally weighted scatterplot smoothing |
| | Rolling mean with window |
| | Gaussian kernel, bandwidth = |
| | Uniform boxcar filter with window |
| | Pass-through (same as |
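To make the rolling-mean option concrete, here is a minimal pure-Python sketch of a moving average over a fixed window. It is not pykinbiont's implementation: the helper name rolling_mean is hypothetical, and averaging only full windows (so the output is shorter than the input) is an assumption. The library may pad or treat the edges differently.

```python
def rolling_mean(values, window):
    """Moving average over full windows only (assumed edge handling).

    Returns a new list; the input list is left untouched.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    # One output point per full window of `window` consecutive samples.
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical OD readings, for illustration only.
od = [0.010, 0.012, 0.009, 0.015, 0.020, 0.031, 0.045, 0.070]
smoothed = rolling_mean(od, window=5)

print(f"points: {len(od)} -> {len(smoothed)}")
print(f"Original min: {min(od):.4f}")
print(f"Smoothed min: {min(smoothed):.4f}")
```

Because each output point averages several readings, the smoothed minimum cannot fall below the original minimum, which is what the min comparison in the example above is checking.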
Blank subtraction
blank_od = 0.015 # measured from blank wells
opts = FitOptions(
    blank_subtraction=True,
    blank_value=blank_od,
    correct_negatives=True,
    negative_method="thr_correction",
    negative_threshold=0.001,
)
subtracted = preprocess(data, opts)
After subtraction, some values may fall below zero because of noise in the blank wells.
Set correct_negatives=True to handle them with one of the following negative_method values:
- "remove": removes time points where OD ≤ 0
- "thr_correction": replaces values below negative_threshold with negative_threshold
- "blank_correction": adds back the blank mean to floor-clamp values
Clustering
Clustering groups curves by shape (z-normalised k-means) and attaches cluster assignments to the
returned GrowthData.
opts = FitOptions(cluster=True, n_clusters=3, kmeans_seed=42)
clustered = preprocess(data, opts)
for label, cid in zip(data.labels, clustered.clusters):
    print(f" {label:12s} → cluster {cid}")
print(f"WCSS: {clustered.wcss:.4f}")
print(f"Centroid matrix shape: {clustered.centroids.shape}")
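As a rough illustration of why z-normalisation makes the clustering shape-based, the sketch below normalises two made-up curves that differ only in amplitude and shows that they become identical. It also computes a within-cluster sum of squares in the conventional sense (squared distance of each curve to its assigned centroid). The helper names are hypothetical, and the use of the population standard deviation is an assumption; pykinbiont's exact normalisation and WCSS definition may differ.

```python
from statistics import fmean, pstdev


def z_normalise(curve):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = fmean(curve)
    sigma = pstdev(curve)
    if sigma == 0:
        # A flat curve has no shape information; map it to all zeros.
        return [0.0 for _ in curve]
    return [(v - mu) / sigma for v in curve]


def wcss(curves, assignments, centroids):
    """Sum of squared distances from each curve to its assigned centroid."""
    total = 0.0
    for curve, cid in zip(curves, assignments):
        total += sum((a - b) ** 2 for a, b in zip(curve, centroids[cid]))
    return total


# Two hypothetical curves with the same shape but a 10x amplitude difference.
low = [0.01, 0.02, 0.04, 0.08]
high = [0.10, 0.20, 0.40, 0.80]

z_low = z_normalise(low)
z_high = z_normalise(high)
max_gap = max(abs(a - b) for a, b in zip(z_low, z_high))
print(f"largest difference after z-normalisation: {max_gap:.2e}")

# With both curves in one cluster centred on z_low, the WCSS is essentially zero.
print(f"WCSS: {wcss([z_low, z_high], [0, 0], [z_low]):.2e}")
```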
Clustering then fitting
Because cluster=True in fit() skips model fitting entirely, the recommended pattern is
to call preprocess() first for clustering and then fit() separately:
from pykinbiont import preprocess, fit, FitOptions, ModelSpec, LogLinModel
# Step 1: cluster
opts_cluster = FitOptions(cluster=True, n_clusters=3)
clustered = preprocess(data, opts_cluster)
cluster_assignments = dict(zip(clustered.labels, map(int, clustered.clusters)))
# Step 2: fit (no cluster flag here)
spec = ModelSpec(models=[LogLinModel()], params=[[]])
opts_fit = FitOptions(smooth=True, smooth_method="rolling_avg")
results = fit(data, spec, opts_fit)
# Attach cluster info manually
df = results.to_dataframe()
df["cluster"] = df["label"].map(cluster_assignments)
Full pipeline
opts = FitOptions(
    smooth=True,
    smooth_method="rolling_avg",
    smooth_pt_avg=5,
    blank_subtraction=True,
    blank_value=0.015,
    correct_negatives=True,
    negative_method="thr_correction",
    negative_threshold=0.001,
    cut_stationary_phase=True,
)
preprocessed = preprocess(data, opts)
The pipeline runs in the order: blank subtraction → negative correction → scattering correction → smoothing → stationary-phase trimming.
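The order matters because the steps do not commute. The sketch below chains simplified stand-ins for three of the steps (blank subtraction, threshold-based negative correction, and a window-3 rolling mean) and compares the documented order with one where smoothing runs before negative correction. All function names and readings are hypothetical, and scattering correction and stationary-phase trimming are omitted because their behaviour is not described here.

```python
def run_pipeline(od, steps):
    """Apply each step in order to a copy of `od`; the input is not modified."""
    current = list(od)
    for step in steps:
        current = step(current)
    return current


def blank_subtraction(od):
    return [v - 0.015 for v in od]


def negative_correction(od):
    # Threshold-style correction: values below 0.001 are raised to 0.001.
    return [max(v, 0.001) for v in od]


def smoothing(od):
    # Rolling mean over full windows of 3 points (assumed edge handling).
    return [sum(od[i:i + 3]) / 3 for i in range(len(od) - 2)]


# Hypothetical raw readings; the first two lie below the blank value.
raw = [0.010, 0.012, 0.030, 0.060, 0.120]

documented = run_pipeline(raw, [blank_subtraction, negative_correction, smoothing])
swapped = run_pipeline(raw, [blank_subtraction, smoothing, negative_correction])

print("documented order:", [round(v, 4) for v in documented])
print("smoothing first: ", [round(v, 4) for v in swapped])
```

In the documented order, negative readings are clamped before they can pull the rolling mean down, so the first smoothed point ends up higher than when smoothing runs first.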