Loading data

pykinbiont represents growth curves as GrowthData (shared time grid) or IrregularGrowthData (per-curve time points).

GrowthData

GrowthData stores OD values as an n_curves × n_timepoints matrix on a single shared time grid, which matches the typical output of a plate reader.

From a CSV file

The expected CSV layout has time in the first column and one well (curve) in each remaining column; the column headers become the curve labels:

```csv
time,Well_A1,Well_A2,Well_B1
0.0,0.012,0.011,0.010
0.5,0.014,0.013,0.012
...
```

```python
from pykinbiont import GrowthData

data = GrowthData.from_csv("plate_reader.csv")
print(f"{len(data.labels)} curves, {len(data.times)} time points")
```
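Because the CSV stores one time point per row while GrowthData stores one curve per row, loading the file amounts to a transpose. The helper parse_plate_csv below is hypothetical and not part of pykinbiont (whose parser may differ, for example in how it handles missing values); it only sketches the layout mapping using the standard library and NumPy:

```python
import csv
import io

import numpy as np


def parse_plate_csv(text):
    """Hypothetical helper: split a time-first CSV into (times, curves, labels)."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    table = np.array(body, dtype=float)  # shape (n_timepoints, 1 + n_curves)
    times = table[:, 0]                  # first column holds time
    curves = table[:, 1:].T              # transpose to (n_curves, n_timepoints)
    labels = header[1:]                  # headers after "time" become labels
    return times, curves, labels


sample = """time,Well_A1,Well_A2,Well_B1
0.0,0.012,0.011,0.010
0.5,0.014,0.013,0.012
"""

times, curves, labels = parse_plate_csv(sample)
print(labels)        # ['Well_A1', 'Well_A2', 'Well_B1']
print(curves.shape)  # (3, 2)
```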

From a pandas DataFrame

```python
import pandas as pd
from pykinbiont import GrowthData

df = pd.read_csv("plate_reader.csv")
data = GrowthData.from_dataframe(df)
```

from_dataframe treats the first column as time regardless of its name.
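Since the first column is always taken as time, a frame whose time column sits elsewhere should be reordered before it is passed in. A minimal pandas sketch, using placeholder values and an assumed column name `Time (h)`:

```python
import pandas as pd

# Placeholder frame where the time column is NOT the first column
df = pd.DataFrame({
    "Well_A1": [0.012, 0.014],
    "Time (h)": [0.0, 0.5],
    "Well_A2": [0.011, 0.013],
})

# Move the time column to position 0 so it is the one read as time
time_col = "Time (h)"
ordered = df[[time_col] + [c for c in df.columns if c != time_col]]
print(list(ordered.columns))  # ['Time (h)', 'Well_A1', 'Well_A2']
```

The reordered frame can then be passed to GrowthData.from_dataframe.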

From NumPy arrays

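```python
import numpy as np
from pykinbiont import GrowthData

times = np.linspace(0, 20, 100)  # shape (100,)

# Placeholder OD readings for two wells; replace with real measurements
curve_A1 = 0.01 + 1.0 / (1.0 + np.exp(-0.8 * (times - 8.0)))
curve_A2 = 0.01 + 0.9 / (1.0 + np.exp(-0.6 * (times - 10.0)))
curves = np.stack([curve_A1, curve_A2])  # shape (2, 100)

data = GrowthData(
    curves=curves,
    times=times,
    labels=["A1", "A2"],
)
```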

The curves array must be 2-D with shape (n_curves, n_timepoints).
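If your OD data is not already curve-major, a small check before constructing GrowthData can catch shape mistakes early. The helper as_curve_matrix below is hypothetical (not part of pykinbiont): it promotes a single 1-D curve to shape (1, T) and transposes time-major (T, n) arrays, such as the body of a CSV read directly.

```python
import numpy as np


def as_curve_matrix(od, n_timepoints):
    """Hypothetical helper: coerce OD data to shape (n_curves, n_timepoints)."""
    od = np.asarray(od, dtype=float)
    if od.ndim == 1:
        od = od[np.newaxis, :]  # single curve -> (1, T)
    if od.ndim != 2:
        raise ValueError(f"expected 1-D or 2-D data, got {od.ndim}-D")
    if od.shape[1] != n_timepoints and od.shape[0] == n_timepoints:
        od = od.T  # time-major (T, n) -> curve-major (n, T)
    if od.shape[1] != n_timepoints:
        raise ValueError(f"no axis of length {n_timepoints} in shape {od.shape}")
    return od


times = np.linspace(0, 20, 100)
print(as_curve_matrix(np.ones(100), len(times)).shape)       # (1, 100)
print(as_curve_matrix(np.ones((100, 3)), len(times)).shape)  # (3, 100)
```

When n_curves equals n_timepoints the orientation is ambiguous; the sketch then assumes the array is already (n_curves, n_timepoints).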

Subsetting

Select a subset of wells by label:

```python
subset = data[["Well_A1", "Well_B1"]]
print(subset.labels)   # ["Well_A1", "Well_B1"]
```
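Conceptually, selecting by label picks rows of the curve matrix while the shared time grid stays unchanged. The plain-NumPy sketch below illustrates that idea with placeholder data; it is an assumption about the semantics rather than pykinbiont's implementation, and in this sketch the rows come back in the requested order.

```python
import numpy as np

labels = ["Well_A1", "Well_A2", "Well_B1"]
curves = np.arange(3 * 4, dtype=float).reshape(3, 4)  # placeholder (n_curves, T)

wanted = ["Well_A1", "Well_B1"]
rows = [labels.index(name) for name in wanted]  # raises ValueError on an unknown label
sub_curves = curves[rows]                       # shape (2, T); time grid unchanged
sub_labels = [labels[i] for i in rows]
print(sub_labels)        # ['Well_A1', 'Well_B1']
print(sub_curves.shape)  # (2, 4)
```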

IrregularGrowthData

Use IrregularGrowthData when curves are sampled at different time points (for example, when several experiments are merged or samples are taken manually at unequal intervals). pykinbiont automatically resamples all curves by linear interpolation onto a shared union grid in normalised time [0, 1].
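One plausible reading of that resampling step is sketched below. It assumes each time vector is min-max normalised to [0, 1], that the "union grid" is the union of all normalised time points snapped to multiples of `step`, and that each curve is then linearly interpolated with `np.interp`. The function resample_to_union_grid is hypothetical, and pykinbiont's exact grid construction may differ.

```python
import numpy as np


def resample_to_union_grid(raw_times, raw_curves, step=0.01):
    """Hypothetical sketch: normalise times, build a snapped union grid, interpolate."""
    # Min-max normalise every curve's own time vector to [0, 1] (assumption)
    norm_times = [(t - t.min()) / (t.max() - t.min()) for t in raw_times]
    # Union of all normalised points, snapped to the `step` resolution
    snapped = np.round(np.concatenate(norm_times) / step) * step
    grid = np.unique(snapped)  # sorted and de-duplicated; spans 0.0 to 1.0
    # Linear interpolation of each curve onto the shared grid
    curves = np.stack([np.interp(grid, t, od) for t, od in zip(norm_times, raw_curves)])
    return grid, curves


times_A = np.array([0.0, 1.0, 2.5, 5.0, 10.0, 20.0])
times_B = np.array([0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0])
od_A = np.array([0.01, 0.02, 0.05, 0.20, 0.80, 1.10])
od_B = np.array([0.01, 0.015, 0.03, 0.08, 0.35, 0.90, 1.15])

grid, curves = resample_to_union_grid([times_A, times_B], [od_A, od_B])
print(grid.shape, curves.shape)
```

Because every curve spans [0, 1] after normalisation, no extrapolation is needed; the trade-off is that the absolute time scale of each curve is lost on the shared grid.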

```python
import numpy as np
from pykinbiont import IrregularGrowthData

# Each curve has its own time vector
times_A = np.array([0.0, 1.0, 2.5, 5.0, 10.0, 20.0])
times_B = np.array([0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0])

od_A = np.array([0.01, 0.02, 0.05, 0.20, 0.80, 1.10])
od_B = np.array([0.01, 0.015, 0.03, 0.08, 0.35, 0.90, 1.15])

igd = IrregularGrowthData(
    raw_curves=[od_A, od_B],
    raw_times=[times_A, times_B],
    labels=["Strain_A", "Strain_B"],
    step=0.01,   # union grid resolution in normalised [0,1] time
)

print(igd.curves.shape)   # (2, n_grid), resampled
print(igd.times[:5])      # normalised [0,1] grid
```

The original data is preserved in igd.raw_curves and igd.raw_times. Both fit() and preprocess() accept IrregularGrowthData directly.
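Because the resampled grid is in normalised time, the raw time vectors are what let you recover absolute times for a given curve. Assuming min-max normalisation, (t - t_min) / (t_max - t_min), which this page does not confirm, the inverse mapping for one curve is a single line:

```python
import numpy as np

# Assumes min-max normalisation; times_A is Strain_A's raw time vector in hours
times_A = np.array([0.0, 1.0, 2.5, 5.0, 10.0, 20.0])
grid = np.linspace(0.0, 1.0, 5)  # stand-in for a normalised grid such as igd.times

t_min, t_max = times_A.min(), times_A.max()
hours_A = t_min + grid * (t_max - t_min)  # map [0, 1] back to Strain_A's own time axis
print(hours_A.tolist())  # [0.0, 5.0, 10.0, 15.0, 20.0]
```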

Accessing arrays

In the table, n is the number of curves, T the number of time points, and k the number of clusters.

| Attribute | Shape | Description |
|---|---|---|
| `data.curves` | `(n, T)` | OD matrix (read-only) |
| `data.times` | `(T,)` | Shared time grid (read-only) |
| `data.labels` | `list[str]` | Curve identifiers |
| `data.clusters` | `(n,)` or `None` | Cluster assignments (after `preprocess`) |
| `data.centroids` | `(k, T)` or `None` | Cluster centroids (after `preprocess`) |
| `data.wcss` | `float` or `None` | Within-cluster sum of squares |
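The `wcss` value follows the standard definition of within-cluster sum of squares: the sum, over all curves, of the squared Euclidean distance between each curve and the centroid of its assigned cluster. The sketch below computes it from arrays shaped like `data.curves`, `data.clusters` and `data.centroids`, using placeholder data. It assumes 0-based cluster indices (subtract 1 first if `data.clusters` turns out to be 1-based), and pykinbiont may compute WCSS on preprocessed rather than raw curves.

```python
import numpy as np


def within_cluster_ss(curves, clusters, centroids):
    """Sum of squared distances from each curve to the centroid of its cluster."""
    diffs = curves - centroids[clusters]  # (n, T): each row minus its own centroid
    return float(np.sum(diffs ** 2))


curves = np.array([[0.0, 1.0], [0.0, 3.0], [5.0, 5.0]])  # placeholder (n=3, T=2)
clusters = np.array([0, 0, 1])                            # cluster index per curve
centroids = np.stack([curves[clusters == k].mean(axis=0) for k in range(2)])  # (k, T)
print(within_cluster_ss(curves, clusters, centroids))  # 2.0
```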