Loading data#

pykinbiont represents growth curves as GrowthData (shared time grid) or IrregularGrowthData (per-curve time points).

GrowthData#

GrowthData stores an n_curves × n_timepoints matrix of optical density (OD) values sampled on a shared time grid. This matches the typical output of a plate reader.

From a CSV file#

The expected CSV layout is: first column = time, remaining columns = wells/curves, with column headers used as labels:

time,Well_A1,Well_A2,Well_B1
0.0,0.012,0.011,0.010
0.5,0.014,0.013,0.012
...

from pykinbiont import GrowthData

data = GrowthData.from_csv("plate_reader.csv")
print(f"{len(data.labels)} curves, {len(data.times)} time points")
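To make the layout concrete, here is a minimal standard-library sketch of how a time-first CSV maps onto labels, a time vector, and a curve-major matrix. It is not pykinbiont's parser: `parse_plate_csv` is a hypothetical helper, and the third data row is made up for illustration.

```python
import csv
import io

# Illustrative CSV text in the layout described above (third row is made up).
CSV_TEXT = """time,Well_A1,Well_A2,Well_B1
0.0,0.012,0.011,0.010
0.5,0.014,0.013,0.012
1.0,0.019,0.017,0.015
"""


def parse_plate_csv(text):
    """Split time-first CSV text into (labels, times, curves).

    curves is a list of rows, one per well: n_curves x n_timepoints.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    labels = header[1:]                    # every column after the first is a curve
    times = [float(r[0]) for r in body]    # first column is time
    # Transpose: the file is time-major, the matrix is curve-major.
    curves = [[float(r[j + 1]) for r in body] for j in range(len(labels))]
    return labels, times, curves


labels, times, curves = parse_plate_csv(CSV_TEXT)
print(f"{len(labels)} curves, {len(times)} time points")
```

Each CSV column becomes one row of the matrix, which is why the number of curves equals the number of non-time columns.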

From a pandas DataFrame#

import pandas as pd
from pykinbiont import GrowthData

df = pd.read_csv("plate_reader.csv")
data = GrowthData.from_dataframe(df)

from_dataframe treats the first column as time regardless of its name.
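The positional rule can be illustrated with plain pandas. This is a sketch of the behaviour described above, not the library's implementation; the column name `hours` and the OD values are hypothetical.

```python
import pandas as pd

# Hypothetical frame whose time column is not called "time".
df = pd.DataFrame({
    "hours":   [0.0, 0.5, 1.0],
    "Well_A1": [0.012, 0.014, 0.019],
    "Well_A2": [0.011, 0.013, 0.017],
})

times = df.iloc[:, 0].to_numpy()        # first column by position, not by name
labels = list(df.columns[1:])           # remaining column names become labels
curves = df.iloc[:, 1:].to_numpy().T    # transpose to (n_curves, n_timepoints)

print(labels, curves.shape)
```

If the time column is not first in your frame, reordering the columns before the call is probably the simplest fix.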

From NumPy arrays#

import numpy as np
from pykinbiont import GrowthData

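times  = np.linspace(0, 20, 100)         # shape (100,)

# Placeholder logistic curves for illustration; substitute your measured OD values.
curve_A1 = 1.0 / (1.0 + np.exp(-(times - 8.0)))    # shape (100,)
curve_A2 = 1.2 / (1.0 + np.exp(-(times - 10.0)))   # shape (100,)

curves = np.stack([curve_A1, curve_A2])  # shape (2, 100)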

data = GrowthData(
    curves=curves,
    times=times,
    labels=["A1", "A2"],
)

The curves array must be 2-D with shape (n_curves, n_timepoints).
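A quick pre-flight check can catch shape mismatches before the arrays reach the constructor. The sketch below uses only NumPy; `check_growth_arrays` is a hypothetical helper, and pykinbiont's own validation and error messages may differ.

```python
import numpy as np


def check_growth_arrays(curves, times, labels):
    """Raise ValueError if the arrays disagree on (n_curves, n_timepoints)."""
    curves = np.asarray(curves, dtype=float)
    times = np.asarray(times, dtype=float)
    if curves.ndim != 2:
        raise ValueError(f"curves must be 2-D, got {curves.ndim}-D")
    n_curves, n_timepoints = curves.shape
    if times.shape != (n_timepoints,):
        raise ValueError("times must have one entry per column of curves")
    if len(labels) != n_curves:
        raise ValueError("labels must have one entry per row of curves")
    return curves, times


# A single curve is 1-D; reshape it to one row before passing it on.
single = np.linspace(0.01, 1.0, 100)
curves, times = check_growth_arrays(
    single[np.newaxis, :], np.linspace(0, 20, 100), ["A1"]
)
print(curves.shape)
```

The common mistake this guards against is passing a single 1-D curve, or a matrix laid out as (n_timepoints, n_curves).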

Subsetting#

Select a subset of wells by label:

subset = data[["Well_A1", "Well_B1"]]
print(subset.labels)   # ['Well_A1', 'Well_B1']
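Conceptually, selecting by label amounts to looking up row indices and slicing the matrix. The following NumPy sketch shows that idea under the assumption that labels are unique; `select_rows` is a hypothetical helper and the values are illustrative, so treat it as a mental model and not as the library's indexing code.

```python
import numpy as np

labels = ["Well_A1", "Well_A2", "Well_B1"]
curves = np.array([
    [0.012, 0.014, 0.019],
    [0.011, 0.013, 0.017],
    [0.010, 0.012, 0.015],
])


def select_rows(curves, labels, wanted):
    """Return the rows of curves whose labels appear in wanted, in that order."""
    index = {name: i for i, name in enumerate(labels)}
    missing = [name for name in wanted if name not in index]
    if missing:
        raise KeyError(f"unknown labels: {missing}")
    rows = [index[name] for name in wanted]
    return curves[rows], list(wanted)


sub_curves, sub_labels = select_rows(curves, labels, ["Well_A1", "Well_B1"])
print(sub_labels, sub_curves.shape)
```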

IrregularGrowthData#

Use IrregularGrowthData when curves are sampled at different time points, for example when several experiments are merged or samples are taken manually at unequal intervals. pykinbiont automatically resamples all curves onto a shared union grid in normalised [0, 1] time using linear interpolation.

import numpy as np
from pykinbiont import IrregularGrowthData

# Each curve has its own time vector
times_A = np.array([0.0, 1.0, 2.5, 5.0, 10.0, 20.0])
times_B = np.array([0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0])

od_A = np.array([0.01, 0.02, 0.05, 0.20, 0.80, 1.10])
od_B = np.array([0.01, 0.015, 0.03, 0.08, 0.35, 0.90, 1.15])

igd = IrregularGrowthData(
    raw_curves=[od_A, od_B],
    raw_times=[times_A, times_B],
    labels=["Strain_A", "Strain_B"],
    step=0.01,   # union grid resolution in normalised [0,1] time
)

print(igd.curves.shape)   # (2, n_grid) — resampled
print(igd.times[:5])      # normalised [0,1] grid

The original data is preserved in igd.raw_curves and igd.raw_times. fit() and preprocess() accept IrregularGrowthData directly.
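The resampling step can be approximated with `numpy.interp`. The sketch below is one plausible reading of the description, with two assumptions that this page does not confirm: each curve's time axis is min-max normalised on its own, and the grid is uniform with spacing `step`. `resample_to_unit_grid` is a hypothetical helper.

```python
import numpy as np


def resample_to_unit_grid(raw_curves, raw_times, step=0.01):
    """Linearly interpolate each curve onto a uniform grid over [0, 1]."""
    n_steps = int(round(1.0 / step))
    grid = np.linspace(0.0, 1.0, n_steps + 1)
    rows = []
    for od, t in zip(raw_curves, raw_times):
        t = np.asarray(t, dtype=float)
        # ASSUMPTION: per-curve min-max normalisation; the library may
        # normalise differently. Requires strictly increasing times.
        t_norm = (t - t[0]) / (t[-1] - t[0])
        rows.append(np.interp(grid, t_norm, od))
    return np.vstack(rows), grid


times_A = np.array([0.0, 1.0, 2.5, 5.0, 10.0, 20.0])
times_B = np.array([0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0])
od_A = np.array([0.01, 0.02, 0.05, 0.20, 0.80, 1.10])
od_B = np.array([0.01, 0.015, 0.03, 0.08, 0.35, 0.90, 1.15])

curves, grid = resample_to_unit_grid([od_A, od_B], [times_A, times_B], step=0.01)
print(curves.shape)
```

Because interpolation is linear, sparse sampling during fast growth will flatten the resampled curve there; comparing against raw_curves is a reasonable sanity check.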

Accessing arrays#

| Attribute | Shape | Description |
| --- | --- | --- |
| data.curves | (n, T) | OD matrix (read-only) |
| data.times | (T,) | Shared time grid (read-only) |
| data.labels | list[str] | Curve identifiers |
| data.clusters | (n,) or None | Cluster assignments (after preprocess) |
| data.centroids | (k, T) or None | Cluster centroids (after preprocess) |
| data.wcss | float or None | Within-cluster sum of squares |
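The shapes above fit together as in the following sketch, which computes a within-cluster sum of squares from a curve matrix, cluster assignments, and centroids. It uses the standard k-means definition and assumes 0-based integer cluster labels; whether pykinbiont computes wcss exactly this way is not stated here, and `within_cluster_ss` is a hypothetical helper.

```python
import numpy as np


def within_cluster_ss(curves, clusters, centroids):
    """Sum of squared distances from each curve to its assigned centroid."""
    # centroids[clusters] has shape (n, T): one centroid row per curve.
    residuals = curves - centroids[clusters]
    return float(np.sum(residuals ** 2))


curves = np.array([[0.0, 1.0], [0.0, 3.0], [4.0, 4.0]])   # (n=3, T=2)
clusters = np.array([0, 0, 1])                            # (n,), assumed 0-based
centroids = np.array([[0.0, 2.0], [4.0, 4.0]])            # (k=2, T=2)

wcss = within_cluster_ss(curves, clusters, centroids)
print(wcss)
```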