# Loading data

pykinbiont represents growth curves as {py:class}`~pykinbiont.GrowthData` (shared time grid)
or {py:class}`~pykinbiont.IrregularGrowthData` (per-curve time points).

## GrowthData

`GrowthData` stores OD values as an `n_curves × n_timepoints` matrix sampled on a **shared** time grid.
This matches the typical output of a plate reader.

### From a CSV file

The CSV is expected to have time in the **first column** and one well (curve) in each remaining
column. The column headers are used as curve labels:

```text
time,Well_A1,Well_A2,Well_B1
0.0,0.012,0.011,0.010
0.5,0.014,0.013,0.012
...
```

```python
from pykinbiont import GrowthData

data = GrowthData.from_csv("plate_reader.csv")
print(f"{len(data.labels)} curves, {len(data.times)} time points")
```
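
To check a file against this layout before loading it, the sketch below parses a small inline sample with the standard library only. The helper name `read_layout` is hypothetical and is not part of pykinbiont; it only illustrates how a time-first CSV maps onto times, labels, and one row per curve, and makes no claim about how `from_csv` parses files internally.

```python
import csv
import io

# Inline sample in the layout described above (stands in for a real file).
sample = """time,Well_A1,Well_A2,Well_B1
0.0,0.012,0.011,0.010
0.5,0.014,0.013,0.012
1.0,0.017,0.016,0.014
"""


def read_layout(handle):
    """Split a time-first CSV into (times, labels, curves). Illustrative helper."""
    reader = csv.reader(handle)
    header = next(reader)
    labels = header[1:]  # every column after the first is a curve
    rows = [[float(cell) for cell in row] for row in reader if row]
    times = [row[0] for row in rows]
    # Transpose so that each curve is one row: n_curves x n_timepoints.
    curves = [[row[i + 1] for row in rows] for i in range(len(labels))]
    return times, labels, curves


times, labels, curves = read_layout(io.StringIO(sample))
print(labels)
print(len(curves), "curves,", len(times), "time points")
```

A non-numeric cell makes `float()` raise `ValueError`, which is a quick way to spot stray text or units in an export.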

### From a pandas DataFrame

```python
import pandas as pd
from pykinbiont import GrowthData

df = pd.read_csv("plate_reader.csv")
data = GrowthData.from_dataframe(df)
```

`from_dataframe` treats the first column as time regardless of its name.
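
Because the time column is identified by position, a frame whose time column sits elsewhere needs reordering first. The sketch below uses plain pandas; the column name `time_h` and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical export where the time column is not the first column.
df = pd.DataFrame({
    "Well_A1": [0.012, 0.014, 0.017],
    "time_h": [0.0, 0.5, 1.0],
    "Well_A2": [0.011, 0.013, 0.016],
})

time_col = "time_h"
# Move the time column to position 0 and keep the wells in their original order.
ordered = df[[time_col] + [c for c in df.columns if c != time_col]]
print(list(ordered.columns))
```

The reordered frame can then be passed to `GrowthData.from_dataframe`.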

### From NumPy arrays

```python
import numpy as np
from pykinbiont import GrowthData

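times = np.linspace(0, 20, 100)                       # shape (100,)

# Placeholder logistic curves; substitute your own measurements.
curve_A1 = 1.2 / (1 + np.exp(-0.6 * (times - 8)))     # shape (100,)
curve_A2 = 1.0 / (1 + np.exp(-0.5 * (times - 10)))    # shape (100,)
curves = np.stack([curve_A1, curve_A2])               # shape (2, 100)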

data = GrowthData(
    curves=curves,
    times=times,
    labels=["A1", "A2"],
)
```

The `curves` array must be 2-D with shape `(n_curves, n_timepoints)`.
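
Arrays exported by other tools are often time-major, with one row per time point. The sketch below shows one way to bring such an array into the required orientation before constructing `GrowthData`. The helper `as_curve_matrix` is hypothetical and not part of pykinbiont, and the random values are placeholders.

```python
import numpy as np


def as_curve_matrix(values, times):
    """Return values as (n_curves, n_timepoints), transposing time-major input.

    Hypothetical helper for illustration; not part of pykinbiont.
    """
    values = np.asarray(values, dtype=float)
    times = np.asarray(times, dtype=float)
    if values.ndim != 2:
        raise ValueError(f"expected a 2-D array, got {values.ndim}-D")
    if values.shape[1] == times.size:
        return values            # already one row per curve
    if values.shape[0] == times.size:
        return values.T          # time-major input: one row per time point
    raise ValueError(
        f"no axis of shape {values.shape} matches {times.size} time points"
    )


times = np.linspace(0, 20, 100)
time_major = np.random.default_rng(0).random((100, 2))  # (n_timepoints, n_curves)
curves = as_curve_matrix(time_major, times)
print(curves.shape)  # (2, 100)
```

When the number of curves equals the number of time points, the orientation cannot be inferred from the shape alone; this sketch then returns the input unchanged, so check such arrays by hand.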

### Subsetting

Select a subset of wells by label:

```python
subset = data[["Well_A1", "Well_B1"]]
print(subset.labels)   # ['Well_A1', 'Well_B1']
```
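
On plain arrays, the same kind of selection amounts to looking up the row index of each label. The sketch below shows that correspondence, assuming row `i` of the curve matrix belongs to `labels[i]`. It illustrates the idea only and is not pykinbiont's implementation.

```python
import numpy as np

labels = ["Well_A1", "Well_A2", "Well_B1"]
curves = np.array([
    [0.012, 0.014, 0.017],   # Well_A1
    [0.011, 0.013, 0.016],   # Well_A2
    [0.010, 0.012, 0.014],   # Well_B1
])

wanted = ["Well_B1", "Well_A1"]
index = {label: i for i, label in enumerate(labels)}
rows = [index[label] for label in wanted]   # KeyError for an unknown label
subset_curves = curves[rows]                # rows follow the order of `wanted`
print(subset_curves.shape)   # (2, 3)
```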

## IrregularGrowthData

Use `IrregularGrowthData` when curves have **different time points** (e.g., several experiments
merged into one dataset, or manual sampling at unequal intervals). pykinbiont automatically
resamples all curves onto a shared union grid in normalised `[0, 1]` time via linear interpolation.

```python
import numpy as np
from pykinbiont import IrregularGrowthData

# Each curve has its own time vector
times_A = np.array([0.0, 1.0, 2.5, 5.0, 10.0, 20.0])
times_B = np.array([0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0])

od_A = np.array([0.01, 0.02, 0.05, 0.20, 0.80, 1.10])
od_B = np.array([0.01, 0.015, 0.03, 0.08, 0.35, 0.90, 1.15])

igd = IrregularGrowthData(
    raw_curves=[od_A, od_B],
    raw_times=[times_A, times_B],
    labels=["Strain_A", "Strain_B"],
    step=0.01,   # union grid resolution in normalised [0,1] time
)

print(igd.curves.shape)   # (2, n_grid), resampled
print(igd.times[:5])      # normalised [0,1] grid
```

The original data is preserved in `igd.raw_curves` and `igd.raw_times`.
`fit()` and `preprocess()` accept `IrregularGrowthData` directly.
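
To build intuition for the resampling step, the sketch below interpolates the two example curves onto a common normalised grid with `numpy.interp`. It rests on two assumptions that the text above does not confirm: time is scaled by the global minimum and maximum across all curves, and the grid is uniform with spacing `step`. The library's actual normalisation and grid construction may differ.

```python
import numpy as np

times_A = np.array([0.0, 1.0, 2.5, 5.0, 10.0, 20.0])
times_B = np.array([0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0])
od_A = np.array([0.01, 0.02, 0.05, 0.20, 0.80, 1.10])
od_B = np.array([0.01, 0.015, 0.03, 0.08, 0.35, 0.90, 1.15])

raw_times = [times_A, times_B]
raw_curves = [od_A, od_B]
step = 0.01

# Assumption: scale every time vector by the global min and max.
t_min = min(t.min() for t in raw_times)
t_max = max(t.max() for t in raw_times)

# Assumption: a uniform grid on [0, 1] with spacing `step`.
grid = np.linspace(0.0, 1.0, int(round(1.0 / step)) + 1)

# Linear interpolation of each curve onto the shared grid.
resampled = np.stack([
    np.interp(grid, (t - t_min) / (t_max - t_min), od)
    for t, od in zip(raw_times, raw_curves)
])
print(resampled.shape)  # (2, 101)
```

Note that `numpy.interp` holds the first or last value constant outside a curve's own time range, so in this sketch `Strain_B` stays at its final OD beyond normalised time 0.8. Whether pykinbiont treats out-of-range points the same way is not stated here.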

## Accessing arrays

| Attribute | Shape | Description |
|---|---|---|
| `data.curves` | `(n, T)` | OD matrix (read-only) |
| `data.times` | `(T,)` | Shared time grid (read-only) |
| `data.labels` | `list[str]` | Curve identifiers |
| `data.clusters` | `(n,)` or `None` | Cluster assignments (after `preprocess`) |
| `data.centroids` | `(k, T)` or `None` | Cluster centroids (after `preprocess`) |
| `data.wcss` | `float` or `None` | Within-cluster sum of squares |
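
These arrays work directly with NumPy. The sketch below uses stand-in arrays with the documented shapes, not a real `GrowthData` object, to find the maximum OD of each curve and the time at which it occurs. The values are made up.

```python
import numpy as np

# Stand-ins with the documented shapes: curves is (n, T), times is (T,).
times = np.array([0.0, 0.5, 1.0, 1.5])
labels = ["Well_A1", "Well_A2"]
curves = np.array([
    [0.012, 0.014, 0.020, 0.019],
    [0.011, 0.013, 0.016, 0.022],
])

peak_idx = curves.argmax(axis=1)   # index of the maximum OD in each curve
for label, i, row in zip(labels, peak_idx, curves):
    print(f"{label}: max OD {row[i]:.3f} at t = {times[i]}")
```

Since `curves` and `times` are marked read-only, take a copy (for example `data.curves.copy()`, assuming they are NumPy arrays) before modifying values in place.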