Project Idea: Biological Trajectory Clustering

Background

The analysis of connectivity due to the dispersal of biological particles (e.g. larvae, eggs, pathogens) involves physically simulating the pathways that passive particles follow as they drift with ocean currents. It also requires modelling how these particles develop under way (e.g. growth or death due to the availability or scarcity of food, or due to temperature changes along the way). While the physical mechanisms governing dispersal are generally well understood, our biological understanding is much more uncertain. Hence, we'd often like to test many different sets of biological parameters against the same set of physical trajectories.

As a tool for this work, we'd like to give experts in the biological mechanisms a way to test their understanding by exploring a physical dispersal simulation. A simple example of such a product is shown below.

Problem

We have on the order of hundreds of gigabytes of simulated particle trajectories in the North Sea region. Each trajectory consists of multiple time series describing the geographic location, $z(t)$, $y(t)$, $x(t)$, and ambient conditions such as temperature, $T(t)$, and salinity, $S(t)$.

$$\mathrm{traj}_n = \{t\in[t_{ns}, t_{ne}]: z_n(t), y_n(t), x_n(t), T_n(t), S_n(t)\}$$

We split the North Sea region into hexagons $h$ of approximately 10 km radius and divide all trajectories into groups $\mathcal{T}_{h_0, h_1}$ according to the hexagon $h_0$ they start in and the hexagon $h_1$ they end in.

$$\mathcal{T}_{h_0, h_1} = \{\mathrm{traj}_n: (x_n(t_{ns}), y_n(t_{ns})) \in h_0, (x_n(t_{ne}), y_n(t_{ne})) \in h_1\}$$
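The grouping step can be sketched in a few lines of Python. This is only an illustration under several assumptions that are not part of the project description: positions are already projected to planar coordinates in kilometres, "radius" is read as the centre-to-corner distance of a pointy-top hexagon, each trajectory is a plain dictionary of lists, and the names `hex_of` and `group_by_endpoints` are hypothetical.

```python
import math
from collections import defaultdict

HEX_SIZE_KM = 10.0  # assumed centre-to-corner distance of each hexagon


def hex_of(x_km, y_km, size=HEX_SIZE_KM):
    """Return the axial (q, r) index of the pointy-top hexagon containing a point."""
    # Fractional axial coordinates of the point.
    q = (math.sqrt(3) / 3 * x_km - y_km / 3) / size
    r = (2 / 3 * y_km) / size
    s = -q - r
    # Round to the nearest hexagon in cube coordinates (q + r + s = 0).
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Recompute the coordinate with the largest rounding error
    # so that the cube constraint holds again.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return (rq, rr)


def group_by_endpoints(trajectories):
    """Group trajectories by (start hexagon, end hexagon)."""
    groups = defaultdict(list)
    for traj in trajectories:
        h0 = hex_of(traj["x"][0], traj["y"][0])    # hexagon at t_ns
        h1 = hex_of(traj["x"][-1], traj["y"][-1])  # hexagon at t_ne
        groups[(h0, h1)].append(traj)
    return groups


# Three made-up trajectories, reduced to their start and end positions.
example = [
    {"x": [0.0, 17.32], "y": [0.0, 0.0]},
    {"x": [1.0, 16.0], "y": [1.0, 1.0]},
    {"x": [0.0, 8.66], "y": [0.0, 15.0]},
]
groups = group_by_endpoints(example)
```

For the real data set, an established hexagonal indexing library working directly on longitude and latitude may be a better fit than this planar sketch.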

Then, for each trajectory, we apply our biological model and, for example, estimate the probability $p(\mathrm{traj}_n)$ that a biological particle following the trajectory actually survives the journey and could settle in the final location:

$$p(\mathrm{traj}_n) = \mathrm{biology}(z_n(t), y_n(t), x_n(t), T_n(t), S_n(t))$$
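To make the interface of $\mathrm{biology}$ concrete, here is a deliberately simple toy model. It is not a model proposed by the project: the tolerable temperature range, the mortality rates, the time unit (days) and the function name `survival_probability` are all made up for illustration, and the sketch uses only $T_n(t)$ while ignoring position and salinity.

```python
import math


def survival_probability(times_days, temperature_c,
                         t_min=6.0, t_max=18.0,
                         base_rate=0.01, stress_rate=0.2):
    """Toy survival model with hypothetical parameters.

    A constant background mortality rate (per day) applies everywhere;
    an extra rate is added while the temperature is outside [t_min, t_max].
    Survival is exp(-accumulated hazard along the trajectory).
    """
    hazard = 0.0
    for i in range(1, len(times_days)):
        dt = times_days[i] - times_days[i - 1]
        # Mean temperature over the time step.
        temp = 0.5 * (temperature_c[i] + temperature_c[i - 1])
        rate = base_rate
        if not (t_min <= temp <= t_max):
            rate += stress_rate
        hazard += rate * dt
    return math.exp(-hazard)


# Two made-up two-day trajectories: one inside and one above the tolerable range.
p_ok = survival_probability([0.0, 1.0, 2.0], [10.0, 10.0, 10.0])
p_hot = survival_probability([0.0, 1.0, 2.0], [20.0, 20.0, 20.0])
```

Any function with the same shape (time series in, one probability out) could be swapped in, which is what makes re-running many candidate models on the same trajectories attractive.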

Finally, we calculate a connection probability between $h_0$ and $h_1$ by averaging the survival probabilities of all trajectories connecting $h_0$ and $h_1$:

$$p_{h_0, h_1} = \frac{\sum_{\mathrm{traj}_n \in \mathcal{T}_{h_0, h_1}} p(\mathrm{traj}_n)}{\sum_{\mathrm{traj}_n \in \mathcal{T}_{h_0, h_1}} 1}$$

However, handling the raw trajectory data each time a new biological model needs to be tested means repeatedly processing hundreds of gigabytes of data. On the other hand, we can assume that the number of degrees of freedom in $\mathcal{T}_{h_0, h_1}$ is much smaller than its size. In other words, the number of trajectories that differ significantly from the perspective of a given class of biological models is a lot smaller than the total number of trajectories connecting $h_0$ and $h_1$.

Possible solutions

Identifying the relevant degrees of freedom in $\mathcal{T}_{h_0, h_1}$ can be understood as an unsupervised-learning task. If we can identify clusters $C$ of trajectories which lead to the same (or sufficiently similar) survival probabilities under a given class of biological models, we can reduce the effort for estimating $p_{h_0, h_1}$ to

$$p_{h_0, h_1} \approx \frac{\sum_{C} w_C p(\mathrm{traj}_C)}{\sum_C w_C}$$

where $w_C$ is the size or weight of the cluster $C$ and $\mathrm{traj}_C$ is a typical trajectory from the cluster $C$.
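The weighted estimate can be illustrated with a small sketch. Note that it swaps in plain feature binning as a stand-in for a learned clustering, and that the summary features (drift duration and mean temperature), the bin widths, the choice of the first member as the "typical" trajectory, and all function names are assumptions made only for this example.

```python
from collections import defaultdict


def summary_features(traj):
    """Hypothetical features: drift duration and mean temperature."""
    duration = traj["t"][-1] - traj["t"][0]
    mean_temp = sum(traj["T"]) / len(traj["T"])
    return duration, mean_temp


def cluster_by_binning(trajectories, bin_widths=(5.0, 1.0)):
    """Stand-in for a real clustering: group trajectories by binned features."""
    clusters = defaultdict(list)
    for traj in trajectories:
        key = tuple(int(f // w)
                    for f, w in zip(summary_features(traj), bin_widths))
        clusters[key].append(traj)
    return clusters


def connection_probability(trajectories, biology):
    """Full estimate: average survival over every trajectory."""
    return sum(biology(traj) for traj in trajectories) / len(trajectories)


def clustered_connection_probability(clusters, biology):
    """Approximate estimate: one biology evaluation per cluster."""
    numerator = 0.0
    denominator = 0.0
    for members in clusters.values():
        w_c = len(members)     # cluster weight = number of members
        traj_c = members[0]    # assumed representative: first member
        numerator += w_c * biology(traj_c)
        denominator += w_c
    return numerator / denominator


# Made-up trajectories connecting the same pair of hexagons.
trajs = [
    {"t": [0.0, 10.0], "T": [8.0, 8.0]},
    {"t": [0.0, 11.0], "T": [8.2, 8.4]},
    {"t": [0.0, 10.0], "T": [12.0, 12.0]},
]


def toy_biology(traj):
    """Made-up model: survival depends only on mean temperature."""
    return 0.5 if summary_features(traj)[1] < 10.0 else 0.2


clusters = cluster_by_binning(trajs)
p_full = connection_probability(trajs, toy_biology)
p_approx = clustered_connection_probability(clusters, toy_biology)
```

In this toy case the two estimates agree because the model's outcome is constant within each bin; in general the approximation is only as good as the match between the clustering and the class of biological models being tested.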

Data

TBD
