
Commit def0c2b

authored
Add files via upload
1 parent fe97c41 commit def0c2b


2 files changed (+4, -21 lines changed)


Jacques_Murphy.qmd

Lines changed: 4 additions & 6 deletions
````diff
@@ -25,7 +25,7 @@ Count data are often treated as continuous data and therefore modeled by a
 Gaussian distribution, this assumption is particularly poor when the
 measured counts are low. Instead, we use the reference
 distribution for count data which is the Poisson distribution
-[@Agresti_2002; @Inouye1998].
+[@Agresti_2002; @Inouye2017].
 
 When a data set is heterogeneous, clustering allows to extract
 homogeneous subsets from the whole data set. Many clustering methods,
````
````diff
@@ -830,9 +830,7 @@ selection is conducted on this regression, if the regression model has any
 predictor variables, then the variable is called "redundant" and if the
 regression model has no predictor variables, then the variable is called
 "irrelevant".
-@fig-Running-result1 shows the cluster mean for each variable, where the
-label indicates if the variable is irrelevant for clustering ("I"),
-redundant ("R") or useful (the label is then the cluster number).
+@fig-Running-result1 shows the cluster mean for each variable, where the label indicates if the variable is irrelevant for clustering (“I”), redundant (“R”) or useful (then the point is unlabelled).
 
 ```{r plotrunning, echo=FALSE, fig.align='center', cache = TRUE}
 #| label: fig-Running-result1
````
````diff
@@ -970,11 +968,11 @@ slightly better results than Cluster 6, starting more carefully, and
 managing to run until the end of the race, even if the pace of the last
 hours is not very constant.
 
-Finally, @fig-Running-result2 shows boxplots of the total number of loops covered by the runners in each of the clusters.
+Finally, @fig-Running-result2 shows boxplots of the total number of loops covered by the runners of each of the clusters.
 
 ```{r ARIrunning, echo = FALSE, message = FALSE, warning=FALSE, fig.align='center'}
 #| label: fig-Running-result2
-#| fig-cap: "Number of loops covered by the runners of each clusters."
+#| fig-cap: "Number of loops covered by the runners of each of the clusters."
 df <- data.frame(
 total_laps = rowSums(x),
 cluster = as.factor(map(Z))
````
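The `ARIrunning` chunk is cut off in this view. What follows is a rough, self-contained base-R sketch of the computation it appears to begin: total laps per runner (`rowSums(x)`) paired with a hard cluster label, then one boxplot per cluster. It rests on several assumptions that this commit does not confirm. `x` is taken to be a runners-by-time-window matrix of lap counts. `Z` is taken to be a runners-by-clusters matrix of posterior membership probabilities. `map_argmax()` is a hypothetical stand-in for the chunk's `map()`. The toy values are invented for illustration, and base `boxplot()` stands in for whatever plotting code the original chunk uses.

```r
# Illustrative sketch only. Assumptions (not taken from the commit):
#   x: runners-by-time-window matrix of lap counts (toy values)
#   Z: runners-by-clusters matrix of posterior membership probabilities (toy values)
x <- matrix(c(5, 6, 4,
              2, 3, 1,
              6, 6, 5,
              1, 2, 2,
              4, 5, 5,
              3, 1, 0),
            nrow = 6, byrow = TRUE)
Z <- matrix(c(0.90, 0.10,
              0.20, 0.80,
              0.70, 0.30,
              0.40, 0.60,
              0.55, 0.45,
              0.10, 0.90),
            ncol = 2, byrow = TRUE)

# Hypothetical stand-in for map(): maximum a posteriori (MAP) rule, i.e. the
# column index of the largest posterior probability in each row
# (which.max returns the first index on ties).
map_argmax <- function(Z) apply(Z, 1, which.max)

df <- data.frame(
  total_laps = rowSums(x),            # total number of loops per runner
  cluster = as.factor(map_argmax(Z))  # hard cluster label per runner
)

# One box per cluster, mirroring the figure whose caption this commit edits.
boxplot(total_laps ~ cluster, data = df,
        xlab = "Cluster", ylab = "Total number of loops")

# Per-cluster medians of the total number of loops.
print(tapply(df$total_laps, df$cluster, median))
```

Hard labels from the MAP rule are what a per-cluster boxplot needs. The posterior probabilities in `Z` could also weight each runner's contribution, but the chunk's `as.factor(...)` call suggests a single label per runner.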

references.bib

Lines changed: 0 additions & 15 deletions
````diff
@@ -29,21 +29,6 @@ @article{computo
 year = {2020},
 }
 
-@article{Inouye1998,
-author = {Inouye, David I. and Yang, Eunho and Allen, Genevera I. and Ravikumar, Pradeep},
-title = {A review of multivariate distributions for count data derived from the Poisson distribution},
-journal = {WIREs Computational Statistics},
-volume = {9},
-number = {3},
-pages = {e1398},
-keywords = {Poisson, Multivariate, Graphical models, Copulas, High dimensional},
-doi = {https://doi.org/10.1002/wics.1398},
-url = {https://wires.onlinelibrary.wiley.com/doi/abs/10.1002/wics.1398},
-eprint = {https://wires.onlinelibrary.wiley.com/doi/pdf/10.1002/wics.1398},
-abstract = {The Poisson distribution has been widely studied and used for modeling univariate count-valued data. However, multivariate generalizations of the Poisson distribution that permit dependencies have been far less popular. Yet, real-world, high-dimensional, count-valued data found in word counts, genomics, and crime statistics, for example, exhibit rich dependencies and motivate the need for multivariate distributions that can appropriately model this data. We review multivariate distributions derived from the univariate Poisson, categorizing these models into three main classes: (1) where the marginal distributions are Poisson, (2) where the joint distribution is a mixture of independent multivariate Poisson distributions, and (3) where the node-conditional distributions are derived from the Poisson. We discuss the development of multiple instances of these classes and compare the models in terms of interpretability and theory. Then, we empirically compare multiple models from each class on three real-world datasets that have varying data characteristics from different domains, namely traffic accident data, biological next generation sequencing data, and text data. These empirical experiments develop intuition about the comparative advantages and disadvantages of each class of multivariate distribution that was derived from the Poisson. Finally, we suggest new research directions as explored in the subsequent Discussion section. WIREs Comput Stat 2017, 9:e1398. doi: 10.1002/wics.1398 This article is categorized under: Statistical and Graphical Methods of Data Analysis > Multivariate Analysis},
-year = {2017}
-}
-
 @article{Rau2015,
 author = {Rau, Andrea and Maugis-Rabusseau, Cathy and Martin-Magniette, Marie-Laure and Celeux, Gilles},
 title = "{Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models}",
````
