Description
The build process selects a best-annotated gene model for each internal node. A good representative gene model has several qualities, including a name, a meaningful description, and curation. The current pipeline picks node representatives while constructing the trees, and genes are decorated with their closest well-annotated homolog or closest model-species homolog.
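As a rough illustration of how representative selection could work on an arbitrary (possibly pruned) tree, here is a minimal Python sketch. The `Gene` and `Node` classes, the score weights, and the list of uninformative descriptions are all assumptions made for this example, not the pipeline's actual data model. It also ignores tree distance, so it picks the best-annotated gene in the subtree rather than the closest one.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical placeholder descriptions that should not count as "meaningful".
UNINFORMATIVE = {"hypothetical protein", "uncharacterized protein", "expressed protein"}


@dataclass
class Gene:
    gene_id: str
    name: Optional[str] = None
    description: Optional[str] = None
    curated: bool = False


@dataclass
class Node:
    children: List["Node"] = field(default_factory=list)
    gene: Optional[Gene] = None            # set on leaves only
    representative: Optional[Gene] = None  # filled in by assign_representatives


def annotation_score(gene: Gene) -> int:
    """Score a gene model by annotation quality (weights are arbitrary assumptions)."""
    score = 0
    if gene.name:
        score += 1
    if gene.description and gene.description.lower() not in UNINFORMATIVE:
        score += 2
    if gene.curated:
        score += 4
    return score


def assign_representatives(node: Node) -> Optional[Gene]:
    """Post-order walk: each node gets the best-scoring gene in its subtree."""
    if node.gene is not None:
        node.representative = node.gene
        return node.representative
    candidates = [rep for rep in map(assign_representatives, node.children)
                  if rep is not None]
    # Ties go to the first candidate encountered.
    node.representative = max(candidates, key=annotation_score, default=None)
    return node.representative
```

Because the function only needs a tree of nodes, it could be run again on a pruned tree after species have been removed, which is the main reason to keep it separate from tree construction.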
This code should be updated, and possibly moved into a module, so it can be applied to a pruned gene tree from the genetrees.org API that is focused on a subset of species.
Also, it would be useful to mark internal nodes with aggregated functional annotations such as InterPro domains, pathways, expression profiles, etc. That way, you could search for gene families using these criteria and display the matches.
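One possible way to aggregate annotations is to count, for each internal node, how many genes in its subtree carry each term. The sketch below assumes a plain-dict tree where leaves hold lists under `domains` and `pathways`; those field names, the `agg` key, and the example identifiers are hypothetical and chosen only for illustration.

```python
from collections import Counter


def aggregate_annotations(node, fields=("domains", "pathways")):
    """Post-order walk that stores per-term gene counts on every node.

    Each node ends up with node["agg"] = {field: Counter(term -> gene count)}.
    """
    children = node.get("children", [])
    if not children:
        # Leaf: each term counts once per gene, even if listed twice.
        agg = {f: Counter(set(node.get(f, []))) for f in fields}
    else:
        agg = {f: Counter() for f in fields}
        for child in children:
            child_agg = aggregate_annotations(child, fields)
            for f in fields:
                agg[f].update(child_agg[f])
    node["agg"] = agg
    return agg


# Illustrative tree with made-up annotations.
tree = {
    "children": [
        {"domains": ["IPR000719"], "pathways": ["pathway:A"]},
        {"children": [
            {"domains": ["IPR000719", "IPR011009"], "pathways": []},
            {"domains": [], "pathways": ["pathway:A"]},
        ]},
    ]
}
aggregate_annotations(tree)
```

Storing counts rather than a bare set of terms keeps enough information to rank gene families by how widely a domain or pathway is shared within the subtree.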
For expression data, compute the average (or maximum?) expression level of the genes in the subtree for each sample in a study, and store it in expr__<studyId>__<sampleId> fields.
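A minimal sketch of the expression aggregation, with the reducer left as a parameter since the choice between average and maximum is still open. It assumes a plain-dict tree whose leaves carry an `expression` mapping keyed by `(study_id, sample_id)`; that layout is a guess for illustration. Genes with no value for a sample are simply skipped, which is also an assumption.

```python
from collections import defaultdict
from statistics import mean


def aggregate_expression(node, reducer=mean):
    """Write expr__<studyId>__<sampleId> fields on every internal node.

    Returns a mapping (study_id, sample_id) -> list of leaf values in the
    subtree, so parents can reduce over all genes rather than over child means.
    """
    children = node.get("children", [])
    if not children:
        return {key: [value] for key, value in node.get("expression", {}).items()}

    values = defaultdict(list)
    for child in children:
        for key, leaf_values in aggregate_expression(child, reducer).items():
            values[key].extend(leaf_values)

    for (study_id, sample_id), leaf_values in values.items():
        node[f"expr__{study_id}__{sample_id}"] = reducer(leaf_values)
    return values
```

For example, a node whose two leaf genes have values 2.0 and 4.0 for sample `sampleA` of study `study1` would get `expr__study1__sampleA` set to 3.0 with the default reducer, or 4.0 with `reducer=max`.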