Wenhan found a minor issue with our process_consequences function:
"flag a minor, non-urgent issue with gnomAD v0.8.1+... When we run process_consequences() on our VEP table and then explode and filter for transcripts whose gene_id starts with 'ENSG', we lose some variants. It looks like those variants only have gene_id annotations from alternative sources, so they get dropped... Our hypothesis is that this is caused by this line, which uses find() to pick an arbitrary gene ID for each variant and annotates it onto the table. Our current workaround is to first filter with ht = ht.annotate(vep=ht.vep.annotate(transcript_consequences=ht.vep.transcript_consequences.filter(lambda x: x.gene_id.startswith('ENSG')))) before applying the function."

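To make the effect of the workaround concrete, here is a minimal sketch of the same pre-filter in plain Python. It does not use Hail: dicts stand in for the VEP structs, and the helper name `keep_ensembl_consequences` and the example variant are hypothetical, made up purely for illustration.

```python
# Plain-Python illustration of the pre-filter described above.
# Assumption: dicts stand in for Hail structs; this is not the gnomAD code.

def keep_ensembl_consequences(transcript_consequences, prefix="ENSG"):
    """Return only the consequences whose gene_id starts with `prefix`."""
    return [
        csq
        for csq in transcript_consequences
        if csq.get("gene_id") is not None and csq["gene_id"].startswith(prefix)
    ]


# Hypothetical variant carrying one Ensembl and one non-Ensembl annotation.
variant = {
    "locus": "1:12345",
    "vep": {
        "transcript_consequences": [
            {"gene_id": "ENSG00000000001", "consequence_terms": ["missense_variant"]},
            {"gene_id": "100000001", "consequence_terms": ["synonymous_variant"]},
        ]
    },
}

filtered = keep_ensembl_consequences(variant["vep"]["transcript_consequences"])
print([csq["gene_id"] for csq in filtered])
```

Applying the filter before any per-variant gene selection means only "ENSG" gene IDs are left to choose from. A variant whose annotations all come from alternative sources ends up with an empty list, so it is still excluded, but explicitly rather than as a side effect.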