process_consequences filters some variants unexpectedly #805

@mike-w-wilson

Wenhan found a minor issue with our process_consequences function:

"flag a minor, non-urgent issue with gnomAD v0.8.1+... When we run `process_consequences()` on our VEP table and then explode and filter for transcripts whose `gene_id` starts with `ENSG`, we are losing some variants. It looks like those variants only have `gene_id` annotations from alternative sources, so they get dropped... Our hypothesis is that this is caused by this line, which uses `find()` to pick a random gene ID for each variant and annotates it onto the table. Our current workaround is to first filter with `ht = ht.annotate(vep=ht.vep.annotate(transcript_consequences=ht.vep.transcript_consequences.filter(lambda x: x.gene_id.startswith('ENSG'))))` before applying the function."
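To make the hypothesis easier to reason about, here is a minimal pure-Python sketch of the suspected behavior. It does not use Hail and is not the actual `process_consequences` implementation: the toy data, `pick_first_gene_id`, and `keep_ensembl` are all hypothetical stand-ins, and the "take the first gene ID" rule is only an assumption about what the `find()` call might effectively do.

```python
# Toy stand-in for a VEP-annotated table: each variant maps to a list of
# transcript consequences. All IDs below are made up for illustration.
variants = {
    "var_mixed": [
        {"gene_id": "100287102", "consequence": "missense_variant"},  # non-Ensembl source
        {"gene_id": "ENSG00000000001", "consequence": "missense_variant"},
    ],
    "var_ensembl_only": [
        {"gene_id": "ENSG00000000002", "consequence": "synonymous_variant"},
    ],
}


def pick_first_gene_id(consequences):
    """Hypothetical simplification: return the first gene_id seen, or None."""
    return next((c["gene_id"] for c in consequences), None)


def keep_ensembl(consequences):
    """Mirror of the workaround: keep only consequences with an ENSG gene_id."""
    return [c for c in consequences if c["gene_id"].startswith("ENSG")]


def variants_with_ensg(picked):
    """Variants whose picked gene_id survives a downstream ENSG filter."""
    return sorted(v for v, g in picked.items() if g is not None and g.startswith("ENSG"))


# Without pre-filtering, the picked gene_id can come from a non-Ensembl source,
# so a later ENSG filter drops the variant.
picked_raw = {v: pick_first_gene_id(cs) for v, cs in variants.items()}
kept_without_workaround = variants_with_ensg(picked_raw)

# With the workaround, only ENSG consequences are candidates for the pick.
picked_prefiltered = {v: pick_first_gene_id(keep_ensembl(cs)) for v, cs in variants.items()}
kept_with_workaround = variants_with_ensg(picked_prefiltered)

print(kept_without_workaround)
print(kept_with_workaround)
```

In this toy setup, `var_mixed` is lost without the pre-filter and retained with it, which matches the reported symptom. Whether the real function behaves this way would need to be confirmed against the linked line.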
