VCF support for RLE and fix to same-as-reference allele normalization to RLE #589
base: main
…s <= MAX_LITERAL_STATE_LENGTH, defaulted to 15
larrybabb left a comment
I will defer to others for the final review on this PR. It looks comprehensive and well covered from a testing standpoint.
Great job Kyle!
Additional changes may be needed. See #592.
jsstevenson left a comment
👍 Generally looks good; just some small things, and maybe some stuff to talk through tomorrow.
info_field_num,
"Integer",
(
    "The repeat subunit length values from ReferenceLengthExpression states for the GA4GH VRS "
Can this be `repeat_subunit_length` or `repeatSubunitLength` or something similar, just to make it clear that it's a specific property?
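For context on the naming question, here is a minimal sketch of what such an INFO declaration and its per-allele values might look like. Everything specific here is an assumption for illustration: the field ID `VRS_RepeatSubunitLengths`, the `Number=R` cardinality (one value per allele, including REF), and the dict shape of the states. The PR's real `info_field_num` value and field ID are not shown in this thread. The sketch assumes VRS 2.x-style `ReferenceLengthExpression` states that carry a `repeatSubunitLength` property and yields `None` (written as "." in VCF) for states that are not RLEs.

```python
from __future__ import annotations

from typing import Any, Optional


def info_header_line(field_id: str, number: str, type_: str, description: str) -> str:
    """Format a VCF ##INFO meta-information line (VCF 4.3 syntax)."""
    # VCF 4.3 requires backslashes and double quotes in Description to be escaped.
    escaped = description.replace("\\", "\\\\").replace('"', '\\"')
    return f'##INFO=<ID={field_id},Number={number},Type={type_},Description="{escaped}">'


def repeat_subunit_lengths(states: list[dict[str, Any]]) -> list[Optional[int]]:
    """Collect repeatSubunitLength per allele state; None if the state is not an RLE."""
    values: list[Optional[int]] = []
    for state in states:
        if state.get("type") == "ReferenceLengthExpression":
            values.append(state["repeatSubunitLength"])
        else:
            # e.g. a LiteralSequenceExpression carries no repeat subunit length
            values.append(None)
    return values


# Hypothetical field ID and Number, chosen only to follow the naming suggestion.
print(info_header_line(
    "VRS_RepeatSubunitLengths",
    "R",
    "Integer",
    "repeatSubunitLength values from VRS ReferenceLengthExpression states",
))

states = [
    {"type": "LiteralSequenceExpression", "sequence": "A"},
    {"type": "ReferenceLengthExpression", "length": 6,
     "sequence": "ATATAT", "repeatSubunitLength": 2},
]
print(repeat_subunit_lengths(states))  # [None, 2]
```

Putting the property name in the field ID makes the header line self-describing, which is the point of the suggestion.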
vrs_field_data,
assembly,
vrs_data_key=data,
vrs_data_key=data,  # TODO unused?
Might be okay to remove. It appears that this was used historically to add individual alleles as a key into an accumulated dictionary of everything in the VCF. Doesn't seem to get used anymore, and it feels clumsy to be using a tab-separated string rather than a tuple or something anyway.
key = vrs_data_key if vrs_data_key else vcf_coords
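To make the tuple-versus-string point concrete, here is a hedged sketch that contrasts a tab-joined string key with a tuple key for an accumulated dictionary. `VcfCoords`, `tab_key`, and the placeholder value are hypothetical. The thread only describes the historical key as a tab-separated string and does not show its exact fields.

```python
from __future__ import annotations

from typing import NamedTuple


class VcfCoords(NamedTuple):
    """Hypothetical coordinate key for one VCF allele; field names are illustrative."""
    chrom: str
    pos: int
    ref: str
    alt: str


def tab_key(coords: VcfCoords) -> str:
    # String key: every field becomes text, and callers must split and re-parse.
    return "\t".join([coords.chrom, str(coords.pos), coords.ref, coords.alt])


accumulated: dict[VcfCoords, str] = {}
coords = VcfCoords("chr1", 100, "A", "AT")
accumulated[coords] = "placeholder-vrs-id"  # not a real VRS identifier

# The tuple key keeps pos as an int and supports access by field name.
print(accumulated[VcfCoords("chr1", 100, "A", "AT")])  # placeholder-vrs-id
print(tab_key(coords).split("\t")[1] == "100")          # True: pos comes back as str
```

A `NamedTuple` is hashable like a plain tuple, so it works directly as a dict key without any string formatting or parsing.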
# Short State.sequence will be included in output VCF if <= this value,
# otherwise output will be emitted as the "." character.
MAX_LITERAL_STATE_LENGTH = 15
Placeholder for thinking through the implications of this; briefly mentioned on Slack.
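The rule in the code comment can be sketched as a small helper. The name `sequence_for_vcf` is hypothetical, and the check is inferred only from the comment ("included ... if <= this value, otherwise ... '.'"), so the boundary is inclusive. One implication to weigh is that a sequence longer than the limit cannot be recovered from that INFO value alone, because only "." is written.

```python
from typing import Optional

# Value from the diff: State.sequence is written literally only up to this length.
MAX_LITERAL_STATE_LENGTH = 15


def sequence_for_vcf(sequence: Optional[str], max_len: int = MAX_LITERAL_STATE_LENGTH) -> str:
    """Return the literal sequence if len <= max_len, otherwise the VCF '.' placeholder.

    Hypothetical helper; an empty string passes through unchanged here.
    """
    if sequence is None or len(sequence) > max_len:
        return "."
    return sequence


print(sequence_for_vcf("ACGT"))    # ACGT
print(sequence_for_vcf("A" * 15))  # fifteen A's: the boundary is inclusive
print(sequence_for_vcf("A" * 16))  # .
```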
# Pysam outputs "." for missing values.
record.info[k.value] = [
    value or k.default_value() for value in vrs_field_data[k.value]
    None if v in ("", None) else v for v in vrs_field_data[k.value]
Double-checking: does this convert sequences that are an empty string into None/"."?
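A pure-Python check of the two list comprehensions in the diff suggests the answer is yes. An empty string maps to `None`, which pysam writes as "." per the code comment. In this sketch, `default` stands in for `k.default_value()`, whose actual return value isn't shown in the thread. The check also surfaces a behavioral difference. The old `or` form replaced *any* falsy value, including a legitimate integer `0`, whereas the new form leaves `0` intact. Whether `0` can actually occur in these fields isn't stated here.

```python
from __future__ import annotations

from typing import Any, Optional


def old_style(values: list[Any], default: Any) -> list[Any]:
    # Previous expression: any falsy value ("", None, 0, 0.0) is replaced by default.
    return [value or default for value in values]


def new_style(values: list[Any]) -> list[Optional[Any]]:
    # New expression: only "" and None become None; pysam writes "." for None.
    return [None if v in ("", None) else v for v in values]


sample = ["ACGT", "", None, 0]
print(old_style(sample, "."))  # ['ACGT', '.', '.', '.']
print(new_style(sample))       # ['ACGT', None, None, 0]
```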
Close #577
Close #587