Skip to content

Conversation

@theferrit32
Copy link
Contributor

@theferrit32 theferrit32 commented Nov 12, 2025

Close #577
Close #587

@theferrit32 theferrit32 self-assigned this Nov 12, 2025
@jsstevenson jsstevenson requested review from jsstevenson and removed request for jsstevenson November 12, 2025 19:25
@theferrit32 theferrit32 marked this pull request as ready for review November 18, 2025 17:46
@theferrit32 theferrit32 requested review from a team as code owners November 18, 2025 17:46
Copy link
Contributor

@larrybabb larrybabb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will defer to others for the final review on this PR. It looks comprehensive and well covered from a testing standpoint.
Great job Kyle!

@theferrit32
Copy link
Contributor Author

Additional changes may be needed.

See: #592

Copy link
Contributor

@jsstevenson jsstevenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Generally looks good, some small things/maybe stuff to talk through tomorrow

info_field_num,
"Integer",
(
"The repeat subunit length values from ReferenceLengthExpression states for the GA4GH VRS "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be repeat_subunit_length or repeatSubunitLength or something just to make it clear that it's a specific property

vrs_field_data,
assembly,
vrs_data_key=data,
vrs_data_key=data, # TODO unused?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be okay to remove. It appears that this was used historically to add individual alleles as a key into an accumulated dictionary of everything in the VCF. Doesn't seem to get used anymore, and it feels clumsy to be using a tab-separated string rather than a tuple or something anyway.

key = vrs_data_key if vrs_data_key else vcf_coords

Comment on lines +55 to +57
# Short State.sequence will be included in output VCF if <= this value,
# otherwise output will be emitted as the "." character.
MAX_LITERAL_STATE_LENGTH = 15
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

placeholder for thinking through implications of this, briefly mentioned on slack

# Pysam outputs "." for missing values.
record.info[k.value] = [
value or k.default_value() for value in vrs_field_data[k.value]
None if v in ("", None) else v for v in vrs_field_data[k.value]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double checking -- does this convert sequences that are an empty string into None/"."?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Same-as-reference Alleles should have a ReferenceLengthExpression state Add RLE params to VRS-annotated VCFs

4 participants