
doswal
Collaborator

@doswal doswal commented Sep 11, 2025

This feature implements set-union-like addition of two Dataset objects. In simpler terms, the results of two different searches can be combined.

A Dataset holds the StudyId(s) and the corresponding Study object(s) from the response to a query.

With two or more Dataset objects, datasets can now be added as follows:

C = A+B

  • Takes all of A plus the portion of B that does not intersect with A

A = A+B

  • Updates A with the non-intersecting portion of B

Multiple Study objects can be created for the same StudyId across concurrent searches. The data stored in these objects should always be the same. When an overlapping ID is found while merging two datasets, the content of the two objects is compared programmatically before merging, to avoid incorrectly discarding data.
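The merge rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Study` and `Dataset` classes here are hypothetical stand-ins, and only the `studies` mapping, `to_dict()`, and the warning text are taken from the diff excerpts in this conversation.

```python
import logging

log = logging.getLogger(__name__)


class Study:
    """Hypothetical stand-in for the library's Study object."""

    def __init__(self, study_id, data):
        self.study_id = study_id
        self.data = data

    def to_dict(self):
        return {"StudyID": self.study_id, "data": self.data}


class Dataset:
    """Hypothetical stand-in: maps StudyID -> Study."""

    def __init__(self, studies=None):
        self.studies = dict(studies or {})

    def __add__(self, other):
        if not isinstance(other, Dataset):
            return NotImplemented
        # Start from a copy of the left operand so A itself is not mutated.
        merged = Dataset(self.studies)
        for sid, study in other.studies.items():
            if sid in merged.studies:
                # `same` compares content, not the StudyID (the IDs already match).
                same = merged.studies[sid].to_dict() == study.to_dict()
                if not same:
                    log.warning(
                        "Dataset union: duplicate StudyID %s with differing content. "
                        "Keeping left-hand version.", sid)
                continue  # overlapping ID: the left-hand Study is kept either way
            merged.studies[sid] = study
        return merged


a = Dataset({"1": Study("1", "x"), "2": Study("2", "y")})
b = Dataset({"2": Study("2", "y"), "3": Study("3", "z")})
c = a + b   # new Dataset; a and b are unchanged
a = a + b   # rebinds a to the merged result
```

Because `__add__` returns a new object, `A = A + B` works by rebinding the name rather than by mutating A in place; whether the actual implementation also defines `__iadd__` is not stated in this conversation.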

Freehand tests have been added to the example notebook. pytest tests will be added soon.

…, query builders, constants. (Motivation: will be helpful for extending the code base to Pangaea.) 2. Support multi-valued parameters by changing the input type to str/list[str], coercing list values to a string joined with | (pipe), and adding the respective {name}AndOr flag to the query. 3. Lay the groundwork for using the logging package. 4. Change input params with kwargs.
if not same:
    log.warning(
        "Dataset union: duplicate StudyID %s with differing content. "
        "Keeping left-hand version.", sid
Collaborator Author

Add to the message:
(When doing C = A + B, the contents of A will be kept.)

for sid, study in other.studies.items():
    if sid in merged.studies:
        try:
            same = (merged.studies[sid].to_dict() == study.to_dict())
Collaborator Author

Add a comment on this line stating that same refers to the content, not the StudyId.

@khider khider self-requested a review September 11, 2025 20:23
Member

@khider khider left a comment

Please add pytest tests for A = A + B and C = A + B.

Test with two datasets (do an xmlid or NOAA study ID query).
Case 1: A and B have different IDs, so C should contain both the A and B IDs.
Case 2: A and B have the same ID, so C should look like A.
Case 3: A and B have the same ID but different content, so the warning should be printed and C should still look like A.
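As a rough starting point, the three cases could look something like the sketch below. Everything here is an assumption for illustration: the `Study` and `Dataset` classes are hypothetical in-memory stand-ins that only mimic the merge rule described in this PR, the logger name is invented, and the real tests would build the two datasets from actual xmlid or NOAA study ID queries instead. `caplog` is pytest's built-in log-capture fixture.

```python
import logging

log = logging.getLogger("dataset_sketch")  # hypothetical logger name


class Study:
    """Hypothetical stand-in for the library's Study object."""

    def __init__(self, study_id, data):
        self.study_id, self.data = study_id, data

    def to_dict(self):
        return {"StudyID": self.study_id, "data": self.data}


class Dataset:
    """Hypothetical stand-in that mimics the merge rule from this PR."""

    def __init__(self, studies=None):
        self.studies = dict(studies or {})

    def __add__(self, other):
        merged = Dataset(self.studies)
        for sid, study in other.studies.items():
            if sid not in merged.studies:
                merged.studies[sid] = study
            elif merged.studies[sid].to_dict() != study.to_dict():
                log.warning(
                    "Dataset union: duplicate StudyID %s with differing content. "
                    "Keeping left-hand version.", sid)
        return merged


def make(pairs):
    """Build a stand-in Dataset from a {study_id: data} mapping."""
    return Dataset({sid: Study(sid, data) for sid, data in pairs.items()})


def test_different_ids():
    # Case 1: C contains the IDs of both A and B.
    c = make({"A1": "x"}) + make({"B1": "y"})
    assert set(c.studies) == {"A1", "B1"}


def test_same_id_same_content():
    # Case 2: C looks like A.
    a, b = make({"S1": "x"}), make({"S1": "x"})
    c = a + b
    assert {k: v.to_dict() for k, v in c.studies.items()} == \
           {k: v.to_dict() for k, v in a.studies.items()}


def test_same_id_different_content(caplog):
    # Case 3: a warning is logged and C still looks like A.
    a, b = make({"S1": "x"}), make({"S1": "changed"})
    c = a + b
    assert "duplicate StudyID" in caplog.text
    assert c.studies["S1"].to_dict() == a.studies["S1"].to_dict()


def test_reassignment():
    # A = A + B: A ends up with its own studies plus the new ones from B.
    a, b = make({"S1": "x"}), make({"S2": "y"})
    a = a + b
    assert set(a.studies) == {"S1", "S2"}
```

The `caplog` check assumes the warning is emitted through the `logging` package, as in the diff excerpt above; if it were printed instead, pytest's `capsys` fixture would be the closer fit.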

@khider khider merged commit e8f6abc into main Sep 16, 2025
1 check passed
@khider khider deleted the Dataset-add-operator branch September 16, 2025 18:42