-
Notifications
You must be signed in to change notification settings - Fork 14
Feat: add origins in update db script #1409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
a61c5ea
to
164b4ea
Compare
if self.related_data_only: | ||
existing_issues_ids = set( | ||
Issues.objects.using(self.dashboard_conn_name).values_list( | ||
"id", flat=True | ||
) | ||
) | ||
|
||
if len(existing_issues_ids) == 0: | ||
return [] | ||
issue_id_placeholders = ",".join(["%s"] * len(existing_issues_ids)) | ||
related_condition = f"AND issue_id IN ({issue_id_placeholders})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides the query, this code seems duplicated many times. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you refactor for a wrapper, maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see that I could use a function to replace
issue_id_placeholders = ",".join(["%s"] * len(existing_issues_ids))
related_condition = f"AND issue_id IN ({issue_id_placeholders})"
but the rest seems like they would need to stay. Are you thinking of something else or is that it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left another comment about it
self.origins: list[str] | ||
self.origin_condition: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not initialize or make Optional, the same as the other fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured that I didn't really need to initialize them or use Optional if the first thing the command does is assigning those variables. Do you think it's better with assign or without it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
start_interval
, end_interval
, and related_data_only
were also assigned as the first thing, and you assigned them
Since we removed the foreign key constraints, we don't need to limit the selected data to the related ones. With this option we can select if we want to fetch only related items or not.
164b4ea
to
e0fb298
Compare
self.origins: list[str] | ||
self.origin_condition: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
start_interval
, end_interval
, and related_data_only
were also assigned as the first thing, and you assigned them
if self.related_data_only: | ||
checkout_ids = set( | ||
( | ||
Checkouts.objects.using(self.dashboard_conn_name) | ||
.filter( | ||
field_timestamp__gte=self.start_timestamp, | ||
field_timestamp__lte=self.end_timestamp, | ||
) | ||
.values_list("id", flat=True) | ||
) | ||
.values_list("id", flat=True) | ||
) | ||
) | ||
|
||
if len(checkout_ids) == 0: | ||
return [] | ||
checkout_id_placeholders = ",".join(["%s"] * len(checkout_ids)) | ||
if len(checkout_ids) == 0: | ||
return [] | ||
checkout_id_placeholders = ",".join(["%s"] * len(checkout_ids)) | ||
related_condition = ( | ||
f"AND builds.checkout_id IN ({checkout_id_placeholders})" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can create a method that returns a tuple of string
and set
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will avoid duplicating code, just use as parameters an array for defining the ids, the field for related_condition
(builds.checkout_id
)
if self.related_data_only: | ||
existing_build_ids = set( | ||
Builds.objects.using(self.dashboard_conn_name) | ||
.filter( | ||
field_timestamp__gte=self.start_timestamp, | ||
field_timestamp__lte=self.end_timestamp, | ||
) | ||
.values_list("id", flat=True) | ||
) | ||
.values_list("id", flat=True) | ||
) | ||
|
||
if len(existing_build_ids) == 0: | ||
return [] | ||
build_id_placeholders = ",".join(["%s"] * len(existing_build_ids)) | ||
if len(existing_build_ids) == 0: | ||
return [] | ||
build_id_placeholders = ",".join(["%s"] * len(existing_build_ids)) | ||
related_condition = f"AND build_id IN ({build_id_placeholders})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method can be used here
if self.related_data_only: | ||
existing_issues_ids = set( | ||
Issues.objects.using(self.dashboard_conn_name).values_list( | ||
"id", flat=True | ||
) | ||
) | ||
|
||
if len(existing_issues_ids) == 0: | ||
return [] | ||
issue_id_placeholders = ",".join(["%s"] * len(existing_issues_ids)) | ||
if len(existing_issues_ids) == 0: | ||
return [] | ||
issue_id_placeholders = ",".join(["%s"] * len(existing_issues_ids)) | ||
related_condition = f"AND issue_id IN ({issue_id_placeholders})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here
Description
The update_db script that moves data from kcidb to a separate db can have an optional origins filter so that we can avoid querying unnecessary data. Not only that but the current script still tries to follow the foreign key constraint by limiting results to the ones that have the related data in the target db, but that should be optional.
Changes
How to test
Use the update_db command with whatever interval; test the arguments --related-data-only (when set, the command should update less items) and --origins. Combine the origins to check if the counting is right (you can also check in kcidb directly if you want).
Closes #1401