-
Notifications
You must be signed in to change notification settings - Fork 38
Longitudinal and new qc_metrics #967
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
These only make sense for categorical data, since for floats this will be not very meaningful.
Cool
Comment on categorical vs numeric above applies
Cool from above applies
All of this require knowledge on categorical and numerical variables - comment above applies. What do you think about having by default a qc metrics which does only compute the things without feature type information needed; and have e.g. an argument for qc metrics to be computed that is a list, and when someone wants the fancy stuff you suggest here that requires numeric/categorical distinction, they'd need to run infer_feature_types first? |
|
Can you also while resolving merge-conflicts move the ARRAY_TYPES variable to |
…to enhancement/issue-950 after new updates from the main
for more information, see https://pre-commit.ci
…rical & numerical vars, tests applied to both advanced=True and advanced=False
…to enhancement/issue-950
eroell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some intermediate comments - not sure you already asked Andreas to review or still refining things :)
| qc_vars: Collection[str] = (), | ||
| *, | ||
| layer: str | None = None, | ||
| advanced: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this be made two arguments
observation_level and variable_level, which take lists of strings and by default the lists are what the current default is?
It would seem to me a bit more readable than "advanced"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible, I'd try to keep the number of parameters as low as possible. Can we come up with design where these parameters do not exist? Like a scenario where some things would be skipped over unless the feature specs were calculated but it doesn't crash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the function computes too many things at once where it adds even more complexity computing 8 metrics in some cases and 12 in another without changing the passed argument. What do you think? I don't have a strong opinion here
The way to address your design that I'd see would be to document well what will be computed if no feature types are found, and what will be computed additionally if they are found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah kinda like that. If possible, we should purge all and any parameters. Users rarely read API docs.
agerardy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't find any big issues, looks pretty good to me already. tests seem to cover everything as far as I understand it.
|
Eventually, I'd like to do a final review because there's a few things that I think need to be changed. Could you please update the PR description? |
for more information, see https://pre-commit.ci
…to enhancement/issue-950
fixes #950
improved qc_metrics function with new metrics:
in
_compute_obs_metrics:unique_values_absunique_values_ratioentropy_of_missingnessin
_compute_var_metrics:unique_values_absunique_values_ratioentropy_of_missingnesscoefficient_of_variationis_constantconstant_variable_ratiorange_ratioupdated tests accordingly with new metrics
TODO