Remove Unused HAVE_LONG_DOUBLE Conditional and Use sqrtl Directly #6965
base: master
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6965      +/-   ##
==========================================
- Coverage   98.69%   98.69%   -0.01%
==========================================
  Files          79       79
  Lines       14685    14676       -9
==========================================
- Hits        14494    14485       -9
  Misses        191      191
Should we just use
Line 1032 in b0ef41e
I'm not sure our case maps perfectly well onto that of {matrixStats}. As noted in the linked issue, R itself apparently keeps going back and forth on using r-devel/r-svn@cbdae1e At a minimum, if we use
I think there's a benefit to having consistency with the base R implementation, which I guess leans me towards I'm somewhat ignorant of the performance/precision pros and cons. It could be interesting to see this quantified in any resulting PR before committing to any change. I guess my feeling is it's not something I'd rush to change but, provided there are no significant disadvantages, aiming for greater consistency with base R seems like a good thing.
@MichaelChirico - just seen what you mean. I'd need to study the code further to grok the implications.
Hi @MichaelChirico, @TimTaylor,
Rationale for this approach:
If this balance doesn't align with data.table's design goals, I'm happy to:
Please let me know your preferred direction. I appreciate your guidance on balancing precision and performance in this critical path.
What I'd like to see is a numeric study of the differences.
Thanks for your feedback. I’ll try to work on a numeric study comparing the precision and performance and will share the results here as soon as possible.
Hi @MichaelChirico, Thanks for the direction regarding precision. I’ve completed a preliminary numeric and performance study comparing sqrt() and (double)sqrtl((long double)x) as applied in gsumm.c. Here's a quick summary:
Performance Comparison (performance_test.R)
On average, the sqrtl version was ~50-60% slower, though still well within reasonable performance for typical datasets.
please include a repro script |
This comment was marked as off-topic.
Lines 1055 to 1056 in cab5a5d
This (PR lines above) is the wrong approach AFAICT. You have already lost the precision of v after the first line. Something like the following looks more appropriate:
ansd[i] = isSD ? (double) sqrtl(v/(nna-1)) : (double) v/(nna-1);
Also, those performance benchmarks are a little odd. You are testing base R and Rcpp implementations, not the C changes you proposed.
Closes #6938
This PR addresses the unused HAVE_LONG_DOUBLE conditional in src/gsumm.c and simplifies the handling of square root calculations for variance and standard deviation. The following changes have been implemented:
Direct Use of sqrtl:
All tests pass locally, and there are no regressions in numerical results.
Kindly review when you have time.
Thank you!