
Remove Unused HAVE_LONG_DOUBLE Conditional and Use sqrtl Directly #6965


Status: Open · wants to merge 2 commits into master

Conversation

venom1204
Contributor

Closes #6938

This PR addresses the unused HAVE_LONG_DOUBLE conditional in src/gsumm.c and simplifies the handling of square root calculations for variance and standard deviation. The following changes have been implemented:

Removed unused conditional compilation block:

  • The preprocessor block checking for HAVE_LONG_DOUBLE (which was never defined in the package) has been removed, eliminating dead code and potential confusion.

Direct use of sqrtl:

  • All occurrences of the SQRTL macro have been replaced with direct calls to sqrtl. This ensures that calculations use the higher-precision long double square root function, aligning with R's original intent for these statistical computations.

All tests pass locally, and there are no regressions in numerical results.

Kindly review when you have time.
Thank you!

@venom1204 venom1204 requested a review from ben-schwen as a code owner May 12, 2025 17:38

codecov bot commented May 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.69%. Comparing base (17160aa) to head (cab5a5d).
Report is 12 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6965      +/-   ##
==========================================
- Coverage   98.69%   98.69%   -0.01%     
==========================================
  Files          79       79              
  Lines       14685    14676       -9     
==========================================
- Hits        14494    14485       -9     
  Misses        191      191              


@MichaelChirico
Member

MichaelChirico commented May 12, 2025

Should we just use sqrt()? cc @TimTaylor

  • sqrt() has signature double sqrt(double arg);
  • that matches the type of ansd:

double *ansd = REAL(ans);

I'm not sure our case maps perfectly onto that of {matrixStats}; as noted in the linked issue, the HAVE_LONG_DOUBLE use there only concerns summed doubles, a much less finicky operation than sqrt()... I guess the thinking in base R is that we can maybe achieve a tiny bit better accuracy by doing (double)sqrtl((long double)x) as opposed to just sqrt(x). Could be.

R itself apparently keeps going back and forth on using sqrt() vs sqrtl(), e.g.

r-devel/r-svn@cbdae1e
r-devel/r-svn@109c6cb
r-devel/r-svn@0286a2e
r-devel/r-svn@a916f3f

At a minimum, if we use sqrtl() it would be appropriate IMO to have the cast to double explicit (as it is in cov.c).

@TimTaylor

I think there's a benefit to having consistency with the base R implementation which I guess leans me towards sqrtl() (although I've not looked at the code to say whether the implementation deviates elsewhere).

I'm somewhat ignorant of the performance/precision pros and cons. It could be interesting to see these quantified in any resulting PR before committing to a change.

I guess my feeling is it's not something I'd rush to change but, provided there are no significant disadvantages, aiming for greater consistency with base R seems like a good thing.

@TimTaylor

@MichaelChirico - just seen what you mean. I'd need to study the code further to grok the implications.

@venom1204
Contributor (Author)

Hi @MichaelChirico , @TimTaylor ,
Thank you for the insightful discussion. Based on your feedback about precision considerations and R's implementation patterns, I've modified the approach to include explicit type casting in gsumm.c:

ansd[i] = (double)sqrtl((long double)val);

Rationale for this approach:

  • Precision Preservation: Maintains long double precision during sqrt calculation before downcasting to double, aligning with R's practice in src/library/stats/src/cov.c.
  • Type Safety: Explicit casting avoids implicit conversion warnings and ensures compatibility with REAL(ans) type requirements.

If this balance doesn't align with data.table's design goals, I'm happy to:
a) Revert to plain sqrt() with your confirmation
b) Explore alternative precision-preserving strategies

Please let me know your preferred direction. I appreciate your guidance on balancing precision and performance in this critical path.

@MichaelChirico
Member

What I'd like to see is a numeric study of the differences.

@venom1204
Contributor (Author)

Thanks for your feedback. I’ll try to work on a numeric study comparing the precision and performance and will share the results here as soon as possible.

@venom1204
Contributor (Author)

venom1204 commented May 16, 2025

Hi @MichaelChirico,

Thanks for the direction regarding precision. I’ve completed a preliminary numeric and performance study comparing sqrt() and (double)sqrtl((long double)x) as applied in gsumm.c. Here's a quick summary:
Precision Comparison
I ran tests on a wide range of values including edge cases (very small and very large numbers). Results:

  • For typical values (e.g., 2.0, 1e-8, 1e15), no precision difference was observed; hex representations matched exactly.
  • For very large values like 1.797693e+308, there was a measurable difference: sqrt() returned a slightly smaller result than (double)sqrtl(...), with a difference of ~1.49e+138.

Performance Comparison (performance_test.R)
Using microbenchmark in R:

Unit: milliseconds
      expr    min     lq   mean median     uq    max neval
    R_sqrt  39.54  40.97  43.15  42.93  44.71  48.79    10
 Cpp_sqrtl  49.57  53.12  64.91  59.54  62.69 128.75    10

On average, the sqrtl version was ~50-60% slower, though still well within reasonable performance for typical datasets.

@MichaelChirico
Member

please include a repro script

@venom1204

This comment was marked as off-topic.

@TimTaylor

@venom1204 / @MichaelChirico

data.table/src/gsumm.c

Lines 1055 to 1056 in cab5a5d

ansd[i] = (double)v/(nna-1);
if (isSD) ansd[i] = (double)sqrtl((long double)ansd[i]);

This (PR lines above) is the wrong approach AFAICT. You have already lost the precision of v after the first line.
Something like the following looks more appropriate

ansd[i] = isSD ? (double) sqrtl(v/(nna-1)) : (double) v/(nna-1); 

Also those performance benchmarks are a little odd. You are testing base R and Rcpp implementations not the C changes you proposed.

Merging this pull request may close: HAVE_LONG_DOUBLE is conditioned on but never set (#6938)