Skip to content

feat(redshift) support approx count distinct #8512

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

pauldheinrichs
Copy link
Contributor

Check List

  • Tests has been run in packages where changes made if available
  • Linter has been run for changed code
  • Tests for the changes have been added if not covered yet
  • Docs have been added / updated if required

Add support for redshift approx distinct calls (i think) please feel free to make changes or whatever to help get this through. I would just like to start using this for additive measures in cube and would love this to get in sooner rather than later 😅

Docs: https://docs.aws.amazon.com/redshift/latest/dg/hyperloglog-functions.html

Issue Reference this PR resolves
Fixes: #8510

Description of Changes Made (if issue reference is not provided)

[Description goes here]

@pauldheinrichs pauldheinrichs requested a review from a team as a code owner July 24, 2024 15:06
Copy link

vercel bot commented Jul 24, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

8 Skipped Deployments
Name Status Preview Comments Updated (UTC)
examples-angular-dashboard ⬜️ Ignored (Inspect) Visit Preview Sep 9, 2024 4:05pm
examples-react-d3 ⬜️ Ignored (Inspect) Visit Preview Sep 9, 2024 4:05pm
examples-react-dashboard ⬜️ Ignored (Inspect) Visit Preview Sep 9, 2024 4:05pm
examples-react-data-table ⬜️ Ignored (Inspect) Visit Preview Sep 9, 2024 4:05pm
examples-react-highcharts ⬜️ Ignored (Inspect) Visit Preview Sep 9, 2024 4:05pm
examples-react-material-ui ⬜️ Ignored (Inspect) Visit Preview Sep 9, 2024 4:05pm
examples-react-pivot-table ⬜️ Ignored (Inspect) Visit Preview Sep 9, 2024 4:05pm
examples-vue-query-builder ⬜️ Ignored (Inspect) Visit Preview Sep 9, 2024 4:05pm

@github-actions github-actions bot added the pr:community Contribution from Cube.js community members. label Jul 24, 2024
@igorlukanin
Copy link
Member

If this gets through @paveltiunov's review, I think changes to this docs section would be needed: https://cube.dev/docs/product/configuration/data-sources/aws-redshift#count_distinct_approx

@pauldheinrichs
Copy link
Contributor Author

Tested with pre-aggregation:

  • Successfully created pre-aggregation SQL
 - Executing Load Pre Aggregation SQL: scheduler-ba104a65-512d-4700-aaeb-67caa9743162
--
  CREATE TABLE dev_pre_aggregations.session_funnel_main20240101_ubycvf3l_wiz2w3y_1jafait AS SELECT
      "session_funnel".step "session_funnel__step", date_trunc('day', ("session_funnel".t::timestamptz AT TIME ZONE 'UTC')) "session_funnel__time_day", count("session_funnel".user_id) "session_funnel__conversions", CASE WHEN HLL("session_funnel".first_step_user_id) > 0 THEN 100.0 * count("session_funnel".user_id) / HLL("session_funnel".first_step_user_id) ELSE NULL END "session_funnel__conversions_percent", HLL("session_funnel".first_step_user_id) "session_funnel__first_step_conversions"
    FROM

@pauldheinrichs
Copy link
Contributor Author

Validated non pre-aggregated queries,


SELECT
      date_trunc('day', ("session_funnel".t::timestamptz AT TIME ZONE 'UTC')) "session_funnel__time_day", CASE WHEN APPROXIMATE COUNT(DISTINCT "session_funnel".first_step_user_id) > 0 THEN 100.0 * count("session_funnel".user_id) / APPROXIMATE COUNT(DISTINCT "session_funnel".first_step_user_id) ELSE NULL END "session_funnel__conversions_percent"
    FROM
      (
      ```

@paveltiunov
Copy link
Member

@pauldheinrichs Thanks for contributing! I guess hllInit should use https://docs.aws.amazon.com/redshift/latest/dg/r_HLL_CREATE_SKETCH.html

@pauldheinrichs
Copy link
Contributor Author

pauldheinrichs commented Aug 24, 2024

@pauldheinrichs Thanks for contributing! I guess hllInit should use https://docs.aws.amazon.com/redshift/latest/dg/r_HLL_CREATE_SKETCH.html

@paveltiunov I attempted that in this 71e0882 commit actually and resulting not super succesfully when testing? Unless i tested wrong? 🤔

@paveltiunov
Copy link
Member

@pauldheinrichs Hard to tell what's went wrong but according to the doc HLL already does an estimation and it isn't something we want to happen in hllInit.

This reverts commit 71e0882.
@paveltiunov
Copy link
Member

@pauldheinrichs It feels we need HLL_CARDINALITY(HLL_COMBINE(${sql})) for hllMerge and also HLL_COMBINE(${sql}) for hllMergeOnly. But this one needs to be tested.

@igorlukanin
Copy link
Member

@pauldheinrichs Given that AWS HLL format spec is proprietary and HLL support can't be also brought to Cube Store, I wonder how useful this addition to the Redshift driver is. WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data source driver pr:community Contribution from Cube.js community members.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Redshift HLL functions/ approx_count_distinct
4 participants