Skip to content

Clickhouse string ordering and string filtering by UTF8 instead of bytes #6143

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 35 commits into
base: master
Choose a base branch
from

Conversation

casab
Copy link

@casab casab commented Feb 9, 2023

Clickhouse defaults to using bytes to order by and string manipulation functions such as lower, upper uses ascii. To overcome this limitation they have COLLATE keyword, and lowerUTF8, upperUTF8 functions.

Check List

  • Tests has been run in packages where changes made if available
  • Linter has been run for changed code
  • Tests for the changes have been added if not covered yet
  • Docs have been added / updated if required

Description of Changes Made (if issue reference is not provided)

  • Replaced CONCAT SQL function with js template literal to prevent unnecessary DB function call
  • Used lowerUTF8 instead of lower to support utf8 compatible search
  • Added COLLATE ‘en’ when ordering by strings to order incasesensitive. Clickhouse orders by bytes on default.

@casab casab requested a review from a team as a code owner February 9, 2023 14:32
@github-actions github-actions bot added the pr:community Contribution from Cube.js community members. label Feb 9, 2023
@paveltiunov
Copy link
Member

Hey @casab ! Thanks for contributing! Could you please add an integration test for it?

@paveltiunov paveltiunov self-assigned this Mar 4, 2023
Copy link

vercel bot commented Jun 7, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

8 Skipped Deployments
Name Status Preview Comments Updated (UTC)
examples-angular-dashboard ⬜️ Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm
examples-react-d3 ⬜️ Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm
examples-react-dashboard ⬜️ Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm
examples-react-data-table ⬜️ Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm
examples-react-highcharts ⬜️ Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm
examples-react-material-ui ⬜️ Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm
examples-react-pivot-table ⬜️ Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm
examples-vue-query-builder ⬜️ Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm

@casab casab force-pushed the feature/clickhouse_utf8_filter_order branch from 645512c to 63b935c Compare June 7, 2024 14:56
Copy link

vercel bot commented Jun 7, 2024

@casab is attempting to deploy a commit to the Cube Dev Team on Vercel.

A member of the Team first needs to authorize it.

@igorlukanin
Copy link
Member

@casab May I kindly ask you to test with the latest release and rebase your changes on top of it? We have migrated to a new ClickHouse client library recently. Thanks in advance!

@casab
Copy link
Author

casab commented Nov 22, 2024

@igorlukanin Can you approve running the workflows?

@KSDaemon
Copy link
Member

KSDaemon commented Jan 17, 2025

Hi @casab Could you please sync with the latest master, resolve conflicts and fix warnings/errors? Ping me whenever you need to approve running workflows!
Btw, for a quick linter fixes you can run yarn run lint:fix

@casab
Copy link
Author

casab commented Jan 24, 2025

@KSDaemon of course, done it.

@KSDaemon
Copy link
Member

KSDaemon commented Jan 24, 2025

@casab Hey! Thnx for the updates! Unfortunately, some lint errors still exists:

  /home/runner/work/cube/cube/packages/cubejs-schema-compiler/src/adapter/ClickHouseQuery.ts
    178:1   error    Expected indentation of 6 spaces but found 8  indent
    178:16  warning  Strings must use singlequote                  quotes
    182:1   error    Expected indentation of 6 spaces but found 8  indent
    182:26  error    Unexpected string concatenation               prefer-template
    182:58  warning  Strings must use singlequote                  quotes
    183:1   error    Expected indentation of 6 spaces but found 8  indent
    184:1   error    Expected indentation of 6 spaces but found 8  indent
    184:16  warning  Strings must use singlequote                  quotes
    188:1   error    Expected indentation of 6 spaces but found 8  indent
    188:16  warning  Strings must use singlequote                  quotes
    192:1   error    Expected indentation of 2 spaces but found 0  indent

Maybe thats because of merge... But anyway...

Can you fix them, please?

Copy link

codecov bot commented Feb 6, 2025

Codecov Report

Attention: Patch coverage is 96.55172% with 1 line in your changes missing coverage. Please review.

Project coverage is 80.73%. Comparing base (1203291) to head (79373d6).
Report is 343 commits behind head on master.

Files with missing lines Patch % Lines
...t/cubesql/cubesql/src/compile/engine/df/wrapper.rs 96.15% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6143      +/-   ##
==========================================
- Coverage   83.57%   80.73%   -2.84%     
==========================================
  Files         227      227              
  Lines       81618    81646      +28     
==========================================
- Hits        68210    65916    -2294     
- Misses      13408    15730    +2322     
Flag Coverage Δ
cubesql 80.73% <96.55%> (-2.84%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@casab casab requested a review from KSDaemon February 11, 2025 18:58
@casab
Copy link
Author

casab commented Feb 11, 2025

@KSDaemon @mcheshkov Hey, I have fixed all the errors. And implemented all the required parts. I would appreciate it if you can review it. Especially the rust part.

@@ -123,7 +125,7 @@ export class ClickHouseQuery extends BaseQuery {
.join(' AND ');
}

public getFieldAlias(id) {
public getField(id) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to make it public? Maybe private would be better?

@@ -168,6 +188,43 @@ export class ClickHouseQuery extends BaseQuery {
return `${fieldAlias} ${direction}`;
}

public getCollation() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the same question here: should it be public?

Comment on lines +203 to +219
if (R.isEmpty(this.order)) {
return '';
}

const collation = this.getCollation();

const orderByString = R.pipe(
R.map((order) => {
let orderString = this.orderHashToString(order);
if (collation && this.getFieldType(order) === 'string') {
orderString = `${orderString} COLLATE '${collation}'`;
}
return orderString;
}),
R.reject(R.isNil),
R.join(', ')
)(this.order);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you mind rewriting this in plain JS instead of using Ramda?
Smth like this:

Suggested change
if (R.isEmpty(this.order)) {
return '';
}
const collation = this.getCollation();
const orderByString = R.pipe(
R.map((order) => {
let orderString = this.orderHashToString(order);
if (collation && this.getFieldType(order) === 'string') {
orderString = `${orderString} COLLATE '${collation}'`;
}
return orderString;
}),
R.reject(R.isNil),
R.join(', ')
)(this.order);
if (this.order.length === 0) {
return '';
}
const collation = this.getCollation();
const orderByString = this.order
.map((order) => {
let orderString = this.orderHashToString(order);
if (collation && this.getFieldType(order) === 'string') {
orderString = `${orderString} COLLATE '${collation}'`;
}
return orderString;
})
.filter(Boolean) // Analogue `R.reject(R.isNil)`
.join(', ');

@igorlukanin igorlukanin added the driver:clickhouse Issues related to the ClickHouse driver label Jun 9, 2025
@igorlukanin
Copy link
Member

@casab We're actively looking into merging this PR. Could you please resolve the conflicts and resolve on top of master once again? Thanks in advance!

@casab
Copy link
Author

casab commented Jun 9, 2025

@igorlukanin sure, I'll look into it and let you know.

Comment on lines +1539 to +1540
//let data_type = expr.get_type(&schema.clone())?;
//let data_type = Self::generate_sql_type(generator.clone(), data_type.clone())?;
Copy link
Member

@MazterQyou MazterQyou Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please remove these comments? They are likely experimentation leftovers. Thanks in advance! 😊

Other than that, great PR! I feel like this can be merged once other suggested changes are resolved or agreed upon. Thanks!

Comment on lines +348 to +353
const oldOrderBy = '{% if order_by %}\nORDER BY {{ order_by | map(attribute=\'expr\') | join(\', \') }}{% endif %}';

const newOrderBy =
'{% if order_by %}\nORDER BY {% for item in order_by %}{{ item.expr }}' +
`{%- if item.data_type and item.data_type == 'string' %} COLLATE '${collation}'{% endif %}` +
'{%- if not loop.last %}, {% endif %}{% endfor %}{% endif %}';
Copy link
Member

@MazterQyou MazterQyou Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest staying away from replacing strings by putting the original ORDER BY expression here. The statements/select template might change over time in BaseQuery, and if ORDER BY condition is touched, this code would break.

I would suggest moving whole statements/select template here, then deciding the ORDER BY part based on the provided condition. This way if BaseQuery' statements/select is modified, this dialect would be untouched and would continue working regardless. We already use similar approach with overriding whole statements/select template in PinotQuery and PrestodbQuery.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data source driver driver:clickhouse Issues related to the ClickHouse driver pr:community Contribution from Cube.js community members.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants