14 changes: 14 additions & 0 deletions character-set-and-collation.md
@@ -448,6 +448,20 @@ To disable this error reporting, use `set @@tidb_skip_utf8_check=1;` to skip the
>
> If the character check is skipped, TiDB might fail to detect illegal UTF-8 characters written by the application, cause decoding errors when `ANALYZE` is executed, and introduce other unknown encoding issues. If your application cannot guarantee the validity of the written string, it is not recommended to skip the character check.

In certain SQL statements, comparisons might involve invalid UTF-8 characters. For example:

```sql
SELECT * FROM `t` WHERE `id` > 'a" + string([]byte{0xff}) + "a';
```

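In the preceding statement, `string([]byte{0xff})` is Go notation for the single raw byte `0xff` embedded in the string literal, and `0xff` is an invalid UTF-8 byte. When handling such characters, the behavior of TiDB depends on the collation: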

* Non-binary collations (such as `utf8mb4_general_ci`): TiDB truncates the string at the invalid byte. The truncated part is excluded from the comparison.

* `gbk_bin` and `gb18030_bin`: TiDB replaces invalid bytes with the character `?` and continues with the comparison.

* Other binary collations (such as `utf8_bin`): TiDB treats invalid bytes as ordinary bytes and compares them based on their original binary values.
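
The three behaviors above can be illustrated with a small Go sketch. This is not TiDB's implementation: the helper names `truncateAtInvalid` and `replaceInvalid` are hypothetical, and the sketch only checks UTF-8 validity, so it approximates what each collation group would see before the comparison.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateAtInvalid (hypothetical helper) returns the part of s that precedes
// the first invalid UTF-8 byte, mimicking the non-binary collation case.
func truncateAtInvalid(s string) string {
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		if r == utf8.RuneError && size == 1 {
			return s[:i] // everything from the invalid byte onward is dropped
		}
		i += size
	}
	return s
}

// replaceInvalid (hypothetical helper) replaces each invalid UTF-8 byte with
// '?', mimicking the behavior described for gbk_bin and gb18030_bin.
func replaceInvalid(s string) string {
	out := make([]byte, 0, len(s))
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		if r == utf8.RuneError && size == 1 {
			out = append(out, '?')
		} else {
			out = append(out, s[i:i+size]...)
		}
		i += size
	}
	return string(out)
}

func main() {
	// The same operand as in the SQL example: 'a', the raw byte 0xff, 'a'.
	s := "a" + string([]byte{0xff}) + "a"

	fmt.Printf("%q\n", truncateAtInvalid(s)) // non-binary collations: "a"
	fmt.Printf("%q\n", replaceInvalid(s))    // gbk_bin, gb18030_bin: "a?a"
	fmt.Printf("% x\n", s)                   // other binary collations: 61 ff 61
}
```

In the last case the string is left untouched, so the comparison runs over the original bytes `61 ff 61`.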

## Collation support framework

<CustomContent platform="tidb">