diff --git a/character-set-and-collation.md b/character-set-and-collation.md index ec693426aae8f..f3dc38a98a932 100644 --- a/character-set-and-collation.md +++ b/character-set-and-collation.md @@ -14,8 +14,6 @@ A character set is a set of symbols and encodings. The default character set in A collation is a set of rules for comparing characters in a character set, and the sorting order of characters. For example in a binary collation `A` and `a` do not compare as equal: -{{< copyable "sql" >}} - ```sql SET NAMES utf8mb4 COLLATE utf8mb4_bin; SELECT 'A' = 'a'; @@ -57,12 +55,45 @@ SELECT 'A' = 'a'; 1 row in set (0.00 sec) ``` +The following example demonstrates how different Unicode collations compare the German `ß` with `ss`. Only `utf8mb4_unicode_ci` and `utf8mb4_0900_ai_ci`, which are based on the Unicode Collation Algorithm (UCA), treat them as equivalent and return `1` (TRUE). `utf8mb4_general_ci` and the binary collation `utf8mb4_0900_bin` return `0`. + +```sql +SELECT + 'ss' COLLATE utf8mb4_general_ci = 'ß', + 'ss' COLLATE utf8mb4_unicode_ci = 'ß', + 'ss' COLLATE utf8mb4_0900_ai_ci = 'ß', + 'ss' COLLATE utf8mb4_0900_bin = 'ß' +\G +``` + +``` +*************************** 1. row *************************** +'ss' COLLATE utf8mb4_general_ci = 'ß': 0 +'ss' COLLATE utf8mb4_unicode_ci = 'ß': 1 +'ss' COLLATE utf8mb4_0900_ai_ci = 'ß': 1 + 'ss' COLLATE utf8mb4_0900_bin = 'ß': 0 +1 row in set (0.01 sec) +``` + +### Character set and collation naming + +A character set can have multiple collations. A collation name starts with the name of its character set, followed by one or more collation properties, each separated by an underscore (`_`). For example, the `utf8mb4` character set has a collation called `utf8mb4_bin`, which is a binary collation for `utf8mb4`. + +The following table shows the common collation properties and their meanings. 
+ +| Collation properties | Meaning | +|---|---| +| `_bin` | Binary | +| `_ci` | Case insensitive | +| `_ai_ci` | Accent insensitive, case insensitive | +| `_0900_bin` | Unicode UCA 9.0.0, binary | +| `_unicode_ci` | (Older) Unicode UCA collation, case insensitive | +| `_general_ci` | Less strict Unicode collation, case insensitive | + ## Character sets and collations supported by TiDB Currently, TiDB supports the following character sets: -{{< copyable "sql" >}} - ```sql SHOW CHARACTER SET; ``` @@ -88,23 +119,23 @@ SHOW COLLATION; ``` ```sql -+--------------------+---------+------+---------+----------+---------+ -| Collation | Charset | Id | Default | Compiled | Sortlen | -+--------------------+---------+------+---------+----------+---------+ -| ascii_bin | ascii | 65 | Yes | Yes | 1 | -| binary | binary | 63 | Yes | Yes | 1 | -| gbk_bin | gbk | 87 | | Yes | 1 | -| gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 | -| latin1_bin | latin1 | 47 | Yes | Yes | 1 | -| utf8_bin | utf8 | 83 | Yes | Yes | 1 | -| utf8_general_ci | utf8 | 33 | | Yes | 1 | -| utf8_unicode_ci | utf8 | 192 | | Yes | 1 | -| utf8mb4_0900_ai_ci | utf8mb4 | 255 | | Yes | 1 | -| utf8mb4_0900_bin | utf8mb4 | 309 | | Yes | 1 | -| utf8mb4_bin | utf8mb4 | 46 | Yes | Yes | 1 | -| utf8mb4_general_ci | utf8mb4 | 45 | | Yes | 1 | -| utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 1 | -+--------------------+---------+------+---------+----------+---------+ ++--------------------+---------+-----+---------+----------+---------+---------------+ +| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute | ++--------------------+---------+-----+---------+----------+---------+---------------+ +| ascii_bin | ascii | 65 | Yes | Yes | 1 | PAD SPACE | +| binary | binary | 63 | Yes | Yes | 1 | NO PAD | +| gbk_bin | gbk | 87 | | Yes | 1 | PAD SPACE | +| gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 | PAD SPACE | +| latin1_bin | latin1 | 47 | Yes | Yes | 1 | PAD SPACE | +| utf8_bin | utf8 | 83 | Yes | Yes | 1 | PAD 
SPACE | +| utf8_general_ci | utf8 | 33 | | Yes | 1 | PAD SPACE | +| utf8_unicode_ci | utf8 | 192 | | Yes | 8 | PAD SPACE | +| utf8mb4_0900_ai_ci | utf8mb4 | 255 | | Yes | 0 | NO PAD | +| utf8mb4_0900_bin | utf8mb4 | 309 | | Yes | 1 | NO PAD | +| utf8mb4_bin | utf8mb4 | 46 | Yes | Yes | 1 | PAD SPACE | +| utf8mb4_general_ci | utf8mb4 | 45 | | Yes | 1 | PAD SPACE | +| utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 8 | PAD SPACE | ++--------------------+---------+-----+---------+----------+---------+---------------+ 13 rows in set (0.00 sec) ``` @@ -123,8 +154,6 @@ SHOW COLLATION; You can use the following statement to view the collations (under the [new framework for collations](#new-framework-for-collations)) that corresponds to the character set. -{{< copyable "sql" >}} - ```sql SHOW COLLATION WHERE Charset = 'utf8mb4'; ``` @@ -146,9 +175,11 @@ For details about the TiDB support of the GBK character set, see [GBK](/characte ## `utf8` and `utf8mb4` in TiDB -In MySQL, the character set `utf8` is limited to a maximum of three bytes. This is sufficient to store characters in the Basic Multilingual Plane (BMP), but not enough to store characters such as emojis. For this, it is recommended to use the character set `utf8mb4` instead. +In MySQL, the character set `utf8` is limited to a maximum of three bytes. This is sufficient to store characters in the Basic Multilingual Plane (BMP), but not enough to store characters such as emojis. It is recommended to use `utf8mb4` for new installations and to migrate existing data away from `utf8`. + +In both MySQL and TiDB, `utf8` and `utf8mb3` are aliases for the same character set. -By default, TiDB also limits the character set `utf8` to a maximum of three bytes to ensure that data created in TiDB can still safely be restored in MySQL. You can disable it by changing the value of the system variable [`tidb_check_mb4_value_in_utf8`](/system-variables.md#tidb_check_mb4_value_in_utf8) to `OFF`. 
+By default, TiDB also limits the character set `utf8` to a maximum of three bytes to ensure that data created in TiDB can still safely be restored in MySQL. You can disable it by changing the value of the system variable [`tidb_check_mb4_value_in_utf8`](/system-variables.md#tidb_check_mb4_value_in_utf8) to `OFF`. However, it is recommended to use `utf8mb4` instead for full Unicode support and better compatibility. The following demonstrates the default behavior when inserting a 4-byte emoji character into a table. The `INSERT` statement fails for the `utf8` character set, but succeeds for `utf8mb4`: @@ -158,7 +189,7 @@ CREATE TABLE utf8_test ( ) CHARACTER SET utf8; ``` -```sql +``` Query OK, 0 rows affected (0.09 sec) ``` @@ -168,7 +199,7 @@ CREATE TABLE utf8m4_test ( ) CHARACTER SET utf8mb4; ``` -```sql +``` Query OK, 0 rows affected (0.09 sec) ``` @@ -176,7 +207,7 @@ Query OK, 0 rows affected (0.09 sec) INSERT INTO utf8_test VALUES ('😉'); ``` -```sql +``` ERROR 1366 (HY000): incorrect utf8 value f09f9889(😉) for column c ``` @@ -184,7 +215,7 @@ ERROR 1366 (HY000): incorrect utf8 value f09f9889(😉) for column c INSERT INTO utf8m4_test VALUES ('😉'); ``` -```sql +``` Query OK, 1 row affected (0.02 sec) ``` @@ -192,7 +223,7 @@ Query OK, 1 row affected (0.02 sec) SELECT char_length(c), length(c), c FROM utf8_test; ``` -```sql +``` Empty set (0.01 sec) ``` @@ -200,7 +231,7 @@ Empty set (0.01 sec) SELECT char_length(c), length(c), c FROM utf8m4_test; ``` -```sql +``` +----------------+-----------+------+ | char_length(c) | length(c) | c | +----------------+-----------+------+ @@ -231,8 +262,6 @@ ALTER DATABASE db_name Different databases can use different character sets and collations. 
Use the `character_set_database` and `collation_database` to see the character set and collation of the current database: -{{< copyable "sql" >}} - ```sql CREATE SCHEMA test1 CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci; ``` @@ -241,8 +270,6 @@ CREATE SCHEMA test1 CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci; Query OK, 0 rows affected (0.09 sec) ``` -{{< copyable "sql" >}} - ```sql USE test1; ``` @@ -251,8 +278,6 @@ USE test1; Database changed ``` -{{< copyable "sql" >}} - ```sql SELECT @@character_set_database, @@collation_database; ``` @@ -266,8 +291,6 @@ SELECT @@character_set_database, @@collation_database; 1 row in set (0.00 sec) ``` -{{< copyable "sql" >}} - ```sql CREATE SCHEMA test2 CHARACTER SET latin1 COLLATE latin1_bin; ``` @@ -276,8 +299,6 @@ CREATE SCHEMA test2 CHARACTER SET latin1 COLLATE latin1_bin; Query OK, 0 rows affected (0.09 sec) ``` -{{< copyable "sql" >}} - ```sql USE test2; ``` @@ -286,8 +307,6 @@ USE test2; Database changed ``` -{{< copyable "sql" >}} - ```sql SELECT @@character_set_database, @@collation_database; ``` @@ -303,8 +322,6 @@ SELECT @@character_set_database, @@collation_database; You can also see the two values in `INFORMATION_SCHEMA`: -{{< copyable "sql" >}} - ```sql SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA WHERE SCHEMA_NAME = 'db_name'; @@ -326,8 +343,6 @@ ALTER TABLE tbl_name For example: -{{< copyable "sql" >}} - ```sql CREATE TABLE t1(a int) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci; ``` @@ -358,16 +373,12 @@ If the column character set and collation are not specified, the table character Each string corresponds to a character set and a collation. 
When you use a string, this option is available: -{{< copyable "sql" >}} - ```sql [_charset_name]'string' [COLLATE collation_name] ``` Example: -{{< copyable "sql" >}} - ```sql SELECT 'string'; SELECT _utf8mb4'string'; @@ -453,8 +464,6 @@ Since v4.0, TiDB supports a [new framework for collations](#new-framework-for-co Before v4.0, you can specify most of the MySQL collations in TiDB, and these collations are processed according to the default collations, which means that the byte order determines the character order. Different from MySQL, TiDB does not handle the trailing spaces of a character, which causes the following behavior differences: -{{< copyable "sql" >}} - ```sql CREATE TABLE t(a varchar(20) charset utf8mb4 collate utf8mb4_general_ci PRIMARY KEY); ``` @@ -530,8 +539,6 @@ Under the new framework, TiDB supports the `utf8_general_ci`, `utf8mb4_general_c When one of `utf8_general_ci`, `utf8mb4_general_ci`, `utf8_unicode_ci`, `utf8mb4_unicode_ci`, `utf8mb4_0900_ai_ci` and `gbk_chinese_ci` is used, the string comparison is case-insensitive and accent-insensitive. At the same time, TiDB also corrects the collation's `PADDING` behavior: -{{< copyable "sql" >}} - ```sql CREATE TABLE t(a varchar(20) charset utf8mb4 collate utf8mb4_general_ci PRIMARY KEY); ``` @@ -593,8 +600,6 @@ TiDB cannot infer the collation and reports an error in the following situations TiDB supports using the `COLLATE` clause to specify the collation of an expression. The coercibility value of this expression is `0`, which has the highest priority. See the following example: -{{< copyable "sql" >}} - ```sql SELECT 'a' = _utf8mb4 'A' collate utf8mb4_general_ci; ``` diff --git a/character-set-gbk.md b/character-set-gbk.md index 68b5d9d088d22..3abd11246f251 100644 --- a/character-set-gbk.md +++ b/character-set-gbk.md @@ -9,20 +9,29 @@ Since v5.4.0, TiDB supports the GBK character set. 
This document provides the Ti ```sql SHOW CHARACTER SET WHERE CHARSET = 'gbk'; +``` + +``` +---------+-------------------------------------+-------------------+--------+ | Charset | Description | Default collation | Maxlen | +---------+-------------------------------------+-------------------+--------+ | gbk | Chinese Internal Code Specification | gbk_bin | 2 | +---------+-------------------------------------+-------------------+--------+ 1 row in set (0.00 sec) +``` +```sql SHOW COLLATION WHERE CHARSET = 'gbk'; -+----------------+---------+------+---------+----------+---------+ -| Collation | Charset | Id | Default | Compiled | Sortlen | -+----------------+---------+------+---------+----------+---------+ -| gbk_bin | gbk | 87 | | Yes | 1 | -+----------------+---------+------+---------+----------+---------+ -1 rows in set (0.00 sec) +``` + +``` ++----------------+---------+----+---------+----------+---------+---------------+ +| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute | ++----------------+---------+----+---------+----------+---------+---------------+ +| gbk_bin | gbk | 87 | | Yes | 1 | PAD SPACE | +| gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 | PAD SPACE | ++----------------+---------+----+---------+----------+---------+---------------+ +2 rows in set (0.00 sec) ``` ## MySQL compatibility @@ -31,11 +40,11 @@ This section provides the compatibility information between MySQL and TiDB. ### Collations -The default collation of the GBK character set in MySQL is `gbk_chinese_ci`. Unlike MySQL, the default collation of the GBK character set in TiDB is `gbk_bin`. Additionally, because TiDB converts GBK to UTF8MB4 and then uses a binary collation, the `gbk_bin` collation in TiDB is not the same as the `gbk_bin` collation in MySQL. +The default collation of the GBK character set in MySQL is `gbk_chinese_ci`. Unlike MySQL, the default collation of the GBK character set in TiDB is `gbk_bin`. 
Additionally, because TiDB converts GBK to `utf8mb4` and then uses a binary collation, the `gbk_bin` collation in TiDB is not the same as the `gbk_bin` collation in MySQL. -To make TiDB compatible with the collations of MySQL GBK character set, when you first initialize the TiDB cluster, you need to set the TiDB option [`new_collations_enabled_on_first_bootstrap`](/tidb-configuration-file.md#new_collations_enabled_on_first_bootstrap) to `true` to enable the [new framework for collations](/character-set-and-collation.md#new-framework-for-collations). +To make TiDB compatible with the collations of MySQL GBK character set, when you first initialize the TiDB cluster, you need to set the TiDB option [`new_collations_enabled_on_first_bootstrap`](/tidb-configuration-file.md#new_collations_enabled_on_first_bootstrap) to `true` to enable the [new framework for collations](/character-set-and-collation.md#new-framework-for-collations). This is the default setting for new deployments. @@ -49,20 +58,28 @@ After enabling the new framework for collations, if you check the collations cor ```sql SHOW CHARACTER SET WHERE CHARSET = 'gbk'; +``` + +``` +---------+-------------------------------------+-------------------+--------+ | Charset | Description | Default collation | Maxlen | +---------+-------------------------------------+-------------------+--------+ | gbk | Chinese Internal Code Specification | gbk_chinese_ci | 2 | +---------+-------------------------------------+-------------------+--------+ 1 row in set (0.00 sec) +``` +```sql SHOW COLLATION WHERE CHARSET = 'gbk'; -+----------------+---------+------+---------+----------+---------+ -| Collation | Charset | Id | Default | Compiled | Sortlen | -+----------------+---------+------+---------+----------+---------+ -| gbk_bin | gbk | 87 | | Yes | 1 | -| gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 | -+----------------+---------+------+---------+----------+---------+ +``` + +``` 
++----------------+---------+----+---------+----------+---------+---------------+ +| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute | ++----------------+---------+----+---------+----------+---------+---------------+ +| gbk_bin | gbk | 87 | | Yes | 1 | PAD SPACE | +| gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 | PAD SPACE | ++----------------+---------+----+---------+----------+---------+---------------+ 2 rows in set (0.00 sec) ``` @@ -93,6 +110,9 @@ In the above table, the result of `SELECT HEX('a');` in the `utf8mb4` byte set i CREATE TABLE t(a CHAR(10) CHARSET BINARY); Query OK, 0 rows affected (0.00 sec) INSERT INTO t VALUES (_gbk'啊'); + ``` + + ``` ERROR 1115 (42000): Unsupported character introducer: 'gbk' ``` @@ -109,3 +129,8 @@ In the above table, the result of `SELECT HEX('a');` in the `utf8mb4` byte set i - TiCDC versions earlier than v6.1.0 do not support replicating `charset=GBK` tables. No version of TiCDC supports replicating `charset=GBK` tables to TiDB clusters earlier than v6.1.0. - Backup & Restore (BR) versions earlier than v5.4.0 do not support recovering `charset=GBK` tables. No version of BR supports recovering `charset=GBK` tables to TiDB clusters earlier than v5.4.0. 
+ +## See also + +* [`SHOW CHARACTER SET`](/sql-statements/sql-statement-show-character-set.md) +* [Character Set and Collation](/character-set-and-collation.md) \ No newline at end of file diff --git a/migrate-from-mariadb.md b/migrate-from-mariadb.md index 1050afa9302f8..afc4944f40ef8 100644 --- a/migrate-from-mariadb.md +++ b/migrate-from-mariadb.md @@ -193,24 +193,24 @@ SHOW COLLATION; ``` ```sql -+--------------------+---------+-----+---------+----------+---------+ -| Collation | Charset | Id | Default | Compiled | Sortlen | -+--------------------+---------+-----+---------+----------+---------+ -| ascii_bin | ascii | 65 | Yes | Yes | 1 | -| binary | binary | 63 | Yes | Yes | 1 | -| gbk_bin | gbk | 87 | | Yes | 1 | -| gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 | -| latin1_bin | latin1 | 47 | Yes | Yes | 1 | -| utf8_bin | utf8 | 83 | Yes | Yes | 1 | -| utf8_general_ci | utf8 | 33 | | Yes | 1 | -| utf8_unicode_ci | utf8 | 192 | | Yes | 1 | -| utf8mb4_0900_ai_ci | utf8mb4 | 255 | | Yes | 1 | -| utf8mb4_0900_bin | utf8mb4 | 309 | | Yes | 1 | -| utf8mb4_bin | utf8mb4 | 46 | Yes | Yes | 1 | -| utf8mb4_general_ci | utf8mb4 | 45 | | Yes | 1 | -| utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 1 | -+--------------------+---------+-----+---------+----------+---------+ -13 rows in set (0.0012 sec) ++--------------------+---------+-----+---------+----------+---------+---------------+ +| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute | ++--------------------+---------+-----+---------+----------+---------+---------------+ +| ascii_bin | ascii | 65 | Yes | Yes | 1 | PAD SPACE | +| binary | binary | 63 | Yes | Yes | 1 | NO PAD | +| gbk_bin | gbk | 87 | | Yes | 1 | PAD SPACE | +| gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 | PAD SPACE | +| latin1_bin | latin1 | 47 | Yes | Yes | 1 | PAD SPACE | +| utf8_bin | utf8 | 83 | Yes | Yes | 1 | PAD SPACE | +| utf8_general_ci | utf8 | 33 | | Yes | 1 | PAD SPACE | +| utf8_unicode_ci | utf8 | 192 | | Yes | 8 | PAD SPACE | 
+| utf8mb4_0900_ai_ci | utf8mb4 | 255 | | Yes | 0 | NO PAD | +| utf8mb4_0900_bin | utf8mb4 | 309 | | Yes | 1 | NO PAD | +| utf8mb4_bin | utf8mb4 | 46 | Yes | Yes | 1 | PAD SPACE | +| utf8mb4_general_ci | utf8mb4 | 45 | | Yes | 1 | PAD SPACE | +| utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 8 | PAD SPACE | ++--------------------+---------+-----+---------+----------+---------+---------------+ +13 rows in set (0.00 sec) ``` To check what collations the columns of your current tables are using, you can use this statement: diff --git a/sql-statements/sql-statement-show-collation.md b/sql-statements/sql-statement-show-collation.md index 1735b47ee22d1..a7d1b34e46562 100644 --- a/sql-statements/sql-statement-show-collation.md +++ b/sql-statements/sql-statement-show-collation.md @@ -25,50 +25,61 @@ ShowLikeOrWhere ::= ## Examples -When new collation framework is disabled, only binary collations are displayed. + + +When [the new collation framework](/tidb-configuration-file.md#new_collations_enabled_on_first_bootstrap) is enabled (the default), the example output is as follows: + + ```sql SHOW COLLATION; ``` ``` -+-------------+---------+------+---------+----------+---------+ -| Collation | Charset | Id | Default | Compiled | Sortlen | -+-------------+---------+------+---------+----------+---------+ -| utf8mb4_bin | utf8mb4 | 46 | Yes | Yes | 1 | -| latin1_bin | latin1 | 47 | Yes | Yes | 1 | -| binary | binary | 63 | Yes | Yes | 1 | -| ascii_bin | ascii | 65 | Yes | Yes | 1 | -| utf8_bin | utf8 | 83 | Yes | Yes | 1 | -+-------------+---------+------+---------+----------+---------+ -5 rows in set (0.02 sec) ++--------------------+---------+-----+---------+----------+---------+---------------+ +| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute | ++--------------------+---------+-----+---------+----------+---------+---------------+ +| ascii_bin | ascii | 65 | Yes | Yes | 1 | PAD SPACE | +| binary | binary | 63 | Yes | Yes | 1 | NO PAD | +| gbk_bin | gbk 
| 87 | | Yes | 1 | PAD SPACE | +| gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 | PAD SPACE | +| latin1_bin | latin1 | 47 | Yes | Yes | 1 | PAD SPACE | +| utf8_bin | utf8 | 83 | Yes | Yes | 1 | PAD SPACE | +| utf8_general_ci | utf8 | 33 | | Yes | 1 | PAD SPACE | +| utf8_unicode_ci | utf8 | 192 | | Yes | 8 | PAD SPACE | +| utf8mb4_0900_ai_ci | utf8mb4 | 255 | | Yes | 0 | NO PAD | +| utf8mb4_0900_bin | utf8mb4 | 309 | | Yes | 1 | NO PAD | +| utf8mb4_bin | utf8mb4 | 46 | Yes | Yes | 1 | PAD SPACE | +| utf8mb4_general_ci | utf8mb4 | 45 | | Yes | 1 | PAD SPACE | +| utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 8 | PAD SPACE | ++--------------------+---------+-----+---------+----------+---------+---------------+ +13 rows in set (0.00 sec) ``` -When new collation framework is enabled, `utf8_general_ci` and `utf8mb4_general_ci` are additionally supported. + + +When the new collation framework is disabled, only binary collations are listed. ```sql SHOW COLLATION; ``` ``` -+--------------------+---------+------+---------+----------+---------+ -| Collation | Charset | Id | Default | Compiled | Sortlen | -+--------------------+---------+------+---------+----------+---------+ -| ascii_bin | ascii | 65 | Yes | Yes | 1 | -| binary | binary | 63 | Yes | Yes | 1 | -| gbk_bin | gbk | 87 | | Yes | 1 | -| gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 | -| latin1_bin | latin1 | 47 | Yes | Yes | 1 | -| utf8_bin | utf8 | 83 | Yes | Yes | 1 | -| utf8_general_ci | utf8 | 33 | | Yes | 1 | -| utf8_unicode_ci | utf8 | 192 | | Yes | 1 | -| utf8mb4_bin | utf8mb4 | 46 | Yes | Yes | 1 | -| utf8mb4_general_ci | utf8mb4 | 45 | | Yes | 1 | -| utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 1 | -+--------------------+---------+------+---------+----------+---------+ -11 rows in set (0.001 sec) ++-------------+---------+----+---------+----------+---------+---------------+ +| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute | 
++-------------+---------+----+---------+----------+---------+---------------+ +| utf8mb4_bin | utf8mb4 | 46 | Yes | Yes | 1 | PAD SPACE | +| latin1_bin | latin1 | 47 | Yes | Yes | 1 | PAD SPACE | +| binary | binary | 63 | Yes | Yes | 1 | NO PAD | +| ascii_bin | ascii | 65 | Yes | Yes | 1 | PAD SPACE | +| utf8_bin | utf8 | 83 | Yes | Yes | 1 | PAD SPACE | +| gbk_bin | gbk | 87 | Yes | Yes | 1 | PAD SPACE | ++-------------+---------+----+---------+----------+---------+---------------+ +6 rows in set (0.00 sec) ``` + + To filter on the character set, you can add a `WHERE` clause. ```sql