From abfc0fd2cde4d15b737b88b2570ae5c1ec4500d8 Mon Sep 17 00:00:00 2001 From: shiyuhang <1136742008@qq.com> Date: Fri, 27 Jun 2025 17:50:30 +0800 Subject: [PATCH 01/20] add changefeed doc --- tidb-cloud/serverless-changefeed-overview.md | 78 ++++++ ...verless-changefeed-sink-to-apache-kafka.md | 252 ++++++++++++++++++ 2 files changed, 330 insertions(+) create mode 100644 tidb-cloud/serverless-changefeed-overview.md create mode 100644 tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md diff --git a/tidb-cloud/serverless-changefeed-overview.md b/tidb-cloud/serverless-changefeed-overview.md new file mode 100644 index 0000000000000..9e87be81164ce --- /dev/null +++ b/tidb-cloud/serverless-changefeed-overview.md @@ -0,0 +1,78 @@ +--- +title: Changefeed +summary: TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data services. +--- + +# Changefeed (Beta) + +TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data services. Currently, TiDB Cloud supports streaming data to Apache Kafka. +> **Note:** +> +> - Currently, you can manager changefeeds only with [TiDB Cloud CLI](/tidb-cloud/get-started-with-cli.md). +> - Currently, TiDB Cloud only allows up to 100 changefeeds per cluster. +> - Currently, TiDB Cloud only allows up to 100 table filter rules per changefeed. + +## View the Changefeed page + +To access the changefeed feature, using the TiDB Cloud CLI command: + +```bash +ticloud serverless changefeed list --cluster-id +``` + +## Create a changefeed + +To create a changefeed, refer to the tutorials: + +- [Sink to Apache Kafka](/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md) + +## Pause or resume a changefeed + +To pause a changefeed, using the TiDB Cloud CLI command: + +```bash +ticloud serverless changefeed pause --cluster-id --changefeed-id +``` + +To resume a changefeed, using the TiDB Cloud CLI command: + +```bash +ticloud serverless changefeed resume --cluster-id --changefeed-id +``` + +## Edit a changefeed + +> **Note:** +> +> TiDB Cloud currently only allows editing changefeeds in the paused status. + +To edit a changefeed sink to kafka, you can pause the changefeed first, and then edit with the TiDB Cloud CLI command: + +```bash +ticloud serverless changefeed edit --cluster-id --changefeed-id --name --kafka --filter +``` + +## Delete a changefeed + +To delete a changefeed, using the TiDB Cloud CLI command: + +```bash +ticloud serverless changefeed delete --cluster-id --changefeed-id +``` + +## Changefeed billing + +Changefeed feature is free on beta now. + +## Changefeed states + +The state of a changefeed represents the running state of the changefeed. During the running process, changefeed might fail with errors, be manually paused or resumed. These behaviors can lead to changes of the changefeed state. + +The states are described as follows: + +- `CREATING`: the changefeed is being created. +- `CREATE_FAILED`: the changefeed creation fails, you need to delete the changefeed and create a new one. +- `RUNNING`: the changefeed runs normally and the checkpoint-ts proceeds normally. +- `PAUSED`: the changefeed is paused. +- `WARNING`: the changefeed returns a warning. The changefeed cannot continue due to some recoverable errors. The changefeed in this state keeps trying to resume until the state transfers to `RUNNING`. The changefeed in this state blocks [GC operations](https://docs.pingcap.com/tidb/stable/garbage-collection-overview). +- `RUNNING_FAILED`: the changefeed fails. 
Due to some errors, the changefeed cannot resume and cannot be recovered automatically. If the issues are resolved before the garbage collection (GC) of the incremental data, you can manually resume the failed changefeed. The default Time-To-Live (TTL) duration for incremental data is 24 hours, which means that the GC mechanism does not delete any data within 24 hours after the changefeed is interrupted. diff --git a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md new file mode 100644 index 0000000000000..b3e031f60c9e0 --- /dev/null +++ b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md @@ -0,0 +1,252 @@ +--- +title: Sink to Apache Kafka +summary: This document explains how to create a changefeed to stream data from TiDB Cloud to Apache Kafka. It includes restrictions, prerequisites, and steps to configure the changefeed for Apache Kafka. The process involves setting up network connections, adding permissions for Kafka ACL authorization, and configuring the changefeed specification. +--- + +# Sink to Apache Kafka + +This document describes how to create a changefeed to stream data from TiDB Cloud to Apache Kafka. + +## Restrictions + +- For each TiDB Cloud cluster, you can create up to 100 changefeeds. +- Currently, TiDB Cloud does not support uploading self-signed TLS certificates to connect to Kafka brokers. +- Because TiDB Cloud uses TiCDC to establish changefeeds, it has the same [restrictions as TiCDC](https://docs.pingcap.com/tidb/stable/ticdc-overview#unsupported-scenarios). +- If the table to be replicated does not have a primary key or a non-null unique index, the absence of a unique constraint during replication could result in duplicated data being inserted downstream in some retry scenarios. + +## Prerequisites + +Before creating a changefeed to stream data to Apache Kafka, you need to complete the following prerequisites: + +- Set up your network connection +- Add permissions for Kafka ACL authorization + +### Network + +Ensure that your TiDB cluster can connect to the Apache Kafka service. Currently, TiDB cluster can only connect to Apache Kafka through the Public IP. + +> **Note:** +> +> If you want to expose your Apache Kafka through a more secure method, such as private link or VPC peering, please contact us for help. To request it, click **?** in the lower-right corner of the [TiDB Cloud console](https://tidbcloud.com) and click **Request Support**. Then, fill in "Apply for TiDB Cloud Serverless database audit logging" in the **Description** field and click **Submit**. + + +To provide Public IP access to your Apache Kafka service, assign Public IP addresses to all your Kafka brokers. + +### Kafka ACL authorization + +To allow TiDB Cloud changefeeds to stream data to Apache Kafka and create Kafka topics automatically, ensure that the following permissions are added in Kafka: + +- The `Create` and `Write` permissions are added for the topic resource type in Kafka. +- The `DescribeConfigs` permission is added for the cluster resource type in Kafka. + +For example, if your Kafka cluster is in Confluent Cloud, you can see [Resources](https://docs.confluent.io/platform/current/kafka/authorization.html#resources) and [Adding ACLs](https://docs.confluent.io/platform/current/kafka/authorization.html#adding-acls) in Confluent documentation for more information. 
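+
+For reference, the following is a minimal sketch of granting these permissions with the standard `kafka-acls.sh` tool on a self-managed Kafka cluster that has an authorizer enabled. The broker address, the `User:ticdc` principal, and `client.properties` are placeholders for your own bootstrap server, changefeed user, and admin client credentials; if your cluster is in Confluent Cloud, use the Confluent tooling linked above instead.
+
+```bash
+# Allow the changefeed user to create and write to topics (here: all topics)
+kafka-acls.sh --bootstrap-server <broker_host>:9092 \
+  --command-config client.properties \
+  --add --allow-principal User:ticdc \
+  --operation Create --operation Write \
+  --topic '*'
+
+# Allow the changefeed user to describe configs at the cluster level
+kafka-acls.sh --bootstrap-server <broker_host>:9092 \
+  --command-config client.properties \
+  --add --allow-principal User:ticdc \
+  --operation DescribeConfigs \
+  --cluster
+```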
+ +## Create a changefeed to stream data to Apache Kafka + +To create a changefeed to stream data from TiDB Cloud to Apache Kafka, using the TiDB Cloud CLI command: + +```bash +ticloud serverless changefeed create --cluster-id --name --type KAFKA --kafka --filter --start-tso +``` + +- ``: the ID of the TiDB Cloud cluster that you want to create the changefeed for. +- ``: the name of the changefeed, it is optional. If you do not specify a name, TiDB Cloud automatically generates a name for the changefeed. +- type: the type of the changefeed, which is `KAFKA` in this case. +- kafka: a JSON string that contains the configurations for the changefeed to stream data to Apache Kafka. See []() for more information about the configurations. +- filter: a JSON string that contains the configurations for the changefeed to filter tables and events. See []() for more information about the configurations. +- start-tso: the TSO from which the changefeed starts to replicate data. If you do not specify a TSO, the current TSO is used by default. To learn more about TSO, see [TSO in TiDB](https://docs.pingcap.com/tidb/stable/tso/). + +### Filter configurations + +To get a template of `filter` configurations, using the TiDB Cloud CLI command: + +```bash +ticloud serverless changefeed template +``` + +To get the explanation of the template, using the TiDB Cloud CLI command: + +```bash +ticloud serverless changefeed template --explain +``` + +The configurations in the `filter` JSON string are used to filter tables and events that you want to replicate. Below is an example of a `filter` configuration: + +
+Example filter configuration + +```json +{ + "filterRule": ["test.t1", "test.t2"], + "mode": "IGNORE_NOT_SUPPORT_TABLE", + "eventFilterRule": [ + { + "matcher": ["test.t1", "test.t2"], + "ignore_event": ["all dml", "all ddl"] + } + ] +} +``` +
+ +1. **Filter Rule**: you can set `filter rules` to filter the tables that you want to replicate. See [Table Filter](https://docs.pingcap.com/tidb/stable/table-filter/) for more information about the rule syntax. +2. **Event Filter Rule**: you can set the `matcher` and `ignore_event` to ignore some events matching the rules. See [Event filter rules](https://docs.pingcap.com/tidb/stable/ticdc-filter/#event-filter-rules) to get all the supported event types. +3. **mode**: set mode to `IGNORE_NOT_SUPPORT_TABLE` to ignore the tables that do not support replication, such as the tables that do not have primary keys or unique indexes. set mode to `FORCE_SYNC` to force the changefeed to replicate all tables. + +### Kafka configurations + +To get a template of `kafka` configurations, using the TiDB Cloud CLI command: + +```bash +ticloud serverless changefeed template +``` + +To get the explanation of the template, using the TiDB Cloud CLI command: + +```bash +ticloud serverless changefeed template --explain +``` + +The configurations in the `kafka` JSON string are used to configure how the changefeed streams data to Apache Kafka. Below is an example of a `filter` configuration: + +
+Example filter configuration + +```json +{ + "network_info": { + "network_type": "PUBLIC" + }, + "broker": { + "kafka_version": "VERSION_2XX", + "broker_endpoints": "broker1:9092,broker2:9092", + "tls_enable": false, + "compression": "NONE" + }, + "authentication": { + "auth_type": "DISABLE", + "user_name": "", + "password": "" + }, + "data_format": { + "protocol": "CANAL_JSON", + "enable_tidb_extension": false, + "avro_config": { + "decimal_handling_mode": "PRECISE", + "bigint_unsigned_handling_mode": "LONG", + "schema_registry": { + "schema_registry_endpoints": "", + "enable_http_auth": false, + "user_name": "", + "password": "" + } + } + }, + "topic_partition_config": { + "dispatch_type": "ONE_TOPIC", + "default_topic": "test-topic", + "topic_prefix": "_prefix", + "separator": "_", + "topic_suffix": "_suffix", + "replication_factor": 1, + "partition_num": 1, + "partition_dispatchers": [{ + "partition_type": "TABLE", + "matcher": ["*.*"], + "index_name": "index1", + "columns": ["col1", "col2"] + }] + }, + "column_selectors": [{ + "matcher": ["*.*"], + "columns": ["col1", "col2"] + }] +} +``` +
+ +The main configuration fields are as follows: + +1. **network_info**: Only `PUBLIC` network type is supported for now. This means that the TiDB cluster can connect to the Apache Kafka service through the Public IP. + +2. **broker**: Contains Kafka broker connection information: + + - `kafka_version`: The Kafka version, such as `VERSION_2XX`. + - `broker_endpoints`: Comma-separated list of broker endpoints. + - `tls_enable`: Whether to enable TLS for the connection. + - `compression`: The compression type for messages, support `NONE`, `GZIP`, `LZ4`, `SNAPPY`, and `ZSTD`. + +"DISABLE", "SASL_PLAIN", "SASL_SCRAM_SHA_256", "SASL_SCRAM_SHA_512" + +3. **authentication**: Authentication settings for connecting to kafka, support `DISABLE`, `SASL_PLAIN`, `SASL_SCRAM_SHA_256` and `SASL_SCRAM_SHA_512`. The `user_name` and `password` fields are required if you set the `auth_type` to `SASL_PLAIN`, `SASL_SCRAM_SHA_256`, or `SASL_SCRAM_SHA_512`. + +4. **data_format.protocol**: Support `CANAL_JSON`, `AVRO`, and `OPEN_PROTOCOL`. + + - Avro is a compact, fast, and binary data format with rich data structures, which is widely used in various flow systems. For more information, see [Avro data format](https://docs.pingcap.com/tidb/stable/ticdc-avro-protocol). + - Canal-JSON is a plain JSON text format, which is easy to parse. For more information, see [Canal-JSON data format](https://docs.pingcap.com/tidb/stable/ticdc-canal-json). + - Open Protocol is a row-level data change notification protocol that provides data sources for monitoring, caching, full-text indexing, analysis engines, and primary-secondary replication between different databases. For more information, see [Open Protocol data format](https://docs.pingcap.com/tidb/stable/ticdc-open-protocol). + - Debezium is a tool for capturing database changes. It converts each captured database change into a message called an "event" and sends these events to Kafka. For more information, see [Debezium data format](https://docs.pingcap.com/tidb/stable/ticdc-debezium). + +5. **data_format.enable_tidb_extension**: if you want to add TiDB-extension fields to the Kafka message body. + + For more information about TiDB-extension fields, see [TiDB extension fields in Avro data format](https://docs.pingcap.com/tidb/stable/ticdc-avro-protocol#tidb-extension-fields) and [TiDB extension fields in Canal-JSON data format](https://docs.pingcap.com/tidb/stable/ticdc-canal-json#tidb-extension-field). + +6. **data_format.avro_config**: If you select **Avro** as your data format, you need to set the Avro-specific configurations: + + - `decimal_handling_mode` and `bigint_unsigned_handling_mode`: specify how TiDB Cloud handles the decimal and unsigned bigint data types in Kafka messages. + - `schema_registry`: the schema registry endpoint. If you enable `enable_http_auth`, the fields for user name and password are required. + +7. **topic_partition_config.dispatch_type**: Support `ONE_TOPIC`, `BY_TABLE` and `BY_DATABASE`. Controls how the changefeed creates Kafka topics, by table, by database, or creating one topic for all changelogs. + + - **Distribute changelogs by table to Kafka Topics** + + If you want the changefeed to create a dedicated Kafka topic for each table, choose this mode. Then, all Kafka messages of a table are sent to a dedicated Kafka topic. You can customize topic names for tables by setting a `topic_prefix`, a `separator` and between a database name and table name, and a `topic_suffix`. 
For example, if you set the separator as `_`, the topic names are in the format of `_`. + + For changelogs of non-row events, such as Create Schema Event, you can specify a topic name in the `default_topic` field. The changefeed will create a topic accordingly to collect such changelogs. + + - **Distribute changelogs by database to Kafka Topics** + + If you want the changefeed to create a dedicated Kafka topic for each database, choose this mode. Then, all Kafka messages of a database are sent to a dedicated Kafka topic. You can customize topic names of databases by setting a `topic_prefix` and a `topic_suffix`. + + For changelogs of non-row events, such as Resolved Ts Event, you can specify a topic name in the `default_topic` field. The changefeed will create a topic accordingly to collect such changelogs. + + - **Send all changelogs to one specified Kafka Topic** + + If you want the changefeed to create one Kafka topic for all changelogs, choose this mode. Then, all Kafka messages in the changefeed will be sent to one Kafka topic. You can define the topic name in the `default_topic` field. + +8. **topic_partition_config.default_topic**: The default topic name for non-row events, such as Create Schema Event and Resolved Ts Event. If you set the `dispatch_type` to `ONE_TOPIC`, this field is required. + + - `topic_prefix`: The prefix for the topic name. + - `separator`: The separator between a database name and table name in the topic name. + - `topic_suffix`: The suffix for the topic name. + +9. **topic_partition_config.replication_factor**: controls how many Kafka servers each Kafka message is replicated to. The valid value ranges from [`min.insync.replicas`](https://kafka.apache.org/33/documentation.html#brokerconfigs_min.insync.replicas) to the number of Kafka brokers. + +10. **topic_partition_config.partition_num**: controls how many partitions exist in a topic. The valid value range is `[1, 10 * the number of Kafka brokers]`. + +11. **topic_partition_config.partition_dispatchers**: decide which partition a Kafka message will be sent to. `partition_type` Support `TABLE`, `INDEX_VALUE`, `TS` and `COLUMN`. + + - **Distribute changelogs by primary key or index value to Kafka partition** + + If you want the changefeed to send Kafka messages of a table to different partitions, set `partition_type` to `INDEX_VALUE` and set the `index_name`. The primary key or index value of a row changelog will determine which partition the changelog is sent to. This distribution method provides a better partition balance and ensures row-level orderliness. + + - **Distribute changelogs by table to Kafka partition** + + If you want the changefeed to send Kafka messages of a table to one Kafka partition, set `partition_type` to `TABLE`. The table name of a row changelog will determine which partition the changelog is sent to. This distribution method ensures table orderliness but might cause unbalanced partitions. + + - **Distribute changelogs by timestamp to Kafka partition** + + If you want the changefeed to send Kafka messages to different Kafka partitions randomly, set `partition_type` to `TS`.. The commitTs of a row changelog will determine which partition the changelog is sent to. This distribution method provides a better partition balance and ensures orderliness in each partition. However, multiple changes of a data item might be sent to different partitions and the consumer progress of different consumers might be different, which might cause data inconsistency. 
Therefore, the consumer needs to sort the data from multiple partitions by commitTs before consuming. + + - **Distribute changelogs by column value to Kafka partition** + + If you want the changefeed to send Kafka messages of a table to different partitions, set `partition_type` to `COLUMN` and set the `columns`. The specified column values of a row changelog will determine which partition the changelog is sent to. This distribution method ensures orderliness in each partition and guarantees that the changelog with the same column values is send to the same partition. + + For more information about the matching rules, see [Partition dispatchers](https://docs.pingcap.com/tidb/stable/ticdc-sink-to-kafka/#partition-dispatchers). + +12. **column_selectors**: columns from events and send only the data changes related to those columns to the downstream. + + - `matcher`: specify which tables the column selector applies to. For tables that do not match any rule, all columns are sent. + - `columns`: specify which columns of the matched tables will be sent to the downstream. + + For more information about the matching rules, see [Column selectors](https://docs.pingcap.com/tidb/stable/ticdc-sink-to-kafka/#column-selectors). From 246f127df443c8bbfa3078bb1f5798f974e55c73 Mon Sep 17 00:00:00 2001 From: shiyuhang <1136742008@qq.com> Date: Fri, 27 Jun 2025 17:54:07 +0800 Subject: [PATCH 02/20] fix --- ...serverless-changefeed-sink-to-apache-kafka.md | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md index b3e031f60c9e0..57dcf0404810b 100644 --- a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md +++ b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md @@ -187,7 +187,7 @@ The main configuration fields are as follows: - Open Protocol is a row-level data change notification protocol that provides data sources for monitoring, caching, full-text indexing, analysis engines, and primary-secondary replication between different databases. For more information, see [Open Protocol data format](https://docs.pingcap.com/tidb/stable/ticdc-open-protocol). - Debezium is a tool for capturing database changes. It converts each captured database change into a message called an "event" and sends these events to Kafka. For more information, see [Debezium data format](https://docs.pingcap.com/tidb/stable/ticdc-debezium). -5. **data_format.enable_tidb_extension**: if you want to add TiDB-extension fields to the Kafka message body. +5. **data_format.enable_tidb_extension**: if you want to add TiDB-extension fields to the Kafka message body with `AVRO` or `CANAL_JSON` data format. For more information about TiDB-extension fields, see [TiDB extension fields in Avro data format](https://docs.pingcap.com/tidb/stable/ticdc-avro-protocol#tidb-extension-fields) and [TiDB extension fields in Canal-JSON data format](https://docs.pingcap.com/tidb/stable/ticdc-canal-json#tidb-extension-field). @@ -214,17 +214,21 @@ The main configuration fields are as follows: If you want the changefeed to create one Kafka topic for all changelogs, choose this mode. Then, all Kafka messages in the changefeed will be sent to one Kafka topic. You can define the topic name in the `default_topic` field. -8. **topic_partition_config.default_topic**: The default topic name for non-row events, such as Create Schema Event and Resolved Ts Event. 
If you set the `dispatch_type` to `ONE_TOPIC`, this field is required. +> Note +> +> If you use `AVRO` data format, only `BY_TABLE` dispatch type is supported. + +1. **topic_partition_config.default_topic**: The default topic name for non-row events, such as Create Schema Event and Resolved Ts Event. If you set the `dispatch_type` to `ONE_TOPIC`, this field is required. - `topic_prefix`: The prefix for the topic name. - `separator`: The separator between a database name and table name in the topic name. - `topic_suffix`: The suffix for the topic name. -9. **topic_partition_config.replication_factor**: controls how many Kafka servers each Kafka message is replicated to. The valid value ranges from [`min.insync.replicas`](https://kafka.apache.org/33/documentation.html#brokerconfigs_min.insync.replicas) to the number of Kafka brokers. +2. **topic_partition_config.replication_factor**: controls how many Kafka servers each Kafka message is replicated to. The valid value ranges from [`min.insync.replicas`](https://kafka.apache.org/33/documentation.html#brokerconfigs_min.insync.replicas) to the number of Kafka brokers. -10. **topic_partition_config.partition_num**: controls how many partitions exist in a topic. The valid value range is `[1, 10 * the number of Kafka brokers]`. +3. **topic_partition_config.partition_num**: controls how many partitions exist in a topic. The valid value range is `[1, 10 * the number of Kafka brokers]`. -11. **topic_partition_config.partition_dispatchers**: decide which partition a Kafka message will be sent to. `partition_type` Support `TABLE`, `INDEX_VALUE`, `TS` and `COLUMN`. +4. **topic_partition_config.partition_dispatchers**: decide which partition a Kafka message will be sent to. `partition_type` Support `TABLE`, `INDEX_VALUE`, `TS` and `COLUMN`. - **Distribute changelogs by primary key or index value to Kafka partition** @@ -244,7 +248,7 @@ The main configuration fields are as follows: For more information about the matching rules, see [Partition dispatchers](https://docs.pingcap.com/tidb/stable/ticdc-sink-to-kafka/#partition-dispatchers). -12. **column_selectors**: columns from events and send only the data changes related to those columns to the downstream. +5. **column_selectors**: columns from events and send only the data changes related to those columns to the downstream. - `matcher`: specify which tables the column selector applies to. For tables that do not match any rule, all columns are sent. - `columns`: specify which columns of the matched tables will be sent to the downstream. From 963a045c4c39a60539e789bec54f3d7aba36a7f1 Mon Sep 17 00:00:00 2001 From: shiyuhang <1136742008@qq.com> Date: Fri, 27 Jun 2025 17:57:57 +0800 Subject: [PATCH 03/20] link --- tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md index 57dcf0404810b..68fbec7a0c6ba 100644 --- a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md +++ b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md @@ -52,8 +52,8 @@ ticloud serverless changefeed create --cluster-id --name `: the ID of the TiDB Cloud cluster that you want to create the changefeed for. - ``: the name of the changefeed, it is optional. If you do not specify a name, TiDB Cloud automatically generates a name for the changefeed. - type: the type of the changefeed, which is `KAFKA` in this case. 
-- kafka: a JSON string that contains the configurations for the changefeed to stream data to Apache Kafka. See []() for more information about the configurations. -- filter: a JSON string that contains the configurations for the changefeed to filter tables and events. See []() for more information about the configurations. +- kafka: a JSON string that contains the configurations for the changefeed to stream data to Apache Kafka. See [Kafka configurations](#kafka-configurations) for more information about the configurations. +- filter: a JSON string that contains the configurations for the changefeed to filter tables and events. See [Filter configurations](#filter-configurations) for more information about the configurations. - start-tso: the TSO from which the changefeed starts to replicate data. If you do not specify a TSO, the current TSO is used by default. To learn more about TSO, see [TSO in TiDB](https://docs.pingcap.com/tidb/stable/tso/). ### Filter configurations From f2069f35aab41f21cea45a3b0ad193b0bc8a69cf Mon Sep 17 00:00:00 2001 From: shiyuhang <1136742008@qq.com> Date: Fri, 27 Jun 2025 17:59:13 +0800 Subject: [PATCH 04/20] fix --- tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md index 68fbec7a0c6ba..e9670282f7f23 100644 --- a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md +++ b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md @@ -41,7 +41,7 @@ To allow TiDB Cloud changefeeds to stream data to Apache Kafka and create Kafka For example, if your Kafka cluster is in Confluent Cloud, you can see [Resources](https://docs.confluent.io/platform/current/kafka/authorization.html#resources) and [Adding ACLs](https://docs.confluent.io/platform/current/kafka/authorization.html#adding-acls) in Confluent documentation for more information. -## Create a changefeed to stream data to Apache Kafka +## Create a changefeed sink to Apache Kafka with TiDB Cloud CLI To create a changefeed to stream data from TiDB Cloud to Apache Kafka, using the TiDB Cloud CLI command: @@ -178,7 +178,7 @@ The main configuration fields are as follows: "DISABLE", "SASL_PLAIN", "SASL_SCRAM_SHA_256", "SASL_SCRAM_SHA_512" -3. **authentication**: Authentication settings for connecting to kafka, support `DISABLE`, `SASL_PLAIN`, `SASL_SCRAM_SHA_256` and `SASL_SCRAM_SHA_512`. The `user_name` and `password` fields are required if you set the `auth_type` to `SASL_PLAIN`, `SASL_SCRAM_SHA_256`, or `SASL_SCRAM_SHA_512`. +3. **authentication**: Authentication settings for connecting to Kafka, support `DISABLE`, `SASL_PLAIN`, `SASL_SCRAM_SHA_256` and `SASL_SCRAM_SHA_512`. The `user_name` and `password` fields are required if you set the `auth_type` to `SASL_PLAIN`, `SASL_SCRAM_SHA_256`, or `SASL_SCRAM_SHA_512`. 4. **data_format.protocol**: Support `CANAL_JSON`, `AVRO`, and `OPEN_PROTOCOL`. 
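
Putting the flags above together, one possible end-to-end invocation looks like the following. This is a sketch only: the cluster ID, broker endpoints, credentials, and topic name are placeholders, the JSON keeps just a few of the documented fields, and the authoritative shape comes from `ticloud serverless changefeed template`.

```bash
# Hypothetical values throughout; adjust to your cluster and Kafka deployment.
ticloud serverless changefeed create \
  --cluster-id <cluster-id> \
  --name kafka-changefeed \
  --type KAFKA \
  --filter '{"filterRule": ["test.*"], "mode": "IGNORE_NOT_SUPPORT_TABLE"}' \
  --kafka '{
    "network_info": {"network_type": "PUBLIC"},
    "broker": {
      "kafka_version": "VERSION_2XX",
      "broker_endpoints": "broker1:9092,broker2:9092",
      "tls_enable": false,
      "compression": "NONE"
    },
    "authentication": {"auth_type": "SASL_PLAIN", "user_name": "<user>", "password": "<password>"},
    "data_format": {"protocol": "CANAL_JSON", "enable_tidb_extension": false},
    "topic_partition_config": {
      "dispatch_type": "ONE_TOPIC",
      "default_topic": "tidb-changefeed",
      "replication_factor": 1,
      "partition_num": 1
    }
  }'
```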
From e5553a1427c6794cc5f0c87ce5504c88e4de7004 Mon Sep 17 00:00:00 2001 From: shiyuhang <1136742008@qq.com> Date: Fri, 27 Jun 2025 18:01:48 +0800 Subject: [PATCH 05/20] fix according to ai --- tidb-cloud/serverless-changefeed-overview.md | 7 ++++--- tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md | 6 ++---- 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/tidb-cloud/serverless-changefeed-overview.md b/tidb-cloud/serverless-changefeed-overview.md index 9e87be81164ce..0909da6fbee5c 100644 --- a/tidb-cloud/serverless-changefeed-overview.md +++ b/tidb-cloud/serverless-changefeed-overview.md @@ -5,10 +5,11 @@ summary: TiDB Cloud changefeed helps you stream data from TiDB Cloud to other da # Changefeed (Beta) -TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data services. Currently, TiDB Cloud supports streaming data to Apache Kafka. +TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data services. + > **Note:** > -> - Currently, you can manager changefeeds only with [TiDB Cloud CLI](/tidb-cloud/get-started-with-cli.md). +> - Currently, you can manage changefeeds only with [TiDB Cloud CLI](/tidb-cloud/get-started-with-cli.md). > - Currently, TiDB Cloud only allows up to 100 changefeeds per cluster. > - Currently, TiDB Cloud only allows up to 100 table filter rules per changefeed. @@ -46,7 +47,7 @@ ticloud serverless changefeed resume --cluster-id --changefeed-id < > > TiDB Cloud currently only allows editing changefeeds in the paused status. -To edit a changefeed sink to kafka, you can pause the changefeed first, and then edit with the TiDB Cloud CLI command: +To edit a changefeed sink to kafka, you can pause the changefeed first, and then edit it with the TiDB Cloud CLI command: ```bash ticloud serverless changefeed edit --cluster-id --changefeed-id --name --kafka --filter diff --git a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md index e9670282f7f23..8d38f54eed570 100644 --- a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md +++ b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md @@ -107,10 +107,10 @@ To get the explanation of the template, using the TiDB Cloud CLI command: ticloud serverless changefeed template --explain ``` -The configurations in the `kafka` JSON string are used to configure how the changefeed streams data to Apache Kafka. Below is an example of a `filter` configuration: +The configurations in the `kafka` JSON string are used to configure how the changefeed streams data to Apache Kafka. Below is an example of a `kafka` configuration:
-Example filter configuration +Example kafka configuration ```json { @@ -176,8 +176,6 @@ The main configuration fields are as follows: - `tls_enable`: Whether to enable TLS for the connection. - `compression`: The compression type for messages, support `NONE`, `GZIP`, `LZ4`, `SNAPPY`, and `ZSTD`. -"DISABLE", "SASL_PLAIN", "SASL_SCRAM_SHA_256", "SASL_SCRAM_SHA_512" - 3. **authentication**: Authentication settings for connecting to Kafka, support `DISABLE`, `SASL_PLAIN`, `SASL_SCRAM_SHA_256` and `SASL_SCRAM_SHA_512`. The `user_name` and `password` fields are required if you set the `auth_type` to `SASL_PLAIN`, `SASL_SCRAM_SHA_256`, or `SASL_SCRAM_SHA_512`. 4. **data_format.protocol**: Support `CANAL_JSON`, `AVRO`, and `OPEN_PROTOCOL`. From edbd3df3408b95c963766a67e1df0065a80550e7 Mon Sep 17 00:00:00 2001 From: shiyuhang <1136742008@qq.com> Date: Fri, 27 Jun 2025 18:08:36 +0800 Subject: [PATCH 06/20] fix all content --- ...verless-changefeed-sink-to-apache-kafka.md | 33 ++++++++----------- 1 file changed, 13 insertions(+), 20 deletions(-) diff --git a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md index 8d38f54eed570..5c76a57e83a0a 100644 --- a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md +++ b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md @@ -51,10 +51,10 @@ ticloud serverless changefeed create --cluster-id --name `: the ID of the TiDB Cloud cluster that you want to create the changefeed for. - ``: the name of the changefeed, it is optional. If you do not specify a name, TiDB Cloud automatically generates a name for the changefeed. -- type: the type of the changefeed, which is `KAFKA` in this case. -- kafka: a JSON string that contains the configurations for the changefeed to stream data to Apache Kafka. See [Kafka configurations](#kafka-configurations) for more information about the configurations. -- filter: a JSON string that contains the configurations for the changefeed to filter tables and events. See [Filter configurations](#filter-configurations) for more information about the configurations. -- start-tso: the TSO from which the changefeed starts to replicate data. If you do not specify a TSO, the current TSO is used by default. To learn more about TSO, see [TSO in TiDB](https://docs.pingcap.com/tidb/stable/tso/). +- `type`: the type of the changefeed, which is `KAFKA` in this case. +- `kafka`: a JSON string that contains the configurations for the changefeed to stream data to Apache Kafka. See [Kafka configurations](#kafka-configurations) for more information about the configurations. +- `filter:` a JSON string that contains the configurations for the changefeed to filter tables and events. See [Filter configurations](#filter-configurations) for more information about the configurations. +- `start-tso`: the TSO from which the changefeed starts to replicate data. If you do not specify a TSO, the current TSO is used by default. To learn more about TSO, see [TSO in TiDB](https://docs.pingcap.com/tidb/stable/tso/). ### Filter configurations @@ -171,7 +171,7 @@ The main configuration fields are as follows: 2. **broker**: Contains Kafka broker connection information: - - `kafka_version`: The Kafka version, such as `VERSION_2XX`. + - `kafka_version`: The Kafka version, support `VERSION_2XX` and `VERSION_3XX`. - `broker_endpoints`: Comma-separated list of broker endpoints. - `tls_enable`: Whether to enable TLS for the connection. 
- `compression`: The compression type for messages, support `NONE`, `GZIP`, `LZ4`, `SNAPPY`, and `ZSTD`. @@ -183,7 +183,6 @@ The main configuration fields are as follows: - Avro is a compact, fast, and binary data format with rich data structures, which is widely used in various flow systems. For more information, see [Avro data format](https://docs.pingcap.com/tidb/stable/ticdc-avro-protocol). - Canal-JSON is a plain JSON text format, which is easy to parse. For more information, see [Canal-JSON data format](https://docs.pingcap.com/tidb/stable/ticdc-canal-json). - Open Protocol is a row-level data change notification protocol that provides data sources for monitoring, caching, full-text indexing, analysis engines, and primary-secondary replication between different databases. For more information, see [Open Protocol data format](https://docs.pingcap.com/tidb/stable/ticdc-open-protocol). - - Debezium is a tool for capturing database changes. It converts each captured database change into a message called an "event" and sends these events to Kafka. For more information, see [Debezium data format](https://docs.pingcap.com/tidb/stable/ticdc-debezium). 5. **data_format.enable_tidb_extension**: if you want to add TiDB-extension fields to the Kafka message body with `AVRO` or `CANAL_JSON` data format. @@ -198,35 +197,29 @@ The main configuration fields are as follows: - **Distribute changelogs by table to Kafka Topics** - If you want the changefeed to create a dedicated Kafka topic for each table, choose this mode. Then, all Kafka messages of a table are sent to a dedicated Kafka topic. You can customize topic names for tables by setting a `topic_prefix`, a `separator` and between a database name and table name, and a `topic_suffix`. For example, if you set the separator as `_`, the topic names are in the format of `_`. + If you want the changefeed to create a dedicated Kafka topic for each table, set `dispatch_type` to `BY_TABLE`. Then, all Kafka messages of a table are sent to a dedicated Kafka topic. You can customize topic names for tables by setting a `topic_prefix`, a `separator` and between a database name and table name, and a `topic_suffix`. For example, if you set the separator as `_`, the topic names are in the format of `_`. For changelogs of non-row events, such as Create Schema Event, you can specify a topic name in the `default_topic` field. The changefeed will create a topic accordingly to collect such changelogs. - **Distribute changelogs by database to Kafka Topics** - If you want the changefeed to create a dedicated Kafka topic for each database, choose this mode. Then, all Kafka messages of a database are sent to a dedicated Kafka topic. You can customize topic names of databases by setting a `topic_prefix` and a `topic_suffix`. + If you want the changefeed to create a dedicated Kafka topic for each database, set `dispatch_type` to `BY_DATABASE`. Then, all Kafka messages of a database are sent to a dedicated Kafka topic. You can customize topic names of databases by setting a `topic_prefix` and a `topic_suffix`. For changelogs of non-row events, such as Resolved Ts Event, you can specify a topic name in the `default_topic` field. The changefeed will create a topic accordingly to collect such changelogs. - **Send all changelogs to one specified Kafka Topic** - If you want the changefeed to create one Kafka topic for all changelogs, choose this mode. Then, all Kafka messages in the changefeed will be sent to one Kafka topic. 
You can define the topic name in the `default_topic` field. + If you want the changefeed to create one Kafka topic for all changelogs, set `dispatch_type` to `ONE_TOPIC`. Then, all Kafka messages in the changefeed will be sent to one Kafka topic. You can define the topic name in the `default_topic` field. > Note > > If you use `AVRO` data format, only `BY_TABLE` dispatch type is supported. -1. **topic_partition_config.default_topic**: The default topic name for non-row events, such as Create Schema Event and Resolved Ts Event. If you set the `dispatch_type` to `ONE_TOPIC`, this field is required. +8. **topic_partition_config.replication_factor**: controls how many Kafka servers each Kafka message is replicated to. The valid value ranges from [`min.insync.replicas`](https://kafka.apache.org/33/documentation.html#brokerconfigs_min.insync.replicas) to the number of Kafka brokers. - - `topic_prefix`: The prefix for the topic name. - - `separator`: The separator between a database name and table name in the topic name. - - `topic_suffix`: The suffix for the topic name. +9. **topic_partition_config.partition_num**: controls how many partitions exist in a topic. The valid value range is `[1, 10 * the number of Kafka brokers]`. -2. **topic_partition_config.replication_factor**: controls how many Kafka servers each Kafka message is replicated to. The valid value ranges from [`min.insync.replicas`](https://kafka.apache.org/33/documentation.html#brokerconfigs_min.insync.replicas) to the number of Kafka brokers. - -3. **topic_partition_config.partition_num**: controls how many partitions exist in a topic. The valid value range is `[1, 10 * the number of Kafka brokers]`. - -4. **topic_partition_config.partition_dispatchers**: decide which partition a Kafka message will be sent to. `partition_type` Support `TABLE`, `INDEX_VALUE`, `TS` and `COLUMN`. +10. **topic_partition_config.partition_dispatchers**: decide which partition a Kafka message will be sent to. `partition_type` Support `TABLE`, `INDEX_VALUE`, `TS` and `COLUMN`. - **Distribute changelogs by primary key or index value to Kafka partition** @@ -238,7 +231,7 @@ The main configuration fields are as follows: - **Distribute changelogs by timestamp to Kafka partition** - If you want the changefeed to send Kafka messages to different Kafka partitions randomly, set `partition_type` to `TS`.. The commitTs of a row changelog will determine which partition the changelog is sent to. This distribution method provides a better partition balance and ensures orderliness in each partition. However, multiple changes of a data item might be sent to different partitions and the consumer progress of different consumers might be different, which might cause data inconsistency. Therefore, the consumer needs to sort the data from multiple partitions by commitTs before consuming. + If you want the changefeed to send Kafka messages to different Kafka partitions randomly, set `partition_type` to `TS`. The commitTs of a row changelog will determine which partition the changelog is sent to. This distribution method provides a better partition balance and ensures orderliness in each partition. However, multiple changes of a data item might be sent to different partitions and the consumer progress of different consumers might be different, which might cause data inconsistency. Therefore, the consumer needs to sort the data from multiple partitions by commitTs before consuming. 
- **Distribute changelogs by column value to Kafka partition** @@ -246,7 +239,7 @@ The main configuration fields are as follows: For more information about the matching rules, see [Partition dispatchers](https://docs.pingcap.com/tidb/stable/ticdc-sink-to-kafka/#partition-dispatchers). -5. **column_selectors**: columns from events and send only the data changes related to those columns to the downstream. +11. **column_selectors**: columns from events and send only the data changes related to those columns to the downstream. - `matcher`: specify which tables the column selector applies to. For tables that do not match any rule, all columns are sent. - `columns`: specify which columns of the matched tables will be sent to the downstream. From 9ac1266c84312c06d2ec1cd026bc58724f3e6cd6 Mon Sep 17 00:00:00 2001 From: shiyuhang <1136742008@qq.com> Date: Mon, 30 Jun 2025 18:11:02 +0800 Subject: [PATCH 07/20] fix doc --- tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md index 5c76a57e83a0a..326b248f07fed 100644 --- a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md +++ b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md @@ -191,7 +191,9 @@ The main configuration fields are as follows: 6. **data_format.avro_config**: If you select **Avro** as your data format, you need to set the Avro-specific configurations: - `decimal_handling_mode` and `bigint_unsigned_handling_mode`: specify how TiDB Cloud handles the decimal and unsigned bigint data types in Kafka messages. - - `schema_registry`: the schema registry endpoint. If you enable `enable_http_auth`, the fields for user name and password are required. + - `schema_registry`: the schema registry endpoint. If auth is required, please set `enable_http_auth` to ture and full in the user name and password. Currently, TiDB Cloud only supports Confluent schema registry. If you need other schema registry, such as aws glue schema registry, pleas contact [TiDB Cloud support](https://docs.pingcap.com/tidbcloud/tidb-cloud-support). + + For more information about the Avro configurations, see [Avro data format](https://docs.pingcap.com/tidb/stable/ticdc-avro-protocol). 7. **topic_partition_config.dispatch_type**: Support `ONE_TOPIC`, `BY_TABLE` and `BY_DATABASE`. Controls how the changefeed creates Kafka topics, by table, by database, or creating one topic for all changelogs. 
From d6ff93e5cd4f20573029c876b7b6966854e4245b Mon Sep 17 00:00:00 2001 From: shiyuhang <1136742008@qq.com> Date: Mon, 30 Jun 2025 19:51:32 +0800 Subject: [PATCH 08/20] update --- .../serverless-changefeed-sink-to-apache-kafka.md | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md index 326b248f07fed..ad78fa4996c7d 100644 --- a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md +++ b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md @@ -134,11 +134,17 @@ The configurations in the `kafka` JSON string are used to configure how the chan "avro_config": { "decimal_handling_mode": "PRECISE", "bigint_unsigned_handling_mode": "LONG", - "schema_registry": { - "schema_registry_endpoints": "", + "confluent_schema_registry": { + "endpoint": "", "enable_http_auth": false, "user_name": "", "password": "" + }, + "aws_glue_schema_registry": { + "region": "", + "name": "", + "access_key_id": "", + "secret_access_key": "" } } }, @@ -191,7 +197,8 @@ The main configuration fields are as follows: 6. **data_format.avro_config**: If you select **Avro** as your data format, you need to set the Avro-specific configurations: - `decimal_handling_mode` and `bigint_unsigned_handling_mode`: specify how TiDB Cloud handles the decimal and unsigned bigint data types in Kafka messages. - - `schema_registry`: the schema registry endpoint. If auth is required, please set `enable_http_auth` to ture and full in the user name and password. Currently, TiDB Cloud only supports Confluent schema registry. If you need other schema registry, such as aws glue schema registry, pleas contact [TiDB Cloud support](https://docs.pingcap.com/tidbcloud/tidb-cloud-support). + - `confluent_schema_registry`: the configuration of confluent schema registry. If auth is required, please set `enable_http_auth` to true and full in the user name and password. See [Confluent Schema Registry](https://docs.confluent.io/platform/current/schema-registry/index.html) for more information. + - `aws_glue_schema_registry`: the configuration of AWS Glue schema registry. If you want to use AWS Glue schema registry, you need to set the region, name, access key ID, and secret access key. See [AWS Glue Schema Registry](https://docs.aws.amazon.com/glue/latest/dg/schema-registry.html) for more information. For more information about the Avro configurations, see [Avro data format](https://docs.pingcap.com/tidb/stable/ticdc-avro-protocol). From 6a44d43f41445bd39ab3cd7883b2e7701777cfda Mon Sep 17 00:00:00 2001 From: shi yuhang <52435083+shiyuhang0@users.noreply.github.com> Date: Thu, 3 Jul 2025 14:51:28 +0800 Subject: [PATCH 09/20] Apply suggestions from code review Co-authored-by: Grace Cai --- tidb-cloud/serverless-changefeed-overview.md | 22 ++-- ...verless-changefeed-sink-to-apache-kafka.md | 110 +++++++++--------- 2 files changed, 69 insertions(+), 63 deletions(-) diff --git a/tidb-cloud/serverless-changefeed-overview.md b/tidb-cloud/serverless-changefeed-overview.md index 0909da6fbee5c..b6fc7fdd758b2 100644 --- a/tidb-cloud/serverless-changefeed-overview.md +++ b/tidb-cloud/serverless-changefeed-overview.md @@ -1,19 +1,19 @@ --- -title: Changefeed +title: Changefeed for TiDB Cloud Serverless summary: TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data services. --- # Changefeed (Beta) -TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data services. 
+TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data services. This document provides an overview of the changefeed feature for TiDB Cloud Serverless. > **Note:** > -> - Currently, you can manage changefeeds only with [TiDB Cloud CLI](/tidb-cloud/get-started-with-cli.md). +> - Currently, you can manage changefeeds for TiDB Cloud Serverless only with [TiDB Cloud CLI](/tidb-cloud/get-started-with-cli.md). > - Currently, TiDB Cloud only allows up to 100 changefeeds per cluster. > - Currently, TiDB Cloud only allows up to 100 table filter rules per changefeed. -## View the Changefeed page +## List the changefeeds for your cluster To access the changefeed feature, using the TiDB Cloud CLI command: @@ -23,19 +23,19 @@ ticloud serverless changefeed list --cluster-id ## Create a changefeed -To create a changefeed, refer to the tutorials: +To create a changefeed, refer to the following document: - [Sink to Apache Kafka](/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md) ## Pause or resume a changefeed -To pause a changefeed, using the TiDB Cloud CLI command: +To pause a changefeed, run the following TiDB Cloud CLI command: ```bash ticloud serverless changefeed pause --cluster-id --changefeed-id ``` -To resume a changefeed, using the TiDB Cloud CLI command: +To resume a changefeed, run the following TiDB Cloud CLI command: ```bash ticloud serverless changefeed resume --cluster-id --changefeed-id @@ -47,7 +47,7 @@ ticloud serverless changefeed resume --cluster-id --changefeed-id < > > TiDB Cloud currently only allows editing changefeeds in the paused status. -To edit a changefeed sink to kafka, you can pause the changefeed first, and then edit it with the TiDB Cloud CLI command: +To edit a changefeed sink to kafka, you need to pause the changefeed first, and then edit it with the following TiDB Cloud CLI command: ```bash ticloud serverless changefeed edit --cluster-id --changefeed-id --name --kafka --filter @@ -55,7 +55,7 @@ ticloud serverless changefeed edit --cluster-id --changefeed-id --changefeed-id @@ -63,7 +63,7 @@ ticloud serverless changefeed delete --cluster-id --changefeed-id < ## Changefeed billing -Changefeed feature is free on beta now. +Currently, the changefeed feature for TiDB Cloud Serverless in beta and available for free. ## Changefeed states @@ -72,7 +72,7 @@ The state of a changefeed represents the running state of the changefeed. During The states are described as follows: - `CREATING`: the changefeed is being created. -- `CREATE_FAILED`: the changefeed creation fails, you need to delete the changefeed and create a new one. +- `CREATE_FAILED`: the changefeed creation fails. You need to delete the changefeed and create a new one. - `RUNNING`: the changefeed runs normally and the checkpoint-ts proceeds normally. - `PAUSED`: the changefeed is paused. - `WARNING`: the changefeed returns a warning. The changefeed cannot continue due to some recoverable errors. The changefeed in this state keeps trying to resume until the state transfers to `RUNNING`. The changefeed in this state blocks [GC operations](https://docs.pingcap.com/tidb/stable/garbage-collection-overview). 
diff --git a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md index ad78fa4996c7d..834e15a178564 100644 --- a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md +++ b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md @@ -1,11 +1,11 @@ --- -title: Sink to Apache Kafka -summary: This document explains how to create a changefeed to stream data from TiDB Cloud to Apache Kafka. It includes restrictions, prerequisites, and steps to configure the changefeed for Apache Kafka. The process involves setting up network connections, adding permissions for Kafka ACL authorization, and configuring the changefeed specification. +title: Sink Data from TiDB Cloud Serverless to Apache Kafka +summary: This document explains how to create a changefeed to stream data from TiDB Cloud Serverless to Apache Kafka. It includes restrictions, prerequisites, and steps to configure the changefeed for Apache Kafka. The process involves setting up network connections, adding permissions for Kafka ACL authorization, and configuring the changefeed specification. --- -# Sink to Apache Kafka +# Sink Data from TiDB Cloud Serverless to Apache Kafka -This document describes how to create a changefeed to stream data from TiDB Cloud to Apache Kafka. +This document describes how to create a changefeed to stream data from TiDB Cloud Serverless to Apache Kafka. ## Restrictions @@ -23,14 +23,14 @@ Before creating a changefeed to stream data to Apache Kafka, you need to complet ### Network -Ensure that your TiDB cluster can connect to the Apache Kafka service. Currently, TiDB cluster can only connect to Apache Kafka through the Public IP. +Ensure that your TiDB cluster can connect to the Apache Kafka service. Currently, TiDB Serverless clusters can only connect to Apache Kafka through public IP addresses. > **Note:** > > If you want to expose your Apache Kafka through a more secure method, such as private link or VPC peering, please contact us for help. To request it, click **?** in the lower-right corner of the [TiDB Cloud console](https://tidbcloud.com) and click **Request Support**. Then, fill in "Apply for TiDB Cloud Serverless database audit logging" in the **Description** field and click **Submit**. -To provide Public IP access to your Apache Kafka service, assign Public IP addresses to all your Kafka brokers. +To enable public IP access to your Apache Kafka service, assign public IP addresses to all Kafka brokers. ### Kafka ACL authorization @@ -41,36 +41,38 @@ To allow TiDB Cloud changefeeds to stream data to Apache Kafka and create Kafka For example, if your Kafka cluster is in Confluent Cloud, you can see [Resources](https://docs.confluent.io/platform/current/kafka/authorization.html#resources) and [Adding ACLs](https://docs.confluent.io/platform/current/kafka/authorization.html#adding-acls) in Confluent documentation for more information. -## Create a changefeed sink to Apache Kafka with TiDB Cloud CLI +## Create a changefeed with TiDB Cloud CLI -To create a changefeed to stream data from TiDB Cloud to Apache Kafka, using the TiDB Cloud CLI command: +To create a changefeed that streams data from TiDB Cloud to Apache Kafka, use the following TiDB Cloud CLI command: ```bash ticloud serverless changefeed create --cluster-id --name --type KAFKA --kafka --filter --start-tso ``` - ``: the ID of the TiDB Cloud cluster that you want to create the changefeed for. -- ``: the name of the changefeed, it is optional. 
If you do not specify a name, TiDB Cloud automatically generates a name for the changefeed. -- `type`: the type of the changefeed, which is `KAFKA` in this case. -- `kafka`: a JSON string that contains the configurations for the changefeed to stream data to Apache Kafka. See [Kafka configurations](#kafka-configurations) for more information about the configurations. -- `filter:` a JSON string that contains the configurations for the changefeed to filter tables and events. See [Filter configurations](#filter-configurations) for more information about the configurations. -- `start-tso`: the TSO from which the changefeed starts to replicate data. If you do not specify a TSO, the current TSO is used by default. To learn more about TSO, see [TSO in TiDB](https://docs.pingcap.com/tidb/stable/tso/). +- `` (optional): the name of the changefeed. If not specified, TiDB Cloud automatically generates a changefeed name. +- `type`: the changefeed type. To stream data to Apache Kafka, set it to `KAFKA`. +- `kafka`: a JSON string that contains the configuration for streaming data to Apache Kafka. For more information, see [Kafka configurations](#kafka-configurations). +- `filter`: a JSON string that specifies which tables and events to replicate. For more information, see [Filter configurations](#filter-configurations). +- `start-tso`: the TSO from which the changefeed starts to replicate data. If not specified, the current TSO is used by default. For more information, see [TSO in TiDB](https://docs.pingcap.com/tidb/stable/tso/). ### Filter configurations -To get a template of `filter` configurations, using the TiDB Cloud CLI command: +You can specify `--filter ` to filter tables and events that you want to replicate. + +To get a template of `filter` configurations, run the following TiDB Cloud CLI command: ```bash ticloud serverless changefeed template ``` -To get the explanation of the template, using the TiDB Cloud CLI command: +To view explanations of the template, run the following TiDB Cloud CLI command: ```bash ticloud serverless changefeed template --explain ``` -The configurations in the `filter` JSON string are used to filter tables and events that you want to replicate. Below is an example of a `filter` configuration: +The following is an example `filter` configuration:
Example filter configuration @@ -89,25 +91,29 @@ The configurations in the `filter` JSON string are used to filter tables and eve ```
-1. **Filter Rule**: you can set `filter rules` to filter the tables that you want to replicate. See [Table Filter](https://docs.pingcap.com/tidb/stable/table-filter/) for more information about the rule syntax. -2. **Event Filter Rule**: you can set the `matcher` and `ignore_event` to ignore some events matching the rules. See [Event filter rules](https://docs.pingcap.com/tidb/stable/ticdc-filter/#event-filter-rules) to get all the supported event types. -3. **mode**: set mode to `IGNORE_NOT_SUPPORT_TABLE` to ignore the tables that do not support replication, such as the tables that do not have primary keys or unique indexes. set mode to `FORCE_SYNC` to force the changefeed to replicate all tables. +- `filterRule`: filters the tables to replicate. For the detailed rule syntax, see [Table Filter](https://docs.pingcap.com/tidb/stable/table-filter/). +- `eventFilterRule`: filters specific events for the matched tables. You can use the `matcher` field to specify the target tables, and use the `ignore_event` field to list the event types to skip. For supported event types, see [Event filter rules](https://docs.pingcap.com/tidb/stable/ticdc-filter/#event-filter-rules). +- `mode`: controls the behavior for unsupported tables. You can set it to one of the following: + - `IGNORE_NOT_SUPPORT_TABLE`: skip tables that do not support replication (for example, tables without primary or unique keys). + - `FORCE_SYNC`: force replication of all tables regardless of support status. ### Kafka configurations -To get a template of `kafka` configurations, using the TiDB Cloud CLI command: +You can specify `--kafka ` to configure how the changefeed streams data to Apache Kafka. + +To get a template of `kafka` configurations, run the following TiDB Cloud CLI command: ```bash ticloud serverless changefeed template ``` -To get the explanation of the template, using the TiDB Cloud CLI command: +To view explanations of the template, run the following TiDB Cloud CLI command: ```bash ticloud serverless changefeed template --explain ``` -The configurations in the `kafka` JSON string are used to configure how the changefeed streams data to Apache Kafka. Below is an example of a `kafka` configuration: +The following is an example `kafka` configuration:
Example kafka configuration @@ -173,32 +179,32 @@ The configurations in the `kafka` JSON string are used to configure how the chan The main configuration fields are as follows: -1. **network_info**: Only `PUBLIC` network type is supported for now. This means that the TiDB cluster can connect to the Apache Kafka service through the Public IP. - -2. **broker**: Contains Kafka broker connection information: - - - `kafka_version`: The Kafka version, support `VERSION_2XX` and `VERSION_3XX`. - - `broker_endpoints`: Comma-separated list of broker endpoints. - - `tls_enable`: Whether to enable TLS for the connection. - - `compression`: The compression type for messages, support `NONE`, `GZIP`, `LZ4`, `SNAPPY`, and `ZSTD`. +- `network_info`: currently, only the `PUBLIC` network type is supported. This means that the TiDB Cloud Serverless clusters can only connect to the Apache Kafka service through public IP addresses. + +- `broker`: specifies the Kafka broker connection information. + + - `kafka_version`: the Kafka version. Supported values: `VERSION_2XX` or `VERSION_3XX`. + - `broker_endpoints`: a comma-separated list of Kafka brokers. + - `tls_enable`: whether to enable TLS for the connection. + - `compression`: the message compression type. Supported values: `NONE`, `GZIP`, `LZ4`, `SNAPPY`, or `ZSTD`. + +- `authentication`: specifies the Kafka authentication method. Supported values of `auth_type`: `DISABLE`, `SASL_PLAIN`, `SASL_SCRAM_SHA_256`, or `SASL_SCRAM_SHA_512`. If you set `auth_type` to `SASL_PLAIN`, `SASL_SCRAM_SHA_256`, or `SASL_SCRAM_SHA_512`, `user_name` and `password` are required. -3. **authentication**: Authentication settings for connecting to Kafka, support `DISABLE`, `SASL_PLAIN`, `SASL_SCRAM_SHA_256` and `SASL_SCRAM_SHA_512`. The `user_name` and `password` fields are required if you set the `auth_type` to `SASL_PLAIN`, `SASL_SCRAM_SHA_256`, or `SASL_SCRAM_SHA_512`. - -4. **data_format.protocol**: Support `CANAL_JSON`, `AVRO`, and `OPEN_PROTOCOL`. +- `data_format.protocol`: specifies the the output format. - - Avro is a compact, fast, and binary data format with rich data structures, which is widely used in various flow systems. For more information, see [Avro data format](https://docs.pingcap.com/tidb/stable/ticdc-avro-protocol). - - Canal-JSON is a plain JSON text format, which is easy to parse. For more information, see [Canal-JSON data format](https://docs.pingcap.com/tidb/stable/ticdc-canal-json). - - Open Protocol is a row-level data change notification protocol that provides data sources for monitoring, caching, full-text indexing, analysis engines, and primary-secondary replication between different databases. For more information, see [Open Protocol data format](https://docs.pingcap.com/tidb/stable/ticdc-open-protocol). + - `AVRO`: Avro is a compact, fast, and binary data format with rich data structures, which is widely used in various flow systems. For more information, see [Avro data format](https://docs.pingcap.com/tidb/stable/ticdc-avro-protocol). + - `CANAL_JSON`: Canal-JSON is a plain JSON text format, which is easy to parse. For more information, see [Canal-JSON data format](https://docs.pingcap.com/tidb/stable/ticdc-canal-json). + - `OPEN_PROTOCOL`: Open Protocol is a row-level data change notification protocol that provides data sources for monitoring, caching, full-text indexing, analysis engines, and primary-secondary replication between different databases. 
For more information, see [Open Protocol data format](https://docs.pingcap.com/tidb/stable/ticdc-open-protocol). -5. **data_format.enable_tidb_extension**: if you want to add TiDB-extension fields to the Kafka message body with `AVRO` or `CANAL_JSON` data format. +- `data_format.enable_tidb_extension`: controls whether to include TiDB-specific extension fields in Kafka messages when using the `AVRO` or `CANAL_JSON` format. - For more information about TiDB-extension fields, see [TiDB extension fields in Avro data format](https://docs.pingcap.com/tidb/stable/ticdc-avro-protocol#tidb-extension-fields) and [TiDB extension fields in Canal-JSON data format](https://docs.pingcap.com/tidb/stable/ticdc-canal-json#tidb-extension-field). + For more information about TiDB extension fields, see [TiDB extension fields in Avro data format](https://docs.pingcap.com/tidb/stable/ticdc-avro-protocol#tidb-extension-fields) and [TiDB extension fields in Canal-JSON data format](https://docs.pingcap.com/tidb/stable/ticdc-canal-json#tidb-extension-field). -6. **data_format.avro_config**: If you select **Avro** as your data format, you need to set the Avro-specific configurations: +- `data_format.avro_config`: if you select **Avro** as your data format, you need to set the Avro-specific configurations. - - `decimal_handling_mode` and `bigint_unsigned_handling_mode`: specify how TiDB Cloud handles the decimal and unsigned bigint data types in Kafka messages. - - `confluent_schema_registry`: the configuration of confluent schema registry. If auth is required, please set `enable_http_auth` to true and full in the user name and password. See [Confluent Schema Registry](https://docs.confluent.io/platform/current/schema-registry/index.html) for more information. - - `aws_glue_schema_registry`: the configuration of AWS Glue schema registry. If you want to use AWS Glue schema registry, you need to set the region, name, access key ID, and secret access key. See [AWS Glue Schema Registry](https://docs.aws.amazon.com/glue/latest/dg/schema-registry.html) for more information. + - `decimal_handling_mode` and `bigint_unsigned_handling_mode`: controls how TiDB Cloud handles the decimal and unsigned bigint data types in Kafka messages. + - `confluent_schema_registry`: the configuration for confluent schema registry. If authentication is required, set `enable_http_auth` to `true` and configure the `user_name` and `password`. For more information, see [Confluent Schema Registry](https://docs.confluent.io/platform/current/schema-registry/index.html). + - `aws_glue_schema_registry`: the configuration for AWS Glue schema registry. If you want to use AWS Glue schema registry, set `region`, `name`, `access_key_id`, and `secret_access_key` accordingly. For more information, see [AWS Glue Schema Registry](https://docs.aws.amazon.com/glue/latest/dg/schema-registry.html). For more information about the Avro configurations, see [Avro data format](https://docs.pingcap.com/tidb/stable/ticdc-avro-protocol). @@ -224,33 +230,33 @@ The main configuration fields are as follows: > > If you use `AVRO` data format, only `BY_TABLE` dispatch type is supported. -8. **topic_partition_config.replication_factor**: controls how many Kafka servers each Kafka message is replicated to. The valid value ranges from [`min.insync.replicas`](https://kafka.apache.org/33/documentation.html#brokerconfigs_min.insync.replicas) to the number of Kafka brokers. 
+- `topic_partition_config.replication_factor`: controls how many Kafka brokers each Kafka message is replicated to. The valid value ranges from [`min.insync.replicas`](https://kafka.apache.org/33/documentation.html#brokerconfigs_min.insync.replicas) to the total number of Kafka brokers.

-9. **topic_partition_config.partition_num**: controls how many partitions exist in a topic. The valid value range is `[1, 10 * the number of Kafka brokers]`.
+- `topic_partition_config.partition_num`: controls how many partitions exist in a topic. The valid value range is `[1, 10 * the total number of Kafka brokers]`.

-10. **topic_partition_config.partition_dispatchers**: decide which partition a Kafka message will be sent to. `partition_type` Support `TABLE`, `INDEX_VALUE`, `TS` and `COLUMN`.
+- `topic_partition_config.partition_dispatchers`: controls which partition a Kafka message will be sent to. Supported values: `INDEX_VALUE`, `TABLE`, `TS`, or `COLUMN`.

- - **Distribute changelogs by primary key or index value to Kafka partition**
+ - `INDEX_VALUE`: distributes changelogs by primary key or index value to Kafka partitions. If you want the changefeed to send Kafka messages of a table to different partitions, set `partition_type` to `INDEX_VALUE` and set the `index_name`. The primary key or index value of a row changelog will determine which partition the changelog is sent to. This distribution method provides a better partition balance and ensures row-level orderliness.

- - **Distribute changelogs by table to Kafka partition**
+ - `TABLE`: distributes changelogs by table to Kafka partitions. If you want the changefeed to send Kafka messages of a table to one Kafka partition, set `partition_type` to `TABLE`. The table name of a row changelog will determine which partition the changelog is sent to. This distribution method ensures table orderliness but might cause unbalanced partitions.

- - **Distribute changelogs by timestamp to Kafka partition**
+ - `TS`: distributes changelogs by timestamp to Kafka partitions. If you want the changefeed to send Kafka messages to different Kafka partitions randomly, set `partition_type` to `TS`. The commitTs of a row changelog will determine which partition the changelog is sent to. This distribution method provides a better partition balance and ensures orderliness in each partition. However, multiple changes of a data item might be sent to different partitions and the consumer progress of different consumers might be different, which might cause data inconsistency. Therefore, the consumer needs to sort the data from multiple partitions by commitTs before consuming.

- - **Distribute changelogs by column value to Kafka partition**
+ - `COLUMN`: distributes changelogs by column value to Kafka partitions. If you want the changefeed to send Kafka messages of a table to different partitions, set `partition_type` to `COLUMN` and set the `columns`. The specified column values of a row changelog will determine which partition the changelog is sent to. This distribution method ensures orderliness in each partition and guarantees that the changelog with the same column values is send to the same partition.

 For more information about the matching rules, see [Partition dispatchers](https://docs.pingcap.com/tidb/stable/ticdc-sink-to-kafka/#partition-dispatchers).

-11. **column_selectors**: columns from events and send only the data changes related to those columns to the downstream.
+11. **column_selectors**: selects columns from events.
TiDB Cloud only sends the data changes related to those columns to the downstream. - - `matcher`: specify which tables the column selector applies to. For tables that do not match any rule, all columns are sent. - - `columns`: specify which columns of the matched tables will be sent to the downstream. + - `matcher`: specifies which tables the column selector applies to. For tables that do not match any rule, all columns are sent. + - `columns`: specifies which columns of the matched tables will be sent to the downstream. For more information about the matching rules, see [Column selectors](https://docs.pingcap.com/tidb/stable/ticdc-sink-to-kafka/#column-selectors). From 751008f058753908eb70271f1f01e748cc37b355 Mon Sep 17 00:00:00 2001 From: shi yuhang <52435083+shiyuhang0@users.noreply.github.com> Date: Thu, 3 Jul 2025 14:56:32 +0800 Subject: [PATCH 10/20] Apply suggestions from code review Co-authored-by: Grace Cai --- .../serverless-changefeed-sink-to-apache-kafka.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md index 834e15a178564..c4708b55e0f5a 100644 --- a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md +++ b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md @@ -208,21 +208,21 @@ The main configuration fields are as follows: For more information about the Avro configurations, see [Avro data format](https://docs.pingcap.com/tidb/stable/ticdc-avro-protocol). -7. **topic_partition_config.dispatch_type**: Support `ONE_TOPIC`, `BY_TABLE` and `BY_DATABASE`. Controls how the changefeed creates Kafka topics, by table, by database, or creating one topic for all changelogs. +- `topic_partition_config.dispatch_type`: controls how the changefeed creates Kafka topics. Support values: `ONE_TOPIC`, `BY_TABLE`, or `BY_DATABASE`. If you use the `AVRO` data format, only the `BY_TABLE` dispatch type is supported. - - **Distribute changelogs by table to Kafka Topics** + - `BY_TABLE`: distributes changelogs by table to Kafka topics. - If you want the changefeed to create a dedicated Kafka topic for each table, set `dispatch_type` to `BY_TABLE`. Then, all Kafka messages of a table are sent to a dedicated Kafka topic. You can customize topic names for tables by setting a `topic_prefix`, a `separator` and between a database name and table name, and a `topic_suffix`. For example, if you set the separator as `_`, the topic names are in the format of `_`. + If you want the changefeed to create a dedicated Kafka topic for each table, set `dispatch_type` to `BY_TABLE`. Then, all Kafka messages of a table are sent to a dedicated Kafka topic. You can customize topic names for tables by setting a `topic_prefix`, a `separator` between a database name and table name, and a `topic_suffix`. For example, if you set the separator as `_`, the topic names are in the format of `_`. For changelogs of non-row events, such as Create Schema Event, you can specify a topic name in the `default_topic` field. The changefeed will create a topic accordingly to collect such changelogs. - - **Distribute changelogs by database to Kafka Topics** + - `BY_DATABASE`: distributes changelogs by database to Kafka topics. If you want the changefeed to create a dedicated Kafka topic for each database, set `dispatch_type` to `BY_DATABASE`. Then, all Kafka messages of a database are sent to a dedicated Kafka topic. 
You can customize topic names of databases by setting a `topic_prefix` and a `topic_suffix`. For changelogs of non-row events, such as Resolved Ts Event, you can specify a topic name in the `default_topic` field. The changefeed will create a topic accordingly to collect such changelogs. - - **Send all changelogs to one specified Kafka Topic** + - `ONE_TOPIC`: sends all changelogs to one specified Kafka topic. If you want the changefeed to create one Kafka topic for all changelogs, set `dispatch_type` to `ONE_TOPIC`. Then, all Kafka messages in the changefeed will be sent to one Kafka topic. You can define the topic name in the `default_topic` field. From 4241a158f5443a0aad0211943f5beb4be6727ace Mon Sep 17 00:00:00 2001 From: shiyuhang <1136742008@qq.com> Date: Thu, 3 Jul 2025 15:00:27 +0800 Subject: [PATCH 11/20] fix according to the review --- .../serverless-changefeed-sink-to-apache-kafka.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md index c4708b55e0f5a..810132ac48223 100644 --- a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md +++ b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md @@ -27,7 +27,7 @@ Ensure that your TiDB cluster can connect to the Apache Kafka service. Currently > **Note:** > -> If you want to expose your Apache Kafka through a more secure method, such as private link or VPC peering, please contact us for help. To request it, click **?** in the lower-right corner of the [TiDB Cloud console](https://tidbcloud.com) and click **Request Support**. Then, fill in "Apply for TiDB Cloud Serverless database audit logging" in the **Description** field and click **Submit**. +> If you want to expose your Apache Kafka through a more secure method, such as private link or VPC peering, please contact us for help. To request it, click **?** in the lower-right corner of the [TiDB Cloud console](https://tidbcloud.com) and click **Request Support**. Then, fill in your request in the **Description** field and click **Submit**. To enable public IP access to your Apache Kafka service, assign public IP addresses to all Kafka brokers. @@ -196,6 +196,10 @@ The main configuration fields are as follows: - `CANAL_JSON`: Canal-JSON is a plain JSON text format, which is easy to parse. For more information, see [Canal-JSON data format](https://docs.pingcap.com/tidb/stable/ticdc-canal-json). - `OPEN_PROTOCOL`: Open Protocol is a row-level data change notification protocol that provides data sources for monitoring, caching, full-text indexing, analysis engines, and primary-secondary replication between different databases. For more information, see [Open Protocol data format](https://docs.pingcap.com/tidb/stable/ticdc-open-protocol). +> Note +> +> If you use `AVRO` data format, only `BY_TABLE` dispatch type is supported. + - `data_format.enable_tidb_extension`: controls whether to include TiDB-specific extension fields in Kafka messages when using the `AVRO` or `CANAL_JSON` format. For more information about TiDB extension fields, see [TiDB extension fields in Avro data format](https://docs.pingcap.com/tidb/stable/ticdc-avro-protocol#tidb-extension-fields) and [TiDB extension fields in Canal-JSON data format](https://docs.pingcap.com/tidb/stable/ticdc-canal-json#tidb-extension-field). 
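As a rough illustration of how these fields fit together, the following fragment sets the Canal-JSON format with TiDB extension fields enabled. It is not a complete Kafka configuration: the surrounding required fields such as `network_info`, `broker`, and `topic_partition_config` are omitted, and the file name is a placeholder.

```bash
# Illustrative fragment only: the data_format portion of a Kafka configuration.
# Other required fields (network_info, broker, topic_partition_config, and so on) are omitted here.
cat > data-format-fragment.json <<'EOF'
{
  "data_format": {
    "protocol": "CANAL_JSON",
    "enable_tidb_extension": true
  }
}
EOF
```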
@@ -226,10 +230,6 @@ The main configuration fields are as follows: If you want the changefeed to create one Kafka topic for all changelogs, set `dispatch_type` to `ONE_TOPIC`. Then, all Kafka messages in the changefeed will be sent to one Kafka topic. You can define the topic name in the `default_topic` field. -> Note -> -> If you use `AVRO` data format, only `BY_TABLE` dispatch type is supported. - - `topic_partition_config.replication_factor`: controls how many Kafka brokers each Kafka message is replicated to. The valid value ranges from [`min.insync.replicas`](https://kafka.apache.org/33/documentation.html#brokerconfigs_min.insync.replicas) to the total number of Kafka brokers. - `topic_partition_config.partition_num`: controls how many partitions exist in a topic. The valid value range is `[1, 10 * the total number of Kafka brokers]`. @@ -254,7 +254,7 @@ The main configuration fields are as follows: For more information about the matching rules, see [Partition dispatchers](https://docs.pingcap.com/tidb/stable/ticdc-sink-to-kafka/#partition-dispatchers). -11. **column_selectors**: selects columns from events. TiDB Cloud only sends the data changes related to those columns to the downstream. +1. **column_selectors**: selects columns from events. TiDB Cloud only sends the data changes related to those columns to the downstream. - `matcher`: specifies which tables the column selector applies to. For tables that do not match any rule, all columns are sent. - `columns`: specifies which columns of the matched tables will be sent to the downstream. From db27e686b638daf2c175a80fc37716c457fa838b Mon Sep 17 00:00:00 2001 From: shiyuhang <1136742008@qq.com> Date: Thu, 3 Jul 2025 15:07:12 +0800 Subject: [PATCH 12/20] fix according to the review --- .../serverless-changefeed-sink-to-apache-kafka.md | 12 ------------ 1 file changed, 12 deletions(-) diff --git a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md index 810132ac48223..9b2fe31a83ec0 100644 --- a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md +++ b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md @@ -74,9 +74,6 @@ ticloud serverless changefeed template --explain The following is an example `filter` configuration: -
-Example filter configuration - ```json { "filterRule": ["test.t1", "test.t2"], @@ -89,7 +86,6 @@ The following is an example `filter` configuration: ] } ``` -
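For reference, the following sketch shows one way to pass such a filter configuration to the CLI when creating the changefeed. The file names, the changefeed name, and `<cluster-id>` are placeholders, and the sketch assumes that the `ticloud serverless changefeed create` command accepts the `--name`, `--type`, `--kafka`, and `--filter` flags described earlier in this document; adjust it to your environment.

```bash
# Illustrative sketch only: save the example filter rules to a local file (hypothetical name).
cat > filter.json <<'EOF'
{
  "filterRule": ["test.t1", "test.t2"],
  "eventFilterRule": [
    {
      "matcher": ["test.t1"],
      "ignore_event": ["all dml", "all ddl"]
    }
  ]
}
EOF

# kafka.json is assumed to contain the Kafka sink configuration described in the Kafka configurations section.
ticloud serverless changefeed create \
  --cluster-id <cluster-id> \
  --name my-kafka-changefeed \
  --type KAFKA \
  --kafka "$(cat kafka.json)" \
  --filter "$(cat filter.json)"
```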
- `filterRule`: filters the tables to replicate. For the detailed rule syntax, see [Table Filter](https://docs.pingcap.com/tidb/stable/table-filter/). - `eventFilterRule`: filters specific events for the matched tables. You can use the `matcher` field to specify the target tables, and use the `ignore_event` field to list the event types to skip. For supported event types, see [Event filter rules](https://docs.pingcap.com/tidb/stable/ticdc-filter/#event-filter-rules). @@ -115,9 +111,6 @@ ticloud serverless changefeed template --explain The following is an example `kafka` configuration: -
-Example kafka configuration - ```json { "network_info": { @@ -175,7 +168,6 @@ The following is an example `kafka` configuration: }] } ``` -
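If you need to adjust the Kafka settings of an existing changefeed later, one possible workflow is sketched below. It assumes that the changefeed is paused before editing, as required for edits, and that flags omitted from `changefeed edit` keep their current values; `<cluster-id>`, `<changefeed-id>`, and the file name are placeholders.

```bash
# Illustrative sketch only: pause the changefeed, update its Kafka configuration, then resume it.
ticloud serverless changefeed pause --cluster-id <cluster-id> --changefeed-id <changefeed-id>

# kafka.json is assumed to contain the updated Kafka configuration, for example the JSON shown above.
ticloud serverless changefeed edit \
  --cluster-id <cluster-id> \
  --changefeed-id <changefeed-id> \
  --kafka "$(cat kafka.json)"

ticloud serverless changefeed resume --cluster-id <cluster-id> --changefeed-id <changefeed-id>
```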
The main configuration fields are as follows: @@ -196,10 +188,6 @@ The main configuration fields are as follows: - `CANAL_JSON`: Canal-JSON is a plain JSON text format, which is easy to parse. For more information, see [Canal-JSON data format](https://docs.pingcap.com/tidb/stable/ticdc-canal-json). - `OPEN_PROTOCOL`: Open Protocol is a row-level data change notification protocol that provides data sources for monitoring, caching, full-text indexing, analysis engines, and primary-secondary replication between different databases. For more information, see [Open Protocol data format](https://docs.pingcap.com/tidb/stable/ticdc-open-protocol). -> Note -> -> If you use `AVRO` data format, only `BY_TABLE` dispatch type is supported. - - `data_format.enable_tidb_extension`: controls whether to include TiDB-specific extension fields in Kafka messages when using the `AVRO` or `CANAL_JSON` format. For more information about TiDB extension fields, see [TiDB extension fields in Avro data format](https://docs.pingcap.com/tidb/stable/ticdc-avro-protocol#tidb-extension-fields) and [TiDB extension fields in Canal-JSON data format](https://docs.pingcap.com/tidb/stable/ticdc-canal-json#tidb-extension-field). From c5272230f6e8b76c4d3e3b2246e219fbcd509352 Mon Sep 17 00:00:00 2001 From: shiyuhang <1136742008@qq.com> Date: Thu, 3 Jul 2025 15:15:35 +0800 Subject: [PATCH 13/20] fix verify --- tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md index 9b2fe31a83ec0..d71e661655282 100644 --- a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md +++ b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md @@ -29,7 +29,6 @@ Ensure that your TiDB cluster can connect to the Apache Kafka service. Currently > > If you want to expose your Apache Kafka through a more secure method, such as private link or VPC peering, please contact us for help. To request it, click **?** in the lower-right corner of the [TiDB Cloud console](https://tidbcloud.com) and click **Request Support**. Then, fill in your request in the **Description** field and click **Submit**. - To enable public IP access to your Apache Kafka service, assign public IP addresses to all Kafka brokers. ### Kafka ACL authorization @@ -90,6 +89,7 @@ The following is an example `filter` configuration: - `filterRule`: filters the tables to replicate. For the detailed rule syntax, see [Table Filter](https://docs.pingcap.com/tidb/stable/table-filter/). - `eventFilterRule`: filters specific events for the matched tables. You can use the `matcher` field to specify the target tables, and use the `ignore_event` field to list the event types to skip. For supported event types, see [Event filter rules](https://docs.pingcap.com/tidb/stable/ticdc-filter/#event-filter-rules). - `mode`: controls the behavior for unsupported tables. You can set it to one of the following: + - `IGNORE_NOT_SUPPORT_TABLE`: skip tables that do not support replication (for example, tables without primary or unique keys). - `FORCE_SYNC`: force replication of all tables regardless of support status. @@ -242,7 +242,7 @@ The main configuration fields are as follows: For more information about the matching rules, see [Partition dispatchers](https://docs.pingcap.com/tidb/stable/ticdc-sink-to-kafka/#partition-dispatchers). -1. **column_selectors**: selects columns from events. 
TiDB Cloud only sends the data changes related to those columns to the downstream. +- `column_selectors`: selects columns from events. TiDB Cloud only sends the data changes related to those columns to the downstream. - `matcher`: specifies which tables the column selector applies to. For tables that do not match any rule, all columns are sent. - `columns`: specifies which columns of the matched tables will be sent to the downstream. From 3d954406b94a08c3b4d97a569f4f06311a96e21d Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Thu, 3 Jul 2025 17:38:50 +0800 Subject: [PATCH 14/20] fix grammar --- .../serverless-changefeed-sink-to-apache-kafka.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md index d71e661655282..e29600bf22427 100644 --- a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md +++ b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md @@ -18,8 +18,8 @@ This document describes how to create a changefeed to stream data from TiDB Clou Before creating a changefeed to stream data to Apache Kafka, you need to complete the following prerequisites: -- Set up your network connection -- Add permissions for Kafka ACL authorization +- Set up your network connection. +- Add permissions for Kafka ACL authorization. ### Network @@ -182,7 +182,7 @@ The main configuration fields are as follows: - `authentication`: specifies the Kafka authentication method. Supported values of `auth_type`: `DISABLE`, `SASL_PLAIN`, `SASL_SCRAM_SHA_256`, or `SASL_SCRAM_SHA_512`. If you set `auth_type` to `SASL_PLAIN`, `SASL_SCRAM_SHA_256`, or `SASL_SCRAM_SHA_512`, `user_name` and `password` are required. -- `data_format.protocol`: specifies the the output format. +- `data_format.protocol`: specifies the output format. - `AVRO`: Avro is a compact, fast, and binary data format with rich data structures, which is widely used in various flow systems. For more information, see [Avro data format](https://docs.pingcap.com/tidb/stable/ticdc-avro-protocol). - `CANAL_JSON`: Canal-JSON is a plain JSON text format, which is easy to parse. For more information, see [Canal-JSON data format](https://docs.pingcap.com/tidb/stable/ticdc-canal-json). @@ -200,7 +200,7 @@ The main configuration fields are as follows: For more information about the Avro configurations, see [Avro data format](https://docs.pingcap.com/tidb/stable/ticdc-avro-protocol). -- `topic_partition_config.dispatch_type`: controls how the changefeed creates Kafka topics. Support values: `ONE_TOPIC`, `BY_TABLE`, or `BY_DATABASE`. If you use the `AVRO` data format, only the `BY_TABLE` dispatch type is supported. +- `topic_partition_config.dispatch_type`: controls how the changefeed creates Kafka topics. Supported values: `ONE_TOPIC`, `BY_TABLE`, or `BY_DATABASE`. If you use the `AVRO` data format, only the `BY_TABLE` dispatch type is supported. - `BY_TABLE`: distributes changelogs by table to Kafka topics. @@ -238,7 +238,7 @@ The main configuration fields are as follows: - `COLUMN`: distributes changelogs by column value to Kafka partitions. - If you want the changefeed to send Kafka messages of a table to different partitions, set `partition_type` to `COLUMN` and set the `columns`. The specified column values of a row changelog will determine which partition the changelog is sent to. 
This distribution method ensures orderliness in each partition and guarantees that the changelog with the same column values is send to the same partition. + If you want the changefeed to send Kafka messages of a table to different partitions, set `partition_type` to `COLUMN` and set the `columns`. The specified column values of a row changelog will determine which partition the changelog is sent to. This distribution method ensures orderliness in each partition and guarantees that the changelogs with the same column values are sent to the same partition. For more information about the matching rules, see [Partition dispatchers](https://docs.pingcap.com/tidb/stable/ticdc-sink-to-kafka/#partition-dispatchers). From 849768429d0d54b521a14fcb694cb1bbd65a7900 Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Thu, 3 Jul 2025 17:44:55 +0800 Subject: [PATCH 15/20] Apply suggestions from code review --- tidb-cloud/serverless-changefeed-overview.md | 22 +++++++++----------- 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/tidb-cloud/serverless-changefeed-overview.md b/tidb-cloud/serverless-changefeed-overview.md index b6fc7fdd758b2..4f52372677398 100644 --- a/tidb-cloud/serverless-changefeed-overview.md +++ b/tidb-cloud/serverless-changefeed-overview.md @@ -3,7 +3,7 @@ title: Changefeed for TiDB Cloud Serverless summary: TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data services. --- -# Changefeed (Beta) +# Changefeed for TiDB Cloud Serverless (Beta) TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data services. This document provides an overview of the changefeed feature for TiDB Cloud Serverless. @@ -13,20 +13,18 @@ TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data servic > - Currently, TiDB Cloud only allows up to 100 changefeeds per cluster. > - Currently, TiDB Cloud only allows up to 100 table filter rules per changefeed. -## List the changefeeds for your cluster - -To access the changefeed feature, using the TiDB Cloud CLI command: - -```bash -ticloud serverless changefeed list --cluster-id -``` - ## Create a changefeed To create a changefeed, refer to the following document: - [Sink to Apache Kafka](/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md) +## List the changefeeds for your cluster + +To list the changefeeds for your cluster, run the following TiDB Cloud CLI command: + +```bash +ticloud serverless changefeed list --cluster-id ## Pause or resume a changefeed To pause a changefeed, run the following TiDB Cloud CLI command: @@ -47,7 +45,7 @@ ticloud serverless changefeed resume --cluster-id --changefeed-id < > > TiDB Cloud currently only allows editing changefeeds in the paused status. -To edit a changefeed sink to kafka, you need to pause the changefeed first, and then edit it with the following TiDB Cloud CLI command: +To edit a changefeed to kafka, you need to pause the changefeed first, and then edit it with the following TiDB Cloud CLI command: ```bash ticloud serverless changefeed edit --cluster-id --changefeed-id --name --kafka --filter @@ -63,11 +61,11 @@ ticloud serverless changefeed delete --cluster-id --changefeed-id < ## Changefeed billing -Currently, the changefeed feature for TiDB Cloud Serverless in beta and available for free. +Currently, the changefeed feature for TiDB Cloud Serverless is in beta and available for free. ## Changefeed states -The state of a changefeed represents the running state of the changefeed. 
During the running process, changefeed might fail with errors, be manually paused or resumed. These behaviors can lead to changes of the changefeed state. +The state of a changefeed represents the running state of the changefeed. During the running process, the changefeed might fail with errors, or be manually paused or resumed. These behaviors can lead to changes of the changefeed state. The states are described as follows: From 7b0edfa6b48d3bed6ebabe433ffd2f03629d7723 Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Thu, 3 Jul 2025 17:45:19 +0800 Subject: [PATCH 16/20] Update tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md --- tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md index e29600bf22427..b07fac5ea2db9 100644 --- a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md +++ b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md @@ -27,7 +27,7 @@ Ensure that your TiDB cluster can connect to the Apache Kafka service. Currently > **Note:** > -> If you want to expose your Apache Kafka through a more secure method, such as private link or VPC peering, please contact us for help. To request it, click **?** in the lower-right corner of the [TiDB Cloud console](https://tidbcloud.com) and click **Request Support**. Then, fill in your request in the **Description** field and click **Submit**. +> If you want to expose your Apache Kafka through a more secure method, such as Private Link or VPC peering, click **?** in the lower-right corner of the [TiDB Cloud console](https://tidbcloud.com) and click **Request Support**. Then, fill in your request in the **Description** field and click **Submit**. To enable public IP access to your Apache Kafka service, assign public IP addresses to all Kafka brokers. 
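Before creating the changefeed, you might want to confirm that each broker is actually reachable over its public endpoint from outside your network. The following is a minimal sketch, assuming the `nc` (netcat) utility is available on a machine with public internet access; the broker hosts and port are placeholders.

```bash
# Illustrative sketch only: check TCP connectivity to each Kafka broker's public endpoint.
nc -vz broker1.example.com 9092
nc -vz broker2.example.com 9092
nc -vz broker3.example.com 9092
```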
From 87b65dfbb68173a7256adba28c623dce42618e4d Mon Sep 17 00:00:00 2001 From: Test User Date: Thu, 3 Jul 2025 18:12:12 +0800 Subject: [PATCH 17/20] update TOC and related docs --- TOC-tidb-cloud.md | 16 ++++++++++------ tidb-cloud/changefeed-overview.md | 8 ++++---- tidb-cloud/changefeed-sink-to-apache-kafka.md | 8 ++++---- tidb-cloud/changefeed-sink-to-apache-pulsar.md | 8 ++++---- tidb-cloud/changefeed-sink-to-cloud-storage.md | 8 ++++---- tidb-cloud/changefeed-sink-to-mysql.md | 8 ++++---- tidb-cloud/changefeed-sink-to-tidb-cloud.md | 6 +++--- tidb-cloud/serverless-changefeed-overview.md | 1 + ...serverless-changefeed-sink-to-apache-kafka.md | 8 ++++++-- 9 files changed, 40 insertions(+), 31 deletions(-) diff --git a/TOC-tidb-cloud.md b/TOC-tidb-cloud.md index 5d0e99f33ea86..4b11decc90036 100644 --- a/TOC-tidb-cloud.md +++ b/TOC-tidb-cloud.md @@ -327,12 +327,16 @@ - [Data App Configuration Files](/tidb-cloud/data-service-app-config-files.md) - [Response and Status Code](/tidb-cloud/data-service-response-and-status-code.md) - Stream Data - - [Changefeed Overview](/tidb-cloud/changefeed-overview.md) - - [To MySQL Sink](/tidb-cloud/changefeed-sink-to-mysql.md) - - [To Kafka Sink](/tidb-cloud/changefeed-sink-to-apache-kafka.md) - - [To Pulsar Sink](/tidb-cloud/changefeed-sink-to-apache-pulsar.md) - - [To TiDB Cloud Sink](/tidb-cloud/changefeed-sink-to-tidb-cloud.md) - - [To Cloud Storage](/tidb-cloud/changefeed-sink-to-cloud-storage.md) + - TiDB Cloud Serverless + - [Changefeed Overview](/tidb-cloud/serverless-changefeed-overview.md) + - [To Kafka Sink](/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md) + - TiDB Cloud Dedicated + - [Changefeed Overview](/tidb-cloud/changefeed-overview.md) + - [To MySQL Sink](/tidb-cloud/changefeed-sink-to-mysql.md) + - [To Kafka Sink](/tidb-cloud/changefeed-sink-to-apache-kafka.md) + - [To Pulsar Sink](/tidb-cloud/changefeed-sink-to-apache-pulsar.md) + - [To TiDB Cloud Sink](/tidb-cloud/changefeed-sink-to-tidb-cloud.md) + - [To Cloud Storage](/tidb-cloud/changefeed-sink-to-cloud-storage.md) - Reference - [Set Up Self-Hosted Kafka Private Link Service in AWS](/tidb-cloud/setup-aws-self-hosted-kafka-private-link-service.md) - [Set Up Self-Hosted Kafka Private Link Service in Azure](/tidb-cloud/setup-azure-self-hosted-kafka-private-link-service.md) diff --git a/tidb-cloud/changefeed-overview.md b/tidb-cloud/changefeed-overview.md index dae0f79a02cfd..500e92e91095f 100644 --- a/tidb-cloud/changefeed-overview.md +++ b/tidb-cloud/changefeed-overview.md @@ -1,17 +1,17 @@ --- -title: Changefeed +title: Changefeed for TiDB Cloud Dedicated summary: TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data services. --- -# Changefeed +# Changefeed for TiDB Cloud Dedicated -TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data services. Currently, TiDB Cloud supports streaming data to Apache Kafka, MySQL, TiDB Cloud and cloud storage. +TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data services. This document provides an overview of the changefeed feature for TiDB Cloud Dedicated. Currently, TiDB Cloud Dedicated supports streaming data to Apache Kafka, MySQL, TiDB Cloud and cloud storage. > **Note:** > > - Currently, TiDB Cloud only allows up to 100 changefeeds per cluster. > - Currently, TiDB Cloud only allows up to 100 table filter rules per changefeed. -> - For [TiDB Cloud Serverless clusters](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless), the changefeed feature is unavailable. 
+> - For [TiDB Cloud Serverless clusters](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless), see [Changefeed for TiDB Cloud Serverless](/tidb-cloud/serverless-changefeed-overview.md). ## View the Changefeed page diff --git a/tidb-cloud/changefeed-sink-to-apache-kafka.md b/tidb-cloud/changefeed-sink-to-apache-kafka.md index 469568f013490..8deb3f31e22ca 100644 --- a/tidb-cloud/changefeed-sink-to-apache-kafka.md +++ b/tidb-cloud/changefeed-sink-to-apache-kafka.md @@ -1,16 +1,16 @@ --- -title: Sink to Apache Kafka +title: Stream Data from TiDB Cloud Dedicated to Apache Kafka summary: This document explains how to create a changefeed to stream data from TiDB Cloud to Apache Kafka. It includes restrictions, prerequisites, and steps to configure the changefeed for Apache Kafka. The process involves setting up network connections, adding permissions for Kafka ACL authorization, and configuring the changefeed specification. --- -# Sink to Apache Kafka +# Stream Data from TiDB Cloud Dedicated to Apache Kafka -This document describes how to create a changefeed to stream data from TiDB Cloud to Apache Kafka. +This document describes how to create a changefeed to stream data from TiDB Cloud Dedicated to Apache Kafka. > **Note:** > > - To use the changefeed feature, make sure that your TiDB Cloud Dedicated cluster version is v6.1.3 or later. -> - For [TiDB Cloud Serverless clusters](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless), the changefeed feature is unavailable. +> - For [TiDB Cloud Serverless clusters](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless), see [Stream Data from TiDB Cloud Serverless to Apache Kafka](/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md). ## Restrictions diff --git a/tidb-cloud/changefeed-sink-to-apache-pulsar.md b/tidb-cloud/changefeed-sink-to-apache-pulsar.md index 4a37ba2957d0e..72619e9e2c652 100644 --- a/tidb-cloud/changefeed-sink-to-apache-pulsar.md +++ b/tidb-cloud/changefeed-sink-to-apache-pulsar.md @@ -1,16 +1,16 @@ --- -title: Sink to Apache Pulsar +title: Stream Data from TiDB Cloud Dedicated to Apache Pulsar summary: This document explains how to create a changefeed to stream data from TiDB Cloud to Apache Pulsar. It includes restrictions, prerequisites, and steps to configure the changefeed for Apache Pulsar. The process involves setting up network connections and configuring the changefeed specification. --- -# Sink to Apache Pulsar +# Stream Data from TiDB Cloud Dedicated to Apache Pulsar -This document describes how to create a changefeed to stream data from TiDB Cloud to Apache Pulsar. +This document describes how to create a changefeed to stream data from TiDB Cloud Dedicated to Apache Pulsar. > **Note:** > > - To replicate data to Apache Pulsar using the changefeed feature, make sure that your TiDB Cloud Dedicated cluster version is v7.5.1 or later. -> - For [TiDB Cloud Serverless clusters](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless), the changefeed feature is unavailable. +> - For [TiDB Cloud Serverless clusters](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless), streaming data to Apache Pulsar is currently not supported. 
## Restrictions diff --git a/tidb-cloud/changefeed-sink-to-cloud-storage.md b/tidb-cloud/changefeed-sink-to-cloud-storage.md index 91616d3f28de4..b01ccbc331b8e 100644 --- a/tidb-cloud/changefeed-sink-to-cloud-storage.md +++ b/tidb-cloud/changefeed-sink-to-cloud-storage.md @@ -1,16 +1,16 @@ --- -title: Sink to Cloud Storage +title: Stream Data from TiDB Cloud Dedicated to Cloud Storage summary: This document explains how to create a changefeed to stream data from TiDB Cloud to Amazon S3 or GCS. It includes restrictions, configuration steps for the destination, replication, and specification, as well as starting the replication process. --- -# Sink to Cloud Storage +# Stream Data from TiDB Cloud Dedicated to Cloud Storage -This document describes how to create a changefeed to stream data from TiDB Cloud to cloud storage. Currently, Amazon S3 and GCS are supported. +This document describes how to create a changefeed to stream data from TiDB Cloud Dedicated to cloud storage. Currently, Amazon S3 and GCS are supported. > **Note:** > > - To stream data to cloud storage, make sure that your TiDB cluster version is v7.1.1 or later. To upgrade your TiDB Cloud Dedicated cluster to v7.1.1 or later, [contact TiDB Cloud Support](/tidb-cloud/tidb-cloud-support.md). -> - For [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters, the changefeed feature is unavailable. +> - For [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters, streaming data to cloud storage is currently not supported. ## Restrictions diff --git a/tidb-cloud/changefeed-sink-to-mysql.md b/tidb-cloud/changefeed-sink-to-mysql.md index 005bf470b737a..b75ef3c97087e 100644 --- a/tidb-cloud/changefeed-sink-to-mysql.md +++ b/tidb-cloud/changefeed-sink-to-mysql.md @@ -1,16 +1,16 @@ --- -title: Sink to MySQL +title: Stream Data from TiDB Cloud Dedicated to MySQL summary: This document explains how to stream data from TiDB Cloud to MySQL using the Sink to MySQL changefeed. It includes restrictions, prerequisites, and steps to create a MySQL sink for data replication. The process involves setting up network connections, loading existing data to MySQL, and creating target tables in MySQL. After completing the prerequisites, users can create a MySQL sink to replicate data to MySQL. --- -# Sink to MySQL +# Stream Data from TiDB Cloud Dedicated to MySQL -This document describes how to stream data from TiDB Cloud to MySQL using the **Sink to MySQL** changefeed. +This document describes how to stream data from TiDB Cloud Dedicated to MySQL using the **Sink to MySQL** changefeed. > **Note:** > > - To use the changefeed feature, make sure that your TiDB Cloud Dedicated cluster version is v6.1.3 or later. -> - For [TiDB Cloud Serverless clusters](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless), the changefeed feature is unavailable. +> - For [TiDB Cloud Serverless clusters](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless), streaming data to MySQL is currently not supported. 
## Restrictions diff --git a/tidb-cloud/changefeed-sink-to-tidb-cloud.md b/tidb-cloud/changefeed-sink-to-tidb-cloud.md index 1412c1cdf7c59..7fc2c8cfb8fa2 100644 --- a/tidb-cloud/changefeed-sink-to-tidb-cloud.md +++ b/tidb-cloud/changefeed-sink-to-tidb-cloud.md @@ -1,15 +1,15 @@ --- -title: Sink to TiDB Cloud +title: Stream Data from TiDB Cloud Dedicated to TiDB Cloud Serverless summary: This document explains how to stream data from a TiDB Cloud Dedicated cluster to a TiDB Cloud Serverless cluster. There are restrictions on the number of changefeeds and regions available for the feature. Prerequisites include extending tidb_gc_life_time, backing up data, and obtaining the start position of TiDB Cloud sink. To create a TiDB Cloud sink, navigate to the cluster overview page, establish the connection, customize table and event filters, fill in the start replication position, specify the changefeed specification, review the configuration, and create the sink. Finally, restore tidb_gc_life_time to its original value. --- -# Sink to TiDB Cloud +# Stream Data from TiDB Cloud Dedicated to TiDB Cloud Serverless This document describes how to stream data from a TiDB Cloud Dedicated cluster to a TiDB Cloud Serverless cluster. > **Note:** > -> To use the Changefeed feature, make sure that your TiDB Cloud Dedicated cluster version is v6.1.3 or later. +> To use the changefeed feature, make sure that your TiDB Cloud Dedicated cluster version is v6.1.3 or later. ## Restrictions diff --git a/tidb-cloud/serverless-changefeed-overview.md b/tidb-cloud/serverless-changefeed-overview.md index 4f52372677398..4cdd793e0a243 100644 --- a/tidb-cloud/serverless-changefeed-overview.md +++ b/tidb-cloud/serverless-changefeed-overview.md @@ -12,6 +12,7 @@ TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data servic > - Currently, you can manage changefeeds for TiDB Cloud Serverless only with [TiDB Cloud CLI](/tidb-cloud/get-started-with-cli.md). > - Currently, TiDB Cloud only allows up to 100 changefeeds per cluster. > - Currently, TiDB Cloud only allows up to 100 table filter rules per changefeed. +> - For [TiDB Cloud Dedicated clusters](/tidb-cloud/select-cluster-tier.md#tidb-cloud-dedicated), see [Changefeed for TiDB Cloud Dedicated](/tidb-cloud/changefeed-overview.md). ## Create a changefeed diff --git a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md index b07fac5ea2db9..7f1ef6495c06b 100644 --- a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md +++ b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md @@ -1,12 +1,16 @@ --- -title: Sink Data from TiDB Cloud Serverless to Apache Kafka +title: Stream Data from TiDB Cloud Serverless to Apache Kafka summary: This document explains how to create a changefeed to stream data from TiDB Cloud Serverless to Apache Kafka. It includes restrictions, prerequisites, and steps to configure the changefeed for Apache Kafka. The process involves setting up network connections, adding permissions for Kafka ACL authorization, and configuring the changefeed specification. --- -# Sink Data from TiDB Cloud Serverless to Apache Kafka +# Stream Data from TiDB Cloud Serverless to Apache Kafka This document describes how to create a changefeed to stream data from TiDB Cloud Serverless to Apache Kafka. 
+> **Note:** +> +> - For [TiDB Cloud Dedicated clusters](/tidb-cloud/select-cluster-tier.md#tidb-cloud-dedicated), see [Stream Data from TiDB Cloud Dedicated to Apache Kafka](/tidb-cloud/changefeed-sink-to-apache-kafka.md). + ## Restrictions - For each TiDB Cloud cluster, you can create up to 100 changefeeds. From 7560061505e5fc4d54ec9b4780214228f9083e5e Mon Sep 17 00:00:00 2001 From: Test User Date: Fri, 4 Jul 2025 12:06:29 +0800 Subject: [PATCH 18/20] update the general docs about changefeeds --- tidb-cloud/data-streaming-concepts.md | 14 +++++++++----- tidb-cloud/serverless-limitations.md | 2 +- tidb-cloud/tidb-cloud-billing-ticdc-rcu.md | 4 ++++ 3 files changed, 14 insertions(+), 6 deletions(-) diff --git a/tidb-cloud/data-streaming-concepts.md b/tidb-cloud/data-streaming-concepts.md index 32c859f92470f..54e43be1725b4 100644 --- a/tidb-cloud/data-streaming-concepts.md +++ b/tidb-cloud/data-streaming-concepts.md @@ -5,18 +5,22 @@ summary: Learn about data streaming concepts for TiDB Cloud. # Data Streaming -TiDB Cloud lets you stream data changes from your TiDB Cluster to other systems like Kafka, MySQL, and object storage. +TiDB Cloud lets you stream data changes from your TiDB Cluster to other systems such as Apache Kafka, MySQL, and object storage. -Currently, TiDB Cloud supports streaming data to Apache Kafka, MySQL, TiDB Cloud, and cloud storage. +- For TiDB Cloud Dedicated, you can stream data to Apache Kafka, Apache Pulsar, MySQL, TiDB Cloud Serverless, and cloud storage. +- For TiDB Cloud Serverless, you can stream data to Apache Kafka. ## Changefeed TiDB Cloud changefeed is a continuous data stream that helps you replicate data changes from TiDB Cloud to other data services. -On the **Changefeed** page in the TiDB Cloud console, you can create a changefeed, view a list of existing changefeeds, and operate the existing changefeeds (such as scaling, pausing, resuming, editing, and deleting a changefeed). +- For TiDB Cloud Dedicated, you can access the changefeed feature on the **Changefeed** page in the [TiDB Cloud console](https://tidbcloud.com/). +- For TiDB Cloud Serverless, you can use the changefeed feature in [TiDB Cloud CLI](/tidb-cloud/get-started-with-cli). + +You can create a changefeed, view a list of existing changefeeds, and operate the existing changefeeds (such as scaling, pausing, resuming, editing, and deleting a changefeed). Replication includes only incremental data changes by default. If existing data must be replicated, it must be exported and loaded into the target system manually before starting the changefeed. -In TiDB Cloud, replication can be tailored by defining table filters (to specify which tables to replicate) and event filters (to include or exclude specific types of events like INSERT or DELETE). +In TiDB Cloud, replication can be tailored by defining table filters (to specify which tables to replicate) and event filters (to include or exclude specific types of events such as `INSERT` or `DELETE`). -For more information, see [Changefeed](/tidb-cloud/changefeed-overview.md). \ No newline at end of file +For more information, see [Changefeed for TiDB Cloud Dedicated](/tidb-cloud/changefeed-overview.md) and [Changefeed for TiDB Cloud Serverless](/tidb-cloud/serverless-changefeed-overview.md) . 
\ No newline at end of file diff --git a/tidb-cloud/serverless-limitations.md b/tidb-cloud/serverless-limitations.md index 40bf6a078bfbc..c4aec974342ad 100644 --- a/tidb-cloud/serverless-limitations.md +++ b/tidb-cloud/serverless-limitations.md @@ -45,7 +45,7 @@ We are constantly filling in the feature gaps between TiDB Cloud Serverless and ### Stream data -- [Changefeed](/tidb-cloud/changefeed-overview.md) is not supported for TiDB Cloud Serverless currently. +- You can manage changefeeds for TiDB Cloud Serverless only with [TiDB Cloud CLI](/tidb-cloud/get-started-with-cli.md) currently. - [Data Migration](/tidb-cloud/migrate-from-mysql-using-data-migration.md) is not supported for TiDB Cloud Serverless currently. ### Time to live (TTL) diff --git a/tidb-cloud/tidb-cloud-billing-ticdc-rcu.md b/tidb-cloud/tidb-cloud-billing-ticdc-rcu.md index ee5d21a3c55d2..bb374cad1c296 100644 --- a/tidb-cloud/tidb-cloud-billing-ticdc-rcu.md +++ b/tidb-cloud/tidb-cloud-billing-ticdc-rcu.md @@ -6,6 +6,10 @@ aliases: ['/tidbcloud/tidb-cloud-billing-tcu'] # Changefeed Billing +> **Note:** +> +> This document is only applicable to [TiDB Cloud Dedicated](/tidb-cloud/select-cluster-tier.md#tidb-cloud-dedicated). For [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless), the changefeed feature is currently in beta and available for free. + ## RCU cost TiDB Cloud measures the capacity of [changefeeds](/tidb-cloud/changefeed-overview.md) in TiCDC Replication Capacity Units (RCUs). When you [create a changefeed](/tidb-cloud/changefeed-overview.md#create-a-changefeed) for a cluster, you can select an appropriate specification. The higher the RCU, the better the replication performance. You will be charged for these TiCDC changefeed RCUs. From 1e186f2211f847990c4c088eda830c3e54c14366 Mon Sep 17 00:00:00 2001 From: Test User Date: Fri, 4 Jul 2025 12:09:59 +0800 Subject: [PATCH 19/20] minor format updates --- tidb-cloud/data-streaming-concepts.md | 2 +- tidb-cloud/tidb-cloud-billing-ticdc-rcu.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tidb-cloud/data-streaming-concepts.md b/tidb-cloud/data-streaming-concepts.md index 54e43be1725b4..892c6f0b113c4 100644 --- a/tidb-cloud/data-streaming-concepts.md +++ b/tidb-cloud/data-streaming-concepts.md @@ -23,4 +23,4 @@ Replication includes only incremental data changes by default. If existing data In TiDB Cloud, replication can be tailored by defining table filters (to specify which tables to replicate) and event filters (to include or exclude specific types of events such as `INSERT` or `DELETE`). -For more information, see [Changefeed for TiDB Cloud Dedicated](/tidb-cloud/changefeed-overview.md) and [Changefeed for TiDB Cloud Serverless](/tidb-cloud/serverless-changefeed-overview.md) . \ No newline at end of file +For more information, see [Changefeed for TiDB Cloud Dedicated](/tidb-cloud/changefeed-overview.md) and [Changefeed for TiDB Cloud Serverless](/tidb-cloud/serverless-changefeed-overview.md). \ No newline at end of file diff --git a/tidb-cloud/tidb-cloud-billing-ticdc-rcu.md b/tidb-cloud/tidb-cloud-billing-ticdc-rcu.md index bb374cad1c296..6bf9afec474ea 100644 --- a/tidb-cloud/tidb-cloud-billing-ticdc-rcu.md +++ b/tidb-cloud/tidb-cloud-billing-ticdc-rcu.md @@ -8,7 +8,7 @@ aliases: ['/tidbcloud/tidb-cloud-billing-tcu'] > **Note:** > -> This document is only applicable to [TiDB Cloud Dedicated](/tidb-cloud/select-cluster-tier.md#tidb-cloud-dedicated). 
For [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless), the changefeed feature is currently in beta and available for free. +> This document is only applicable to TiDB Cloud Dedicated. For TiDB Cloud Serverless, the [changefeed](/tidb-cloud/serverless-changefeed-overview.md) feature is currently in beta and available for free. ## RCU cost From fd04464407e7ba2a367dacb18477b3da70fd94a9 Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Fri, 4 Jul 2025 14:09:14 +0800 Subject: [PATCH 20/20] add beta --- TOC-tidb-cloud.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/TOC-tidb-cloud.md b/TOC-tidb-cloud.md index 4b11decc90036..ccf0a1a8c2e8f 100644 --- a/TOC-tidb-cloud.md +++ b/TOC-tidb-cloud.md @@ -327,10 +327,10 @@ - [Data App Configuration Files](/tidb-cloud/data-service-app-config-files.md) - [Response and Status Code](/tidb-cloud/data-service-response-and-status-code.md) - Stream Data - - TiDB Cloud Serverless + - TiDB Cloud Serverless Changefeeds ![BETA](/media/tidb-cloud/blank_transparent_placeholder.png) - [Changefeed Overview](/tidb-cloud/serverless-changefeed-overview.md) - [To Kafka Sink](/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md) - - TiDB Cloud Dedicated + - TiDB Cloud Dedicated Changefeeds - [Changefeed Overview](/tidb-cloud/changefeed-overview.md) - [To MySQL Sink](/tidb-cloud/changefeed-sink-to-mysql.md) - [To Kafka Sink](/tidb-cloud/changefeed-sink-to-apache-kafka.md)