diff --git a/TOC-tidb-cloud.md b/TOC-tidb-cloud.md
index 5d0e99f33ea86..ccf0a1a8c2e8f 100644
--- a/TOC-tidb-cloud.md
+++ b/TOC-tidb-cloud.md
@@ -327,12 +327,16 @@
   - [Data App Configuration Files](/tidb-cloud/data-service-app-config-files.md)
   - [Response and Status Code](/tidb-cloud/data-service-response-and-status-code.md)
 - Stream Data
-  - [Changefeed Overview](/tidb-cloud/changefeed-overview.md)
-  - [To MySQL Sink](/tidb-cloud/changefeed-sink-to-mysql.md)
-  - [To Kafka Sink](/tidb-cloud/changefeed-sink-to-apache-kafka.md)
-  - [To Pulsar Sink](/tidb-cloud/changefeed-sink-to-apache-pulsar.md)
-  - [To TiDB Cloud Sink](/tidb-cloud/changefeed-sink-to-tidb-cloud.md)
-  - [To Cloud Storage](/tidb-cloud/changefeed-sink-to-cloud-storage.md)
+  - TiDB Cloud Serverless Changefeeds ![BETA](/media/tidb-cloud/blank_transparent_placeholder.png)
+    - [Changefeed Overview](/tidb-cloud/serverless-changefeed-overview.md)
+    - [To Kafka Sink](/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md)
+  - TiDB Cloud Dedicated Changefeeds
+    - [Changefeed Overview](/tidb-cloud/changefeed-overview.md)
+    - [To MySQL Sink](/tidb-cloud/changefeed-sink-to-mysql.md)
+    - [To Kafka Sink](/tidb-cloud/changefeed-sink-to-apache-kafka.md)
+    - [To Pulsar Sink](/tidb-cloud/changefeed-sink-to-apache-pulsar.md)
+    - [To TiDB Cloud Sink](/tidb-cloud/changefeed-sink-to-tidb-cloud.md)
+    - [To Cloud Storage](/tidb-cloud/changefeed-sink-to-cloud-storage.md)
 - Reference
   - [Set Up Self-Hosted Kafka Private Link Service in AWS](/tidb-cloud/setup-aws-self-hosted-kafka-private-link-service.md)
   - [Set Up Self-Hosted Kafka Private Link Service in Azure](/tidb-cloud/setup-azure-self-hosted-kafka-private-link-service.md)
diff --git a/tidb-cloud/changefeed-overview.md b/tidb-cloud/changefeed-overview.md
index dae0f79a02cfd..500e92e91095f 100644
--- a/tidb-cloud/changefeed-overview.md
+++ b/tidb-cloud/changefeed-overview.md
@@ -1,17 +1,17 @@
 ---
-title: Changefeed
+title: Changefeed for TiDB Cloud Dedicated
 summary: TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data services.
 ---

-# Changefeed
+# Changefeed for TiDB Cloud Dedicated

-TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data services. Currently, TiDB Cloud supports streaming data to Apache Kafka, MySQL, TiDB Cloud and cloud storage.
+TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data services. This document provides an overview of the changefeed feature for TiDB Cloud Dedicated. Currently, TiDB Cloud Dedicated supports streaming data to Apache Kafka, Apache Pulsar, MySQL, TiDB Cloud, and cloud storage.

 > **Note:**
 >
 > - Currently, TiDB Cloud only allows up to 100 changefeeds per cluster.
 > - Currently, TiDB Cloud only allows up to 100 table filter rules per changefeed.
-> - For [TiDB Cloud Serverless clusters](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless), the changefeed feature is unavailable.
+> - For [TiDB Cloud Serverless clusters](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless), see [Changefeed for TiDB Cloud Serverless](/tidb-cloud/serverless-changefeed-overview.md).

 ## View the Changefeed page

diff --git a/tidb-cloud/changefeed-sink-to-apache-kafka.md b/tidb-cloud/changefeed-sink-to-apache-kafka.md
index 469568f013490..8deb3f31e22ca 100644
--- a/tidb-cloud/changefeed-sink-to-apache-kafka.md
+++ b/tidb-cloud/changefeed-sink-to-apache-kafka.md
@@ -1,16 +1,16 @@
 ---
-title: Sink to Apache Kafka
+title: Stream Data from TiDB Cloud Dedicated to Apache Kafka
 summary: This document explains how to create a changefeed to stream data from TiDB Cloud to Apache Kafka. It includes restrictions, prerequisites, and steps to configure the changefeed for Apache Kafka. The process involves setting up network connections, adding permissions for Kafka ACL authorization, and configuring the changefeed specification.
 ---

-# Sink to Apache Kafka
+# Stream Data from TiDB Cloud Dedicated to Apache Kafka

-This document describes how to create a changefeed to stream data from TiDB Cloud to Apache Kafka.
+This document describes how to create a changefeed to stream data from TiDB Cloud Dedicated to Apache Kafka.

 > **Note:**
 >
 > - To use the changefeed feature, make sure that your TiDB Cloud Dedicated cluster version is v6.1.3 or later.
-> - For [TiDB Cloud Serverless clusters](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless), the changefeed feature is unavailable.
+> - For [TiDB Cloud Serverless clusters](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless), see [Stream Data from TiDB Cloud Serverless to Apache Kafka](/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md).

 ## Restrictions

diff --git a/tidb-cloud/changefeed-sink-to-apache-pulsar.md b/tidb-cloud/changefeed-sink-to-apache-pulsar.md
index 4a37ba2957d0e..72619e9e2c652 100644
--- a/tidb-cloud/changefeed-sink-to-apache-pulsar.md
+++ b/tidb-cloud/changefeed-sink-to-apache-pulsar.md
@@ -1,16 +1,16 @@
 ---
-title: Sink to Apache Pulsar
+title: Stream Data from TiDB Cloud Dedicated to Apache Pulsar
 summary: This document explains how to create a changefeed to stream data from TiDB Cloud to Apache Pulsar. It includes restrictions, prerequisites, and steps to configure the changefeed for Apache Pulsar. The process involves setting up network connections and configuring the changefeed specification.
 ---

-# Sink to Apache Pulsar
+# Stream Data from TiDB Cloud Dedicated to Apache Pulsar

-This document describes how to create a changefeed to stream data from TiDB Cloud to Apache Pulsar.
+This document describes how to create a changefeed to stream data from TiDB Cloud Dedicated to Apache Pulsar.

 > **Note:**
 >
 > - To replicate data to Apache Pulsar using the changefeed feature, make sure that your TiDB Cloud Dedicated cluster version is v7.5.1 or later.
-> - For [TiDB Cloud Serverless clusters](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless), the changefeed feature is unavailable.
+> - For [TiDB Cloud Serverless clusters](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless), streaming data to Apache Pulsar is currently not supported.

 ## Restrictions

diff --git a/tidb-cloud/changefeed-sink-to-cloud-storage.md b/tidb-cloud/changefeed-sink-to-cloud-storage.md
index 91616d3f28de4..b01ccbc331b8e 100644
--- a/tidb-cloud/changefeed-sink-to-cloud-storage.md
+++ b/tidb-cloud/changefeed-sink-to-cloud-storage.md
@@ -1,16 +1,16 @@
 ---
-title: Sink to Cloud Storage
+title: Stream Data from TiDB Cloud Dedicated to Cloud Storage
 summary: This document explains how to create a changefeed to stream data from TiDB Cloud to Amazon S3 or GCS. It includes restrictions, configuration steps for the destination, replication, and specification, as well as starting the replication process.
 ---

-# Sink to Cloud Storage
+# Stream Data from TiDB Cloud Dedicated to Cloud Storage

-This document describes how to create a changefeed to stream data from TiDB Cloud to cloud storage. Currently, Amazon S3 and GCS are supported.
+This document describes how to create a changefeed to stream data from TiDB Cloud Dedicated to cloud storage. Currently, Amazon S3 and GCS are supported.

 > **Note:**
 >
 > - To stream data to cloud storage, make sure that your TiDB cluster version is v7.1.1 or later. To upgrade your TiDB Cloud Dedicated cluster to v7.1.1 or later, [contact TiDB Cloud Support](/tidb-cloud/tidb-cloud-support.md).
-> - For [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters, the changefeed feature is unavailable.
+> - For [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters, streaming data to cloud storage is currently not supported.

 ## Restrictions

diff --git a/tidb-cloud/changefeed-sink-to-mysql.md b/tidb-cloud/changefeed-sink-to-mysql.md
index 005bf470b737a..b75ef3c97087e 100644
--- a/tidb-cloud/changefeed-sink-to-mysql.md
+++ b/tidb-cloud/changefeed-sink-to-mysql.md
@@ -1,16 +1,16 @@
 ---
-title: Sink to MySQL
+title: Stream Data from TiDB Cloud Dedicated to MySQL
 summary: This document explains how to stream data from TiDB Cloud to MySQL using the Sink to MySQL changefeed. It includes restrictions, prerequisites, and steps to create a MySQL sink for data replication. The process involves setting up network connections, loading existing data to MySQL, and creating target tables in MySQL. After completing the prerequisites, users can create a MySQL sink to replicate data to MySQL.
 ---

-# Sink to MySQL
+# Stream Data from TiDB Cloud Dedicated to MySQL

-This document describes how to stream data from TiDB Cloud to MySQL using the **Sink to MySQL** changefeed.
+This document describes how to stream data from TiDB Cloud Dedicated to MySQL using the **Sink to MySQL** changefeed.

 > **Note:**
 >
 > - To use the changefeed feature, make sure that your TiDB Cloud Dedicated cluster version is v6.1.3 or later.
-> - For [TiDB Cloud Serverless clusters](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless), the changefeed feature is unavailable.
+> - For [TiDB Cloud Serverless clusters](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless), streaming data to MySQL is currently not supported.

 ## Restrictions

diff --git a/tidb-cloud/changefeed-sink-to-tidb-cloud.md b/tidb-cloud/changefeed-sink-to-tidb-cloud.md
index 1412c1cdf7c59..7fc2c8cfb8fa2 100644
--- a/tidb-cloud/changefeed-sink-to-tidb-cloud.md
+++ b/tidb-cloud/changefeed-sink-to-tidb-cloud.md
@@ -1,15 +1,15 @@
 ---
-title: Sink to TiDB Cloud
+title: Stream Data from TiDB Cloud Dedicated to TiDB Cloud Serverless
 summary: This document explains how to stream data from a TiDB Cloud Dedicated cluster to a TiDB Cloud Serverless cluster. There are restrictions on the number of changefeeds and regions available for the feature. Prerequisites include extending tidb_gc_life_time, backing up data, and obtaining the start position of TiDB Cloud sink. To create a TiDB Cloud sink, navigate to the cluster overview page, establish the connection, customize table and event filters, fill in the start replication position, specify the changefeed specification, review the configuration, and create the sink. Finally, restore tidb_gc_life_time to its original value.
 ---

-# Sink to TiDB Cloud
+# Stream Data from TiDB Cloud Dedicated to TiDB Cloud Serverless

 This document describes how to stream data from a TiDB Cloud Dedicated cluster to a TiDB Cloud Serverless cluster.

 > **Note:**
 >
-> To use the Changefeed feature, make sure that your TiDB Cloud Dedicated cluster version is v6.1.3 or later.
+> To use the changefeed feature, make sure that your TiDB Cloud Dedicated cluster version is v6.1.3 or later.

 ## Restrictions

diff --git a/tidb-cloud/data-streaming-concepts.md b/tidb-cloud/data-streaming-concepts.md
index 32c859f92470f..892c6f0b113c4 100644
--- a/tidb-cloud/data-streaming-concepts.md
+++ b/tidb-cloud/data-streaming-concepts.md
@@ -5,18 +5,22 @@ summary: Learn about data streaming concepts for TiDB Cloud.

 # Data Streaming

-TiDB Cloud lets you stream data changes from your TiDB Cluster to other systems like Kafka, MySQL, and object storage.
+TiDB Cloud lets you stream data changes from your TiDB cluster to other systems such as Apache Kafka, MySQL, and object storage.

-Currently, TiDB Cloud supports streaming data to Apache Kafka, MySQL, TiDB Cloud, and cloud storage.
+- For TiDB Cloud Dedicated, you can stream data to Apache Kafka, Apache Pulsar, MySQL, TiDB Cloud Serverless, and cloud storage.
+- For TiDB Cloud Serverless, you can stream data to Apache Kafka.

 ## Changefeed

 TiDB Cloud changefeed is a continuous data stream that helps you replicate data changes from TiDB Cloud to other data services.

-On the **Changefeed** page in the TiDB Cloud console, you can create a changefeed, view a list of existing changefeeds, and operate the existing changefeeds (such as scaling, pausing, resuming, editing, and deleting a changefeed).
+- For TiDB Cloud Dedicated, you can access the changefeed feature on the **Changefeed** page in the [TiDB Cloud console](https://tidbcloud.com/).
+- For TiDB Cloud Serverless, you can use the changefeed feature in [TiDB Cloud CLI](/tidb-cloud/get-started-with-cli.md).
+
+You can create a changefeed, view a list of existing changefeeds, and operate the existing changefeeds (such as scaling, pausing, resuming, editing, and deleting a changefeed).

 Replication includes only incremental data changes by default. If existing data must be replicated, it must be exported and loaded into the target system manually before starting the changefeed.

-In TiDB Cloud, replication can be tailored by defining table filters (to specify which tables to replicate) and event filters (to include or exclude specific types of events like INSERT or DELETE).
+In TiDB Cloud, replication can be tailored by defining table filters (to specify which tables to replicate) and event filters (to include or exclude specific types of events such as `INSERT` or `DELETE`).

-For more information, see [Changefeed](/tidb-cloud/changefeed-overview.md).
\ No newline at end of file
+For more information, see [Changefeed for TiDB Cloud Dedicated](/tidb-cloud/changefeed-overview.md) and [Changefeed for TiDB Cloud Serverless](/tidb-cloud/serverless-changefeed-overview.md).
\ No newline at end of file
diff --git a/tidb-cloud/serverless-changefeed-overview.md b/tidb-cloud/serverless-changefeed-overview.md
new file mode 100644
index 0000000000000..4cdd793e0a243
--- /dev/null
+++ b/tidb-cloud/serverless-changefeed-overview.md
@@ -0,0 +1,78 @@
+---
+title: Changefeed for TiDB Cloud Serverless
+summary: TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data services.
+---
+
+# Changefeed for TiDB Cloud Serverless (Beta)
+
+TiDB Cloud changefeed helps you stream data from TiDB Cloud to other data services. This document provides an overview of the changefeed feature for TiDB Cloud Serverless.
+
+> **Note:**
+>
+> - Currently, you can manage changefeeds for TiDB Cloud Serverless only with [TiDB Cloud CLI](/tidb-cloud/get-started-with-cli.md).
+> - Currently, TiDB Cloud only allows up to 100 changefeeds per cluster.
+> - Currently, TiDB Cloud only allows up to 100 table filter rules per changefeed.
+> - For [TiDB Cloud Dedicated clusters](/tidb-cloud/select-cluster-tier.md#tidb-cloud-dedicated), see [Changefeed for TiDB Cloud Dedicated](/tidb-cloud/changefeed-overview.md).
+
+## Create a changefeed
+
+To create a changefeed, refer to the following document:
+
+- [Sink to Apache Kafka](/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md)
+
+## List the changefeeds for your cluster
+
+To list the changefeeds for your cluster, run the following TiDB Cloud CLI command:
+
+```bash
+ticloud serverless changefeed list --cluster-id <cluster-id>
+```
+
+## Pause or resume a changefeed
+
+To pause a changefeed, run the following TiDB Cloud CLI command:
+
+```bash
+ticloud serverless changefeed pause --cluster-id <cluster-id> --changefeed-id <changefeed-id>
+```
+
+To resume a changefeed, run the following TiDB Cloud CLI command:
+
+```bash
+ticloud serverless changefeed resume --cluster-id <cluster-id> --changefeed-id <changefeed-id>
+```
+
+## Edit a changefeed
+
+> **Note:**
+>
+> TiDB Cloud currently only allows editing changefeeds in the paused status.
+
+To edit a changefeed that streams data to Kafka, pause the changefeed first, and then edit it with the following TiDB Cloud CLI command:
+
+```bash
+ticloud serverless changefeed edit --cluster-id <cluster-id> --changefeed-id <changefeed-id> --name <changefeed-name> --kafka <kafka-config> --filter <filter-config>
+```
+
+## Delete a changefeed
+
+To delete a changefeed, run the following TiDB Cloud CLI command:
+
+```bash
+ticloud serverless changefeed delete --cluster-id <cluster-id> --changefeed-id <changefeed-id>
+```
+
+## Changefeed billing
+
+Currently, the changefeed feature for TiDB Cloud Serverless is in beta and available for free.
+
+## Changefeed states
+
+The state of a changefeed represents the running state of the changefeed. During the running process, the changefeed might fail with errors, or be manually paused or resumed. These behaviors can lead to changes in the changefeed state.
+
+The states are described as follows:
+
+- `CREATING`: the changefeed is being created.
+- `CREATE_FAILED`: the changefeed creation fails. You need to delete the changefeed and create a new one.
+- `RUNNING`: the changefeed runs normally and the checkpoint-ts proceeds normally.
+- `PAUSED`: the changefeed is paused.
+- `WARNING`: the changefeed returns a warning. The changefeed cannot continue due to some recoverable errors. The changefeed in this state keeps trying to resume until the state transfers to `RUNNING`. The changefeed in this state blocks [GC operations](https://docs.pingcap.com/tidb/stable/garbage-collection-overview).
+- `RUNNING_FAILED`: the changefeed fails. Due to some errors, the changefeed cannot resume and cannot be recovered automatically. If the issues are resolved before the garbage collection (GC) of the incremental data, you can manually resume the failed changefeed. The default Time-To-Live (TTL) duration for incremental data is 24 hours, which means that the GC mechanism does not delete any data within 24 hours after the changefeed is interrupted. For a minimal recovery example, see the sketch after this list.
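+
+For example, if a changefeed enters the `RUNNING_FAILED` state and you have resolved the underlying issue before the incremental data is garbage collected, a minimal recovery sketch using the commands above looks as follows. The cluster ID and changefeed ID are placeholders; replace them with the values from your own environment:
+
+```bash
+# Check the current state of the changefeeds in the cluster.
+ticloud serverless changefeed list --cluster-id <cluster-id>
+
+# Manually resume the failed changefeed after fixing the underlying issue.
+ticloud serverless changefeed resume --cluster-id <cluster-id> --changefeed-id <changefeed-id>
+```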
diff --git a/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md
new file mode 100644
index 0000000000000..7f1ef6495c06b
--- /dev/null
+++ b/tidb-cloud/serverless-changefeed-sink-to-apache-kafka.md
@@ -0,0 +1,254 @@
+---
+title: Stream Data from TiDB Cloud Serverless to Apache Kafka
+summary: This document explains how to create a changefeed to stream data from TiDB Cloud Serverless to Apache Kafka. It includes restrictions, prerequisites, and steps to configure the changefeed for Apache Kafka. The process involves setting up network connections, adding permissions for Kafka ACL authorization, and configuring the changefeed specification.
+---
+
+# Stream Data from TiDB Cloud Serverless to Apache Kafka
+
+This document describes how to create a changefeed to stream data from TiDB Cloud Serverless to Apache Kafka.
+
+> **Note:**
+>
+> - For [TiDB Cloud Dedicated clusters](/tidb-cloud/select-cluster-tier.md#tidb-cloud-dedicated), see [Stream Data from TiDB Cloud Dedicated to Apache Kafka](/tidb-cloud/changefeed-sink-to-apache-kafka.md).
+
+## Restrictions
+
+- For each TiDB Cloud cluster, you can create up to 100 changefeeds.
+- Currently, TiDB Cloud does not support uploading self-signed TLS certificates to connect to Kafka brokers.
+- Because TiDB Cloud uses TiCDC to establish changefeeds, it has the same [restrictions as TiCDC](https://docs.pingcap.com/tidb/stable/ticdc-overview#unsupported-scenarios).
+- If the table to be replicated does not have a primary key or a non-null unique index, the absence of a unique constraint during replication could result in duplicated data being inserted downstream in some retry scenarios.
+
+## Prerequisites
+
+Before creating a changefeed to stream data to Apache Kafka, you need to complete the following prerequisites:
+
+- Set up your network connection.
+- Add permissions for Kafka ACL authorization.
+
+### Network
+
+Ensure that your TiDB cluster can connect to the Apache Kafka service. Currently, TiDB Cloud Serverless clusters can only connect to Apache Kafka through public IP addresses.
+
+> **Note:**
+>
+> If you want to expose your Apache Kafka service through a more secure method, such as Private Link or VPC peering, click **?** in the lower-right corner of the [TiDB Cloud console](https://tidbcloud.com) and click **Request Support**. Then, fill in your request in the **Description** field and click **Submit**.
+
+To enable public IP access to your Apache Kafka service, assign public IP addresses to all Kafka brokers.
+
+### Kafka ACL authorization
+
+To allow TiDB Cloud changefeeds to stream data to Apache Kafka and create Kafka topics automatically, ensure that the following permissions are added in Kafka:
+
+- The `Create` and `Write` permissions are added for the topic resource type in Kafka.
+- The `DescribeConfigs` permission is added for the cluster resource type in Kafka.
+
+For example, if your Kafka cluster is in Confluent Cloud, see [Resources](https://docs.confluent.io/platform/current/kafka/authorization.html#resources) and [Adding ACLs](https://docs.confluent.io/platform/current/kafka/authorization.html#adding-acls) in the Confluent documentation for more information.
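+
+If you run a self-managed Apache Kafka cluster instead, the following sketch shows one way to grant the same permissions with Kafka's built-in `kafka-acls.sh` tool. The broker address, principal, and topic prefix are placeholders, and the exact flags might differ between Kafka versions:
+
+```bash
+# Allow the changefeed user to create and write to its topics (names below are placeholders).
+kafka-acls.sh --bootstrap-server <broker-host>:9092 \
+  --add --allow-principal User:<changefeed-user> \
+  --operation Create --operation Write \
+  --topic <topic-prefix> --resource-pattern-type prefixed
+
+# Allow the changefeed user to read broker configuration at the cluster level.
+kafka-acls.sh --bootstrap-server <broker-host>:9092 \
+  --add --allow-principal User:<changefeed-user> \
+  --operation DescribeConfigs --cluster
+```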
+
+## Create a changefeed with TiDB Cloud CLI
+
+To create a changefeed that streams data from TiDB Cloud to Apache Kafka, use the following TiDB Cloud CLI command:
+
+```bash
+ticloud serverless changefeed create --cluster-id <cluster-id> --name <changefeed-name> --type KAFKA --kafka <kafka-config> --filter <filter-config> --start-tso <start-tso>
+```
+
+- `<cluster-id>`: the ID of the TiDB Cloud cluster that you want to create the changefeed for.
+- `<changefeed-name>` (optional): the name of the changefeed. If not specified, TiDB Cloud automatically generates a changefeed name.
+- `type`: the changefeed type. To stream data to Apache Kafka, set it to `KAFKA`.
+- `kafka`: a JSON string that contains the configuration for streaming data to Apache Kafka. For more information, see [Kafka configurations](#kafka-configurations).
+- `filter`: a JSON string that specifies which tables and events to replicate. For more information, see [Filter configurations](#filter-configurations).
+- `start-tso`: the TSO from which the changefeed starts to replicate data. If not specified, the current TSO is used by default. For more information, see [TSO in TiDB](https://docs.pingcap.com/tidb/stable/tso/).
+
+### Filter configurations
+
+You can specify `--filter <filter-config>` to filter tables and events that you want to replicate.
+
+To get a template of `filter` configurations, run the following TiDB Cloud CLI command:
+
+```bash
+ticloud serverless changefeed template
+```
+
+To view explanations of the template, run the following TiDB Cloud CLI command:
+
+```bash
+ticloud serverless changefeed template --explain
+```
+
+The following is an example `filter` configuration:
+
+```json
+{
+  "filterRule": ["test.t1", "test.t2"],
+  "mode": "IGNORE_NOT_SUPPORT_TABLE",
+  "eventFilterRule": [
+    {
+      "matcher": ["test.t1", "test.t2"],
+      "ignore_event": ["all dml", "all ddl"]
+    }
+  ]
+}
+```
+
+- `filterRule`: filters the tables to replicate. For the detailed rule syntax, see [Table Filter](https://docs.pingcap.com/tidb/stable/table-filter/).
+- `eventFilterRule`: filters specific events for the matched tables. You can use the `matcher` field to specify the target tables, and use the `ignore_event` field to list the event types to skip. For supported event types, see [Event filter rules](https://docs.pingcap.com/tidb/stable/ticdc-filter/#event-filter-rules).
+- `mode`: controls the behavior for unsupported tables. You can set it to one of the following:
+
+    - `IGNORE_NOT_SUPPORT_TABLE`: skip tables that do not support replication (for example, tables without primary or unique keys).
+    - `FORCE_SYNC`: force replication of all tables regardless of support status.
+
+### Kafka configurations
+
+You can specify `--kafka <kafka-config>` to configure how the changefeed streams data to Apache Kafka.
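+
+Because both `--kafka` and `--filter` take JSON strings, one convenient pattern is to keep the JSON in local files and substitute them on the command line when creating the changefeed. The following is a minimal sketch, assuming a Bash shell and that the CLI accepts the JSON inline; the cluster ID and file names are placeholders:
+
+```bash
+# Pass the JSON configurations stored in local files to the create command.
+ticloud serverless changefeed create \
+  --cluster-id <cluster-id> \
+  --type KAFKA \
+  --kafka "$(cat kafka-config.json)" \
+  --filter "$(cat filter-config.json)"
+```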
+
+To get a template of `kafka` configurations, run the following TiDB Cloud CLI command:
+
+```bash
+ticloud serverless changefeed template
+```
+
+To view explanations of the template, run the following TiDB Cloud CLI command:
+
+```bash
+ticloud serverless changefeed template --explain
+```
+
+The following is an example `kafka` configuration:
+
+```json
+{
+  "network_info": {
+    "network_type": "PUBLIC"
+  },
+  "broker": {
+    "kafka_version": "VERSION_2XX",
+    "broker_endpoints": "broker1:9092,broker2:9092",
+    "tls_enable": false,
+    "compression": "NONE"
+  },
+  "authentication": {
+    "auth_type": "DISABLE",
+    "user_name": "",
+    "password": ""
+  },
+  "data_format": {
+    "protocol": "CANAL_JSON",
+    "enable_tidb_extension": false,
+    "avro_config": {
+      "decimal_handling_mode": "PRECISE",
+      "bigint_unsigned_handling_mode": "LONG",
+      "confluent_schema_registry": {
+        "endpoint": "",
+        "enable_http_auth": false,
+        "user_name": "",
+        "password": ""
+      },
+      "aws_glue_schema_registry": {
+        "region": "",
+        "name": "",
+        "access_key_id": "",
+        "secret_access_key": ""
+      }
+    }
+  },
+  "topic_partition_config": {
+    "dispatch_type": "ONE_TOPIC",
+    "default_topic": "test-topic",
+    "topic_prefix": "_prefix",
+    "separator": "_",
+    "topic_suffix": "_suffix",
+    "replication_factor": 1,
+    "partition_num": 1,
+    "partition_dispatchers": [{
+      "partition_type": "TABLE",
+      "matcher": ["*.*"],
+      "index_name": "index1",
+      "columns": ["col1", "col2"]
+    }]
+  },
+  "column_selectors": [{
+    "matcher": ["*.*"],
+    "columns": ["col1", "col2"]
+  }]
+}
+```
+
+The main configuration fields are as follows:
+
+- `network_info`: currently, only the `PUBLIC` network type is supported. This means that TiDB Cloud Serverless clusters can only connect to the Apache Kafka service through public IP addresses.
+
+- `broker`: specifies the Kafka broker connection information.
+
+    - `kafka_version`: the Kafka version. Supported values: `VERSION_2XX` or `VERSION_3XX`.
+    - `broker_endpoints`: a comma-separated list of Kafka brokers.
+    - `tls_enable`: whether to enable TLS for the connection.
+    - `compression`: the message compression type. Supported values: `NONE`, `GZIP`, `LZ4`, `SNAPPY`, or `ZSTD`.
+
+- `authentication`: specifies the Kafka authentication method. Supported values of `auth_type`: `DISABLE`, `SASL_PLAIN`, `SASL_SCRAM_SHA_256`, or `SASL_SCRAM_SHA_512`. If you set `auth_type` to `SASL_PLAIN`, `SASL_SCRAM_SHA_256`, or `SASL_SCRAM_SHA_512`, `user_name` and `password` are required.
+
+- `data_format.protocol`: specifies the output format.
+
+    - `AVRO`: Avro is a compact, fast, and binary data format with rich data structures, which is widely used in various flow systems. For more information, see [Avro data format](https://docs.pingcap.com/tidb/stable/ticdc-avro-protocol).
+    - `CANAL_JSON`: Canal-JSON is a plain JSON text format, which is easy to parse. For more information, see [Canal-JSON data format](https://docs.pingcap.com/tidb/stable/ticdc-canal-json).
+    - `OPEN_PROTOCOL`: Open Protocol is a row-level data change notification protocol that provides data sources for monitoring, caching, full-text indexing, analysis engines, and primary-secondary replication between different databases. For more information, see [Open Protocol data format](https://docs.pingcap.com/tidb/stable/ticdc-open-protocol).
+
+- `data_format.enable_tidb_extension`: controls whether to include TiDB-specific extension fields in Kafka messages when using the `AVRO` or `CANAL_JSON` format.
+
+    For more information about TiDB extension fields, see [TiDB extension fields in Avro data format](https://docs.pingcap.com/tidb/stable/ticdc-avro-protocol#tidb-extension-fields) and [TiDB extension fields in Canal-JSON data format](https://docs.pingcap.com/tidb/stable/ticdc-canal-json#tidb-extension-field).
+
+- `data_format.avro_config`: if you select **Avro** as your data format, you need to set the Avro-specific configurations.
+
+    - `decimal_handling_mode` and `bigint_unsigned_handling_mode`: control how TiDB Cloud handles the decimal and unsigned bigint data types in Kafka messages.
+    - `confluent_schema_registry`: the configuration for Confluent Schema Registry. If authentication is required, set `enable_http_auth` to `true` and configure the `user_name` and `password`. For more information, see [Confluent Schema Registry](https://docs.confluent.io/platform/current/schema-registry/index.html).
+    - `aws_glue_schema_registry`: the configuration for AWS Glue Schema Registry. If you want to use AWS Glue Schema Registry, set `region`, `name`, `access_key_id`, and `secret_access_key` accordingly. For more information, see [AWS Glue Schema Registry](https://docs.aws.amazon.com/glue/latest/dg/schema-registry.html).
+
+    For more information about the Avro configurations, see [Avro data format](https://docs.pingcap.com/tidb/stable/ticdc-avro-protocol).
+
+- `topic_partition_config.dispatch_type`: controls how the changefeed creates Kafka topics. Supported values: `ONE_TOPIC`, `BY_TABLE`, or `BY_DATABASE`. If you use the `AVRO` data format, only the `BY_TABLE` dispatch type is supported.
+
+    - `BY_TABLE`: distributes changelogs by table to Kafka topics.
+
+        If you want the changefeed to create a dedicated Kafka topic for each table, set `dispatch_type` to `BY_TABLE`. Then, all Kafka messages of a table are sent to a dedicated Kafka topic. You can customize topic names for tables by setting a `topic_prefix`, a `separator` between a database name and table name, and a `topic_suffix`. For example, if you set the separator as `_`, the topic names are in the format of `<topic_prefix><database_name>_<table_name><topic_suffix>`.
+
+        For changelogs of non-row events, such as Create Schema Event, you can specify a topic name in the `default_topic` field. The changefeed will create a topic accordingly to collect such changelogs.
+
+    - `BY_DATABASE`: distributes changelogs by database to Kafka topics.
+
+        If you want the changefeed to create a dedicated Kafka topic for each database, set `dispatch_type` to `BY_DATABASE`. Then, all Kafka messages of a database are sent to a dedicated Kafka topic. You can customize topic names of databases by setting a `topic_prefix` and a `topic_suffix`.
+
+        For changelogs of non-row events, such as Resolved Ts Event, you can specify a topic name in the `default_topic` field. The changefeed will create a topic accordingly to collect such changelogs.
+
+    - `ONE_TOPIC`: sends all changelogs to one specified Kafka topic.
+
+        If you want the changefeed to create one Kafka topic for all changelogs, set `dispatch_type` to `ONE_TOPIC`. Then, all Kafka messages in the changefeed will be sent to one Kafka topic. You can define the topic name in the `default_topic` field.
+
+- `topic_partition_config.replication_factor`: controls how many Kafka brokers each Kafka message is replicated to. The valid value ranges from [`min.insync.replicas`](https://kafka.apache.org/33/documentation.html#brokerconfigs_min.insync.replicas) to the total number of Kafka brokers.
+
+- `topic_partition_config.partition_num`: controls how many partitions exist in a topic. The valid value range is `[1, 10 * the total number of Kafka brokers]`.
+
+- `topic_partition_config.partition_dispatchers`: controls which partition a Kafka message will be sent to. Supported values: `INDEX_VALUE`, `TABLE`, `TS`, and `COLUMN`.
+
+    - `INDEX_VALUE`: distributes changelogs by primary key or index value to Kafka partitions.
+
+        If you want the changefeed to send Kafka messages of a table to different partitions, set `partition_type` to `INDEX_VALUE` and set the `index_name`. The primary key or index value of a row changelog will determine which partition the changelog is sent to. This distribution method provides a better partition balance and ensures row-level orderliness.
+
+    - `TABLE`: distributes changelogs by table to Kafka partitions.
+
+        If you want the changefeed to send Kafka messages of a table to one Kafka partition, set `partition_type` to `TABLE`. The table name of a row changelog will determine which partition the changelog is sent to. This distribution method ensures table orderliness but might cause unbalanced partitions.
+
+    - `TS`: distributes changelogs by timestamp to Kafka partitions.
+
+        If you want the changefeed to send Kafka messages to different Kafka partitions randomly, set `partition_type` to `TS`. The commitTs of a row changelog will determine which partition the changelog is sent to. This distribution method provides a better partition balance and ensures orderliness in each partition. However, multiple changes of a data item might be sent to different partitions and the consumer progress of different consumers might be different, which might cause data inconsistency. Therefore, the consumer needs to sort the data from multiple partitions by commitTs before consuming.
+
+    - `COLUMN`: distributes changelogs by column value to Kafka partitions.
+
+        If you want the changefeed to send Kafka messages of a table to different partitions, set `partition_type` to `COLUMN` and set the `columns`. The specified column values of a row changelog will determine which partition the changelog is sent to. This distribution method ensures orderliness in each partition and guarantees that the changelogs with the same column values are sent to the same partition.
+
+    For more information about the matching rules, see [Partition dispatchers](https://docs.pingcap.com/tidb/stable/ticdc-sink-to-kafka/#partition-dispatchers).
+
+- `column_selectors`: selects columns from events. TiDB Cloud only sends the data changes related to those columns to the downstream.
+
+    - `matcher`: specifies which tables the column selector applies to. For tables that do not match any rule, all columns are sent.
+    - `columns`: specifies which columns of the matched tables will be sent to the downstream.
+
+    For more information about the matching rules, see [Column selectors](https://docs.pingcap.com/tidb/stable/ticdc-sink-to-kafka/#column-selectors).
diff --git a/tidb-cloud/serverless-limitations.md b/tidb-cloud/serverless-limitations.md
index 40bf6a078bfbc..c4aec974342ad 100644
--- a/tidb-cloud/serverless-limitations.md
+++ b/tidb-cloud/serverless-limitations.md
@@ -45,7 +45,7 @@ We are constantly filling in the feature gaps between TiDB Cloud Serverless and

 ### Stream data

-- [Changefeed](/tidb-cloud/changefeed-overview.md) is not supported for TiDB Cloud Serverless currently.
+- You can manage changefeeds for TiDB Cloud Serverless only with [TiDB Cloud CLI](/tidb-cloud/get-started-with-cli.md) currently.
 - [Data Migration](/tidb-cloud/migrate-from-mysql-using-data-migration.md) is not supported for TiDB Cloud Serverless currently.

 ### Time to live (TTL)
diff --git a/tidb-cloud/tidb-cloud-billing-ticdc-rcu.md b/tidb-cloud/tidb-cloud-billing-ticdc-rcu.md
index ee5d21a3c55d2..6bf9afec474ea 100644
--- a/tidb-cloud/tidb-cloud-billing-ticdc-rcu.md
+++ b/tidb-cloud/tidb-cloud-billing-ticdc-rcu.md
@@ -6,6 +6,10 @@ aliases: ['/tidbcloud/tidb-cloud-billing-tcu']

 # Changefeed Billing

+> **Note:**
+>
+> This document is only applicable to TiDB Cloud Dedicated. For TiDB Cloud Serverless, the [changefeed](/tidb-cloud/serverless-changefeed-overview.md) feature is currently in beta and available for free.
+
 ## RCU cost

 TiDB Cloud measures the capacity of [changefeeds](/tidb-cloud/changefeed-overview.md) in TiCDC Replication Capacity Units (RCUs). When you [create a changefeed](/tidb-cloud/changefeed-overview.md#create-a-changefeed) for a cluster, you can select an appropriate specification. The higher the RCU, the better the replication performance. You will be charged for these TiCDC changefeed RCUs.