Commit a2c3c18

Update "Advanced administration" section
Update "Advanced administration" section in Scylla CDC Source Connector README.
1 parent 06c5bbc commit a2c3c18

1 file changed: +15 -2 lines changed

scylla-cdc-kafka-connect/README.md

Lines changed: 15 additions & 2 deletions
@@ -62,7 +62,7 @@ Scylla CDC Source Connector exposes many configuration properties. These are the
| `scylla.user` | No | The username to connect to Scylla with. If not set, no authorization is done. |
| `scylla.password` | No | The password to connect to Scylla with. If not set, no authorization is done. |

-See additional configuration properties in the ["Advanced configuration"](#advanced-configuration) section.
+See additional configuration properties in the ["Advanced administration"](#advanced-administration) section.

Example configuration (as `.properties` file):
```
@@ -498,10 +498,23 @@ The connector will generate the following data change event's value (with JSON s
}
```

-## Advanced configuration
+## Advanced administration
+
+### Advanced configuration parameters
+
In addition to the configuration parameters described in the ["Configuration"](#configuration) section, Scylla CDC Source Connector exposes the following (non-required) configuration parameters:

| Property | Description |
| --- | --- |
| `scylla.query.time.window.size` | The size of windows queried by the connector. Changes are queried using `SELECT` statements with time restriction with width defined by this parameter. Value expressed in milliseconds. |
| `scylla.confidence.window.size` | The size of the confidence window. It is necessary for the connector to avoid reading too fresh data from the CDC log due to the eventual consistency of Scylla. The problem could appear when a newer write reaches a replica before some older write. For a short period of time, when reading, it is possible for the replica to return only the newer write. The connector mitigates this problem by not reading a window of most recent changes (controlled by this parameter). Value expressed in milliseconds. |
+
+### Configuration for large Scylla clusters
+#### Offset (progress) storage
+Scylla CDC Source Connector reads the CDC log by querying at [Vnode](https://docs.scylladb.com/architecture/ringarchitecture/) granularity. It uses Kafka Connect to store the current progress (offset) for each Vnode. By default, there are 256 Vnodes per Scylla node. Kafka Connect stores those offsets in its `connect-offsets` internal topic, which can grow large for big Scylla clusters. You can reduce the size of this topic by adjusting the following configuration options on it:
+
+1. `segment.bytes` or `segment.ms` - lowering them will make the compaction process trigger more often.
+2. Set `cleanup.policy=delete` and `retention.ms` to at least the TTL value of your Scylla CDC table (in milliseconds; the Scylla default is 24 hours). With this configuration, older offsets are deleted. Because `retention.ms` is at least the TTL value of your Scylla CDC table, only offsets that have already expired in the source Scylla CDC table are deleted.
+
+#### `tasks.max` property
+By adjusting the `tasks.max` property, you can configure how many Kafka Connect worker tasks will be started. By scaling up the number of nodes in your Kafka Connect cluster (and the `tasks.max` number), you can achieve higher throughput. In general, the `tasks.max` property should be greater than or equal to the number of nodes in the Kafka Connect cluster, to allow the connector to start on each node. The `tasks.max` property should also be greater than or equal to the number of nodes in your Scylla cluster, especially if those nodes have a high shard count (32 or greater), as they have a large number of [CDC Streams](https://docs.scylladb.com/using-scylla/cdc/cdc-streams/).
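
As a rough sketch of the second offset-storage option described in the diff, the snippet below derives `retention.ms` from the CDC TTL and prints a `kafka-configs.sh` invocation for the `connect-offsets` topic. The broker address `localhost:9092`, the presence of `kafka-configs.sh` on `PATH`, and the 24-hour TTL are assumptions made for illustration and are not taken from the README. Adjust them to your deployment, and use the topic name from your worker's `offset.storage.topic` setting if it differs.

```shell
#!/usr/bin/env bash
# Sketch only: compute retention.ms for the Kafka Connect offsets topic.
# Assumed values (hypothetical, not from the README): broker address,
# kafka-configs.sh being on PATH, and a CDC TTL of 24 hours.
CDC_TTL_SECONDS=86400                      # Scylla CDC default TTL (24 hours)
RETENTION_MS=$((CDC_TTL_SECONDS * 1000))   # retention.ms should be >= TTL in ms

# The command is printed rather than executed; remove the leading `echo`
# to apply the change to a real cluster.
echo kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name connect-offsets \
  --alter --add-config "cleanup.policy=delete,retention.ms=${RETENTION_MS}"
```

Printing the command first makes it easy to review the computed `retention.ms` before changing an internal Kafka Connect topic.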
