scylla-cdc-kafka-connect/README.md
15 additions & 2 deletions
@@ -62,7 +62,7 @@ Scylla CDC Source Connector exposes many configuration properties. These are the
 |`scylla.user`| No | The username to connect to Scylla with. If not set, no authorization is done. |
 |`scylla.password`| No | The password to connect to Scylla with. If not set, no authorization is done. |

-See additional configuration properties in the ["Advanced configuration"](#advanced-configuration) section.
+See additional configuration properties in the ["Advanced administration"](#advanced-administration) section.

 Example configuration (as `.properties` file):
 ```
@@ -498,10 +498,23 @@ The connector will generate the following data change event's value (with JSON s
 }
 ```

-## Advanced configuration
+## Advanced administration
+
+### Advanced configuration parameters
+
 In addition to the configuration parameters described in the ["Configuration"](#configuration) section, Scylla CDC Source Connector exposes the following (non-required) configuration parameters:

 | Property | Description |
 | --- | --- |
 |`scylla.query.time.window.size`| The size of windows queried by the connector. Changes are queried using `SELECT` statements with time restriction with width defined by this parameter. Value expressed in milliseconds. |
 |`scylla.confidence.window.size`| The size of the confidence window. It is necessary for the connector to avoid reading too fresh data from the CDC log due to the eventual consistency of Scylla. The problem could appear when a newer write reaches a replica before some older write. For a short period of time, when reading, it is possible for the replica to return only the newer write. The connector mitigates this problem by not reading a window of most recent changes (controlled by this parameter). Value expressed in milliseconds.|
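
To make the interplay of the two window parameters more concrete, here is a minimal Python sketch. It is not the connector's implementation: `next_query_window` and the sample values are hypothetical, and it assumes the reader simply waits until a whole query window lies outside the confidence window.

```python
from typing import Optional, Tuple


def next_query_window(
    last_end_ms: int,
    now_ms: int,
    query_window_ms: int,
    confidence_window_ms: int,
) -> Optional[Tuple[int, int]]:
    """Return the next [start, end) window to read, or None if it is too fresh.

    Hypothetical helper: it illustrates the description above and is not
    the connector's actual code.
    """
    start_ms = last_end_ms
    end_ms = start_ms + query_window_ms
    # Changes newer than (now - confidence window) are not read yet.
    safe_until_ms = now_ms - confidence_window_ms
    if end_ms > safe_until_ms:
        return None
    return (start_ms, end_ms)


# Sample values chosen for illustration only (all in milliseconds).
print(next_query_window(0, 100_000, 30_000, 30_000))       # (0, 30000)
print(next_query_window(60_000, 100_000, 30_000, 30_000))  # None
```

In the second call the window would end at 90,000 ms, which falls inside the most recent 30,000 ms, so the sketch defers reading it.
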
+
+### Configuration for large Scylla clusters
+#### Offset (progress) storage
+Scylla CDC Source Connector reads the CDC log by querying at [Vnode](https://docs.scylladb.com/architecture/ringarchitecture/) granularity. It uses Kafka Connect to store the current progress (offset) for each Vnode. By default, there are 256 Vnodes per Scylla node. Kafka Connect stores those offsets in its `connect-offsets` internal topic, which can grow large for big Scylla clusters. You can reduce the size of this topic by adjusting the following configuration options on it:
+
+1. `segment.bytes` or `segment.ms` - lowering them will make the compaction process trigger more often.
+2. `cleanup.policy=delete` and setting `retention.ms` to at least the TTL value of your Scylla CDC table (in milliseconds; the Scylla default is 24 hours). With this configuration, older offsets are deleted. Because `retention.ms` is at least the TTL of the Scylla CDC table, only offsets that have already expired in the source Scylla CDC table are deleted.
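
As a rough illustration of option 2, the following Python sketch derives `retention.ms` from the CDC TTL and builds the argument list for Kafka's `kafka-configs.sh` tool without running it. The helper names and the `localhost:9092` broker address are placeholders, and the sketch assumes the offsets topic is named `connect-offsets` as described above and that the 24-hour default TTL applies.

```python
from typing import Dict, List


def offsets_topic_overrides(cdc_ttl_seconds: int = 24 * 60 * 60) -> Dict[str, str]:
    """Topic-level overrides for option 2, with retention.ms equal to the CDC TTL."""
    return {
        "cleanup.policy": "delete",
        "retention.ms": str(cdc_ttl_seconds * 1000),  # seconds -> milliseconds
    }


def kafka_configs_args(
    bootstrap_server: str,
    overrides: Dict[str, str],
    topic: str = "connect-offsets",
) -> List[str]:
    """Build arguments for kafka-configs.sh (hypothetical helper, not executed here)."""
    add_config = ",".join(f"{key}={value}" for key, value in overrides.items())
    return [
        "kafka-configs.sh",
        "--bootstrap-server", bootstrap_server,
        "--entity-type", "topics",
        "--entity-name", topic,
        "--alter",
        "--add-config", add_config,
    ]


# "localhost:9092" is a placeholder broker address.
print(" ".join(kafka_configs_args("localhost:9092", offsets_topic_overrides())))
```

If your CDC table uses a non-default TTL, pass it in seconds, for example `offsets_topic_overrides(3600)`, so that `retention.ms` does not drop below it.
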
+
+#### `tasks.max` property
+By adjusting the `tasks.max` property, you can configure the maximum number of Kafka Connect worker tasks that will be started. By scaling up the number of nodes in your Kafka Connect cluster (and the `tasks.max` value), you can achieve higher throughput. In general, `tasks.max` should be greater than or equal to the number of nodes in the Kafka Connect cluster, so that the connector can start on each node. It should also be greater than or equal to the number of nodes in your Scylla cluster, especially if those nodes have a high shard count (32 or greater), as they have a large number of [CDC Streams](https://docs.scylladb.com/using-scylla/cdc/cdc-streams/).
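
The two guidelines above amount to a simple lower bound, sketched here in Python. `minimum_tasks_max` and the cluster sizes are hypothetical; treat the result as a starting point rather than a tuned value, since the text above suggests high shard counts may justify going higher.

```python
def minimum_tasks_max(connect_nodes: int, scylla_nodes: int) -> int:
    """Smallest tasks.max that satisfies both guidelines above."""
    if connect_nodes < 1 or scylla_nodes < 1:
        raise ValueError("node counts must be positive")
    return max(connect_nodes, scylla_nodes)


# Hypothetical cluster sizes: 3 Kafka Connect nodes, 6 Scylla nodes.
print(f"tasks.max={minimum_tasks_max(3, 6)}")  # prints "tasks.max=6"
```
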