diff --git a/docs/fcbis.md b/docs/fcbis.md
new file mode 100644
index 00000000..76f88e9e
--- /dev/null
+++ b/docs/fcbis.md
@@ -0,0 +1,59 @@
+# File copy-based initial sync
+
+!!! admonition "Version added: [7.0.22-12](release_notes/7.0.22-12.md)"
+
+When a new member joins a replica set, it receives the data from an existing replica set member via the initial sync.
+
+In Percona Server for MongoDB, you can choose a file copy-based initial sync for a new node. You must have WiredTiger defined as the storage engine.
+
+The file copy-based initial sync method physically copies the data files from the sync source to the target. For large datasets (500 GB and more), it is much faster than the default [logical initial sync :octicons-link-external-16:](https://www.mongodb.com/docs/manual/core/replica-set-sync/#logical-initial-sync-process), which is especially beneficial in write-heavy environments. Using this initial sync method speeds up cluster scaling and improves restore performance.
+
+The file copy-based initial sync implementation is compatible with that of MongoDB Enterprise Advanced and has the same [configuration parameters](#configuration-parameters).
+
+To select the initial sync method, specify the `initialSyncMethod` parameter in the configuration file of the target node:
+
+```yaml
+setParameter:
+  initialSyncMethod: fileCopyBased
+```
+
+You can set this server parameter only at startup.
+
+## Workflow
+
+When you start a new node for the replica set, the workflow is the following:
+
+1. The new node (also referred to as the target node or the syncing node) selects the source node for the sync. The sync source is typically the first node that responds and has a suitable configuration (for example, it uses WiredTiger as the storage engine and has the same arrangement of files and indexes as the target node).
+2. The target node opens a backup cursor on the sync source. The backup cursor is used to retrieve the list of files to copy and the timestamp of the oplog end from the metadata file.
+3. The file copy starts. During this process, the target node lags behind the sync source because the sync source remains operational and its data keeps changing. The sync source is periodically checked to ensure that the lag stays within the defined threshold (`fileBasedInitialSyncMaxLagSec`).
+4. If the lag between the sync source and the target exceeds the defined threshold, the target node executes the `$backupCursorExtend` aggregation stage to retrieve the changes. Depending on the file copy duration, the target node can execute `$backupCursorExtend` several times, limited by the maximum number of cycles (`fileBasedInitialSyncMaxCyclesWithoutProgress`, 3 by default).
+5. When the files are copied and the lag between the sync source and the target is acceptable, the target node closes the backup cursor.
+6. The target node internally moves the copied files to the local `dbPath`, applies the oplog on top, and reconstructs timestamps to ensure data consistency.
+
+## Configuration parameters
+
+Use these configuration parameters to control the file copy-based initial sync flow. You can set them only at startup.
+
+| Name | Type | Default | Description |
+| --- | --- | --- | --- |
+| `initialSyncMethod` | string | `logical` | The initial sync method to use. Valid options are `fileCopyBased` and `logical`. |
+| `numInitialSyncAttempts` | integer | 10 | The number of attempts to make at replica set initial synchronization. |
+| `numInitialSyncConnectAttempts` | integer | 10 | The number of attempts to select and connect to a valid sync source. |
+| `fileBasedInitialSyncMaxLagSec` | integer | 300 | The maximum lag, in seconds, between the syncing node and the sync source at which the file copy-based initial sync is marked as successfully completed. |
+| `fileBasedInitialSyncMaxCyclesWithoutProgress` | integer | 3 | The maximum number of cycles to clone updates while the lag between the syncing node and the sync source remains higher than `fileBasedInitialSyncMaxLagSec`. |
+
+
+## Limitations
+
+Using file copy-based initial sync has the following limitations:
+
+* Don't run backups on either the sync source or the syncing node.
+* Don't write to the `local` database on the syncing node.
+* You cannot use the same sync source for multiple target nodes simultaneously because only one backup cursor can exist at any moment.
+* If you're using encrypted storage, Percona Server for MongoDB applies the encryption key from the sync source node to secure the data on the syncing node.
+* You must have WiredTiger defined as the storage engine to run file copy-based initial sync. The [Percona Memory Engine](inmemory.md) is not supported.
+
+
+
+
+
diff --git a/docs/psmdb-pro.md b/docs/psmdb-pro.md
index fa713dd7..7ec52a59 100644
--- a/docs/psmdb-pro.md
+++ b/docs/psmdb-pro.md
@@ -15,6 +15,7 @@ Find the list of solutions available in Percona Server for MongoDB Pro builds:
 | Name | Version added | Description |
 | ----------------------------------- | ------------- | ------------- |
 [FIPS support ](fips.md)| [7.0.4-2](release_notes/7.0.4-2.md) | FIPS mode provides a way to use FIPS-compliant encryption and run the Percona Server for MongoDB with the FIPS-140 certified library for OpenSSL. This helps customers meet minimum security requirements for cryptographic modules and testing in both hardware and software. |
+| [File copy-based initial sync](fcbis.md) | [7.0.22-12](release_notes/7.0.22-12.md) | File copy-based initial sync is an additional method of performing the initial sync for a new node in a replica set. For large datasets, this method is faster than the logical initial sync because it copies physical files rather than cloning data. |
 | Binaries with debug symbols | [7.0.18-11](release_notes/7.0.18-11.md) | By including debug symbols in the binary, Percona Server for MongoDB enables deeper integration with monitoring agent-based solutions. These agents can instrument the binary at runtime, providing more detailed telemetry data, such as performance metrics, error tracking, and function-level diagnostics. This enhanced observability allows for better monitoring of system health, faster identification of issues, and more granular insights into how the application performs in production environments. Including this information empowers teams to respond proactively to performance bottlenecks, optimize resource allocation, and improve the overall stability of the application with real-time insights.
 
 ## Benefits
diff --git a/mkdocs-base.yml b/mkdocs-base.yml
index 84bd61f2..36e44eb3 100644
--- a/mkdocs-base.yml
+++ b/mkdocs-base.yml
@@ -208,6 +208,7 @@ nav:
         - "Use local keyfile": keyfile.md
         - "Migrate from keyfile to Vault": encryption-mode-switch.md
     - fips.md
+    - fcbis.md
     - audit-logging.md
     - rate-limit.md
     - log-redaction.md
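
For reference when reviewing `docs/fcbis.md`: a minimal sketch of a target node's configuration file that sets every parameter from the "Configuration parameters" table explicitly. All values except `initialSyncMethod` are the defaults listed in that table. Placing the other four parameters under `setParameter` is an assumption based on how the page sets `initialSyncMethod`; it is not something this change confirms.

```yaml
# Sketch only: parameter names and defaults are taken from docs/fcbis.md.
# Assumption: every parameter is accepted under setParameter at startup,
# in the same way as initialSyncMethod.
setParameter:
  initialSyncMethod: fileCopyBased                  # default: logical
  numInitialSyncAttempts: 10                        # initial sync attempts
  numInitialSyncConnectAttempts: 10                 # sync source selection attempts
  fileBasedInitialSyncMaxLagSec: 300                # acceptable lag, in seconds
  fileBasedInitialSyncMaxCyclesWithoutProgress: 3   # max cycles of cloning updates
```

Only `initialSyncMethod` changes behavior here, since the rest are already defaults. Writing them out may still help reviewers see which settings bound the lag check and the `$backupCursorExtend` cycles described in the workflow.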