docs/pages/operating/physical_replication.md
Lines changed: 26 additions & 26 deletions
@@ -13,8 +13,8 @@ pull the physical logs from the source machine and apply it locally.
 It is important to note that physreps are applying the logs out-of-band and, thus,
 should not be considered part of the source cluster.
 
-In order to enable replication, `replicate_from` must be added to the `copycomdb2 -d`
-physrep's lrl file. This line can either take a valid Comdb2 cluster tier, a
+In order to enable replication, `replicate_from` must be added to the physrep's
+lrl file. This line can either take a valid Comdb2 cluster tier, a
 hostname or a comma-separated list of hostnames (without any space).
 
 ```
@@ -30,9 +30,9 @@ physrep added to the system would add to the overall cost incurred by the source
 host/cluster.
 
 In order to avoid having source support all physreps directly, one could setup tiered
-replication (not to be confused with machine classes), in which some physical replicants
-could become the source for the other replicants, thus keeping some load off of
-the top-level source host/cluster.
+replication, in which some physical replicants could become the source for other
+physical replicants, thus keeping some of the load off of the top-level source
+host/cluster. A physical replicant's 'tier' is its distance from the source cluster.
 
 ```
 
@@ -57,19 +57,20 @@ These tables automatically get updated to reflect the changes as replicants
 join or leave the system and thus are not designed to be manually modified
 under normal circumstances. In order to keep the load evenly spread, these table
 are consulted to ensure a certain fanout `physrep_fanout` is maintained across all
-the nodes. The LSN (file:offset) information in `comdb2_physreps` table is used by all the
-nodes to pause log-deletion.
+the nodes. The LSN file and offset in the `comdb2_physreps` table are
+used by all the nodes to pause log-deletion.
 
 ## Algorithm
 
 On start, a physical replicant executes `sys.physrep.register_replicant()` against
-the `physrep_metadb`, which in turn, responds with a list of potential nodes
-(by doing a graph traversal on nodes (`comdb2_physreps`) and edges (`comdb2_physrep_connections`)
-, starting at the source as root node/tier 0, ref: `lua/lib/physrep_register_replicant.lua`) that
-can be used as the source of physical logs. The replicant then picks up a node from
-the list and tries to connect to it. On successful connection, the replicant executes
-`sys.physrep.update_registry()` against the `physrep_metadb`, confirming that the
-replicant is now `connected` to a node.
+the `physrep_metadb`, which in turn responds with a list of potential nodes that
+can be used as the source of physical logs. The metadb creates this list by doing
+a graph traversal of nodes (listed in `comdb2_physreps`) connected by edges (listed
+in `comdb2_physrep_connections`), starting at the source db as the root node (tier 0).
+The physical replicant then chooses a node from this list and tries to connect to it.
+On successful connection, the replicant executes `sys.physrep.update_registry()`
+against the `physrep_metadb`, confirming that the replicant is now `connected` to a
+node.
 
 Upon successful registration, the physical replicants execute
 ```SELECT .. FROM .. comdb2_transaction_logs```
@@ -117,18 +118,17 @@ separate database running in the a lower (development) tier.
 
 ### Alternate Metadbs
 
-Physrep setup supports configuring multiple alternate metadbs in addition to the primary
-`physrep_metadb`. The idea was to setup an alternate metadb in a separate tier/class (say beta) so that
-the production tier/class doesn't have to directly interact with its lower level tiers (this is an update to the
-cross-tier replication model discussed above).
-
-Key gotchas:
-* A physical replicant registers (`register_replicant`) only against the primary metadb (never an alternate).
-* The Metadb does not provide transaction logs, but returns candidate source nodes to replicate from (based on fanout and tree traversal, refer to [algorithm](#algorithm)).
-* Alternate metadbs are primarily used by the source (physrep-parent) side to try and establish a reverse connection based on the `comdb2_physrep_sources` table.
-* The source cluster writes replication metadata (entries into `comdb2_physreps`, `comdb2_physrep_connections`) to primary physrep_metadb.
-* If a source is itself a physrep (tiered chain), it still uses only its primary metadb for its own registration, while reverse connecting outwards based on configured alternate metadbs or physrep_metadb.
+Physrep setup supports configuring multiple alternate metadbs in addition to the primary
+`physrep_metadb`. The intention is to allow prod databases to connect only to prod
+metadbs. A beta physical replicant that replicates from a prod instance can serve as an
+intermediary source for lower-tiered physreps by listing an alternate metadb that is
+accessible to the lower-tiered physreps. Lower-tiered physreps would then replicate from
+the beta instance using this alternate metadb as their primary metadb.
 
+Key takeaways:
+* A physical replicant registers (`register_replicant`) only against its primary metadb.
+* A source database will periodically update its `physrep_metadb` and all alternate metadbs with its current LSN and first-logfile via the `physrep_keepalive` function.
+* A source database uses the `comdb2_physrep_sources` table in its primary metadb and all alternate metadbs to determine whether it should initiate a reverse connection to a child database.
 * replicate_from dbname @host/tier: This line sets the source host/cluster. It is required for all physical replicants.
 * replicate_wait <sec>: Tells the physical replicant to wait for this many seconds before applying the log records.
 * physrep_metadb: If set, all the nodes will connect to this database (as against source host/cluster mentioned via `replicate_from`) for replication metadata tables
-* alternate_metadb <dbname> <host>: If set, parent node will try to establish reverse connection based on the `comdb2_physrep_sources` table.
+* alternate_metadb <dbname> <host>: If set, the parent node will use this metadb in addition to its normal metadb when updating stats and when determining whether it should establish a reverse connection.
 * physrep_fanout_override <dbname> <fanout>: This is set on the metadb, and allows per-database overrides of the 'physrep_fanout' tunable. The 'physrep_fanout_override' message-trap allows this to be set dynamically. The 'physrep_fanout_dump' message-trap prints the current overrides.
 * physrep_ignore <tables>: All the log records that belong to any of these tables are ignored by physical replicants
 * nonames: This configuration forces system database file names to not carry the database name. This setting is required for physical-log based replication to work properly.
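
The registration step described under `## Algorithm` (a graph traversal from the source at tier 0, with `physrep_fanout` bounding the number of children per node) can be illustrated with a small sketch. This is only a rough Python model under stated assumptions, not a transcription of the metadb's actual stored procedure: the function name `candidate_sources`, the `(parent, child)` edge representation, and the "fewer children than fanout" eligibility rule are all hypothetical simplifications of what `comdb2_physreps` and `comdb2_physrep_connections` hold.

```python
from collections import deque


def candidate_sources(source, connections, fanout):
    """Return (node, tier) pairs that still have spare fanout, nearest tier first.

    source      -- name of the root node (tier 0); hypothetical representation
    connections -- iterable of (parent, child) edges, loosely modelling
                   the rows of comdb2_physrep_connections
    fanout      -- maximum number of children allowed per node
    """
    # Build an adjacency list: parent -> list of children.
    children = {}
    for parent, child in connections:
        children.setdefault(parent, []).append(child)

    candidates = []
    seen = {source}
    queue = deque([(source, 0)])  # breadth-first, so lower tiers come first
    while queue:
        node, tier = queue.popleft()
        kids = children.get(node, [])
        # Assumption: a node is eligible while it serves fewer than `fanout` children.
        if len(kids) < fanout:
            candidates.append((node, tier))
        for kid in kids:
            if kid not in seen:
                seen.add(kid)
                queue.append((kid, tier + 1))  # a child's tier is its parent's tier + 1
    return candidates


# The source already serves two replicants, so with fanout=2 it is full
# and only the replicants themselves are offered as sources.
edges = [("src", "r1"), ("src", "r2"), ("r1", "r3")]
print(candidate_sources("src", edges, fanout=2))
# → [('r1', 1), ('r2', 1), ('r3', 2)]
```

The tier reported for each node matches the definition added in this change: its distance from the source cluster. Breadth-first order is used here only so that candidates closer to the source are listed first; whether the real procedure orders or filters candidates this way is not stated in the diff.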