
Commit 633da2d

Update disaster recovery documentation for Harbor
- Clarified the access address for the Primary Object Storage in the disaster recovery steps.
- Renamed the "Primary-Standby Switchover Procedure" section to "Failover" for better clarity.
- Expanded the "Disaster Recovery" section to include recovery steps for the original Primary Harbor.
- Added details on automatic start/stop mechanisms for the disaster recovery instance, including configuration and script examples for managing Harbor and PostgreSQL instances.
1 parent d9c3db7 commit 633da2d

docs/en/solutions/How_to_perform_disaster_recovery_for_harbor.md

Lines changed: 161 additions & 3 deletions
@@ -108,7 +108,7 @@ You need to create a CephObjectStoreUser in advance to obtain the access credent
You only need to create the CephObjectStoreUser on the Primary Object Storage. The user information will be automatically synchronized to the Secondary Object Storage through the disaster recovery replication mechanism.
:::

-2. This `PRIMARY_OBJECT_STORAGE_ADDRESS` is the access address of the Object Storage, you can get it from the step [Configure External Access for Primary Zone](https://docs.alauda.io/container_platform/4.1/storage/storagesystem_ceph/how_to/disaster_recovery/dr_object.html#configure-external-access-for-primary-zone) of `Object Storage Disaster Recovery`.
+2. This `PRIMARY_OBJECT_STORAGE_ADDRESS` is the access address of the Object Storage; you can get it from the step [Configure External Access for Primary Zone](https://docs.alauda.io/container_platform/4.1/storage/storagesystem_ceph/how_to/disaster_recovery/dr_object.html#address) of `Object Storage Disaster Recovery`.

3. Create a Harbor registry bucket on Primary Object Storage using mc; in this example, the bucket name is `harbor-registry`.

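How the bucket is created depends on your environment; the following is a minimal sketch, assuming an mc alias named `primary` (a placeholder) that points at the `PRIMARY_OBJECT_STORAGE_ADDRESS` endpoint and uses the access credentials from the CephObjectStoreUser created earlier:

```bash
# Register the Primary Object Storage endpoint under a local alias (placeholder values).
mc alias set primary "$PRIMARY_OBJECT_STORAGE_ADDRESS" "$ACCESS_KEY" "$SECRET_KEY"

# Create the Harbor registry bucket used in this example and verify it exists.
mc mb primary/harbor-registry
mc ls primary
```
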
@@ -279,7 +279,7 @@ spec:
replicas: 0
```

-### Primary-Standby Switchover Procedure in Disaster Scenarios
+### Failover

1. First confirm that none of the Primary Harbor components are still running; otherwise, stop all Primary Harbor components first (see the sketch after this list).
2. Promote the Secondary PostgreSQL to Primary PostgreSQL. Refer to the switchover procedure in the `PostgreSQL Hot Standby Cluster Configuration Guide`.
@@ -311,7 +311,27 @@ spec:
5. Test image push and pull to verify that Harbor is working properly.
6. Switch external access addresses to Secondary Harbor.

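As a minimal sketch of steps 1 and 2 above, assuming the Primary Harbor and the Secondary PostgreSQL are managed by the operators shown in the script examples later in this document (namespace and resource names are placeholders):

```bash
# Step 1: stop all Primary Harbor components by scaling them to 0 (run against the primary cluster).
kubectl -n harbor-ns patch harbor harbor --type=merge \
  -p '{"spec":{"helmValues":{"core":{"replicas":0},"portal":{"replicas":0},"jobservice":{"replicas":0},"registry":{"replicas":0},"trivy":{"replicas":0}}}}'

# Step 2: promote the Secondary PostgreSQL to primary (run against the secondary cluster).
kubectl -n pg-namespace patch pg acid-pg --type=merge \
  -p '{"spec":{"clusterReplication":{"isReplica":false},"numberOfInstances":2}}'
```
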
-### Disaster Recovery Data Check
+### Disaster Recovery

When the primary cluster recovers from a disaster, you can restore the original Primary Harbor to operate as a Secondary Harbor. Follow these steps to perform the recovery:

1. Set the replica count of all Harbor components to 0.
2. Configure the original Primary PostgreSQL to operate as Secondary PostgreSQL according to the `PostgreSQL Hot Standby Cluster Configuration Guide`.
3. Convert the original Primary Object Storage to Secondary Object Storage:

```bash
# From within the recovered zone, pull the latest realm configuration from the current master zone:
radosgw-admin realm pull --url={url-to-master-zone-gateway} \
  --access-key={access-key} --secret={secret}
# Make the recovered zone the master and default zone:
radosgw-admin zone modify --rgw-realm=<realm-name> --rgw-zonegroup=<zone-group-name> --rgw-zone=<primary-zone-name> --master --default
```

After completing these steps, the original Primary Harbor will operate as a Secondary Harbor.

If you need the original Primary Harbor to act as the Primary Harbor again, follow the Failover procedure to promote it back to Primary Harbor, and then configure the Harbor that took over as Primary during the disaster to operate as Secondary Harbor.

### Data Sync Check

Check the synchronization status of Object Storage and PostgreSQL to ensure that the disaster recovery is successful.

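The exact checks depend on your storage and database setup; as a minimal sketch, assuming the `Alauda Build of Rook-Ceph` toolbox pod and the `Alauda support for PostgreSQL` cluster used elsewhere in this document (pod, namespace, and cluster names are placeholders):

```bash
# Object Storage: report multisite replication status from the rook-ceph toolbox pod.
TOOLS_POD=$(kubectl -n rook-ceph get po -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}')
kubectl -n rook-ceph exec "$TOOLS_POD" -- radosgw-admin sync status

# PostgreSQL: on the primary, confirm the standby is streaming and inspect its replay lag.
kubectl -n pg-namespace exec acid-pg-0 -- \
  psql -U postgres -c "SELECT client_addr, state, replay_lag FROM pg_stat_replication;"
```
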
@@ -353,3 +373,141 @@ The RTO represents the maximum acceptable downtime during disaster recovery. Thi
The operational steps are similar to building a Harbor disaster recovery solution with `Alauda Build of Rook-Ceph` and `Alauda support for PostgreSQL`. Simply replace Object Storage and PostgreSQL with other object storage and PostgreSQL solutions.

Ensure that the Object Storage and PostgreSQL solutions support disaster recovery capabilities.

## Automatic Start/Stop of Disaster Recovery Instance

This mechanism enables automatic activation of the Secondary Harbor instance when a disaster occurs. It supports custom check mechanisms through user-defined scripts and provides control over Harbor dependency configurations.

```mermaid
flowchart TD
    Start[Monitoring Program] --> CheckScript[Check if Instance Should Start]
    CheckScript -->|"Yes (Script exit 0)"| StartScript[Execute StartScript]
    CheckScript -->|"No (Script exit Non-zero)"| StopScript[Execute StopScript]
```

### How to Configure and Run the Auto Start/Stop Program

1. Prepare the configuration file `config.yaml`:

```yaml
check_script: /path/to/check.sh  # Path to the script that checks if the instance should start
start_script: /path/to/start.sh  # Path to the script that starts the Harbor instance
stop_script: /path/to/stop.sh    # Path to the script that stops the Harbor instance
check_interval: 30s
failure_threshold: 3
script_timeout: 10s
```

2. Create the corresponding script files:

- **check.sh**: This script must be customized based on your internal implementation. It should return exit code 0 when the current cluster instance should be started, and a non-zero exit code otherwise. The following is a simple DNS IP check example (do not use directly in production):

```bash
HARBOR_DOMAIN="${HARBOR_DOMAIN:-}"
HARBOR_IP="${HARBOR_IP:-}"

# Resolve the Harbor domain and compare the result with the expected IP of this cluster.
RESOLVED_IP=$(nslookup "$HARBOR_DOMAIN" 2>/dev/null | grep -A 1 "Name:" | grep "Address:" | awk '{print $2}' | head -n 1)
if [ "$RESOLVED_IP" = "$HARBOR_IP" ]; then
  exit 0
else
  exit 1
fi
```

- **start.sh**: The start script should include checks for Harbor dependencies and the startup of the Harbor instance.

```bash
# Check and control dependencies, such as verifying that the database is the primary instance
# and that the object storage is ready.
# <dependencies start script>

# Start Harbor script - this section is required
HARBOR_NAMESPACE="${HARBOR_NAMESPACE:-harbor-ns}"
HARBOR_NAME="${HARBOR_NAME:-harbor}"
HARBOR_REPLICAS="${HARBOR_REPLICAS:-1}"
kubectl -n "$HARBOR_NAMESPACE" patch harbor "$HARBOR_NAME" --type=merge -p "{\"spec\":{\"helmValues\":{\"core\":{\"replicas\":$HARBOR_REPLICAS},\"portal\":{\"replicas\":$HARBOR_REPLICAS},\"jobservice\":{\"replicas\":$HARBOR_REPLICAS},\"registry\":{\"replicas\":$HARBOR_REPLICAS},\"trivy\":{\"replicas\":$HARBOR_REPLICAS}}}}"
```

- **stop.sh**: The stop script should include shutdown procedures for Harbor dependencies and the Harbor instance.

```bash
# Stop Harbor script - this section is required
HARBOR_NAMESPACE="${HARBOR_NAMESPACE:-harbor-ns}"
HARBOR_NAME="${HARBOR_NAME:-harbor}"
kubectl -n "$HARBOR_NAMESPACE" patch harbor "$HARBOR_NAME" --type=merge -p '{"spec":{"helmValues":{"core":{"replicas":0},"portal":{"replicas":0},"jobservice":{"replicas":0},"registry":{"replicas":0},"trivy":{"replicas":0}}}}'

# Check and control dependencies, such as setting the database to replica mode.
# <dependencies stop script>
```

3. Run the control program:

```bash
# Mount the script and config directories; xxx stands for the control program image.
docker run -d -v <script dir>:/opt/script -v <config dir>:/opt/config xxx -- -c /opt/config/config.yaml
```

### `Alauda support for PostgreSQL` Start/Stop Script Examples

When using the `Alauda support for PostgreSQL` solution with the `PostgreSQL Hot Standby Cluster Configuration Guide` to configure a disaster recovery cluster, you need to configure replication information in both Primary and Secondary PostgreSQL clusters. This ensures that during automatic failover, you only need to modify `clusterReplication.isReplica` and `numberOfInstances` to complete the switchover:

**Primary Configuration:**

```yaml
clusterReplication:
  enabled: true
  isReplica: false
  peerHost: 192.168.130.206 # Secondary cluster node IP
  peerPort: 31661           # Secondary cluster NodePort
  replSvcType: NodePort
  bootstrapSecret: standby-bootstrap-secret
```

The `standby-bootstrap-secret` should be configured according to the `Standby Cluster Configuration` section in the `PostgreSQL Hot Standby Cluster Configuration Guide`, using the same value as the Secondary cluster.

**Secondary Configuration:**

```yaml
clusterReplication:
  enabled: true
  isReplica: true
  peerHost: 192.168.12.108 # Primary cluster node IP
  peerPort: 30078          # Primary cluster NodePort
  replSvcType: NodePort
  bootstrapSecret: standby-bootstrap-secret
```

#### Start Script Example

```bash
POSTGRES_NAMESPACE="${POSTGRES_NAMESPACE:-pg-namespace}"
POSTGRES_CLUSTER="${POSTGRES_CLUSTER:-acid-pg}"
# Promote this PostgreSQL cluster to primary (isReplica: false) and run two instances.
kubectl -n "$POSTGRES_NAMESPACE" patch pg "$POSTGRES_CLUSTER" --type=merge -p '{"spec":{"clusterReplication":{"isReplica":false},"numberOfInstances":2}}'
```

#### Stop Script Example

```bash
POSTGRES_NAMESPACE="${POSTGRES_NAMESPACE:-pg-namespace}"
POSTGRES_CLUSTER="${POSTGRES_CLUSTER:-acid-pg}"
# Demote this PostgreSQL cluster to replica (isReplica: true) and scale down to one instance.
kubectl -n "$POSTGRES_NAMESPACE" patch pg "$POSTGRES_CLUSTER" --type=merge -p '{"spec":{"clusterReplication":{"isReplica":true},"numberOfInstances":1}}'
```

### Alauda Build of Rook-Ceph Start/Stop Script Examples

- **Start Script Example**: For more details, refer to [Object Storage Disaster Recovery](https://docs.alauda.io/container_platform/4.1/storage/storagesystem_ceph/how_to/disaster_recovery/dr_object.html).

```bash
REALM_NAME="${REALM_NAME:-real}"
ZONE_GROUP_NAME="${ZONE_GROUP_NAME:-group}"
ZONE_NAME="${ZONE_NAME:-zone}"

ACCESS_KEY=$(kubectl -n rook-ceph get secrets "${REALM_NAME}-keys" -o jsonpath='{.data.access-key}' 2>/dev/null | base64 -d)
SECRET_KEY=$(kubectl -n rook-ceph get secrets "${REALM_NAME}-keys" -o jsonpath='{.data.secret-key}' 2>/dev/null | base64 -d)
ENDPOINT=$(kubectl -n rook-ceph get cephobjectzone realm-zone -o jsonpath='{.spec.customEndpoints[0]}')
TOOLS_POD=$(kubectl -n rook-ceph get po -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)

kubectl -n rook-ceph exec "$TOOLS_POD" -- radosgw-admin realm pull --url="$ENDPOINT" --access-key="$ACCESS_KEY" --secret="$SECRET_KEY"
kubectl -n rook-ceph exec "$TOOLS_POD" -- radosgw-admin zone modify --rgw-realm="$REALM_NAME" --rgw-zonegroup="$ZONE_GROUP_NAME" --rgw-zone="$ZONE_NAME" --master
```

- **Stop Script**: No action is required when stopping Alauda Build of Rook-Ceph, so you can add an empty script or skip this step.
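
If the control program requires a stop script to be present, a no-op placeholder is sufficient, for example:

```bash
#!/bin/bash
# Nothing to do: Alauda Build of Rook-Ceph needs no action when this instance is stopped.
exit 0
```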
