
Commit 5522794

Update disaster recovery documentation for Harbor
- Clarified the access address for the Primary Object Storage in the disaster recovery steps.
- Renamed the "Primary-Standby Switchover Procedure" section to "Failover" for better clarity.
- Expanded the "Disaster Recovery" section to include recovery steps for the original Primary Harbor.
- Added details on automatic start/stop mechanisms for the disaster recovery instance, including configuration and script examples for managing Harbor and PostgreSQL instances.
1 parent d9c3db7 commit 5522794

File tree

1 file changed: +219 -3 lines changed

docs/en/solutions/How_to_perform_disaster_recovery_for_harbor.md

Lines changed: 219 additions & 3 deletions
@@ -108,7 +108,7 @@ You need to create a CephObjectStoreUser in advance to obtain the access credent
You only need to create the CephObjectStoreUser on the Primary Object Storage. The user information will be automatically synchronized to the Secondary Object Storage through the disaster recovery replication mechanism.
:::

-2. This `PRIMARY_OBJECT_STORAGE_ADDRESS` is the access address of the Object Storage, you can get it from the step [Configure External Access for Primary Zone](https://docs.alauda.io/container_platform/4.1/storage/storagesystem_ceph/how_to/disaster_recovery/dr_object.html#configure-external-access-for-primary-zone) of `Object Storage Disaster Recovery`.
+2. This `PRIMARY_OBJECT_STORAGE_ADDRESS` is the access address of the Object Storage, you can get it from the step [Configure External Access for Primary Zone](https://docs.alauda.io/container_platform/4.1/storage/storagesystem_ceph/how_to/disaster_recovery/dr_object.html#address) of `Object Storage Disaster Recovery`.

3. Create a Harbor registry bucket on Primary Object Storage using mc, in this example, the bucket name is `harbor-registry`.

@@ -279,7 +279,7 @@ spec:
  replicas: 0
```

-### Primary-Standby Switchover Procedure in Disaster Scenarios
+### Failover

1. First confirm that all Primary Harbor components have stopped; if any are still running, stop all Primary Harbor components first.
2. Promote the Secondary PostgreSQL to Primary PostgreSQL. Refer to the switchover procedure in the `PostgreSQL Hot Standby Cluster Configuration Guide`.
@@ -311,7 +311,27 @@ spec:
5. Test image push and pull to verify that Harbor is working properly (a sketch follows this list).
6. Switch external access addresses to Secondary Harbor.

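A minimal way to exercise step 5 with Docker, assuming the Secondary Harbor is reachable at `harbor.example.com` and a project named `library` exists (both names are placeholders, not values from this guide):

```bash
# Log in to the Secondary Harbor (domain, project, and credentials are placeholders)
docker login harbor.example.com

# Push a small test image through the Secondary Harbor
docker pull busybox:latest
docker tag busybox:latest harbor.example.com/library/busybox:dr-test
docker push harbor.example.com/library/busybox:dr-test

# Remove the local copy and pull it back to confirm the round trip
docker rmi harbor.example.com/library/busybox:dr-test
docker pull harbor.example.com/library/busybox:dr-test
```
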
-### Disaster Recovery Data Check
+### Disaster Recovery

When the primary cluster recovers from a disaster, you can restore the original Primary Harbor to operate as a Secondary Harbor. Follow these steps to perform the recovery:

1. Set the replica count of all Harbor components to 0 (a kubectl sketch is shown after the object storage commands below).
2. Configure the original Primary PostgreSQL to operate as Secondary PostgreSQL according to the `PostgreSQL Hot Standby Cluster Configuration Guide`.
3. Convert the original Primary Object Storage to Secondary Object Storage.

```bash
# From within the recovered zone, pull the latest realm configuration from the current master zone:
radosgw-admin realm pull --url={url-to-master-zone-gateway} \
  --access-key={access-key} --secret={secret}
# Make the recovered zone the master and default zone:
radosgw-admin zone modify --rgw-realm=<realm-name> --rgw-zonegroup=<zone-group-name> --rgw-zone=<primary-zone-name> --master
```

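For step 1, the replica counts can be set to 0 by patching the Harbor resource; a minimal sketch, assuming the Harbor instance is named `harbor` in the `harbor-ns` namespace (the same assumption used by the stop script later in this document):

```bash
# Scale all components of the original Primary Harbor down to 0 replicas
HARBOR_NAMESPACE="${HARBOR_NAMESPACE:-harbor-ns}"
HARBOR_NAME="${HARBOR_NAME:-harbor}"
kubectl -n "$HARBOR_NAMESPACE" patch harbor "$HARBOR_NAME" --type=merge -p '{"spec":{"helmValues":{"core":{"replicas":0},"portal":{"replicas":0},"jobservice":{"replicas":0},"registry":{"replicas":0},"trivy":{"replicas":0}}}}'
```
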
After completing these steps, the original Primary Harbor will operate as a Secondary Harbor.

If you need the original Primary Harbor to operate as the Primary Harbor again, follow the Failover procedure to promote the current Secondary Harbor (the original Primary) to Primary Harbor, and then configure the Harbor that was promoted during the disaster to operate as a Secondary Harbor using the steps above.

### Data Sync Check

Check the synchronization status of Object Storage and PostgreSQL to ensure that the disaster recovery is successful.

@@ -353,3 +373,199 @@ The RTO represents the maximum acceptable downtime during disaster recovery. Thi
The operational steps are similar to building a Harbor disaster recovery solution with `Alauda Build of Rook-Ceph` and `Alauda support for PostgreSQL`. Simply replace Object Storage and PostgreSQL with other object storage and PostgreSQL solutions.

Ensure that the Object Storage and PostgreSQL solutions support disaster recovery capabilities.

## Automatic Start/Stop of Disaster Recovery Instance

This mechanism enables automatic activation of the Secondary Harbor instance when a disaster occurs. It supports custom check mechanisms through user-defined scripts and provides control over Harbor dependency configurations.

```mermaid
flowchart TD
    Start[Monitoring Program] --> CheckScript[Check if Instance Should Start]
    CheckScript -->|"Yes (Script exit 0)"| StartScript[Execute StartScript]
    CheckScript -->|"No (Script exit Non-zero)"| StopScript[Execute StopScript]
```

### How to Configure and Run the Auto Start/Stop Program

1. Prepare the configuration file `config.yaml`:

```yaml
check_script: /path/to/check.sh # Path to the script that checks if the instance should start
start_script: /path/to/start.sh # Path to the script that starts the Harbor instance
stop_script: /path/to/stop.sh # Path to the script that stops the Harbor instance
check_interval: 30s # How often the check script is executed
failure_threshold: 3 # Number of consecutive failed checks before the stop script is executed (assumed semantics)
script_timeout: 10s # Maximum time each script is allowed to run
```

2. Create the corresponding script files:

- **check.sh**: This script must be customized based on your internal implementation. It should return exit code 0 when the current cluster instance should be started, and a non-zero exit code otherwise. The following is a simple DNS IP check example (do not use directly in production):

```bash
HARBOR_DOMAIN="${HARBOR_DOMAIN:-}"
HARBOR_IP="${HARBOR_IP:-}"

RESOLVED_IP=$(nslookup "$HARBOR_DOMAIN" 2>/dev/null | grep -A 1 "Name:" | grep "Address:" | awk '{print $2}' | head -n 1)
if [ "$RESOLVED_IP" = "$HARBOR_IP" ]; then
  exit 0
else
  exit 1
fi
```

- **start.sh**: The start script should include checks for Harbor dependencies and the startup of the Harbor instance.

```bash
# Check and control dependencies, such as verifying if the database is the primary instance
# and if the object storage is ready
#####################################
# Add your PostgreSQL start script here.
# This script should promote the secondary PostgreSQL to primary role and ensure
# the database is ready to serve Harbor before starting Harbor components.
#####################################

#####################################
# Add your S3/Object Storage start script here.
# This script should promote the secondary object storage to primary role and ensure
# the storage system is ready to serve Harbor before starting Harbor components.
#####################################

# Start Harbor script - this section is required
HARBOR_NAMESPACE="${HARBOR_NAMESPACE:-harbor-ns}"
HARBOR_NAME="${HARBOR_NAME:-harbor}"
HARBOR_REPLICAS="${HARBOR_REPLICAS:-1}"
kubectl -n "$HARBOR_NAMESPACE" patch harbor "$HARBOR_NAME" --type=merge -p "{\"spec\":{\"helmValues\":{\"core\":{\"replicas\":$HARBOR_REPLICAS},\"portal\":{\"replicas\":$HARBOR_REPLICAS},\"jobservice\":{\"replicas\":$HARBOR_REPLICAS},\"registry\":{\"replicas\":$HARBOR_REPLICAS},\"trivy\":{\"replicas\":$HARBOR_REPLICAS}}}}"
```

- **stop.sh**: The stop script should include shutdown procedures for Harbor dependencies and the Harbor instance.

```bash
# Stop Harbor script - this section is required
HARBOR_NAMESPACE="${HARBOR_NAMESPACE:-harbor-ns}"
HARBOR_NAME="${HARBOR_NAME:-harbor}"
kubectl -n "$HARBOR_NAMESPACE" patch harbor "$HARBOR_NAME" --type=merge -p '{"spec":{"helmValues":{"core":{"replicas":0},"portal":{"replicas":0},"jobservice":{"replicas":0},"registry":{"replicas":0},"trivy":{"replicas":0}}}}'

# Check and control dependencies, such as setting the database to replica mode
#####################################
# Add your PostgreSQL stop script here.
# This script should configure the PostgreSQL cluster to operate as a replica
# and scale down instances when stopping Harbor components.
#####################################

#####################################
# Add your S3/Object Storage stop script here.
# This script should handle any necessary cleanup or configuration changes
# for the object storage when stopping Harbor components.
#####################################
```

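The scripts and `config.yaml` are mounted into the control program from ConfigMaps referenced by the Deployment below; a minimal sketch of creating them, assuming the ConfigMap names `harbor-dr-scripts` and `harbor-dr-config` and the `harbor-ns` namespace (all placeholders to substitute into the Deployment manifest):

```bash
# Package the scripts and the configuration file as ConfigMaps in the Harbor namespace
kubectl -n harbor-ns create configmap harbor-dr-scripts \
  --from-file=check.sh --from-file=start.sh --from-file=stop.sh
kubectl -n harbor-ns create configmap harbor-dr-config \
  --from-file=config.yaml
```
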
3. Deploy the control program as a Deployment in the Harbor namespace:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: harbor-disaster-recovery-controller
  namespace: harbor-ns # Use the same namespace where Harbor is deployed
spec:
  replicas: 1
  selector:
    matchLabels:
      app: harbor-disaster-recovery-controller
  template:
    metadata:
      labels:
        app: harbor-disaster-recovery-controller
    spec:
      containers:
        - name: controller
          image: xxx # Replace with your control program image
          command: ["--", "-c", "/opt/config/config.yaml"]
          volumeMounts:
            - name: script
              mountPath: /opt/script
            - name: config
              mountPath: /opt/config
      volumes:
        - name: script
          configMap:
            name: <script-configmap-name> # Replace with your ConfigMap name for scripts
        - name: config
          configMap:
            name: <config-configmap-name> # Replace with your ConfigMap name for config
```

> **Note**: Ensure that the ServiceAccount used by the Deployment has the necessary RBAC permissions in the target namespace: it must be able to modify the Harbor CRD resources in order to start and stop Harbor components, as well as any other resources controlled by your custom start/stop scripts (such as database resources, object storage configurations, etc.).

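As an illustration only, the following is a minimal sketch of a Role and RoleBinding covering the Harbor portion of those permissions; the `apiGroups` value and the ServiceAccount name are assumptions (use the API group of the Harbor CRD actually installed in your cluster and the ServiceAccount your Deployment runs as), and permissions for PostgreSQL, object storage, or other resources touched by your scripts must be added in the same way:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: harbor-dr-controller
  namespace: harbor-ns # Same namespace as the controller Deployment
rules:
  - apiGroups: ["goharbor.io"] # Assumption: replace with the API group of your Harbor CRD
    resources: ["harbors"]
    verbs: ["get", "list", "watch", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: harbor-dr-controller
  namespace: harbor-ns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: harbor-dr-controller
subjects:
  - kind: ServiceAccount
    name: default # Assumption: replace with the ServiceAccount used by the Deployment
    namespace: harbor-ns
```
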
Apply the Deployment:

```bash
kubectl apply -f harbor-disaster-recovery-controller.yaml
```

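After applying, it is worth confirming that the controller is running and watching its logs; a minimal sketch (the Deployment name and namespace follow the example above, and the log output depends entirely on your control program):

```bash
# Wait for the controller to become ready
kubectl -n harbor-ns rollout status deployment/harbor-disaster-recovery-controller

# Follow the controller logs to observe the periodic check results
kubectl -n harbor-ns logs -f deployment/harbor-disaster-recovery-controller
```
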
### `Alauda support for PostgreSQL` Start/Stop Script Examples

When using the `Alauda support for PostgreSQL` solution with the `PostgreSQL Hot Standby Cluster Configuration Guide` to configure a disaster recovery cluster, you need to configure replication information in both Primary and Secondary PostgreSQL clusters. This ensures that during automatic failover, you only need to modify `clusterReplication.isReplica` and `numberOfInstances` to complete the switchover:

**Primary Configuration:**

```yaml
clusterReplication:
  enabled: true
  isReplica: false
  peerHost: 192.168.130.206 # Secondary cluster node IP
  peerPort: 31661 # Secondary cluster NodePort
  replSvcType: NodePort
  bootstrapSecret: standby-bootstrap-secret
```

The `standby-bootstrap-secret` should be configured according to the `Standby Cluster Configuration` section in the `PostgreSQL Hot Standby Cluster Configuration Guide`, using the same value as the Secondary cluster.

**Secondary Configuration:**

```yaml
clusterReplication:
  enabled: true
  isReplica: true
  peerHost: 192.168.12.108 # Primary cluster node IP
  peerPort: 30078 # Primary cluster NodePort
  replSvcType: NodePort
  bootstrapSecret: standby-bootstrap-secret
```

#### Start Script Example

```bash
POSTGRES_NAMESPACE="${POSTGRES_NAMESPACE:-pg-namespace}"
POSTGRES_CLUSTER="${POSTGRES_CLUSTER:-acid-pg}"
kubectl -n "$POSTGRES_NAMESPACE" patch pg "$POSTGRES_CLUSTER" --type=merge -p '{"spec":{"clusterReplication":{"isReplica":false},"numberOfInstances":2}}'
```

#### Stop Script Example

```bash
POSTGRES_NAMESPACE="${POSTGRES_NAMESPACE:-pg-namespace}"
POSTGRES_CLUSTER="${POSTGRES_CLUSTER:-acid-pg}"
kubectl -n "$POSTGRES_NAMESPACE" patch pg "$POSTGRES_CLUSTER" --type=merge -p '{"spec":{"clusterReplication":{"isReplica":true},"numberOfInstances":1}}'
```

### Alauda Build of Rook-Ceph Start/Stop Script Examples

- **Start Script Example**: For more details, refer to [Object Storage Disaster Recovery](https://docs.alauda.io/container_platform/4.1/storage/storagesystem_ceph/how_to/disaster_recovery/dr_object.html)

```bash
REALM_NAME="${REALM_NAME:-real}"
ZONE_GROUP_NAME="${ZONE_GROUP_NAME:-group}"
ZONE_NAME="${ZONE_NAME:-zone}"

ACCESS_KEY=$(kubectl -n rook-ceph get secrets "${REALM_NAME}-keys" -o jsonpath='{.data.access-key}' 2>/dev/null | base64 -d)
SECRET_KEY=$(kubectl -n rook-ceph get secrets "${REALM_NAME}-keys" -o jsonpath='{.data.secret-key}' 2>/dev/null | base64 -d)
ENDPOINT=$(kubectl -n rook-ceph get cephobjectzone realm-zone -o jsonpath='{.spec.customEndpoints[0]}')
TOOLS_POD=$(kubectl -n rook-ceph get po -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)

kubectl -n rook-ceph exec "$TOOLS_POD" -- radosgw-admin realm pull --url="$ENDPOINT" --access-key="$ACCESS_KEY" --secret="$SECRET_KEY"
kubectl -n rook-ceph exec "$TOOLS_POD" -- radosgw-admin zone modify --rgw-realm="$REALM_NAME" --rgw-zonegroup="$ZONE_GROUP_NAME" --rgw-zone="$ZONE_NAME" --master
```

- **Stop Script**: No action is required when stopping Alauda Build of Rook-Ceph, so you can add an empty script or skip this step.
