Commit b7296c5

Update disaster recovery documentation for Harbor
- Clarified the access address for the Primary Object Storage in the disaster recovery steps.
- Renamed the "Primary-Standby Switchover Procedure" section to "Failover" for better clarity.
- Expanded the "Disaster Recovery" section to include recovery steps for the original Primary Harbor.
- Added details on automatic start/stop mechanisms for the disaster recovery instance, including configuration and script examples for managing Harbor and PostgreSQL instances.
1 parent d9c3db7 commit b7296c5

docs/en/solutions/How_to_perform_disaster_recovery_for_harbor.md

Lines changed: 199 additions & 3 deletions
@@ -108,7 +108,7 @@ You need to create a CephObjectStoreUser in advance to obtain the access credent
You only need to create the CephObjectStoreUser on the Primary Object Storage. The user information will be automatically synchronized to the Secondary Object Storage through the disaster recovery replication mechanism.
:::

-2. This `PRIMARY_OBJECT_STORAGE_ADDRESS` is the access address of the Object Storage, you can get it from the step [Configure External Access for Primary Zone](https://docs.alauda.io/container_platform/4.1/storage/storagesystem_ceph/how_to/disaster_recovery/dr_object.html#configure-external-access-for-primary-zone) of `Object Storage Disaster Recovery`.
+2. This `PRIMARY_OBJECT_STORAGE_ADDRESS` is the access address of the Object Storage; you can get it from the step [Configure External Access for Primary Zone](https://docs.alauda.io/container_platform/4.1/storage/storagesystem_ceph/how_to/disaster_recovery/dr_object.html#address) of `Object Storage Disaster Recovery`.

3. Create a Harbor registry bucket on Primary Object Storage using mc; in this example, the bucket name is `harbor-registry`.
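
   A minimal sketch of this step, assuming an mc alias named `primary` and that `PRIMARY_OBJECT_STORAGE_ADDRESS`, `ACCESS_KEY`, and `SECRET_KEY` hold the endpoint and the CephObjectStoreUser credentials obtained above:

   ```bash
   # Register the Primary Object Storage under a local mc alias (use https:// if TLS is enabled)
   mc alias set primary "http://${PRIMARY_OBJECT_STORAGE_ADDRESS}" "${ACCESS_KEY}" "${SECRET_KEY}"

   # Create the bucket used by the Harbor registry
   mc mb primary/harbor-registry
   ```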

@@ -279,7 +279,7 @@ spec:
replicas: 0
```

-### Primary-Standby Switchover Procedure in Disaster Scenarios
+### Failover

1. First confirm that none of the Primary Harbor components are still running; if any are, stop all Primary Harbor components first.
2. Promote Secondary PostgreSQL to Primary PostgreSQL following the switchover procedure in the `PostgreSQL Hot Standby Cluster Configuration Guide`.
@@ -311,7 +311,27 @@ spec:
5. Test image push and pull to verify that Harbor is working properly.
6. Switch external access addresses to Secondary Harbor.

-### Disaster Recovery Data Check
+### Disaster Recovery

When the primary cluster recovers from a disaster, you can restore the original Primary Harbor to operate as a Secondary Harbor. Follow these steps to perform the recovery:

1. Set the replica count of all Harbor components to 0 (a kubectl sketch follows this list).
2. Configure the original Primary PostgreSQL to operate as Secondary PostgreSQL according to the `PostgreSQL Hot Standby Cluster Configuration Guide`.
3. Convert the original Primary Object Storage to Secondary Object Storage.

```bash
# From within the recovered zone, pull the latest realm configuration from the current master zone:
radosgw-admin realm pull --url={url-to-master-zone-gateway} \
  --access-key={access-key} --secret={secret}

# Make the recovered zone the master zone:
radosgw-admin zone modify --rgw-realm=<realm-name> --rgw-zonegroup=<zone-group-name> --rgw-zone=<primary-zone-name> --master
```
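
For step 1, a minimal sketch that reuses the same Harbor CR patch shown in the `stop.sh` example later in this document (the namespace `harbor-ns` and Harbor CR name `harbor` are illustrative):

```bash
# Scale all Harbor components of the original Primary Harbor down to 0 replicas
kubectl -n harbor-ns patch harbor harbor --type=merge \
  -p '{"spec":{"helmValues":{"core":{"replicas":0},"portal":{"replicas":0},"jobservice":{"replicas":0},"registry":{"replicas":0},"trivy":{"replicas":0}}}}'
```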

After completing these steps, the original Primary Harbor will operate as a Secondary Harbor.

If you need the original Primary Harbor to resume its role as the Primary Harbor, follow the Failover procedure to promote the current Secondary Harbor (the original Primary) back to Primary Harbor, and then configure the Harbor that served as Primary during the disaster to operate as Secondary Harbor.

### Data Sync Check

Check the synchronization status of Object Storage and PostgreSQL to ensure that the disaster recovery is successful.
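
One way to spot-check both, assuming the `rook-ceph-tools` toolbox Deployment and the illustrative PostgreSQL cluster used later in this document (`acid-pg` in namespace `pg-namespace`):

```bash
# Object Storage: report the multisite replication status from the rook-ceph toolbox
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- radosgw-admin sync status

# PostgreSQL: on the primary cluster, confirm the standby is streaming
kubectl -n pg-namespace exec acid-pg-0 -- psql -U postgres -c \
  "SELECT client_addr, state, sync_state FROM pg_stat_replication;"
```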

@@ -353,3 +373,179 @@ The RTO represents the maximum acceptable downtime during disaster recovery. Thi
The operational steps are similar to building a Harbor disaster recovery solution with `Alauda Build of Rook-Ceph` and `Alauda support for PostgreSQL`. Simply replace Object Storage and PostgreSQL with other object storage and PostgreSQL solutions.

Ensure that the Object Storage and PostgreSQL solutions support disaster recovery capabilities.

## Automatic Start/Stop of Disaster Recovery Instance

This mechanism enables automatic activation of the Secondary Harbor instance when a disaster occurs. It supports custom check mechanisms through user-defined scripts and provides control over Harbor dependency configurations.

```mermaid
flowchart TD
    Start[Monitoring Program] --> CheckScript[Check if Instance Should Start]
    CheckScript -->|"Yes (Script exit 0)"| StartScript[Execute StartScript]
    CheckScript -->|"No (Script exit Non-zero)"| StopScript[Execute StopScript]
```

### How to Configure and Run the Auto Start/Stop Program

1. Prepare the configuration file `config.yaml`:

```yaml
check_script: /path/to/check.sh # Path to the script that checks if the instance should start
start_script: /path/to/start.sh # Path to the script that starts the Harbor instance
stop_script: /path/to/stop.sh # Path to the script that stops the Harbor instance
check_interval: 30s # Interval between consecutive check runs
failure_threshold: 3 # Consecutive check failures tolerated before the stop script is executed
script_timeout: 10s # Timeout for a single script execution
```

2. Create the corresponding script files:

- **check.sh**: This script must be customized to your environment. It should exit with code 0 when the current cluster's instance should be started, and with a non-zero code otherwise. The following is a simple DNS IP check example (do not use it directly in production):

```bash
#!/bin/bash
HARBOR_DOMAIN="${HARBOR_DOMAIN:-}"
HARBOR_IP="${HARBOR_IP:-}"

# Start this cluster's instance only if the Harbor domain currently resolves to this cluster's IP
RESOLVED_IP=$(nslookup "$HARBOR_DOMAIN" 2>/dev/null | grep -A 1 "Name:" | grep "Address:" | awk '{print $2}' | head -n 1)
if [ "$RESOLVED_IP" = "$HARBOR_IP" ]; then
  exit 0
else
  exit 1
fi
```

- **start.sh**: The start script should include checks for Harbor dependencies and the startup of the Harbor instance.

```bash
#!/bin/bash
# Check and control dependencies here, such as verifying that the database is the
# primary instance and that the object storage is ready (replace with your own logic)

# Start Harbor - this section is required
HARBOR_NAMESPACE="${HARBOR_NAMESPACE:-harbor-ns}"
HARBOR_NAME="${HARBOR_NAME:-harbor}"
HARBOR_REPLICAS="${HARBOR_REPLICAS:-1}"
kubectl -n "$HARBOR_NAMESPACE" patch harbor "$HARBOR_NAME" --type=merge -p "{\"spec\":{\"helmValues\":{\"core\":{\"replicas\":$HARBOR_REPLICAS},\"portal\":{\"replicas\":$HARBOR_REPLICAS},\"jobservice\":{\"replicas\":$HARBOR_REPLICAS},\"registry\":{\"replicas\":$HARBOR_REPLICAS},\"trivy\":{\"replicas\":$HARBOR_REPLICAS}}}}"
```

- **stop.sh**: The stop script should include shutdown procedures for Harbor dependencies and the Harbor instance.

```bash
#!/bin/bash
# Stop Harbor - this section is required
HARBOR_NAMESPACE="${HARBOR_NAMESPACE:-harbor-ns}"
HARBOR_NAME="${HARBOR_NAME:-harbor}"
kubectl -n "$HARBOR_NAMESPACE" patch harbor "$HARBOR_NAME" --type=merge -p '{"spec":{"helmValues":{"core":{"replicas":0},"portal":{"replicas":0},"jobservice":{"replicas":0},"registry":{"replicas":0},"trivy":{"replicas":0}}}}'

# Check and control dependencies here, such as setting the database to replica mode
# (replace with your own dependency stop logic)
```

3. Deploy the control program as a Deployment in the Harbor namespace:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: harbor-disaster-recovery-controller
  namespace: harbor-ns # Use the same namespace where Harbor is deployed
spec:
  replicas: 1
  selector:
    matchLabels:
      app: harbor-disaster-recovery-controller
  template:
    metadata:
      labels:
        app: harbor-disaster-recovery-controller
    spec:
      containers:
        - name: controller
          image: xxx # Replace with your control program image
          command: ["<control-program-binary>", "-c", "/opt/config/config.yaml"] # Replace <control-program-binary> with the entrypoint of your control program
          volumeMounts:
            - name: script
              mountPath: /opt/script
            - name: config
              mountPath: /opt/config
      volumes:
        - name: script
          hostPath:
            path: <script dir> # Replace with your script directory path
        - name: config
          hostPath:
            path: <config dir> # Replace with your config directory path
```

> **Note**: Ensure that the ServiceAccount used by the Deployment has the necessary RBAC permissions to operate on Harbor resources and any other resources controlled by your custom scripts (such as database resources, object storage configurations, etc.) in the target namespace. The control program needs permission to modify the Harbor CRD resources in order to start and stop Harbor components, as well as permissions for any resources managed by the custom start/stop scripts. An illustrative Role is sketched after the apply command below.

Apply the Deployment:

```bash
kubectl apply -f harbor-disaster-recovery-controller.yaml
```

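A minimal sketch of the RBAC described in the note above, assuming the Harbor CR is served by the `goharbor.io` API group (the group name, namespace, and verbs are assumptions; adjust them to match your installation) and that the Role is bound to the Deployment's ServiceAccount:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: harbor-dr-controller
  namespace: harbor-ns # Same namespace as the Harbor CR and the controller Deployment
rules:
  # Allow the controller to read and patch the Harbor CR to scale its components
  - apiGroups: ["goharbor.io"]
    resources: ["harbors"]
    verbs: ["get", "list", "patch"]
```

If the start/stop scripts also manage resources in other namespaces (for example the PostgreSQL cluster), grant the equivalent permissions there with an additional Role or a ClusterRole, and bind it to the controller's ServiceAccount with a RoleBinding or ClusterRoleBinding.
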
### `Alauda support for PostgreSQL` Start/Stop Script Examples

When using the `Alauda support for PostgreSQL` solution with the `PostgreSQL Hot Standby Cluster Configuration Guide` to configure a disaster recovery cluster, you need to configure replication information in both Primary and Secondary PostgreSQL clusters. This ensures that during automatic failover, you only need to modify `clusterReplication.isReplica` and `numberOfInstances` to complete the switchover:

**Primary Configuration:**

```yaml
clusterReplication:
  enabled: true
  isReplica: false
  peerHost: 192.168.130.206 # Secondary cluster node IP
  peerPort: 31661 # Secondary cluster NodePort
  replSvcType: NodePort
  bootstrapSecret: standby-bootstrap-secret
```

The `standby-bootstrap-secret` should be configured according to the `Standby Cluster Configuration` section in the `PostgreSQL Hot Standby Cluster Configuration Guide`, using the same value as the Secondary cluster.

**Secondary Configuration:**

```yaml
clusterReplication:
  enabled: true
  isReplica: true
  peerHost: 192.168.12.108 # Primary cluster node IP
  peerPort: 30078 # Primary cluster NodePort
  replSvcType: NodePort
  bootstrapSecret: standby-bootstrap-secret
```

#### Start Script Example

```bash
POSTGRES_NAMESPACE="${POSTGRES_NAMESPACE:-pg-namespace}"
POSTGRES_CLUSTER="${POSTGRES_CLUSTER:-acid-pg}"
# Promote this cluster: take it out of replica mode and scale up to two instances
kubectl -n "$POSTGRES_NAMESPACE" patch pg "$POSTGRES_CLUSTER" --type=merge -p '{"spec":{"clusterReplication":{"isReplica":false},"numberOfInstances":2}}'
```

#### Stop Script Example

```bash
POSTGRES_NAMESPACE="${POSTGRES_NAMESPACE:-pg-namespace}"
POSTGRES_CLUSTER="${POSTGRES_CLUSTER:-acid-pg}"
# Demote this cluster: switch it back to replica mode and scale down to one instance
kubectl -n "$POSTGRES_NAMESPACE" patch pg "$POSTGRES_CLUSTER" --type=merge -p '{"spec":{"clusterReplication":{"isReplica":true},"numberOfInstances":1}}'
```

### Alauda Build of Rook-Ceph Start/Stop Script Examples

- **Start Script Example**: For more details, refer to [Object Storage Disaster Recovery](https://docs.alauda.io/container_platform/4.1/storage/storagesystem_ceph/how_to/disaster_recovery/dr_object.html)

```bash
REALM_NAME="${REALM_NAME:-real}"
ZONE_GROUP_NAME="${ZONE_GROUP_NAME:-group}"
ZONE_NAME="${ZONE_NAME:-zone}"

# Gather the realm credentials, zone endpoint, and toolbox pod name from the Rook-Ceph resources
ACCESS_KEY=$(kubectl -n rook-ceph get secrets "${REALM_NAME}-keys" -o jsonpath='{.data.access-key}' 2>/dev/null | base64 -d)
SECRET_KEY=$(kubectl -n rook-ceph get secrets "${REALM_NAME}-keys" -o jsonpath='{.data.secret-key}' 2>/dev/null | base64 -d)
ENDPOINT=$(kubectl -n rook-ceph get cephobjectzone realm-zone -o jsonpath='{.spec.customEndpoints[0]}')
TOOLS_POD=$(kubectl -n rook-ceph get po -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)

# Pull the realm from the current master zone, then make the local zone the master
kubectl -n rook-ceph exec "$TOOLS_POD" -- radosgw-admin realm pull --url="$ENDPOINT" --access-key="$ACCESS_KEY" --secret="$SECRET_KEY"
kubectl -n rook-ceph exec "$TOOLS_POD" -- radosgw-admin zone modify --rgw-realm="$REALM_NAME" --rgw-zonegroup="$ZONE_GROUP_NAME" --rgw-zone="$ZONE_NAME" --master
```

- **Stop Script**: No action is required when stopping Alauda Build of Rook-Ceph, so you can add an empty script or skip this step.
