
Restore WP1 from a backup

benoit74 edited this page Oct 29, 2024 · 17 revisions

Aside from a Zimfarm instance which is not covered in this documentation, WP1 relies on a compute instance (mwcurator) and a Trove DB.

This documentation details the procedure to restore everything from scratch. Depending on the failure we are encountering, some parts can obviously be skipped.

Prerequisites

  • admin access to the target wikimedia cloud project (currently mwoffliner)
  • a machine (probably in the cloud) with
    • significant bandwidth to borg servers (contains the backup) and wikimedia cloud (target of the restore)
    • sufficient space to hold a database backup on local disk (will be stored there temporarily)
    • mariadb CLI (install with sudo apt install mariadb-client if you do not have it yet)
    • SSH credentials on this machine to access Wikimedia Cloud machines over SSH (you can add a temporary new key to your user at https://idm.wikimedia.org/keymanagement/)

All CLI steps described below are expected to be done on this machine. Anything done in a browser can obviously be done on any machine.

Recreate mwcurator cloud VPS instance and database

You may need to request additional quota in your cloud project from Wikitech, especially if the original machines are still running. See https://phabricator.wikimedia.org/T375977 for inspiration on how to do it (this is the ticket we opened to request increased quota while building this documentation and testing the restore procedure).

  1. Go to https://horizon.wikimedia.org/
  2. Select mwoffliner as your project
  3. Re-create the application server
    1. Under Compute -> Instances
    2. Select “Launch Instance”
    3. Under “Source” select appropriate image (currently we use debian-11-bullseye)
    4. Under “Flavor” select an appropriate number of vCPUs, RAM and disk (currently we use g4.cores8.ram16.disk20)
    5. Under “Security groups”, add the web security group to the instance to expose ports 80 and 443 (more details in these instructions).
  4. Re-create the database server
    1. Under Database -> Instances
    2. Select “Launch Instance”
    3. Under “Volume Size”, choose an appropriate number of GB to handle the size of the database (currently we use 60 GB)
    4. Under “Datastore”, choose mariadb
    5. Under “Flavor”, choose an appropriate number of vCPUs, RAM and disk (currently we use g4.cores2.ram4.disk20 | 4GB RAM).
    6. Under “Initialize Databases”
      1. Initial Databases: enwp10_prod
      2. Initial Admin User: wp1
      3. Password: generate a secure password externally and paste in. Write it down, you'll need it later of course.
    7. Under “Advanced”
      1. Configuration Group: wp1-db-import (will make the db import faster)
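
The admin password can also be generated on the command line. This is a minimal sketch, assuming a Linux machine with /dev/urandom; the function name generate_db_password is hypothetical and not part of WP1. It prints 64 hexadecimal characters, which avoids symbols that might need escaping when the password is later embedded in a mysql:// connection string.

```shell
# Hypothetical helper, not part of WP1: print a random 64-character
# hexadecimal password built from 32 bytes of /dev/urandom.
generate_db_password() {
    # od prints the bytes as space-separated hex pairs; tr strips the
    # spaces and newlines so that a single token remains.
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
}
```

Run generate_db_password once, store the result somewhere safe, then paste it into the Password field in Horizon.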

Connect to server via SSH

If necessary, add entries for the bastion and the new instance to your ~/.ssh/config file.

Typical configuration looks like this (replace <your_username> with your Wikimedia cloud SSH user, see https://wikitech.wikimedia.org/wiki/Help:Accessing_Cloud_VPS_instances for help):

Host bastion.wmcloud.org
    HostName bastion.wmcloud.org
    User <your_username>

Host mwcurator
    HostName mwcurator.mwoffliner.eqiad1.wikimedia.cloud
    ProxyJump bastion.wmcloud.org
    User <your_username>

You should now be able to run this command:

ssh mwcurator

Import the database from a backup

Follow these directions to access the database borg backups. You will need BitWarden credentials. The name of the backup store (borg repository) is wp1db.

Find the hostname of your new Trove (database) instance that you created above. This is in Horizon under Databases -> Instances. Click the instance name and you should see something like this:

[Screenshot: Trove instance details page in Horizon, showing the instance hostname]

Set up an SSH tunnel to that database host, through your toolforge account. You can use the command:

ssh -L 3306:ofi3zurkdgo.svc.trove.eqiad1.wikimedia.cloud:3306 login.toolforge.org

NOTE: See the Wikimedia Cloud docs for more information on setting up the tunnel. You will need to have Tools or Toolforge credentials set up in your ~/.ssh/config file for this to work. See this help file for details on setting up SSH.
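
Because the Trove hostname changes whenever the instance is recreated, it can help to build the tunnel command from a variable. The sketch below is illustrative only: the function name trove_tunnel_cmd and the optional local port argument are assumptions, and the -N flag (open the forward without starting a remote shell) is an addition to the command shown above.

```shell
# Hypothetical helper: print the ssh command that forwards a local port
# to port 3306 on the Trove instance via the Toolforge login host.
trove_tunnel_cmd() {
    trove_host="$1"          # hostname shown in Horizon for the new instance
    local_port="${2:-3306}"  # local port to listen on (defaults to 3306)
    printf 'ssh -N -L %s:%s:3306 login.toolforge.org\n' "$local_port" "$trove_host"
}
```

For example, trove_tunnel_cmd ofi3zurkdgo.svc.trove.eqiad1.wikimedia.cloud prints the command to run in a separate terminal. If local port 3306 is already in use, pass another port as the second argument and use the same port with -P when connecting.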

Find your backup file. Mine was in /data/restore/root/.borgmatic/mysql_databases/tdlqt33y3nt.svc.trove.eqiad1.wikimedia.cloud/ but if the production trove hostname changes, yours could be different.

Use the following command, entering the password you chose above, to start restoring the database:

mysql -h 127.0.0.1 -P 3306 -u wp1 -p enwp10_prod < /data/restore/root/.borgmatic/mysql_databases/tdlqt33y3nt.svc.trove.eqiad1.wikimedia.cloud/enwp10_prod

In testing, this took about 7 hours.
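
Given how long the import runs, a cheap sanity check before starting can save a wasted attempt. The following is a sketch under assumptions: restore_dump is a hypothetical wrapper, and it expects the SSH tunnel to be listening on 127.0.0.1:3306 as described above. It refuses to start if the dump file is missing or empty, then runs the same mysql command.

```shell
# Hypothetical wrapper around the restore command shown above.
restore_dump() {
    dump_file="$1"
    # -s is true only if the file exists and is not empty.
    if [ ! -s "$dump_file" ]; then
        echo "dump file missing or empty: $dump_file" >&2
        return 1
    fi
    # -p makes the client prompt for the wp1 password.
    mysql -h 127.0.0.1 -P 3306 -u wp1 -p enwp10_prod < "$dump_file"
}
```

Running the import inside a terminal multiplexer such as tmux or screen is a reasonable precaution, so that a dropped SSH session does not abort a multi-hour restore.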

Set up the replacement application server (if necessary)

  1. Install docker using these directions.
  2. Create the following directories:
sudo mkdir -p /data/wp1bot /data/code/ /data/wp1bot/db/ /srv/log/wp1bot/ /srv/data/wp1bot/

Note that the /srv directory is an NFS mount. The /data directory, on the original server, was an attached Cinder volume that was needed for other operations on that server, but is not needed for WP1. These paths are hardcoded in docker-compose.yml but could be updated there if you're having trouble creating the directories.

  1. cd /data/code and check out the wp1 repository: sudo git clone https://github.com/openzim/wp1.git
  2. Copy the example credentials: sudo cp /data/code/wp1/wp1/credentials.py.example /data/wp1bot/credentials.py
  3. Edit the file (sudo nano /data/wp1bot/credentials.py), providing the necessary values (commented out) and deleting the keys: Environment.DEVELOPMENT, Environment.TEST, and the existing empty Environment.PRODUCTION key.
  4. Edit the ENV = line to read ENV = Environment.PRODUCTION
    1. WIKIDB is the Wikipedia replica db, also known as enwiki_p. The credentials are your project Toolforge credentials, which can be found by logging in with ssh login.toolforge.org and reading the file replica.my.cnf.
    2. WP10DB is the application database that you restored to Trove. User should be wp1, password is the password you set when you restored, host is the Trove host (ofi3zurkdgo.svc.trove.eqiad1.wikimedia.cloud in our example). You can leave out the port (it defaults to 3306 which is where Trove is running).
    3. 'REDIS': { 'host': 'redis', 'port': 6379 }
    4. 'API': { 'user': 'WP 1.0 bot@WP_1.0_Bot', 'pass': ??? }, TODO: figure out how we would find/reset this password.
    5. 'MWOAUTH': If you've lost this credential, you will need to register a new OAuth application. The client secret cannot be recovered from any Wikimedia web interface.
    6. 'SESSION': { 'secret_key': 'any sufficiently long string of random characters, like a password' }. If you wish users to not be logged out, you need to set the same 'secret_key' as the previous application server.
    7. For 'CLIENT_URL', it should stay the same as the example values. If these values change, you will need to update the VIRTUAL_HOST keys in docker-compose.yml.
    8. 'STORAGE' is the AWS S3 config, where we store created selections and ZIMs. These should be available from Kiwix.
    9. 'ZIMFARM' is the credentials for the Zimfarm that is used to create ZIMs. Get these credentials from Kiwix.
  5. Create a file, /data/wp1bot/db/.wp1db_backup.env with the following keys (get the values from Kiwix):
    BORGBASE_NAME=
    BW_CLIENTID=
    BW_CLIENTSECRET=
    BW_PASSWORD=
    BITWARDEN_EMAIL=
    DATABASES=
    BACKUP_HOUR=
    BACKUP_MINUTE=
    
    The DATABASES value should be a MariaDB connection string with the username and password embedded. Use the Trove database username and password that you set in the credentials.py file above, e.g. mysql://wp1:<password>@<trove_host>/enwp10_prod
  6. Create another file in the same directory: /data/wp1bot/db/yoyo.ini with the following:
    [DEFAULT]
    sources = /usr/src/app/db/migrations/
    migration_table = _yoyo_migration
    batch_mode = off
    verbosity = 0
    database = ???
    
    This time the database connection string (mysql://user:pass@host/db) is the application database, as seen in the WP10DB key above.
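
Both the DATABASES key and the yoyo.ini database line take a URL of the form mysql://user:pass@host/db. If the password contains characters such as @, : or /, the URL becomes ambiguous unless they are percent-encoded. The sketch below shows one way to build the string; urlencode and db_url are hypothetical helpers, they handle ASCII input only, and whether each consumer decodes percent-encoding is an assumption worth verifying. A password made only of letters and digits sidesteps the question.

```shell
# Percent-encode every character except RFC 3986 unreserved ones.
# ASCII input only; hypothetical helper, not part of WP1.
urlencode() {
    s="$1"
    out=""
    while [ -n "$s" ]; do
        rest="${s#?}"        # everything after the first character
        c="${s%"$rest"}"     # the first character itself
        case "$c" in
            [A-Za-z0-9._~-]) out="$out$c" ;;
            *) out="$out$(printf '%%%02X' "'$c")" ;;
        esac
        s="$rest"
    done
    printf '%s' "$out"
}

# Build mysql://user:pass@host/db from its four parts.
db_url() {
    printf 'mysql://%s:%s@%s/%s\n' "$(urlencode "$1")" "$(urlencode "$2")" "$3" "$4"
}
```

For instance, db_url wp1 "$DB_PASSWORD" "$TROVE_HOST" enwp10_prod prints a string that can be pasted into both files, where DB_PASSWORD and TROVE_HOST are placeholder variables holding the values chosen earlier.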

Mapping the IP/DNS

The application server is mapped to a floating IP address, which allows it to be mapped to its domain, wp1.openzim.org. If you are restoring to a new application server, you should go to the floating IP management screen and detach the IP from the existing (crashed) server and reattach it to your new server.

Starting the app server

  1. Follow the deploy instructions in the README, starting with 'Pull the docker images from docker hub'.
