Backup and restore¶
This page serves as a global guideline on how backups are created and can be restored.
Quick FAQ¶
What is our Recovery Point Objective (RPO)?¶
Simply put, the RPO is the maximum amount of data, measured in time, that can be lost during a disaster. For our solutions, data is backed up at intervals of at most 24 hours, so our RPO is 24 hours.
What is our Recovery Time Objective (RTO)?¶
Simply put, the RTO is the time it takes to perform a full restore of the entire solution. Determining an RTO is more complex, because it depends on factors such as the size of the data that needs to be restored.
The following rough timeline can be used; not all steps are always applicable:
- 1 hour: Destroying broken AWS resources.
- 1 hour: (Re)creating AWS resources with Terraform.
- 3 hours: Provisioning all new EC2 instances.
- In parallel:
  - 1 hour per 40 GB of data: Running `gitlab-backup` to restore the instance.
    - This data is calculated as "[size of gitlab_backup.tar] + [size of gitaly storage]".
  - 1 hour per X GB of data: Restoring a point-in-time backup of the database.
Some steps can be skipped or sped up. For example, if only the database is broken, there is no need to provision new EC2 instances.
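As a hedged worked example, the timeline above can be applied to a hypothetical 120 GB of GitLab data. The data size is an assumption, and the parallel database restore is left out because its rate ("X GB per hour") depends on the database size:

```shell
# Hypothetical RTO estimate for 120 GB of GitLab data
# (gitlab_backup.tar + gitaly storage). All durations come from
# the rough timeline above; the data size is an assumption.
DATA_GB=120

DESTROY_H=1        # destroying broken AWS resources
TERRAFORM_H=1      # (re)creating AWS resources with Terraform
PROVISION_H=3      # provisioning all new EC2 instances
RESTORE_H=$(( (DATA_GB + 39) / 40 ))   # 1 hour per 40 GB, rounded up

TOTAL_H=$(( DESTROY_H + TERRAFORM_H + PROVISION_H + RESTORE_H ))
echo "Estimated RTO: ${TOTAL_H} hours"
```

The database restore runs in parallel with `gitlab-backup`, so in practice the last term is the larger of the two restore durations, not their sum.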
Backup¶
What is being stored and where?¶
This data is being stored in the same account as the solution:
- Gitaly data is backed up to an S3 bucket (`<prefix>-gitaly-backups`) daily.
- All relevant S3 buckets are backed up in AWS Backup with a continuous point-in-time backup.
- The RDS database is backed up daily in AWS Backup, with a continuous point-in-time backup as well.
- GitLab secrets and SSH host keys are backed up in AWS Secrets Manager.
Restore¶
Before restoring any kind of backup, ensure that all services that will interfere with the restore process are stopped:
ansible-playbook -i inventory glh.environment_toolkit.tools.pre_data_migration
After you have performed the restoration process, ensure all the services are running again:
ansible-playbook -i inventory glh.environment_toolkit.tools.post_data_migration
Restore an S3 bucket¶
AWS documentation: Restore S3 data
Caveats¶
Unfortunately, restoring an S3 bucket is a bit of a hassle. We block public access to S3 buckets, as per AWS best practices, but to restore a backup into a bucket you need to temporarily lift this block. Go to your S3 bucket > Permissions > Block public access, modify the settings to allow the use of public ACLs, and then restore the backup.
Don't forget to restore this setting afterward. This can also be done by running a terraform apply.
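For reference, the same toggle can be sketched with the AWS CLI. The bucket name is a placeholder, and this is an illustration of the setting change, not a tested procedure:

```shell
BUCKET="<prefix>-gitaly-backups"   # placeholder: the bucket being restored

# Temporarily allow public ACLs so the restore can proceed; the other
# block-public-access settings stay enabled.
aws s3api put-public-access-block --bucket "$BUCKET" \
  --public-access-block-configuration \
  BlockPublicAcls=false,IgnorePublicAcls=false,BlockPublicPolicy=true,RestrictPublicBuckets=true

# ...perform the restore in AWS Backup...

# Re-block public ACLs afterward (running `terraform apply` also restores this).
aws s3api put-public-access-block --bucket "$BUCKET" \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
```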
Restoration process¶
S3 buckets need to be restored one-by-one. To restore a bucket perform these actions:
- Go to AWS Backup > Protected resources.
- Click on the Resource ID of the bucket you want to restore.
- Choose the recovery point and click "Restore".
- Here you need to specify:
- The date/time to when you want to roll back.
- Under "Restore type", you will probably want to restore the entire bucket.
- Under "Restore destination", choose the location to where you want to restore.
- Leave the encryption key to its original value.
- Under "Restore role", select the `<prefix>-gitlab-backup-role`.
- Click "Restore backup".
The backup will now be restored. The time this takes scales with the amount of data being restored.
Restore an RDS database¶
AWS documentation: Restore an RDS database
Caveats¶
Unfortunately, an RDS backup can only be restored to a new instance; you can't overwrite the existing one. Because the endpoint URL of RDS is based on the instance name, you can restore a backup to a new instance, delete or rename the old instance, and give the new instance the old instance's name. This effectively replaces the old RDS instance with the restored one.
AWS documentation: Renaming to replace an existing DB instance
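The rename trick can be sketched with the AWS CLI. The instance identifiers are placeholders, and this is an illustration of the sequence, not a tested procedure:

```shell
OLD="gitlab-db"              # placeholder: current (broken) instance name
RESTORED="gitlab-db-restored"

# Rename the broken instance out of the way.
aws rds modify-db-instance --db-instance-identifier "$OLD" \
  --new-db-instance-identifier "${OLD}-old" --apply-immediately

# A rename is not instantaneous; wait for it to finish before reusing the name.
aws rds wait db-instance-available --db-instance-identifier "${OLD}-old"

# Give the restored instance the original name, so the endpoint URL is unchanged.
aws rds modify-db-instance --db-instance-identifier "$RESTORED" \
  --new-db-instance-identifier "$OLD" --apply-immediately
```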
Restoration process¶
- Go to AWS Backup > Protected resources.
- Click on the Resource ID of the RDS database you want to restore.
- Choose the recovery point and click "Restore".
- Here you need to specify:
- The date/time to when you want to roll back.
- For all the customizable settings, choose the option matching the existing database.
- For the "DB Instance Identifier", choose `<existing-name>-restored`.
- Under "Restore role", select the "Default role".
- Click "Restore backup".
- After restoration, perform the rename trick as described under "Caveats".
To ensure Terraform will manage the restored database, remove the old one from the state and import the new instance:
terraform state rm module.gitlab_cluster.aws_db_instance.gitlab[0]
terraform import module.gitlab_cluster.aws_db_instance.gitlab[0] <rds-arn>
terraform apply
Restore gitlab-secrets¶
GitLab secrets and SSH host keys are backed up in AWS Secrets Manager. The secrets are stored as plain JSON; the host keys are grouped in a .tar.gz file which is base64-encoded.
To retrieve the secrets, go to AWS Secrets Manager > Secrets > GitLabSecrets and click "Retrieve secret value".
Copy this to ansible/tmp/gitlab-secrets.json in your solution.
To retrieve the host keys, go to AWS Secrets Manager > Secrets > SshHostKeys and click "Retrieve secret value".
Copy this to ansible/tmp/host_keys.tar.gz.base64. Decode this file with base64 -d host_keys.tar.gz.base64 > host_keys.tar.gz.
Extract this file with tar -xvzf host_keys.tar.gz. Copy all the host keys directly to ansible/tmp/.
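The console steps above can also be sketched with the AWS CLI. The secret names come from this page, but adjust `--secret-id` if your solution prefixes them; this is an untested illustration:

```shell
# Fetch the GitLab secrets JSON straight into the expected path.
aws secretsmanager get-secret-value --secret-id GitLabSecrets \
  --query SecretString --output text > ansible/tmp/gitlab-secrets.json

# Fetch the base64-encoded host-key archive, then decode and extract it
# into ansible/tmp/ as described above.
aws secretsmanager get-secret-value --secret-id SshHostKeys \
  --query SecretString --output text > ansible/tmp/host_keys.tar.gz.base64
base64 -d ansible/tmp/host_keys.tar.gz.base64 > ansible/tmp/host_keys.tar.gz
tar -xvzf ansible/tmp/host_keys.tar.gz -C ansible/tmp
```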
After this, follow the instructions from the migration docs.