Backup and restore¶
This page serves as a global guideline on how backups are created and can be restored.
Quick FAQ¶
What is our Recovery Point Objective (RPO)?¶
Simply put, the RPO is the maximum amount of data, measured in time, that can be lost during a disaster. For our solutions, data is backed up at intervals of at most 24 hours, so our RPO is 24 hours.
What is our Recovery Time Objective (RTO)?¶
Simply put, the RTO is the time it takes to perform a full restore of the entire solution. Determining an RTO is more complex, because it depends on factors such as the size of the data that needs to be restored.
The following rough timeline can be used; not all steps are always applicable:
- 1 hour: Destroying broken AWS resources.
- 1 hour: (Re)creating AWS resources with Terraform.
- 3 hours: Provisioning all new EC2 instances.
- In parallel:
  - 1 hour per 40 GB of data: Running `gitlab-backup` to restore the instance.
    - This data is calculated as "[size of gitlab_backup.tar] + [size of gitaly storage]".
  - 1 hour per X GB of data: Restoring a point-in-time backup of the database.
Some steps can be skipped or sped up. For example, if only the database is broken, there is no need to provision new EC2 instances.
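As a hedged worked example, the timeline above can be applied to a hypothetical 120 GB of GitLab data. The data size is an assumption, and the parallel database restore is left out because its rate ("X GB per hour") depends on the database size:

```shell
# Hypothetical RTO estimate for 120 GB of GitLab data
# (gitlab_backup.tar + gitaly storage). All durations come from
# the rough timeline above; the data size is an assumption.
DATA_GB=120

DESTROY_H=1        # destroying broken AWS resources
TERRAFORM_H=1      # (re)creating AWS resources with Terraform
PROVISION_H=3      # provisioning all new EC2 instances
RESTORE_H=$(( (DATA_GB + 39) / 40 ))   # 1 hour per 40 GB, rounded up

TOTAL_H=$(( DESTROY_H + TERRAFORM_H + PROVISION_H + RESTORE_H ))
echo "Estimated RTO: ${TOTAL_H} hours"
```

The database restore runs in parallel with `gitlab-backup`, so in practice the last term is the larger of the two restore durations, not their sum.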
Backup¶
What is being stored and where?¶
This data is being stored in the same account as the solution:
- Gitaly data is backed up to an S3 bucket (`<prefix>-gitaly-backups`) daily.
- All relevant S3 buckets are backed up in AWS Backup with a continuous point-in-time backup.
- The RDS database is backed up daily in AWS Backup, with a continuous point-in-time backup as well.
- GitLab secrets and SSH host keys are backed up in AWS Secrets Manager.
Restore¶
Before restoring any kind of backup, ensure that all services that will interfere with the restore process are stopped:
ansible-playbook -i inventory glh.environment_toolkit.tools.pre_data_migration
After you have performed the restoration process, ensure all the services are running again:
ansible-playbook -i inventory glh.environment_toolkit.tools.post_data_migration
Restore an S3 bucket¶
AWS documentation: Restore S3 data
Caveats¶
Unfortunately, restoring an S3 bucket is a bit of a hassle. We block public access to S3 buckets, as per AWS best practices, but to restore a backup into a bucket you need to temporarily lift this block. Go to your S3 bucket > Permissions > Block public access, modify the settings to allow the use of public ACLs, and then restore the backup.
Don't forget to restore this setting afterward. This can also be done by running a terraform apply.
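For reference, the same toggle can be sketched with the AWS CLI. The bucket name is a placeholder, and this is an illustration of the setting change, not a tested procedure:

```shell
BUCKET="<prefix>-gitaly-backups"   # placeholder: the bucket being restored

# Temporarily allow public ACLs so the restore can proceed; the other
# block-public-access settings stay enabled.
aws s3api put-public-access-block --bucket "$BUCKET" \
  --public-access-block-configuration \
  BlockPublicAcls=false,IgnorePublicAcls=false,BlockPublicPolicy=true,RestrictPublicBuckets=true

# ...perform the restore in AWS Backup...

# Re-block public ACLs afterward (running `terraform apply` also restores this).
aws s3api put-public-access-block --bucket "$BUCKET" \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
```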
Restoration process¶
S3 buckets need to be restored one-by-one. To restore a bucket perform these actions:
- Go to AWS Backup > Protected resources.
- Click on the Resource ID of the bucket you want to restore.
- Choose the recovery point and click "Restore".
- Here you need to specify:
- The date/time to when you want to roll back.
- Under "Restore type", you will probably want to restore the entire bucket.
- Under "Restore destination", choose the location to where you want to restore.
- Leave the encryption key to its original value.
- Under "Restore role", select the `<prefix>-gitlab-backup-role`.
- Click "Restore backup".
The backup will now be restored. The time this takes scales with the amount of data being restored.
Restore an RDS database¶
AWS documentation: Restore an RDS database
Caveats¶
Unfortunately, an RDS backup can only be restored to a new instance; you can't overwrite the existing one. Because the endpoint URL of RDS is based on the instance name, you can restore a backup to a new instance, delete or rename the old instance, and give the new instance the old instance's name. This effectively replaces the old RDS instance with the restored one.
AWS documentation: Renaming to replace an existing DB instance
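The rename trick can be sketched with the AWS CLI. The instance identifiers are placeholders, and this is an illustration of the sequence, not a tested procedure:

```shell
OLD="gitlab-db"              # placeholder: current (broken) instance name
RESTORED="gitlab-db-restored"

# Rename the broken instance out of the way.
aws rds modify-db-instance --db-instance-identifier "$OLD" \
  --new-db-instance-identifier "${OLD}-old" --apply-immediately

# A rename is not instantaneous; wait for it to finish before reusing the name.
aws rds wait db-instance-available --db-instance-identifier "${OLD}-old"

# Give the restored instance the original name, so the endpoint URL is unchanged.
aws rds modify-db-instance --db-instance-identifier "$RESTORED" \
  --new-db-instance-identifier "$OLD" --apply-immediately
```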
Restoration process¶
- Go to AWS Backup > Protected resources.
- Click on the Resource ID of the RDS database you want to restore.
- Choose the recovery point and click "Restore".
- Here you need to specify:
- The date/time to when you want to roll back.
- For all the customizable settings, choose the option matching the existing database.
- For the "DB Instance Identifier", choose `<existing-name>-restored`.
- Under "Restore role", select the "Default role".
- Click "Restore backup".
- After restoration, perform the rename trick as described under "Caveats".
To ensure Terraform will manage the restored database, remove the old one from the state and import the new instance:
terraform state rm module.gitlab_cluster.aws_db_instance.gitlab[0]
terraform import module.gitlab_cluster.aws_db_instance.gitlab[0] <rds-arn>
terraform apply
Restore gitlab-secrets¶
GitLab secrets and SSH host keys are backed up in AWS Secrets Manager. The secrets are stored as plain JSON; the host keys are grouped in a .tar.gz file which is base64-encoded.
To retrieve the secrets, go to AWS Secrets Manager > Secrets > GitLabSecrets and click "Retrieve secret value".
Copy this to ansible/tmp/gitlab-secrets.json in your solution.
To retrieve the host keys, go to AWS Secrets Manager > Secrets > SshHostKeys and click "Retrieve secret value".
Copy this to ansible/tmp/host_keys.tar.gz.base64. Decode this file with base64 -d host_keys.tar.gz.base64 > host_keys.tar.gz.
Extract this file with tar -xvzf host_keys.tar.gz. Copy all the host keys directly to ansible/tmp/.
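The console steps above can also be sketched with the AWS CLI. The secret names come from this page, but adjust `--secret-id` if your solution prefixes them; this is an untested illustration:

```shell
# Fetch the GitLab secrets JSON straight into the expected path.
aws secretsmanager get-secret-value --secret-id GitLabSecrets \
  --query SecretString --output text > ansible/tmp/gitlab-secrets.json

# Fetch the base64-encoded host-key archive, then decode and extract it
# into ansible/tmp/ as described above.
aws secretsmanager get-secret-value --secret-id SshHostKeys \
  --query SecretString --output text > ansible/tmp/host_keys.tar.gz.base64
base64 -d ansible/tmp/host_keys.tar.gz.base64 > ansible/tmp/host_keys.tar.gz
tar -xvzf ansible/tmp/host_keys.tar.gz -C ansible/tmp
```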
After this, follow the instructions from the migration docs.