
Migrating a new customer to GET

This guide describes how to prepare and perform a migration from a customer's existing GitLab instance to a GET based cluster on GitLabHost.

Preparing the environment

You should start by creating a new solution and provisioning it as usual. If you are still at a stage where you need to pre-transfer some data (e.g. existing S3 buckets), you can leave out everything but the bastion and NFS nodes, a PostgreSQL and Redis host, and one rails node.

Before calling gitlab-backup restore later in this guide, you will need to set up the entire environment.

You should also try to map the existing gitlab.rb config as closely as possible before starting on this guide. You can leave out options that our toolkit overrides, such as S3 storage settings. Of note here are SMTP and OmniAuth related settings, and the configured domains for the cluster.

Unless mentioned explicitly, all steps in this guide are to be performed on the primary rails node. If the command is ansible-playbook, run it from your localhost instead.

If the Omnibus tarfile you are given by the customer is larger than 20 GB, you should temporarily attach an additional volume to the node. You can perform this via clickops in the AWS console. Choose gp3 as the volume type and max out both the IOPS and throughput settings. It's wise to make the volume three times as large as the backup tarfile.
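If you prefer the CLI over clickops, the volume creation and attach can be sketched as follows. The size, availability zone, and instance ID below are placeholders; adjust them to the solution you are working on.

```shell
# Sketch only: replace the size, availability zone, and instance ID with real values.
# gp3 maxima are 16000 IOPS and 1000 MiB/s throughput; size the volume at
# roughly three times the backup tarfile.
VOLUME_ID=$(aws ec2 create-volume \
  --volume-type gp3 --iops 16000 --throughput 1000 \
  --size 300 --availability-zone eu-west-1a \
  --query VolumeId --output text)

aws ec2 wait volume-available --volume-ids "$VOLUME_ID"

# The device name is only a suggestion; on Nitro instances it will show up
# inside the VM as /dev/nvme1n1.
aws ec2 attach-volume --volume-id "$VOLUME_ID" \
  --instance-id i-0123456789abcdef0 --device /dev/sdf
```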

It's fairly easy to manually attach the volume to the machine. Please note: if you reboot the machine, you need to re-perform the mounting steps below.

TEMPORARY_DRIVE="/dev/nvme1n1"  # Replace this before copy+pasting.
USER=$(test -n "$SUDO_USER" && echo $SUDO_USER || whoami)

# First-use only, this will error out if a filesystem already exists.
sudo mkfs.ext4 $TEMPORARY_DRIVE

# Mounting steps
sudo mkdir -p /mnt/tmp
sudo mount $TEMPORARY_DRIVE /mnt/tmp
sudo mkdir -p /mnt/tmp/{backups,user}
sudo chown ${USER}:${USER} /mnt/tmp/user
sudo test -d /var/opt/gitlab/backups && sudo mv /var/opt/gitlab/backups /var/opt/gitlab/current_backups
sudo mkdir -p /var/opt/gitlab/backups
sudo mount --bind /mnt/tmp/backups /var/opt/gitlab/backups/
sudo chown -R git:git /var/opt/gitlab/backups

Now you will have a temporary volume on /mnt/tmp, which is both accessible to your user via /mnt/tmp/user, and bind-mounted to the default GitLab backups directory via /mnt/tmp/backups.

The purpose of this is so you can perform all download steps as a regular user instead of root. When you are ready to restore, you can simply move the downloaded files to the /mnt/tmp/backups directory without having to cross filesystem boundaries (which would copy the data instead of moving it).
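For example, after downloading the tarfile as your own user, handing it over to GitLab is just a rename plus a chown (the filename is a placeholder; no data is copied because both paths live on the temporary volume):

```shell
# Move the downloaded backup into the (bind-mounted) backups directory.
sudo mv /mnt/tmp/user/1707868831_2024_02_14_16.7.4-ee_gitlab_backup.tar /mnt/tmp/backups/
# gitlab-backup expects the tarfile to be owned by the git user.
sudo chown git:git /mnt/tmp/backups/*.tar
```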

Install some additional tooling to make your life easier:

sudo apt install -qy tmux byobu curl wget awscli

Installing the correct secrets

If you are going to restore from an Omnibus backup, the encryption secrets and SSH host keys will not be included in the process. You need to manually restore these in advance. You should ask the customer for the gitlab-secrets.json file, and the SSH host keys.

Omnibus source

For Omnibus setups, the secrets are in /etc/gitlab/gitlab-secrets.json, and the SSH host keys are usually found at /etc/ssh/ssh_host_*. Most customers will find their own way to retrieve and send you these files without assistance.

Kubernetes source

If the customer is running GitLab via the Kubernetes (Helm) chart, the secrets must be extracted in a different way. The location of the SSH host keys inside Kubernetes secrets storage is documented upstream in the chart documentation.

In most cases, the underlying values are stored in a YAML file. You can copy the autogenerated host keys from /etc/ssh to your local machine and replace their contents with the customer-provided values where needed. If the customer only provides you with the private keys, remove the corresponding .pub files; we'll regenerate these later.
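If you have direct kubectl access to the customer's cluster, extracting the host keys can be sketched as below. The chart stores them in a secret named `<release>-gitlab-shell-host-keys`; both the release name `gitlab` and the namespace here are assumptions.

```shell
# Assumed release name and namespace; adjust to the customer's deployment.
NAMESPACE=gitlab
SECRET=gitlab-gitlab-shell-host-keys

# List which key files the secret contains.
kubectl -n "$NAMESPACE" get secret "$SECRET" -o jsonpath='{.data}'

# Decode a single key, e.g. the ed25519 private key.
kubectl -n "$NAMESPACE" get secret "$SECRET" \
  -o jsonpath='{.data.ssh_host_ed25519_key}' | base64 -d > ssh_host_ed25519_key
```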

While pasting, you may notice that some keys have a -----BEGIN RSA PRIVATE KEY----- header and others have a -----BEGIN OPENSSH PRIVATE KEY----- header. This is not a problem, as OpenSSH can work with both. Keep the headers as they are.

For extracting the gitlab-secrets.json, send the customer the autogenerated current version on your cluster. Then, direct the customer to the upstream guide: Migrating from Helm to Linux.

The resulting gitlab-secrets-updated.json can then be used as normal. Rename it to gitlab-secrets.json before transferring it to the target nodes using the steps outlined below.

Propagating the secrets

Put all the secrets in ansible/tmp inside your solution. Ensure ansible/tmp/gitlab-secrets.json exists. Each SSH host key must be present in both an extensionless and a .pub extensioned variant.

If you are missing the .pub variant for a private key, you can extract these with the following command: ssh-keygen -f ssh_host_ecdsa_key -y > ssh_host_ecdsa_key.pub

Your folder should now look like this:

total 44K
drwxr-xr-x 1 user user  278 Feb 14 12:16 .
drwxr-xr-x 1 user user  230 Feb 14 12:15 ..
-rw-r--r-- 1 user user  19K Feb 14 12:14 gitlab-secrets.json
-rw------- 1 user user  513 Feb 14 12:14 ssh_host_ecdsa_key
-rw-r--r-- 1 user user  181 Feb 14 12:14 ssh_host_ecdsa_key.pub
-rw------- 1 user user  411 Feb 14 12:14 ssh_host_ed25519_key
-rw-r--r-- 1 user user  101 Feb 14 12:14 ssh_host_ed25519_key.pub
-rw------- 1 user user 2.6K Feb 14 12:14 ssh_host_rsa_key
-rw-r--r-- 1 user user  573 Feb 14 12:14 ssh_host_rsa_key.pub

Next, run the helper playbook for installing the new secrets. This will cause downtime:

ansible-playbook -i inventory glh.environment_toolkit.tools.transfer_secrets

Please note: your rails nodes may change their SSH fingerprints, as we still use the same host keys for Git access as for maintenance access in some setups. This may prevent you from executing Ansible on these nodes afterwards. You can remove the current fingerprints from shared/ssh/known_hosts and stop any control processes to resolve this.
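Removing a stale fingerprint and closing persistent SSH connections can look like this; the hostname and the ControlPath pattern are placeholders for whatever your solution uses.

```shell
# Drop the stale entry for the rails node from the solution's known_hosts.
ssh-keygen -R rails-1.example.internal -f shared/ssh/known_hosts

# Close any persistent SSH control connection so the next run renegotiates;
# the ControlPath below is an example, match it to your ssh config.
ssh -O exit -o ControlPath=~/.ssh/control-%C rails-1.example.internal || true
```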

If the secrets have been transferred successfully, remove the entire ansible/tmp folder. If you want to keep the secrets around, do so only for a short time.

Always ensure the secret files are not committed and/or pushed to Git.

Restoring the GitLab backup

Prerequisites

Before continuing, please ensure the following statements are true:

  • All nodes are created and provisioned normally at least once.
  • The installed GitLab version is exactly the same as the one the backup was made with (e.g. 16.4.3-ee).
  • The Gitaly nodes have data disks that are larger than the amount of Git data in the backup.
  • The gitlab-secrets.json matching the backup to be restored is installed on all nodes.
  • You are connected to the primary rails node, and you are working in a screen/tmux/byobu session.
  • The backup tar you want to restore is visible in /var/opt/gitlab/backups/, and is owned by the git user.
  • You have plenty of time to babysit the restore process.
  • If the customer was already on object storage (eg: S3, Minio):
    • these objects have been copied to the target S3 buckets already.
  • If the customer was not on object storage, and the objects reside in the tarfile:
    • Sidekiq is enabled on the primary rails node. To make this happen:
      • add "sidekiq['enable'] = true" to /etc/gitlab/gitlab.rb
      • run gitlab-ctl reconfigure afterwards
    • The disks are large enough for restoring all the object data to the primary rails node. (Move /var/opt/gitlab/gitlab-rails/shared to the temporary volume if required.)
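A quick way to verify the installed version, backup ownership, and free space in one go (paths are the Omnibus defaults):

```shell
# Installed GitLab version; must match the version the backup was made with.
head -n1 /opt/gitlab/version-manifest.txt

# The tarfile must be present and owned by the git user.
ls -lh /var/opt/gitlab/backups/

# Free space on the GitLab data disk and the temporary volume.
df -h /var/opt/gitlab /mnt/tmp
```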

Restoring the tarfile

If you are sure everything is ready, start by stopping services that will interfere with the restore process:

ansible-playbook -i inventory glh.environment_toolkit.tools.pre_data_migration

Next, just restore the GitLab backup as per usual:

sudo gitlab-backup restore BACKUP=1707868831_2024_02_14_16.7.4-ee

If the system asks for regeneration of the authorized_keys file, just answer yes.

The whole backup restore process can take quite a while if there are lots of Git repositories to be restored. Keep an eye on the available disk space on your primary rails node and on all Gitaly nodes.
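To keep an eye on disk space while the restore runs, for example from a second tmux pane; the `gitaly` inventory group name below is an assumption, use whatever your inventory defines.

```shell
# On the primary rails node: refresh the disk usage every 30 seconds.
watch -n 30 df -h /var/opt/gitlab /mnt/tmp

# From your localhost: check all Gitaly nodes via an Ansible ad-hoc command.
ansible -i inventory gitaly -m shell -a 'df -h /var/opt/gitlab'
```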

Silent mode

If this backup restore is primarily a test run for the actual migration, consider enabling Silent Mode before restarting the relevant services. Silent Mode prevents emails and webhooks from being sent out.

A shortcut copied from the upstream docs, linked above:

sudo gitlab-rails r '::Gitlab::CurrentSettings.update!(silent_mode_enabled: true)'

Migrating data from the tarfile to object storage

If the source instance did not store data on object storage, this data will have been restored to the local filesystem. Run gitlab-ctl start sidekiq before running any of the migration raketasks.

You can follow the upstream guides to check if this is the case, and what commands to run to migrate it: https://docs.gitlab.com/ee/administration/object_storage.html#migrate-to-object-storage
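Depending on which data types were restored to the local filesystem, the upstream guide points at per-type Rake tasks; a typical sequence looks like this (run only the tasks relevant to the customer's data):

```shell
# Migrate locally-stored data to the configured object storage, per type.
sudo gitlab-rake gitlab:artifacts:migrate
sudo gitlab-rake gitlab:uploads:migrate:all
sudo gitlab-rake gitlab:lfs:migrate
sudo gitlab-rake gitlab:packages:migrate
```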

The CI secure files feature (annoyingly) has its own storage and documentation: https://docs.gitlab.com/ee/administration/secure_files.html#using-object-storage

The Registry is not part of the Rails application, so there is no Rake task for its data; it must be migrated manually. To migrate the data, perform the following commands:

sudo -i  # If not done already
cd /var/opt/gitlab/gitlab-rails/shared
aws s3 sync registry s3://PREFIX-registry  # Replace PREFIX with your prefix, e.g. maikel-dev

Note: do not add a trailing slash to the paths in the s3 sync command, or the data will end up in the wrong location. After the sync operation completes, you can remove the contents of /var/opt/gitlab/gitlab-rails/shared/registry/.

The raketasks to migrate data may move data in the background without telling you explicitly.

Use the (SQL) commands described in the upstream documentation to check whether the migration has completed. The load on the primary rails node is usually a good indicator; keep htop or dstat open if you don't want to re-run the (SQL) command every 5 seconds.
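As an example, the upstream check for job artifacts counts the rows that still point at local storage (a file_store value of 2 means object storage). Run it from a node with database access; if gitlab-psql is not available on the rails node, sudo gitlab-rails dbconsole works too.

```shell
# Zero rows in local storage means the artifact migration has finished.
sudo gitlab-psql -c \
  "SELECT count(*) AS artifacts_in_local_storage FROM ci_job_artifacts WHERE file_store != 2;"
```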

Continue with the next section only after you are sure all the data is migrated away.

Post migration

If you started Sidekiq temporarily for migrating objects to S3, remove the relevant line from /etc/gitlab/gitlab.rb, and run gitlab-ctl reconfigure afterwards.

If you moved /var/opt/gitlab/gitlab-rails/shared/ to a temporary volume, move it back.

Next, ensure the secrets are valid for the database: sudo gitlab-rake gitlab:doctor:secrets.

You can run more checks if you want, these are not required but can be found in the upstream documentation.

Now run the helper playbook to (re)start all services: ansible-playbook -i inventory glh.environment_toolkit.tools.post_data_migration.

All services should now start, and within a minute or two the GitLab web interface will be online. If possible, log in via the web interface and look around to check if everything seems OK.

You can also check the monitoring/alerting for any issues. It might take a few minutes for the system to realize that everything is in order, so give the monitoring 5 to 10 minutes to catch up.

If you had to install an older version of GitLab, and the migration window still has time left, try to update GitLab to its latest patch version. If possible, and OK with the customer, update it towards the current version we have defined in the stable version track.

After a few days or weeks of validation, you can remove the temporary volume if that was attached:

# You may need to remove some old or imported backups for this to fit on the target disk.
sudo mv /var/opt/gitlab/backups/*.tar /var/opt/gitlab/current_backups
sudo umount /var/opt/gitlab/backups
sudo rm -r /var/opt/gitlab/backups/
sudo mv /var/opt/gitlab/current_backups /var/opt/gitlab/backups
sudo umount /mnt/tmp

sudo sync  # When this finishes it is safe to remove the volume in AWS.

Re-index OpenSearch

If the customer wants to use Advanced Search, you must manually trigger an indexing task to fill the new OpenSearch cluster with data.
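Assuming Advanced Search is already configured and enabled in the admin area, kicking off a full index is a single Rake task:

```shell
# Queue indexing of all existing data into the new OpenSearch cluster.
sudo gitlab-rake gitlab:elastic:index
```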

For more details, please refer to the upstream documentation.