Helpers and tools

Helper to access tokens

We provide a simple helper to create a personal access token for a user. This can be used as per the example below:

GITLAB_USERNAME="glh-admin" WRITE_TOKEN_TO="/tmp/out.txt" ansible-playbook -i inventory glh.environment_toolkit.tools.create_access_token

By default, the name of the token is randomly generated, the scope is set to api, and the token expires in 30 days. To override these defaults, the following optional environment variables are available (a combined example follows the list):

  • GITLAB_TOKEN_NAME: Name of the access token.
  • GITLAB_TOKEN_EXPIRES_DAYS: Days until the token expires, relative to now.
  • GITLAB_TOKEN_SCOPES: Comma-separated list of scopes for the token.
  • PRINT_TOKEN: Set this to yes to print the token in plain text in the Ansible playbook output.
  • WRITE_TOKEN_TO: When set, the token is written to a file at this path on localhost.
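
For example, to create a short-lived token with a custom name and reduced scopes and then use the written-out token against the GitLab API. The token name, expiry, scopes, and the YOUR_GITLAB_DOMAIN placeholder below are illustrative, and this assumes the output file contains just the token value:

GITLAB_USERNAME="glh-admin" GITLAB_TOKEN_NAME="ci-token" GITLAB_TOKEN_EXPIRES_DAYS="7" GITLAB_TOKEN_SCOPES="read_api,read_repository" WRITE_TOKEN_TO="/tmp/out.txt" ansible-playbook -i inventory glh.environment_toolkit.tools.create_access_token

# Read the token back from the output file, e.g. to call the GitLab REST API
curl --header "PRIVATE-TOKEN: $(cat /tmp/out.txt)" "https://YOUR_GITLAB_DOMAIN/api/v4/user"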

Ansible callback plugin

We provide a custom callback plugin called jsonlines which writes task output to disk in jsonlines format.

By default, the results are stored in reports/current-date-time/*.jsonl relative to the working directory. There is a combined all.jsonl, written as output arrives, and a per-host file that filters on the destination host for easy lookups.

The callback plugin needs to be enabled explicitly, which you can do in one of two ways:

  • Set the environment variable ANSIBLE_CALLBACKS_ENABLED="glh.environment_toolkit.jsonlines".
  • Or set callbacks_enabled = glh.environment_toolkit.jsonlines in ansible.cfg under [defaults].
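
For example, with the environment-variable approach the plugin can be enabled for a single run of any playbook; the zero downtime update playbook from below is used purely as an illustration:

ANSIBLE_CALLBACKS_ENABLED="glh.environment_toolkit.jsonlines" ansible-playbook -i inventory glh.environment_toolkit.zero_downtime_update

After the run, the combined and per-host .jsonl files appear under reports/ as described above.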

Zero Downtime Update script

An improved version of GET's zero downtime update script has been added. It includes support for bastion and pages nodes and automatically handles (de)registration of nodes in load balancer target groups.

The script takes a long time because it updates nodes one by one, but it can do so with (near) zero downtime. Zero downtime is not yet guaranteed, as it has not been fully tested.

To run the zero downtime update script, simply run the playbook:

ansible-playbook -i inventory glh.environment_toolkit.zero_downtime_update

Metrics Endpoint Authentication

For environments with external metrics access enabled, the metrics endpoint is protected by an authentication token. This token is automatically generated and stored securely in AWS SSM Parameter Store.

Retrieving the Metrics Auth Token

The authentication token can be retrieved using the AWS CLI with the following command:

# Replace PREFIX with your environment's prefix (e.g., customer-name)
aws ssm get-parameter --name "/PREFIX/metrics/auth_token" --with-decryption --query Parameter.Value --output text

Using the Token

When accessing the metrics endpoint, include the token in the X-Auth-Token header:

curl -H "X-Auth-Token: YOUR_AUTH_TOKEN" https://YOUR_DOMAIN/metrics
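
For scripted access, the two steps can be combined; a minimal sketch, using the same PREFIX and YOUR_DOMAIN placeholders as above:

# Fetch the token from SSM Parameter Store and pass it to the metrics endpoint
METRICS_TOKEN="$(aws ssm get-parameter --name "/PREFIX/metrics/auth_token" --with-decryption --query Parameter.Value --output text)"
curl -H "X-Auth-Token: ${METRICS_TOKEN}" "https://YOUR_DOMAIN/metrics"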

Graphviz Architecture diagrams

We support rendering solution-aware Graphviz diagrams from within Terraform. To enable this, set render_diagrams = true in the module "gitlab_cluster" block of your solution.

Afterwards, Graphviz source files will be rendered into the terraform/diagrams/ folder. To create viewable PNG image files, run diagrams/render.sh. Make sure the graphviz package is installed via your OS package manager first.
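
A sketch of the full workflow, assuming a Debian-based workstation and that render.sh is invoked from the terraform/ directory (adjust the package manager and paths to your setup):

# Install Graphviz (Debian/Ubuntu shown; use e.g. brew install graphviz on macOS)
sudo apt-get install -y graphviz

# With render_diagrams = true set, apply Terraform to (re)generate the diagram sources
cd terraform
terraform apply

# Render the generated Graphviz files in terraform/diagrams/ into PNG images
./diagrams/render.sh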

Enabling S3 bucket mirroring afterwards

Because S3 replication only replicates new objects and changes, you'll need to perform some additional steps when enabling this feature with pre-existing data.

First, a recap of the provider configuration that enables the mirroring feature:

provider "aws" {
  alias  = "mirror"
  region = "eu-central-1"

  default_tags {
    tags = {
      "map-migrated" : "d-server-00yt69qfuyay2k"
    }
  }
}

For clarity, it's recommended you put all the backup/mirroring code in backups.tf:

locals {
  replication_map = {
    for source in module.gitlab_cluster.s3_bucket_names : source => "${source}-replica"
  }
}

data "aws_kms_key" "aws_s3_replication" {
  provider = aws.mirror
  key_id   = "alias/aws/s3"
}

module "s3_bucket" {
  source   = "git.glhd.nl/glh/bucket/aws"
  version  = "~> 1.0"

  providers = {
    aws = aws.mirror
  }

  for_each = local.replication_map
  name     = each.value

  delete_noncurrent_versions_after_days = local.retention_in_days
  sse_kms_key_arn                       = data.aws_kms_key.aws_s3_replication.arn
}

data "aws_kms_key" "aws_backup_mirror" {
  provider = aws.mirror
  key_id   = "alias/aws/backup"
}

resource "aws_backup_vault" "mirror_backup_vault" {
  provider    = aws.mirror
  name        = "${var.prefix}-mirror-backup-vault"
  kms_key_arn = data.aws_kms_key.aws_backup_mirror.arn
}

Then, add an extra file with the one-off migration code:

resource "aws_iam_policy" "batch_replication" {
  name = "${var.prefix}-batch-replication-policy"
  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Action = [
          "s3:InitiateReplication"
        ],
        Resource = flatten([
          for source, destination in local.replication_map : [
            "arn:aws:s3:::${source}/*",
          ]
        ])
      },
      {
        Effect = "Allow",
        Action = [
          "s3:GetReplicationConfiguration",
          "s3:PutInventoryConfiguration"
        ],
        Resource = flatten([
          for source, destination in local.replication_map : [
            "arn:aws:s3:::${source}",
          ]
        ])
      }
    ]
  })
}

resource "aws_iam_role" "batch_replication" {
  name = "${var.prefix}-batch-replication-role"
  assume_role_policy = jsonencode({
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": {
          "Service": "batchoperations.s3.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "batch_replication" {
  policy_arn = aws_iam_policy.batch_replication.arn
  role       = aws_iam_role.batch_replication.name
}

resource "local_file" "batch_replication" {
  filename        = "${path.root}/batch_replication.sh"
  file_permission = "0755"
  content = <<EOT
#!/bin/bash

echo "You should wait for about 15 minutes after configuring initial replication"
read -p "Press enter to continue"

account="$(aws sts get-caller-identity --query 'Account' --output text)"

%{for source, destination in local.replication_map}
aws s3control create-job --no-confirmation-required  --priority 10 --role-arn "${aws_iam_role.batch_replication.arn}" --operation '{ "S3ReplicateObject": {} }' --manifest-generator '{ "S3JobManifestGenerator": { "SourceBucket": "arn:aws:s3:::${source}", "Filter": { "EligibleForReplication": true, "ObjectReplicationStatuses": ["FAILED", "NONE"] } , "EnableManifestOutput": false } }' --account-id "$account" --report '{ "Enabled": false }' --query 'JobId' --output text

%{endfor}

EOT
}

Finally, the configuration changes required in environment.tf:

locals {
  retention_in_days = 31
}

module "gitlab_cluster" {
  # Backup and s3 replication configuration
  noncurrent_version_retention_period = local.retention_in_days

  backup_mirror_vault_arn = aws_backup_vault.mirror_backup_vault.arn
  backup_retention_period = local.retention_in_days

  s3_replication_kms_key = data.aws_kms_key.aws_s3_replication.arn
  s3_replication_map     = local.replication_map
}

After applying the configuration, wait about 15 minutes to make sure new objects are being replicated. Then run the generated script locally:

AWS_PAGER="" aws-sso bash batch_replication.sh

The replication jobs are performed server-side by AWS S3, so you don't need to keep a shell open. You can view the jobs and their progress in the AWS Console: under the S3 service there is a Batch Operations page listing all the replication jobs.
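
If you prefer the CLI, the same information is available through s3control; a quick sketch, assuming the account ID is available in $account as in the generated script and that you query the region the jobs were created in:

# List all Batch Operations jobs with their status and progress
aws s3control list-jobs --account-id "$account" --query 'Jobs[].{Id:JobId,Status:Status,Progress:ProgressSummary}'

# Inspect a single job in detail (JOB_ID taken from the output above)
aws s3control describe-job --account-id "$account" --job-id "JOB_ID"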