Autoscaling runners

Autoscaling runners create EC2 machines on demand and thus do not incur costs when no jobs are running.

This type of runner is suitable for large-scale installations and is ideal for sharing between different projects, groups, and teams, because it can be configured to destroy the machine after each job, ensuring a clean environment and thus preventing information leakage.

Internally, we use the fleeting plugins for GitLab Runner. You can read more about this in the upstream documentation; both the Docker Autoscaler executor documentation and the Fleeting project repository are of interest.

GitLab Runner runs on its own EC2 node in AWS, which can be quite small, since it only handles the communication between GitLab and the actual fleet nodes that are running Docker.

Internally, an auto scaling group (ASG) is created in AWS, which manages the EC2 nodes automatically and transparently. The gitlab-runner node simply changes the desired number of nodes in the ASG and connects to them using AWS Instance Connect. We use our own pre-prepared AMI that has the required software installed, so the nodes can start processing jobs as soon as the machine comes online.

Required settings

The examples below are concise and get you started with the basic default values.

Terraform settings

To get started with autoscaling GitLab Runners, the following configuration is an example of how to start two fleets:

autoscaling_runner_hive_list = [
  {
    name                = "large"
    instance_type       = "t3a.small"  # Instance that runs the management daemon
    fleet_instance_type = "c5a.large"  # Instance that runs Docker and the CI jobs
  },
  {
    name                = "xlarge"
    instance_type       = "t3a.medium"
    fleet_instance_type = "c5a.xlarge"
    runner_block_cost   = 2
  }
]

When not set manually, Terraform picks a subnet based on the name variable of the runner. This ensures the machine is not re-created every time the node list changes. For this reason, it is recommended not to change the name of a runner node after deployment.

By default, the auto scaling group is configured to spin up nodes in all private subnets, thus covering all availability zones. If you need additional assurance that an autoscaling hive is present in multiple availability zones, create two (or more) identical hives and set different subnet IDs on each, so they are spread over multiple zones.
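As a sketch of this approach, two otherwise identical hives could be pinned to different subnets via fleet_subnet_ids (the subnet IDs below are placeholders for subnets in different availability zones):

```hcl
autoscaling_runner_hive_list = [
  {
    name                = "large-az-a"
    instance_type       = "t3a.small"
    fleet_instance_type = "c5a.large"
    fleet_subnet_ids    = ["subnet-aaaaaaaa"]  # placeholder: subnet in zone a
  },
  {
    name                = "large-az-b"
    instance_type       = "t3a.small"
    fleet_instance_type = "c5a.large"
    fleet_subnet_ids    = ["subnet-bbbbbbbb"]  # placeholder: subnet in zone b
  }
]
```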

Ansible settings

There is no special configuration, and you can follow the example in Shared Ansible settings.
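For illustration only, a minimal inventory entry for an autoscaling runner could look like this (the host name and token are placeholders; the example in Shared Ansible settings is authoritative):

```yaml
all:
  hosts:
    example-cluster-autoscaling-runner-large:   # placeholder host name
      gitlab_runner_token:
        token: 'example-token-please-ignore-not-valid'
```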

Optional settings are also available; please refer to the following sections:

Optional auto-scaling specific Ansible settings

The following Ansible settings control parameters specific to autoscaling runners:

| Name | Default | Description |
|------|---------|-------------|
| gitlab_runner_autoscaling_capacity_per_instance | 1 | Number of jobs to run in parallel on a node |
| gitlab_runner_autoscaling_max_use_count | 1 | Number of jobs to run on a node before it is destroyed |
| gitlab_runner_autoscaling_max_instances | 10 | Maximum number of nodes active at once. Must be the same as the concurrency setting to be effective. |
| gitlab_runner_autoscaling_policies | See below | See Auto-scaling policy settings |
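For example, to run jobs in parallel on larger fleet nodes and reuse each node a few times before it is destroyed, these settings could be combined as follows (the values are illustrative, not recommendations):

```yaml
gitlab_runner_autoscaling_capacity_per_instance: 2   # two jobs per fleet node in parallel
gitlab_runner_autoscaling_max_use_count: 4           # destroy the node after four jobs
gitlab_runner_autoscaling_max_instances: 20          # never more than 20 fleet nodes at once
```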

Auto-scaling policy settings

The auto-scaling service can be configured in such a way that the number of active and idle machines depends on the time of day. This is most useful to prevent machines from standing idle during weekends.

This configuration is created based on the gitlab_runner_autoscaling_policies configuration item. You can set any value as defined in the Autoscaler Policy section documentation.

The default value is set as follows:

gitlab_runner_autoscaling_policies:
  default:
    idle_count: 0
    idle_time: "30m0s"

An extensive example can be found below. It also makes use of variable inheritance in Ansible, and the support for dictionaries in the token configuration.

all:
  vars:
    gitlab_runner_autoscaling_policies:
      default:
        idle_count: 0
        idle_time: "1h0m0s"
      office_hours:
        idle_count: 6
        periods:
        - '0 8-19 * * MON-FRI'

  hosts:
    example-cluster-autoscaling-runner-small:
      gitlab_runner_token:
        token: 'global-test-token-please-ignore-not-valid'
        policies:
          default:
            idle_count: 1
          office_hours:
            idle_count: 10

The example above will result in the following configuration:

All runners in the cluster will keep 6 machines idle during office_hours, which is defined as Monday through Friday, between 08:00 and 19:00. There will be 0 idle machines outside office_hours.

For the 'runner-small' node, 1 machine will always be kept idle outside of office_hours. During office_hours, there will be 10 machines idle for processing CI jobs.

This configuration allows you to define the periods once, but specify the number of idle machines per runner. As an example, this allows for keeping a small number of 'c5a.4xlarge' machines and a large number of 'c5a.xlarge' machines idle during office hours.
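For clarity, the effective policy for the 'runner-small' node above is equivalent to writing out the merge by hand:

```yaml
gitlab_runner_autoscaling_policies:
  default:
    idle_count: 1            # overridden per runner; idle_time is inherited from the group vars
    idle_time: "1h0m0s"
  office_hours:
    idle_count: 10           # overridden per runner; periods are inherited
    periods:
    - '0 8-19 * * MON-FRI'
```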

Terraform optional node settings

In addition to the basic settings above, there are multiple values available to override on a per-runner basis. If you need these settings on all your runners, please refer to the Terraform optional default settings below.

# The object type definition of an autoscaling runner node
# Options prefixed with `fleet_` are for the nodes that execute jobs.
# Others are for the controller node that runs gitlab-runner and manages the auto scaling group.
autoscaling_runner_hive_list = list(object({
  name                = string  # Name of the instance, for example: 'autoscaling-small'
  instance_type       = string  # Instance type, for example: 't3a.small'
  fleet_instance_type = string  # Fleet instance type, for example: 'c5a.large'
  runner_block_cost   = optional(number, 1)  # Value of AWS tag 'runner-block-cost'. Applied to fleeting nodes.

  alternative_fleet_instance_types = optional(list(string), [])  # For example: ['c6i.large', 'c6a.large']

  ami_id        = optional(string)  # If you need a custom AWS AMI instead of the cluster default
  subnet_id     = optional(string)  # If you manually want to assign the subnet the runner lives in

  additional_tags            = optional(map(any), {})      # Any additional tags in AWS set on the EC2 and EBS objects
  additional_iam_policy_arns = optional(list(string), [])  # These ARNs are added to the IAM role the EC2 machine has

  disk_type                  = optional(string)  # If you need a different EBS root volume type
  disk_size                  = optional(string)  # If you need a different EBS root volume size
  disk_iops                  = optional(number)  # If you need more (or less) IOPS on the EBS root volume
  disk_delete_on_termination = optional(bool)    # If you need to keep the EBS volume after the instance is terminated
  disk_encrypt               = optional(bool)    # Can be used to deviate from cluster-wide encryption configuration
  disk_kms_key_arn           = optional(string)  # Can be used to deviate from cluster-wide encryption configuration

  fleet_ami_id       = optional(string)           # If you need a custom AWS AMI instead of the gitlabhost default
  fleet_cpu_is_arm64 = optional(bool)             # If the autodetection fails for some reason, you can manually set this.
  fleet_subnet_ids   = optional(list(string), []) # Overrides subnet allocation for the fleet nodes

  fleet_additional_tags            = optional(map(any), {})     # Adds tags to the ASG and all fleet nodes in it
  fleet_additional_iam_policy_arns = optional(list(string), []) # Add additional IAM policies to the fleet nodes

  # These disk settings are the same as above, but are for the fleet nodes instead.
  fleet_disk_type                  = optional(string)
  fleet_disk_size                  = optional(string)
  fleet_disk_iops                  = optional(number)
  fleet_disk_delete_on_termination = optional(bool)
  fleet_disk_encrypt               = optional(bool)
  fleet_disk_kms_key_arn           = optional(string)
  fleet_disk_device_name           = optional(string)  # Can be used to set the correct root device name for custom AMIs
}))
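As an illustrative sketch, a hive that uses some of these optional settings might look like this (the alternative instance types and tag values are placeholders, not recommendations):

```hcl
autoscaling_runner_hive_list = [
  {
    name                = "xlarge"
    instance_type       = "t3a.medium"
    fleet_instance_type = "c5a.xlarge"

    # Fall back to comparable instance types when c5a.xlarge capacity is unavailable
    alternative_fleet_instance_types = ["c6a.xlarge", "c6i.xlarge"]

    # Larger root volume and extra tags on the fleet nodes only
    fleet_disk_size       = "50"
    fleet_additional_tags = { "cost-center" = "ci" }  # placeholder tag
  }
]
```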

Terraform optional default settings

On the Terraform side of the configuration, there is support for a set of default values for each runner. When set, these values are used when the matching parameter is not defined in the autoscaling_runner_hive_list mapping.

Configuring this is entirely optional, and it's possible to create runners without setting any of the following values:

| Name | Default | Description |
|------|---------|-------------|
| autoscaling_runner_default_disk_type | gp3 | Change the default EBS volume type |
| autoscaling_runner_default_disk_size | 25 | Change the default EBS volume size |
| autoscaling_runner_default_disk_iops | null | Change the default EBS volume IOPS count |
| autoscaling_runner_default_disk_delete_on_termination | true | Turn this off to keep the EBS volume |
| autoscaling_runner_default_disk_encrypt | null | Used to deviate from cluster-wide defaults |
| autoscaling_runner_default_disk_kms_key_arn | null | Used to deviate from cluster-wide defaults |
| autoscaling_runner_default_iam_instance_policy_arns | [] | Will be added to the EC2 instance role |

For the fleet_ options, there is a set of default parameters as well:

| Name | Default | Description |
|------|---------|-------------|
| autoscaling_fleet_default_disk_type | gp3 | Change the default EBS volume type |
| autoscaling_fleet_default_disk_size | 25 | Change the default EBS volume size |
| autoscaling_fleet_default_disk_iops | null | Change the default EBS volume IOPS count |
| autoscaling_fleet_default_disk_delete_on_termination | true | Turn this off to keep the EBS volume |
| autoscaling_fleet_default_disk_encrypt | null | Used to deviate from cluster-wide defaults |
| autoscaling_fleet_default_disk_kms_key_arn | null | Used to deviate from cluster-wide defaults |
| autoscaling_fleet_default_disk_device_name | /dev/sda1 | Configure this when your AMI requires it |
| autoscaling_fleet_default_additional_tags | {} | AWS tags for the ASG and all fleet nodes |
| autoscaling_fleet_default_iam_instance_policy_arns | [] | Will be added to the EC2 instance role |
| autoscaling_fleet_default_subnet_ids | null | Override subnet allocation |
| autoscaling_fleet_default_ami_id | null | EC2 AMI id that is used on the fleet nodes |
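A sketch of overriding a few of these defaults cluster-wide (the values and tag are illustrative):

```hcl
autoscaling_fleet_default_disk_type = "gp3"
autoscaling_fleet_default_disk_size = 50

autoscaling_fleet_default_additional_tags = {
  "environment" = "ci"  # placeholder tag applied to the ASG and all fleet nodes
}
```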

Additionally, for the auto-discovery of our default AMIs, you can override the AWS accounts that are searched, in case you are testing new versions of the AMI builder code:

| Name | Default | Description |
|------|---------|-------------|
| autoscaling_fleet_ami_owners | ["523002502769"] | AWS accounts that are searched for the fleeting AMIs |