Changelog¶
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
3.6.0 - Unreleased¶
Added¶
- Added advanced search indexing alert.
Changed¶
- cloudwatch-alarm-metrics now runs as its own user.
- Removed sidekiq from prometheus OutOfMemory alerts.
- Changed day for full backup to every Thursday.
- Moved all gpg keys to S3.
Fixed¶
- Grafana endpoints are now configured on the correct Global Accelerator.
- Fixed copying of GitLab registry key on fresh solutions.
3.5.2 - 2026-03-13¶
Fixed¶
- Fixed GitLab backup script after GitLab version update.
3.5.1 - 2026-03-12¶
Fixed¶
- Replaced GitLab runner apt key.
3.5.0 - 2026-03-12¶
Added¶
- Added troubleshooting steps and workarounds for VPN-related health check timeouts during provisioning.
- Added auto_start_stop function to development template environment for cost saving.
- Added a lambda that applies ElastiCache self-service updates and notifies through Zulip via SNS.
- Implemented batched CloudWatch alarm updates in autostartstop lambda to prevent API throttling.
- Added ssm parameter (lock_while_running) logic to prevent autostartstop lambda from executing while parameter is true.
- Scaleway is included in the environment-template.
- Cronjob to clear docker cache on dedicated runners.
- Web Application Firewall support for Scaleway.
- Container Registry metadata database support with container_registry_metadata_database_enable (default: false).
- Ansible variable container_registry_maintenance_enable to configure registry maintenance.
- Endpoints to centralize Grafana data.
Changed¶
- Updated onboarding documentation with explicit 1Password (op signin) steps, YubiKey prerequisites, and gitlab_version configuration.
- Replaced static sidekiq queue check with percentage based one.
- Changed alertmanager logic to accommodate start/stop logic for alerts.
- Added variable to costsaving and changed auto_start_stop lambda enable alerts logic to fire 15 minutes after ec2's are started.
- Upgraded Loki to version 3.6.3
- Automated deployments now use administrator permissions to prevent permission issues.
- Scaleway Terraform module is now published on the registry.
- Increase memory limits for sidekiq machines with 1:4 CPU/RAM ratio.
- Changed Gitaly-managed backups of Git repositories to incremental backups, with full backups on Sundays.
Fixed¶
- GitLab Registry key is now copied to Sidekiq nodes, allowing cleanup policies to work.
Removed¶
- Removed all the AWS bastion code.
- Added GEO blocking to WAF with waf_geo_blocking_enabled (default: true) and waf_geo_blocked_countries variables.
Upgrade instructions¶
Centralized Grafana¶
We have introduced a new centralized Grafana environment. Make sure your solution is included in scaleway-infrastructure.
GEO Blocking¶
GEO blocking is now enabled by default in the WAF. If your solution already has custom WAF blocking implemented via waf_blocking.tf or similar:
- Remove redundant GEO blocking resources from your solution's terraform code:
- Remove the aws_wafv2_rule_group resource that contains the geo_match_statement
- Remove the associated locals block with blocked_countries if it exists
- Keep any IP-based blocking resources (aws_wafv2_ip_set, IP blocking rules) if needed
- Update the rule group reference if you were using waf_custom_rule_group_pre_arn or waf_custom_rule_group_post_arn:
- If your custom rule group only contained GEO blocking, remove the variable assignment entirely
- If it contained both GEO and IP blocking, update the rule group to only contain IP blocking
- To customise the blocked countries list, set waf_geo_blocked_countries in your solution's tfvars:
  waf_geo_blocked_countries = ["IR", "KP", "RU"] # Your custom list
- To disable GEO blocking entirely, set:
  waf_geo_blocking_enabled = false
- Run terraform plan to verify the changes before applying.
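If your custom rule group mixed GEO and IP blocking, the reduced, IP-only version might look roughly like the sketch below. This is an illustration under assumptions, not the toolkit's own code: the resource names, the example CIDR, the capacity, the metric names and the REGIONAL scope are all hypothetical, and the module block is abbreviated. Only waf_custom_rule_group_pre_arn, the module name gitlab_cluster and the aws_wafv2_* resource types come from this changelog.

```hcl
# Hypothetical IP set; replace the name and addresses with your own.
resource "aws_wafv2_ip_set" "blocked_ips" {
  name               = "example-blocked-ips"
  scope              = "REGIONAL" # assumption: the WAF is attached to a regional resource such as an ALB
  ip_address_version = "IPV4"
  addresses          = ["203.0.113.0/24"] # documentation range, for illustration only
}

# Rule group that keeps only IP blocking; the geo_match_statement rule is gone.
resource "aws_wafv2_rule_group" "custom_ip_blocking" {
  name     = "example-custom-ip-blocking"
  scope    = "REGIONAL"
  capacity = 10 # assumption: size this to your actual rules

  rule {
    name     = "block-ip-set"
    priority = 1

    action {
      block {}
    }

    statement {
      ip_set_reference_statement {
        arn = aws_wafv2_ip_set.blocked_ips.arn
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "example-block-ip-set"
      sampled_requests_enabled   = true
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = true
    metric_name                = "example-custom-ip-blocking"
    sampled_requests_enabled   = true
  }
}

module "gitlab_cluster" {
  # ... existing source, version and other settings stay unchanged ...

  waf_custom_rule_group_pre_arn = aws_wafv2_rule_group.custom_ip_blocking.arn
}
```

Running terraform plan afterwards should show the GEO rule being removed from your rule group while the IP set stays in place.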
3.4.2 - 2026-03-03¶
Added¶
- Added optional CloudWatch memory metrics collection for autoscaling runner fleet instances via autoscaling_fleet_enable_cloudwatch_memory_metrics variable.
Changed¶
- Updated apt signing key for GitLab repository.
3.4.1 - 2026-02-10¶
Changed¶
- Enabled autoscaling runners over internal networks on Scaleway for when fleeting plugin is fixed.
Fixed¶
- Fixed an issue where NAT instances would be recreated after a second terraform apply.
Upgrade instructions¶
- Increase retention period in projects from 30/31 to 35 in environment.tf.
3.4.0 - 2026-02-02¶
Added¶
- Scaleway is now a supported provider.
- Added a CloudWatch alert for when the average number of RDS connections exceeds 80% of max connections.
Changed¶
- Renamed "monitor" to "prometheus" to prevent confusion.
Upgrade instructions¶
- The rename from monitor to prometheus causes a lot of resources to be recreated; this does not cause any downtime.
- Manually remove the "TCP:9009" listener from the internal load balancer; Terraform can't fix this.
- Add opensearch_password to sensitive_vars.yml in your solution.
3.3.3 - 2026-01-15¶
Added¶
- Added a variable to allow EIP creation for the NAT instances, without creating the NAT instances themselves.
Fixed¶
- Fixed the determination logic for which subnet_ids the autoscaling runner should use.
Removed¶
- Removed the object lock terraform code for the backups. We cannot use this because GitLab sends a SHA-1 hash instead of MD5.
3.3.2 - 2026-01-08¶
Fixed¶
- Fixed object storage lock for buckets with customized prefix.
3.3.1 - 2025-12-01¶
Fixed¶
- NAT user data script now uses dynamic interface name detection for iptables rules instead of a static value (eth0).
Changed¶
- Deprecated attribute warning regarding region.name has been resolved.
3.3.0 - 2025-11-28¶
Added¶
- Incremental Logging for job logs is now enabled if previously disabled.
- Add pages subdomain to internal ACM certificate.
- Network isolation support for GitLab runners with NAT instances as cost-effective alternative to NAT gateways.
- Comprehensive monitoring and alerting for NAT instances including CloudWatch alarms and Grafana dashboards.
- Lambda-based route switching between NAT instances and NAT gateways with automated health monitoring.
- Enhanced instance module with native support for NAT-specific configurations (associate_public_ip_address and source_dest_check).
- Updated auto-scaling-group module with improved mixed instance policy support and instance requirements compatibility.
- Enhanced bucket module with additional configuration options and improved lifecycle management.
Upgrade instructions¶
- Make sure to upgrade all terraform modules. These changes are highly dependent on the latest terraform modules.
- Be sure to run the ansible playbook to update grafana and upload the NAT dashboard.
- The Lambda invoke commands from the docs rely on the latest version of the RECT tool, because older versions didn't allow JSON payloads to be sent.
- Move ALB from public to private subnet.
Changed¶
- Update aws-fleeting-plugin from 1.0.0 to 1.0.1.
- Move ALB from public to private subnet.
- Upgrade AWS terraform provider from 5.33 to 6.20.
Fixed¶
- Fixed issue with creating cloudwatch alerts after updating instance count.
Upgrade instructions¶
- Replacing the ALB causes minimal downtime, but open connections are reset during the change.
- Update the required providers in your solution's main.yaml to the latest versions.
3.2.2 - 2025-09-26¶
Added¶
- Added variables to customize Cost Anomaly notification thresholds.
Changed¶
- Changed RDS and OpenSearch volume types to decrease costs.
Fixed¶
- The Mimir alertmanager_url was not set correctly in the Mimir config causing missing notifications when alerts were triggered.
- Replaced Grafana APT repository key with new one.
Upgrade instructions¶
- Changing volume types makes the instances unavailable while the volumes are recreated; this can take a long time. You can (temporarily) disable this change by switching the volume types back to io1 in your solution.
- In order to push the fixed alertmanager_url, run the monitor playbook (aws-sso ansible-playbook -i inventory glh.environment_toolkit.monitor).
3.2.1 - 2025-06-26¶
Added¶
- Made Prometheus metrics endpoint publicly available with secure token-based authentication.
- Added cost allocation tags for resources that we bill for when usage is above contractually agreed limits.
Changed¶
- The instance module (since v1.0.3) now supports adding cost allocation tags to the EBS volumes of EC2 instances.
Fixed¶
- Update VPC CIDR block to fix Client IP in GitLab.
- Update backup retention to sensible defaults.
Upgrade instructions¶
- Check if backup retention is set in your solution. See environment-template/terraform/backups.tf for an example.
3.2.0 - 2025-04-29¶
Added¶
- Support for SSH over SSM. This allows native SSH with pipelining but does not require bastion hosts.
- Documentation on dealing with GitLab database preseeding.
- waf_custom_rule_group_pre_arn Terraform variable to inject a Solution-managed WAF rule group at the start of the WAF ruleset. waf_custom_rule_group_post_arn does the same but appends to the end of the ruleset instead.
- Added support to assign static IPs to Network Load Balancer. Set network_load_balancer_static_ips to true to enable.
Changed¶
- Moved all S3 bucket code to our bucket module.
- object_storage_force_destroy is now false by default.
- Enabled SSH-over-SSM for new projects by default.
Removed¶
- Migration path for 3.0.0 to 3.1.0 upgrade of Monitor nodes
- Removed Terraform var network_load_balancer_subnet_ids, replace with network_load_balancer_subnet_mapping.
Upgrade instructions¶
- Manually empty out the ${prefix}-monitor S3 bucket. Terraform will not be able to remove it otherwise.
- If you use S3 replication, don't forget to also remove the replica bucket.
- If you override object_storage_buckets, you must manually create moved blocks to move them to the new locations.
- Use the examples in s3_move.tf as a guide. In a solution, you must prepend all IDs with the module.gitlab_cluster. prefix.
- Replace network_load_balancer_subnet_ids with network_load_balancer_subnet_mapping in your terraform files.
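For solutions that override object_storage_buckets, a moved block in the solution could look like the sketch below. Both resource addresses are hypothetical placeholders that only show where the module.gitlab_cluster. prefix goes; the real source and target addresses have to be taken from s3_move.tf.

```hcl
# Hypothetical addresses for illustration only; copy the real ones from s3_move.tf
# and prefix both sides with module.gitlab_cluster. when used in a solution.
moved {
  from = module.gitlab_cluster.aws_s3_bucket.example_old_bucket
  to   = module.gitlab_cluster.module.example_bucket.aws_s3_bucket.this
}
```

With the block in place, terraform plan should report the bucket as moved rather than as destroyed and recreated. If it still shows a destroy, the addresses most likely do not match your state.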
3.1.3 - 2025-03-11¶
Changed¶
- Bumped common-config dependency to 1.0.6
3.1.2 - 2025-02-25¶
Added¶
- Support for customizing the Gitaly S3 upload part size. Defaults to 25 MiB (upstream default is 5 MiB).
Changed¶
- Allow the autoscaling_runner and dedicated_runner types to receive patch updates for the instance module.
- Set the cloudwatch module to be auto-updated as well.
3.1.1 - 2025-02-10¶
Removed¶
- Observability related code on port 50000. Was all moved to federated Grafana infrastructure in a previous release.
3.1.0 - 2025-02-10¶
Added¶
- Enabled EBS encryption by default and set the most logical key as the default to use.
- IAM password policy configuration. Configurable via iam_password_expiry_days. Defaults to 90, disable with 0.
- glh.environment_toolkit.health_check playbook to run GitLab health checks.
- Support for multiple Loki nodes in automatic HA mode via Gossip/Memberlist protocol.
- Support for Prometheus HA deployment using Grafana Mimir.
- Monitor nodes now run Grafana Mimir to provide long-term Prometheus storage in S3.
- The Thanos stack is turned off automatically (will be removed next version).
- Prometheus is no longer running inside Docker.
- Grafana role has been updated to reflect this as well, and will install additional dashboards.
- Alertmanager is also replaced by Mimir.
- In HA mode (automatically configured), etcd will be installed to provide leader election for Mimir.
- Enabled RDS auto minor version upgrade by default.
- Support for new (17.8+) version of Gitaly storage config on Rails nodes.
Changed¶
- Loki will store data in the new v13 format, starting at 2025-04-01.
- Updated Ansible dependency common config to 1.0.5.
- CI secure files now uses consolidated object storage configuration. This was not supported before GitLab 17.0.
Fixed¶
- Runner EC2 instances now respect the default KMS key variables when set.
Removed¶
- Removed rds_postgres_backup_retention_period, replace with backup_retention_period in your solution.
Upgrade instructions¶
- Your runner instances will need to be re-created when they are currently not using a KMS key. This will remove all your data. If you want to keep the data and prevent Ansible re-provisioning, either run the helper script in helpers/encrypt_existing_ebs_volumes.sh or manually perform these steps:
- Turn off the EC2 instance
- Create an encrypted snapshot of the current volume
- Create a new encrypted EBS volume with the correct key, based off the snapshot
- Attach this new EBS volume to the EC2 instance and start it.
- Remove the old volume and the snapshot.
- If you are specifying a KMS key to use as default encryption (either EBS or globally), you must ensure that you grant the auto-scaling service role permissions to use this key if you have autoscaling runners. Alternatively, simply retrieve the ARN for the alias/aws/ebs key and set that using either variable autoscaling_fleet_default_disk_kms_key_arn or by using fleet_disk_kms_key_arn, which can be specified per runner.
- If you are upgrading from 3.0.x, and 2025-04-01 has passed already, set Ansible variable loki_v13_start_date to an ISO formatted date that is a few days in the future. This marks the date Loki starts to use the new v13 data format.
- If you have grafana_install_omnibus_dashboards set, it's recommended to use the values in grafana_install_external_dashboards instead. This will future-proof your selection for new external dashboards.
- Thanos and the current monitoring history will be retained (until GET 3.2), but will be inaccessible by default. To access the old history, run docker compose up -d in /root as root. Manually query against it using Grafana. The address is the internal IP of the monitor node, it listens on port 19090.
- Remove rds_postgres_version from environment.tf.
- Replace rds_postgres_backup_retention_period with backup_retention_period.
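For the alias/aws/ebs alternative mentioned in the KMS step above, one possible way to look up the ARN is the AWS provider's aws_kms_alias data source, as sketched below. This is an assumption-laden example: it presumes that autoscaling_fleet_default_disk_kms_key_arn is set as an input on your gitlab_cluster module block and that the key ARN (rather than the alias ARN) is what the variable expects, so verify both against your solution.

```hcl
# Look up the AWS-managed EBS key behind the alias/aws/ebs alias.
data "aws_kms_alias" "ebs" {
  name = "alias/aws/ebs"
}

module "gitlab_cluster" {
  # ... existing source, version and other settings stay unchanged ...

  # Assumption: the variable takes the ARN of the key the alias points to.
  autoscaling_fleet_default_disk_kms_key_arn = data.aws_kms_alias.ebs.target_key_arn
}
```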
3.0.6 - 2025-02-03¶
Added¶
- primary_alb_arn output variable for usage in solution code.
Fixed¶
- Issues with Gitaly server side backups not being configured properly
3.0.5 - 2025-02-03¶
Added¶
- gitlab_rails_exceptions metric in Promtail to keep track of exception log.
- Force-prefixed by Promtail with promtail_custom_ in Prometheus output.
- Alerts when the rate on promtail_custom_gitlab_rails_exceptions over 1 minute is more than 0.5 for 5 minutes.
Fixed¶
- docker_proxy_host not working for Kroki and Registry mirror roles.
- GitLab Omnibus installation timeouts, resolved by adding higher lock_timeout values.
- Zero downtime updates break primary Rails node because of DB changes after schema cache is filled.
Changed¶
- Increased installation timeout for GitLab Runner as well, as a preventative measure.
- Increased grpc limits for Loki because it was too low to retrieve large logfiles.
3.0.4 - 2025-01-16¶
Added¶
- New variable on Autoscaling Runner definition: runner_block_cost. Defaults to 1.
Fixed¶
- Duplicate zero downtime detection logic in CI causing invalid playbook names.
- Add more OIDC permissions for Global Accelerator
3.0.3 - 2024-12-19¶
Added¶
- Outputs for OIDC role ARNs so solutions can add custom permissions.
Changed¶
- CI codebase adapted to automatically switch to SSM usage and provide more sane deployment defaults.
Fixed¶
- gitlab_object_storage_prefix not used by gitlab_runner role.
3.0.2 - 2024-12-12¶
Fixed¶
- docker_proxy_host value was not propagated properly when defined.
Changed¶
- Updated Ansible dependency common config to 1.0.2
3.0.1 - 2024-12-03¶
Added¶
- CI step to deploy the terraform module.
Fixed¶
- Bastion SSH config template breaks when localhost is in hostvars.
- Render SSH config playbook breaks when localhost is in hostvars.
- Fixed the "create_admin_user" ansible tool.
- Callback plugin generates errors when working on an included playbook.
3.0.0 - 2024-12-03¶
Added¶
- Documentation about backup and restore procedures.
- Full documentation rework, with mkdocs as the builder. Also deploys versioned releases to a separate repository.
- Added shared GitLab CI templates.
- Default configuration for AWS Backup mirroring to separate region.
- Basic code to render Graphviz diagrams inside Terraform, output to the solution folder.
- Cost anomaly detection.
- OpenSearch update notifications through SNS.
- Setting to use S3 replication for backups.
- Dependency on glh.common_config Ansible Collection.
- A new set of playbooks was introduced to perform various routine update tasks:
- all, still deploys the entire codebase with downtime
- gitlab_update, only updates Omnibus and Runner versions, with downtime
- zero_downtime_all, runs the entire codebase, per node with loadbalancer (de)registration.
- zero_downtime_gitlab_update, only updates Omnibus and Runner, per node with loadbalancer (de)registration.
- The rate of zero downtime updates can be controlled with the serial variable.
Changed¶
- Pages enablement is now controlled with Terraform variable pages_enabled. Enabled by default.
- Pages services and daemons are now started on the GitLab Rails nodes, separate EC2 Pages nodes are no longer supported.
- Project now uses pyproject.toml instead of setup.py and requirements.txt.
- Ansible namespace has been changed to glh.environment_toolkit.
- The common, common_vars, pre_common, post_configure roles were all removed in favor of local defaults.
- The zero_downtime role was removed in favor of the new zero_downtime_all and zero_downtime_gitlab_update plays.
- The following roles were moved to a new omnibus role that encompasses everything related to gitlab-omnibus:
- praefect, gitaly, gitlab_rails, sidekiq
- Extending the rails config must now be done by adding paths on localhost to:
- common_custom_config_file
- gitlab_rails_common_custom_config_file, which is used on gitlab_rails and sidekiq nodes.
- [gitlab_node_type]_custom_config_file, such as sidekiq_custom_config_file, praefect_primary_custom_config_file, gitlab_rails_custom_config_file
- Adding per-solution Ansible code is now done using:
- pre_roles and post_roles array, which include an Ansible FQDN role pre and post GET respectively.
- pre_tasks_common, pre_tasks_[group_name], post_tasks_common and post_tasks_[group_name], which can be used to include custom Ansible task files. These are not playbooks, they use include_tasks.
Moved¶
- Compacted instance IAM policy ARN's into single resource.
- The following Ansible code was moved to glh.common_config:
- openssh-server and Unix admin user installation
- Debian tweaks and Ansible requirements installation
- Dependencies for Ansible versions via pip
- Installation of AWS tooling such as aws-ssm-agent
- Basic system initialization such as ntp, unattended upgrade, IPv6 disablement, cloud-init config
- The following components were moved to glh.get_extensions because they cluttered the core codebase:
- rsyslog forwarder support
- amazon-cloudwatch-agent support
- The following Terraform (sub)modules were moved into their own repositories:
- cloudwatch_alarms
- instance
- auto_scaling_group
- security_group
- vpc_peering
- bucket (new)
Removed¶
- Cleanup paths from 2.1.0 for resolvconf and sshd-git.
- Disabled S3 malware protection due to insane pricing.
- Dependency on gitlab.gitlab_environment_toolkit.
- Removed Terraform support for:
- Consul
- ElastiCache separate databases
- Redis EC2 instances
- NFS instances
- Haproxy instances
- Existing network infrastructure
- OpenSearch EC2 instances
- PgBouncer EC2 instances
- Postgres EC2 instances
- Praefect Postgres EC2 instances
- RDS Praefect separate database
- GEO support
- S3 bucket replication
- Separate EC2 nodes for GitLab Pages
- Removed Ansible support for:
- All of the Terraform objects above
- Single-rails-node deployment
- non-Gitaly cluster solutions
- GCP, Azure, on-premise deployments
- Inclusion of *.rb config on Omnibus nodes.
Upgrade instructions¶
Update the reference to the included GitLab CI file in your .gitlab-ci.yml to point to this repository.
For an example, please see environment-template/.gitlab-ci.yml.jinja.
Enable backup mirroring in your solution. See the environment-template for an example.
Change any references from gitlabhost.gitlab_environment_toolkit to glh.environment_toolkit, most notably in
ansible/requirements.txt. The repository path has not changed. Remove all geerlingguy roles from your
ansible/galaxy-requirements.yml as well, these are no longer required.
Make sure you run pip install with the -U flag, to ensure dependencies only pinned on major/minor versions are
updated properly as well.
If you want to upgrade with less downtime, run Ansible first and Terraform second. Do not change the GitLab version in this process. This is only required if you are running GitLab Pages.
2.1.8 - 2024-11-19¶
Added¶
- Setting to use S3 replication for backups.
Upgrade instructions¶
Enable backup mirroring in your solution. See the environment-template for an example.
2.1.7 - 2024-11-12¶
Added¶
- VPC endpoints for Secrets Manager, SNS and SSM.
Changed¶
- When Global Accelerator is enabled, ports 80 and 443 forward traffic directly to the Application Load Balancer.
Fixed¶
- Made the CloudWatch alert for autoscaling groups less trigger-happy.
2.1.6 - 2024-10-22¶
Added¶
- Added missing IAM permission.
2.1.5 - 2024-10-22¶
Changed¶
- Disabled continuous backups by default due to insane cost increase.
2.1.4 - 2024-10-09¶
Fixed¶
- Workaround for issues with Gitaly server side backup due to aws_s3_endpoint not defaulting to a sane value.
2.1.3 - 2024-09-26¶
Changed¶
- Remove dennis users on all nodes.
2.1.2 - 2024-09-23¶
Added¶
- Added feature flag to disable GuardDuty features.
2.1.1 - 2024-08-29¶
Added¶
- Allow updating only /etc/gitlab/trusted-certs by using --tags update-trusted-certs.
Fixed¶
- The test for determining if a runner fleet image should be ARM or x86 was not strict enough, causing wrong output.
- Purge resolvconf as well as uninstalling it to remove all leftovers.
- Stop and disable systemd-resolved since it makes our networking stack non-deterministic.
- Restart our networking interfaces if resolvconf or systemd-resolved states change to generate /etc/resolv.conf.
- AWS GuardDuty scanning on S3 buckets was not able to be deployed when the aws/s3 default KMS key is used in S3.
2.1.0 - 2024-08-27¶
Added¶
- GitLab Omnibus config override stanza for Praefect nodes.
- Enable 'debug_addr' for Docker Registry on Rails nodes so we can scrape the Prometheus metrics from it.
- AWS GuardDuty is enabled to provide malware protection.
- Ansible Callback plugin to write playbook output to jsonlines formatted files.
- Added CloudWatch alerts for:
- GuardDuty findings
- Unhealthy nodes in target groups
- Anomalous nodes in target groups
- RDS + read replica
- ACM certificate expiration
- ElastiCache Redis cluster
- Opensearch cluster
- EC2 instances
- EC2 autoscaling groups
- Custom GitLabHost MOTD.
- Patches to ensure MOTD is printed on starting an AWS-SSM shell.
- Solutions now set up DNSSEC signing for the glhc.nl domain. See gitlab_dns_zone_ds_record for the DS value to use.
- New Grafana Dashboard with basic metrics about GitLab Runner and Fleeting plugin.
- Support for specifying a custom KMS key ARN for usage with AWS Backup via backup_kms_key_arn.
- Support for mirroring AWS Backups to a user-defined additional AWS Backup Vault via backup_mirror_vault_arn.
- Allow for overriding the AWS Backup cron schedule via backup_cron_schedule.
- We now run RepositoryArchiveCleanUpService on each Rails node every night to clean up temporary files.
- Support for using Graviton/ARM64 types in autoscaling runner fleets. The orchestrator nodes still need to be x86.
- Scoped final security group to a Load Balancer instead of entire VPC range.
- New Grafana dashboard with Prometheus scraping status to replace /prometheus/ maintenance endpoint.
- Added AlertManager to Grafana via provisioning so alerts can be shown and silenced via Grafana.
- Output variable that generates copy-pastable to put into aws-infrastructure repository.
- Exposing a new maintenance tunnel NLB to allow access to Grafana via aws-infrastructure.
- Prometheus exporter that fetches all CloudWatch Alarms with status, and AlertManager rules to trigger after 12h.
Changed¶
- Tasks that installed sshd-git on gitlab-rails nodes were changed to clean up the leftovers instead.
- We no longer test if sjoerd or arnoud are removed, they should have no leftover data anywhere by now.
- GitLab Omnibus is no longer installed on Bastion nodes and existing installs will be removed automatically.
- The security group outputs for autoscaling_fleet_node, autoscaling_runner, dedicated_runner now end in _id.
- Split AWS permissions into separate files.
- We now use Copier for creating environment boilerplate.
- The environment-template has been inlined into this main repository.
- Update AWS Fleeting plugin for autoscaling runners to version 1.0.0.
- Non-working panels removed from Grafana 'Server Performance' dashboard (NFS, HAProxy, PGBouncer).
- All monitoring data in Prometheus is now labeled by gitlab_node_type.
- Prometheus CPU load alert no longer triggers on runners, these have a new alert that triggers at 95% load over 30m.
- gitlab_shell_ssh_port is now forced to be port 22 and no longer needs to be explicitly set in a solution.
- The default AMI name filters for autoscaling fleet nodes have been changed in accordance with our new naming scheme.
- Updated RDS CA Certificate to the new AWS default.
- Grafana now accepts auto-login via auth.proxy options. This only works in combination with aws-infrastructure.
Fixed¶
- Global Accelerator health checks now succeed.
- You can provision 1 opensearch node by setting opensearch_service_multi_az to false and opensearch_service_node_count to 1.
- Added CloudWatch alerts to cost saving and fixed IAM permissions.
- Fixed configuring host name after initial provisioning.
Removed¶
- Removed preparing_your_aws_account.md, no longer makes sense since we migrated to AWS SSO.
- Maintenance script on rails node now runs in a systemd timer and uses a .d style layout to allow for overrides.
- Prometheus Node Exporter is now disabled in Omnibus config and is replaced with the Debian version instead.
- Removed the tools.nfs_cleanup playbook: we don't support NFS anymore.
- Removed the tools.migrate_to_gitlab_sshd playbook.
- Removed all GitLabHost code that supports NFS nodes where possible. Overrides to disable upstream NFS remain.
- Removed bastion related listeners, target groups and security groups if no bastion nodes are created.
- Removed resolvconf package.
Fixed¶
- SSH host keys from gitlab-sshd were not included in AWS Backups.
Upgrade instructions¶
- If you haven't run the tools.nfs_cleanup playbook during the previous update, do that before upgrading.
- You can remove gitlab_shell_ssh_port from inventory/vars.yml in each solution.
- It's recommended to add the following to your solution's environment.tf:
  output "infrastructure_config" {
    value = module.gitlab_cluster.infrastructure_config
  }
2.0.7 - 2024-07-18¶
Fixed¶
- Using aws_s3_endpoint doesn't actually work because of bugs in upstream.
Changed¶
- Autoscaling runners now use a static SSH keypair to reduce CPU load on the orchestration nodes.
- This also allows for manual SSH connections to a fleeting node which is useful for debugging.
- gitlab-runner.service now also stops gracefully on autoscaling nodes (previously was only on restart).
- Revert/lower the fleeting checking interval since upping the value didn't solve anything.
2.0.6 - 2024-07-16¶
Fixed¶
- Traffic to S3 was not properly routed to the S3 VPC Endpoint when initiated from one of the private subnets.
2.0.5 - 2024-07-16¶
Fixed¶
- Our pre_common role now depends on the common_vars role, since it uses variables defined there.
- Fixed variable resolving in zero downtime update.
2.0.4 - 2024-07-11¶
Added¶
- Enabled autoscaling group metrics.
- Add OIDC permission to enable metrics collection on auto scaling groups.
- Added keepalive configuration for connection between autoscaling runner and fleet nodes.
Changed¶
- Updated SSL policy on load balancers and made it configurable.
2.0.3 - 2024-06-27¶
Fixed¶
- Remove deprecated Sidekiq concurrency parameter forcefully to assist with simultaneous upgrade of GET 2.0 and GL 17.0.
2.0.2 - 2024-06-27¶
Changed¶
- Cherry pick fixes from 1.6.7 release into 2.0.2.
Fixed¶
- New-style secrets handling was not properly included, breaking gitlab-secrets.json on all nodes.
2.0.1 - 2024-06-24¶
Changed¶
- Cherry pick fixes from 1.6.6 release into 2.0.1.
2.0.0 - 2024-06-21¶
Added¶
- The proxy_download feature of GitLab object storage is now controllable via gitlab_object_storage_proxy_download.
- Cleanup policy to SSM s3 bucket, deletes all files after 1 day.
- Logging of all requests blocked by AWS WAF to a CloudWatch log stream. Retention can be set via: waf_log_retention.
- Allow assuming of OIDC role from within the account itself by enabling gitlab_oidc_debugging_enabled.
- Terraform modules are published on GitLab.
- Bastion nodes now have a simplistic HTTP daemon for usage with NLB health checks that tests if OpenSSH is running.
- glh-postgres-gitlab and glh-postgres-praefect CLI commands added to rails nodes for easy DB access.
Changed¶
- Modified grow_filesystems.yml - Added growpart for /dev/xvda to handle resizing of physical disk
- Upstream version of GET has been updated to 3.3.0
- Various variables have been moved from common to common_vars when they were used in multiple locations.
- Some variables have been moved from common to their specific role when they were only used there.
- The shared role has been split up into gitlab_runner_linux and grafana_apt_repo.
- Variables related to GitLab runners have been moved from common to gitlab_runner_linux.
- Dependency on geerlingguy.docker has been removed and Docker is now installed by our own docker role.
- Variable glh_docker_repo_url was renamed to docker_repo_url and moved to the new docker role.
- Variable docker_repo_host now controls the protocol and host to use, docker_repo_url adds the path as well.
- Variable s3_endpoint was renamed to aws_s3_endpoint.
- Variable sns_endpoint was renamed to aws_sns_endpoint.
- Disabled IPv6 by default in user_data for new instances.
- Removed default Prometheus federative servers. Can be enabled at solution level.
- gitlab-sshd is now used by default to provide Git operations over SSH.
Fixed¶
- AWS Permission for ssm:StartSession.
- Allow creation of a singular OpenSearch node by setting opensearch_service_multi_az to false.
- GitLab's GPG key is now also installed from Apt proxy when enabled
Upgrade instructions¶
- Upgrade all Terraform AWS provider version constraints in your solution to version = "~> 5.33".
- Remove geerlingguy.docker from galaxy-requirements.yml.
- Upgrade geerlingguy.node_exporter to 2.0.1 in galaxy-requirements.yml file.
- Ensure you install all dependencies afterwards (terraform init -upgrade, pip and ansible-galaxy)
- Rename all options called opensearch to opensearch_service in environment.tf.
- For the forced migration to gitlab-sshd, the Git over SSH services will be unavailable during the deployment.
- Ensure gitlab_shell_ssh_port is not set in your environment.tf.
- Before running any other Ansible playbooks, run once: tools.migrate_to_gitlab_sshd. This is non-destructive.
- If you want to upgrade with less downtime, set gitlab_shell_ssh_port = 22 in environment.tf.
- Run Terraform apply as normal after following the rest of the upgrade instructions.
- Remove this option after running Ansible completely and run Terraform again.
- OpenSSH services will remain on some nodes for now (even when unused), and will be removed in 2.1.0.
- Rename the following variables in your Ansible Inventory vars.yml:
- external_pages_url is now pages_external_url
- gitlab_pages_ssl_cert_file is now pages_ssl_cert_file
- gitlab_pages_ssl_key_file is now pages_ssl_key_file
- gitlab_pages_custom_config_file is now pages_custom_config_file
- gitlab_pages_custom_files_paths is now pages_custom_files_paths
- You should run gitlabhost.gitlab_environment_toolkit.tools.nfs_cleanup playbook once to clean up NFS leftovers.
- If you override glh_docker_repo_url, rename it to docker_repo_host and remove the path (/linux) from the URL.
- Replace the source in environment.tf with one of the following options:
# For production solutions
source = "git.glhd.nl/glh/gitlab-environment-toolkit/aws"
version = "2.0.0"
# For development solutions
source = "git::git@git.glhd.nl:glh/ha/gitlab-environment-toolkit.git//terraform/aws?ref=main"
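The Ansible variable renames listed in these upgrade instructions can be applied mechanically. The following is a minimal Python sketch, not part of the toolkit: the helper name rename_vars and the sample input are hypothetical, and it assumes every variable is defined as a top-level key at the start of a line in vars.yml.

```python
import re

# Old name -> new name, taken from the upgrade instructions above.
RENAMES = {
    "external_pages_url": "pages_external_url",
    "gitlab_pages_ssl_cert_file": "pages_ssl_cert_file",
    "gitlab_pages_ssl_key_file": "pages_ssl_key_file",
    "gitlab_pages_custom_config_file": "pages_custom_config_file",
    "gitlab_pages_custom_files_paths": "pages_custom_files_paths",
}


def rename_vars(text: str) -> str:
    """Rename top-level YAML keys; values and unrelated keys are left alone."""
    for old, new in RENAMES.items():
        # Match the key only at the start of a line, directly before its colon.
        text = re.sub(rf"^{re.escape(old)}(?=\s*:)", new, text, flags=re.MULTILINE)
    return text


# Hypothetical vars.yml content, for illustration only.
sample = (
    "external_pages_url: https://pages.example.com\n"
    "gitlab_pages_ssl_cert_file: /etc/ssl/pages.crt\n"
    "prefix: glh-dev\n"
)
print(rename_vars(sample))
```

Review the resulting diff before committing it; variables referenced inside templates or nested structures are not touched by this sketch.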
1.6.7 - 2024-06-27¶
Added¶
- Added additional parameters for fleeting configuration to autoscaling runner:
  - delete_instances_on_shutdown
  - update_interval
  - update_interval_when_expecting
- Allow configuring AWS ALB idle timeout with application_load_balancer_idle_timeout.
Fixed¶
- When using the multiple tokens option for dedicated runners, an empty token is set as well.
- Override the AWS WAF rule that restricts URL size when the /-/kroki endpoint is targeted.
- Names for CloudWatch metrics for WAF overrides were wrong/duplicated.
1.6.6 - 2024-06-24¶
Added¶
- Add setting for vm.max_map_count to autoscaling runner userdata template. This allows Elasticsearch to run in jobs.
Fixed¶
- Fixed some lingering references pointing the registry mirror to port 5000.
- Auto scaling group template updates were not set as the new default template to use.
1.6.5 - 2024-06-20¶
Added¶
- Allow specifying throughput on EBS volumes via the instance module.
Changed¶
- Set some recommended additional settings on the Auto Scaling Group used for AutoScaling Runners.
1.6.4 - 2024-06-13¶
Added¶
- Logging of all requests blocked by AWS WAF to a CloudWatch log stream. Retention can be set via: waf_log_retention.
Changed¶
- GitLab Runner Fleeting Plugin for AWS updated to latest upstream: version 0.5.0.
Fixed¶
- More required permissions were added to the OIDC role policy files.
1.6.3 - 2024-06-04¶
Fixed¶
- Loki and Promtail versions were not pinned, and thus were not upgraded either.
- GLH cluster runners can now access the health check endpoints.
1.6.2 - 2024-05-30¶
Fixed¶
- AWS Permission for ssm:StartSession.
1.6.1 - 2024-05-30¶
Added¶
- Feature flag for OIDC integration.
Fixed¶
- Added missing AWS permissions.
- Health check for GitLab Pages not working when custom domains are enabled, because the protocol was set incorrectly.
- Loki, Registry Mirror, Monitor and Runner nodes were not explicitly given access to the S3 KMS key.
1.6.0 - 2024-05-21¶
Added¶
- Added S3 bucket creation, so SSM will work when enabled.
- Amazon SSM agent is now installed by Ansible as well. Can be disabled by setting install_amazon_ssm_agent: false.
- Added a maintenance job to run apt clean on the servers daily.
- Terraform variable runner_cache_object_retention_period to control how long objects stay in the shared runner cache. Defaults to 120.
- OIDC configuration and IAM permissions for automated deployments.
- Terraform variable network_load_balancer_security_group_ids to add additional security groups to the primary NLB.
- Terraform variable default_security_group_ids to add additional security groups to all EC2 nodes.
- Terraform variable gitlab_rails_security_group_ids to add additional security groups to the GitLab Rails nodes.
- Terraform variable sidekiq_security_group_ids to add additional security groups to the Sidekiq nodes.
- Simple playbook create_admin_user to create/update admin user in GitLab.
- Docker image for automated deployment in solutions.
- Allow installing files into /etc/gitlab/trusted-certs by listing them in trusted_certs from within solutions.
- vpc_cidr_block is now set by default in ansible/inventory/terraform_vars.yml.
- GitLab secrets and SSH host keys are stored in AWS Secrets Manager.
- get_license_info playbook to print EE license information from the Rails console.
- Solution documentation is now stored in GET's documentation as well.
- create_access_token playbook to create a PAT for a given user in the database.
Changed¶
- Added user 'dennis' with matching key to the common_vars role
- Removed EBS snapshot configuration for Gitaly volumes.
- Terraform output variable s3_bucket_arns used to contain nested lists. These are now flattened.
- Limit prefix length to 14 characters in the name of the target group for Prometheus.
- Limit prefix length to 12 characters in the name of the target group for Registry Mirror.
- It's now possible to run the transfer_secrets playbook before initial provisioning is done.
- Moved listen port on internal NLB for registry mirror to port 443.
- auto_start_stop lambda function was rewritten to allow usage of the code on localhost as well.
- RDS backups are now stored in AWS Backup as well.
- Moved the following Ansible playbooks to the tools namespace:
  - create_access_token
  - create_admin_user
  - get_license_info
  - grow_filesystems
  - post_data_migration
  - pre_data_migration
  - remove_swap
  - render_ssh_config
  - transfer_secrets
Fixed¶
- Registry Mirror role not working when using docker_proxy_host.
Removed¶
- Removed references to AWS secrets ini file.
Upgrade instructions¶
- Run at least one successful Terraform run before running Ansible, because terraform_vars.yml was changed.
- Set rds_postgres_backup_retention_period to 30 in your solution if you want to keep current RDS backups.
1.5.1 - 2024-04-09¶
Added¶
- Allow disabling WAF max body size limit by setting the value of waf_body_size_restriction to 0.
1.5.0 - 2024-04-09¶
Added¶
- Support installing debs via a proxy for Debian, GitLab, Grafana and Docker.io.
- Support pulling Docker image via a proxy for Monitor, Kroki, Registry Mirror.
- Support disabling the :cleanup Rake tasks with maintenance_rails_cleanup_enabled for usage during migrations.
- Variable grafana_install_omnibus_dashboards to prevent installing dashboards hosted on the internet.
- Variable sns_endpoint for overriding the AWS SNS API URL in order to support VPC endpoints for SNS in solutions.
- Output variable prometheus_alertmanager_topic_arn for getting the full ARN for the auto-created SNS topic.
- New Ansible role common_vars. Currently contains maintenance SSH related variables.
- Ansible generates user_data.sh for installing and configuring SSH access in the Terraform folder, which is used when creating new EC2 instances by Terraform. This is done in the render_tfvars role/playbook.
- Re-implemented logic from glh-admin-access package in Ansible under common/tasks/ssh_access.yml.
  - Creates system users from list maintenance_users.
  - Grants sudo permissions to all users in list maintenance_users.
  - Installs SSH authorized_keys for each user from ssh_keys_<user> variables.
  - Configures SSHD with our default options and to listen on port maintenance_ssh_port.
- Terraform output variable s3_bucket_arns which can be used in solutions to get a list of all S3 buckets in use.
- Support for having Gitaly nodes create backups directly to a (new) S3 bucket.
- Terraform variable noncurrent_version_retention_period to set how long noncurrent objects are kept. Default: 5.
Changed¶
- Remove geerlingguy.node_exporter and install Debian's version instead.
- Remove docker-compose from geerlingguy.docker on Monitor node, replaced with docker compose via Apt install.
- Move sshd-git logic and files from common role to gitlab_rails role.
- GitLab Rails backups are now created with REPOSITORIES_SERVER_SIDE=true and *_CONCURRENCY=6.
- Renamed Terraform variable backup_retention_period_s3 to backup_retention_period.
- Add the <prefix>-backups S3 bucket to AWS Backup plan.
Fixed¶
- Harmonized and changed usage of geerlingguy.docker to minimize issues when using Apt proxy.
- sshd-git is no longer installed and configured on all nodes in a cluster. Leftovers are auto-removed.
- Registry Mirror value for $KANIKO_MIRROR_ARGS on GitLab Runners used to contain 'https://' but this is not allowed.
Removed¶
- Removed support for Ansible variable glh_apt_repo_url, and the related installation code for that Apt repository.
- Removed code for installing glh-admin-access package, both in Ansible and Terraform.
- Removed support for glh_apt_repo_url in Terraform aws/instance module.
- Removed support for migrating from the previous Ansible based admin access to the glh-admin-access package.
Upgrade instructions¶
- If your solution has backup_retention_period_s3 defined, rename that to backup_retention_period.
- Note: If you need to absolutely guarantee that current data is kept, define noncurrent_version_retention_period as well, and set it to the same value as backup_retention_period. You can remove/revert this after a few days.
1.4.5 - 2024-02-29¶
Fixed¶
- Fixed UptimeRobot IP ranges.
1.4.4 - 2024-02-27¶
Fixed¶
- Added missing IAM policy to store logs to CloudWatch.
1.4.3 - 2024-02-22¶
Added¶
- Allowed UptimeRobot access to health check endpoint.
Fixed¶
- Fixed node exporter on Bastion nodes.
1.4.2 - 2024-02-20¶
Added¶
- Documentation for migrating a customer's existing GitLab instance to GET.
- Helper playbooks for data-migration related tasks.
- Helper playbook grow_filesystems to grow filesystems on enlarged disks to their new size.
- Check to ensure the Ansible inventory actually contains hosts (prevents expired AWS credential errors).
- Option to store EC2 system logs in AWS CloudWatch.
Changed¶
- Clone omnibus-dashboards repository with depth: 1 to speed up the process.
1.4.1 - 2024-02-05¶
Changed¶
- Zero downtime update no longer includes terminated nodes.
- Improved pending migrations check in zero downtime update.
1.4.0 - 2024-02-01¶
Added¶
- Add override for the NLB subnets.
- Add option to switch the load balancers to internal.
- Support for deploying and managing AWS Global Accelerators.
- Added preserve host header to ALB.
- Prometheus Alertmanager is now installed on the monitor node, and sends its messages to AWS SNS.
- AWS SNS topic for alerts, that is configured for our shared infrastructure (see glh/ha/aws-infrastructure).
- Added s3 bucket for CI Secure Files.
Changed¶
- AWS SSM update is now run weekly at 03:00 instead of during the day.
- Loki server configuration optimized to allow for larger queries.
Fixed¶
- Auto start/stop cost saving lambda was not allowed to start customer-managed KMS encrypted EC2 machines.
Removed¶
- Removed ansible DNS configuration.
- Removed entire consul support.
Upgrade instructions¶
- Projects must be updated to 1.3.x before upgrading to this version due to moved Terraform resources.
- Remove the groups:consul entry from ansible/inventory/aws_ec2.yml.
1.3.3 - 2024-01-30¶
Fixed¶
- Explicitly disable the KAS service from listening for SSL traffic directly.
1.3.2 - 2024-01-29¶
Fixed¶
- Increased timeout on Grafana dashboard clone to prevent timeout errors.
- Allow outbound ICMP requests.
Removed¶
- Removed glh_container_registry_enable and glh_container_registry_external_url.
Upgrade instructions¶
- Remove glh_container_registry_enable and glh_container_registry_external_url, and make sure the non-glh prefixed vars are set.
1.3.1 - 2023-12-21¶
Fixed¶
- Added missing egress rule to allow ALB to connect to Kroki nodes.
- Recognise all 200-499 status codes as success for pages nodes, because force-auth returns a 3xx code instead of 200.
- Security group definition for GitLab pages is broken when custom domains are disabled.
- Handling of version requirements for gitlab-runner package.
- Package of gitlab-runner cannot be updated by Ansible due to package hold.
- Add a default value of false for pages_enable_custom_domains in Ansible to prevent over-reliance on Terraform.
1.3.0 - 2023-11-30¶
Added¶
- It's now possible to add solution specific Terraform results to Ansible's terraform_vars.yml file.
- Allow users to disable management of the 'get-terraform-state' S3 bucket: manage_get_terraform_state_bucket = false.
- Support for autoscaling GitLab Runners based on AWS ASGs and the new Fleeting plugins.
- The Terraform variable alternative_fleet_instance_types can now be used on autoscaling runner hive definitions to configure alternative instance types to mitigate potential capacity issues on AWS.
- The Terraform variables fleet_additional_tags and autoscaling_fleet_default_additional_tags can now be used to add additional AWS tags to autoscaling runner resources and subresources.
- Terraform variables autoscaling_fleet_default_disk_device_name and fleet_disk_device_name can be used to override the expected root disk name, which is required when using alternative AMIs with a different root device name.
- AWS auto scaling groups and subresources for autoscaling runners get the created-by:runner-autoscale tag assigned.
- A VPC Endpoint Gateway and security group rule to route S3 traffic through AWS's internal network to reduce costs.
- Added support for SSM instead of SSH.
- Added docs to restrict AWS permissions.
- All instances are added to AWS Fleet Manager.
- Garbage collection runs on primary rails node.
- Basic support for enabling custom domains for GitLab Pages. Read our docs to learn about the current limitations.
Changed¶
- Renamed ansible/terraform_output.yml to ansible/terraform_vars.yml.
- Allow nodes with the gitlab_s3_policy to list bucket contents as well, so they can perform cleanup tasks.
- Access control for GitLab pages is now always enabled; users can disable auth on a per-project basis.
- Bastion nodes are now placed in the private subnet.
- Separated internal and external certificates.
- Moved all security group logic to a module to prevent duplication. No changes should occur because of this.
- Rename registry_mirror security group to registry-mirror, because the other security groups follow the same naming.
Fixed¶
- Using an external domain name no longer triggers AWS ACM validation errors.
- Default disk device name for autoscaling runner fleet nodes now works as expected with our default runner AMI id.
- Disable EC2MetaDataSSRF_QUERYARGUMENTS and GenericRFI_QUERYARGUMENTS WAF rules on /oauth/authorize endpoint.
Upgrade instructions¶
- Before running terraform apply, remove the ansible/terraform_vars.yml file and update your .gitignore file (ansible/terraform_output.yml => ansible/terraform_vars.yml).
- Ensure you run Ansible on your primary Rails node after updating to configure GitLab Pages OAuth in the GitLab DB.
1.2.5 - 2023-11-30¶
Added¶
- Logic to copy custom files to gitlab_pages components in Ansible.
1.2.4 - 2023-11-17¶
Changed¶
- Separated internal and external certificates.
1.2.3 - 2023-11-02¶
Added¶
- The Ansible glh_domain variable which points to {{ prefix }}.glhc.nl by default.
- The Ansible external_kas_url variable which points to {{ external_url_sanitised | replace('https', 'wss') }}/-/kubernetes-agent/ by default.
- The Ansible gitlab_kas_enabled variable which defaults to true.
- Ansible var glh_container_registry_enable for solutions to control value of container_registry_enable.
- Ansible var glh_container_registry_external_url for solutions to control value of container_registry_external_url.
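To illustrate the default for external_kas_url: the Jinja replace filter behaves like a plain string replacement, so the expression can be mimicked in Python. This is a sketch only. The function name and hostname are hypothetical, and it assumes external_url_sanitised holds the external URL without a trailing slash.

```python
def default_kas_url(external_url_sanitised: str) -> str:
    """Mimic {{ external_url_sanitised | replace('https', 'wss') }}/-/kubernetes-agent/."""
    # Like the Jinja filter, str.replace swaps every occurrence of "https".
    return external_url_sanitised.replace("https", "wss") + "/-/kubernetes-agent/"


# Hypothetical external URL, for illustration only.
print(default_kas_url("https://gitlab.example.com"))
# prints: wss://gitlab.example.com/-/kubernetes-agent/
```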
Fixed¶
- The registry is now correctly enabled by default and reachable on https://registry.{{ glh_domain }}.
- container_registry_enable and container_registry_external_url are now overridden in the entire dependency chain.
  - Use the glh_ counterparts to control their values in solutions.
- GitLab KAS is now correctly configured.
- Pin the version of node_exporter to prevent rate-limit issues when looking up the latest version on GitHub.
1.2.2 - 2023-10-18¶
Added¶
- Playbook for configuring gitlab_nfs nodes, since they are no longer configured via the common playbook.
Fixed¶
- Ensure common role is never invoked directly to prevent omnibus installation on some roles.
- The backups bucket is not actually excluded where needed and ends up in a versioning and backup policy.
- Add autoscaling_runner config for Prometheus, this was missing previously.
Removed¶
- The common playbook has been removed; invoke the all or role-specific playbook instead.
Changed¶
- Moved logic to uninstall Omnibus from Monitor role to common role, and invoke it when required.
1.2.1 - 2023-10-12¶
Fixed¶
- Options regarding EC2 instance root devices were not applied properly.
- Fresh cluster cannot be provisioned because of chicken-and-egg S3 problems between glh_nlb.tf and glh_dns.tf.
- Fresh cluster cannot be provisioned because of chicken-and-egg S3 problems in glh_backup.tf.
1.2.0 - 2023-10-05¶
Added¶
- Support for dedicated single-instance GitLab Runners with the Docker executor.
- The primary_domain_name, primary_registry_domain_name & primary_pages_domain_name domains are now resolved in private DNS zones.
- Missing security group references.
Fixed¶
- Allow sidekiq access to OpenSearch.
- Monitor nodes are now included in zero downtime update.
Changed¶
- Moved Promtail role and tasks to common role to ease zero downtime updates.
1.1.0 - 2023-09-18¶
Added¶
- Loki role for centralized logging.
- Promtail role for sending logs to Loki.
- Separate Grafana role.
- Thanos install on the monitor role via Docker.
- NLB/ALB configuration for accessing Grafana, Prometheus and Thanos.
- Pre-installed CloudWatch datasource for Grafana.
- Pre-made Grafana dashboards have been copied into our version of the stack from upstream version 2.8.5.
- Ansible playbook to generate local SSH configuration for easy SSH proxying.
- Validation to Ansible pre_configure playbook.
Removed¶
- Monitor role no longer has Grafana installed, use the new Grafana role instead.
- Grafana access via /-/grafana on the primary GitLab hostname.
- Monitor role no longer has gitlab-omnibus installed, and it is automatically removed on the next Ansible run. Prometheus is run via Docker instead.
Changed¶
- Running more than 1 monitor node is temporarily not supported while we figure out how to properly automate a HA setup.
- Prometheus's metrics are now stored in S3 instead of on-disk.
- The instance label on Prometheus metrics is now set to hostname:scrape_port. For example: glh-dev-gitlab-rails-1:8080.
- The hostname label on Prometheus metrics is set to the ansible_hostname by the Prometheus server, even if the metrics already have this label set. Compared to the example above, this will be set to: glh-dev-gitlab-rails-1.
- Instances are now rebooted before reconfiguration in zero downtime update.
- All security.security_group_<GROUP> outputs have been changed to security.security_group_<GROUP>_id and now return the ID instead of the complete resource.
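For the example values above, the hostname label equals the instance label without its scrape port. The Python sketch below only illustrates that relationship. The helper name is hypothetical, and the actual labels are set by the Prometheus server configuration, which is not shown here.

```python
def hostname_from_instance(instance: str) -> str:
    """Strip the scrape port from an instance label of the form hostname:scrape_port."""
    hostname, sep, _port = instance.rpartition(":")
    # Without a colon there is no port to strip, so return the input unchanged.
    return hostname if sep else instance


print(hostname_from_instance("glh-dev-gitlab-rails-1:8080"))
# prints: glh-dev-gitlab-rails-1
```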
Upgrade instructions¶
- Set monitor_node_count to 1 when it is set to more than 1 currently.
- Manually export data from your current Grafana instance when you need to keep it, this includes users and dashboards.
- Silence monitoring for your cluster when you are scraping your cluster with federation, all metrics will be reset which may trigger a lot of false-positive alerts.
- If you encounter errors with running Ansible on existing monitor nodes, manually run gitlab-ctl stop beforehand.
- If you reference a security.security_group_<GROUP>.id in your solution, replace it with security.security_group_<GROUP>_id.
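The security group reference change in these upgrade instructions is a purely textual rewrite, so it can be sketched with a regular expression. This Python snippet is an illustration, not a supported tool: the module name module.gitlab and the group name monitor are hypothetical, and it assumes references appear literally in your Terraform files.

```python
import re

# security.security_group_<GROUP>.id  ->  security.security_group_<GROUP>_id
PATTERN = re.compile(r"\bsecurity\.security_group_(\w+)\.id\b")


def rewrite_refs(tf_source: str) -> str:
    """Rewrite old-style security group references to the new *_id outputs."""
    return PATTERN.sub(r"security.security_group_\1_id", tf_source)


# Hypothetical line from a solution, for illustration only.
old_line = "security_groups = [module.gitlab.security.security_group_monitor.id]"
print(rewrite_refs(old_line))
```

Running it on an already migrated file leaves the references unchanged, so it is safe to apply more than once.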
1.0.0 - 2023-09-12¶
Added¶
- Initial 1.0.0 release, functionally the same as 0.7.0.
0.7.0 - 2023-08-29¶
Added¶
- Terraform variable reverse_az_ordering - flips the AZ choice when creating EC2 machines.
- Ansible variable aws_nfs_stop_after_run - can be used to override if NFS servers are stopped or not.
- Optional schedule to auto start/stop instances for cost saving.
- Option to filter ingress traffic.
Fixed¶
- Fixed missing dependencies in backup causing errors during setup of new cluster.
- NTP package is now installed by default so Ansible runs don't fail on newly created machines.
Removed¶
- The security_group_common_egress_https_cidr_blocks variable; it is replaced by http_allowed_egress_cidr_blocks.
- The default security group no longer has ingress and egress rules.
Changed¶
- Renamed the following egress filter variables:
  - security_group_common_egress_dns_cidr_blocks => dns_allowed_egress_cidr_blocks
  - security_group_common_egress_http_cidr_blocks => http_allowed_egress_cidr_blocks
  - security_group_common_egress_ntp_cidr_blocks => ntp_allowed_egress_cidr_blocks
- EC2 nodes that aren't explicitly added to a security group no longer have network access.
0.6.0 - 2023-08-22¶
Added¶
- DNS nameservers can now be configured using the dns_nameservers Ansible variable.
- NTP servers can now be configured using the ntp_servers Ansible variable.
Changed¶
- Updated WAF rules for GitLab 15.0 GraphQL API calls.
0.5.1 - 2023-08-18¶
Added¶
- It's now possible to SSH to nodes using their hostname (e.g. ssh <prefix>-gitlab-rails-1) from a Bastion node.
Changed¶
- All nodes now have their hostname set correctly.
- All nodes now reboot during a zero downtime update.
Removed¶
- Swap is no longer added on new nodes.
Upgrade instructions¶
- To propagate the hostname changes, each node needs to be rebooted.
- To remove swap from existing nodes, run the remove_swap playbook.
0.5.0 - 2023-08-18¶
Added¶
- Terraform module variables for using existing ACM certificates:
  - certificate_arn
  - pages_certificate_arn
  - registry_certificate_arn
- Support for Kroki diagram servers.
- Added security groups to Network Load Balancers.
- Ansible variables are now generated by this toolkit.
- The EC2 common security group now filters egress traffic with destinations outside the GitLab VPC.
Changed¶
- Upgraded to GET 2.8.5.
- All non-primary domain traffic is now redirected to the primary domains.
  - For example: if you set domain_name to git.example.com, git.cluster.glhc.nl is redirected there.
  - This does not apply to pages, because AWS ALB cannot perform substitutions on hostnames.
- ALB rules were split off into their own file: glh_alb_rules.tf.
- ALB rule ordering was changed so solutions can more easily inject custom rules.
  - In case Terraform fails to apply these with error PriorityInUse, just run Terraform again to fix this.
- Restricted access to health check & metric endpoints.
- A number of resources and variables have had their names changed in order to comply with code standards.
Fixed¶
- Fixed invalid s3 lifecycle configuration.
- Redirect pages traffic to the pages_domain_name in addition to *..
- Fixed copy of SSH config.
- Fixed creation of SSL certificate when no domain_name is set.
- Fixed access from monitor node to pages metrics.
- Grafana and Kroki now only accept traffic on the primary domain.
Upgrade instructions¶
- Existing Network Load Balancers have to be deleted manually to add the security groups. Ansible needs to be run afterward.
- Remove terraform/ansible_vars.tf from your project.
- Due to the resource name changes, you might see a large number of Terraform changes. This is expected behaviour.
0.4.1 - 2023-08-08¶
Fixed¶
- Backup S3 bucket now has one lifecycle configuration.
- user_data is now an ignored change on AWS instances.
0.4.0 - 2023-08-08¶
Added¶
- Ansible variable glh_apt_repo_url to configure which GitLabHost Apt repository is installed.
- Added jumphost IP to Bastion SSH allow list.
Changed¶
- Set "block public access" on S3 buckets to true by default for increased security.
- Provisioning SSH servers is now done using a Debian package that installs via user-data set by Terraform.
Fixed¶
- Moved consul server config from bastion to consul role to prevent errors during setup.
Removed¶
- Ansible-based user setup, sudo configuration, SSH maintenance server configuration.
Upgrade instructions¶
- Remove the following Ansible variables (including examples):
  - ansible_user
  - ansible_ssh_private_key_file
- Remove the following Terraform module parameters (including examples):
  - ssh_allow_port_22
0.3.1 - 2023-08-07¶
Added¶
- Added load balancer stickiness based on GitLab session cookie for better zero downtime update experience.
0.3.0 - 2023-07-28¶
Added¶
- <PREFIX>.glhc.nl DNS record for easier management.
- Added lifecycle policy to all buckets to delete noncurrent versions as soon as possible.
Changed¶
- Changed terraform directory structure, "terraform/gitlab_aws_cluster" is now "terraform/aws/cluster".
- The registry.<PREFIX>.glhc.nl DNS record is now always created.
Fixed¶
- Shared SSH host keys between Bastion nodes before writing custom SSH config.
- Fixed consul in initial setup.
- The Zero Downtime Update playbook is now working properly.
Upgrade instructions¶
- In your environment.tf, replace terraform/gitlab_aws_cluster with terraform/aws/cluster in the module source.
0.2.1 - 2023-06-29¶
Fixed¶
- Generating a certificate now waits for validation before assigning it to the load balancer.
0.2.0 - 2023-06-27¶
Added¶
- GitLab Pages is now supported.
Changed¶
- Path to shared secrets is updated to reflect changes in the template.
0.1.1 - 2023-06-16¶
Added¶
- Consul is now installed on bastion nodes and can be run with consul members.
- Variable to set WAF body size restriction.
- CI linting.
- Wrote a proper README.
Upgrade instructions¶
- Update ansible/inventory/aws_ec2.yml, add the following under keyed_groups:
    groups:
      # Register primary bastion node as consul servers
      consul: tags.gitlab_node_level == 'bastion-primary'
0.1.0 - 2023-06-05¶
Added¶
- Initial developmental release.