6 changes: 3 additions & 3 deletions .github/pull_request_template.md
@@ -8,6 +8,6 @@ When creating a PR for Refractr, confirm you've done the following steps for a s
- [ ] Have you checked if there are any TLS cert concerns - e.g. if the domain being redirected already exists, and it is being changed to point at refractr, is a temporary TLS 'outage' while waiting for certification via HTTP challenge okay? If not, add a note to the JIRA ticket.
- [ ] If desired, have you generated the nginx config manually to confirm updates work as expected?

-After PR merge, next steps include:
-- [ ] A merge to the `main` branch will automatically deploy refractr's stage environment -- deploying the prod environment requires a GitHub release to be created.
-- [ ] Once deployed, refractr's certmap must be updated and DNS entries must be changed -- SRE can help with this. Please pull someone in on the JIRA ticket or ask for help in #sre on Slack.
+After PR merge:
+- [ ] A merge to `main` automatically deploys both stage and prod.
+- [ ] TLS certificates are created automatically by [Spacelift](https://mozilla.app.spacelift.io/stack/refractr-prod). DNS changes may still require SRE or IT to make changes in other systems (e.g. Markmonitor, Route53) -- ask in #mozcloud-support on Slack or in the JIRA ticket.
3 changes: 3 additions & 0 deletions .github/workflows/build_and_push_to_gar.yml
@@ -5,6 +5,9 @@ on:
branches:
- main

+# tag-based deployments are deprecated but can be reconfigured via
+# https://github.com/mozilla/global-platform-admin/blob/main/tenants/refractr.yaml
+# if desired
tags:
- v[0-9]+.[0-9]+.[0-9]+

8 changes: 4 additions & 4 deletions docs/SRE_INFO.md
@@ -47,15 +47,15 @@ When an update to e.g. `prod-refractr.yaml` was made, you can validate the syste

#### deployment

-We automatically deploy a stage & prod env for refractr. Stage is deployed whenever a merge to `main` is done, prod is deployed when a GitHub release is created (or a tag has been pushed), that matches the pattern `v\d+\.\d+\.\d+`. In both cases, GitHub actions will build the docker image, Argo CD will ship it to GKE.
+We automatically deploy a stage & prod env for refractr. Both environments use the same deployment: when a PR merges to `main`, CI builds a single image (tagged with `git describe` output like `v0.0.225-3-gabcdef1`) and Argo CD deploys it to both stage and prod.

#### update certificates

-Refractr's Loadbalancer is build from Gateway API resources by GCP's Gateway API Controller. The system handles ~180 redirects and forces TLS for every refract, which results in ~360 SSL certificates attached to the Loadbalancer. To be able to make use of such a high number of SSL certificates at once, we're using GCP's certificate manager API to map hostnames to certificates. Certificate manager API defines a certmap (that is attached to the Gateway API Loadbalancer via an annotation), which in turn defines entries that map hostnames to certificates.
+Refractr's Loadbalancer is built from Gateway API resources by GCP's Gateway API Controller. The system handles ~180 redirects and forces TLS for every refract, which results in ~360 SSL certificates attached to the Loadbalancer. To be able to make use of such a high number of SSL certificates at once, we're using GCP's certificate manager API to map hostnames to certificates. Certificate manager API defines a certmap (that is attached to the Gateway API Loadbalancer via an annotation), which in turn defines entries that map hostnames to certificates.

-Certificates and certificate-map-entries are managed with terraform, because no method for building them alongside the Gateway API resources directly from GKE was available at the time of migrating to mozcloud.
+Certificates and certificate-map-entries are managed with terraform, using a [terraform-module](https://github.com/mozilla/terraform-modules/tree/main/google_certificate_manager_certificate_map) that reads refract definitions from `prod-refractr.yml` directly via an HTTP data source.

-To simplify this, we're using a [terraform-module](https://github.com/mozilla/terraform-modules/tree/main/google_certificate_manager_certificate_map), which reads all refract definitions in from a single yaml document. This document can be generated with a `doit` task, `$ poetry run doit certificate_manager_input`. Once generated and put in place, refractr's certmap can be updated with a PR in to the infra repo.
+Certificate updates are automated via Spacelift. When a new prod image is deployed, argocd-image-updater commits the new image tag to `webservices-infra`, which triggers a Spacelift tracked run on the `refractr-prod` stack. A plan policy auto-approves changes to certificate manager resources. Additionally, hourly drift detection catches any certificate changes that may have been missed and auto-reconciles them.

#### DNS

4 changes: 2 additions & 2 deletions docs/refractr-architecture.md
@@ -29,13 +29,13 @@ The refractr.yml spec allows for specifying tests in the form of given-source to
### minimal changes
Due to the nature of redirects and rewrites it is common to add new domains or remove old ones. This means the nginx config must be told which domains are valid, and that list must be updated when deploying a new refractr Docker image to GKE. When a new version of the refractr image is pushed to prod, redirects are already live.

-In a second step, certificates must be created and linked to refractr's Loadbalancer -- this step currently requires a second PR to be opened after deployment. All certificates are managed with GCP's certificate manager api and attached to the Loadbalancer by a certmap, we manage all of those resources via terraform in refractr's infrastructure project.
+Certificates are updated automatically. The terraform configuration reads `prod-refractr.yml` directly via an HTTP data source, so certificate changes are detected by Spacelift drift detection and auto-reconciled without a separate PR.

## refractr traffic flow
Traffic flow to refractr starts with DNS. A domain that should be handled by the system must be pointed to its Loadbalancer, usually by a CNAME or, in some cases, by A / AAAA records. Once a request reaches the Loadbalancer, we force HTTPS, then forward to the actual application pods, which then handle individual redirects as configured.

## Continuous Integration (CI)
-CI is done with GitHub Actions. Tests run on every push to any branch in the repo. However, only pushes to the **main** branch and **tags** (matching the `/v[0-9]+.[0-9]+.[0-9]+/`) will cause an update of refractr's Docker image. In addition to tests, Pull Requests (PRs) require code reviews before allowing the change to be to the **main** branch.
+CI is done with GitHub Actions. Tests run on every push to any branch in the repo. Pushes to the **main** branch build a single image (tagged with `git describe` output) that is deployed to both stage and prod by Argo CD. In addition to tests, Pull Requests (PRs) require code reviews before allowing the change to be merged to the **main** branch.

## The Handoff
The handoff point between CI and CD is the Docker Repository. In this case we decided to use the Cloud Provider based Docker Repository. For GCP that is Google Artifact Registry (GAR). Images are named refractr and have the output of git describe for the image tag. Image tags are watched and deployed by Argo CD.
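
Since image tags are now `git describe` output rather than bare release tags, it can help to see how such a tag breaks down. The following is a minimal sketch, not part of this PR: the function name `parse_image_tag` and the returned dictionary layout are hypothetical, and it assumes tags look like `v0.0.225` or `v0.0.225-3-gabcdef1` (base tag, commits since that tag, abbreviated commit hash).

```python
import re

# Assumed shape of `git describe` output:
#   v<major>.<minor>.<patch>[-<commits since tag>-g<abbreviated sha>]
DESCRIBE_RE = re.compile(
    r"^v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
    r"(?:-(?P<distance>\d+)-g(?P<sha>[0-9a-f]+))?$"
)


def parse_image_tag(tag):
    """Split a git-describe style image tag into its parts (hypothetical helper)."""
    match = DESCRIBE_RE.match(tag)
    if match is None:
        raise ValueError(f"not a git-describe style tag: {tag!r}")
    return {
        # Version of the most recent reachable release tag.
        "version": tuple(int(match.group(k)) for k in ("major", "minor", "patch")),
        # Number of commits on top of that tag; 0 when the tag itself was built.
        "distance": int(match.group("distance") or 0),
        # Abbreviated commit hash, or None when the tag itself was built.
        "sha": match.group("sha"),
    }


if __name__ == "__main__":
    print(parse_image_tag("v0.0.225-3-gabcdef1"))
```

A tag with a non-zero `distance` identifies a build from `main` that sits a few commits past the last release tag, which is the case the new single-image deployment produces on most merges.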