Add Transit Gateway egress mode#115
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a Transit Gateway (TGW) egress mode to the Terraform modules/vpc and exposes it through modules/quilt, allowing private-subnet default routes to egress via a customer-provided TGW instead of Quilt-managed NAT gateways / IPv6 egress-only IGW.
Changes:
- Introduces
transit_gateway_idinput (quilt + vpc modules) to toggle TGW egress behavior. - Disables NAT gateways and IPv6 egress-only IGW when
transit_gateway_idis set, creates a TGW VPC attachment, and adds IPv4/IPv6 default routes in private route tables to the TGW. - Documents the new option in
examples/main.tf.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| modules/vpc/variables.tf | Adds transit_gateway_id input to the VPC module. |
| modules/vpc/main.tf | Implements TGW attachment + private default routes; disables NAT/egress-only IGW when TGW is set. |
| modules/quilt/variables.tf | Exposes transit_gateway_id on the public quilt module. |
| modules/quilt/main.tf | Wires transit_gateway_id through to the VPC submodule. |
| examples/main.tf | Adds example/commented usage for transit_gateway_id. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Address PR review feedback on the TGW egress mode: - Reject empty/malformed transit_gateway_id via a format validation on both the vpc and quilt module variables. - Fail fast when transit_gateway_id is set with create_new_vpc == false (existing-VPC mode, where it would otherwise be silently ignored) by adding it to the vpc module's existing-network configuration_error precondition. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
These pin the intended behavior of the TGW egress mode and fail against the current validation bug (the `transit_gateway_id == null` check sits in the new-VPC requirement set, so a new VPC + transit_gateway_id wrongly trips configuration_error and the feature is inert): - vpc: new-VPC + transit_gateway_id must plan and create the TGW attachment + IPv4/IPv6 default egress routes; existing-VPC + transit_gateway_id must be rejected. - quilt smoke: new-VPC + transit_gateway_id must plan end-to-end (wrapper now threads transit_gateway_id). The validation fix follows in the next commit. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…g-VPC path
The transit_gateway_id == null check was in new_network_requires, so creating a
new VPC with a transit_gateway_id (the supported egress mode) failed the
new-network requirement set and tripped configuration_error — the feature could
never be applied. It also left existing-VPC + transit_gateway_id silently
accepted rather than rejected.
Move the check to existing_network_requires, which expresses the actual intent
("TGW egress requires create_new_vpc == true"):
- new VPC + transit_gateway_id now plans and creates the attachment + routes;
- existing VPC + transit_gateway_id is rejected during plan;
- existing VPC without a TGW is unaffected (transit_gateway_id == null holds).
Turns the failing tests from the previous commit green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Routing the private subnets' ::/0 at the TGW unconditionally assumed the TGW carries IPv6. If it doesn't, IPv6 traffic is black-holed (worse than no route, which lets IPv4 fall back cleanly) — an assumption the general-purpose module should not bake in. Add transit_gateway_ipv6_egress (bool, default false) on the vpc and quilt modules. When false (default), no ::/0 -> TGW route is created and the attachment advertises ipv6_support = "disable"; the established IPv4 control-plane egress is unaffected. Operators whose TGW does carry IPv6 opt in. Tests cover both: IPv6 off (no v6 route, attachment ipv6_support disabled) and IPv6 on (v6 route present, ipv6_support enabled). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The toggle was `transit_gateway_id != null`, used in `count`. That forbids a transit_gateway_id known only after apply (e.g. a TGW created in the same configuration): count must be known at plan, so a computed id raises "Invalid count argument". This is the idiomatic terraform-aws-modules pattern (create_*/enable_* bool next to the value param, like enable_nat_gateway). Add `enable_transit_gateway` (bool, default false) on the vpc and quilt modules; gate the attachment/route counts and the NAT/EOIGW disable on it. transit_gateway_id is now the value (required when enabled, may be computed), guarded by a precondition on the attachment. Existing-VPC + enabled is still rejected via the configuration_error precondition. Tests updated to set enable_transit_gateway; verified a computed transit_gateway_id (same-config TGW) now plans cleanly. Example updated. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- README: "Transit Gateway egress" section — what it does, how to enable (enable_transit_gateway + transit_gateway_id, computed id allowed), the operator-side TGW prerequisites (RAM share/accept, egress + return routes), CIDR-uniqueness requirement, IPv6 opt-in, and reversibility. - VARIABLES: rows for enable_transit_gateway, transit_gateway_id, transit_gateway_ipv6_egress. - CHANGELOG: [Unreleased] entry for the feature. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
@greptileai please re-review — significant changes since the last pass: the toggle is now an explicit |
The previous wording implied IPv6 "falls back to IPv4" cleanly. Correct it: with transit_gateway_ipv6_egress off there is no IPv6 default route, so an IPv6 connection attempt fails immediately (ENETUNREACH) and the client uses IPv4 with no delay. The slow case is the opposite — enabling it against a TGW that does not carry IPv6 points ::/0 at a black hole, and clients without Happy Eyeballs (e.g. Python requests/urllib3) stall on the connection timeout. So the warning belongs on enabling it, not on the (safe) default-off path. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
sir-sigurd
left a comment
There was a problem hiding this comment.
Follow-up round — this is likely the last. The core mechanism checks out: count stays plan-resolvable, the standalone routes don't collide with the upstream module's routes (it creates neither a NAT default route nor an egress-only-IGW route when both are disabled), S3 stays on the gateway endpoint, the IPv6 route and ipv6_support are gated on the same variable so they can't drift apart, and existing non-TGW deployments see no plan diff (the now-explicit create_egress_only_igw = true matches the prior module default in vpc 6.6.1). The transit_gateway_id precondition fails at plan time for a known null and correctly defers only for a computed ID.
Design: sound on the mechanism — Transit Gateway routing, template safety (count/route-collision), and upgrade-safety for existing deployments all verified. One operational doc gap on the toggle's egress-IP side effect, below. No blockers.
Documentation
P2 — The reversibility note covers NAT/EIP recreation and the egress interruption, but not that the stack's public egress IP changes when the mode is toggled. Destroying the NAT gateways releases their Elastic IPs, and AWS won't hand the same addresses back on reallocation; toggling on routes egress out the customer's NAT instead. Either direction changes the outbound IP, so anything that allowlists Quilt's egress address (a license endpoint, a partner firewall, a SaaS IP allowlist) breaks silently — and it looks unrelated to the change. Worth one explicit sentence.
P3 — examples/main.tf:142 still reads # ... otherwise IPv6 is left on IPv4 fallback. That's the wording the README was already corrected away from (no IPv6 default route → immediate ENETUNREACH → IPv4, rather than a "fallback"). Align the comment with the README mechanism.
Tests
P3 — The added runs cover the happy paths and the existing-VPC rejection, but not the negative paths the module newly guards: the transit_gateway_id = null precondition failure (enable_transit_gateway = true), the transit_gateway_id regex rejection, and the "id set but enable_transit_gateway = false" no-op. These are the guards most likely to silently regress in a future refactor. (The smoke runs asserting only stack_name is fine — the resource-level assertions live in the vpc tests.) Add-only.
Nits
P3 (reshape) — transit_gateway_id's description drifts between modules: the quilt wrapper says "when creating a new VPC" but the vpc module — which actually enforces the constraint — doesn't. If anything the scoping clause belongs on the vpc side.
P3 (reshape) — The gate local.new_network_valid && var.enable_transit_gateway is repeated across all three count expressions. A derived local alongside the existing *_valid flags would give it one source of truth; the IPv6 route would then read as a clear further-narrowing of it.
P3 (reshape) — A one-line comment on why the attachment lands in intra_subnets (deliberate — it's only ENI placement; the default routes go in the private route tables) would keep a future maintainer from "fixing" it into the private subnets.
P3 — The ^tgw-[0-9a-f]+$ validation is unanchored on length, so tgw-0 passes; real IDs are tgw- + 17 hex. It catches gross typos but not a truncated ID — optional tightening.
Reviewed against 91645b6
- README: note that toggling the mode changes the stack's public egress IP (NAT EIPs released on disable; egress via the TGW's NAT when enabled), so anything allowlisting Quilt's egress address must be updated or it breaks silently. - examples/main.tf: align the IPv6 comment with the corrected mechanism (off = no IPv6 default route, clients use IPv4) instead of "IPv4 fallback". - vpc tests: add negative-path runs — enabled-without-id (attachment precondition), malformed id (variable validation), and id-set-without-enable (no-op: no attachment/routes). - vpc main.tf: hoist the repeated `new_network_valid && enable_transit_gateway` count gate into a `transit_gateway_enabled` local; add a comment on why the attachment lands in intra subnets (ENI placement only). - quilt: align transit_gateway_id description with the vpc module (the new-VPC scoping lives on enable_transit_gateway). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
@greptileai please re-review — addressed the follow-up: egress-IP reversibility note, negative-path tests (enabled-without-id, bad-format id, id-without-enable no-op), the |
|
@drernie |
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The applied review suggestions left two long unwrapped lines; re-flow them to match the surrounding wrap convention (wording unchanged). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
sir-sigurd
left a comment
There was a problem hiding this comment.
Carried this over from @drernie. TGW egress mode: toggle design is plan-safe (computed transit_gateway_id works), validation + smoke tests cover positive/negative paths, and the feature was validated end-to-end on a live dev stack (egress confirmed through the TGW). Docs reviewed by @drernie. CI green. Approving.
Cut release 1.8.0 — promotes the Transit Gateway egress mode (#115) from Unreleased into a dated CHANGELOG section.
What is changing
This adds a first-class Transit Gateway egress mode to the Terraform
quilt/vpcmodules. Whenenable_transit_gateway = true(withtransit_gateway_id) while creating a new VPC, Quilt still owns the VPC, subnets, endpoints, and route tables, but it disables NAT gateways and the IPv6 egress-only IGW, creates a TGW VPC attachment in the intra subnets, and points each private route table's IPv4 default route at the TGW.transit_gateway_idmay be a value known only after apply (e.g. a TGW created in the same configuration) — the toggle is the separateenable_transit_gatewaybool, socountstays resolvable at plan time. IPv6 egress through the TGW is opt-in viatransit_gateway_ipv6_egress(default off, since not every TGW carries IPv6); when enabled it points the IPv6 default route at the TGW and enables IPv6 support on the attachment. Set ittrueonly if the TGW actually carries IPv6 egress, since pointing::/0at a TGW that can't route IPv6 black-holes the traffic and stalls clients without Happy Eyeballs (e.g. Python). With it off, the new VPC has no IPv6 default route, so an IPv6 attempt fails immediately and the client uses IPv4 with no delay.Why
A customer needs Quilt-created infrastructure to egress through an existing network path instead of Quilt-created NAT gateways. This keeps the deployment on the supported Terraform path while letting the customer supply the TGW as the egress boundary.
What was considered
transit_gateway_id != nullas the sole toggle was set aside: that gates resourcecount, which must be known at plan time, so atransit_gateway_idcreated in the same configuration (a computed value) would fail with "Invalid count argument." An explicitenable_transit_gatewaybool (the idiomaticenable_*/create_*pattern, mirroringenable_nat_gateway) keeps the count plan-resolvable and lets the id be computed, so the TGW can be provisioned in the same config as the stack.Risks and reversibility
The apply depends on the customer sharing the TGW into the deployment account and configuring TGW route tables/return paths correctly. If those prerequisites are wrong, attachment or egress validation can fail. The change is reversible by setting
enable_transit_gateway = false, which restores the existing NAT gateway and IPv6 egress-only IGW behavior for newly-created VPCs.Open questions
What approval means
Approval means Quilt supports this as the Terraform-owned way to create a new Quilt VPC whose private-subnet default egress routes through a customer-provided TGW.
Validation
Automated (CI, no AWS credentials, no infrastructure)
terraform fmt -check -recursiveandterraform validatefor all modules.terraform testinmodules/vpc— plan-time, mocked-provider tests of the new-vs-existing network input validation, including TGW egress: a new VPC withenable_transit_gateway = trueplans and creates the attachment + the IPv4 default egress route (and the IPv6 route only whentransit_gateway_ipv6_egressis set); existing-VPC +enable_transit_gatewayis rejected.terraform testinmodules/quilt/tests/smoke— plan-time smoke test of the public module against a mocked provider, including the TGW path.fmt/validate.End-to-end (live, on an internal dev stack)
Deployed a full Quilt stack with
enable_transit_gateway = true— the TGW created in the same configuration (computedtransit_gateway_id), NAT gateways disabled — and confirmed egress actually flows through the TGW:CREATE_COMPLETEand the catalog served traffic.transit_gateway_idpath applies cleanly (no "Invalid count argument").Links
Greptile Summary
This PR adds a first-class Transit Gateway egress mode to the
vpcandquiltmodules. Whenenable_transit_gateway = true(only withcreate_new_vpc = true), the module disables NAT gateways and the IPv6 egress-only IGW, creates a TGW VPC attachment in the intra subnets, and routes each private route table's default IPv4 (and optionally IPv6) egress through the TGW.enable_transit_gatewaybool gates all resourcecountexpressions, whiletransit_gateway_idcan be a computed/unknown value — this avoids the "Invalid count argument" planning error that would occur if the count depended on a not-yet-known TGW ID.expect_failurestests inmodules/vpc/tests/validation.tftest.hcl, plus positive-path smoke tests threaded throughmodules/quilt.examples/main.tfare updated consistently, with clear reversibility and egress-IP-change warnings.Confidence Score: 5/5
Safe to merge. The toggle design is architecturally sound, validation coverage is comprehensive, and the end-to-end test confirms the computed-ID path works correctly.
The transit_gateway_enabled local correctly ties all resource counts to new_network_valid && var.enable_transit_gateway — both values are known at plan time, preventing Invalid count argument even when transit_gateway_id is computed. The lifecycle precondition catches the null-ID case, the variable validation rejects malformed IDs, and the existing_network_requires map blocks the unsupported existing-VPC combination. All seven new test cases (positive and negative paths) exercise these invariants, and the PR's end-to-end validation confirms the full flow on a live stack.
No files require special attention.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A([enable_transit_gateway]) -->|true| B{new_network_valid?} A -->|false| NAT[NAT gateways + egress-only IGW] B -->|false| ERR[configuration_error precondition fires] B -->|true| C[transit_gateway_enabled = true] C --> D[aws_ec2_transit_gateway_vpc_attachment.egress count=1] D --> PREC{transit_gateway_id != null?} PREC -->|false| FAIL[Precondition failure] PREC -->|true| E[IPv4 routes: aws_route.private_tgw_ipv4_egress x N private RTs] E --> F{transit_gateway_ipv6_egress?} F -->|true| G[IPv6 routes: aws_route.private_tgw_ipv6_egress x N private RTs - ipv6_support=enable on attachment] F -->|false| H[No IPv6 default route - ipv6_support=disable on attachment] NAT --> I[S3 gateway endpoint unchanged in both modes] G --> I H --> IReviews (3): Last reviewed commit: "Address review follow-up: egress-IP doc,..." | Re-trigger Greptile