docs(self-hosted): add Terraform guides for AWS, GCP, and Azure #4187

Open
devfreddy-langchain wants to merge 10 commits into main from feat/self-hosted-terraform-foundation

@devfreddy-langchain

Summary

  • Adds Terraform deployment guides for LangSmith self-hosted across AWS, GCP, Azure, and OpenShift, mirroring the structure of the existing Helm setup guide.
  • Each provider gets an overview, quickstart, architecture reference, prerequisites/variables, teardown, troubleshooting, and (for Azure) per-pass walkthroughs for the optional add-ons (LangSmith Deployment, Agent Builder, Insights, Polly).
  • OpenShift ships in Preview while the OCP Terraform module matures; AWS, GCP, and Azure are GA.

Why

Self-hosted enterprise customers using the public Terraform modules at github.com/langchain-ai/terraform have so far been pointed at provider READMEs. This PR brings those walkthroughs into the official docs so the in-product Terraform path matches the experience the Helm path has had.

Areas needing careful review

  • Architecture diagrams and per-pass layer descriptions — accuracy of which services each pass provisions
  • Azure variables reference — deepest variable surface area of any provider
  • OpenShift "Planned production path" sections — preview content, calls out steps not yet GA

Test plan

  • make lint_prose passes (verified locally: 0 errors across all 30 terraform docs)
  • make broken-links passes
  • Mintlify preview renders all new pages and navigation entries resolve
  • /langsmith/self-host-terraform-{aws,gcp,azure,ocp}-quickstart URLs match src/docs.json
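The last check in the test plan could be scripted roughly as follows. This is a minimal sketch, not the repository's actual tooling: the JSON excerpt is a hypothetical stand-in for `src/docs.json` (whose real layout is not shown in this PR), and `collect_pages` and `missing_quickstarts` are illustrative names.

```python
import json

# Hypothetical excerpt of a navigation file. The real src/docs.json
# structure is an assumption here and may differ.
SAMPLE_DOCS_JSON = """
{
  "navigation": {
    "groups": [
      {
        "group": "Deploy with Terraform",
        "pages": [
          "langsmith/self-host-terraform-aws-quickstart",
          "langsmith/self-host-terraform-gcp-quickstart",
          {
            "group": "Azure",
            "pages": ["langsmith/self-host-terraform-azure-quickstart"]
          }
        ]
      }
    ]
  }
}
"""

PROVIDERS = ("aws", "gcp", "azure", "ocp")


def collect_pages(node):
    """Yield page slugs: strings that appear directly inside lists.

    Strings stored as dict values (for example group labels) are skipped.
    """
    if isinstance(node, list):
        for item in node:
            if isinstance(item, str):
                yield item
            else:
                yield from collect_pages(item)
    elif isinstance(node, dict):
        for value in node.values():
            if isinstance(value, (list, dict)):
                yield from collect_pages(value)


def missing_quickstarts(docs_json_text, providers=PROVIDERS):
    """Return the providers whose quickstart slug is absent from the nav."""
    pages = set(collect_pages(json.loads(docs_json_text)))
    return [
        provider
        for provider in providers
        if f"langsmith/self-host-terraform-{provider}-quickstart" not in pages
    ]


if __name__ == "__main__":
    print(missing_quickstarts(SAMPLE_DOCS_JSON))
```

Against the real file, the inline sample would be replaced by the contents of `src/docs.json`; an empty result would mean every quickstart URL resolves to a navigation entry.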

AI involvement

Drafted in collaboration with Claude Code (Opus 4.7). Author directed content and accuracy; Claude assisted with prose drafting, Vale lint cleanup passes, and consistency between provider sections.

Wave 0 of migrating Terraform deployment guides from langchain-ai/ps-ops-center
into public docs as a peer to the existing Helm path. Adds an overview page
covering provider choice, prereqs, two-pass model, deployment tiers, and an
SA-package callout, plus a new "Deploy with Terraform" nav group under
LangSmith > Platform setup > Self-hosted, positioned right after the
cloud-provider landing pages so customers see the TF vs Helm choice up front.

Cross-links from each cloud-provider landing page and the existing Helm install
guide point at the new overview. Provider sub-pages and architecture diagrams
land in subsequent waves; image assets are staged now to keep wave PRs small.

Wave 1 of the Terraform docs migration. Adds six AWS-specific pages under
the new "Deploy with Terraform" group:

  - Prerequisites: tools, IAM permissions, AWS auth, license + domain.
  - Quickstart: end-to-end EKS deploy walkthrough (Pass 1 infrastructure,
    Pass 2 application via script-driven or Terraform-managed Helm), plus
    optional Envoy Gateway ingress and private cluster + bastion patterns.
  - Architecture: platform layers, services, IRSA, networking, ingress
    options, TLS, module dependency graph, validated behaviors.
  - Quick reference: make targets, kubectl, AWS CLI, Helm, Terraform.
  - Troubleshooting: 16 known issues with fixes + diagnostic commands.
  - Teardown: terraform destroy path + manual AWS CLI path when state is
    lost, including IAM, VPC, and orphan cleanup.

Source content adapted from github.com/langchain-ai/terraform/modules/aws
(README, ARCHITECTURE, QUICK_REFERENCE, COMMANDS, SERVICES, TROUBLESHOOTING,
TEARDOWN) and from internal ps-ops-center/docs/content/aws/. Module paths
updated from the source docs' stale "terraform/aws/" references to the
current "modules/aws/" layout. Audience reframed from SA-zip delivery to
public self-serve via git clone.
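The path update described above amounts to a prefix rewrite. The sketch below shows one way it might be done; it is an assumption about the approach, not the script used for this PR, and `rewrite_module_paths` is a hypothetical helper.

```python
import re


def rewrite_module_paths(text: str, provider: str) -> str:
    """Replace stale 'terraform/<provider>/' prefixes with 'modules/<provider>/'.

    The lookbehind skips matches that continue a longer path or word
    (for example 'my-terraform/aws/'), so only standalone prefixes change.
    """
    pattern = re.compile(rf"(?<![\w/-])terraform/{re.escape(provider)}/")
    return pattern.sub(f"modules/{provider}/", text)


if __name__ == "__main__":
    stale = "cd terraform/aws/infra && terraform init"
    print(rewrite_module_paths(stale, "aws"))
```

The bare `terraform init` command is left alone because the pattern requires the provider directory to follow the prefix.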

The Wave 0 overview page now links its AWS provider card to the new
quickstart. Azure, GCP, and OpenShift land in subsequent waves.

Wave 2 of the Terraform docs migration. Adds five GCP-specific pages under
the new "Deploy with Terraform" group:

  - Quickstart: end-to-end GKE deploy walkthrough covering all 5 passes
    (infra, base, LangSmith Deployment, Agent Builder, Insights + Polly),
    plus prereqs (tools, APIs, IAM, auth), sizing profiles, and key watchouts.
  - Architecture: platform layers, module descriptions, deployment tiers
    (light vs. production), traffic flow, Workload Identity, Secret Manager
    integration, and the Terraform module graph.
  - Quick reference: make targets, kubectl, gcloud, Helm, Terraform cheat
    sheet plus add-on / sizing toggles.
  - Troubleshooting: 14 known issues (APIs, GKE, Cloud SQL peering,
    Memorystore, cert-manager, GCS, Envoy Gateway IP churn, Workload
    Identity, langsmith-ksa, Helm pending-upgrade, Secret Manager).
  - Teardown: terraform destroy path + manual gcloud path when state is
    lost, including private service connection cleanup and VPC ordering.

Source adapted from github.com/langchain-ai/terraform/modules/gcp (README,
ARCHITECTURE, QUICK_REFERENCE, SERVICES, TROUBLESHOOTING, TEARDOWN) and from
internal ps-ops-center/docs/content/gcp/. Module paths updated from the
source docs' stale "gcp/" / "terraform/gcp/" references to the current
"modules/gcp/" layout. Audience reframed from SA-zip delivery to public
self-serve via git clone. The Wave 0 overview page now links its GCP card
to the new quickstart.

Wave 3 of the Terraform docs migration. Adds eleven Azure-specific pages
under the new "Deploy with Terraform" group:

  - Quickstart: end-to-end AKS deploy walkthrough with prereqs and the
    5-pass model overview.
  - Pass 1: infrastructure provisioning (AKS, VNet, PostgreSQL, Redis,
    Blob, Key Vault, cert-manager, KEDA, ingress).
  - Pass 2: LangSmith Helm install (script-driven and Terraform paths).
  - Pass 3: LangSmith Deployment (host-backend, listener, operator).
  - Pass 4: Agent Builder.
  - Pass 5: Insights + Polly.
  - Architecture: platform layers, services, Workload Identity matrix,
    networking, ingress options, secret flow.
  - Quick reference: make targets, kubectl, az, Terraform, key watchouts.
  - Variables: full input-variable reference.
  - Troubleshooting: 25+ known issues including K8sVersion, vCPU quota,
    Key Vault purge protection, DNS label, istio-addon, Workload Identity,
    AGIC, Istio, encryption key gotchas, OOM, stuck HPAs.
  - Teardown: 3-step procedure, Key Vault soft-delete handling.

Source adapted from github.com/langchain-ai/terraform/modules/azure
(README, ARCHITECTURE, QUICK_REFERENCE, SERVICES, INGRESS_CONTROLLERS,
TROUBLESHOOTING, TEARDOWN) and from internal ps-ops-center/docs/content/
azure/. Module paths updated from "terraform/azure/" to the current
"modules/azure/" layout. Audience reframed from SA-zip delivery to
public self-serve via git clone.

The Wave 0 overview page now links its Azure card to the new quickstart.
Also includes copy-edit polish across the AWS and GCP pages from the
same session for consistent voice.

Wave 4 of the Terraform docs migration. Adds eight OCP-specific pages
under the new "OpenShift (preview)" sub-group within "Deploy with
Terraform":

  - Quickstart: planned production path (ROSA / on-prem OCP) plus the
    current Single Node OpenShift on Azure POC.
  - Pass 1: infrastructure setup (operators, RBAC, storage classes).
  - Pass 2: Helm install with OpenShift Route vs Gateway API guidance.
  - Architecture: SNO-on-Azure POC topology, nip.io DNS, planned
    module layout, key differences from AKS/GKE/EKS, VM sizing.
  - Quick reference: oc + kubectl + helm commands for OCP.
  - Variables: planned Terraform variable reference (preview).
  - Troubleshooting: SCC, Route vs Gateway API, ODF / MinIO storage.
  - Teardown: planned 3-pass order plus POC Azure-host destroy.

Every OCP page carries a preview warning. The Wave 0 overview page now
links its OpenShift card to the new quickstart. Source adapted from
github.com/langchain-ai/terraform/modules/ocp (README, ARCHITECTURE,
QUICK_REFERENCE, TROUBLESHOOTING, TEARDOWN) and from internal
ps-ops-center/docs/content/ocp/. Includes additional copy-polish
across earlier waves' troubleshooting pages from the same session.

… guides

Apply Vale prose cleanups to bring the new Azure and OpenShift guides in
line with the AWS/GCP pages: replace `via` with `with`/`through`, convert
"Known issues" intros to active voice, restructure passive constructions,
drop wordy phrases (`Minimum version` → `Version`, drop `safely`), and
lowercase "pass" in heading sentence case.

CI-clean at the error level. Remaining warning-level findings are all
defensible false positives (product-name headings like `PostgreSQL` / `Key
Vault`, Terraform's `destroy` subcommand, "Let's Encrypt" product name,
and technical compounds like `LTS-only`).

@github-actions

Thanks for opening a docs PR, devfreddy-langchain! When it's ready for review, please add the relevant reviewers:

  • @katmayb or @fjmorris (LangSmith)

@github-actions (bot) added the langsmith ("For docs changes to LangSmith") and internal labels on May 28, 2026

- Add lowercase `aks` to codespell ignore list (matches `module.aks` Terraform refs; uppercase `AKS` was already ignored but codespell is case-sensitive)
- Replace `pre-select(ed/s)` with `preselect(ed/s)` to match the existing `prebuilt` style convention
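The first bullet turns on exact-case matching. The toy model below illustrates why ignoring `AKS` does not cover `aks`; it is not codespell's implementation, and `unignored_words` is a hypothetical name.

```python
def unignored_words(words, ignore_list):
    """Return the words that a case-sensitive ignore list does not cover."""
    ignored = set(ignore_list)  # exact-case membership, no lowercasing
    return [word for word in words if word not in ignored]


if __name__ == "__main__":
    tokens = ["AKS", "aks"]  # e.g. prose "AKS" and the Terraform ref module.aks
    print(unignored_words(tokens, ["AKS"]))
    print(unignored_words(tokens, ["AKS", "aks"]))
```

With only the uppercase entry, the lowercase token is still reported; adding `aks` clears it, which matches the change described in the commit.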
…pages

Restructures AWS and GCP Terraform guides to match the new ps-ops-center
source layout. Both quickstarts shrink to rapid-path overviews that point
at dedicated pass pages.

AWS additions: pass-1 (infrastructure), pass-2 (LangSmith application,
including Envoy Gateway and Terraform-managed paths), and a full
variables reference page.

GCP additions: pass-1 (infrastructure with preflight step), pass-2
(scripted and manual Helm paths), pass-3 (LangSmith Deployment plus
Agent Builder, Insights, and Polly add-ons), and a full variables
reference page.

Nav updated to insert the new pages under each provider group.

…rminology

Collapse each provider's quickstart, prerequisites, and per-pass walkthroughs
into a single deploy.mdx so customers follow one continuous narrative instead
of hopping between numbered pages. Delete OCP entirely (preview status no
longer justified the maintenance overhead) and remove the teardown pages
(standard terraform destroy patterns plus per-provider quick reference cover
day-2 needs). Rename "Pass N" sections to descriptive headings (Infrastructure,
LangSmith application, LangSmith Deployment add-on, etc.) across architecture,
variables, troubleshooting, and quick-reference pages.

Final shape per provider: deploy + architecture + variables + quick-reference +
troubleshooting. Vale and broken-links checks pass.
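The final per-provider shape can be expressed as an expected page set and compared against what exists. This is a sketch under assumptions: the slug pattern is modeled on the quickstart URLs in the test plan and is hypothetical for the other pages, and `expected_slugs` and `diff_against` are illustrative helpers.

```python
PROVIDERS = ("aws", "gcp", "azure")
PAGES = ("deploy", "architecture", "variables", "quick-reference", "troubleshooting")


def expected_slugs(prefix="langsmith/self-host-terraform"):
    """Build one slug per provider and page (the slug pattern is an assumption)."""
    return [f"{prefix}-{provider}-{page}" for provider in PROVIDERS for page in PAGES]


def diff_against(existing):
    """Return (missing, unexpected) slugs relative to the expected shape."""
    expected = set(expected_slugs())
    existing = set(existing)
    return sorted(expected - existing), sorted(existing - expected)


if __name__ == "__main__":
    # Simulate a tree that lost one page and still carries a deleted OCP page.
    present = expected_slugs()[1:] + ["langsmith/self-host-terraform-ocp-deploy"]
    missing, unexpected = diff_against(present)
    print("missing:", missing)
    print("unexpected:", unexpected)
```

A check like this would catch leftover OCP or teardown pages after the consolidation, as well as a provider missing one of the five pages.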
@github-actions

Mintlify preview branch generated: preview-featse-1779998369-ad736a3

Site preview: https://langchain-5e9cc07a-preview-featse-1779998369-ad736a3.mintlify.app

Preview links may take a few minutes to start working while the deployment finishes.

Changed documentation pages (preview deep links):

Only the top 5 changed markdown files by diff size are listed.

@devfreddy-langchain changed the title from "docs(self-hosted): add Terraform guides for AWS, GCP, Azure, OCP" to "docs(self-hosted): add Terraform guides for AWS, GCP, and Azure" on May 28, 2026