
feat: add Scaleway BYOC provider#2105

Draft
simple-agent-manager[bot] wants to merge 39 commits into main from feat/scaleway-byoc

@simple-agent-manager
Contributor

@simple-agent-manager simple-agent-manager Bot commented May 8, 2026

Summary

Adds Scaleway as a BYOC provider in the Defang CLI. defang --provider scaleway compose up now deploys through a Scaleway Serverless Jobs CD task and integrates with the Pulumi provider in DefangLabs/pulumi-defang#234.

Implemented

  • Scaleway provider authentication from SCW_ACCESS_KEY, SCW_SECRET_KEY, SCW_DEFAULT_PROJECT_ID, and the region environment variable (SCW_DEFAULT_REGION, default fr-par).
  • CD setup/run/destroy through Scaleway Serverless Jobs v1alpha2.
  • Secret Manager-backed CD job secrets for sensitive env values instead of plaintext job env.
  • Deterministic CD job setup: stale duplicate defang-cd job definitions are removed before creating the current definition.
  • First-deploy and corrupt-state recovery for missing or invalid project.pb.
  • Cockpit/Loki log querying through the project logs data-source URL and read_only_logs Cockpit tokens.
  • compose ps / services readback from the CD-written project.pb artifact.
  • Expanded CD infrastructure teardown helpers for job definitions, CD secrets, registry images/namespace, and state bucket contents.
  • Scaleway LLM compose fixup: model services are stripped and dependent services receive Scaleway Generative API env vars.
  • First-class Scaleway LLM auth: when a Scaleway model is used and OPENAI_API_KEY is absent, the CLI creates the Defang config from the Scaleway secret key.
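
For reviewers, the environment-based authentication in the first bullet could look roughly like the sketch below. It is not the PR's code: the Credentials type and loadCredentials function are hypothetical names, and the getenv parameter is injected only to keep the sketch testable. The SCW_DEFAULT_REGION variable and the fr-par default are taken from the commit notes in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Credentials is a hypothetical holder for the values listed in the summary.
type Credentials struct {
	AccessKey, SecretKey, ProjectID, Region string
}

// loadCredentials reads the Scaleway variables through getenv and reports
// every missing required variable in a single joined error.
// The region falls back to fr-par when SCW_DEFAULT_REGION is unset.
func loadCredentials(getenv func(string) string) (Credentials, error) {
	c := Credentials{
		AccessKey: getenv("SCW_ACCESS_KEY"),
		SecretKey: getenv("SCW_SECRET_KEY"),
		ProjectID: getenv("SCW_DEFAULT_PROJECT_ID"),
		Region:    getenv("SCW_DEFAULT_REGION"),
	}
	required := []struct{ name, value string }{
		{"SCW_ACCESS_KEY", c.AccessKey},
		{"SCW_SECRET_KEY", c.SecretKey},
		{"SCW_DEFAULT_PROJECT_ID", c.ProjectID},
	}
	var errs []error
	for _, r := range required {
		if r.value == "" {
			errs = append(errs, fmt.Errorf("%s is not set", r.name))
		}
	}
	if c.Region == "" {
		c.Region = "fr-par"
	}
	// errors.Join returns nil when errs is empty.
	return c, errors.Join(errs...)
}

func main() {
	creds, err := loadCredentials(os.Getenv)
	if err != nil {
		fmt.Println("missing credentials:", err)
		return
	}
	fmt.Println("region:", creds.Region)
}
```

Reporting all missing variables at once is a design choice of the sketch; it saves a user from fixing them one failed run at a time.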

Scaleway behavior and limitations

  • Scaleway Generative API is OpenAI-compatible at https://api.scaleway.ai/v1/; no LiteLLM sidecar is deployed.
  • Model aliases are currently hardcoded (chat-default -> llama-3.3-70b-instruct, embedding-default -> bge-multilingual-gemma2).
  • Serverless Containers do not support host-mode ports. Use ingress ports for public HTTP services; managed Postgres/Redis host-mode ports are consumed by provider-specific managed resources.
  • Defang delegated DNS requires Scaleway domain permissions. Live testing with the available credential hit HTTP 403 for the delegated defang.app zone, while the native Scaleway container domain worked.
  • compose logs now reaches Cockpit without endpoint/auth failures. Log completeness still depends on Scaleway product log availability and labels.
  • Scaleway Managed Redis currently requires REDIS_PASSWORD config for this provider path; mastra-extended exposed that requirement even though the sample README only calls out POSTGRES_PASSWORD.
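
The hardcoded alias behavior described above can be illustrated with a small lookup. This is a sketch, not the CLI's implementation: resolveModel and modelAliases are hypothetical names, and passing unknown names through unchanged is an assumption about how non-alias models are handled.

```go
package main

import "fmt"

// generativeAPIBaseURL is the OpenAI-compatible endpoint named in this PR.
const generativeAPIBaseURL = "https://api.scaleway.ai/v1/"

// modelAliases mirrors the two hardcoded aliases listed in this PR.
var modelAliases = map[string]string{
	"chat-default":      "llama-3.3-70b-instruct",
	"embedding-default": "bge-multilingual-gemma2",
}

// resolveModel maps a known alias to its Scaleway model name.
// Assumption: any other name is passed through unchanged.
func resolveModel(name string) string {
	if model, ok := modelAliases[name]; ok {
		return model
	}
	return name
}

func main() {
	fmt.Println("endpoint:", generativeAPIBaseURL)
	// "my-custom-model" is a placeholder, not a real Scaleway model name.
	for _, name := range []string{"chat-default", "embedding-default", "my-custom-model"} {
		fmt.Printf("%s -> %s\n", name, resolveModel(name))
	}
}
```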

Live validation

Validated on 2026-05-11 against the SAM-provided Scaleway account using the draft Pulumi provider branch and CD image rg.fr-par.scw.cloud/defang-cd/cd:sam-20260511d.

Small app validation:

  • compose up completed for a Python service with a Compose models: entry.
  • Native Scaleway endpoint returned HTTP 200.
  • /llm successfully called Scaleway's OpenAI-compatible /models endpoint with auto-created OPENAI_API_KEY config.
  • compose ps returned DEPLOYMENT_COMPLETED and healthy after the Pulumi provider wrote project.pb.
  • compose logs returned without Cockpit DNS/auth errors.
  • compose down removed the live Serverless Container and namespace.

Mastra Extended validation:

  • Deployed the unmodified projects/samples/samples/mastra-extended sample on stack mastraextended.
  • Supplied only Defang config values: policy-compliant POSTGRES_PASSWORD, REDIS_PASSWORD, and auto-created Scaleway OPENAI_API_KEY.
  • Playwright clicked Generate sample items, verified 10 tasks, 10 events, and 20 classified items, waited 15 seconds, then asked the chat UI what to look at first.
  • Playwright result: 1 passed.
  • compose down removed the sample's Serverless Containers, namespace, managed Postgres, and managed Redis; native API checks confirmed cleanup.

Test plan

  • go1.25.9 test ./pkg/clouds/scaleway ./pkg/cli/compose ./pkg/cli/client/byoc/scaleway
  • go1.25.9 test ./pkg/clouds/scaleway ./pkg/cli/client/byoc/scaleway
  • Build local CLI and run live compose up / endpoint checks / compose ps / compose logs / compose down on Scaleway
  • Playwright validation of deployed mastra-extended UI
  • Full CI pipeline

Related PRs

raphaeltm and others added 17 commits May 8, 2026 19:02
Add Scaleway as a new cloud provider with skeleton BYOC client:
- Add SCALEWAY = 6 to Provider proto enum
- Add ProviderScaleway constant, registration, and all switch cases
- Add region config with default fr-par and SCW_DEFAULT_REGION env var
- Create skeleton ByocScaleway client with stub implementations
- Wire up provider factory in connect.go
- Add test cases for provider ID and region

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add low-level Scaleway API wrappers in pkg/clouds/scaleway/ following
the pattern established by pkg/clouds/aws/ and pkg/clouds/gcp/:

- common.go: Base client with HTTP helpers, auth header, region/zone utils
- auth.go: Credential validation via IAM API, env var loading
- storage.go: S3-compatible object storage using AWS SDK with custom endpoint
- secret.go: Secret Manager API (create, version, list, delete)
- jobs.go: Serverless Jobs API for CD task execution
- registry.go: Container Registry namespace and image management
- dns.go: DNS zone management
- errors.go: Structured error handling with IsNotFound/IsConflict helpers

Includes unit tests for error detection, URL construction, and S3 endpoint
generation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement the Scaleway BYOC provider methods using the SDK wrappers
from pkg/clouds/scaleway/:

- Authenticate: validates credentials via IAM API, creates S3 client
- Deploy/Preview: marshal compose to protobuf, upload payload, run CD job
- GetProjectUpdate/GetServices/GetService: download state from S3 bucket
- PutConfig/ListConfig/DeleteConfig: Scaleway Secret Manager operations
- CreateUploadURL: presigned S3-compatible upload URLs
- SetUpCD: create bucket, registry namespace, job definition
- TearDownCD: basic cleanup with TODO for full implementation
- CdCommand: run arbitrary CD commands via Serverless Jobs
- CdList: list Pulumi stacks from S3 state bucket
- GetDeploymentStatus: check Serverless Job run status

QueryLogs, Subscribe, and PrepareDomainDelegation remain as stubs
(ErrNotImplemented) as they require Scaleway-specific logging and DNS
infrastructure not yet available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- PrepareDomainDelegation: creates a DNS zone in Scaleway and returns NS
  records for parent zone delegation
- Subscribe: polls Serverless Job run status to stream deployment events
  with state change detection
- QueryLogs: queries Cockpit Loki API for container/job logs with
  support for both historical queries and follow/tail mode
- Add Cockpit SDK wrapper (cockpit.go) for token management and
  Loki query_range integration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
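
The polling with state change detection described for Subscribe could be sketched as follows. This is only an illustration: watchStates and its parameters are hypothetical, the state names in main are placeholders rather than documented Scaleway job run states, and a real implementation would presumably also honor context cancellation.

```go
package main

import (
	"fmt"
	"time"
)

// watchStates polls for the current state, emits it only when it differs
// from the previously seen state, and stops once a terminal state is seen.
// wait is called between polls so callers control the polling interval.
func watchStates(
	poll func() (string, error),
	wait func(),
	isTerminal func(string) bool,
	emit func(string),
) error {
	last := ""
	for {
		state, err := poll()
		if err != nil {
			return err
		}
		if state != last {
			emit(state)
			last = state
		}
		if isTerminal(state) {
			return nil
		}
		wait()
	}
}

func main() {
	// Placeholder sequence standing in for successive API responses.
	states := []string{"queued", "running", "running", "succeeded"}
	i := 0
	err := watchStates(
		func() (string, error) { s := states[i]; i++; return s, nil },
		func() { time.Sleep(10 * time.Millisecond) },
		func(s string) bool { return s == "succeeded" },
		func(s string) { fmt.Println("state changed:", s) },
	)
	if err != nil {
		fmt.Println("error:", err)
	}
}
```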
- Change CD command from `node lib/index.js` to `/app/cd` (Go CD binary)
- Add DEFANG_PULUMI_DIR debug support for local CD testing
- Add SCALEWAY_DEPLOY_LOG.md tracking deployment progress and fixes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Scaleway S3 presigned URLs use HTTPS path-style format
(https://s3.fr-par.scw.cloud/bucket/key) but Kaniko expects s3://bucket/key
for S3 build contexts. Add convertScalewayS3URL() to perform this conversion,
similar to the existing GCS URL conversion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
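
A minimal sketch of the conversion this commit describes, assuming the path-style URL shape shown in the message. The function name matches the commit, but the body below is a guess at the behavior rather than the committed code, and the host check (an "s3." prefix with a ".scw.cloud" suffix) is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// convertScalewayS3URL rewrites https://s3.<region>.scw.cloud/<bucket>/<key>
// into s3://<bucket>/<key>, dropping any presigned query string.
// It returns the input unchanged and false for anything else.
func convertScalewayS3URL(raw string) (string, bool) {
	u, err := url.Parse(raw)
	if err != nil || u.Scheme != "https" {
		return raw, false
	}
	host := u.Hostname()
	if !strings.HasPrefix(host, "s3.") || !strings.HasSuffix(host, ".scw.cloud") {
		return raw, false
	}
	bucket, key, ok := strings.Cut(strings.TrimPrefix(u.Path, "/"), "/")
	if !ok || bucket == "" || key == "" {
		return raw, false
	}
	return "s3://" + bucket + "/" + key, true
}

func main() {
	// The query parameter here is a placeholder for a presigned signature.
	out, ok := convertScalewayS3URL("https://s3.fr-par.scw.cloud/bucket/key?X-Amz-Signature=abc")
	fmt.Println(out, ok)
}
```

Returning a boolean lets the caller skip the conversion for URLs from other providers.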
- Pass S3_ENDPOINT to CD task environment for Scaleway S3-compatible storage
- Fix secret name format: replace path separators with underscores to match
  Scaleway Secret Manager naming convention
- Add early return in debug mode after DebugPulumiCD completes
- Update work log with session 4 findings (blockers 6-10)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
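
The secret name fix amounts to flattening path separators. A sketch, assuming "/" is the separator in question; secretName is a hypothetical helper and the path layout in main is made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// secretName replaces path separators with underscores so the result can be
// used as a Scaleway Secret Manager secret name.
func secretName(path string) string {
	return strings.ReplaceAll(path, "/", "_")
}

func main() {
	// Hypothetical project/stack/key layout.
	fmt.Println(secretName("myproject/mystack/POSTGRES_PASSWORD"))
}
```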
Document the eval$IFS$KANIKO_SCRIPT solution for running shell scripts
in Scaleway Serverless Jobs, including failed approaches and verification
experiments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Session 5 resolved blockers 11-16:
- AWS SDK v2 IMDS fallback (AWS_EC2_METADATA_DISABLED=true)
- Kaniko chown in sandbox (patched binary ignoring EPERM)
- Local storage limit (local_storage_capacity=10GB)
- Staging directory permissions (MkdirAll 0644→0755)
- apt-get setgroups in sandbox (apt config + user mgmt stubs)
- Health check required fields (default threshold/interval)

Deployment succeeded! App endpoint live on Scaleway.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add CDImage validation with actionable error message (like Azure)
- Normalize Scaleway's 400 "same secret name" error to 409 so PutConfig
  correctly falls through to update existing secrets
- Store raw API error body for detailed error matching

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
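
The 400-to-409 normalization could be sketched like this. The APIError type, its fields, and normalizeSecretConflict are hypothetical; only the "same secret name" fragment and the IsConflict helper name come from this PR, and the full error body in main is invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"strings"
)

// APIError is a hypothetical structured error that keeps the raw response body.
type APIError struct {
	StatusCode int
	Body       string
}

func (e *APIError) Error() string {
	return fmt.Sprintf("scaleway API error %d: %s", e.StatusCode, e.Body)
}

// normalizeSecretConflict turns a 400 whose body mentions "same secret name"
// into a 409 so callers can treat it like any other conflict.
func normalizeSecretConflict(err error) error {
	var apiErr *APIError
	if errors.As(err, &apiErr) &&
		apiErr.StatusCode == http.StatusBadRequest &&
		strings.Contains(apiErr.Body, "same secret name") {
		return &APIError{StatusCode: http.StatusConflict, Body: apiErr.Body}
	}
	return err
}

// IsConflict reports whether err is an APIError with status 409.
func IsConflict(err error) bool {
	var apiErr *APIError
	return errors.As(err, &apiErr) && apiErr.StatusCode == http.StatusConflict
}

func main() {
	// Invented body text containing the fragment quoted in the commit message.
	err := normalizeSecretConflict(&APIError{
		StatusCode: http.StatusBadRequest,
		Body:       "cannot have same secret name",
	})
	if IsConflict(err) {
		fmt.Println("secret exists, falling through to update")
	}
}
```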
The CD binary reads its command from os.Args, but the Scaleway job
was only setting DEFANG_CD_CMD in env vars (which the CD binary ignores).
Now pass the command string via the Scaleway start endpoint's command
field, which is whitespace-split into an exec array by the API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
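
One consequence of the whitespace split is that quoted arguments are not kept together. The sketch below approximates the split with strings.Fields; the API's exact splitting rules are not confirmed here, and splitCommand and the example commands are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// splitCommand approximates a whitespace split of a command string into an
// exec array. Quotes are not interpreted, so an argument containing spaces
// ends up as several elements.
func splitCommand(cmd string) []string {
	return strings.Fields(cmd)
}

func main() {
	fmt.Printf("%q\n", splitCommand("/app/cd   up"))
	fmt.Printf("%q\n", splitCommand(`echo "hello world"`))
}
```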
…r stubs

Documents three new blockers fixed:
- Blocker 17: USER appuser not in /etc/passwd (functional stubs instead of exit 0)
- Blocker 18: addgroup symlink overwrites adduser (os.Remove before WriteFile)
- Blocker 19: Kaniko cache reuses old broken layers (deleted cache tags)

Full deployment now working: Next.js app + Postgres on Scaleway Serverless Containers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Config → Up → Verify (HTTP 200 + Postgres data) → Down all working.
Added known issue about Loki DNS not resolving in devcontainer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Scaleway provider case to configureAccessGateway in compose fixup:
- Default chat model: llama-3.3-70b-instruct
- Default embedding model: bge-multilingual-gemma2
- Uses LiteLLM's native scaleway/ prefix (SCW_SECRET_KEY for auth)

Add sample chat app (samples/scaleway-llm-chat/) demonstrating the
provider: type: model compose pattern with Scaleway Generative APIs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Simplify compose.yaml to single service talking directly to
  Scaleway Generative APIs (OpenAI-compatible at api.scaleway.ai)
- Remove LiteLLM intermediary (host-mode ports unsupported on Scaleway)
- Add PULUMI_HOME env var for Scaleway CD jobs (non-root container)
- Shorten project name to avoid 16-char resource name limit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
For Scaleway deployments:
- Model services use Scaleway Generative APIs directly (no LiteLLM
  sidecar needed). Dependent services get OPENAI_API_KEY (user-set),
  endpoint URL, and model name injected.
- REDIS_PASSWORD env var is auto-injected for managed Redis services
  since Scaleway requires authentication.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
raphaeltm and others added 12 commits May 10, 2026 22:31
Image building section now reflects the Kaniko solution. LiteLLM
"remaining work" section replaced with the final direct Generative API
approach that was validated end-to-end.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These files don't belong in the CLI repo: SCALEWAY_DEPLOY_LOG.md is an
internal work log and samples/scaleway-llm-chat/ belongs in the samples repo.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…UpCD

When CreateJobDefinition returns a conflict error, the code now calls
ListJobDefinitions to find the existing job by name and stores its ID,
preventing a nil jobDefID from being used in subsequent RunJob calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
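
The conflict recovery described in this commit can be sketched as below. The jobsClient struct with function fields, errConflict, and ensureJobDefinition are hypothetical stand-ins for the real client and error helpers; only the create-then-list-by-name flow comes from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// JobDefinition is a hypothetical, trimmed-down job definition.
type JobDefinition struct {
	ID, Name string
}

// errConflict stands in for whatever the real conflict detection returns.
var errConflict = errors.New("job definition already exists")

// jobsClient uses function fields so the sketch can run without a network.
type jobsClient struct {
	create func(name string) (JobDefinition, error)
	list   func() ([]JobDefinition, error)
}

// ensureJobDefinition creates the definition, or on conflict looks up the
// existing one by name so the caller never ends up with an empty ID.
func ensureJobDefinition(c jobsClient, name string) (string, error) {
	def, err := c.create(name)
	if err == nil {
		return def.ID, nil
	}
	if !errors.Is(err, errConflict) {
		return "", err
	}
	defs, err := c.list()
	if err != nil {
		return "", fmt.Errorf("listing job definitions after conflict: %w", err)
	}
	for _, d := range defs {
		if d.Name == name {
			return d.ID, nil
		}
	}
	return "", fmt.Errorf("job definition %q reported as existing but not found", name)
}

func main() {
	c := jobsClient{
		create: func(string) (JobDefinition, error) { return JobDefinition{}, errConflict },
		list: func() ([]JobDefinition, error) {
			return []JobDefinition{{ID: "id-123", Name: "defang-cd"}}, nil
		},
	}
	id, err := ensureJobDefinition(c, "defang-cd")
	fmt.Println(id, err)
}
```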
The Scaleway API does exact name matching, but ListSecrets documents
prefix semantics. Now falls back to listing all secrets and filtering
client-side by prefix when the exact match returns no results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
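
A sketch of the fallback this commit describes: try the exact-name query first, then list everything and filter by prefix on the client. listSecretsByPrefix and its function parameters are hypothetical, and secrets are reduced to plain names for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// listSecretsByPrefix first asks the API for an exact name match. If that
// returns nothing, it lists all secrets and keeps those starting with prefix.
func listSecretsByPrefix(
	prefix string,
	byName func(name string) ([]string, error),
	listAll func() ([]string, error),
) ([]string, error) {
	names, err := byName(prefix)
	if err != nil {
		return nil, err
	}
	if len(names) > 0 {
		return names, nil
	}
	all, err := listAll()
	if err != nil {
		return nil, err
	}
	var matched []string
	for _, n := range all {
		if strings.HasPrefix(n, prefix) {
			matched = append(matched, n)
		}
	}
	return matched, nil
}

func main() {
	// Made-up secret names for illustration.
	all := []string{"proj_stack_A", "proj_stack_B", "other_C"}
	got, err := listSecretsByPrefix("proj_stack_",
		func(string) ([]string, error) { return nil, nil }, // exact match finds nothing
		func() ([]string, error) { return all, nil },
	)
	fmt.Println(got, err)
}
```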
…ication

The function shares significant logic with wireDependentServices but
the differences (no network wiring, nil OPENAI_API_KEY, dependency
removal) are interleaved, making extraction non-trivial.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Only call the Scaleway S3 URL conversion when the URL actually contains
a Scaleway domain, avoiding unnecessary processing for other providers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The ProviderGCP and RegionDefaultGCP constants were reformatted only for
alignment with the new Scaleway additions. Revert to original formatting
to minimize diff noise.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@simple-agent-manager
Contributor Author

Scaleway mastra-extended live validation completed on 2026-05-11 using CD image rg.fr-par.scw.cloud/defang-cd/cd:sam-20260511e-pr234 and stack mastraextsam.

Endpoint validated: https://mastraextended4036fb1c-app.functions.fnc.fr-par.scw.cloud

Proof screenshots are committed on this PR branch:

  • Generated dashboard
  • Chat answer

Validation details: generated 10 tasks + 10 events, dashboard reported 20 classified items and zero failed queue jobs, then asked a chat question and captured the answer in-browser with Playwright. The delegated Defang DNS name did not resolve immediately, so browser validation used Scaleway's native Serverless Container domain.

@simple-agent-manager
Contributor Author

Scaleway log validation update (2026-05-11)

Pushed fixes through c96d540:

  • Parse Scaleway Cockpit JSON log payloads and suppress metadata-only entries.
  • Query runtime logs by Serverless Container resource labels instead of falling through to CD logs for type ALL.
  • Preserve runtime resource_instance as host in log output.
  • Tolerate concurrent Cockpit token recreation/delete races.
  • Added focused tests for CD query scoping, runtime query labels, JSON payload parsing, metadata suppression, and token delete races.

Validation performed against validation/scaleway/log-smoke with /tmp/defang-scaleway-pr:

  • compose up --force --wait-timeout 600 completed successfully for deployment cffdp6qfzqo4 and streamed CD logs during deployment, including Pulumi update lines and the Scaleway endpoint output.
  • logs --type CD --limit 80 --since 3m returned the final CD update logs after deployment, including Container app updated, endpoint https://scalewaylogsmokef9df5e7b-app.functions.fnc.fr-par.scw.cloud, and Resources: ~ 2 updated, 6 unchanged.
  • logs app --type RUN --limit 12 --since 3m returned runtime stdout/stderr, including defang-log-smoke starting server port=8080, heartbeat lines, and request/access lines.
  • logs app --type RUN --follow --since 5s captured a live request generated by curl /follow-check: request path=/follow-check and GET /follow-check HTTP/1.1 200.
  • Focused test gate: go test ./pkg/cli/client/byoc/scaleway passes.

Cleanup:

  • Ran compose down; Pulumi reported - 8 deleted.
  • Deleted Defang-created Cockpit debug/log tokens.
  • Verified no leftover matching Scaleway containers, container namespaces, serverless jobs, registry namespaces, private networks, or secrets for this validation stack.

@lionello lionello mentioned this pull request May 11, 2026
3 tasks
@simple-agent-manager
Contributor Author

Adding concrete log command/output excerpts from the Scaleway validation run.

Deployment-time logs from compose up:

$ defang --provider scaleway --stack logsval --project-name scaleway-log-smoke compose up --force --wait-timeout 600
* Tailing logs for deployment ID cffdp6qfzqo4 ; press Ctrl+C to detach:
* Waiting for services to finish deploying: ["app"]
2026-05-11T18:53:44.622Z defang-cd-logsval Updating (logsval):
2026-05-11T18:53:45.562Z defang-cd-logsval  ~  scaleway:containers:Container app updating (0s) [diff: ~environmentVariables]
2026-05-11T18:54:06.400Z defang-cd-logsval  ~  scaleway:containers:Container app updated (20s) [diff: ~environmentVariables]
2026-05-11T18:54:10.123Z defang-cd-logsval Outputs:
2026-05-11T18:54:10.123Z defang-cd-logsval     endpoints      : {
2026-05-11T18:54:10.123Z defang-cd-logsval         app: "https://scalewaylogsmokef9df5e7b-app.functions.fnc.fr-par.scw.cloud"
2026-05-11T18:54:10.123Z defang-cd-logsval     }
2026-05-11T18:54:10.123Z defang-cd-logsval Resources:
2026-05-11T18:54:10.123Z defang-cd-logsval     ~ 2 updated
2026-05-11T18:54:10.123Z defang-cd-logsval     6 unchanged
* Deployment complete. Waiting for services to be healthy...
SERVICE  DEPLOYMENT    STATE  FQDN                                                 ENDPOINT
app      cffdp6qfzqo4         app.scaleway-log-smoke-logsval.raphaeltm.defang.app  https://app.scaleway-log-smoke-logsval.raphaeltm.defang.app
* Done.

Post-deploy CD logs command:

$ defang --provider scaleway --stack logsval --project-name scaleway-log-smoke logs --type CD --limit 80 --since 3m
2026-05-11T18:53:44.622Z  defang-cd-logsval f9ef08f9-506f-4765-9e85-459ebf61e14e Updating (logsval):
2026-05-11T18:53:45.562Z  defang-cd-logsval f9ef08f9-506f-4765-9e85-459ebf61e14e  ~  scaleway:containers:Container app updating (0s) [diff: ~environmentVariables]
2026-05-11T18:54:06.400Z  defang-cd-logsval f9ef08f9-506f-4765-9e85-459ebf61e14e  ~  scaleway:containers:Container app updated (20s) [diff: ~environmentVariables]
2026-05-11T18:54:10.123Z  defang-cd-logsval f9ef08f9-506f-4765-9e85-459ebf61e14e Resources:
2026-05-11T18:54:10.123Z  defang-cd-logsval f9ef08f9-506f-4765-9e85-459ebf61e14e     ~ 2 updated
2026-05-11T18:54:10.123Z  defang-cd-logsval f9ef08f9-506f-4765-9e85-459ebf61e14e     6 unchanged

Runtime logs command after hitting the endpoint:

$ curl -fsS https://scalewaylogsmokef9df5e7b-app.functions.fnc.fr-par.scw.cloud/post-deploy-final
defang log smoke ok

$ defang --provider scaleway --stack logsval --project-name scaleway-log-smoke logs app --type RUN --limit 12 --since 3m
2026-05-11T18:53:53.856Z  scalewaylogsmokef9df5e7b-app-00002-deployment-68b7b9754f-gxdlp defang-log-smoke 2026-05-11T18:53:53.856587+00:00 starting server port=8080
2026-05-11T18:53:53.857Z  scalewaylogsmokef9df5e7b-app-00002-deployment-68b7b9754f-gxdlp defang-log-smoke 2026-05-11T18:53:53.857572+00:00 heartbeat
2026-05-11T18:53:54.019Z  scalewaylogsmokef9df5e7b-app-00002-deployment-68b7b9754f-gxdlp defang-log-smoke 2026-05-11T18:53:54.018939+00:00 request path=/
2026-05-11T18:53:54.019Z  scalewaylogsmokef9df5e7b-app-00002-deployment-68b7b9754f-gxdlp defang-log-smoke-access 2026-05-11T18:53:54.019267+00:00 "GET / HTTP/1.1" 200 -

Follow mode live request:

$ defang --provider scaleway --stack logsval --project-name scaleway-log-smoke logs app --type RUN --follow --since 5s
$ curl -fsS https://scalewaylogsmokef9df5e7b-app.functions.fnc.fr-par.scw.cloud/follow-check
2026-05-11T18:52:39.917Z  scalewaylogsmokef9df5e7b-app-00001-deployment-558d7b9b67-27fph defang-log-smoke 2026-05-11T18:52:39.917386+00:00 request path=/follow-check
2026-05-11T18:52:39.917Z  scalewaylogsmokef9df5e7b-app-00001-deployment-558d7b9b67-27fph defang-log-smoke-access 2026-05-11T18:52:39.917651+00:00 "GET /follow-check HTTP/1.1" 200 -

Cleanup log excerpt:

$ defang --provider scaleway --stack logsval --project-name scaleway-log-smoke compose down
2026-05-11T18:56:57.507Z defang-cd-logsval Destroying (logsval):
2026-05-11T18:56:58.622Z defang-cd-logsval  -  scaleway:containers:Container app deleting (0s)
2026-05-11T18:56:59.488Z defang-cd-logsval  -  scaleway:containers:Container app deleted (0.87s)
2026-05-11T18:57:10.258Z defang-cd-logsval Resources:
2026-05-11T18:57:10.258Z defang-cd-logsval     - 8 deleted
* Done.

Scaleway Managed Database requires passwords with uppercase, lowercase,
digit, and special character. CreateRandomConfigValue now accepts a
ProviderID and generates Scaleway-compatible passwords for that provider.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
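
One way to satisfy the four-character-class policy is to force one character from each class and shuffle the result. This is a sketch, not the body of CreateRandomConfigValue: generatePassword, hasAllClasses, and the special-character set are assumptions, and Scaleway's accepted special characters and length limits are not verified here.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
	"strings"
)

const (
	lower   = "abcdefghijklmnopqrstuvwxyz"
	upper   = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
	digits  = "0123456789"
	special = "!@#%^*-_=+" // assumed set, not taken from Scaleway docs
)

// randIndex returns a cryptographically random integer in [0, n).
func randIndex(n int) (int, error) {
	v, err := rand.Int(rand.Reader, big.NewInt(int64(n)))
	if err != nil {
		return 0, err
	}
	return int(v.Int64()), nil
}

// generatePassword returns a password of the given length containing at
// least one lowercase, uppercase, digit, and special character.
func generatePassword(length int) (string, error) {
	classes := []string{lower, upper, digits, special}
	if length < len(classes) {
		return "", fmt.Errorf("length must be at least %d", len(classes))
	}
	all := lower + upper + digits + special
	buf := make([]byte, length)
	for i := range buf {
		set := all
		if i < len(classes) {
			set = classes[i] // the first four positions cover each class once
		}
		j, err := randIndex(len(set))
		if err != nil {
			return "", err
		}
		buf[i] = set[j]
	}
	// Fisher-Yates shuffle so the forced characters are not at fixed positions.
	for i := len(buf) - 1; i > 0; i-- {
		j, err := randIndex(i + 1)
		if err != nil {
			return "", err
		}
		buf[i], buf[j] = buf[j], buf[i]
	}
	return string(buf), nil
}

// hasAllClasses reports whether pw contains every required character class.
func hasAllClasses(pw string) bool {
	return strings.ContainsAny(pw, lower) && strings.ContainsAny(pw, upper) &&
		strings.ContainsAny(pw, digits) && strings.ContainsAny(pw, special)
}

func main() {
	pw, err := generatePassword(24)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("length:", len(pw), "policy ok:", hasAllClasses(pw))
}
```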