fix: change default endpoint mode from vip to dnsrr for applications #4522

Open
maickonn wants to merge 29 commits into Dokploy:canary from maickonn:fix/default-dnsrr-endpoint-mode

@maickonn

Problem

Applications deployed with Dokploy on Docker Swarm frequently return 502 Bad Gateway when accessed through Traefik. The root cause is that Docker Swarm's default endpoint mode (vip) relies on IPVS (IP Virtual Server) for internal load balancing, which fails in several common environments:

  • LXC containers (a common VPS type): the kernel blocks IPVS operations with EPERM (confirmed via strace; dockerd logs show "Failed to create a new service for vip ... operation not permitted")
  • Specific kernel versions (e.g. 6.17.0): IPVS netlink operations return -EPERM even for root with full capabilities
  • Multi-node Swarm with overlay networks: VIP routing can break between nodes, making service VIPs unreachable from Traefik

Dokploy's built-in databases (PostgreSQL, MySQL, MariaDB, Redis, MongoDB, LibSQL) already default to dnsrr mode and work correctly in all these environments. Applications were the only service type still defaulting to vip.

Solution

Change the default endpoint mode for applications from vip (implicit Docker default) to dnsrr (DNS Round Robin), matching what databases already use.

| File | Changes |
|------|---------|
| packages/server/src/utils/builders/index.ts | Add Mode: "dnsrr" to application EndpointSpec fallback |
| packages/server/src/services/rollbacks.ts | Add Mode: "dnsrr" to rollback EndpointSpec |
| apps/dokploy/.../endpoint-spec-form.tsx | Update UI default + placeholder to reflect dnsrr as default |
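
As a rough illustration of the fallback change, here is a minimal sketch. The `resolveEndpointSpec` helper and the simplified `EndpointSpec` type are assumptions made for this example; the actual builder code in this PR may be structured differently.

```typescript
// Simplified shape of a Swarm EndpointSpec (only the field relevant here).
type EndpointMode = "vip" | "dnsrr";

interface EndpointSpec {
  Mode?: EndpointMode;
}

// Hypothetical helper: use the spec the user configured, otherwise fall back
// to DNS round robin instead of relying on Docker's implicit vip default.
function resolveEndpointSpec(userSpec?: EndpointSpec | null): EndpointSpec {
  if (userSpec) {
    return userSpec; // an explicit user choice (including vip) is respected
  }
  return { Mode: "dnsrr" };
}

const fallbackSpec = resolveEndpointSpec();
const explicitSpec = resolveEndpointSpec({ Mode: "vip" });
```

With this shape, users who explicitly pick vip keep it, and only applications without a configured spec get dnsrr.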

How dnsrr works

Instead of a Virtual IP (VIP) that requires IPVS kernel routing, dnsrr makes the Swarm DNS resolver return the container IPs directly. Traefik receives the real container IPs and load-balances across them itself, using its own health checks, as it already does for all backends regardless of endpoint mode.
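
To make the client-side part concrete, the sketch below cycles through a list of task IPs the way a simple round-robin balancer would. It is only an illustration: the `makeRoundRobin` helper and the IP addresses are made up for this example, and this is not how Traefik itself is implemented.

```typescript
// Returns a function that cycles through the given addresses in order.
function makeRoundRobin(addresses: string[]): () => string {
  if (addresses.length === 0) {
    throw new Error("no addresses to balance across");
  }
  let index = 0;
  return () => {
    const address = addresses[index];
    index = (index + 1) % addresses.length;
    return address;
  };
}

// Example task IPs such as a dnsrr lookup might return (made-up values).
const taskIps = ["10.0.1.5", "10.0.1.6", "10.0.1.7"];
const nextBackend = makeRoundRobin(taskIps);

// Four picks: the fourth wraps around to the first address again.
const picks = [nextBackend(), nextBackend(), nextBackend(), nextBackend()];
```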

Compatibility

  • Multi-replica services: Fully supported. DNS returns all task IPs, Traefik round-robins between them.
  • Single-node Swarm: Works identically or better than VIP (no IPVS dependency).
  • Multi-node Swarm: Works correctly across nodes without relying on IPVS VIP routing.
  • Users who need vip: Can still switch via UI (Advanced → Cluster → Swarm Settings → Endpoint Spec).

Related issues

ngenohkevin and others added 29 commits May 12, 2026 21:35
The empty-records branch of `main()` returned without calling
`process.exit(0)`, leaving the Drizzle Postgres connection pool
holding the event loop open. The `migrate-auth-secret` process
then hangs indefinitely after printing "No 2FA records found,
nothing to migrate." causing the upstream `0.29.3.sh` security
migration script (which calls this via `docker exec`) to never
reach its final `docker service update` step that mounts the new
Docker Secret. Operators end up with the new secret created but
the dokploy service still configured with the hardcoded
`BETTER_AUTH_SECRET`, while believing the migration completed.

Match the success branch a few lines below which already does
`process.exit(0)`, and the pattern used in sibling scripts
`reset-password.ts` and `reset-2fa.ts`.
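
A minimal sketch of the pattern this commit describes, assuming hypothetical injected dependencies (`fetchRecords`, `migrate`, `exit`) in place of the real Drizzle queries and `process.exit`:

```typescript
// Hypothetical stand-ins for the script's real dependencies.
interface MigrationDeps {
  fetchRecords: () => Promise<unknown[]>;
  migrate: (records: unknown[]) => Promise<void>;
  exit: (code: number) => void; // process.exit in the real script
}

async function main(deps: MigrationDeps): Promise<void> {
  const records = await deps.fetchRecords();
  if (records.length === 0) {
    console.log("No 2FA records found, nothing to migrate.");
    // Without an explicit exit, an open connection pool can keep the
    // event loop alive and the process never terminates.
    deps.exit(0);
    return;
  }
  await deps.migrate(records);
  deps.exit(0);
}
```

Injecting `exit` is only a convenience for this sketch so that both branches can be exercised without terminating the process.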

Closes Dokploy#4392
…ret-exit-on-empty

fix(migrate-auth-secret): exit cleanly when there are no 2FA records
Adds an "Import" option to the Create Service dropdown that lets users
paste a base64-encoded compose export, preview the template (compose YAML,
domains, envs, mounts) before confirming, and create the service only on
confirm. Adds a `previewTemplate` tRPC procedure that processes the base64
without touching the DB, with server access validation via session.
…-base64

feat(compose): add import from base64 in create service dropdown
- Updated the GitHub Actions workflow to sync versioning across MCP, CLI, and SDK repositories.
- Added steps to bump the version in the SDK repository and regenerate tools from the latest OpenAPI spec.
- Improved commit message formatting to include source and release information for all repositories.
- Ensured successful synchronization messages for each repository after the version update.
- Introduced a new `readLogs` procedure that allows users to retrieve logs for a specific deployment by providing the deployment ID and an optional tail parameter.
- Implemented permission checks to ensure users have access to the requested logs.
- Enhanced log retrieval for both cloud and non-cloud environments, utilizing appropriate commands based on the server context.

Resolve Dokploy/mcp#14
- Implemented server access validation in deployment procedures to ensure users can only access deployments associated with their active organization.
- Added checks to throw an UNAUTHORIZED error if a user attempts to access a deployment linked to a server outside their organization.

This enhancement improves security and access control within the deployment management system.
- Added validation to prevent users from being invited with the owner role in the organization and user routers.
- Implemented TRPCError responses to ensure proper error handling when attempting to assign the owner role.
This change enhances role management and security within the organization structure.

https://github.com/Dokploy/dokploy/security/advisories/GHSA-fm9p-wmpw-gxjh
- Added functionality to delete old sessions when a user updates their password, ensuring that only the current session remains active.
- This change enhances security by preventing unauthorized access from previous sessions after a password change.

Close here https://github.com/Dokploy/dokploy/security/advisories/GHSA-rr9m-w87g-46f3
* fix: copy Dokploy server IP when clicking server badge

When a service runs on the local Dokploy server (no remote server),
clicking the server badge did nothing because `data.server` is null.
Now falls back to the server IP from settings so the badge always
copies an IP address.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(copy-ip): implement IP address copying functionality across database service components

- Added the ability to copy the server IP address to the clipboard when clicking the server badge in various database service components (Libsql, MariaDB, MongoDB, MySQL, PostgreSQL, Redis).
- Integrated the `copy-to-clipboard` library and `sonner` for user feedback upon successful copy action.
- Ensured fallback to the server IP from settings when the service data is not available, enhancing user experience and functionality.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Mauricio Siu <siumauricio@icloud.com>
Signed-off-by: Nahidujjaman Hridoy <hridoyboss12@gmail.com>
… routes (Dokploy#4468)

* fix: allow square brackets in zip drop path validation for Next.js dynamic routes

ZIP uploads containing Next.js dynamic route files (e.g. app/api/[id]/route.ts,
pages/[slug].tsx) were rejected by readValidDirectory because the path regex
did not include square bracket characters.
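
The effect of adding brackets to the allowed character set can be shown with two illustrative patterns. Both regexes are assumptions written for this example; the actual pattern used by readValidDirectory may differ.

```typescript
// Hypothetical allow-lists of path characters (not the real Dokploy regex).
const withoutBrackets = /^[\w.\/-]+$/;
const withBrackets = /^[\w.\/\[\]-]+$/;

const dynamicRoute = "app/api/[id]/route.ts";

// The stricter pattern rejects the Next.js dynamic route path,
// while the pattern that includes [ and ] accepts it.
const rejectedBefore = !withoutBrackets.test(dynamicRoute);
const acceptedAfter = withBrackets.test(dynamicRoute);
```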

* [autofix.ci] apply automated fixes

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
…es (Dokploy#4470)

shouldDeploy passed undefined/null entries from commit.modified straight
into micromatch, which throws "Expected input to be a string" and fails
every webhook deployment when watch paths are configured. Filter out
non-string values before matching.
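
A minimal sketch of the filtering step, assuming a hypothetical `onlyStrings` helper; the glob matching itself (micromatch in the real code) is left out so the example stays self-contained.

```typescript
// Keeps only string entries. The type guard lets TypeScript infer string[].
function onlyStrings(values: unknown[] | null | undefined): string[] {
  return (values ?? []).filter(
    (value): value is string => typeof value === "string",
  );
}

// A webhook payload may contain holes in commit.modified (made-up sample).
const modified: unknown[] = ["src/index.ts", undefined, null, "README.md"];
const safePaths = onlyStrings(modified);
```

Only `safePaths` would then be handed to the glob matcher.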
…t accidental submission (Dokploy#4422)

Co-authored-by: Maks Pikov <mixelburg@users.noreply.github.com>
…ploy#4018) (Dokploy#4474)

* fix: add tls=true label for compose domains when certificateType is none (Dokploy#4018)

* test: cover tls=true label for certificateType none, require https

* fix: scope tls fix to compose labels, leave traefik file config unchanged (Dokploy#4018)
Upgraded next dependency in apps/dokploy to 16.2.6 exactly. Verified typescript typecheck passes successfully.
…nforce-sso) (Dokploy#4511)

* feat: add self-hosted enterprise restrictions (remote-servers-only, enforce-sso)

- Add `remoteServersOnly` field to webServerSettings: prevents creating services
  on the local Dokploy VM, forcing all deployments to remote servers. Validated
  in all 8 service routers (application, compose, postgres, mysql, mongo, redis,
  mariadb, libsql).
- Add `enforceSSO` field to webServerSettings: hides the email/password login
  form and shows only the SSO button on the login page.
- Both settings are enterprise-only (enterpriseProcedure) and self-hosted-only
  (blocked at the API level when IS_CLOUD=true).
- UI toggles added to the SSO settings page under a new "Self-hosted
  Restrictions" card (hidden in cloud). Login page reads enforceSSO from
  getServerSideProps to avoid client-side flash.
- Migrations: 0167_fresh_goliath.sql, 0168_long_justice.sql

* fix: add missing final newlines to migration files

* refactor: improve code formatting for better readability in multiple components

- Adjusted formatting in `add-application.tsx`, `add-compose.tsx`, and `add-database.tsx` to enhance readability by adding line breaks and consistent indentation.
- Updated `toggle-enforce-sso.tsx` to simplify the Switch component's props.
- Reformatted imports in `index.tsx` and `sso.tsx` for consistency.
- Cleaned up conditional statements in various router files for improved clarity.

* fix: add enforceSSO to test mock
On settings/servers, a long server name in the card title (h3) did not
wrap and overflowed its container, overlapping nearby content and
squeezing the three-dots actions menu until it disappeared.

Allow the title block to shrink and wrap (min-w-0 + break-words), keep
the server icon and the actions trigger from being crushed (shrink-0),
and add gap between the title and the actions button.
… docker config (Dokploy#4485)

The compose/stack deploy command runs under `env -i PATH="$PATH"`, which
clears the environment except for PATH. That strips HOME, so when the
generated command is `docker stack deploy --prune --with-registry-auth`
the docker CLI cannot resolve `~/.docker/config.json` (e.g.
`/root/.docker/config.json`) and ships no registry credentials to the
swarm. Private-registry images then fail to pull on the nodes:

  image registry.example.com/... could not be accessed on a registry to
  record its digest. Each node will access ... independently

while the deploy still logs "Docker Compose Deployed: ✅".

Keep PATH isolation but preserve HOME so docker can read its config for
both `stack deploy --with-registry-auth` and `compose up -d --build`.

Add a regression test asserting the generated command preserves
`HOME="$HOME"` for both stack and docker-compose deploys.
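
The shape of the fix can be sketched as a small command builder. The `withIsolatedEnv` helper, the compose file name, and the stack name are placeholders for this example and are not taken from the Dokploy source.

```typescript
// Hypothetical builder: clears the environment with `env -i` but keeps
// PATH and HOME so the docker CLI can still find ~/.docker/config.json.
function withIsolatedEnv(command: string): string {
  return `env -i PATH="$PATH" HOME="$HOME" ${command}`;
}

// Placeholder compose file and stack name.
const stackCommand = withIsolatedEnv(
  "docker stack deploy --prune --with-registry-auth -c docker-compose.yml my-stack",
);
const composeCommand = withIsolatedEnv("docker compose up -d --build");
```

A regression test in the spirit of the one described above would assert that both generated strings contain `HOME="$HOME"`.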

Fixes Dokploy#4401

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@maickonn maickonn requested a review from Siumauricio as a code owner May 31, 2026 18:06
@dosubot dosubot Bot added the size:S This PR changes 10-29 lines, ignoring generated files. label May 31, 2026

Development

Successfully merging this pull request may close these issues.

  • Custom domain with docker swarm doesnt work
  • Docker Swarm Services Return 502 Bad Gateway - Overlay Network Connectivity Issue

10 participants