bug: multi-node setup needs unique network names#375
vsoch wants to merge 1 commit into rootless-containers:master
README.md
Outdated
See `make help`.

If you are running a multi-node setup with a shared filesystem and location for your network CNI files, you will want to create a non-shared location for each node's usernetes code (e.g., `/tmp` is usually not shared) and run this additional command for each of the control-plane and worker nodes before `make up`. It will give the network (and corresponding CNI files) unique names in the shared location, usually in `~/.config/cni`,
I don't think that container engines support locating CNI files on a shared filesystem
I suggest just mounting a local filesystem on .config/cni
> I suggest just mounting a local filesystem on .config/cni
You mean on the HPC node? On top of NFS, and for every user? That seems overkill for what comes down to a file naming issue.
The solution here does not change functionality for a user who doesn't need this change, but it supports multi-node shared filesystem setups for users who do, through an isolated `make multi-node` command. If other multi-node functionality turns out to be needed, it could be added to that section.
> I don't think that container engines support locating CNI files on a shared filesystem
In rootless mode, Podman puts the CNI files in the user's home. To be clear, it isn't shared between users; it is shared between nodes. reference,
Same as:
I don't think a new Makefile target should be added for this.
docker-compose.yaml can be modified in vi or yq.
So for an HPC cluster of hundreds or thousands of nodes, you want the user to manually update the file with vim?
You requested changes on the PR. Can you please clarify what I can change? It seems more like you are rejecting any kind of change for this.
CNI files aren't expected to be shared between nodes.
If you aren't allowed to mount local filesystems, as a workaround you can just automate updating the YAML files with yq (https://github.com/mikefarah/yq).
> CNI files aren't expected to be shared between nodes.
In a rootless environment with Podman, where they are stored in `~/.config` in the user's home (which is mounted and shared across compute nodes), it is not just expected, it is guaranteed.
They are expected/guaranteed to be under the home, but not expected to be under the shared home
> They are expected/guaranteed to be under the home, but not expected to be under the shared home
I have never seen an HPC cluster with a user home that is not a filesystem mapped across nodes, and thus shared. It's usually NFS. It's strategically like that so you can log in to multiple different clusters and see your files, and jobs running across compute nodes can see the same space too.
docker-compose.yaml
Outdated
"nerdctl/bypass4netns-ignore-subnets": "${BYPASS4NETNS_IGNORE_SUBNETS:-}"
networks:
- default:
+ default_network:
Probably you can use a variable like `${HOSTNAME}` here; then there is no need to add a new Makefile target.
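For a variable-based approach to work, the variable has to be visible to the process that reads the compose file. A minimal sketch of that step follows, with some assumptions: `NODE_NAME` is a hypothetical variable name (nothing in Usernetes defines it), and the compose file is assumed to reference it inside a value such as a network `name:` attribute, since Compose interpolation, as far as I know, applies to values rather than mapping keys.

```shell
# Sketch only: NODE_NAME is a hypothetical variable name, not something Usernetes defines.
export_node_name() {
    # bash sets HOSTNAME but does not export it by default, and other shells
    # may not set it at all, so fall back to `uname -n`.
    NODE_NAME="${HOSTNAME:-$(uname -n)}"
    export NODE_NAME
}

# Assumed usage (the compose file would have to reference the variable in a value,
# for example `name: usernetes_${NODE_NAME}` under the network definition):
#   export_node_name && make up
```

Exporting explicitly avoids a silent failure mode where the variable is empty at interpolation time and every node ends up with the same network name again.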
When using Podman (or another runtime that keeps a shared CNI directory in the user home), and the runtime generates a CNI file for each node's network, a shared filesystem combined with a single, non-unique network name means each node will write a slightly different address to the same CNI file and clobber any previously written file (a race condition). This additional `make multi-node` command replaces `default_network` with a hostname-specific name to avoid this. Signed-off-by: vsoch <vsoch@users.noreply.github.com>
7e435b1 to fbaaec5
Updated to test using

Yeah, that's what I remember from before. I'm going to keep this in our branch - it's a bug here.
When using Podman (or another runtime that keeps a shared CNI directory in the user home), and the runtime generates a CNI file for each node's network, a shared filesystem combined with a single, non-unique network name means each node will write a slightly different address to the same CNI file and clobber any previously written file (a race condition). This additional `make multi-node` command replaces `default_network` with a hostname-specific name to avoid this.
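The clobbering described above can be reproduced in miniature without any container runtime. The sketch below is a simulation only: a temporary directory stands in for the shared `~/.config/cni` location, and the file names, the `.conflist` extension, the node names, and the subnets are made up for illustration rather than taken from what Podman actually writes.

```shell
# Simulation only: a temp dir stands in for a CNI directory shared across nodes.
write_cni_conf() {
    # $1 = shared directory, $2 = network name, $3 = subnet this node picked
    printf '{"name": "%s", "subnet": "%s"}\n' "$2" "$3" > "$1/$2.conflist"
}

simulate() {
    # $1 = "shared" (same network name on every node) or "unique" (name includes the node)
    dir=$(mktemp -d)
    if [ "$1" = "shared" ]; then
        write_cni_conf "$dir" "usernetes_default" "10.0.1.0/24"   # node-a
        write_cni_conf "$dir" "usernetes_default" "10.0.2.0/24"   # node-b overwrites node-a's file
    else
        write_cni_conf "$dir" "usernetes_node-a" "10.0.1.0/24"
        write_cni_conf "$dir" "usernetes_node-b" "10.0.2.0/24"
    fi
    # Print how many config files survive in the shared directory.
    ls "$dir" | wc -l | tr -d ' '
    rm -rf "$dir"
}
```

With the same name on both nodes only one file survives, holding whichever node wrote last; with per-node names each node keeps its own file.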
I renamed the network from `default` to `default_network` so it would be more unique for the sed (`default` is fairly generic). If the user doesn't run this (and they don't need to for most setups without a shared CNI cache), the network will just be called `usernetes_default_network` instead of `usernetes_default`.
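A rough sketch of what such a sed-based rename could look like is below. It is not the PR's actual Makefile target: the function name `make_network_unique` and the resulting `<node>_network` naming scheme are assumptions for illustration, and it only assumes the compose file contains the literal string `default_network`.

```shell
# Sketch: rename the placeholder network to a per-node name.
# make_network_unique and the "<node>_network" scheme are hypothetical.
make_network_unique() {
    # $1 = path to the compose file, $2 = node name (defaults to `uname -n`)
    file="$1"
    node="${2:-$(uname -n)}"
    # Replace anything outside [A-Za-z0-9_-] (such as dots in an FQDN) with '_'.
    suffix=$(printf '%s' "$node" | tr -c 'A-Za-z0-9_-' '_')
    # Write to a temp file and move it into place, avoiding the
    # GNU/BSD differences of `sed -i`.
    sed "s/default_network/${suffix}_network/g" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

Because the edit rewrites the file in place, it only makes sense on a per-node copy of the code, which matches the README's advice to keep each node's usernetes checkout in a non-shared location.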