Commit 1b0838c

feat: add cluster setup files
1 parent 1a73c97 commit 1b0838c

23 files changed

Lines changed: 2141 additions & 1 deletion

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -1 +1,2 @@
-__pycache__
+__pycache__
+*.ply

AGENTS.md

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
+# AGENTS
+
+Quick orientation and cluster-specific setup for this `sam-3d-objects` fork.
+
+## Repo overview
+- Model: SAM 3D Objects (single image -> 3D geometry/texture/layout).
+- Primary docs: `README.md`, `doc/setup.md`, `SAM3D_SETUP_NOTES.md`.
+- Cluster helpers live in `repro/` (scripts for reproducible runs on this cluster).
+
+## Cluster requirements
+- Linux platform `linux-64`.
+- NVIDIA GPU with >= 32 GB VRAM (A6000 preferred).
+- Build/install on a GPU node to avoid PyTorch3D CPU-only builds.
+
+## Recommended Slurm allocation
+```
+salloc -p a6000 --gres=gpu:1 --cpus-per-task=8 --mem=32G --time=02:00:00
+srun --pty bash
+```
+
+## Environment setup (mamba)
+```
+cd /path/to/sam-3d-objects
+
+mamba env create -f environments/default.yml
+mamba activate sam3d-objects
+
+export PIP_EXTRA_INDEX_URL="https://pypi.ngc.nvidia.com https://download.pytorch.org/whl/cu121"
+pip install -e '.[dev]'
+pip install -e '.[p3d]'
+
+export PIP_FIND_LINKS="https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-2.5.1_cu121.html"
+pip install -e '.[inference]'
+
+./patching/hydra
+```
+
+## Hugging Face checkpoints
+Access is required for `facebook/sam-3d-objects`.
+```
+pip install 'huggingface-hub[cli]<1.0'
+hf auth login
+
+TAG=hf
+hf download \
+    --repo-type model \
+    --local-dir checkpoints/${TAG}-download \
+    --max-workers 1 \
+    facebook/sam-3d-objects
+mv checkpoints/${TAG}-download/checkpoints checkpoints/${TAG}
+rm -rf checkpoints/${TAG}-download
+```
+
+## Sanity checks
+```
+nvidia-smi
+mamba info | rg "platform|platforms"
+
+python - <<'PY'
+import torch
+print("cuda:", torch.cuda.is_available())
+if torch.cuda.is_available():
+    print(torch.cuda.get_device_name(0))
+PY
+```
+
+## Quick run
+```
+python demo.py
+```
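
The cluster requirements in `AGENTS.md` ask for a GPU with >= 32 GB VRAM, but the sanity checks only print the device name. A minimal sketch of an automated check follows. It assumes the requirement is read as 32 GiB (32768 MiB) and that `nvidia-smi` is queried with `--query-gpu=name,memory.total --format=csv,noheader,nounits`; the function names are hypothetical and not part of the commit.

```python
import subprocess

# Assumption: ">= 32 GB" is interpreted as 32 GiB, expressed in MiB.
MIN_VRAM_MIB = 32 * 1024


def parse_gpu_memory(csv_text):
    """Parse `name, memory.total` rows (MiB, no header, no units)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma so GPU names containing commas survive.
        name, mem = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def gpus_meeting_requirement(csv_text, min_mib=MIN_VRAM_MIB):
    """Return the names of GPUs whose total memory is at least min_mib."""
    return [name for name, mib in parse_gpu_memory(csv_text) if mib >= min_mib]


def query_gpus():
    """Run nvidia-smi on the current node (needs a GPU allocation)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

On a GPU node, `gpus_meeting_requirement(query_gpus())` returning an empty list would suggest the allocation landed on a card that is too small. Some 32 GB cards report slightly less than 32768 MiB, so the threshold may need lowering.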

README.md

Lines changed: 34 additions & 0 deletions
@@ -29,6 +29,40 @@ SAM 3D Objects is one part of SAM 3D, a pair of models for object and human mesh
 
 Follow the [setup](doc/setup.md) steps before running the following.
 
+## Slurm quickstart (cluster navigation)
+
+This project is often run on a Slurm cluster. Here are the core concepts and the most common commands.
+
+**Concepts**
+- Controller / login node: the controller (`slurmctld`) schedules jobs; you run Slurm commands (`sinfo`, `squeue`) from a login node.
+- Node: a compute machine (e.g. `gpu01`); jobs run here.
+- Partition: a queue of nodes with shared policies (e.g. `defq`, `a6000`).
+- Job/step: a scheduled unit of work (`sbatch` for batch jobs, `srun` for steps).
+- GRES/TRES: resource labels like GPUs (`gres/gpu=1`) and memory/CPU tracking.
+
+**Find resources**
+- Nodes and state: `sinfo -N -l`
+- Node details (GPUs/CPU/RAM): `scontrol show node gpu01`
+- Your jobs: `squeue -u $USER`
+- Watch your queue: `watch -n 2 "squeue -u $USER -o '%.18i %.9P %.20j %.8T %.10M %.6D %R'"`
+
+**Run work**
+- Interactive shell on a node: `srun -N 1 -n 1 -c 4 --mem=16G --pty bash`
+- Run a command on a specific node: `srun -w gpu01 hostname`
+- Request GPUs (required for `nvidia-smi` to see devices):
+  `srun -w gpu01 --gres=gpu:1 nvidia-smi -L`
+- Batch job (script):
+  `sbatch path/to/job.sh`
+
+**Control jobs**
+- Cancel job: `scancel <jobid>`
+- Inspect job: `scontrol show job <jobid>`
+
+**Resource flags (common)**
+- CPUs: `-c 8` or `--cpus-per-task=8`
+- Memory: `--mem=64G` or `--mem-per-cpu=4G`
+- GPUs: `--gres=gpu:1` (or `--gpus-per-task=1` if configured)
+
 ## Single or Multi-Object 3D Generation
 
 SAM 3D Objects can convert masked objects in an image into 3D models with pose, shape, texture, and layout. SAM 3D is designed to be robust in challenging natural images, handling small objects and occlusions, unusual poses, and difficult situations encountered in uncurated natural scenes like this kidsroom:
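
The `srun` invocations listed in the Slurm quickstart hunk above differ only in a few resource flags. As an illustration, the sketch below assembles them from parameters. `build_srun` is a hypothetical helper, not something the commit provides; it uses only the flags that appear in the README text (`-N`, `-n`, `-c`, `--mem`, `-w`, `--gres`, `--pty`).

```python
import shlex


def build_srun(command, cpus=4, mem="16G", gpus=0, node=None, pty=False):
    """Return an argv list for a single-node, single-task srun call."""
    argv = ["srun", "-N", "1", "-n", "1", "-c", str(cpus), f"--mem={mem}"]
    if node:
        argv += ["-w", node]                # pin to a specific node
    if gpus:
        argv.append(f"--gres=gpu:{gpus}")   # without this, no GPU is visible
    if pty:
        argv.append("--pty")                # interactive terminal
    return argv + list(command)


# Interactive shell, matching the README example.
print(shlex.join(build_srun(["bash"], pty=True)))
# List GPUs on a specific node.
print(shlex.join(build_srun(["nvidia-smi", "-L"], node="gpu01", gpus=1)))
```

Returning a list rather than a string keeps the result usable with `subprocess.run` without a shell, and `shlex.join` is only needed for display.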

SAM3D_SETUP_NOTES.md

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
+# SAM 3D Objects - Cluster Setup Notes
+
+This document captures the findings and a complete setup flow for `sam-3d-objects` on this Slurm cluster using mamba.
+
+## Repository Location
+
+- Repo path: `$REPO_ROOT`
+
+## Prerequisites (from `doc/setup.md`)
+
+- Linux 64-bit (mamba platform `linux-64`).
+- NVIDIA GPU with at least 32 GB VRAM.
+- Build on a GPU node to avoid PyTorch3D "Not compiled with GPU support" errors.
+
+## Slurm Findings
+
+Partitions observed:
+
+- `defq` (nodes `gpu01-08`)
+- `a6000` (node `gpu09`)
+
+GPU resources for `a6000`:
+
+- `gpu09` has `gres=gpu:4` and is in partition `a6000`.
+- Use this partition to satisfy the >= 32 GB VRAM requirement (A6000 is typically 48 GB).
+
+## Recommended Interactive Allocation
+
+```
+salloc -p a6000 --gres=gpu:1 --cpus-per-task=8 --mem=32G --time=02:00:00
+srun --pty bash
+```
+
+## Environment Setup (mamba)
+
+From `doc/setup.md`:
+
+```
+cd $REPO_ROOT
+
+mamba env create -f environments/default.yml
+mamba activate sam3d-objects
+
+export PIP_EXTRA_INDEX_URL="https://pypi.ngc.nvidia.com https://download.pytorch.org/whl/cu121"
+pip install -e '.[dev]'
+pip install -e '.[p3d]'
+
+export PIP_FIND_LINKS="https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-2.5.1_cu121.html"
+pip install -e '.[inference]'
+
+./patching/hydra
+```
+
+## GPU and Platform Verification
+
+```
+nvidia-smi
+mamba info | rg "platform|platforms"
+```
+
+Expected:
+
+- GPU present and visible in `nvidia-smi`.
+- `platform : linux-64` in `mamba info`.
+
+## Hugging Face Checkpoints
+
+Access required for `facebook/sam-3d-objects`.
+
+```
+pip install 'huggingface-hub[cli]<1.0'
+hf auth login
+
+TAG=hf
+hf download \
+    --repo-type model \
+    --local-dir checkpoints/${TAG}-download \
+    --max-workers 1 \
+    facebook/sam-3d-objects
+mv checkpoints/${TAG}-download/checkpoints checkpoints/${TAG}
+rm -rf checkpoints/${TAG}-download
+```
+
+## Sanity Check (CUDA)
+
+```
+python - <<'PY'
+import torch
+print("cuda:", torch.cuda.is_available())
+if torch.cuda.is_available():
+    print(torch.cuda.get_device_name(0))
+PY
+```
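
The checkpoint steps in the notes end with an `mv` followed by `rm -rf`. If `checkpoints/${TAG}` already exists, `mv` nests the new directory inside it instead of replacing it. The sketch below shows the same two steps with explicit guards. `promote_checkpoints` is a hypothetical helper, and the directory layout is taken from the commands above rather than verified against the Hugging Face repository.

```python
import shutil
from pathlib import Path


def promote_checkpoints(root, tag="hf"):
    """Move <root>/<tag>-download/checkpoints to <root>/<tag>, then clean up."""
    root = Path(root)
    download_dir = root / f"{tag}-download"
    source = download_dir / "checkpoints"
    target = root / tag

    if not source.is_dir():
        raise FileNotFoundError(f"expected downloaded checkpoints at {source}")
    if target.exists():
        # Plain `mv` would nest the directory here; refuse instead.
        raise FileExistsError(f"{target} already exists; refusing to overwrite")

    shutil.move(str(source), str(target))
    shutil.rmtree(download_dir)  # equivalent of `rm -rf <tag>-download`
    return target
```

Calling `promote_checkpoints("checkpoints")` after `hf download` finishes would mirror the last two shell lines.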

download_model.py

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+from huggingface_hub import hf_hub_download
+
+path = hf_hub_download("facebook/sam-3d-objects", "pipeline.yaml")
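
`download_model.py` stores the returned path but never reports it, so a successful run prints nothing. A possible variant is sketched below. `fetch_file` and its `downloader` parameter are hypothetical additions; the parameter exists only so the function can be exercised without network access. When it is omitted, the real `hf_hub_download` is called with its `repo_id`, `filename`, and `local_dir` keyword arguments.

```python
from pathlib import Path


def fetch_file(repo_id, filename, local_dir=None, downloader=None):
    """Download one file from a Hugging Face repo and return its local path."""
    if downloader is None:
        # Imported lazily so the module loads even without huggingface_hub.
        from huggingface_hub import hf_hub_download as downloader
    path = downloader(repo_id=repo_id, filename=filename, local_dir=local_dir)
    return Path(path)
```

A call such as `print(fetch_file("facebook/sam-3d-objects", "pipeline.yaml"))` would then show where the file landed. It still requires `hf auth login` and access to the gated repository.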

repro/capture_state.sh

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+#!/usr/bin/env bash
+set -euo pipefail
+source "$(dirname "$0")/env.sh"
+
+mkdir -p "${REPO}/repro/state"
+
+# repo revision
+git -C "${REPO}" rev-parse HEAD > "${REPO}/repro/state/git_commit.txt"
+git -C "${REPO}" status --porcelain > "${REPO}/repro/state/git_dirty.txt" || true
+
+# container fingerprint
+sha256sum "${SIF}" > "${REPO}/repro/state/container.sha256"
+apptainer inspect "${SIF}" > "${REPO}/repro/state/container.inspect.txt" || true
+
+# environment package locks (resolve the helper relative to this script, not the caller's cwd)
+"$(dirname "$0")/container_exec.sh" "
+ENV_PREFIX=\$(micromamba run -n sam3d-objects python -c 'import sys; print(sys.prefix)')
+echo \"ENV_PREFIX=\$ENV_PREFIX\" > repro/state/env_prefix.txt
+
+micromamba list -n sam3d-objects > repro/state/micromamba_list.txt
+micromamba list -n sam3d-objects --explicit > repro/state/micromamba_explicit.txt
+
+micromamba run -n sam3d-objects python -m pip freeze > repro/state/pip_freeze.txt
+
+nvidia-smi > repro/state/nvidia-smi.txt || true
+nvcc --version > repro/state/nvcc.txt || true
+ldd --version | head -n 1 > repro/state/glibc.txt || true
+"

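`repro/capture_state.sh` writes `pip_freeze.txt` snapshots, which become useful once two of them can be compared. The following sketch diffs two snapshots by package name. The function names are hypothetical, and the parser handles only `name==version` and `name @ url` lines; any other line (for example an editable install) is kept whole as its own key.

```python
def parse_freeze(text):
    """Map package name -> pinned version (or URL) from `pip freeze` output."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "==" in line:
            name, version = line.split("==", 1)
        elif " @ " in line:
            name, version = line.split(" @ ", 1)
        else:
            name, version = line, ""  # unrecognised form, keep the whole line
        pins[name.strip().lower()] = version.strip()
    return pins


def diff_freeze(old_text, new_text):
    """Report packages added, removed, or changed between two snapshots."""
    old, new = parse_freeze(old_text), parse_freeze(new_text)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(n for n in set(old) & set(new) if old[n] != new[n]),
    }
```

Feeding it the contents of two `repro/state/pip_freeze.txt` files from different commits gives a quick view of environment drift.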
repro/container_exec.sh

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,40 @@
1+
#!/usr/bin/env bash
2+
set -euo pipefail
3+
source "$(dirname "$0")/env.sh"
4+
5+
# command to run inside container
6+
CMD="${*:-bash}"
7+
8+
apptainer exec --nv --cleanenv \
9+
--bind "${BASE}:${BASE}" \
10+
--bind "${SCRATCH}:${SCRATCH}" \
11+
"${SIF}" bash -lc "
12+
set -euo pipefail
13+
14+
export HOME="\${HOME}"
15+
16+
export SCRATCH='${SCRATCH}'
17+
export XDG_CACHE_HOME='${XDG_CACHE_HOME}'
18+
export HF_HOME='${HF_HOME}'
19+
export TORCH_HOME='${TORCH_HOME}'
20+
export TMPDIR='${TMPDIR}'
21+
22+
export MAMBA_ROOT_PREFIX='${MAMBA_ROOT_PREFIX}'
23+
export MAMBA_PKGS_DIRS='${MAMBA_PKGS_DIRS}'
24+
export CONDA_PKGS_DIRS='${CONDA_PKGS_DIRS}'
25+
26+
export PATH='${SCRATCH}/bin':/usr/local/cuda/bin:\$PATH
27+
export CUDA_HOME=/usr/local/cuda
28+
export CUDACXX=/usr/local/cuda/bin/nvcc
29+
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:\${LD_LIBRARY_PATH:-}
30+
31+
# Make conda env libs visible at runtime (critical for open3d/kaolin/etc.)
32+
ENV_PREFIX=\$(micromamba run -n sam3d-objects python -c 'import sys; print(sys.prefix)')
33+
export LD_LIBRARY_PATH=\"\$ENV_PREFIX/lib:\$LD_LIBRARY_PATH\"
34+
35+
export TORCH_CUDA_ARCH_LIST='${TORCH_CUDA_ARCH_LIST}'
36+
export SAM3D_HF_DIR='${SAM3D_HF_DIR}'
37+
38+
cd '${REPO}'
39+
${CMD}
40+
"

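`repro/container_exec.sh` splices `${CMD}` into a double-quoted `bash -lc` string, so arguments that contain spaces or quotes can be re-split by the inner shell. One way to sidestep that is sketched below with `shlex`. `build_apptainer_argv` is a hypothetical helper that reuses the `--nv`, `--cleanenv`, and `--bind` flags from the script. It takes the command as an argument list, so unlike the script it cannot pass multi-line shell snippets.

```python
import shlex


def build_apptainer_argv(sif, binds, command, workdir):
    """Build an `apptainer exec` argv whose inner command is safely quoted."""
    # shlex.join quotes every argument so the inner bash sees the same argv.
    inner = f"set -euo pipefail; cd {shlex.quote(workdir)}; {shlex.join(command)}"
    argv = ["apptainer", "exec", "--nv", "--cleanenv"]
    for path in binds:
        argv += ["--bind", f"{path}:{path}"]  # same path inside and outside
    return argv + [sif, "bash", "-lc", inner]


argv = build_apptainer_argv(
    "/path/to/image.sif", ["/path/to/base"], ["echo", "hello world"], "/path/to/repo"
)
print(argv[-1])
```

The resulting list can be handed to `subprocess.run(argv, check=True)` directly, which avoids a second layer of shell quoting on the host side.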
repro/env.sh

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
#!/usr/bin/env bash
2+
set -euo pipefail
3+
4+
# ---- site-specific paths ----
5+
export BASE="${HOME}/data/${USER}"
6+
export REPO="${BASE}/projects/sam-3d-objects"
7+
8+
export SCRATCH_ROOT="/path/to/scratch"
9+
export SCRATCH="${SCRATCH_ROOT}/${USER}"
10+
export SIF="${SCRATCH}/containers/cuda121-ubuntu22.sif"
11+
12+
# micromamba root + packages on Lustre (avoid /trinity/home caches)
13+
export MAMBA_ROOT_PREFIX="${SCRATCH}/micromamba"
14+
export MAMBA_PKGS_DIRS="${SCRATCH}/micromamba/pkgs"
15+
export CONDA_PKGS_DIRS="${SCRATCH}/micromamba/pkgs"
16+
17+
# caches on Lustre
18+
export XDG_CACHE_HOME="${SCRATCH}/cache"
19+
export HF_HOME="${SCRATCH}/cache/huggingface"
20+
export TORCH_HOME="${SCRATCH}/cache/torch"
21+
export SAM3D_HF_DIR="${SCRATCH}/sam3d-hf"
22+
23+
# optional: keep pip temp on Lustre too
24+
export TMPDIR="${SCRATCH}/tmp"
25+
26+
# CUDA build target (RTX A5000)
27+
export TORCH_CUDA_ARCH_LIST="8.6+PTX"
28+
29+
mkdir -p \
30+
"${SCRATCH}/containers" \
31+
"${MAMBA_ROOT_PREFIX}" "${MAMBA_PKGS_DIRS}" \
32+
"${XDG_CACHE_HOME}" "${HF_HOME}" "${TORCH_HOME}" \
33+
"${SAM3D_HF_DIR}" "${TMPDIR}"

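`repro/env.sh` hardcodes `TORCH_CUDA_ARCH_LIST="8.6+PTX"`. If the scripts are reused on nodes with different GPUs, the value could be derived from the device instead. The sketch below formats a list of `(major, minor)` compute capabilities into that string. `arch_list` is a hypothetical helper, and attaching `+PTX` only to the highest architecture is one common convention, not something the commit specifies.

```python
def arch_list(capabilities, with_ptx=True):
    """Format (major, minor) pairs as a TORCH_CUDA_ARCH_LIST value."""
    unique = sorted(set(capabilities))
    if not unique:
        raise ValueError("no compute capabilities given")
    entries = [f"{major}.{minor}" for major, minor in unique]
    if with_ptx:
        # PTX for the newest target keeps the build forward compatible.
        entries[-1] += "+PTX"
    return ";".join(entries)


print(arch_list([(8, 6)]))
```

On a GPU node with PyTorch installed, `arch_list([torch.cuda.get_device_capability(0)])` would produce the value for the first visible device.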
repro/install_deps.sh

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
1+
#!/usr/bin/env bash
2+
set -euo pipefail
3+
source "$(dirname "$0")/env.sh"
4+
5+
"$(dirname "$0")/container_exec.sh" "
6+
# keep pip stable + avoid packaging 25 issues
7+
micromamba run -n sam3d-objects python -m pip install -U 'pip==24.3.1' 'setuptools' 'wheel' 'packaging<25'
8+
9+
# build backend used by the repo
10+
micromamba run -n sam3d-objects python -m pip install -U hatchling hatch-requirements-txt editables
11+
12+
# git needed for git+https deps
13+
micromamba install -y -n sam3d-objects -c conda-forge git
14+
15+
# runtime libs for open3d
16+
micromamba install -y -n sam3d-objects -c conda-forge \
17+
xorg-libx11 xorg-libxext xorg-libxrender xorg-libxi xorg-libxfixes xorg-libxrandr \
18+
libgl libegl libglu mesalib libcxx libcxxabi
19+
20+
# ensure open3d extension is executable (ldd warning you saw)
21+
ENV_PREFIX=\$(micromamba run -n sam3d-objects python -c 'import sys; print(sys.prefix)')
22+
chmod a+rx \"\$ENV_PREFIX/lib/python3.11/site-packages/open3d/cpu/\"pybind*.so || true
23+
24+
# install project + inference extras (build gsplat against container nvcc)
25+
micromamba run -n sam3d-objects python -m pip uninstall -y gsplat || true
26+
micromamba run -n sam3d-objects python -m pip install -v --no-build-isolation -e '.[inference]'
27+
"

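The `chmod` step in `repro/install_deps.sh` hardcodes `python3.11` in the path, so it silently does nothing (because of `|| true`) if the environment's Python version changes. A version-independent alternative is sketched below. Both function names are hypothetical, and the `open3d/cpu/pybind*.so` pattern is copied from the script rather than checked against an open3d install.

```python
import stat
from pathlib import Path

RX_ALL = 0o555  # the permission bits that `chmod a+rx` adds


def find_open3d_extensions(site_packages):
    """Locate open3d's compiled extension modules under a site-packages dir."""
    return sorted(Path(site_packages).glob("open3d/cpu/pybind*.so"))


def add_read_execute(paths):
    """Add a+rx to each path; return the paths whose mode actually changed."""
    changed = []
    for path in map(Path, paths):
        mode = stat.S_IMODE(path.stat().st_mode)
        if mode & RX_ALL != RX_ALL:
            path.chmod(mode | RX_ALL)
            changed.append(path)
    return changed
```

Run inside the environment, `add_read_execute(find_open3d_extensions(sysconfig.get_paths()["purelib"]))` would resolve the right `site-packages` directory for whichever Python version is installed.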
repro/pull_container.sh

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
#!/usr/bin/env bash
2+
set -euo pipefail
3+
source "$(dirname "$0")/env.sh"
4+
5+
apptainer pull "${SIF}" docker://nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04
6+
7+
# record immutable fingerprint
8+
sha256sum "${SIF}" | tee "${REPO}/repro/container.sha256"
9+
apptainer inspect "${SIF}" > "${REPO}/repro/container.inspect.txt" || true
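
`repro/pull_container.sh` records a SHA-256 fingerprint of the image, but nothing in the commit checks a later image against it. A minimal verification sketch follows. The function names are hypothetical, and the record is assumed to use the usual `sha256sum` layout of `<hex digest>  <path>`.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte images are not read at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, record_line):
    """Compare a file against one line of `sha256sum` output."""
    fields = record_line.split()
    if not fields:
        raise ValueError("empty sha256sum record")
    return sha256_of(path) == fields[0].lower()
```

Passing the `.sif` path and the first line of `repro/container.sha256` to `matches_record` would confirm that the image on scratch is still the one that was pulled.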
