WIP: Add a new remote execution system that is integrated in the cuopt engine #769
tmckayus wants to merge 39 commits into NVIDIA:release/26.02 from
Conversation
This adds a header that provides convenient aliases for data_model_view_t and span<T> in the cuopt::linear_programming namespace. The canonical implementation remains in cuopt::mps_parser to maintain mps_parser as a standalone library.
- Refactored problem_and_stream_view_t to use unique_ptr for both CPU and GPU storage
- Check is_remote_solve_enabled() at problem creation time
- Local solve: store problem directly in GPU memory (optimization_problem_t)
- Remote solve: store problem in CPU memory (problem_cpu_data_t)
- cuOptSolve routes to the appropriate path based on the is_remote_configured flag
- Updated all cuOptGet* functions to read from the correct storage
- No memory overhead: only one storage type is allocated per problem
- Maintains backward compatibility with the existing C API
- Add is_device_memory field to data_model_view_t to track memory location
- Simplify solve_lp/solve_mip overloads: mps_data_model_t now creates a view and delegates
- C API checks remote config at creation time: GPU memory for local, CPU for remote
- Handle edge case: GPU data + remote config copies to CPU with a warning
- Add white-box tests for all local/remote + CPU/GPU combinations
- Modify cython_solve.cu to check for remote solve configuration early
- When remote is enabled, mark data_model as CPU memory and call solve_lp/solve_mip with data_model_view_t overloads
- Fix segfault by initializing warm start data to empty device buffers instead of nullptr when returning from the remote solve path
- Local solve path unchanged - still creates a GPU optimization_problem_t
…emote solve
- Add is_device_memory flag to optimization_problem_solution_t and mip_solution_t
- Use std::unique_ptr for both CPU (std::vector) and GPU (rmm::device_uvector) storage
- Add _host() getters for C++ users to access CPU-resident solution data
- Update cython_solve.cu to populate CPU or GPU storage based on solve type
- Update the Python wrapper to transparently handle both CPU and GPU solution data
- Backward compatible: existing GPU-based code unchanged, Python users need no changes
- Check is_remote_solve_enabled() before any CUDA initialization in the CLI
- Skip raft::handle creation and memory resource setup for remote solve
- Use CPU-only solution constructors when remote solve returns an error
- Handle a nullptr handle_ptr gracefully in solve_lp/solve_mip
- Enables the CLI to run on hosts without a GPU when remote solve is configured
- Make raft::handle_t in problem_and_stream_view_t lazy (unique_ptr)
- Make stream_view in solution_and_stream_view_t optional
- Check is_remote_solve_enabled() before creating a handle in cuOptSolve
- Update cuOptGetPrimalSolution/DualSolution/ReducedCosts to handle CPU data
- Enables the C API to run on hosts without a GPU when remote solve is configured
- cython_solve.cu: check remote solve before CUDA stream/handle creation
  - Remote solve path passes a nullptr handle to solve_lp/solve_mip
  - Local solve path creates CUDA resources only when needed
- cuopt/__init__.py: make linear_programming and routing imports lazy
  - Uses __getattr__ for deferred module loading
  - Allows 'import cuopt' on CPU-only hosts with remote solve
- solver_wrapper.pyx: lazy CUDA imports and remote solve detection
  - Remove eager imports of cupy, cudf, numba.cuda
  - Add is_remote_solve_enabled() Python function
  - Add _get_cuda_imports() for lazy module loading
  - Update create_solution() to check is_device_memory_ for warm start
- utilities/__init__.py: lazy imports for CUDA-dependent utilities
  - type_cast, series_from_buf, check_solution via __getattr__
- utilities/utils.py: lazy cudf/pylibcudf imports in series_from_buf

Enables Python programs to run on CPU-only hosts when remote solve is configured via CUOPT_REMOTE_HOST and CUOPT_REMOTE_PORT.
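The detection described above can be sketched in a few lines. This is a minimal illustration, not the actual cuOpt implementation; only the environment variable names CUOPT_REMOTE_HOST and CUOPT_REMOTE_PORT come from this PR, and the validation details are assumptions.

```python
import os

def is_remote_solve_enabled() -> bool:
    """Sketch: remote solve is considered configured when both
    CUOPT_REMOTE_HOST and CUOPT_REMOTE_PORT are set (port numeric)."""
    host = os.environ.get("CUOPT_REMOTE_HOST")
    port = os.environ.get("CUOPT_REMOTE_PORT")
    return bool(host) and bool(port) and port.isdigit()
```

Checking this flag before any CUDA import or handle creation is what lets `import cuopt` succeed on CPU-only hosts.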
Features:
- Remote solve server (cuopt_remote_server) with protobuf serialization
- Real-time log streaming from server to client via stdout capture
- Transparent remote solve via CUOPT_REMOTE_HOST/PORT environment variables
- Pluggable serialization interface (remote_serialization.hpp)
- Static linking of libmps_parser into libcuopt for simpler deployment

Components:
- cuopt_remote_server.cpp: server with streaming protocol
- cuopt_remote.proto: Protocol Buffers schema for LP/MIP problems
- protobuf_serializer.cu: serialization implementation
- remote_solve.cu: client-side remote solve with streaming support

Build:
- Added BUILD_REMOTE_SERVER cmake option
- Added cuopt_remote_server target to build.sh
- Added mps_parser_static library for static linking

The server captures stdout during solve and streams it to clients while also echoing to the server console (for docker logs, monitoring, etc.).
When users set constraints using set_row_types() + set_constraint_bounds() instead of set_constraint_lower/upper_bounds(), the serializer now computes the constraint lower/upper bounds from the row types:
- 'E' (equality): lb = ub = rhs
- 'L' (less-than): lb = -inf, ub = rhs
- 'G' (greater-than): lb = rhs, ub = inf
- 'N' (free): lb = -inf, ub = inf

Also fixed the server to always use the streaming protocol for solutions (when --no-stream is set, only log streaming is disabled).
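The mapping above is mechanical, so it can be expressed directly. This is an illustrative sketch (the function name is hypothetical; the real change lives in the C++ serializer), but the bound values follow exactly the table in the commit message:

```python
import math

def bounds_from_row_types(row_types, rhs):
    """Derive constraint lower/upper bounds from MPS row types.
    row_types: iterable of 'E'/'L'/'G'/'N'; rhs: right-hand-side values."""
    lb, ub = [], []
    inf = math.inf
    for t, b in zip(row_types, rhs):
        if t == "E":      # equality: lb = ub = rhs
            lb.append(b); ub.append(b)
        elif t == "L":    # less-than: lb = -inf, ub = rhs
            lb.append(-inf); ub.append(b)
        elif t == "G":    # greater-than: lb = rhs, ub = inf
            lb.append(b); ub.append(inf)
        elif t == "N":    # free row: unbounded on both sides
            lb.append(-inf); ub.append(inf)
        else:
            raise ValueError(f"unknown row type {t!r}")
    return lb, ub
```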
- Integrate solve_mip_remote() into the solve_mip() dispatch
- Serialize variable types (integer/binary) to protobuf is_integer/is_binary fields
- Deserialize variable types on the server side
- Fix is_mip_request() to check header.problem_type for reliable detection

MIP remote solve now correctly handles integer and binary variables.
- Add to_host() method to LP/MIP solution classes for GPU->CPU copy
- Server calls to_host() before serialization, simplifying the serializer implementation
- Add dynamic serializer loading via the CUOPT_SERIALIZER_LIB env var
- Add MsgPack serializer plugin as a proof of concept
- Add serialization plugin development guide
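The plugin-loading idea is: if an env var names a shared library, load it instead of the built-in serializer. A minimal sketch of that pattern (the actual loading happens in C++ via dlopen; only the CUOPT_SERIALIZER_LIB variable name comes from this PR, and everything else here is illustrative):

```python
import ctypes
import os

def load_serializer():
    """Return a ctypes handle to a custom serializer library if the
    CUOPT_SERIALIZER_LIB env var is set, else None (use the built-in
    protobuf serializer). Symbol lookup/registration is omitted."""
    path = os.environ.get("CUOPT_SERIALIZER_LIB")
    if not path:
        return None  # fall back to the default serializer
    return ctypes.CDLL(path)  # raises OSError if the library is missing
```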
Features:
- Async job management: SUBMIT_JOB, CHECK_STATUS, GET_RESULT, DELETE_RESULT, GET_LOGS
- Worker processes with shared memory job queues (fork-based, not threads)
- File-based log streaming for async mode (matches Python server architecture)
- Per-job log files: /tmp/cuopt_logs/log_{job_id}
- Incremental reads via byte offset (frombyte parameter)
- Automatic cleanup on job deletion
- Sync mode still uses real-time socket streaming
- CUOPT_REMOTE_USE_SYNC=1 env var to force sync mode
Protocol additions:
- RequestType.GET_LOGS for log retrieval
- AsyncRequest.frombyte for incremental log reads
- LogsResponse with log_lines, nbytes, job_exists
Client modes:
- Sync (blocking=true): Real-time log streaming over socket
- Async (blocking=false): Submit -> Poll (logs + status) -> Get -> Delete
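The incremental log retrieval above (per-job log files plus a `frombyte` offset) amounts to "seek, read to EOF, return the new offset". A minimal sketch, assuming the server-side helper looks roughly like this (function name is illustrative):

```python
def read_log_increment(path, frombyte):
    """Read log bytes starting at `frombyte`; return (text, next_offset).
    A missing file (job not started or already cleaned up) yields no data."""
    try:
        with open(path, "rb") as f:
            f.seek(frombyte)
            data = f.read()
    except FileNotFoundError:
        return "", frombyte
    return data.decode(errors="replace"), frombyte + len(data)
```

The client polls with the last returned offset, so each GET_LOGS call transfers only the new bytes.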
Server (cuopt_remote_server.cpp):
- Add worker process monitoring with waitpid
- Automatically restart dead workers
- Mark in-progress jobs as FAILED when a worker dies
- Add flush calls to monitoring log messages

Msgpack serializer:
- Implement full async interface support
- Add async message types (MSG_ASYNC_LP/MIP_REQUEST, etc.)
- Implement server-side methods: is_async_request, is_blocking_request, extract_problem_data, get_async_request_type, get_job_id, get_frombyte
- Implement response serialization: serialize_submit_response, serialize_status_response, serialize_result_response, serialize_delete_response, serialize_logs_response
- Fix is_mip_request to detect both sync and async MIP formats

Tested with LP (afiro.mps) and MIP (flugpl.mps) using msgpack async.
- docs/developer/REMOTE_SOLVE_GUIDE.md: comprehensive guide covering server setup, client configuration, async/sync modes, job lifecycle, custom serialization, worker monitoring, and troubleshooting
- docs/developer/SOLUTION_MEMORY_ARCHITECTURE.md: documents the hybrid GPU/CPU memory strategy for solutions, including the to_host() method, accessor patterns, and design rationale

Deleted 56 obsolete markdown files from the root directory that contained outdated information from previous development iterations.
Features:
- CANCEL_JOB: cancel queued jobs (marked cancelled, workers skip them) or running jobs (worker killed via SIGKILL, then auto-restarted)
- WAIT_FOR_RESULT: blocking wait using condition variables (no polling); the server thread sleeps until the job completes, fails, or is cancelled
- cuopt_cancel_job: standalone CLI utility to cancel jobs by ID

Protocol changes:
- Added CANCEL_JOB (type 5) and WAIT_FOR_RESULT (type 6) request types
- Added CANCELLED (5) to the JobStatus enum
- Added CancelResponse message to the protobuf schema

Server changes:
- Added cancelled flag to the shared memory JobQueueEntry
- Workers check the cancelled flag before processing claimed jobs
- Cancel handler kills the worker process for running jobs
- wait_for_result() uses JobWaiter condition variables
- Fixed check_job_status() to detect the PROCESSING state
- Thread-per-connection model (already present) enables blocking waits

Serialization:
- Added cancel methods to the remote_serialization.hpp base class
- Implemented in both the protobuf and msgpack serializers
- Added MSG_WAIT_REQUEST (17) to the msgpack message types

Documentation:
- Added docs/remote_solve_architecture.md for security review
- Covers protocol, process model, data flow, and security considerations
C++ API:
- Added cancel_job_remote() function to remote_solve.hpp/cu
- Added remote_job_status_t enum and cancel_job_result_t struct
- Simplified the cuopt_cancel_job utility to use the new API (194→105 lines)

Python API:
- Added cuopt.linear_programming.remote module
- cancel_job() function for cancelling remote jobs
- JobStatus enum and CancelResult dataclass
- Exported from the cuopt.linear_programming package

All three APIs tested and verified working:
- Raw protobuf cancel (baseline)
- C++ utility cancel (cuopt_cancel_job)
- Python API cancel (cancel_job)
Server-side:
- All jobs (blocking and non-blocking) now go through the shared memory job queue and are processed by worker processes
- For blocking=true requests, the server submits to the queue, waits on a condition variable until completion, then auto-deletes and returns
- This enables cancellation for 'sync' jobs and allows concurrent solves
- Single code path for all job processing

Client-side:
- Updated solve_lp_remote and solve_mip_remote to use the unified response format (result_response) for blocking mode
- Removed the streaming log dependency for blocking mode

Benefits:
- Cancellation works for all jobs (blocking and non-blocking)
- Multiple concurrent 'sync' solves are possible (one per worker)
- Simpler code with a single processing path
- Worker isolation (one job failure doesn't crash the server)
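The "wait on a condition variable until completion" pattern mentioned above is the core of WAIT_FOR_RESULT. A minimal sketch of a JobWaiter-style object (the real implementation is C++; this Python version only illustrates the synchronization shape, and the names mirror those in the commit messages):

```python
import threading

class JobWaiter:
    """One per job: a worker calls complete(); the server thread handling
    a blocking request sleeps in wait() instead of polling job status."""
    def __init__(self):
        self._cond = threading.Condition()
        self._done = False
        self.result = None

    def complete(self, result):
        with self._cond:
            self.result = result
            self._done = True
            self._cond.notify_all()   # wake every thread blocked in wait()

    def wait(self, timeout=None):
        with self._cond:
            # wait_for re-checks the predicate, so spurious wakeups are safe
            self._cond.wait_for(lambda: self._done, timeout=timeout)
            return self.result
```

Because the waiter wakes on completion, failure, or cancellation alike, the same mechanism serves all three exit paths.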
- Rename protobuf OptimizationProblem fields to match data_model_view_t:
  * constraint_matrix_values -> A_values (matches A_)
  * constraint_matrix_indices -> A_indices (matches A_indices_)
  * constraint_matrix_offsets -> A_offsets (matches A_offsets_)
  * objective_coefficients -> c (matches c_)
  * constraint_bounds -> b (matches b_)
  * is_integer/is_binary -> variable_types (bytes, matches variable_types_)
- Add missing fields to protobuf and msgpack serialization:
  * initial_primal_solution - warm start initial primal values
  * initial_dual_solution - warm start initial dual values
  * Q_values, Q_indices, Q_offsets - quadratic objective matrix for QPS
- Update the msgpack serializer to use identical field names as protobuf
- Update protobuf_serializer.cu serialization/deserialization
- Both serializers now handle all data_model_view_t fields
- Rename A_values -> A (matches data_model_view_t::A_)
- Remove the unused problem_category field and enum
- Rename PDLPSolverMode enum values to match pdlp_solver_mode_t: PDLP_MODE_STABLE1 -> Stable1, etc.
- Rename LPMethod enum values to match method_t: METHOD_CONCURRENT -> Concurrent, METHOD_BARRIER -> Barrier, etc.
- Update protobuf_serializer.cu for the new enum names
- Update msgpack_serializer.cpp for the A field rename
- Remove redundant n_rows, n_cols, nnz fields (sizes are inferred from arrays)
- Add variable_names and row_names fields (matching protobuf)
- Msgpack and protobuf now use identical field names for OptimizationProblem
Shared memory changes:
- Replace fixed 64MB/128MB embedded data arrays with per-job shared memory
- JobQueueEntry and ResultQueueEntry now store shm_data_name instead of data[]
- Create /cuopt_job_<id> and /cuopt_result_<id> segments on demand
- Clean up segments after job completion or on error
- Change data_size from uint32_t to uint64_t for >4GB support

Wire protocol changes:
- Change the message size from uint32_t (4GB max) to uint64_t
- Update both the server (cuopt_remote_server.cpp) and client (remote_solve.cu)
- Increase the sanity check limit from 100MB to 16GB

Benefits:
- No fixed size limits - problems are limited only by system RAM
- No wasted memory for small problems
- Queue entries are now ~256 bytes instead of 64MB+
- Supports very large LP/MIP problems
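The wire-protocol change is a classic length-prefixed framing tweak: widening the size prefix from 4 to 8 bytes lifts the 4 GB ceiling. A sketch of the framing (illustrative only; the real code is C++, and the little-endian byte order here is an assumption, not taken from the PR):

```python
import struct

def frame_message(payload: bytes) -> bytes:
    """Prefix payload with an 8-byte unsigned length (uint64).
    A uint32 prefix would cap messages at 4 GB."""
    return struct.pack("<Q", len(payload)) + payload

def unframe_message(buf: bytes) -> bytes:
    """Inverse of frame_message: read the uint64 length, return the payload."""
    (size,) = struct.unpack_from("<Q", buf, 0)
    return buf[8:8 + size]
```

Both sides must agree on the width; mixing a 4-byte writer with an 8-byte reader (or vice versa) corrupts the stream, which is why the server and client were updated together.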
- Add is_mip parameter to serialize_result_response to properly distinguish LP vs MIP solutions (protobuf parsing is lenient and was misinterpreting MIP results as LP)
- Add get_job_is_mip helper function to retrieve the job type from the tracker
- Change job_id fields from string to bytes in the protobuf schema to avoid spurious UTF-8 validation warnings when parsing wrong message types
- Update all serialize_result_response call sites to pass the is_mip flag

Both LP and MIP remote solve now work correctly with the Python Problem class.
Pipes are container-friendly (no /dev/shm size limits or volume mounts needed). Shared memory may be slightly faster for very large problems.

Changes:
- Add WorkerPipes struct with pipe file descriptors
- Add worker_index and data_sent fields to JobQueueEntry for pipe sync
- Add worker_index field to ResultQueueEntry for pipe mode
- Add pending_job_data map for storing data before a worker claims the job
- Add pipe I/O helpers: write_to_pipe, read_from_pipe, send_job_data_pipe, recv_job_data_pipe, send_result_pipe, recv_result_pipe
- Modify spawn_workers to create pipes before forking
- Modify spawn_single_worker to recreate pipes when restarting workers
- Modify worker_process to read job data from pipe or shm based on mode
- Modify result_retrieval_thread to send job data to workers (pipe mode) and read results from pipes
- Add --use-shm command line option
- Display the IPC mode in the server startup message

Tested: both pipe mode (default) and shm mode work correctly with LP and MIP remote solve via the Python Problem class.
When a running job is cancelled, the worker is killed with SIGKILL, which closes its pipe ends. If the server is in the middle of writing to the worker's input pipe, it receives SIGPIPE, which would terminate the server. By ignoring SIGPIPE, the write() call instead returns -1 with errno=EPIPE, which is handled gracefully (logged as 'Failed to send job data').

Tested: both pipe mode and shm mode work correctly with job cancellation and worker restart.
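The SIGPIPE-to-EPIPE conversion described above looks like this in miniature. This is an illustrative POSIX sketch, not the server's C++ code (CPython already ignores SIGPIPE by default; the explicit call mirrors what a C program must do):

```python
import errno
import os
import signal

# Without this, the default SIGPIPE action kills the whole process
# mid-write when the reader end has been closed (e.g. worker SIGKILLed).
signal.signal(signal.SIGPIPE, signal.SIG_IGN)

def safe_write(fd, data):
    """Write to a pipe; return False instead of dying if the reader is gone."""
    try:
        os.write(fd, data)
        return True
    except OSError as e:
        if e.errno == errno.EPIPE:
            return False   # handled gracefully, e.g. "Failed to send job data"
        raise
```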
These scripts are for manual testing during development and won't be run by CI (pytest looks in specific directories like python/cuopt/).
- test_pipe_cancel_restart.py: tests job cancellation and worker restart
- test_python_problem_remote.py: tests the Python Problem class with remote solve
Move protobuf::libprotobuf from PUBLIC to PRIVATE linking for both cuopt and cuopt_remote_server libraries. This fixes Python wheel build failures where protobuf was incorrectly exposed as a public dependency.
Remove the unnecessary data_sent flag wait loop that caused a 30s timeout. The pipe read operation naturally blocks until data is available, so explicit polling of the data_sent flag is redundant and can cause the worker to time out before the server writes data. This fixes the 'Timeout waiting for job data' error when using pipe-based IPC with large problems.

Bug: worker would time out after 30 seconds waiting for the data_sent flag
Fix: worker now directly reads from the pipe (blocking read)
Result: data transfer happens as soon as the server writes to the pipe
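The fix rests on a property of pipes worth making concrete: a blocking read() already synchronizes reader and writer, so a separate "data ready" flag adds nothing. A small demonstration (illustrative, using threads rather than forked workers):

```python
import os
import threading
import time

def demo_blocking_pipe_read():
    """Reader blocks in os.read() until the writer sends; no flag polling."""
    r, w = os.pipe()
    received = {}

    def worker():
        # This blocks until the server writes -- the pipe itself is the
        # synchronization point, so no data_sent flag or timeout loop is needed.
        received["data"] = os.read(r, 1024)

    t = threading.Thread(target=worker)
    t.start()
    time.sleep(0.05)            # simulate server-side delay before sending
    os.write(w, b"job-data")    # reader wakes up as soon as this lands
    t.join()
    os.close(r)
    os.close(w)
    return received["data"]
```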
Document all three operating modes (Sync, Async, WAIT_FOR_RESULT) with:
- Detailed workflow descriptions and use cases
- Log retrieval methods for each mode
- Complete API endpoint reference
- Python usage examples and best practices
- Architecture diagrams for all modes
- Future API design recommendations

This consolidates the remote solve protocol documentation into a single reference for users and developers.
- Fix send_typed_message to write an 8-byte payload size (was incorrectly writing 4 bytes)
- Fix pipe mode deadlock: set result.ready=true BEFORE writing to the pipe (prevents deadlock when the result size exceeds the ~64KB pipe buffer)
- Add a 2-minute poll() timeout in read_from_pipe() for safety if a worker dies
- Add #include <poll.h> for the poll() function
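The poll()-before-read safety timeout keeps a dead worker from blocking the server forever, while a live-but-slow worker still works. A sketch of the pattern (illustrative; the 2-minute default mirrors the commit message, everything else is an assumption):

```python
import os
import select

def read_with_timeout(fd, nbytes, timeout_s=120.0):
    """Wait up to timeout_s for data on fd, then read.
    Returns None on timeout (e.g. the worker died before writing)."""
    p = select.poll()
    p.register(fd, select.POLLIN)
    events = p.poll(int(timeout_s * 1000))  # poll() takes milliseconds
    if not events:
        return None                          # caller treats this as a dead peer
    return os.read(fd, nbytes)
```

Unlike the removed data_sent polling loop, this timeout only fires in the failure case; in the normal case poll() returns as soon as data arrives.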
/ok to test afd25ed
- Add libprotobuf to conda recipe build and host requirements
- Add libprotobuf to the dependencies.yaml build_cpp section
- Regenerate conda environment files with libprotobuf

Fixes a CI build failure where the protobuf library was not found during CMake configuration for the remote solve server implementation.
/ok to test 28ef558
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @conda/environments/all_cuda-130_arch-x86_64.yaml.orig:
- Around line 1-82: The file all_cuda-130_arch-x86_64.yaml.orig should be
deleted from the commit because it represents an unsupported CUDA 13.0
environment and is missing the libprotobuf dependency; remove the file entirely
from the branch (git rm --cached/--force as appropriate) and commit that
deletion, then verify no other generated env files reference CUDA 13.0 or lack
libprotobuf so the supported CUDA 12.9 and 13.1 variants remain intact.
In @conda/recipes/libcuopt/recipe.yaml:
- Line 65: The run requirements for the libcuopt package are missing
libprotobuf, causing runtime link failures because cpp/CMakeLists.txt links
libcuopt.so against protobuf::libprotobuf; update the recipe.yaml run section
for the libcuopt output (the run list around the current entries pinned at the
run block) to include libprotobuf alongside libboost-devel and librmm (i.e.,
insert a run entry "libprotobuf" in the run dependencies list so consumers of
libcuopt get protobuf at runtime).
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
conda/environments/all_cuda-129_arch-aarch64.yaml
conda/environments/all_cuda-129_arch-x86_64.yaml
conda/environments/all_cuda-130_arch-x86_64.yaml.orig
conda/environments/all_cuda-131_arch-aarch64.yaml
conda/environments/all_cuda-131_arch-x86_64.yaml
conda/recipes/libcuopt/recipe.yaml
dependencies.yaml
🔇 Additional comments (5)
conda/environments/all_cuda-131_arch-aarch64.yaml (1)
35-35: LGTM - libprotobuf dependency correctly added. The addition aligns with the dependencies.yaml source file change and supports the new protobuf-based remote solve functionality.
conda/environments/all_cuda-129_arch-x86_64.yaml (1)
35-35: LGTM - Consistent with other environment files.
conda/environments/all_cuda-131_arch-x86_64.yaml (1)
35-35: LGTM - Consistent with other environment files.
dependencies.yaml (1)
296-305: LGTM - libprotobuf correctly added to build_cpp dependencies. The placement under build_cpp.common with output_types: conda is appropriate for the C++ build dependency needed for protobuf-based remote serialization. This follows the existing pattern of other build dependencies in this section.
conda/environments/all_cuda-129_arch-aarch64.yaml (1)
35-35: LGTM! The libprotobuf dependency addition is consistent with the PR's protobuf-based remote solve serialization requirements. The alphabetical placement is correct.
# This file is generated by `rapids-dependency-file-generator`.
# To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.
channels:
- rapidsai-nightly
- rapidsai
- conda-forge
dependencies:
- boost
- breathe
- bzip2
- c-compiler
- ccache
- clang-tools=20.1.4
- clang==20.1.4
- cmake>=3.30.4
- cpp-argparse
- cuda-nvcc
- cuda-nvtx-dev
- cuda-python>=13.0.1,<14.0a0
- cuda-sanitizer-api
- cuda-version=13.0
- cudf==26.2.*,>=0.0.0a0
- cupy>=13.6.0
- cxx-compiler
- cython>=3.0.3
- doxygen=1.9.1
- exhale
- fastapi
- gcc_linux-64=14.*
- gmock
- gtest
- ipython
- jsonref==1.1.0
- libcudss-dev >=0.7
- libcurand-dev
- libcusolver-dev
- libcusparse-dev
- libraft-headers==26.2.*,>=0.0.0a0
- librmm==26.2.*,>=0.0.0a0
- make
- msgpack-numpy==0.4.8
- msgpack-python==1.1.0
- myst-nb
- myst-parser
- ninja
- notebook
- numba-cuda>=0.19.1,<0.20.0a0
- numba>=0.60.0
- numpy>=1.23.5,<3.0a0
- numpydoc
- pandas>=2.0
- pexpect
- pip
- pre-commit
- psutil>=6.0.0
- pylibraft==26.2.*,>=0.0.0a0
- pyrsistent
- pytest-cov
- pytest<8
- python>=3.10,<3.14
- rapids-build-backend>=0.4.0,<0.5.0.dev0
- rapids-dask-dependency==26.2.*,>=0.0.0a0
- rapids-logger==0.2.*,>=0.0.0a0
- requests
- rmm==26.2.*,>=0.0.0a0
- scikit-build-core>=0.10.0
- sphinx
- sphinx-copybutton
- sphinx-design
- sphinx-markdown-tables
- sphinx_rtd_theme
- sphinxcontrib-openapi
- sphinxcontrib-websupport
- sysroot_linux-64==2.28
- tbb-devel
- uvicorn==0.34.*
- zlib
- pip:
  - nvidia_sphinx_theme
  - swagger-plugin-for-sphinx
  - veroviz
name: all_cuda-130_arch-x86_64
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if the non-.orig version exists
echo "=== Checking for existing all_cuda-130_arch-x86_64.yaml ==="
fd -t f "all_cuda-130_arch-x86_64.yaml" conda/environments/
echo ""
echo "=== Checking libprotobuf presence in all environment files ==="
for f in conda/environments/all_cuda-*.yaml; do
if [ -f "$f" ]; then
echo -n "$f: "
grep -q "libprotobuf" "$f" && echo "HAS libprotobuf" || echo "MISSING libprotobuf"
fi
done
echo ""
echo "=== Listing any .orig files that may be artifacts ==="
fd -e orig . conda/environments/
echo ""
echo "=== Checking git status and history for .orig files ==="
cd conda/environments && git status --short | grep orig || echo "No .orig files in git status"
Repository: NVIDIA/cuopt
Length of output: 694
🏁 Script executed:
#!/bin/bash
# Check if .orig file is tracked in git
echo "=== Checking if .orig file is tracked in git ==="
git ls-files | grep "all_cuda-130_arch-x86_64.yaml.orig"
echo ""
echo "=== Checking git log for this file ==="
git log --oneline -- conda/environments/all_cuda-130_arch-x86_64.yaml.orig | head -5
echo ""
echo "=== List all all_cuda*.yaml* files to see naming pattern ==="
ls -la conda/environments/all_cuda*.yaml*
Repository: NVIDIA/cuopt
Length of output: 804
🏁 Script executed:
#!/bin/bash
echo "=== Check all environment files in the directory ==="
ls -1 conda/environments/all_cuda*.yaml* | sort
echo ""
echo "=== Check what's in the git commit that added libprotobuf ==="
git show --stat 28ef558 | head -50
echo ""
echo "=== Check the full diff for the .orig file in that commit ==="
git show 28ef558 -- conda/environments/all_cuda-130_arch-x86_64.yaml.orig | head -100
Repository: NVIDIA/cuopt
Length of output: 5691
🏁 Script executed:
#!/bin/bash
echo "=== Check if dependencies.yaml mentions CUDA 130 ==="
grep -A 10 "130" dependencies.yaml | head -30
echo ""
echo "=== Count how many env variants are defined for each CUDA version in dependencies.yaml ==="
grep -E "cuda-130|cuda-129|cuda-131" dependencies.yaml | wc -l
Repository: NVIDIA/cuopt
Length of output: 202
🏁 Script executed:
#!/bin/bash
echo "=== Search for CUDA version patterns in dependencies.yaml ==="
grep -i "cuda" dependencies.yaml | head -20
echo ""
echo "=== Check the structure of dependencies.yaml ==="
head -100 dependencies.yaml
echo ""
echo "=== Look for arch and cuda variants ==="
grep -E "arch|cuda" dependencies.yaml | grep -i "x86_64\|aarch64\|130\|129\|131" | head -20
Repository: NVIDIA/cuopt
Length of output: 3306
🏁 Script executed:
#!/bin/bash
echo "=== Check git history for the .orig file ==="
git log --all --oneline -- conda/environments/all_cuda-130_arch-x86_64.yaml.orig
echo ""
echo "=== Check if there are any references to CUDA 130 in the codebase ==="
rg "130" conda/ --type yaml | head -20
echo ""
echo "=== Verify that CUDA 129 and 131 both have both arch variants ==="
echo "CUDA 129:"
ls -1 conda/environments/all_cuda-129* | sort
echo ""
echo "CUDA 131:"
ls -1 conda/environments/all_cuda-131* | sort
Repository: NVIDIA/cuopt
Length of output: 504
Delete the all_cuda-130_arch-x86_64.yaml.orig file—it should not be committed.
This file represents an unsupported CUDA version. The dependencies.yaml only defines CUDA versions 12.9 and 13.1, which generate environment files for CUDA 129 and 131. CUDA 130 is not in the supported matrix and has no corresponding aarch64 variant (unlike the 129 and 131 versions which both have x86_64 and aarch64 pairs).
Additionally, this file is missing the libprotobuf dependency that was added across all other environment files in this PR. It appears to be a leftover from the file regeneration process and should be removed entirely rather than renamed.
🤖 Prompt for AI Agents
In @conda/environments/all_cuda-130_arch-x86_64.yaml.orig around lines 1 - 82,
The file all_cuda-130_arch-x86_64.yaml.orig should be deleted from the commit
because it represents an unsupported CUDA 13.0 environment and is missing the
libprotobuf dependency; remove the file entirely from the branch (git rm
--cached/--force as appropriate) and commit that deletion, then verify no other
generated env files reference CUDA 13.0 or lack libprotobuf so the supported
CUDA 12.9 and 13.1 variants remain intact.
- tbb-devel
- zlib
- bzip2
- libprotobuf
🧩 Analysis chain
🏁 Script executed:
# Find the recipe.yaml file and check its structure
find . -name "recipe.yaml" -path "*libcuopt*" 2>/dev/null
Repository: NVIDIA/cuopt
Length of output: 92
🏁 Script executed:
# Look for CMakeLists.txt files that might contain protobuf references
find . -name "CMakeLists.txt" 2>/dev/null | head -20
Repository: NVIDIA/cuopt
Length of output: 799
🏁 Script executed:
# Search for any protobuf references in the repository
rg -l "protobuf" --type yaml --type cmake 2>/dev/null | head -20
Repository: NVIDIA/cuopt
Length of output: 367
🏁 Script executed:
# Read the recipe.yaml file to see the structure and all requirements
cat -n ./conda/recipes/libcuopt/recipe.yaml
Repository: NVIDIA/cuopt
Length of output: 8706
🏁 Script executed:
# Check cpp/CMakeLists.txt for protobuf linking details
rg -n "protobuf" ./cpp/CMakeLists.txt -A 3 -B 1
Repository: NVIDIA/cuopt
Length of output: 380
🏁 Script executed:
# Also check the main source CMakeLists for protobuf
rg -n "protobuf" ./cpp/src/CMakeLists.txt -A 3 -B 1
Repository: NVIDIA/cuopt
Length of output: 38
🏁 Script executed:
# Search for how protobuf is found/configured in CMakeLists
rg -n "find_package.*protobuf" ./cpp/CMakeLists.txt -i -A 5
Repository: NVIDIA/cuopt
Length of output: 342
🏁 Script executed:
# Check if there's any mention of static protobuf linking
rg -n "protobuf.*STATIC\|Protobuf_USE_STATIC_LIBS\|Protobuf.*static" ./cpp/CMakeLists.txt -i
Repository: NVIDIA/cuopt
Length of output: 38
🏁 Script executed:
# Look for the full context of the protobuf link in libcuopt
rg -n "target_link_libraries.*cuopt" ./cpp/CMakeLists.txt -B 10 -A 10 | head -60
Repository: NVIDIA/cuopt
Length of output: 1934
libprotobuf missing from libcuopt run requirements — add at line 161.
While libprotobuf is correctly added to build and host requirements, it must also be in the run requirements for the libcuopt output package. The CMakeLists.txt dynamically links libcuopt.so against protobuf::libprotobuf (line 370 in cpp/CMakeLists.txt), which means consumers of libcuopt require protobuf at runtime. Without this dependency, binaries and libraries linking to libcuopt.so will fail with missing symbol errors.
Add libprotobuf to the run requirements at line 161:
run:
- ${{ pin_compatible("cuda-version", upper_bound="x", lower_bound="x") }}
- ${{ pin_subpackage("libmps-parser", exact=True) }}
- libboost-devel
- libprotobuf
- librmm =${{ minor_version }}
🤖 Prompt for AI Agents
In @conda/recipes/libcuopt/recipe.yaml at line 65, The run requirements for the
libcuopt package are missing libprotobuf, causing runtime link failures because
cpp/CMakeLists.txt links libcuopt.so against protobuf::libprotobuf; update the
recipe.yaml run section for the libcuopt output (the run list around the current
entries pinned at the run block) to include libprotobuf alongside libboost-devel
and librmm (i.e., insert a run entry "libprotobuf" in the run dependencies list
so consumers of libcuopt get protobuf at runtime).
- Add depends_on_protobuf section to dependencies.yaml
- Include protobuf in py_rapids_build_libcuopt requirements
- Regenerate python/libcuopt/pyproject.toml with protobuf

Fixes a Python wheel CI build failure where CMake could not find protobuf during the libcuopt wheel build. The previous commit added protobuf to conda builds; this commit adds it to pip wheel builds.
/ok to test 3b4e741
TODO: take another pass and replace client protocol with gRPC
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@ci/utils/install_protobuf.sh`:
- Around line 9-25: Pin Protobuf to a compatible version across all places:
update dependencies.yaml to require a specific Protobuf range (e.g. "protobuf:
>=3.20,<4.0"), update CMakeLists.txt's find_package invocation to include the
minimum version (e.g. change find_package(Protobuf REQUIRED) to
find_package(Protobuf 3.20 REQUIRED)), and change this install script
(install_protobuf.sh) to install that pinned version via the package manager
(adjust the dnf/apt package names or add version qualifiers / use apt-get
install protobuf-compiler=3.20.* or the equivalent distro-specific
package/version) and also ensure conda environment specs are updated to the same
version range; make sure error branches still instruct manual install if the
exact version is unavailable.
🧹 Nitpick comments (2)
ci/utils/install_protobuf.sh (2)
14-17: Consider cleaning up apt cache to reduce image size. In CI/container environments, cleaning up the apt cache after installation reduces the final image size.
Suggested improvement
  elif [[ "$ID" == "ubuntu" ]]; then
    echo "Detected Ubuntu. Installing Protobuf via apt..."
    apt-get update
-   apt-get install -y libprotobuf-dev protobuf-compiler
+   apt-get install -y --no-install-recommends libprotobuf-dev protobuf-compiler
+   rm -rf /var/lib/apt/lists/*
  else
11-21: Consider adding Debian support. Since Debian is closely related to Ubuntu and uses the same package manager, you might want to support it explicitly. This would help if the CI environment or developer machines use Debian.
Suggested improvement
  if [[ "$ID" == "rocky" ]]; then
    echo "Detected Rocky Linux. Installing Protobuf via dnf..."
    dnf install -y protobuf-devel protobuf-compiler
- elif [[ "$ID" == "ubuntu" ]]; then
-   echo "Detected Ubuntu. Installing Protobuf via apt..."
+ elif [[ "$ID" == "ubuntu" || "$ID" == "debian" ]]; then
+   echo "Detected $ID. Installing Protobuf via apt..."
    apt-get update
    apt-get install -y libprotobuf-dev protobuf-compiler
  else
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
ci/build_wheel_libcuopt.shci/utils/install_protobuf.sh
🔇 Additional comments (2)
ci/utils/install_protobuf.sh (1)
1-6: Good shell script setup. The strict mode (set -euo pipefail) ensures the script fails fast on errors, which is appropriate for CI scripts.
ci/build_wheel_libcuopt.sh (1)
20-22: LGTM! The protobuf installation step is correctly placed in the dependency installation sequence, following the established pattern used by other dependencies (Boost, TBB, cuDSS). The script's set -euo pipefail ensures any installation failure will abort the build.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
```shell
if [ -f /etc/os-release ]; then
  . /etc/os-release
  if [[ "$ID" == "rocky" ]]; then
    echo "Detected Rocky Linux. Installing Protobuf via dnf..."
    dnf install -y protobuf-devel protobuf-compiler
  elif [[ "$ID" == "ubuntu" ]]; then
    echo "Detected Ubuntu. Installing Protobuf via apt..."
    apt-get update
    apt-get install -y libprotobuf-dev protobuf-compiler
  else
    echo "Unknown OS: $ID. Please install Protobuf development libraries manually."
    exit 1
  fi
else
  echo "/etc/os-release not found. Cannot determine OS. Please install Protobuf development libraries manually."
  exit 1
fi
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if there are version constraints specified elsewhere in the project
echo "=== Checking dependency files for protobuf version constraints ==="
rg "protobuf" --type yaml --type txt -g '!*.pb.*' -n
echo ""
echo "=== Checking CMake files for protobuf version requirements ==="
rg "find_package.*[Pp]rotobuf" -A2 -B2 -n
echo ""
echo "=== Searching for CMakeLists.txt files ==="
fd "CMakeLists.txt" -x cat -n {}
echo ""
echo "=== Checking for .proto files ==="
fd "\.proto$" -x cat -n {}
echo ""
echo "=== Checking requirements.txt or similar Python dependency files ==="
fd -E node_modules "(requirements.*\.txt|setup\.py|pyproject\.toml|poetry\.lock|Pipfile)" -x cat -n {}
echo ""
echo "=== Checking for docker or container files ==="
fd "(Dockerfile|\.dockerignore)" -x cat -n {}

Repository: NVIDIA/cuopt
Length of output: 50369
🏁 Script executed:
#!/bin/bash
echo "=== Checking dependencies.yaml for protobuf version constraints ==="
rg -A5 "depends_on_protobuf|libprotobuf" dependencies.yaml
echo ""
echo "=== Checking conda environment files for libprotobuf version specs ==="
fd "all_cuda.*\.yaml" conda/environments/ -x grep -A1 -B1 "libprotobuf" {}
echo ""
echo "=== Checking conda recipe for libprotobuf ==="
grep -A2 -B2 "libprotobuf" conda/recipes/libcuopt/recipe.yaml
echo ""
echo "=== Checking if protobuf has version specifications anywhere ==="
rg "protobuf.*==" -n

Repository: NVIDIA/cuopt
Length of output: 1147
🏁 Script executed:
#!/bin/bash
echo "=== Looking for .proto file to understand protobuf usage ==="
find . -name "*.proto" -type f -exec cat -n {} +
echo ""
echo "=== Checking what's in linear_programming/utilities/protobuf_serializer.cu ==="
fd "protobuf_serializer" -x head -30 {}

Repository: NVIDIA/cuopt
Length of output: 14312
Add protobuf version constraints to ensure compatibility across distributions.
Protobuf version mismatches between build and runtime environments can cause serialization/deserialization issues. Currently, protobuf has no version pinning in dependencies.yaml, conda environments, or CMakeLists.txt (find_package(Protobuf REQUIRED) lacks version specification), while the install script installs without version constraints. Given that this PR introduces protobuf-based serialization for remote solving, add explicit version constraints to dependencies.yaml, CMakeLists.txt, and update the install script to match the pinned version.
🤖 Prompt for AI Agents
In `@ci/utils/install_protobuf.sh` around lines 9 - 25, Pin Protobuf to a
compatible version across all places: update dependencies.yaml to require a
specific Protobuf range (e.g. "protobuf: >=3.20,<4.0"), update CMakeLists.txt's
find_package invocation to include the minimum version (e.g. change
find_package(Protobuf REQUIRED) to find_package(Protobuf 3.20 REQUIRED)), and
change this install script (install_protobuf.sh) to install that pinned version
via the package manager (adjust the dnf/apt package names or add version
qualifiers / use apt-get install protobuf-compiler=3.20.* or the equivalent
distro-specific package/version) and also ensure conda environment specs are
updated to the same version range; make sure error branches still instruct
manual install if the exact version is unavailable.
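For illustration only, the distro-specific pinning the prompt asks for could be factored as below. The 3.20 range, the package-spec syntax, and the `pinned_protobuf_pkgs` helper are assumptions for this sketch, not values pinned anywhere in the repo:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: derive a pinned protobuf package spec per distro,
# so dnf and apt branches install the same version range.
pinned_protobuf_pkgs() {
  case "$1" in
    rocky)
      # dnf accepts name-version globs
      echo "protobuf-devel-3.20* protobuf-compiler-3.20*"
      ;;
    ubuntu|debian)
      # apt pins a version with pkg=version
      echo "libprotobuf-dev=3.20.* protobuf-compiler=3.20.*"
      ;;
    *)
      echo "unsupported"
      return 1
      ;;
  esac
}

pinned_protobuf_pkgs ubuntu
```

The install branches would then do e.g. `apt-get install -y $(pinned_protobuf_pkgs "$ID")`, falling back to the manual-install message when the spec is unavailable.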
/ok to test 0dcd54a

/ok to test de8daf7

/ok to test 7660217
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@cpp/CMakeLists.txt`:
- Around line 621-624: Wrap the
add_subdirectory(src/linear_programming/utilities/serializers) call with the
BUILD_REMOTE_SERVER conditional so the serializers plugin is only built when
BUILD_REMOTE_SERVER is enabled; locate the existing add_subdirectory invocation
in CMakeLists and move or surround it with the if(BUILD_REMOTE_SERVER) ...
endif() block (keeping it near other remote-server related entries and
references such as cuopt_remote_server.cpp) to prevent building the msgpack
serializer when remote server support is disabled.
🧹 Nitpick comments (2)
cpp/CMakeLists.txt (2)
234-238: Prefer target-specific includes over global `include_directories()`.

Using global `include_directories()` adds paths to all targets in this directory, which can cause unintended include pollution. Modern CMake best practice is to use `target_include_directories()` on specific targets that need the paths. Since multiple targets need the generated proto headers (cuopt, cuopt_remote_server), consider either:

- Adding the paths explicitly to each target that needs them, or
- Creating an interface library for the proto headers

♻️ Suggested approach using target-specific includes

```diff
 # Protocol Buffers for remote solve serialization
 find_package(Protobuf REQUIRED)
-include_directories(SYSTEM ${Protobuf_INCLUDE_DIRS})
-include_directories(${CMAKE_CURRENT_BINARY_DIR})
+# Protobuf includes added to targets via target_include_directories below
```

Then add to the cuopt target (around lines 305-319):

```cmake
target_include_directories(cuopt SYSTEM PRIVATE ${Protobuf_INCLUDE_DIRS})
target_include_directories(cuopt PRIVATE ${CMAKE_CURRENT_BINARY_DIR})
```

And similarly for cuopt_remote_server.
575-581: Consider using PRIVATE for executable link dependencies.

For executable targets, the `PUBLIC`/`PRIVATE` distinction on `target_link_libraries` has no practical effect on the executable itself. Using `PRIVATE` is the conventional choice and more clearly expresses intent.

♻️ Suggested change

```diff
 target_link_libraries(cuopt_remote_server
-  PUBLIC
-    cuopt
-    OpenMP::OpenMP_CXX
   PRIVATE
+    cuopt
+    OpenMP::OpenMP_CXX
     protobuf::libprotobuf
 )

 target_link_libraries(cuopt_cancel_job
-  PUBLIC
-    cuopt
+  PRIVATE
+    cuopt
 )
```

Also applies to: 609-612
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cpp/CMakeLists.txt
🧰 Additional context used
🧠 Learnings (11)
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z

All of the following learnings were applied to cpp/CMakeLists.txt:

- Applies to **/*.{cu,cuh,cpp,hpp,h}: Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions
- Applies to **/*.{cu,cuh,cpp,hpp,h}: Verify error propagation from CUDA to user-facing APIs is complete; ensure CUDA errors are caught and mapped to meaningful user error codes
- Applies to **/*.{cu,cuh}: Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification
- Applies to **/*.{cu,cpp,hpp,h}: Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code
- Applies to **/*.{cu,cuh}: Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations
- Applies to **/*.cu: Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths
- Applies to **/*.{cu,cuh,cpp,hpp,h}: Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
- Applies to **/*.{cu,cuh,cpp,hpp,h}: Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
- Applies to **/*.{cu,cuh,cpp,hpp,h}: Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
- Applies to **/*.{cu,cuh,cpp,hpp,h}: For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
- Applies to **/*.{cu,cuh,cpp,hpp,h}: Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
🔇 Additional comments (3)
cpp/CMakeLists.txt (3)
142-147: LGTM!

Reasonable workaround for GCC's known false-positive `-Warray-bounds` diagnostics from external headers like Protobuf, while preserving `-Werror` for other warnings.

239-253: Proto generation setup looks correct.

The custom command is properly configured with `DEPENDS` for rebuild tracking and `VERBATIM` for safe argument handling. Minor observation: `PROTO_HDRS` is defined but unused. Consider adding it to the cuopt target's sources or `target_sources()` to ensure proper dependency tracking for the header file, though in practice the `.cc` file inclusion typically suffices.

264-267: LGTM!

The protobuf sources are correctly added to the cuopt library, and the linking arrangement is appropriate:

- `mps_parser_static` embeds the parser symbols directly
- `protobuf::libprotobuf` as PRIVATE keeps it an implementation detail

Also applies to: 375-377
```cmake
# ##################################################################################################
# - Pluggable Serializers --------------------------------------------------------------------------
# Build optional serializer plugins (e.g., msgpack)
add_subdirectory(src/linear_programming/utilities/serializers)
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if serializers are used outside of remote solve code paths
# Find references to serializer in the codebase excluding the serializers directory itself
echo "=== References to serializer outside serializers directory ==="
rg -n "serializer" --type cpp --type cmake -g '!**/serializers/**' -g '!**/test**' | head -100
echo ""
echo "=== Check if serializers CMakeLists.txt has its own guards ==="
fd -t f "CMakeLists.txt" --path "*/serializers/*" -exec cat {} \;
echo ""
echo "=== Check the context around line 621-624 in CMakeLists.txt ==="
rg -B 30 -A 5 "add_subdirectory.*serializers" cpp/CMakeLists.txt

Repository: NVIDIA/cuopt
Length of output: 7539
🏁 Script executed:
#!/bin/bash
# Get serializers CMakeLists.txt
echo "=== Serializers CMakeLists.txt ==="
find . -path "*/serializers/CMakeLists.txt" -exec cat {} \;
echo ""
echo "=== Check BUILD_REMOTE_SERVER conditional structure in main CMakeLists.txt ==="
# Search for BUILD_REMOTE_SERVER conditions to see the structure
rg -n "BUILD_REMOTE_SERVER|add_subdirectory.*serializers" cpp/CMakeLists.txt | tail -50
echo ""
echo "=== Check protobuf_serializer.cu location and context ==="
fd -t f "protobuf_serializer.cu" -exec head -20 {} \;
echo ""
echo "=== Check if protobuf_serializer is conditioned in src/linear_programming/CMakeLists.txt ==="
rg -B 5 -A 5 "protobuf_serializer" cpp/src/linear_programming/CMakeLists.txt

Repository: NVIDIA/cuopt
Length of output: 2658
Move the serializers subdirectory inside the BUILD_REMOTE_SERVER conditional.
The add_subdirectory(src/linear_programming/utilities/serializers) call builds the msgpack serializer plugin, which is only used by the remote server. All serializer references in the codebase are confined to cuopt_remote_server.cpp. Building this optional plugin when remote server support is disabled is wasteful and should be gated behind BUILD_REMOTE_SERVER.
🤖 Prompt for AI Agents
In `@cpp/CMakeLists.txt` around lines 621 - 624, Wrap the
add_subdirectory(src/linear_programming/utilities/serializers) call with the
BUILD_REMOTE_SERVER conditional so the serializers plugin is only built when
BUILD_REMOTE_SERVER is enabled; locate the existing add_subdirectory invocation
in CMakeLists and move or surround it with the if(BUILD_REMOTE_SERVER) ...
endif() block (keeping it near other remote-server related entries and
references such as cuopt_remote_server.cpp) to prevent building the msgpack
serializer when remote server support is disabled.
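One way the guard described in this prompt could look as a sketch; the option names come from the review comments, but the exact placement in the file is an assumption:

```cmake
if(BUILD_REMOTE_SERVER AND NOT BUILD_LP_ONLY)
  # Serializer plugins (e.g., msgpack) are only consumed by
  # cuopt_remote_server, so build them only alongside it.
  add_subdirectory(src/linear_programming/utilities/serializers)
endif()
```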
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@cpp/CMakeLists.txt`:
- Around line 266-269: The cuopt library is currently built with ${PROTO_SRCS}
always appended which breaks builds when Protobuf is disabled; ensure PROTO_SRCS
is defined to an empty value when the protobuf/remote-server branch is not taken
(e.g., set(PROTO_SRCS "" ) or guard with if(NOT DEFINED PROTO_SRCS)
set(PROTO_SRCS "")) so the add_library(cuopt SHARED ${CUOPT_SRC_FILES}
${PROTO_SRCS}) line works unconditionally; update the CMake logic that defines
PROTO_SRCS (the conditional that generates/sets it) to provide this default.
- Around line 236-255: Wrap the Protobuf dependency and proto generation in an
if(BUILD_REMOTE_SERVER) ... endif() guard so find_package(Protobuf REQUIRED),
the add_custom_command that produces PROTO_SRCS/PROTO_HDRS, and setting
PROTO_FILE/PROTO_SRCS/PROTO_HDRS only run when BUILD_REMOTE_SERVER is ON; remove
the unconditional include_directories(SYSTEM ${Protobuf_INCLUDE_DIRS}) and
include_directories(${CMAKE_CURRENT_BINARY_DIR}) and instead call
target_include_directories(<the target that consumes PROTO_SRCS/PROTO_HDRS>
PRIVATE ${CMAKE_CURRENT_BINARY_DIR} SYSTEM ${Protobuf_INCLUDE_DIRS}) so only the
target that uses the generated headers gets the includes.
♻️ Duplicate comments (1)
cpp/CMakeLists.txt (1)
623-626: Move the serializers subdirectory inside the `BUILD_REMOTE_SERVER` conditional.

This was flagged in a previous review. The `add_subdirectory(src/linear_programming/utilities/serializers)` call builds the msgpack serializer plugin, which is only used by the remote server.

♻️ Suggested fix

```diff
-# ##################################################################################################
-# - Pluggable Serializers --------------------------------------------------------------------------
-# Build optional serializer plugins (e.g., msgpack)
-add_subdirectory(src/linear_programming/utilities/serializers)
```

Move this inside the `if(BUILD_REMOTE_SERVER AND NOT BUILD_LP_ONLY)` block above (before line 621).
🧹 Nitpick comments (2)
cpp/CMakeLists.txt (2)
376-379: Protobuf linkage should be conditional.

The `protobuf::libprotobuf` is linked unconditionally. If the Protobuf dependency is made conditional on `BUILD_REMOTE_SERVER`, this linkage should also be guarded:

♻️ Suggested refactor

```diff
   PRIVATE
     ${CUOPT_PRIVATE_CUDA_LIBS}
     cuopt::mps_parser_static # Static link - symbols embedded in libcuopt.so
-    protobuf::libprotobuf
+    $<$<BOOL:${BUILD_REMOTE_SERVER}>:protobuf::libprotobuf>
 )
```
559-621: Remote server build section is well-structured.

The `BUILD_REMOTE_SERVER` guard and RUNPATH configuration are appropriate. Minor suggestions:

- Redundant include directories (lines 567-576): For executables, `PUBLIC` include directories are not meaningful since nothing links against them. The same paths appear in both `PRIVATE` and `PUBLIC`.
- Executable linkage (lines 577-583, 611-614): For executables, `PUBLIC` vs `PRIVATE` linkage is semantically equivalent. Consider using only `PRIVATE` for clarity.

♻️ Optional cleanup

```diff
 target_include_directories(cuopt_remote_server
   PRIVATE
     "${CMAKE_CURRENT_SOURCE_DIR}/src"
     "${CMAKE_CURRENT_SOURCE_DIR}/include"
     "${CMAKE_CURRENT_SOURCE_DIR}/libmps_parser/include"
     "${CMAKE_CURRENT_BINARY_DIR}"
-  PUBLIC
     "$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>"
     "$<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}/include>"
 )

 target_link_libraries(cuopt_remote_server
-  PUBLIC
-    cuopt
-    OpenMP::OpenMP_CXX
   PRIVATE
+    cuopt
+    OpenMP::OpenMP_CXX
     protobuf::libprotobuf
 )
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cpp/CMakeLists.txt
🧰 Additional context used
🧠 Learnings (12)
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z

All of the following learnings were applied to cpp/CMakeLists.txt:

- Applies to **/*.{cu,cuh,cpp,hpp,h}: Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions
- Applies to **/*.{cpp,hpp,h}: Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state
- Applies to **/*.{cu,cuh,cpp,hpp,h}: Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
- Applies to **/*.{cu,cuh,cpp,hpp,h}: Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
- Applies to **/*.{cu,cuh,cpp,hpp,h}: Verify error propagation from CUDA to user-facing APIs is complete; ensure CUDA errors are caught and mapped to meaningful user error codes
- Applies to **/*.{cu,cuh}: Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification
- Applies to **/*.{cu,cpp,hpp,h}: Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code
- Applies to **/*.{cu,cuh}: Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations
- Applies to **/*.cu: Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths
- Applies to **/*.{cu,cuh,cpp,hpp,h}: Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
- Applies to **/*.{cu,cuh,cpp,hpp,h}: Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
- Applies to **/*.{cu,cuh,cpp,hpp,h}: For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
🔇 Additional comments (1)
cpp/CMakeLists.txt (1)
142-149: LGTM: Pragmatic warning suppression for third-party headers.

The GCC-specific suppression of `-Warray-bounds` and `-Wstringop-overread` is a reasonable approach to avoid false positives from Protobuf headers while maintaining strict warnings for project code.
```cmake
# Protocol Buffers for remote solve serialization
find_package(Protobuf REQUIRED)
include_directories(SYSTEM ${Protobuf_INCLUDE_DIRS})
include_directories(${CMAKE_CURRENT_BINARY_DIR})

# Generate C++ code from .proto file
set(PROTO_FILE "${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities/cuopt_remote.proto")
set(PROTO_SRCS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote.pb.cc")
set(PROTO_HDRS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote.pb.h")

add_custom_command(
  OUTPUT "${PROTO_SRCS}" "${PROTO_HDRS}"
  COMMAND ${Protobuf_PROTOC_EXECUTABLE}
  ARGS --cpp_out ${CMAKE_CURRENT_BINARY_DIR}
       --proto_path ${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities
       ${PROTO_FILE}
  DEPENDS ${PROTO_FILE}
  COMMENT "Generating C++ code from cuopt_remote.proto"
  VERBATIM
)
```
Protobuf is unconditionally required even when remote server is disabled.
The find_package(Protobuf REQUIRED) and proto generation occur unconditionally, adding a mandatory build dependency for all users even when BUILD_REMOTE_SERVER=OFF. This increases build complexity and external dependencies for users who only need local solves.
Consider making Protobuf conditional on BUILD_REMOTE_SERVER:
♻️ Suggested refactor
```diff
-# Protocol Buffers for remote solve serialization
-find_package(Protobuf REQUIRED)
-include_directories(SYSTEM ${Protobuf_INCLUDE_DIRS})
-include_directories(${CMAKE_CURRENT_BINARY_DIR})
-
-# Generate C++ code from .proto file
-set(PROTO_FILE "${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities/cuopt_remote.proto")
-set(PROTO_SRCS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote.pb.cc")
-set(PROTO_HDRS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote.pb.h")
-
-add_custom_command(
-  OUTPUT "${PROTO_SRCS}" "${PROTO_HDRS}"
-  COMMAND ${Protobuf_PROTOC_EXECUTABLE}
-  ARGS --cpp_out ${CMAKE_CURRENT_BINARY_DIR}
-       --proto_path ${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities
-       ${PROTO_FILE}
-  DEPENDS ${PROTO_FILE}
-  COMMENT "Generating C++ code from cuopt_remote.proto"
-  VERBATIM
-)
+# Protocol Buffers for remote solve serialization (optional)
+set(PROTO_SRCS "")
+if(BUILD_REMOTE_SERVER)
+  find_package(Protobuf REQUIRED)
+
+  set(PROTO_FILE "${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities/cuopt_remote.proto")
+  set(PROTO_SRCS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote.pb.cc")
+  set(PROTO_HDRS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote.pb.h")
+
+  add_custom_command(
+    OUTPUT "${PROTO_SRCS}" "${PROTO_HDRS}"
+    COMMAND ${Protobuf_PROTOC_EXECUTABLE}
+    ARGS --cpp_out ${CMAKE_CURRENT_BINARY_DIR}
+         --proto_path ${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities
+         ${PROTO_FILE}
+    DEPENDS ${PROTO_FILE}
+    COMMENT "Generating C++ code from cuopt_remote.proto"
+    VERBATIM
+  )
+endif()
```
+endif()Additionally, prefer target_include_directories() over global include_directories() for better encapsulation.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cmake
# Protocol Buffers for remote solve serialization (optional)
set(PROTO_SRCS "")
if(BUILD_REMOTE_SERVER)
  find_package(Protobuf REQUIRED)

  set(PROTO_FILE "${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities/cuopt_remote.proto")
  set(PROTO_SRCS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote.pb.cc")
  set(PROTO_HDRS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote.pb.h")

  add_custom_command(
    OUTPUT "${PROTO_SRCS}" "${PROTO_HDRS}"
    COMMAND ${Protobuf_PROTOC_EXECUTABLE}
    ARGS --cpp_out ${CMAKE_CURRENT_BINARY_DIR}
         --proto_path ${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities
         ${PROTO_FILE}
    DEPENDS ${PROTO_FILE}
    COMMENT "Generating C++ code from cuopt_remote.proto"
    VERBATIM
  )
endif()
```
🤖 Prompt for AI Agents
In `@cpp/CMakeLists.txt` around lines 236 - 255, Wrap the Protobuf dependency and
proto generation in an if(BUILD_REMOTE_SERVER) ... endif() guard so
find_package(Protobuf REQUIRED), the add_custom_command that produces
PROTO_SRCS/PROTO_HDRS, and setting PROTO_FILE/PROTO_SRCS/PROTO_HDRS only run
when BUILD_REMOTE_SERVER is ON; remove the unconditional
include_directories(SYSTEM ${Protobuf_INCLUDE_DIRS}) and
include_directories(${CMAKE_CURRENT_BINARY_DIR}) and instead call
target_include_directories(<the target that consumes PROTO_SRCS/PROTO_HDRS>
PRIVATE ${CMAKE_CURRENT_BINARY_DIR} SYSTEM ${Protobuf_INCLUDE_DIRS}) so only the
target that uses the generated headers gets the includes.
```cmake
add_library(cuopt SHARED
  ${CUOPT_SRC_FILES}
  ${PROTO_SRCS}
)
```
Proto sources compiled into main library unconditionally.
The ${PROTO_SRCS} is added to the cuopt library regardless of whether remote server support is enabled. If Protobuf setup is made conditional per the previous comment, ensure PROTO_SRCS defaults to empty so this line works correctly in both cases.
🤖 Prompt for AI Agents
In `@cpp/CMakeLists.txt` around lines 266 - 269, The cuopt library is currently
built with ${PROTO_SRCS} always appended which breaks builds when Protobuf is
disabled; ensure PROTO_SRCS is defined to an empty value when the
protobuf/remote-server branch is not taken (e.g., set(PROTO_SRCS "" ) or guard
with if(NOT DEFINED PROTO_SRCS) set(PROTO_SRCS "")) so the add_library(cuopt
SHARED ${CUOPT_SRC_FILES} ${PROTO_SRCS}) line works unconditionally; update the
CMake logic that defines PROTO_SRCS (the conditional that generates/sets it) to
provide this default.
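The defaulting this prompt describes could be as small as the following sketch (where exactly it sits relative to the protobuf conditional is an assumption):

```cmake
# Default PROTO_SRCS so the unconditional add_library(cuopt SHARED ...) call
# is safe when the protobuf/remote-server branch is not taken.
if(NOT DEFINED PROTO_SRCS)
  set(PROTO_SRCS "")
endif()
```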
closing, superseded by #799
Description
This change adds a new remote execution system which will ultimately replace the cuopt Python server. In this new implementation, remote solves are identical to local solves except that the env vars CUOPT_REMOTE_HOST and CUOPT_REMOTE_PORT are set to indicate a remote server.
The remote server code is included and can be built optionally with "build.sh cuopt_remote_server". It does not build by default.
Internal code paths have been modified to use data_model_view_t with non-owning pointers to store problems and solutions as GPU memory or CPU memory, the former for local solves and the latter for remote solves.
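As a rough sketch of the intended workflow, only the two environment variable names come from this PR; the host, port, and entry-point command are placeholders:

```shell
# Point the engine at a remote server; unsetting these restores local solves.
export CUOPT_REMOTE_HOST=solver.example.com   # placeholder host
export CUOPT_REMOTE_PORT=8765                 # placeholder port

# Any existing entry point then runs unchanged, e.g.:
#   cuopt_cli my_problem.mps
echo "remote solve enabled for ${CUOPT_REMOTE_HOST}:${CUOPT_REMOTE_PORT}"
```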
Some markdown files have been added that describe the changes:
Checklist
Need to:
- evaluate the need for new tests
- review the available markdowns and make sure the feature is well-documented
Summary by CodeRabbit
New Features
Documentation
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.