
Add memory model support for host-resident problem data #819

Open
tmckayus wants to merge 9 commits into NVIDIA:release/26.02 from tmckayus:memory-model-clean

Conversation

@tmckayus
Contributor

@tmckayus tmckayus commented Jan 31, 2026

NOTE: Remote solve is stubbed in this commit and returns dummy solutions. Full gRPC implementation will be added in a follow-up PR.

This change adds problem and solution class hierarchies that allow data to reside in either GPU structures or host structures throughout. This is foundational to supporting remote execution integrated at the solver level.

The C API and Python API are unchanged.

If remote execution is enabled, the various APIs will construct the problem in host memory and CUDA initialization will be avoided. Remote solving is stubbed out in this change and will return dummy values; when a remote solve completes, solutions will be returned in host memory.

Remote execution is enabled by setting the following env vars:
CUOPT_REMOTE_HOST=somehost
CUOPT_REMOTE_PORT=1234
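
For illustration, a minimal sketch of how such an environment-based check could look (hypothetical helper and struct names, not the cuOpt API; the port-parsing behavior mirrors the snippet discussed in the review comments further down):

// Hypothetical sketch only -- not the cuOpt implementation. Reads the two
// environment variables above and falls back to local GPU solving when they
// are absent or invalid.
#include <cstdlib>
#include <optional>
#include <string>

struct remote_config_sketch {  // illustrative name
  std::string host;
  int port;
};

inline std::optional<remote_config_sketch> read_remote_config_from_env()
{
  const char* host = std::getenv("CUOPT_REMOTE_HOST");
  const char* port = std::getenv("CUOPT_REMOTE_PORT");
  if (host == nullptr || port == nullptr || host[0] == '\0' || port[0] == '\0') {
    return std::nullopt;  // not configured: build the problem on the GPU and solve locally
  }
  int port_num = std::atoi(port);
  if (port_num < 1 || port_num > 65535) { return std::nullopt; }  // invalid port: local solve
  return remote_config_sketch{host, port_num};
}

The review comments on cpp/include/cuopt/linear_programming/utilities/remote_solve.hpp later in this thread show the PR's actual parsing and validation logic.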

MEMORY_MODEL_ARCHITECTURE.md

Summary by CodeRabbit

Release Notes

  • New Features

    • Added CPU-only execution support, enabling solving on systems without a GPU
    • Enabled remote execution capabilities for distributed optimization workflows
    • Introduced transparent backend selection to switch between GPU and CPU memory backends
  • Tests

    • Added comprehensive CPU-only execution test coverage
    • Extended solution interface validation tests

@tmckayus tmckayus requested review from a team as code owners January 31, 2026 02:03
@tmckayus
Contributor Author

this is a cleaner version of #807

@tmckayus tmckayus added the feature request (New feature or request), non-breaking (Introduces a non-breaking change), and P0 labels on Jan 31, 2026
@coderabbitai

coderabbitai bot commented Jan 31, 2026

📝 Walkthrough

This pull request introduces a dual-memory backend architecture for cuOpt's linear programming solver, enabling execution on both GPU and CPU. The refactoring adds interface-based abstractions for problems and solutions, backend selection logic, and corresponding CPU implementations alongside the existing GPU backends, with updates to the C/C++ APIs, Python bindings, and extensive test coverage.

Changes

Cohort / File(s) Summary
Core Interface Definitions
cpp/include/cuopt/linear_programming/optimization_problem_interface.hpp, cpp/include/cuopt/linear_programming/optimization_problem_solution_interface.hpp
Introduces polymorphic interface base classes optimization_problem_interface_t, lp_solution_interface_t, and mip_solution_interface_t to abstract GPU/CPU implementations with virtual methods for setters/getters, I/O, and conversions (a minimal, hypothetical sketch of this pattern appears just after this table).
Problem and Solution Types
cpp/include/cuopt/linear_programming/cpu_optimization_problem_solution.hpp, cpp/include/cuopt/linear_programming/gpu_optimization_problem_solution.hpp, cpp/include/cuopt/linear_programming/cpu_pdlp_warm_start_data.hpp, cpp/include/cuopt/linear_programming/pdlp/pdlp_warm_start_data.hpp
Defines concrete CPU/GPU solution wrappers (cpu_lp_solution_t, gpu_lp_solution_t, cpu_mip_solution_t, gpu_mip_solution_t) and CPU-side PDLP warm-start data structures with host vector storage and conversion utilities.
Backend Implementations
cpp/src/linear_programming/cpu_optimization_problem.cpp, cpp/src/linear_programming/gpu_optimization_problem.cu, cpp/src/linear_programming/solution_conversion.cu, cpp/src/linear_programming/cpu_pdlp_warm_start_data.cu
Implements CPU and GPU problem containers implementing optimization_problem_interface_t, including data setters/getters, I/O, equivalence checks, and solution-to-return-type conversions with zero-copy transfers.
Backend Selection and Utilities
cpp/src/linear_programming/backend_selection.cpp, cpp/include/cuopt/linear_programming/optimization_problem_utils.hpp, cpp/include/cuopt/linear_programming/solve_remote.hpp, cpp/src/linear_programming/solve_remote.cu
Adds environment-based backend detection, problem population from MPS/data-model views, and remote execution stubs for LP/MIP solving with GPU-to-CPU conversions.
C API Refactoring
cpp/src/linear_programming/cuopt_c.cpp, cpp/src/linear_programming/cuopt_c_internal.hpp
Refactors C API to use interface-backed problem/solution containers (problem_and_stream_view_t, solution_and_stream_view_t) with memory-backend-aware initialization and interface-based method dispatch (solve_lp/solve_mip routing, getter delegation).
Solver Entry Points
cpp/src/linear_programming/solve.cu, cpp/src/mip/solve.cu
Adds interface-based solver overloads with remote execution routing and CPU-to-GPU conversion paths for test-mode CPU solving, alongside existing GPU-only implementations.
Cython/Python Bindings
python/cuopt/cuopt/linear_programming/solver/solver.pxd, python/cuopt/cuopt/linear_programming/solver/solver_wrapper.pyx, python/cuopt/cuopt/__init__.py, python/cuopt/cuopt/utilities/utils.py
Updates Cython type declarations to use std::variant for GPU/CPU solution returns, refactors solution creation to conditionally extract data from device or host buffers, and implements lazy module loading in Python.
Settings and Infrastructure
cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp, cpp/src/linear_programming/solver_settings.cu, cpp/src/linear_programming/pdlp.cu, cpp/include/cuopt/linear_programming/optimization_problem.hpp, cpp/src/linear_programming/optimization_problem.cu
Adds CPU warm-start data accessors to solver settings, implements move-based zero-copy setters for device vectors in optimization_problem_t, and updates warm-start populated checks to use is_populated() predicate.
Build and Test Infrastructure
cpp/src/linear_programming/CMakeLists.txt, cpp/tests/linear_programming/CMakeLists.txt, ci/run_ctests.sh, cpp/tests/linear_programming/c_api_tests/c_api_tests.cpp, cpp/tests/linear_programming/c_api_tests/c_api_tests.h, cpp/tests/linear_programming/c_api_tests/c_api_test.c, cpp/tests/linear_programming/unit_tests/solution_interface_test.cu, python/cuopt/cuopt/tests/linear_programming/test_lp_solver.py, python/cuopt/cuopt/tests/linear_programming/test_cpu_only_execution.py
Adds new source files to build, expands C API tests with solution-polymorphism and CPU-only execution scenarios, introduces CUDA unit tests for interface validation, and adds Python pytest suite for CPU-only execution paths with environment fixture management.
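
To make the "Core Interface Definitions" and "Backend Selection and Utilities" cohorts concrete, here is a minimal, hypothetical sketch of the pattern they describe; the class and method names below are simplified stand-ins, not the actual cuOpt headers:

// Illustrative sketch of a polymorphic problem interface with a host-resident
// (CPU) backend and environment-driven backend selection. Names are simplified.
#include <memory>
#include <vector>

template <typename i_t, typename f_t>
class problem_interface {  // stand-in for optimization_problem_interface_t
 public:
  virtual ~problem_interface()                                         = default;
  virtual void set_objective_coefficients(const f_t* values, i_t size) = 0;
  virtual bool is_device_memory() const                                = 0;
};

template <typename i_t, typename f_t>
class cpu_problem : public problem_interface<i_t, f_t> {  // host-resident data
 public:
  void set_objective_coefficients(const f_t* values, i_t size) override
  {
    coefficients_.assign(values, values + size);  // plain host vector storage
  }
  bool is_device_memory() const override { return false; }

 private:
  std::vector<f_t> coefficients_;
};

// Backend selection: when remote execution is configured, build the problem in
// host memory so no CUDA initialization is needed (GPU variant omitted here).
template <typename i_t, typename f_t>
std::unique_ptr<problem_interface<i_t, f_t>> make_problem(bool remote_enabled)
{
  if (remote_enabled) { return std::make_unique<cpu_problem<i_t, f_t>>(); }
  return nullptr;  // a gpu_problem<i_t, f_t> would be constructed here instead
}

The PR's real interfaces carry many more virtual setters/getters plus I/O and conversion methods, but the selection principle is the same: callers program against the interface and the backend is chosen at construction time.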

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Docstring Coverage — ⚠️ Warning: docstring coverage is 19.24%, which is below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.
✅ Passed checks (2 passed)
Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
Title Check — ✅ Passed: the title accurately reflects the main change, introducing memory model support for host-resident problem data, and directly describes the primary architectural addition without being vague or overly broad.


@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
cpp/include/cuopt/linear_programming/utilities/callbacks_implems.hpp (2)

41-51: ⚠️ Potential issue | 🟠 Major

Guard NULL PyObject returns before DECREF or use.

PyObject_CallMethod can return nullptr; the new Py_DECREF(res) will crash on failure, and numba_matrix / numpy_array can also be nullptr. Add NULL checks and use Py_XDECREF to avoid segfaults and ensure error propagation.

🔧 Proposed fix (safe NULL handling)
   void get_solution(void* data, void* objective_value) override
   {
-    PyObject* numba_matrix =
-      data_on_device() ? get_numba_matrix(data, n_variables) : get_numpy_array(data, n_variables);
-    PyObject* numpy_array =
-      data_on_device() ? get_numba_matrix(objective_value, 1) : get_numpy_array(objective_value, 1);
+    const bool on_device = data_on_device();
+    PyObject* numba_matrix =
+      on_device ? get_numba_matrix(data, n_variables) : get_numpy_array(data, n_variables);
+    PyObject* numpy_array =
+      on_device ? get_numba_matrix(objective_value, 1) : get_numpy_array(objective_value, 1);
+    if (!numba_matrix || !numpy_array) {
+      Py_XDECREF(numba_matrix);
+      Py_XDECREF(numpy_array);
+      PyErr_Print();
+      return;
+    }
     PyObject* res =
       PyObject_CallMethod(this->pyCallbackClass, "get_solution", "(OO)", numba_matrix, numpy_array);
-    Py_DECREF(numba_matrix);
-    Py_DECREF(numpy_array);
-    Py_DECREF(res);
+    if (!res) { PyErr_Print(); }
+    Py_XDECREF(res);
+    Py_XDECREF(numba_matrix);
+    Py_XDECREF(numpy_array);
   }

80-90: ⚠️ Potential issue | 🟠 Major

Add NULL checks for set_solution callback objects.

Same failure mode as get_solution: PyObject_CallMethod can return nullptr, and Py_DECREF on it will crash. Guard and use Py_XDECREF, plus early return if inputs fail to construct.

🔧 Proposed fix (safe NULL handling)
   void set_solution(void* data, void* objective_value) override
   {
-    PyObject* numba_matrix =
-      data_on_device() ? get_numba_matrix(data, n_variables) : get_numpy_array(data, n_variables);
-    PyObject* numpy_array =
-      data_on_device() ? get_numba_matrix(objective_value, 1) : get_numpy_array(objective_value, 1);
+    const bool on_device = data_on_device();
+    PyObject* numba_matrix =
+      on_device ? get_numba_matrix(data, n_variables) : get_numpy_array(data, n_variables);
+    PyObject* numpy_array =
+      on_device ? get_numba_matrix(objective_value, 1) : get_numpy_array(objective_value, 1);
+    if (!numba_matrix || !numpy_array) {
+      Py_XDECREF(numba_matrix);
+      Py_XDECREF(numpy_array);
+      PyErr_Print();
+      return;
+    }
     PyObject* res =
       PyObject_CallMethod(this->pyCallbackClass, "set_solution", "(OO)", numba_matrix, numpy_array);
-    Py_DECREF(numba_matrix);
-    Py_DECREF(numpy_array);
-    Py_DECREF(res);
+    if (!res) { PyErr_Print(); }
+    Py_XDECREF(res);
+    Py_XDECREF(numba_matrix);
+    Py_XDECREF(numpy_array);
   }
cpp/src/linear_programming/cuopt_c.cpp (5)

50-73: ⚠️ Potential issue | 🟠 Major

Guard view creation to keep the C API exception-safe.
create_view_from_mps_data_model() is outside the try/catch; if it throws, the C boundary is crossed and problem_and_stream leaks. Wrap view creation and use RAII to ensure cleanup.

🛠️ Suggested fix (exception-safe view creation)
-  problem_and_stream_view_t* problem_and_stream = new problem_and_stream_view_t();
+  auto problem_and_stream = std::make_unique<problem_and_stream_view_t>();
   std::string filename_str(filename);
   bool input_mps_strict = false;
   std::unique_ptr<mps_data_t> mps_data_model_ptr;
   try {
     mps_data_model_ptr = std::make_unique<mps_data_t>(
       parse_mps<cuopt_int_t, cuopt_float_t>(filename_str, input_mps_strict));
   } catch (const std::exception& e) {
     CUOPT_LOG_INFO("Error parsing MPS file: %s", e.what());
-    delete problem_and_stream;
     *problem_ptr = nullptr;
     if (std::string(e.what()).find("Error opening MPS file") != std::string::npos) {
       return CUOPT_MPS_FILE_ERROR;
     } else {
       return CUOPT_MPS_PARSE_ERROR;
     }
   }
-  problem_and_stream->op_problem = mps_data_model_ptr.release();
-  problem_and_stream->view       = create_view_from_mps_data_model(*problem_and_stream->op_problem);
-  problem_and_stream->view.set_is_device_memory(false);
-
-  *problem_ptr = static_cast<cuOptOptimizationProblem>(problem_and_stream);
+  try {
+    problem_and_stream->view = create_view_from_mps_data_model(*mps_data_model_ptr);
+    problem_and_stream->view.set_is_device_memory(false);
+    problem_and_stream->op_problem = mps_data_model_ptr.release();
+  } catch (const std::exception& e) {
+    CUOPT_LOG_INFO("Error creating view from MPS data: %s", e.what());
+    *problem_ptr = nullptr;
+    return CUOPT_INVALID_ARGUMENT;
+  }
+
+  *problem_ptr = static_cast<cuOptOptimizationProblem>(problem_and_stream.release());

737-744: ⚠️ Potential issue | 🟡 Minor

Add the same op_problem null check as other getters.
This keeps cuOptIsMIP aligned with the other accessors and avoids dereferencing invalid handles.

🔧 Suggested fix
   problem_and_stream_view_t* problem_and_stream_view =
     static_cast<problem_and_stream_view_t*>(problem);
+  if (!problem_and_stream_view->op_problem) { return CUOPT_INVALID_ARGUMENT; }
   *is_mip_ptr = static_cast<cuopt_int_t>(problem_and_stream_view->is_mip());

747-798: ⚠️ Potential issue | 🟠 Major

Make cuOptSolve exception-safe to avoid leaks on solve failure.
If solve_mip/solve_lp throws, the allocated solution wrapper leaks and the exception crosses the C boundary.

🛠️ Suggested fix (RAII + try/catch)
-    solution_and_stream_view_t* solution_and_stream_view =
-      new solution_and_stream_view_t(true, handle);
-    solution_and_stream_view->mip_solution_ptr = new mip_solution_t<cuopt_int_t, cuopt_float_t>(
-      solve_mip<cuopt_int_t, cuopt_float_t>(handle, view, mip_settings));
-    *solution_ptr = static_cast<cuOptSolution>(solution_and_stream_view);
+    auto solution_and_stream_view =
+      std::make_unique<solution_and_stream_view_t>(true, handle);
+    try {
+      solution_and_stream_view->mip_solution_ptr =
+        new mip_solution_t<cuopt_int_t, cuopt_float_t>(
+          solve_mip<cuopt_int_t, cuopt_float_t>(handle, view, mip_settings));
+    } catch (const std::exception& e) {
+      CUOPT_LOG_INFO("MIP solve failed: %s", e.what());
+      return CUOPT_INVALID_ARGUMENT;
+    }
+    *solution_ptr = static_cast<cuOptSolution>(solution_and_stream_view.release());
...
-    solution_and_stream_view_t* solution_and_stream_view =
-      new solution_and_stream_view_t(false, handle);
-    solution_and_stream_view->lp_solution_ptr =
-      new optimization_problem_solution_t<cuopt_int_t, cuopt_float_t>(
-        solve_lp<cuopt_int_t, cuopt_float_t>(handle, view, pdlp_settings));
-    *solution_ptr = static_cast<cuOptSolution>(solution_and_stream_view);
+    auto solution_and_stream_view =
+      std::make_unique<solution_and_stream_view_t>(false, handle);
+    try {
+      solution_and_stream_view->lp_solution_ptr =
+        new optimization_problem_solution_t<cuopt_int_t, cuopt_float_t>(
+          solve_lp<cuopt_int_t, cuopt_float_t>(handle, view, pdlp_settings));
+    } catch (const std::exception& e) {
+      CUOPT_LOG_INFO("LP solve failed: %s", e.what());
+      return CUOPT_INVALID_ARGUMENT;
+    }
+    *solution_ptr = static_cast<cuOptSolution>(solution_and_stream_view.release());

878-903: ⚠️ Potential issue | 🟠 Major

Handle host-resident solutions in cuOptGetPrimalSolution.
Remote solves can return host data; the current code always reads device vectors and will throw for host-only solutions.

🛠️ Suggested fix
   if (solution_and_stream_view->is_mip) {
     mip_solution_t<cuopt_int_t, cuopt_float_t>* mip_solution =
       static_cast<mip_solution_t<cuopt_int_t, cuopt_float_t>*>(
         solution_and_stream_view->mip_solution_ptr);
-    const rmm::device_uvector<cuopt_float_t>& solution_values = mip_solution->get_solution();
-    rmm::cuda_stream_view stream =
-      solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
-    raft::copy(solution_values_ptr, solution_values.data(), solution_values.size(), stream);
-    stream.synchronize();
+    if (mip_solution->is_device_memory()) {
+      const auto& solution_values = mip_solution->get_solution();
+      rmm::cuda_stream_view stream =
+        solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
+      raft::copy(solution_values_ptr, solution_values.data(), solution_values.size(), stream);
+      stream.synchronize();
+    } else {
+      const auto& solution_values = mip_solution->get_solution_host();
+      std::copy(solution_values.begin(), solution_values.end(), solution_values_ptr);
+    }
   } else {
     optimization_problem_solution_t<cuopt_int_t, cuopt_float_t>* optimization_problem_solution =
       static_cast<optimization_problem_solution_t<cuopt_int_t, cuopt_float_t>*>(
         solution_and_stream_view->lp_solution_ptr);
-    const rmm::device_uvector<cuopt_float_t>& solution_values =
-      optimization_problem_solution->get_primal_solution();
-    rmm::cuda_stream_view stream =
-      solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
-    raft::copy(solution_values_ptr, solution_values.data(), solution_values.size(), stream);
-    stream.synchronize();
+    if (optimization_problem_solution->is_device_memory()) {
+      const auto& solution_values = optimization_problem_solution->get_primal_solution();
+      rmm::cuda_stream_view stream =
+        solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
+      raft::copy(solution_values_ptr, solution_values.data(), solution_values.size(), stream);
+      stream.synchronize();
+    } else {
+      const auto& solution_values = optimization_problem_solution->get_primal_solution_host();
+      std::copy(solution_values.begin(), solution_values.end(), solution_values_ptr);
+    }
   }

981-999: ⚠️ Potential issue | 🟠 Major

Add host-path copies for dual solution and reduced costs.
These getters still assume device memory and will fail for remote/host solutions.

🛠️ Suggested fix
-    const rmm::device_uvector<cuopt_float_t>& dual_solution =
-      optimization_problem_solution->get_dual_solution();
-    rmm::cuda_stream_view stream =
-      solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
-    raft::copy(dual_solution_ptr, dual_solution.data(), dual_solution.size(), stream);
-    stream.synchronize();
+    if (optimization_problem_solution->is_device_memory()) {
+      const auto& dual_solution = optimization_problem_solution->get_dual_solution();
+      rmm::cuda_stream_view stream =
+        solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
+      raft::copy(dual_solution_ptr, dual_solution.data(), dual_solution.size(), stream);
+      stream.synchronize();
+    } else {
+      const auto& dual_solution = optimization_problem_solution->get_dual_solution_host();
+      std::copy(dual_solution.begin(), dual_solution.end(), dual_solution_ptr);
+    }
...
-    const rmm::device_uvector<cuopt_float_t>& reduced_cost =
-      optimization_problem_solution->get_reduced_cost();
-    rmm::cuda_stream_view stream =
-      solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
-    raft::copy(reduced_cost_ptr, reduced_cost.data(), reduced_cost.size(), stream);
-    stream.synchronize();
+    if (optimization_problem_solution->is_device_memory()) {
+      const auto& reduced_cost = optimization_problem_solution->get_reduced_cost();
+      rmm::cuda_stream_view stream =
+        solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
+      raft::copy(reduced_cost_ptr, reduced_cost.data(), reduced_cost.size(), stream);
+      stream.synchronize();
+    } else {
+      const auto& reduced_cost = optimization_problem_solution->get_reduced_cost_host();
+      std::copy(reduced_cost.begin(), reduced_cost.end(), reduced_cost_ptr);
+    }

Also applies to: 1021-1039

cpp/src/linear_programming/solver_solution.cu (1)

191-265: ⚠️ Potential issue | 🟠 Major

Release device buffers when copy_from switches to host data.
When other.is_device_memory_ is false, the existing device buffers remain allocated, which can retain large GPU memory unnecessarily.

🛠️ Suggested fix
   } else {
     // Copy CPU data
+    // Release device buffers when switching to host-backed data
+    primal_solution_.reset();
+    dual_solution_.reset();
+    reduced_cost_.reset();
     if (!primal_solution_host_) { primal_solution_host_ = std::make_unique<std::vector<f_t>>(); }
     if (!dual_solution_host_) { dual_solution_host_ = std::make_unique<std::vector<f_t>>(); }
     if (!reduced_cost_host_) { reduced_cost_host_ = std::make_unique<std::vector<f_t>>(); }
As per coding guidelines: Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events.
🤖 Fix all issues with AI agents
In `@cpp/libmps_parser/src/data_model_view.cpp`:
- Around line 351-392: The code in data_model_view_t::get_problem_category()
(and other span-returning getters using variable_types_.data() / spans)
dereferences host spans without checking the is_device_memory_ flag; either
guard these accesses by checking is_device_memory_ and performing a device->host
copy into a host-local span before reading, or remove the unused flag and
document that spans must be host-resident; specifically update
data_model_view_t::get_problem_category(), all span-returning getters, and any
callers that assume host memory (e.g., remove reliance on
set_is_device_memory(false) or add an explicit device-to-host copy helper
invoked when is_device_memory_ is true) so accesses to variable_types_,
column_types_, etc. never read device memory directly.

In `@cpp/src/linear_programming/solve.cu`:
- Around line 1299-1306: In solve_lp, remove the unreachable duplicate
remote_config.has_value() branch (the block that logs CUOPT_REMOTE_HOST/PORT and
calls solve_lp_remote) since remote_config was already checked earlier and
returned; specifically delete the second if (remote_config.has_value()) {
CUOPT_LOG_INFO(...); return solve_lp_remote(*remote_config, view, settings); }
block so the function flow no longer contains dead code referencing
remote_config, solve_lp_remote, view, and settings.

In `@cpp/src/mip/solve.cu`:
- Around line 338-342: The null handle_ptr check must be moved so it executes
before either data path; ensure you validate handle_ptr (the pointer used by
data_model_view_to_optimization_problem) before calling
data_model_view_to_optimization_problem so both the device and CPU branches are
protected; specifically, place the existing handle_ptr == nullptr check above
the device memory branch (i.e., before any calls into
data_model_view_to_optimization_problem/optimization_problem_t constructor),
keep the same error logging and return of mip_solution_t with
cuopt::logic_error, and avoid duplicating checks—one early guard covers both
paths.

In `@cpp/tests/linear_programming/c_api_tests/c_api_tests.cpp`:
- Around line 177-184: Replace the exact equality check on the floating-point
objective offset with an epsilon comparison: instead of comparing
orig->get_objective_offset() == rerd->get_objective_offset(), compute the
absolute difference and compare it to a small tolerance (e.g.,
std::numeric_limits<double>::epsilon() * scale or a fixed small value) so the
result uses fabs(orig->get_objective_offset() - rerd->get_objective_offset()) <=
tol; keep the integer checks for orig->get_n_constraints(),
orig->get_n_variables(), orig->get_nnz() unchanged and use these symbol names to
locate the change.

In `@python/cuopt/cuopt/__init__.py`:
- Around line 29-33: The __dir__() function currently returns __all__ +
_submodules which duplicates submodule names because __all__ already unpacks
_submodules; change __dir__ to return a deduplicated list (preferably simply
return __all__) or otherwise return an order-preserving deduplicated sequence
(e.g., using dict.fromkeys) so names from __all__ and _submodules aren’t
repeated; update the __dir__ definition accordingly (referencing __dir__,
__all__, and _submodules).

In `@python/cuopt/cuopt/tests/linear_programming/test_memory_model.py`:
- Around line 167-168: The test sets variable types using an integer array which
prevents is_mip() from recognizing MIP variables; update the test to provide
string-encoded type codes (e.g., use a one-byte string array) when calling
data_model_obj.set_variable_types so the solver's is_mip() (solver.py) will
detect the "I" code; replace the np.array([1]) usage with a string bytes array
like np.array(["I"], dtype="S1") (or equivalent single-byte string encoding) so
the test exercises the MIP path.
🧹 Nitpick comments (14)
cpp/tests/linear_programming/unit_tests/lp_solution_memory_test.cu (2)

22-39: Clarify test naming and incomplete guard implementation.

The test name host_only_accessors_need_coderabbit_patch and the commented-out EXPECT_THROW assertions suggest that device-accessor guards for host-only solutions are not yet implemented. The test comments reference a "coderabbit_changes.patch" which is an unusual convention.

Consider:

  1. Renaming tests to reflect current behavior (e.g., host_solution_basic_accessors)
  2. Creating a tracking issue for implementing the accessor guards
  3. Using // TODO(issue#123): Enable when guards are implemented instead of referencing a patch file

70-104: Consider adding stream synchronization before accessor validation.

The test creates device vectors without explicit initialization values. While this is acceptable for testing accessor availability, you may want to add handle.sync_stream() before calling EXPECT_NO_THROW on device accessors to ensure any pending operations are complete.

                                                         &handle,
                                                         true);

+  handle.sync_stream();
   EXPECT_TRUE(solution.is_device_memory());
python/cuopt/cuopt/tests/linear_programming/test_memory_model.py (1)

187-250: Add infeasible/unbounded + free‑variable cases to memory‑model tests.
To align the new memory‑model suite with test requirements, consider adding a minimal infeasible LP, an unbounded LP, and a free‑variable case (host‑memory path).

As per coding guidelines: “Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems)” and “Test with free variables, singleton problems, and extreme problem dimensions near resource limits to validate edge case handling.”

cpp/tests/mip/mip_solution_memory_test.cu (1)

25-47: Remove patch-specific naming and enable guard behavior validation in tests.

The guards are implemented: EXE_CUOPT_EXPECTS checks on get_solution() and get_solution_host() verify pointer validity. The test names ("need_coderabbit_patch") and commented-out EXPECT_THROW assertions are misleading and should be removed or the assertions enabled. Either rename the tests to reflect their current behavior or uncomment the EXPECT_THROW for get_solution() on host-only solutions to validate that guards properly throw logic_error.

Suggested fix
-TEST(mip_solution_memory, host_only_accessors_need_coderabbit_patch)
+TEST(mip_solution_memory, host_only_accessors)
 {
-  // This test validates that EXE_CUOPT_EXPECTS guards are in place
-  // Guards are added in coderabbit_changes.patch
+  // Verify host-only solution behavior
   std::vector<double> solution{0.0};
   std::vector<std::string> var_names{"x0"};
   solver_stats_t<int, double> stats{};

   mip_solution_t<int, double> mip_solution(std::move(solution),
                                            std::move(var_names),
                                            0.0,
                                            0.0,
                                            mip_termination_status_t::Optimal,
                                            0.0,
                                            0.0,
                                            0.0,
                                            stats);

   EXPECT_FALSE(mip_solution.is_device_memory());
-  // After applying CodeRabbit patch, this should throw
-  // EXPECT_THROW(mip_solution.get_solution(), cuopt::logic_error);
+  EXPECT_THROW(mip_solution.get_solution(), cuopt::logic_error);
   EXPECT_NO_THROW(mip_solution.get_solution_host());
 }
cpp/tests/linear_programming/unit_tests/memory_model_infrastructure_test.cu (1)

287-309: Test uses host pointers with is_device_memory=true flag — intentional but potentially confusing.

The comment on line 292 notes this is "simulated with host pointers for test", but passing host pointers while marking is_device_memory(true) could cause undefined behavior if the conversion function actually attempts GPU memory operations. This test only validates that EXPECT_NO_THROW succeeds, but doesn't verify the resulting optimization problem's correctness.

Consider adding a comment clarifying that this test relies on the conversion function not dereferencing the pointers during view-to-optimization_problem conversion, or using actual device memory allocations for a more realistic test.

cpp/libmps_parser/CMakeLists.txt (2)

98-100: Static library missing compile options.

The shared library mps_parser has target_compile_options applied (line 98-100), but mps_parser_static does not receive the same treatment. This could lead to inconsistent warning/error behavior between the two builds.

♻️ Proposed fix to add compile options to static library
 target_compile_options(mps_parser
   PRIVATE "$<$<COMPILE_LANGUAGE:CXX>:${MPS_PARSER_CXX_FLAGS}>"
 )
+
+target_compile_options(mps_parser_static
+  PRIVATE "$<$<COMPILE_LANGUAGE:CXX>:${MPS_PARSER_CXX_FLAGS}>"
+)

Also applies to: 128-135


137-143: Static library missing BZip2/ZLIB include directories.

If MPS_PARSER_WITH_BZIP2 or MPS_PARSER_WITH_ZLIB are enabled, only the shared library receives the corresponding include directories. The static library should receive them as well.

♻️ Proposed fix to add compression library includes to static target
 if(MPS_PARSER_WITH_BZIP2)
     target_include_directories(mps_parser PRIVATE BZip2::BZip2)
+    target_include_directories(mps_parser_static PRIVATE BZip2::BZip2)
 endif(MPS_PARSER_WITH_BZIP2)

 if(MPS_PARSER_WITH_ZLIB)
     target_include_directories(mps_parser PRIVATE ZLIB::ZLIB)
+    target_include_directories(mps_parser_static PRIVATE ZLIB::ZLIB)
 endif(MPS_PARSER_WITH_ZLIB)
cpp/src/mip/solve.cu (1)

331-345: Consider consolidating duplicate conversion calls.

Lines 333 and 343 both call data_model_view_to_optimization_problem(handle_ptr, view). The only difference is the null-check guard for CPU data. Consider restructuring to avoid duplication:

♻️ Proposed refactor to consolidate conversion logic
-  if (view.is_device_memory()) {
-    // Local solve: data already on GPU - convert view to optimization_problem_t and solve
-    auto op_problem = data_model_view_to_optimization_problem(handle_ptr, view);
-    return solve_mip(op_problem, settings);
-  }
-
   // Local solve with CPU data: copy to GPU and solve
-  if (handle_ptr == nullptr) {
+  if (!view.is_device_memory() && handle_ptr == nullptr) {
     CUOPT_LOG_ERROR("[solve_mip] Local solve requested but handle_ptr is null.");
     return mip_solution_t<i_t, f_t>(
       cuopt::logic_error("No CUDA handle for CPU->GPU copy", cuopt::error_type_t::RuntimeError));
   }
+
   auto op_problem = data_model_view_to_optimization_problem(handle_ptr, view);
   return solve_mip(op_problem, settings);
 }
cpp/libmps_parser/include/mps_parser/data_model_view.hpp (1)

450-454: Document that get_problem_category() reflects the variable types at first call.

The lazy-cached problem_category_ uses std::call_once for thread-safe initialization, reading variable_types_ to compute the category. While the documented API indicates set_variable_types() is called before solving (during initialization), explicitly documenting that get_problem_category() reflects the state at first call would clarify the intended usage pattern and prevent potential misuse where variable types are modified after category computation.
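
As a reference, this is roughly what the lazy-cache pattern described above looks like (a hypothetical, self-contained sketch; only problem_category_ and variable_types_ are names from the PR, everything else is illustrative):

// Illustrative sketch of the std::call_once lazy cache; not the actual
// data_model_view_t implementation.
#include <mutex>
#include <utility>
#include <vector>

enum class problem_category_sketch_t { LP, MIP };

class data_model_view_sketch {
 public:
  problem_category_sketch_t get_problem_category() const
  {
    // Computed once, on first call, from the variable types set at that time.
    std::call_once(category_once_, [this] {
      problem_category_ = problem_category_sketch_t::LP;
      for (char t : variable_types_) {
        if (t == 'I') {  // 'I' marks integer variables (the code discussed elsewhere in this review)
          problem_category_ = problem_category_sketch_t::MIP;
          break;
        }
      }
    });
    return problem_category_;
  }

  void set_variable_types(std::vector<char> types) { variable_types_ = std::move(types); }

 private:
  std::vector<char> variable_types_;
  mutable std::once_flag category_once_;
  mutable problem_category_sketch_t problem_category_{problem_category_sketch_t::LP};
};

Because the cache is populated on the first call, calling set_variable_types() afterwards has no effect on the returned category, which is exactly the behavior the comment above suggests documenting.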

cpp/include/cuopt/linear_programming/mip/solver_solution.hpp (1)

111-116: Document precondition for host accessors.

The get_solution_host() methods document "Only valid when is_device_memory() returns false", but there's no runtime enforcement. Consider adding a debug assertion to catch misuse early:

cuopt_assert(!is_device_memory_, "get_solution_host() called on device-memory solution");

This would help catch programming errors during development.

cpp/src/linear_programming/utilities/cython_solve.cu (1)

208-214: Consider a more explicit warm-start presence check.

The warm-start detection relies on checking if last_restart_duality_gap_dual_solution_.data() != nullptr. While functional, a dedicated has_warm_start_data() method would be clearer and less brittle.

// Current: relies on internal pointer state
if (!is_mip && solver_settings->get_pdlp_warm_start_data_view()
                   .last_restart_duality_gap_dual_solution_.data() != nullptr) {

This is minor since the current approach works, but consider adding a helper method for clarity in future iterations.

cpp/src/linear_programming/optimization_problem_conversions.cu (3)

61-76: Clarify cudaPointerGetAttributes error handling.

The error handling for cudaPointerGetAttributes is functionally correct but could be clearer. The pattern handles:

  1. Device memory: copy D2H
  2. Host memory: direct copy
  3. Unregistered memory (returns cudaErrorInvalidValue): treated as host memory

Consider adding a brief comment explaining why cudaErrorInvalidValue is acceptable:

     } else {
       // Source is on host (or unregistered) - direct copy
       if (err != cudaSuccess) { cudaGetLastError(); }  // Clear cudaPointerGetAttributes error
+      // cudaErrorInvalidValue is expected for unregistered host memory - treat as host
       if (err != cudaSuccess && err != cudaErrorInvalidValue) { RAFT_CUDA_TRY(err); }
       std::memcpy(host_var_types.data(), var_types.data(), var_types.size() * sizeof(char));
     }

108-154: Consider extracting memory-aware copy helper.

The cudaPointerGetAttributes + conditional copy pattern is duplicated between variable types (lines 61-76) and quadratic objective (lines 114-146). Consider extracting a helper template:

template <typename T>
void copy_to_host(std::vector<T>& dest, const T* src, size_t count);

This would reduce duplication and centralize the error handling logic. However, this is a nice-to-have refactor that can be deferred.
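
A possible shape for that helper, sketched under the assumption that the RAFT_CUDA_TRY macro and the cudaPointerGetAttributes pattern from the surrounding file are available (hypothetical, not part of the PR):

// Illustrative helper: copies `count` elements from `src` (device, host, or
// unregistered host memory) into a host vector, mirroring the pattern above.
#include <cstring>
#include <vector>
#include <cuda_runtime_api.h>

template <typename T>
void copy_to_host(std::vector<T>& dest, const T* src, size_t count)
{
  dest.resize(count);
  if (count == 0) { return; }
  cudaPointerAttributes attrs{};
  cudaError_t err = cudaPointerGetAttributes(&attrs, src);
  if (err == cudaSuccess && attrs.type == cudaMemoryTypeDevice) {
    // Source is device memory: copy D2H.
    RAFT_CUDA_TRY(cudaMemcpy(dest.data(), src, count * sizeof(T), cudaMemcpyDeviceToHost));
  } else {
    // Host or unregistered memory: cudaErrorInvalidValue is expected for
    // unregistered host pointers, so clear it and fall back to memcpy.
    if (err != cudaSuccess) { cudaGetLastError(); }
    if (err != cudaSuccess && err != cudaErrorInvalidValue) { RAFT_CUDA_TRY(err); }
    std::memcpy(dest.data(), src, count * sizeof(T));
  }
}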


241-335: Good non-owning view creation with correct memory semantics.

The function correctly creates a non-owning view pointing to mps_data_model's data and sets is_device_memory(false).

Important lifetime consideration: The returned view holds raw pointers to the mps_data_model's internal storage. The view is only valid as long as the mps_data_model remains alive and unmodified. This is documented implicitly by "non-owning view" but consider adding explicit documentation:

/**
 * `@brief` Create a non-owning view from an MPS data model.
 * `@warning` The returned view holds pointers to mps_data_model's data.
 *          The view is invalid if mps_data_model is destroyed or modified.
 */

Comment on lines 338 to 342
if (handle_ptr == nullptr) {
  CUOPT_LOG_ERROR("[solve_mip] Local solve requested but handle_ptr is null.");
  return mip_solution_t<i_t, f_t>(
    cuopt::logic_error("No CUDA handle for CPU->GPU copy", cuopt::error_type_t::RuntimeError));
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, examine the context around lines 338-342 in solve.cu
cat -n cpp/src/mip/solve.cu | sed -n '330,350p'

Repository: NVIDIA/cuopt

Length of output: 1199


🏁 Script executed:

# Find the implementation of data_model_view_to_optimization_problem
find . -type f \( -name "*.cu" -o -name "*.cpp" -o -name "*.cuh" -o -name "*.hpp" \) -exec grep -l "data_model_view_to_optimization_problem" {} \;

Repository: NVIDIA/cuopt

Length of output: 390


🏁 Script executed:

# Examine the implementation of data_model_view_to_optimization_problem
cat -n cpp/src/linear_programming/optimization_problem_conversions.cu | head -150

Repository: NVIDIA/cuopt

Length of output: 8214


🏁 Script executed:

# Search for the function signature and its use of handle_ptr
rg -n "data_model_view_to_optimization_problem" cpp/src/linear_programming/optimization_problem_conversions.cu -B 5 -A 40 | head -100

Repository: NVIDIA/cuopt

Length of output: 3895


🏁 Script executed:

# Check the optimization_problem_t constructor to see if it uses handle_ptr
rg -n "optimization_problem_t.*constructor\|explicit optimization_problem_t" cpp/src/linear_programming/ -B 2 -A 15 | head -80

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Look for the constructor definition
fd -e hpp -e h cpp/src/linear_programming/ -x grep -l "optimization_problem_t"

Repository: NVIDIA/cuopt

Length of output: 469


🏁 Script executed:

# Find header file for optimization_problem_t
find cpp/include -name "*.hpp" -o -name "*.h" | xargs grep -l "class optimization_problem_t\|struct optimization_problem_t"

Repository: NVIDIA/cuopt

Length of output: 117


🏁 Script executed:

# Search for optimization_problem_t constructor in all header files
fd . cpp/include -name "*.hpp" -o -name "*.h" | xargs grep -A 20 "optimization_problem_t.*handle_ptr"

Repository: NVIDIA/cuopt

Length of output: 288


🏁 Script executed:

# Read the optimization_problem.hpp file to find constructor
cat cpp/include/cuopt/linear_programming/optimization_problem.hpp | head -200

Repository: NVIDIA/cuopt

Length of output: 7595


🏁 Script executed:

# Search for constructor definition with handle_ptr
rg -n "optimization_problem_t.*handle_ptr\|optimization_problem_t\(" cpp/include/cuopt/linear_programming/optimization_problem.hpp -A 10 | head -60

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Find the full constructor implementation
rg -n "optimization_problem_t\(.*handle" cpp/include/cuopt/linear_programming/optimization_problem.hpp -A 5

Repository: NVIDIA/cuopt

Length of output: 297


🏁 Script executed:

# Look at the implementation in the .cu file
rg -n "optimization_problem_t<i_t, f_t>::optimization_problem_t.*handle_ptr" cpp/src/linear_programming/ -A 20

Repository: NVIDIA/cuopt

Length of output: 1924


Move the null handle check before the device memory path.

Both the device (line 333) and CPU (line 343) data paths call data_model_view_to_optimization_problem, which creates an optimization_problem_t that immediately dereferences handle_ptr in its constructor (line 41 of optimization_problem.cu). The current null check at line 338 only guards the CPU path; the device path executes without validation and will crash with a null pointer dereference in the constructor. Move the null check before line 331 to protect both paths.

🤖 Prompt for AI Agents
In `@cpp/src/mip/solve.cu` around lines 338 - 342, The null handle_ptr check must
be moved so it executes before either data path; ensure you validate handle_ptr
(the pointer used by data_model_view_to_optimization_problem) before calling
data_model_view_to_optimization_problem so both the device and CPU branches are
protected; specifically, place the existing handle_ptr == nullptr check above
the device memory branch (i.e., before any calls into
data_model_view_to_optimization_problem/optimization_problem_t constructor),
keep the same error logging and return of mip_solution_t with
cuopt::logic_error, and avoid duplicating checks—one early guard covers both
paths.

Comment on lines 177 to 184
// Compare key problem attributes since mps_data_t doesn't have is_equivalent()
auto* orig = original_problem_wrapper->op_problem;
auto* rerd = reread_problem_wrapper->op_problem;

result =
original_problem_wrapper->op_problem->is_equivalent(*reread_problem_wrapper->op_problem);
(orig->get_n_constraints() == rerd->get_n_constraints() &&
orig->get_n_variables() == rerd->get_n_variables() && orig->get_nnz() == rerd->get_nnz() &&
orig->get_objective_offset() == rerd->get_objective_offset());

⚠️ Potential issue | 🟡 Minor

Use epsilon comparison for floating-point objective_offset.

Line 184 uses exact equality (==) for get_objective_offset(), which returns a floating-point value. MPS serialization/deserialization could introduce minor precision drift, causing spurious test failures.

Additionally, this comparison validates problem dimensions but not the actual data content (constraint coefficients, variable bounds, objective coefficients). Consider whether a more comprehensive check is needed for true roundtrip validation.

Proposed fix for floating-point comparison
+#include <cmath>
+
+// Helper for approximate FP comparison
+static bool approx_equal(double a, double b, double eps = 1e-12) {
+  return std::fabs(a - b) <= eps * std::max(1.0, std::max(std::fabs(a), std::fabs(b)));
+}
+
 // In test_mps_roundtrip:
     result =
       (orig->get_n_constraints() == rerd->get_n_constraints() &&
        orig->get_n_variables() == rerd->get_n_variables() && orig->get_nnz() == rerd->get_nnz() &&
-       orig->get_objective_offset() == rerd->get_objective_offset());
+       approx_equal(orig->get_objective_offset(), rerd->get_objective_offset()));

As per coding guidelines: "use epsilon comparisons for floating-point equality checks."

🤖 Prompt for AI Agents
In `@cpp/tests/linear_programming/c_api_tests/c_api_tests.cpp` around lines 177 -
184, Replace the exact equality check on the floating-point objective offset
with an epsilon comparison: instead of comparing orig->get_objective_offset() ==
rerd->get_objective_offset(), compute the absolute difference and compare it to
a small tolerance (e.g., std::numeric_limits<double>::epsilon() * scale or a
fixed small value) so the result uses fabs(orig->get_objective_offset() -
rerd->get_objective_offset()) <= tol; keep the integer checks for
orig->get_n_constraints(), orig->get_n_variables(), orig->get_nnz() unchanged
and use these symbol names to locate the change.

Comment on lines 29 to 33
def __dir__():
    return __all__ + _submodules


__all__ = ["__git_commit__", "__version__", "__version_major_minor__", *_submodules]

⚠️ Potential issue | 🟡 Minor

__dir__() returns duplicate entries for submodules.

__all__ already includes the submodules via *_submodules unpacking on line 33, so __dir__() returning __all__ + _submodules will list them twice in dir(cuopt) output.

🐛 Proposed fix
 def __dir__():
-    return __all__ + _submodules
+    return list(__all__)
📝 Committable suggestion


Suggested change
 def __dir__():
-    return __all__ + _submodules
+    return list(__all__)

 __all__ = ["__git_commit__", "__version__", "__version_major_minor__", *_submodules]
🤖 Prompt for AI Agents
In `@python/cuopt/cuopt/__init__.py` around lines 29 - 33, The __dir__() function
currently returns __all__ + _submodules which duplicates submodule names because
__all__ already unpacks _submodules; change __dir__ to return a deduplicated
list (preferably simply return __all__) or otherwise return an order-preserving
deduplicated sequence (e.g., using dict.fromkeys) so names from __all__ and
_submodules aren’t repeated; update the __dir__ definition accordingly
(referencing __dir__, __all__, and _submodules).

Comment on lines 167 to 168
var_types = np.array([1]) # Integer
data_model_obj.set_variable_types(var_types)

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n python/cuopt/cuopt/tests/linear_programming/test_memory_model.py | sed -n '160,175p'

Repository: NVIDIA/cuopt

Length of output: 618


🏁 Script executed:

# Find where variable type codes are defined
rg -n "set_variable_types" -g '*.py' -g '*.pyx' python/cuopt | head -20

Repository: NVIDIA/cuopt

Length of output: 1001


🏁 Script executed:

# Find variable type constant definitions
rg -n "BINARY|INTEGER|CONTINUOUS" -g '*.py' python/cuopt | grep -E "(^[^:]+:\s*(BINARY|INTEGER|CONTINUOUS)\s*=|def.*variable_type)" | head -30

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Check what dtype np.array([1]) creates and look for examples in the codebase
rg -n "np\.array\(\[" -g '*.py' python/cuopt/tests | grep -i "var_type\|variable_type" -A 2 -B 2 | head -40

Repository: NVIDIA/cuopt

Length of output: 114


🏁 Script executed:

# Find the actual variable type values used in tests
rg -n "variable_types\s*=" -g '*.py' python/cuopt/tests -A 1 | head -50

Repository: NVIDIA/cuopt

Length of output: 114


🏁 Script executed:

# Look at the set_variable_types implementation
sed -n '364,390p' python/cuopt/cuopt/linear_programming/data_model/data_model.py

Repository: NVIDIA/cuopt

Length of output: 958


🏁 Script executed:

# Check the wrapper implementation
sed -n '133,160p' python/cuopt/cuopt/linear_programming/data_model/data_model_wrapper.pyx

Repository: NVIDIA/cuopt

Length of output: 827


🏁 Script executed:

# Find variable type constants
rg -n "VarType|VariableType|var_type" -g '*.py' -g '*.pyx' python/cuopt/cuopt/linear_programming --max-count=100 | head -50

Repository: NVIDIA/cuopt

Length of output: 2773


🏁 Script executed:

# Look for where variable types are actually used/parsed
rg -n "dtype.*S1|dtype.*str|int.*1.*Integer|VarType\s*=" -g '*.py' -g '*.pyx' python/cuopt | head -30

Repository: NVIDIA/cuopt

Length of output: 1372


🏁 Script executed:

# Check test cases for other MIP tests to see how they use variable types
rg -n "set_variable_types" -g '*.py' python/cuopt --context=3 | head -80

Repository: NVIDIA/cuopt

Length of output: 3546


🏁 Script executed:

sed -n '84,96p' python/cuopt/cuopt/linear_programming/solver/solver.py

Repository: NVIDIA/cuopt

Length of output: 581


🏁 Script executed:

# Check if there are any other tests using np.array([1]) for variable types
rg -n "np\.array\(\[1\]\)" -g '*.py' python/cuopt/tests

Repository: NVIDIA/cuopt

Length of output: 114


Fix variable type encoding in test to use explicit string codes.
np.array([1]) creates an integer array, not a string array. The is_mip() function (solver.py:84-96) checks for the string "I" or bytes b"I", so integer 1 is not recognized as an integer variable type. This prevents the test from exercising the MIP path. Use np.array(["I"], dtype="S1") to match the documented requirement (data_model.py:364) and patterns used throughout the codebase.

Proposed fix
-        var_types = np.array([1])  # Integer
+        var_types = np.array(["I"], dtype="S1")  # Integer
🤖 Prompt for AI Agents
In `@python/cuopt/cuopt/tests/linear_programming/test_memory_model.py` around
lines 167 - 168, The test sets variable types using an integer array which
prevents is_mip() from recognizing MIP variables; update the test to provide
string-encoded type codes (e.g., use a one-byte string array) when calling
data_model_obj.set_variable_types so the solver's is_mip() (solver.py) will
detect the "I" code; replace the np.array([1]) usage with a string bytes array
like np.array(["I"], dtype="S1") (or equivalent single-byte string encoding) so
the test exercises the MIP path.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
cpp/include/cuopt/linear_programming/utilities/callbacks_implems.hpp (1)

41-52: ⚠️ Potential issue | 🟡 Minor

Add null checks before Py_DECREF to prevent crashes on Python errors.

PyObject_CallMethod can return NULL if the Python call fails (e.g., method not found, exception raised). Calling Py_DECREF(NULL) causes undefined behavior. The same issue exists in set_solution.

🛡️ Proposed fix
   void get_solution(void* data, void* objective_value) override
   {
     PyObject* numba_matrix =
       data_on_device() ? get_numba_matrix(data, n_variables) : get_numpy_array(data, n_variables);
     PyObject* numpy_array =
       data_on_device() ? get_numba_matrix(objective_value, 1) : get_numpy_array(objective_value, 1);
+    if (numba_matrix == nullptr || numpy_array == nullptr) {
+      Py_XDECREF(numba_matrix);
+      Py_XDECREF(numpy_array);
+      return;
+    }
     PyObject* res =
       PyObject_CallMethod(this->pyCallbackClass, "get_solution", "(OO)", numba_matrix, numpy_array);
     Py_DECREF(numba_matrix);
     Py_DECREF(numpy_array);
-    Py_DECREF(res);
+    Py_XDECREF(res);
   }
cpp/libmps_parser/CMakeLists.txt (1)

70-136: ⚠️ Potential issue | 🟠 Major

mps_parser_static misses C++ standard and compile options.

The shared target sets CXX_STANDARD and MPS_PARSER_CXX_FLAGS, but the static target does not. This can break static builds if the sources rely on C++20 features or warning flags. Consider applying the same properties (and optional include/link settings) to mps_parser_static.

🔧 Proposed fix (keep targets consistent)
 add_library(mps_parser SHARED ${MPS_PARSER_SOURCES})
 
 # Static library for linking into libcuopt
 add_library(mps_parser_static STATIC ${MPS_PARSER_SOURCES})
 
 set_target_properties(mps_parser
@@
   CXX_SCAN_FOR_MODULES OFF
 )
+
+set_target_properties(mps_parser_static
+  PROPERTIES
+  CXX_STANDARD 20
+  CXX_STANDARD_REQUIRED ON
+  INTERFACE_POSITION_INDEPENDENT_CODE ON
+  CXX_SCAN_FOR_MODULES OFF
+)
@@
 target_compile_options(mps_parser
   PRIVATE "$<$<COMPILE_LANGUAGE:CXX>:${MPS_PARSER_CXX_FLAGS}>"
 )
+
+target_compile_options(mps_parser_static
+  PRIVATE "$<$<COMPILE_LANGUAGE:CXX>:${MPS_PARSER_CXX_FLAGS}>"
+)
cpp/src/linear_programming/cuopt_c.cpp (1)

878-903: ⚠️ Potential issue | 🟠 Major

Handle host‑memory solutions in C API getters.

Remote solves can return host-only solutions, but the getters always read device buffers. This will throw or return invalid data when is_device_memory() is false. Add a host branch (for MIP and LP) and use the new host accessors.

🐛 Proposed fix (example for cuOptGetPrimalSolution)
   if (solution_and_stream_view->is_mip) {
     mip_solution_t<cuopt_int_t, cuopt_float_t>* mip_solution =
       static_cast<mip_solution_t<cuopt_int_t, cuopt_float_t>*>(
         solution_and_stream_view->mip_solution_ptr);
-    const rmm::device_uvector<cuopt_float_t>& solution_values = mip_solution->get_solution();
-    rmm::cuda_stream_view stream =
-      solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
-    raft::copy(solution_values_ptr, solution_values.data(), solution_values.size(), stream);
-    stream.synchronize();
+    if (mip_solution->is_device_memory()) {
+      const rmm::device_uvector<cuopt_float_t>& solution_values = mip_solution->get_solution();
+      rmm::cuda_stream_view stream =
+        solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
+      raft::copy(solution_values_ptr, solution_values.data(), solution_values.size(), stream);
+      stream.synchronize();
+    } else {
+      const auto& host_solution = mip_solution->get_solution_host();
+      std::copy(host_solution.begin(), host_solution.end(), solution_values_ptr);
+    }
   } else {
     optimization_problem_solution_t<cuopt_int_t, cuopt_float_t>* optimization_problem_solution =
       static_cast<optimization_problem_solution_t<cuopt_int_t, cuopt_float_t>*>(
         solution_and_stream_view->lp_solution_ptr);
-    const rmm::device_uvector<cuopt_float_t>& solution_values =
-      optimization_problem_solution->get_primal_solution();
-    rmm::cuda_stream_view stream =
-      solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
-    raft::copy(solution_values_ptr, solution_values.data(), solution_values.size(), stream);
-    stream.synchronize();
+    if (optimization_problem_solution->is_device_memory()) {
+      const rmm::device_uvector<cuopt_float_t>& solution_values =
+        optimization_problem_solution->get_primal_solution();
+      rmm::cuda_stream_view stream =
+        solution_and_stream_view->stream_view.value_or(rmm::cuda_stream_view{});
+      raft::copy(solution_values_ptr, solution_values.data(), solution_values.size(), stream);
+      stream.synchronize();
+    } else {
+      const auto& host_solution = optimization_problem_solution->get_primal_solution_host();
+      std::copy(host_solution.begin(), host_solution.end(), solution_values_ptr);
+    }
   }

Please mirror the same device/host branching in cuOptGetDualSolution and cuOptGetReducedCosts.

Also applies to: 981-999, 1021-1039

cpp/src/linear_programming/solver_solution.cu (1)

191-265: ⚠️ Potential issue | 🟠 Major

Clear stale buffers when switching memory domains in copy_from.
When copying from a different memory domain, the old buffers remain allocated and can be returned by getters, leading to stale data and wasted memory. Reset the inactive domain in each branch to preserve invariants.

🧹 Proposed fix to clear inactive-domain buffers
   if (other.is_device_memory_) {
     // Copy GPU data
@@
     if (other.reduced_cost_) {
       reduced_cost_->resize(other.reduced_cost_->size(), handle_ptr->get_stream());
       raft::copy(reduced_cost_->data(),
                  other.reduced_cost_->data(),
                  reduced_cost_->size(),
                  handle_ptr->get_stream());
     } else {
       reduced_cost_->resize(0, handle_ptr->get_stream());
     }
 
+    // Clear stale host buffers when switching to device memory
+    primal_solution_host_.reset();
+    dual_solution_host_.reset();
+    reduced_cost_host_.reset();
+
     handle_ptr->sync_stream();
   } else {
     // Copy CPU data
@@
     if (other.reduced_cost_host_) {
       *reduced_cost_host_ = *other.reduced_cost_host_;
     } else {
       reduced_cost_host_->clear();
     }
+
+    // Clear stale device buffers when switching to host memory
+    primal_solution_.reset();
+    dual_solution_.reset();
+    reduced_cost_.reset();
   }
🤖 Fix all issues with AI agents
In `@cpp/include/cuopt/linear_programming/mip/solver_solution.hpp`:
- Around line 157-169: The to_host(rmm::cuda_stream_view) implementation must
perform an asynchronous device-to-host copy on the provided stream and check
CUDA errors without synchronizing; update mip_solution_t<...>::to_host to call
rmm::device_uvector::copy_async (or cudaMemcpyAsync) into the host buffer when
is_device_memory() is true, pass stream_view.stream() for the stream, wrap the
copy call with CUDA_CHECK (or the project’s CUDA_CHECK macro) to validate the
operation, and remove any cudaDeviceSynchronize or stream synchronization calls
so the path remains async; ensure after scheduling the copy you update internal
state so get_solution_host() refers to the host buffer once the copy is
enqueued.

In `@cpp/include/cuopt/linear_programming/utilities/remote_solve.hpp`:
- Around line 71-110: Add a precondition check in the remote stub(s) to enforce
that input data is host-resident: in solve_lp_remote(...) (and the other remote
overload) assert or throw if the provided
cuopt::mps_parser::data_model_view_t<i_t,f_t>& view is not on the host (use the
view's host-residency accessor, e.g., view.is_host_resident() or
view.is_on_host(), or equivalent), so the stub fails fast instead of silently
proceeding; place this check at the top of the functions (before computing
n_rows/n_cols) and keep behavior consistent (assert for debug builds or throw
std::invalid_argument/std::runtime_error for public API).

In `@cpp/tests/linear_programming/unit_tests/memory_model_infrastructure_test.cu`:
- Around line 36-65: The tests mutate process-global CUOPT_REMOTE_HOST/PORT in
RemoteSolveConfigTest::SetUp/ TearDown which can race when tests run in
parallel; fix by introducing a global test-scope mutex (e.g., a static
std::mutex or a test helper ScopedEnvLock) and acquire it for the entire
duration of each test instance in SetUp and release in TearDown (or use RAII
ScopedEnvLock constructed in SetUp and destroyed in TearDown); ensure the lock
covers both the unsetenv/setenv calls and any GPU/global-state touches so
environment and global state are serialized across tests.
🧹 Nitpick comments (5)
cpp/tests/mip/mip_solution_memory_test.cu (1)

101-117: Consider adding edge case tests for empty solutions.

The termination-status-only and error constructors are well tested. For more comprehensive coverage per coding guidelines, consider adding tests for:

  • Empty variable names vector
  • Zero-variable solutions
  • Degenerate termination statuses

These could help validate edge case handling in the solution memory paths.
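As one possible shape for such a test, a zero-variable, host-memory case might look like the sketch below; the mip_solution_t constructor arguments and the MIP termination enum used here are assumptions based on the surrounding tests, not verified signatures.

// Hedged sketch: a zero-variable solution should stay host-resident and empty.
// The constructor arguments and mip_termination_status_t::Optimal are assumed
// from context in this test file, not checked against the actual headers.
TEST(MipSolutionMemoryTest, ZeroVariableHostSolution)
{
  std::vector<double> empty_solution;
  std::vector<std::string> empty_names;
  cuopt::linear_programming::mip_solution_t<int, double> solution(
    std::move(empty_solution), empty_names, /*objective=*/0.0,
    cuopt::linear_programming::mip_termination_status_t::Optimal);

  EXPECT_FALSE(solution.is_device_memory());
  EXPECT_TRUE(solution.get_solution_host().empty());
}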

cpp/tests/linear_programming/unit_tests/memory_model_infrastructure_test.cu (2)

218-331: Exercise a real device‑pointer path (or rename the test).

gpu_view_conversion marks device memory but uses host pointers, so it doesn’t validate the real GPU copy path. Consider allocating device buffers (e.g., rmm::device_uvector) when GPU tests are available, or rename to clarify it’s a “GPU‑flag only” test.

Based on learnings: Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems).
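If the device path is exercised, the setup could look roughly like this; set_objective_coefficients is assumed to accept a device pointer the same way it accepts a host pointer elsewhere in the test, which should be confirmed against the actual view API.

// Hedged sketch: stage data in an rmm::device_uvector so the view really holds
// device pointers. The setter name mirrors the host-pointer call used elsewhere
// in this test and is an assumption, not a verified signature.
rmm::cuda_stream_view stream{rmm::cuda_stream_per_thread};
std::vector<double> h_obj{1.0, 2.0, 3.0};
rmm::device_uvector<double> d_obj(h_obj.size(), stream);
raft::copy(d_obj.data(), h_obj.data(), h_obj.size(), stream);
stream.synchronize();

view.set_objective_coefficients(d_obj.data(), d_obj.size());  // now a real device pointer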


336-387: Strengthen stub assertions with value checks (zeros).

These tests assert sizes and status only. Add value checks for the returned vectors (all zeros) to confirm numerical content, not just shape.

Based on learnings: Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems).

✅ Example enhancement
   EXPECT_FALSE(solution.is_device_memory());
   EXPECT_EQ(solution.get_termination_status(), pdlp_termination_status_t::Optimal);
   EXPECT_EQ(solution.get_primal_solution_host().size(), 1);
+  EXPECT_EQ(solution.get_primal_solution_host(), std::vector<double>({0.0}));
   EXPECT_NO_THROW(solution.get_primal_solution_host());
   EXPECT_NO_THROW(solution.get_dual_solution_host());
cpp/include/cuopt/linear_programming/utilities/remote_solve.hpp (1)

40-56: Consider exception‑free port parsing with std::from_chars.

This avoids exceptions in headers and is faster/locale‑independent.

♻️ Suggested change
+#include <charconv>
 ...
   if (host != nullptr && port != nullptr && host[0] != '\0' && port[0] != '\0') {
-    try {
-      int port_num = std::stoi(port);
-      if (port_num < 1 || port_num > 65535) { return std::nullopt; }
-      return remote_solve_config_t{std::string(host), port_num};
-    } catch (...) {
-      // Invalid port number, fall back to local solve
-      return std::nullopt;
-    }
+    int port_num = 0;
+    auto [ptr, ec] = std::from_chars(port, port + std::strlen(port), port_num);
+    if (ec != std::errc{} || ptr == port || *ptr != '\0') { return std::nullopt; }
+    if (port_num < 1 || port_num > 65535) { return std::nullopt; }
+    return remote_solve_config_t{std::string(host), port_num};
   }
cpp/src/linear_programming/solver_solution.cu (1)

733-782: Consider consolidating the repeated host-copy logic.
The device→host copy blocks are repeated in write_to_file, write_to_sol_file, and to_host. Extracting a shared helper would reduce duplication and future maintenance drift.

Based on learnings: Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication.
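A minimal sketch of such a helper, with placeholder names rather than the actual solver_solution.cu symbols:

// Hedged sketch: one device->host copy helper shared by write_to_file,
// write_to_sol_file, and to_host. Names are placeholders.
template <typename T>
std::vector<T> copy_device_to_host(const rmm::device_uvector<T>& d_vec,
                                   rmm::cuda_stream_view stream)
{
  std::vector<T> h_vec(d_vec.size());
  if (d_vec.size() > 0) {
    raft::copy(h_vec.data(), d_vec.data(), d_vec.size(), stream);
    stream.synchronize();
  }
  return h_vec;
}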

Comment on lines 157 to 169
/**
* @brief Copy solution data from GPU to CPU memory.
*
* After calling this method, is_device_memory() will return false and
* the solution can be accessed via get_solution_host().
* This is useful for remote solve scenarios where serialization requires
* CPU-accessible data.
*
* If the solution is already in CPU memory, this is a no-op.
*
* @param stream_view The CUDA stream to use for the copy
*/
void to_host(rmm::cuda_stream_view stream_view);

⚠️ Potential issue | 🟠 Major

Ensure to_host uses CUDA_CHECK and async copy on the provided stream.

This path will likely call cudaMemcpyAsync/rmm::device_uvector::copy_async; please confirm CUDA errors are checked and avoid cudaDeviceSynchronize in hot paths.

As per coding guidelines: Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification.

#!/bin/bash
# Inspect to_host implementation for CUDA_CHECK usage and sync patterns.
rg -n "mip_solution_t<.*>::to_host\\(" -C3
🤖 Prompt for AI Agents
In `@cpp/include/cuopt/linear_programming/mip/solver_solution.hpp` around lines
157 - 169, The to_host(rmm::cuda_stream_view) implementation must perform an
asynchronous device-to-host copy on the provided stream and check CUDA errors
without synchronizing; update mip_solution_t<...>::to_host to call
rmm::device_uvector::copy_async (or cudaMemcpyAsync) into the host buffer when
is_device_memory() is true, pass stream_view.stream() for the stream, wrap the
copy call with CUDA_CHECK (or the project’s CUDA_CHECK macro) to validate the
operation, and remove any cudaDeviceSynchronize or stream synchronization calls
so the path remains async; ensure after scheduling the copy you update internal
state so get_solution_host() refers to the host buffer once the copy is
enqueued.
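A minimal sketch of the requested pattern, assuming a host-side std::vector member and RAFT's error-check macro as a stand-in for the project's CUDA_CHECK; the member names (solution_, solution_host_, is_device_memory_) are placeholders, not the actual mip_solution_t internals.

// Hedged sketch: async device->host copy on the caller's stream, error-checked,
// no device-wide synchronization. Member names are placeholders.
template <typename i_t, typename f_t>
void mip_solution_t<i_t, f_t>::to_host(rmm::cuda_stream_view stream_view)
{
  if (!is_device_memory_) { return; }  // already host-resident: no-op
  solution_host_.resize(solution_.size());
  RAFT_CUDA_TRY(cudaMemcpyAsync(solution_host_.data(),
                                solution_.data(),
                                solution_.size() * sizeof(f_t),
                                cudaMemcpyDeviceToHost,
                                stream_view.value()));
  // The copy is only enqueued here; callers must keep using stream_view before
  // reading the host buffer. Flip the flag so get_solution_host() points at it.
  is_device_memory_ = false;
}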

Comment on lines 71 to 110
optimization_problem_solution_t<i_t, f_t> solve_lp_remote(
const remote_solve_config_t&,
const cuopt::mps_parser::data_model_view_t<i_t, f_t>& view,
const pdlp_solver_settings_t<i_t, f_t>&)
{
auto n_rows = view.get_constraint_matrix_offsets().size() > 0
? static_cast<i_t>(view.get_constraint_matrix_offsets().size()) - 1
: 0;
auto n_cols = static_cast<i_t>(view.get_objective_coefficients().size());

std::vector<f_t> primal_solution(static_cast<size_t>(n_cols), f_t{0});
std::vector<f_t> dual_solution(static_cast<size_t>(n_rows), f_t{0});
std::vector<f_t> reduced_cost(static_cast<size_t>(n_cols), f_t{0});

typename optimization_problem_solution_t<i_t, f_t>::additional_termination_information_t stats;
stats.number_of_steps_taken = 0;
stats.total_number_of_attempted_steps = 0;
stats.l2_primal_residual = f_t{0};
stats.l2_relative_primal_residual = f_t{0};
stats.l2_dual_residual = f_t{0};
stats.l2_relative_dual_residual = f_t{0};
stats.primal_objective = f_t{0};
stats.dual_objective = f_t{0};
stats.gap = f_t{0};
stats.relative_gap = f_t{0};
stats.max_primal_ray_infeasibility = f_t{0};
stats.primal_ray_linear_objective = f_t{0};
stats.max_dual_ray_infeasibility = f_t{0};
stats.dual_ray_linear_objective = f_t{0};
stats.solve_time = 0.0;
stats.solved_by_pdlp = false;
return optimization_problem_solution_t<i_t, f_t>(std::move(primal_solution),
std::move(dual_solution),
std::move(reduced_cost),
"",
{},
{},
stats,
pdlp_termination_status_t::Optimal);
}

⚠️ Potential issue | 🟠 Major

Enforce host‑memory precondition in remote stubs.

Remote solve is defined to require host‑resident data; the stubs should assert that to prevent silent misuse.

🛑 Proposed guard
 optimization_problem_solution_t<i_t, f_t> solve_lp_remote(
   const remote_solve_config_t&,
   const cuopt::mps_parser::data_model_view_t<i_t, f_t>& view,
   const pdlp_solver_settings_t<i_t, f_t>&)
 {
+  cuopt_assert(!view.is_device_memory(), "Remote solve requires host memory view");
   auto n_rows = view.get_constraint_matrix_offsets().size() > 0
                   ? static_cast<i_t>(view.get_constraint_matrix_offsets().size()) - 1
                   : 0;
@@
 mip_solution_t<i_t, f_t> solve_mip_remote(
   const remote_solve_config_t&,
   const cuopt::mps_parser::data_model_view_t<i_t, f_t>& view,
   const mip_solver_settings_t<i_t, f_t>&)
 {
+  cuopt_assert(!view.is_device_memory(), "Remote solve requires host memory view");
   auto n_cols = static_cast<i_t>(view.get_objective_coefficients().size());

Also applies to: 117-139

🤖 Prompt for AI Agents
In `@cpp/include/cuopt/linear_programming/utilities/remote_solve.hpp` around lines
71 - 110, Add a precondition check in the remote stub(s) to enforce that input
data is host-resident: in solve_lp_remote(...) (and the other remote overload)
assert or throw if the provided cuopt::mps_parser::data_model_view_t<i_t,f_t>&
view is not on the host (use the view's host-residency accessor, e.g.,
view.is_host_resident() or view.is_on_host(), or equivalent), so the stub fails
fast instead of silently proceeding; place this check at the top of the
functions (before computing n_rows/n_cols) and keep behavior consistent (assert
for debug builds or throw std::invalid_argument/std::runtime_error for public
API).

Comment on lines 36 to 65
class RemoteSolveConfigTest : public ::testing::Test {
protected:
void SetUp() override
{
// Save original environment variables
const char* host_env = std::getenv("CUOPT_REMOTE_HOST");
const char* port_env = std::getenv("CUOPT_REMOTE_PORT");
saved_host_ = host_env ? std::string(host_env) : std::string();
saved_port_ = port_env ? std::string(port_env) : std::string();
had_host_ = (host_env != nullptr);
had_port_ = (port_env != nullptr);

// Clean environment for test
unsetenv("CUOPT_REMOTE_HOST");
unsetenv("CUOPT_REMOTE_PORT");
}

void TearDown() override
{
// Restore original environment variables
if (had_host_) {
setenv("CUOPT_REMOTE_HOST", saved_host_.c_str(), 1);
} else {
unsetenv("CUOPT_REMOTE_HOST");
}
if (had_port_) {
setenv("CUOPT_REMOTE_PORT", saved_port_.c_str(), 1);
} else {
unsetenv("CUOPT_REMOTE_PORT");
}

⚠️ Potential issue | 🟡 Minor

Guard global env var mutations across tests.

CUOPT_REMOTE_* is process‑global; if tests run in parallel, SetUp/TearDown can interleave and cause flaky behavior. Consider holding a mutex for the full test scope.
Based on learnings: Ensure test isolation: prevent GPU state, cached memory, and global variables from leaking between test cases; verify each test independently initializes its environment.

🔒 Proposed fix: scoped env lock
+#include <mutex>
 ...
 namespace cuopt::linear_programming::test {
 
+namespace {
+std::mutex env_mutex;
+}
+
 class RemoteSolveConfigTest : public ::testing::Test {
  protected:
   void SetUp() override
   {
+    env_lock_ = std::unique_lock<std::mutex>(env_mutex);
     // Save original environment variables
     const char* host_env = std::getenv("CUOPT_REMOTE_HOST");
     const char* port_env = std::getenv("CUOPT_REMOTE_PORT");
@@
   void TearDown() override
   {
     // Restore original environment variables
     if (had_host_) {
       setenv("CUOPT_REMOTE_HOST", saved_host_.c_str(), 1);
     } else {
       unsetenv("CUOPT_REMOTE_HOST");
     }
@@
     if (had_port_) {
       setenv("CUOPT_REMOTE_PORT", saved_port_.c_str(), 1);
     } else {
       unsetenv("CUOPT_REMOTE_PORT");
     }
+    env_lock_.unlock();
   }
 
  private:
   std::string saved_host_;
   std::string saved_port_;
   bool had_host_ = false;
   bool had_port_ = false;
+  std::unique_lock<std::mutex> env_lock_;
 };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
class RemoteSolveConfigTest : public ::testing::Test {
protected:
void SetUp() override
{
// Save original environment variables
const char* host_env = std::getenv("CUOPT_REMOTE_HOST");
const char* port_env = std::getenv("CUOPT_REMOTE_PORT");
saved_host_ = host_env ? std::string(host_env) : std::string();
saved_port_ = port_env ? std::string(port_env) : std::string();
had_host_ = (host_env != nullptr);
had_port_ = (port_env != nullptr);
// Clean environment for test
unsetenv("CUOPT_REMOTE_HOST");
unsetenv("CUOPT_REMOTE_PORT");
}
void TearDown() override
{
// Restore original environment variables
if (had_host_) {
setenv("CUOPT_REMOTE_HOST", saved_host_.c_str(), 1);
} else {
unsetenv("CUOPT_REMOTE_HOST");
}
if (had_port_) {
setenv("CUOPT_REMOTE_PORT", saved_port_.c_str(), 1);
} else {
unsetenv("CUOPT_REMOTE_PORT");
}
#include <mutex>
// ... other includes ...
namespace cuopt::linear_programming::test {
namespace {
std::mutex env_mutex;
}
class RemoteSolveConfigTest : public ::testing::Test {
protected:
void SetUp() override
{
env_lock_ = std::unique_lock<std::mutex>(env_mutex);
// Save original environment variables
const char* host_env = std::getenv("CUOPT_REMOTE_HOST");
const char* port_env = std::getenv("CUOPT_REMOTE_PORT");
saved_host_ = host_env ? std::string(host_env) : std::string();
saved_port_ = port_env ? std::string(port_env) : std::string();
had_host_ = (host_env != nullptr);
had_port_ = (port_env != nullptr);
// Clean environment for test
unsetenv("CUOPT_REMOTE_HOST");
unsetenv("CUOPT_REMOTE_PORT");
}
void TearDown() override
{
// Restore original environment variables
if (had_host_) {
setenv("CUOPT_REMOTE_HOST", saved_host_.c_str(), 1);
} else {
unsetenv("CUOPT_REMOTE_HOST");
}
if (had_port_) {
setenv("CUOPT_REMOTE_PORT", saved_port_.c_str(), 1);
} else {
unsetenv("CUOPT_REMOTE_PORT");
}
env_lock_.unlock();
}
private:
std::string saved_host_;
std::string saved_port_;
bool had_host_ = false;
bool had_port_ = false;
std::unique_lock<std::mutex> env_lock_;
};
🤖 Prompt for AI Agents
In `@cpp/tests/linear_programming/unit_tests/memory_model_infrastructure_test.cu`
around lines 36 - 65, The tests mutate process-global CUOPT_REMOTE_HOST/PORT in
RemoteSolveConfigTest::SetUp/ TearDown which can race when tests run in
parallel; fix by introducing a global test-scope mutex (e.g., a static
std::mutex or a test helper ScopedEnvLock) and acquire it for the entire
duration of each test instance in SetUp and release in TearDown (or use RAII
ScopedEnvLock constructed in SetUp and destroyed in TearDown); ensure the lock
covers both the unsetenv/setenv calls and any GPU/global-state touches so
environment and global state are serialized across tests.

@tmckayus tmckayus force-pushed the memory-model-clean branch 2 times, most recently from 97dc9a3 to 44f16ea on February 1, 2026 at 19:59
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 13

🤖 Fix all issues with AI agents
In `@cpp/include/cuopt/linear_programming/gpu_optimization_problem_solution.hpp`:
- Around line 66-107: The const getters get_primal_solution_host,
get_dual_solution_host, and get_reduced_cost_host currently mutate shared
optional caches without synchronization; protect initialization of
primal_solution_host_cache_, dual_solution_host_cache_, and
reduced_cost_host_cache_ by adding thread-safety (e.g., introduce mutable
std::mutex members or per-cache std::once_flag/mutexes and lock/guard them
inside each getter before testing/initializing the cache) so only one thread
performs the device->host copy and other threads safely read the cached
std::vector; ensure the mutex/once_flag members are mutable to allow use in
const methods and apply the same pattern to all three getters.

In
`@cpp/include/cuopt/linear_programming/optimization_problem_solution_interface.hpp`:
- Around line 57-62: The base class optimization_problem_solution_interface_t
declares virtual double get_solve_time() which conflicts with
lp_solution_interface_t::get_solve_time() returning f_t; remove the base-class
get_solve_time() declaration from optimization_problem_solution_interface_t (or
alternatively make it return f_t if f_t is a common type available) so the
derived lp_solution_interface_t can provide the single consistent accessor
get_solve_time() with return type f_t and avoid hiding/override mismatch.

In `@cpp/include/cuopt/linear_programming/optimization_problem_utils.hpp`:
- Around line 84-94: char_variable_types may have a different length than
n_vars, so calling problem->set_variable_types(enum_variable_types.data(),
n_vars) can read past enum_variable_types; ensure you use the actual length of
enum_variable_types (e.g., enum_variable_types.size() or
char_variable_types.size()) when calling set_variable_types. Update the
conversion block that builds enum_variable_types (from
data_model.get_variable_types() and var_t) to compute a size_t count =
enum_variable_types.size() and pass that count to problem->set_variable_types
instead of n_vars, and optionally validate/early-return if sizes mismatch.

In `@cpp/src/linear_programming/cpu_optimization_problem.cpp`:
- Around line 763-768: The CSR matrix comparison in is_equivalent() only checks
A_.size() against other_A_values.size() which can miss differing nonzero values;
update is_equivalent() to perform a full CSR value comparison (respecting
row/column permutations used elsewhere) or, if keeping the simplified check for
now, make the comment explicit and add a TODO and an assertion guard for
intended CI-only use: modify is_equivalent() to either implement value-wise
comparison of A_ against other_A_values using the same permutations applied to
row_ptr/col_idx, or add a clear TODO and runtime assertion/guard that this
function is only used for MPS roundtrip tests; reference symbols:
is_equivalent(), A_, other_A_values, and the permutation arrays used earlier in
the function.

In `@cpp/src/linear_programming/cuopt_c_internal.hpp`:
- Around line 75-94: The struct solution_and_stream_view_t currently owns raw
pointers (mip_solution_interface_ptr, lp_solution_interface_ptr) and defines a
destructor but not the other special members; to avoid double-free, delete the
copy constructor and copy assignment, and implement move constructor and move
assignment that transfer ownership of mip_solution_interface_ptr,
lp_solution_interface_ptr, and backend_type while nulling the source pointers
and preserving is_mip; keep the destructor to delete any remaining owned
pointers. Update symbols: solution_and_stream_view_t,
mip_solution_interface_ptr, lp_solution_interface_ptr, is_mip, backend_type, and
the existing ~solution_and_stream_view_t to ensure safe move semantics and
disabled copying.
- Around line 23-73: The struct problem_and_stream_view_t currently owns raw
pointers (gpu_problem, cpu_problem, stream_view_ptr, handle_ptr) but lacks
copy/move special members, risking double-free on copy; delete the copy
constructor and copy assignment operator and implement a move constructor and
move assignment that transfer ownership by moving gpu_problem, cpu_problem,
stream_view_ptr, and handle_ptr from the source to *this and set the source
pointers to nullptr, ensuring the destructor (already null-safe) will only free
moved-in resources; keep get_handle_ptr, get_problem, and
to_optimization_problem behavior intact after the move.

In `@cpp/src/linear_programming/gpu_optimization_problem.cu`:
- Around line 756-846: The is_equivalent implementation must (1) fallback to
direct-order comparison when variable/row names are missing instead of returning
false, and (2) actually compare CSR matrix data instead of only sizes. Update
gpu_optimization_problem_t<i_t,f_t>::is_equivalent to: if get_variable_names()
or get_row_names() are empty, skip building var_perm/row_perm and compare
objective coefficients, bounds, types and full CSR arrays in direct index order;
otherwise build var_perm and row_perm as now and use them to permute columns and
rows when comparing the CSR components returned by
get_constraint_matrix_values_host(), get_constraint_matrix_col_indices_host(),
and get_constraint_matrix_row_offsets_host() (compare values with fabs tolerance
and exact match for indices/offsets/types), and ensure you also check
sizes/offsets consistently to avoid out-of-bounds when permuting.
- Around line 67-83: In
gpu_optimization_problem_t<i_t,f_t>::set_csr_constraint_matrix validate that
size_offsets > 0 before computing n_constraints_ and before resizing/copying A_,
A_indices_, and A_offsets_; if size_offsets == 0, return/throw a clear error
(e.g., throw std::invalid_argument or assert) to avoid underflow of
n_constraints_ = size_offsets - 1 and prevent invalid CSR state, otherwise
proceed to set n_constraints_, resize A_, A_indices_, A_offsets_, and perform
raft::copy as currently implemented.
- Around line 114-130: The method set_quadratic_objective_matrix currently
copies Q_values/Q_indices/Q_offsets into host std::vector members (Q_values_,
Q_indices_, Q_offsets_) using std::copy, which conflicts with the class doc
claiming setters accept device pointers; either update the docs or add
device-aware copying: detect device pointers (or accept a flag), and if inputs
are device memory use raft::copy into the host vectors or change members to
rmm::device_uvector and use raft::copy to copy from host to device as
appropriate; update set_quadratic_objective_matrix implementation to mirror the
device-aware behavior used by set_variable_lower_bounds (use
raft::copy/rmm::device_uvector) or change the interface doc to state it accepts
host pointers only so behavior matches Q_values_/Q_indices_/Q_offsets storage.

In `@cpp/src/linear_programming/optimization_problem.cu`:
- Around line 620-684: Several move-setters leave metadata stale: ensure setters
that change sizes also update n_vars_ / n_constraints_ and that
set_variable_types_move also recomputes problem_category_. Specifically, keep
the existing n_constraints_ update in set_csr_constraint_matrix_move, add
n_constraints_ updates to any other constraint-buffer setters if they change
A_offsets_, set n_vars_ = variable_types_.size() (or at least ensure n_vars_
matches c_.size()/variable_types_.size()) inside set_variable_types_move, and
after assigning variable_types_ recompute problem_category_ using the same
logic/path used elsewhere when variable types are set (i.e., mirror the
classification code used when building the problem from non-move APIs so
LP/MIP/IP category and bounds checks remain consistent). Ensure each move-based
setter that affects counts mirrors the non-move setter behaviors (use the same
symbols: n_vars_, n_constraints_, variable_types_, problem_category_).

In `@cpp/src/linear_programming/solve_remote.cu`:
- Around line 151-169: The GPU->CPU conversion is missing copying of row types
and constraint bounds; call the GPU getters (e.g.,
gpu_problem.get_row_types_host() and gpu_problem.get_constraint_bounds_host())
and, if non-empty, pass their data and sizes into the corresponding CPU setters
(e.g., cpu_problem.set_row_types(..., ...) and
cpu_problem.set_constraint_bounds(..., ...)); place these after the existing
constraint bounds copy so the CPU problem receives the row_types (constraint
sense 'E'/'L'/'G') and the RHS b-vector for non-ranged constraints.

In `@cpp/src/linear_programming/utilities/cython_solve.cu`:
- Around line 222-224: The code currently calls lp_solution_ptr.release() before
invoking std::move(*gpu_lp_sol).to_linear_programming_ret_t(), which leaks
memory if to_linear_programming_ret_t() throws; instead, keep ownership until
the conversion succeeds: do not call lp_solution_ptr.release() prior to
conversion, call std::move(*gpu_lp_sol).to_linear_programming_ret_t() while
lp_solution_ptr still owns the memory (access via lp_solution_ptr.get() or by
dereferencing), assign response.lp_ret from that result, and only call
lp_solution_ptr.release() (or std::unique_ptr reset/move) after the conversion
completes successfully; apply the same change for the other occurrences around
to_linear_programming_ret_t() at the mentioned locations.

In `@python/cuopt/cuopt/linear_programming/solver/solver.pxd`:
- Around line 128-130: Remove the redundant C++ redeclaration of device_buffer:
delete the entire cdef extern from "<rmm/device_buffer.hpp>" namespace "rmm":
block that defines cppclass device_buffer and rely on the existing cimport of
device_buffer (from rmm.librmm.device_buffer on line 16) so there is a single
consistent declaration across .pxd files and no symbol conflicts.
🧹 Nitpick comments (8)
cpp/include/cuopt/linear_programming/pdlp/pdlp_warm_start_data.hpp (1)

73-74: LGTM - Consider aligning style with CPU version.

The is_populated() sentinel check is correct. For consistency with cpu_pdlp_warm_start_data.hpp (line 98) which uses !...empty(), consider:

-  bool is_populated() const { return last_restart_duality_gap_dual_solution_.size() > 0; }
+  bool is_populated() const { return !last_restart_duality_gap_dual_solution_.empty(); }

Both are functionally equivalent; !empty() is the more idiomatic C++ style for container non-emptiness checks.

cpp/src/linear_programming/cpu_pdlp_warm_start_data.cu (1)

40-73: Consider batching stream synchronization for better performance.

Each device_to_host_vector call synchronizes the stream independently. For the 9 vector fields being copied, this results in 9 separate synchronizations. Batching the copies and performing a single sync at the end would reduce synchronization overhead.

♻️ Suggested approach

Create a variant that doesn't synchronize internally, then synchronize once after all copies:

 // Helper to copy device_uvector to std::vector (D2H)
 template <typename T>
 std::vector<T> device_to_host_vector(const rmm::device_uvector<T>& device_vec,
-                                     rmm::cuda_stream_view stream)
+                                     rmm::cuda_stream_view stream,
+                                     bool sync = true)
 {
   if (device_vec.size() == 0) return std::vector<T>();
 
   std::vector<T> host_vec(device_vec.size());
   raft::copy(host_vec.data(), device_vec.data(), device_vec.size(), stream);
-  stream.synchronize();
+  if (sync) stream.synchronize();
   return host_vec;
 }

Then in convert_to_cpu_warmstart, pass false for sync on all calls except the last, or call stream.synchronize() once at the end.

cpp/src/linear_programming/cuopt_c.cpp (1)

413-427: Repeated backend branching pattern could benefit from abstraction.

The pattern of checking backend_type == problem_backend_t::CPU and then either using host getters or copying from device appears in many getter functions (cuOptGetObjectiveCoefficients, cuOptGetConstraintMatrix, cuOptGetConstraintSense, etc.).

Consider extracting this into a helper template or macro to reduce duplication. However, given this is the C API layer and clarity is important, the current explicit approach is acceptable.
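For reference, a helper capturing the pattern could look like this; the getter callables and all names are illustrative, not existing cuOpt symbols.

// Hedged sketch: shared "host getter vs. device copy" helper for the C API getters.
// HostGetter/DeviceGetter are any callables returning the host vector or device
// buffer for a given field; every name here is a placeholder.
template <typename T, typename HostGetter, typename DeviceGetter>
void copy_field_to_user(problem_backend_t backend,
                        HostGetter&& get_host,
                        DeviceGetter&& get_device,
                        rmm::cuda_stream_view stream,
                        T* out)
{
  if (backend == problem_backend_t::CPU) {
    const auto& host_values = get_host();
    std::copy(host_values.begin(), host_values.end(), out);
  } else {
    const auto& device_values = get_device();
    raft::copy(out, device_values.data(), device_values.size(), stream);
    stream.synchronize();
  }
}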

cpp/src/linear_programming/cuopt_c_internal.hpp (1)

40-46: Destructor deletion order is correct but fragile.

The deletion order (problem → handle → stream) is appropriate since problems may reference handles. However, this implicit ordering dependency is not documented and could break if refactored.

Consider adding a brief comment documenting the deletion order rationale, or using std::unique_ptr with custom deleters to make ownership explicit.
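As a sketch of the unique_ptr option: members are destroyed in reverse declaration order, so declaring the stream first and the problems last encodes the same teardown order without a hand-written destructor (illustrative only, not the current struct).

// Hedged sketch: implicit destruction runs bottom-up, i.e. problems, then handle,
// then stream, matching the manual deletion order discussed above.
struct problem_and_stream_view_t {
  std::unique_ptr<rmm::cuda_stream_view> stream_view_ptr;  // destroyed last
  std::unique_ptr<raft::handle_t> handle_ptr;              // destroyed after the problems
  std::unique_ptr<gpu_optimization_problem_t<cuopt_int_t, cuopt_float_t>> gpu_problem;
  std::unique_ptr<cpu_optimization_problem_t<cuopt_int_t, cuopt_float_t>> cpu_problem;
  problem_backend_t backend_type{};
};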

cpp/include/cuopt/linear_programming/cpu_pdlp_warm_start_data.hpp (1)

50-95: Consider helper to reduce repetitive null-check pattern.

The constructor has 9 nearly identical blocks checking data() != nullptr before copying. While functional, this could be simplified with a helper lambda or template function.

♻️ Optional refactor using helper lambda
   cpu_pdlp_warm_start_data_t(const pdlp_warm_start_data_view_t<i_t, f_t>& view)
     : initial_primal_weight_(view.initial_primal_weight_),
       // ... scalar initializers ...
   {
+    auto copy_span = [](auto& dest, const auto& src) {
+      if (src.data() != nullptr) {
+        dest.assign(src.data(), src.data() + src.size());
+      }
+    };
+    copy_span(current_primal_solution_, view.current_primal_solution_);
+    copy_span(current_dual_solution_, view.current_dual_solution_);
+    // ... etc for all 9 vectors ...
-    // Copy vector data from spans
-    if (view.current_primal_solution_.data() != nullptr) {
-      current_primal_solution_.assign(
-        view.current_primal_solution_.data(),
-        view.current_primal_solution_.data() + view.current_primal_solution_.size());
-    }
-    // ... repeated 8 more times ...
   }

Based on learnings: "Refactor code duplication in solver components (3+ occurrences) into shared utilities."

cpp/src/linear_programming/cpu_optimization_problem.cpp (1)

26-31: Use logging instead of fprintf for consistency.

The constructor uses fprintf(stderr, ...) while the rest of the codebase uses CUOPT_LOG_* macros. This could cause inconsistent output formatting.

♻️ Proposed fix
 template <typename i_t, typename f_t>
 cpu_optimization_problem_t<i_t, f_t>::cpu_optimization_problem_t(raft::handle_t const* handle_ptr)
   : handle_ptr_(handle_ptr)
 {
-  fprintf(stderr, "cpu_optimization_problem_t constructor: Using CPU backend\n");
+  CUOPT_LOG_INFO("cpu_optimization_problem_t constructor: Using CPU backend");
 }
cpp/include/cuopt/linear_programming/gpu_optimization_problem_solution.hpp (1)

159-283: Extract a shared device‑to‑host copy helper for warm‑start accessors.

The warm‑start getters repeat the same size‑check → allocate → copy → sync pattern. A small helper would reduce duplication and keep copy/sync behavior consistent.

Based on learnings: Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication.

cpp/include/cuopt/linear_programming/optimization_problem_interface.hpp (1)

430-435: Add override on to_optimization_problem() in concrete classes.

This helps the compiler catch signature drift in the interface implementations.

♻️ Proposed diff
-  optimization_problem_t<i_t, f_t> to_optimization_problem();
+  optimization_problem_t<i_t, f_t> to_optimization_problem() override;
...
-  optimization_problem_t<i_t, f_t> to_optimization_problem();
+  optimization_problem_t<i_t, f_t> to_optimization_problem() override;

Also applies to: 586-592

Comment on lines +66 to +107
const std::vector<f_t>& get_primal_solution_host() const override
{
if (!primal_solution_host_cache_) {
// Copy from device to host using per-thread stream
auto stream = rmm::cuda_stream_per_thread;
primal_solution_host_cache_ = std::vector<f_t>(solution_.get_primal_solution().size());
raft::copy(primal_solution_host_cache_->data(),
solution_.get_primal_solution().data(),
solution_.get_primal_solution().size(),
stream);
stream.synchronize();
}
return *primal_solution_host_cache_;
}

const std::vector<f_t>& get_dual_solution_host() const override
{
if (!dual_solution_host_cache_) {
auto stream = rmm::cuda_stream_per_thread;
dual_solution_host_cache_ = std::vector<f_t>(solution_.get_dual_solution().size());
raft::copy(dual_solution_host_cache_->data(),
solution_.get_dual_solution().data(),
solution_.get_dual_solution().size(),
stream);
stream.synchronize();
}
return *dual_solution_host_cache_;
}

const std::vector<f_t>& get_reduced_cost_host() const override
{
if (!reduced_cost_host_cache_) {
auto stream = rmm::cuda_stream_per_thread;
auto& reduced_cost =
const_cast<optimization_problem_solution_t<i_t, f_t>&>(solution_).get_reduced_cost();
reduced_cost_host_cache_ = std::vector<f_t>(reduced_cost.size());
raft::copy(
reduced_cost_host_cache_->data(), reduced_cost.data(), reduced_cost.size(), stream);
stream.synchronize();
}
return *reduced_cost_host_cache_;
}

⚠️ Potential issue | 🟠 Major

Guard lazy host caches against concurrent access.

These const getters mutate shared optional caches without synchronization; if the wrapper is shared across threads (callbacks, async consumers), this is a data race. Add a mutex/once-flag around cache init or clearly document single-threaded usage.

As per coding guidelines: Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state.
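One shape this could take is a mutable mutex guarding the lazy initialization, shown here for the primal getter (the other two getters would follow the same pattern); host_cache_mutex_ is a new, assumed member name.

// Hedged sketch: serialize lazy cache initialization so concurrent const callers
// cannot race on the optional. host_cache_mutex_ would be a mutable std::mutex member.
const std::vector<f_t>& get_primal_solution_host() const override
{
  std::lock_guard<std::mutex> lock(host_cache_mutex_);
  if (!primal_solution_host_cache_) {
    auto stream = rmm::cuda_stream_per_thread;
    primal_solution_host_cache_ = std::vector<f_t>(solution_.get_primal_solution().size());
    raft::copy(primal_solution_host_cache_->data(),
               solution_.get_primal_solution().data(),
               solution_.get_primal_solution().size(),
               stream);
    stream.synchronize();
  }
  return *primal_solution_host_cache_;
}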

🤖 Prompt for AI Agents
In `@cpp/include/cuopt/linear_programming/gpu_optimization_problem_solution.hpp`
around lines 66 - 107, The const getters get_primal_solution_host,
get_dual_solution_host, and get_reduced_cost_host currently mutate shared
optional caches without synchronization; protect initialization of
primal_solution_host_cache_, dual_solution_host_cache_, and
reduced_cost_host_cache_ by adding thread-safety (e.g., introduce mutable
std::mutex members or per-cache std::once_flag/mutexes and lock/guard them
inside each getter before testing/initializing the cache) so only one thread
performs the device->host copy and other threads safely read the cached
std::vector; ensure the mutex/once_flag members are mutable to allow use in
const methods and apply the same pattern to all three getters.

Comment on lines 57 to 133
/**
* @brief Get the solve time in seconds
* @return Time in seconds
*/
virtual double get_solve_time() const = 0;
};

⚠️ Potential issue | 🟠 Major

Return type mismatch for get_solve_time() in inheritance hierarchy.

The base class optimization_problem_solution_interface_t declares get_solve_time() returning double (line 61), but the derived lp_solution_interface_t redeclares it returning f_t (line 112). This causes:

  1. Potential hiding of the base class method (not a valid override due to different return type)
  2. Ambiguity when f_t is not double

The base class method should likely be removed or made consistent:

🐛 Proposed fix - remove base class declaration
 template <typename i_t, typename f_t>
 class optimization_problem_solution_interface_t {
  public:
   virtual ~optimization_problem_solution_interface_t() = default;

   virtual bool is_mip() const = 0;
   virtual cuopt::logic_error get_error_status() const = 0;
-
-  /**
-   * @brief Get the solve time in seconds
-   * @return Time in seconds
-   */
-  virtual double get_solve_time() const = 0;
 };

Alternatively, use f_t consistently or add a separate accessor in the base class.

Also applies to: 108-113

🤖 Prompt for AI Agents
In
`@cpp/include/cuopt/linear_programming/optimization_problem_solution_interface.hpp`
around lines 57 - 62, The base class optimization_problem_solution_interface_t
declares virtual double get_solve_time() which conflicts with
lp_solution_interface_t::get_solve_time() returning f_t; remove the base-class
get_solve_time() declaration from optimization_problem_solution_interface_t (or
alternatively make it return f_t if f_t is a common type available) so the
derived lp_solution_interface_t can provide the single consistent accessor
get_solve_time() with return type f_t and avoid hiding/override mismatch.

Comment on lines +763 to +769
// For CSR matrix, we'd need more complex comparison - for now just check size matches
// A full implementation would need to compare matrix entries with row/column permutations
if (A_.size() != other_A_values.size()) return false;

return true;
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's locate and examine the is_equivalent() function
rg -n "is_equivalent" --type cpp -B 5 -A 15 cpp/src/linear_programming/cpu_optimization_problem.cpp

Repository: NVIDIA/cuopt

Length of output: 1123


🏁 Script executed:

# Check where is_equivalent is used
rg -n "is_equivalent" --type cpp

Repository: NVIDIA/cuopt

Length of output: 957


🏁 Script executed:

# Look for the complete function context around lines 763-768
sed -n '740,800p' cpp/src/linear_programming/cpu_optimization_problem.cpp

Repository: NVIDIA/cuopt

Length of output: 1592


🏁 Script executed:

# Look at the test usage
sed -n '170,200p' cpp/tests/linear_programming/c_api_tests/c_api_tests.cpp

Repository: NVIDIA/cuopt

Length of output: 1204


🏁 Script executed:

# Check what A_ is - look at the class definition and CSR matrix type
rg -n "A_" cpp/src/linear_programming/cpu_optimization_problem.cpp | head -20

Repository: NVIDIA/cuopt

Length of output: 1116


🏁 Script executed:

# Find the class definition to understand A_ member
sed -n '1,100p' cpp/src/linear_programming/cpu_optimization_problem.cpp | grep -A 50 "class cpu_optimization_problem_t"

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Look at the complete class header
rg -n "class cpu_optimization_problem_t" --type cpp -A 30

Repository: NVIDIA/cuopt

Length of output: 4948


🏁 Script executed:

# Check all usages of is_equivalent to understand the full scope
rg -n "is_equivalent\(" --type cpp -B 3 -A 3

Repository: NVIDIA/cuopt

Length of output: 4827


🏁 Script executed:

# Look at the header file to understand what this method is supposed to do
sed -n '320,340p' cpp/include/cuopt/linear_programming/optimization_problem_interface.hpp

Repository: NVIDIA/cuopt

Length of output: 999


🏁 Script executed:

# Check if there's any related CSR comparison function mentioned in the learnings
rg -n "csr_matrices_equivalent_with_permutation" --type cpp -A 20

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Check if there are other implementations of is_equivalent in GPU or other backends
rg -n "is_equivalent" --type cpp -A 50 | grep -A 50 "gpu_optimization_problem_t"

Repository: NVIDIA/cuopt

Length of output: 6633


🏁 Script executed:

# Look at the test file more to understand the roundtrip test context
sed -n '140,210p' cpp/tests/linear_programming/c_api_tests/c_api_tests.cpp

Repository: NVIDIA/cuopt

Length of output: 3132


CSR matrix comparison in is_equivalent() is incomplete but acceptable for current roundtrip testing.

The method only checks CSR matrix size (non-zero count), not actual values, despite implementing permutation-based comparison for all other arrays. While this could theoretically miss matrices with identical sparsity but different values, it is sufficient for the intended use case: MPS file roundtrip testing with deterministic read/write behavior. The limitation is acknowledged in the code comment. If this method is used in the future for general problem equivalence validation outside CI tests, add explicit CSR value comparison with row/column permutation support.
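If a value-wise comparison is added later, it could follow this rough shape (sketch only; A_values/A_cols/A_offsets and the o_* arrays stand for host copies of the two matrices, and var_perm/row_perm for the permutations built earlier).

// Hedged sketch: compare CSR rows value-by-value under the row/column permutations.
// Array names are placeholders for host copies of the two problems' matrices.
for (i_t i = 0; i < n_constraints_; ++i) {
  i_t j = row_perm[i];
  // Gather this row's entries keyed by the column index they map to in 'other'.
  std::unordered_map<i_t, f_t> row_entries;
  for (i_t k = A_offsets[i]; k < A_offsets[i + 1]; ++k) {
    row_entries[var_perm[A_cols[k]]] = A_values[k];
  }
  // Permuted rows must match in nonzero pattern and (within tolerance) in values.
  if (static_cast<i_t>(row_entries.size()) != o_offsets[j + 1] - o_offsets[j]) return false;
  for (i_t k = o_offsets[j]; k < o_offsets[j + 1]; ++k) {
    auto it = row_entries.find(o_cols[k]);
    if (it == row_entries.end() || std::abs(it->second - o_values[k]) > 1e-9) return false;
  }
}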

🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/cpu_optimization_problem.cpp` around lines 763 -
768, The CSR matrix comparison in is_equivalent() only checks A_.size() against
other_A_values.size() which can miss differing nonzero values; update
is_equivalent() to perform a full CSR value comparison (respecting row/column
permutations used elsewhere) or, if keeping the simplified check for now, make
the comment explicit and add a TODO and an assertion guard for intended CI-only
use: modify is_equivalent() to either implement value-wise comparison of A_
against other_A_values using the same permutations applied to row_ptr/col_idx,
or add a clear TODO and runtime assertion/guard that this function is only used
for MPS roundtrip tests; reference symbols: is_equivalent(), A_, other_A_values,
and the permutation arrays used earlier in the function.

Comment on lines 23 to 74
 struct problem_and_stream_view_t {
-  problem_and_stream_view_t()
-    : op_problem(nullptr), stream_view(rmm::cuda_stream_per_thread), handle(stream_view)
+  problem_and_stream_view_t(problem_backend_t backend)
+    : backend_type(backend), stream_view_ptr(nullptr), handle_ptr(nullptr)
   {
+    if (backend == problem_backend_t::GPU) {
+      // GPU backend: Allocate CUDA resources
+      stream_view_ptr = new rmm::cuda_stream_view(rmm::cuda_stream_per_thread);
+      handle_ptr = new raft::handle_t(*stream_view_ptr);
+      gpu_problem = new gpu_optimization_problem_t<cuopt_int_t, cuopt_float_t>(handle_ptr);
+      cpu_problem = nullptr;
+    } else {
+      // CPU backend: No CUDA resources allocated (for remote execution on CPU-only hosts)
+      cpu_problem = new cpu_optimization_problem_t<cuopt_int_t, cuopt_float_t>(nullptr);
+      gpu_problem = nullptr;
+    }
   }
-  raft::handle_t* get_handle_ptr() { return &handle; }
-  optimization_problem_t<cuopt_int_t, cuopt_float_t>* op_problem;
-  rmm::cuda_stream_view stream_view;
-  raft::handle_t handle;

+  ~problem_and_stream_view_t()
+  {
+    if (gpu_problem) delete gpu_problem;
+    if (cpu_problem) delete cpu_problem;
+    if (handle_ptr) delete handle_ptr;
+    if (stream_view_ptr) delete stream_view_ptr;
+  }
+
+  raft::handle_t* get_handle_ptr() { return handle_ptr; }
+
+  optimization_problem_interface_t<cuopt_int_t, cuopt_float_t>* get_problem()
+  {
+    return backend_type == problem_backend_t::GPU
+             ? static_cast<optimization_problem_interface_t<cuopt_int_t, cuopt_float_t>*>(
+                 gpu_problem)
+             : static_cast<optimization_problem_interface_t<cuopt_int_t, cuopt_float_t>*>(
+                 cpu_problem);
+  }
+
+  optimization_problem_t<cuopt_int_t, cuopt_float_t> to_optimization_problem()
+  {
+    if (backend_type == problem_backend_t::GPU) {
+      return gpu_problem->to_optimization_problem();
+    } else {
+      return cpu_problem->to_optimization_problem();
+    }
+  }
+
+  problem_backend_t backend_type;
+  gpu_optimization_problem_t<cuopt_int_t, cuopt_float_t>* gpu_problem;
+  cpu_optimization_problem_t<cuopt_int_t, cuopt_float_t>* cpu_problem;
+  rmm::cuda_stream_view* stream_view_ptr;  // nullptr for CPU backend to avoid CUDA initialization
+  raft::handle_t* handle_ptr;  // nullptr for CPU backend to avoid CUDA initialization
+};

⚠️ Potential issue | 🟠 Major

Missing Rule of Five: potential double-free on copy/move.

problem_and_stream_view_t manages raw owning pointers but lacks copy/move constructors and assignment operators. If accidentally copied, the destructor will double-free resources.

Consider deleting copy operations and implementing move semantics:

🛡️ Proposed fix
 struct problem_and_stream_view_t {
   problem_and_stream_view_t(problem_backend_t backend)
     : backend_type(backend), stream_view_ptr(nullptr), handle_ptr(nullptr)
   {
     // ... existing constructor body
   }
+
+  // Prevent accidental copies (would double-free)
+  problem_and_stream_view_t(const problem_and_stream_view_t&) = delete;
+  problem_and_stream_view_t& operator=(const problem_and_stream_view_t&) = delete;
+
+  // Allow moves
+  problem_and_stream_view_t(problem_and_stream_view_t&& other) noexcept
+    : backend_type(other.backend_type),
+      gpu_problem(other.gpu_problem),
+      cpu_problem(other.cpu_problem),
+      stream_view_ptr(other.stream_view_ptr),
+      handle_ptr(other.handle_ptr)
+  {
+    other.gpu_problem = nullptr;
+    other.cpu_problem = nullptr;
+    other.stream_view_ptr = nullptr;
+    other.handle_ptr = nullptr;
+  }
+  problem_and_stream_view_t& operator=(problem_and_stream_view_t&&) = delete;
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/cuopt_c_internal.hpp` around lines 23 - 73, The
struct problem_and_stream_view_t currently owns raw pointers (gpu_problem,
cpu_problem, stream_view_ptr, handle_ptr) but lacks copy/move special members,
risking double-free on copy; delete the copy constructor and copy assignment
operator and implement a move constructor and move assignment that transfer
ownership by moving gpu_problem, cpu_problem, stream_view_ptr, and handle_ptr
from the source to *this and set the source pointers to nullptr, ensuring the
destructor (already null-safe) will only free moved-in resources; keep
get_handle_ptr, get_problem, and to_optimization_problem behavior intact after
the move.

Comment on lines +756 to +847
template <typename i_t, typename f_t>
bool gpu_optimization_problem_t<i_t, f_t>::is_equivalent(
const optimization_problem_interface_t<i_t, f_t>& other) const
{
// Compare scalar properties
if (maximize_ != other.get_sense()) return false;
if (n_vars_ != other.get_n_variables()) return false;
if (n_constraints_ != other.get_n_constraints()) return false;
if (objective_scaling_factor_ != other.get_objective_scaling_factor()) return false;
if (objective_offset_ != other.get_objective_offset()) return false;
if (problem_category_ != other.get_problem_category()) return false;

// Get host data from both problems
auto this_c = get_objective_coefficients_host();
auto other_c = other.get_objective_coefficients_host();
if (this_c.size() != other_c.size()) return false;

auto this_var_lb = get_variable_lower_bounds_host();
auto other_var_lb = other.get_variable_lower_bounds_host();
if (this_var_lb.size() != other_var_lb.size()) return false;

auto this_var_ub = get_variable_upper_bounds_host();
auto other_var_ub = other.get_variable_upper_bounds_host();
if (this_var_ub.size() != other_var_ub.size()) return false;

auto this_var_types = get_variable_types_host();
auto other_var_types = other.get_variable_types_host();
if (this_var_types.size() != other_var_types.size()) return false;

auto this_b = get_constraint_bounds_host();
auto other_b = other.get_constraint_bounds_host();
if (this_b.size() != other_b.size()) return false;

auto this_A_values = get_constraint_matrix_values_host();
auto other_A_values = other.get_constraint_matrix_values_host();
if (this_A_values.size() != other_A_values.size()) return false;

// Check if we have variable and row names for permutation matching
const auto& this_var_names = get_variable_names();
const auto& other_var_names = other.get_variable_names();
const auto& this_row_names = get_row_names();
const auto& other_row_names = other.get_row_names();

if (this_var_names.empty() || other_var_names.empty()) return false;
if (this_row_names.empty() || other_row_names.empty()) return false;

// Build variable permutation map
std::unordered_map<std::string, i_t> other_var_idx;
for (size_t j = 0; j < other_var_names.size(); ++j) {
other_var_idx[other_var_names[j]] = static_cast<i_t>(j);
}

std::vector<i_t> var_perm(n_vars_);
for (i_t i = 0; i < n_vars_; ++i) {
auto it = other_var_idx.find(this_var_names[i]);
if (it == other_var_idx.end()) return false;
var_perm[i] = it->second;
}

// Build row permutation map
std::unordered_map<std::string, i_t> other_row_idx;
for (size_t j = 0; j < other_row_names.size(); ++j) {
other_row_idx[other_row_names[j]] = static_cast<i_t>(j);
}

std::vector<i_t> row_perm(n_constraints_);
for (i_t i = 0; i < n_constraints_; ++i) {
auto it = other_row_idx.find(this_row_names[i]);
if (it == other_row_idx.end()) return false;
row_perm[i] = it->second;
}

// Compare variable-indexed arrays with permutation
for (i_t i = 0; i < n_vars_; ++i) {
i_t j = var_perm[i];
if (std::abs(this_c[i] - other_c[j]) > 1e-9) return false;
if (std::abs(this_var_lb[i] - other_var_lb[j]) > 1e-9) return false;
if (std::abs(this_var_ub[i] - other_var_ub[j]) > 1e-9) return false;
if (this_var_types[i] != other_var_types[j]) return false;
}

// Compare constraint-indexed arrays with permutation
for (i_t i = 0; i < n_constraints_; ++i) {
i_t j = row_perm[i];
if (std::abs(this_b[i] - other_b[j]) > 1e-9) return false;
}

// For CSR matrix, we'd need more complex comparison - for now just check size matches
// A full implementation would need to compare matrix entries with row/column permutations
if (this_A_values.size() != other_A_values.size()) return false;


⚠️ Potential issue | 🟠 Major

is_equivalent can return false positives/negatives.

Two issues:

  1. It returns false when names are missing, even if data is identical (no fallback to direct-order comparison).
  2. CSR matrices are only checked for size equality, so different matrices can compare as equivalent.

Please add a direct-order fallback when names are absent and compare CSR values/indices/offsets (with permutation when names exist).
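A small way to get the fallback is to build identity permutations when names are absent and reuse the existing permuted comparison loops unchanged (sketch only; requires <numeric> for std::iota).

// Hedged sketch: direct-order comparison becomes the names-missing special case of
// the permuted comparison by filling identity permutations.
std::vector<i_t> var_perm(n_vars_), row_perm(n_constraints_);
const bool have_names = !this_var_names.empty() && !other_var_names.empty() &&
                        !this_row_names.empty() && !other_row_names.empty();
if (have_names) {
  // build name-based var_perm / row_perm exactly as the current implementation does
} else {
  std::iota(var_perm.begin(), var_perm.end(), i_t{0});
  std::iota(row_perm.begin(), row_perm.end(), i_t{0});
}
// ...then run the existing permuted comparisons (and a full CSR check) in both cases.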

🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/gpu_optimization_problem.cu` around lines 756 -
846, The is_equivalent implementation must (1) fallback to direct-order
comparison when variable/row names are missing instead of returning false, and
(2) actually compare CSR matrix data instead of only sizes. Update
gpu_optimization_problem_t<i_t,f_t>::is_equivalent to: if get_variable_names()
or get_row_names() are empty, skip building var_perm/row_perm and compare
objective coefficients, bounds, types and full CSR arrays in direct index order;
otherwise build var_perm and row_perm as now and use them to permute columns and
rows when comparing the CSR components returned by
get_constraint_matrix_values_host(), get_constraint_matrix_col_indices_host(),
and get_constraint_matrix_row_offsets_host() (compare values with fabs tolerance
and exact match for indices/offsets/types), and ensure you also check
sizes/offsets consistently to avoid out-of-bounds when permuting.

Comment on lines +620 to +684
template <typename i_t, typename f_t>
void optimization_problem_t<i_t, f_t>::set_csr_constraint_matrix_move(
rmm::device_uvector<f_t>&& A_values,
rmm::device_uvector<i_t>&& A_indices,
rmm::device_uvector<i_t>&& A_offsets)
{
A_ = std::move(A_values);
A_indices_ = std::move(A_indices);
A_offsets_ = std::move(A_offsets);
n_constraints_ = A_offsets_.size() - 1;
}

template <typename i_t, typename f_t>
void optimization_problem_t<i_t, f_t>::set_constraint_bounds_move(rmm::device_uvector<f_t>&& b)
{
b_ = std::move(b);
}

template <typename i_t, typename f_t>
void optimization_problem_t<i_t, f_t>::set_objective_coefficients_move(rmm::device_uvector<f_t>&& c)
{
c_ = std::move(c);
n_vars_ = c_.size();
}

template <typename i_t, typename f_t>
void optimization_problem_t<i_t, f_t>::set_variable_lower_bounds_move(
rmm::device_uvector<f_t>&& variable_lower_bounds)
{
variable_lower_bounds_ = std::move(variable_lower_bounds);
}

template <typename i_t, typename f_t>
void optimization_problem_t<i_t, f_t>::set_variable_upper_bounds_move(
rmm::device_uvector<f_t>&& variable_upper_bounds)
{
variable_upper_bounds_ = std::move(variable_upper_bounds);
}

template <typename i_t, typename f_t>
void optimization_problem_t<i_t, f_t>::set_variable_types_move(
rmm::device_uvector<var_t>&& variable_types)
{
variable_types_ = std::move(variable_types);
}

template <typename i_t, typename f_t>
void optimization_problem_t<i_t, f_t>::set_constraint_lower_bounds_move(
rmm::device_uvector<f_t>&& constraint_lower_bounds)
{
constraint_lower_bounds_ = std::move(constraint_lower_bounds);
}

template <typename i_t, typename f_t>
void optimization_problem_t<i_t, f_t>::set_constraint_upper_bounds_move(
rmm::device_uvector<f_t>&& constraint_upper_bounds)
{
constraint_upper_bounds_ = std::move(constraint_upper_bounds);
}

template <typename i_t, typename f_t>
void optimization_problem_t<i_t, f_t>::set_row_types_move(rmm::device_uvector<char>&& row_types)
{
row_types_ = std::move(row_types);
}

⚠️ Potential issue | 🟠 Major

Update size/category metadata in the move-based setters.
Several move setters don't update n_vars_ / n_constraints_ (and set_variable_types_move doesn't recompute problem_category_). If callers rely on the move-based API alone, metadata can stay stale and skew bounds checks or LP/MIP/IP classification.

🛠️ Proposed fix
 void optimization_problem_t<i_t, f_t>::set_constraint_bounds_move(rmm::device_uvector<f_t>&& b)
 {
-  b_ = std::move(b);
+  b_ = std::move(b);
+  n_constraints_ = b_.size();
 }

 void optimization_problem_t<i_t, f_t>::set_variable_lower_bounds_move(
   rmm::device_uvector<f_t>&& variable_lower_bounds)
 {
-  variable_lower_bounds_ = std::move(variable_lower_bounds);
+  variable_lower_bounds_ = std::move(variable_lower_bounds);
+  n_vars_ = variable_lower_bounds_.size();
 }

 void optimization_problem_t<i_t, f_t>::set_variable_upper_bounds_move(
   rmm::device_uvector<f_t>&& variable_upper_bounds)
 {
-  variable_upper_bounds_ = std::move(variable_upper_bounds);
+  variable_upper_bounds_ = std::move(variable_upper_bounds);
+  n_vars_ = variable_upper_bounds_.size();
 }

 void optimization_problem_t<i_t, f_t>::set_variable_types_move(
   rmm::device_uvector<var_t>&& variable_types)
 {
-  variable_types_ = std::move(variable_types);
+  variable_types_ = std::move(variable_types);
+  n_vars_ = variable_types_.size();
+  i_t n_integer = thrust::count_if(handle_ptr_->get_thrust_policy(),
+                                   variable_types_.begin(),
+                                   variable_types_.end(),
+                                   [] __device__(auto val) { return val == var_t::INTEGER; });
+  if (n_integer == n_vars_) {
+    problem_category_ = problem_category_t::IP;
+  } else if (n_integer > 0) {
+    problem_category_ = problem_category_t::MIP;
+  }
 }

 void optimization_problem_t<i_t, f_t>::set_constraint_lower_bounds_move(
   rmm::device_uvector<f_t>&& constraint_lower_bounds)
 {
-  constraint_lower_bounds_ = std::move(constraint_lower_bounds);
+  constraint_lower_bounds_ = std::move(constraint_lower_bounds);
+  n_constraints_ = constraint_lower_bounds_.size();
 }

 void optimization_problem_t<i_t, f_t>::set_constraint_upper_bounds_move(
   rmm::device_uvector<f_t>&& constraint_upper_bounds)
 {
-  constraint_upper_bounds_ = std::move(constraint_upper_bounds);
+  constraint_upper_bounds_ = std::move(constraint_upper_bounds);
+  n_constraints_ = constraint_upper_bounds_.size();
 }

 void optimization_problem_t<i_t, f_t>::set_row_types_move(rmm::device_uvector<char>&& row_types)
 {
-  row_types_ = std::move(row_types);
+  row_types_ = std::move(row_types);
+  n_constraints_ = row_types_.size();
 }
As per coding guidelines: "Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)."
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/optimization_problem.cu` around lines 620 - 684,
Several move-setters leave metadata stale: ensure setters that change sizes also
update n_vars_ / n_constraints_ and that set_variable_types_move also recomputes
problem_category_. Specifically, keep the existing n_constraints_ update in
set_csr_constraint_matrix_move, add n_constraints_ updates to any other
constraint-buffer setters if they change A_offsets_, set n_vars_ =
variable_types_.size() (or at least ensure n_vars_ matches
c_.size()/variable_types_.size()) inside set_variable_types_move, and after
assigning variable_types_ recompute problem_category_ using the same logic/path
used elsewhere when variable types are set (i.e., mirror the classification code
used when building the problem from non-move APIs so LP/MIP/IP category and
bounds checks remain consistent). Ensure each move-based setter that affects
counts mirrors the non-move setter behaviors (use the same symbols: n_vars_,
n_constraints_, variable_types_, problem_category_).

Comment on lines +151 to +169
// Copy constraint bounds
auto constraint_lb = gpu_problem.get_constraint_lower_bounds_host();
auto constraint_ub = gpu_problem.get_constraint_upper_bounds_host();
if (!constraint_lb.empty()) {
cpu_problem.set_constraint_lower_bounds(constraint_lb.data(), constraint_lb.size());
}
if (!constraint_ub.empty()) {
cpu_problem.set_constraint_upper_bounds(constraint_ub.data(), constraint_ub.size());
}

// Copy variable bounds
auto var_lb = gpu_problem.get_variable_lower_bounds_host();
auto var_ub = gpu_problem.get_variable_upper_bounds_host();
if (!var_lb.empty()) { cpu_problem.set_variable_lower_bounds(var_lb.data(), var_lb.size()); }
if (!var_ub.empty()) { cpu_problem.set_variable_upper_bounds(var_ub.data(), var_ub.size()); }

// Copy variable types
auto var_types = gpu_problem.get_variable_types_host();
if (!var_types.empty()) { cpu_problem.set_variable_types(var_types.data(), var_types.size()); }

⚠️ Potential issue | 🟠 Major

Missing row_types and constraint_bounds in GPU-to-CPU conversion.

The GPU problem conversion copies constraint_lower_bounds and constraint_upper_bounds but does not copy:

  1. row_types (constraint sense: 'E', 'L', 'G')
  2. constraint_bounds (the 'b' vector / RHS for non-ranged constraints)

For problems using row_types-based constraint specification (non-ranged), the converted CPU problem will be missing essential constraint information.

🐛 Proposed fix
   // Copy constraint bounds
   auto constraint_lb = gpu_problem.get_constraint_lower_bounds_host();
   auto constraint_ub = gpu_problem.get_constraint_upper_bounds_host();
   if (!constraint_lb.empty()) {
     cpu_problem.set_constraint_lower_bounds(constraint_lb.data(), constraint_lb.size());
   }
   if (!constraint_ub.empty()) {
     cpu_problem.set_constraint_upper_bounds(constraint_ub.data(), constraint_ub.size());
   }
+
+  // Copy row types (constraint sense)
+  auto row_types = gpu_problem.get_row_types_host();
+  if (!row_types.empty()) {
+    cpu_problem.set_row_types(row_types.data(), row_types.size());
+  }
+
+  // Copy constraint bounds (RHS for non-ranged problems)
+  auto constraint_bounds = gpu_problem.get_constraint_bounds_host();
+  if (!constraint_bounds.empty()) {
+    cpu_problem.set_constraint_bounds(constraint_bounds.data(), constraint_bounds.size());
+  }
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/solve_remote.cu` around lines 151 - 169, The
GPU->CPU conversion is missing copying of row types and constraint bounds; call
the GPU getters (e.g., gpu_problem.get_row_types_host() and
gpu_problem.get_constraint_bounds_host()) and, if non-empty, pass their data and
sizes into the corresponding CPU setters (e.g., cpu_problem.set_row_types(...,
...) and cpu_problem.set_constraint_bounds(..., ...)); place these after the
existing constraint bounds copy so the CPU problem receives the row_types
(constraint sense 'E'/'L'/'G') and the RHS b-vector for non-ranged constraints.

Comment on lines +128 to +131
cdef extern from "<rmm/device_buffer.hpp>" namespace "rmm":
cdef cppclass device_buffer:
pass
@coderabbitai coderabbitai bot Feb 1, 2026

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n python/cuopt/cuopt/linear_programming/solver/solver.pxd | head -50

Repository: NVIDIA/cuopt

Length of output: 2534


🏁 Script executed:

cat -n python/cuopt/cuopt/linear_programming/solver/solver.pxd | sed -n '120,145p'

Repository: NVIDIA/cuopt

Length of output: 1316


🏁 Script executed:

# Check for all imports and declarations related to device_buffer
rg -n "device_buffer" python/cuopt/cuopt/linear_programming/solver/solver.pxd -B2 -A2

Repository: NVIDIA/cuopt

Length of output: 1598


🏁 Script executed:

# Search for any Cython build output or compilation warnings related to device_buffer
find python/cuopt -name "*.pyx" -o -name "*.pxd" | xargs grep -l "device_buffer" | head -10

Repository: NVIDIA/cuopt

Length of output: 452


🏁 Script executed:

# Check if there are any build files or setup.py that might indicate compilation issues
cat -n python/cuopt/setup.py 2>/dev/null | head -100 || echo "setup.py not found in expected location"

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Verify the RMM binding definition to understand what line 16 imports
python3 -c "import sys; print('Python available')" 2>/dev/null && rg -n "device_buffer" --type=pyx --type=pxd | grep -E "(class|cdef)" | head -20

Repository: NVIDIA/cuopt

Length of output: 117


🏁 Script executed:

# Check how device_buffer is declared in another .pxd file
cat -n python/cuopt/cuopt/distance_engine/waypoint_matrix.pxd | grep -A5 -B5 "device_buffer"

Repository: NVIDIA/cuopt

Length of output: 958


🏁 Script executed:

# Check routing_utilities.pxd as well
cat -n python/cuopt/cuopt/routing/structure/routing_utilities.pxd | grep -A5 -B5 "device_buffer"

Repository: NVIDIA/cuopt

Length of output: 2727


🏁 Script executed:

# Check if the solver.pxd pattern of having both cimport AND extern declaration is intentional
# by examining the actual C++ header that's being included
rg -n "class device_buffer" --type=cpp --type=hpp

Repository: NVIDIA/cuopt

Length of output: 83


Remove redundant device_buffer declaration on lines 128-130.

Line 16 already imports device_buffer from rmm.librmm.device_buffer via Cython's cimport statement. The cdef extern from block redeclaring it from the C++ header is redundant and inconsistent with the pattern used in other .pxd files in the codebase (e.g., waypoint_matrix.pxd, routing_utilities.pxd). This redundancy can cause symbol conflicts during compilation.

Delete lines 128-130 and rely solely on the import on line 16.

🤖 Prompt for AI Agents
In `@python/cuopt/cuopt/linear_programming/solver/solver.pxd` around lines 128 -
130, Remove the redundant C++ redeclaration of device_buffer: delete the entire
cdef extern from "<rmm/device_buffer.hpp>" namespace "rmm": block that defines
cppclass device_buffer and rely on the existing cimport of device_buffer (from
rmm.librmm.device_buffer on line 16) so there is a single consistent declaration
across .pxd files and no symbol conflicts.

Collaborator

@tmckayus this is a valid concern


Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Contributor Author

yes will implement, I need to take a pass over CodeRabbit comments

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@cpp/src/linear_programming/cpu_optimization_problem.cpp`:
- Around line 638-644: The code only writes variable_upper_bounds_ when
variable_lower_bounds_ is non-empty; change the guards so each bound array is
checked independently: call data_model_view.set_variable_lower_bounds(...)
inside an if (!variable_lower_bounds_.empty()) block and call
data_model_view.set_variable_upper_bounds(...) inside a separate if
(!variable_upper_bounds_.empty()) block, referencing the existing symbols
variable_lower_bounds_, variable_upper_bounds_, and
data_model_view.set_variable_lower_bounds /
data_model_view.set_variable_upper_bounds to locate and update the logic.
- Around line 90-107: The parameter validate_positive_semi_definite on
cpu_optimization_problem_t<i_t,f_t>::set_quadratic_objective_matrix is currently
ignored; either implement PSD validation or explicitly mark it unused and
document the decision — to be minimal, add a single-line use
(void)validate_positive_semi_definite; plus a short comment inside
set_quadratic_objective_matrix noting the CPU backend does not perform PSD
validation (or alternatively implement a PSD check by converting
Q_values/Q_indices/Q_offsets to a dense symmetric matrix and verifying
non-negative eigenvalues via a helper like
validate_matrix_positive_semi_definite and call it from
set_quadratic_objective_matrix).
- Line 30: Remove the debug stderr print from the cpu_optimization_problem_t
constructor (the fprintf(stderr, ...) call) and either delete it or replace it
with the project's logging macro/function from <utilities/logger.hpp> (use the
appropriate debug/info logger such as LOG_DEBUG or equivalent) so construction
no longer writes to stderr; locate the call inside the
cpu_optimization_problem_t constructor in cpu_optimization_problem.cpp and
update it accordingly.
- Around line 43-60: The CPU setter
cpu_optimization_problem_t::set_csr_constraint_matrix currently dereferences
A_values/A_indices/A_offsets without null checks; add cuopt_expects-style
validations so that when size_values>0 you assert A_values != nullptr, when
size_indices>0 assert A_indices != nullptr, and when size_offsets>0 assert
A_offsets != nullptr (mirror the GPU optimization_problem_t checks), then
proceed to resize/copy; apply the same pattern to set_constraint_bounds,
set_objective_coefficients, and set_quadratic_objective_matrix for their
pointer/size pairs; finally address the unused validate_positive_semi_definite
parameter on set_quadratic_objective_matrix by either removing it or
implementing the intended validation branch (choose one and make the
signature/implementation consistent).
🧹 Nitpick comments (1)
cpp/src/linear_programming/cpu_optimization_problem.cpp (1)

749-761: Consider using relative tolerance for large-magnitude values.

The hardcoded 1e-9 absolute tolerance works for typical problem values but may cause false negatives for very large coefficients (e.g., comparing 1e12 vs 1e12 + 1e-9 would incorrectly pass, while 1e12 vs 1e12 * (1 + 1e-15) might incorrectly fail). For MPS roundtrip testing this is likely acceptable, but a combined relative/absolute tolerance would be more robust.
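
For reference, a minimal sketch of the combined relative/absolute comparison suggested here; the tolerance values are placeholders, not project defaults.

💡 Illustrative sketch (placeholder tolerances)
#include <algorithm>
#include <cmath>

// True when a and b agree within an absolute tolerance (for values near zero)
// or a relative tolerance (for large magnitudes), whichever is looser.
inline bool nearly_equal(double a, double b,
                         double abs_tol = 1e-9, double rel_tol = 1e-12)
{
  const double scale = std::max(std::fabs(a), std::fabs(b));
  return std::fabs(a - b) <= std::max(abs_tol, rel_tol * scale);
}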

cpu_optimization_problem_t<i_t, f_t>::cpu_optimization_problem_t(raft::handle_t const* handle_ptr)
: handle_ptr_(handle_ptr)
{
fprintf(stderr, "cpu_optimization_problem_t constructor: Using CPU backend\n");

⚠️ Potential issue | 🟡 Minor

Remove debug print statement from constructor.

This fprintf(stderr, ...) will output to stderr on every object construction, polluting user output. Since <utilities/logger.hpp> is already included, use the project's logging facility instead, or remove this debug statement entirely for production code.

🧹 Proposed fix
 cpu_optimization_problem_t<i_t, f_t>::cpu_optimization_problem_t(raft::handle_t const* handle_ptr)
   : handle_ptr_(handle_ptr)
 {
-  fprintf(stderr, "cpu_optimization_problem_t constructor: Using CPU backend\n");
 }
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/cpu_optimization_problem.cpp` at line 30, Remove
the debug stderr print from the cpu_optimization_problem_t constructor (the
fprintf(stderr, ...) call) and either delete it or replace it with the project's
logging macro/function from <utilities/logger.hpp> (use the appropriate
debug/info logger such as LOG_DEBUG or equivalent) so construction no longer
writes to stderr; locate the call inside the cpu_optimization_problem_t
constructor in cpu_optimization_problem.cpp and update it accordingly.

Comment on lines +43 to +61
template <typename i_t, typename f_t>
void cpu_optimization_problem_t<i_t, f_t>::set_csr_constraint_matrix(const f_t* A_values,
i_t size_values,
const i_t* A_indices,
i_t size_indices,
const i_t* A_offsets,
i_t size_offsets)
{
n_constraints_ = size_offsets - 1;

A_.resize(size_values);
A_indices_.resize(size_indices);
A_offsets_.resize(size_offsets);

std::copy(A_values, A_values + size_values, A_.begin());
std::copy(A_indices, A_indices + size_indices, A_indices_.begin());
std::copy(A_offsets, A_offsets + size_offsets, A_offsets_.begin());
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, find the actual file and examine the full implementation
cd /tmp
find . -name "cpu_optimization_problem.cpp" -o -name "cpu_optimization_problem.hpp" 2>/dev/null | head -20

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Search for the file in the repository
git ls-files | grep -E "(cpu_optimization_problem|optimization_problem)" | head -20

Repository: NVIDIA/cuopt

Length of output: 719


🏁 Script executed:

# Check how set_csr_constraint_matrix is called
rg "set_csr_constraint_matrix" -A 3 -B 3 --type cpp --type cu

Repository: NVIDIA/cuopt

Length of output: 82


🏁 Script executed:

# Examine the full implementation of the cpu_optimization_problem class
fd "cpu_optimization_problem" --type f

Repository: NVIDIA/cuopt

Length of output: 186


🏁 Script executed:

# Search for calls to set_csr_constraint_matrix without specifying cu type
rg "set_csr_constraint_matrix" -A 3 -B 3 --type cpp

Repository: NVIDIA/cuopt

Length of output: 16177


🏁 Script executed:

# Read the implementation file to see current state
cat -n cpp/src/linear_programming/cpu_optimization_problem.cpp | head -100

Repository: NVIDIA/cuopt

Length of output: 4013


🏁 Script executed:

# Find and read the header file
fd "cpu_optimization_problem" -e hpp

Repository: NVIDIA/cuopt

Length of output: 130


🏁 Script executed:

# Read the header file
cat cpp/include/cuopt/linear_programming/cpu_optimization_problem.hpp | head -150

Repository: NVIDIA/cuopt

Length of output: 153


🏁 Script executed:

# Look for the actual header file with correct search
fd "cpu_optimization_problem" -e hpp -e h | head -20

Repository: NVIDIA/cuopt

Length of output: 130


🏁 Script executed:

# Examine more of the cpp file, especially around set_quadratic_objective_matrix
cat -n cpp/src/linear_programming/cpu_optimization_problem.cpp | sed -n '90,110p'

Repository: NVIDIA/cuopt

Length of output: 908


🏁 Script executed:

# Check cuopt_c.cpp to see how it validates inputs before calling set_csr_constraint_matrix
rg -n "set_csr_constraint_matrix" -B 15 -A 5 cpp/src/linear_programming/cuopt_c.cpp | head -80

Repository: NVIDIA/cuopt

Length of output: 4964


🏁 Script executed:

# Look at optimization_problem_utils to understand the safe usage patterns
rg -n "set_csr_constraint_matrix" -B 5 -A 5 cpp/include/cuopt/linear_programming/optimization_problem_utils.hpp

Repository: NVIDIA/cuopt

Length of output: 1563


🏁 Script executed:

# Check other setters in cpu_optimization_problem.cpp for validation patterns
cat -n cpp/src/linear_programming/cpu_optimization_problem.cpp | sed -n '62,90p'

Repository: NVIDIA/cuopt

Length of output: 1070


🏁 Script executed:

# Check if optimization_problem.cu (GPU version) has validation
rg "set_csr_constraint_matrix" -A 10 cpp/src/linear_programming/optimization_problem.cu | head -40

Repository: NVIDIA/cuopt

Length of output: 1744


🏁 Script executed:

# Check if other public setters in the C++ interface validate
rg "set_variable_lower_bounds|set_constraint_bounds|set_objective_coefficients" -A 5 cpp/src/linear_programming/cpu_optimization_problem.cpp | head -50

Repository: NVIDIA/cuopt

Length of output: 1822


🏁 Script executed:

# Verify the validate_positive_semi_definite parameter is ignored
rg "validate_positive_semi_definite" cpp/src/linear_programming/cpu_optimization_problem.cpp

Repository: NVIDIA/cuopt

Length of output: 95


🏁 Script executed:

# Check which class is actually used as the public API - check cpp/include headers
rg "class optimization_problem_t" -A 5 cpp/include/cuopt/linear_programming/optimization_problem.hpp | head -30

Repository: NVIDIA/cuopt

Length of output: 450


🏁 Script executed:

# Check the interface class to understand the design
cat -n cpp/include/cuopt/linear_programming/optimization_problem_interface.hpp | sed -n '1,100p'

Repository: NVIDIA/cuopt

Length of output: 4452


🏁 Script executed:

# Verify: does optimization_problem_t in optimization_problem.cu actually check A_indices and A_offsets too?
rg "set_csr_constraint_matrix" -A 20 cpp/src/linear_programming/optimization_problem.cu | grep -A 20 "^cpp"

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Check when this code was added - was cpu version ever meant to be production?
git log --oneline --follow cpp/src/linear_programming/cpu_optimization_problem.cpp | head -10

Repository: NVIDIA/cuopt

Length of output: 134


🏁 Script executed:

# Check if A_indices and A_offsets are also validated in GPU version
rg "set_csr_constraint_matrix" -A 30 cpp/src/linear_programming/optimization_problem.cu | head -50

Repository: NVIDIA/cuopt

Length of output: 2144


🏁 Script executed:

# Verify the validate_positive_semi_definite is also ignored in GPU version
rg "set_quadratic_objective_matrix" -A 20 cpp/src/linear_programming/optimization_problem.cu | grep -A 15 "validate_positive"

Repository: NVIDIA/cuopt

Length of output: 566


🏁 Script executed:

# Check if there's documentation about intended use of cpu_optimization_problem_t
rg -i "cpu_optimization_problem" cpp/include/ | head -20

Repository: NVIDIA/cuopt

Length of output: 727


Add null pointer validation matching the GPU implementation.

The GPU implementation (optimization_problem_t in optimization_problem.cu) validates pointer arguments using cuopt_expects(), but the CPU implementation lacks these checks. All three pointers (A_values, A_indices, A_offsets) can be dereferenced without validation, creating inconsistency across implementations of the same interface.

Update set_csr_constraint_matrix and other setters to match the GPU validation pattern:

  • Check A_values, A_indices, A_offsets for nullptr when their respective sizes > 0
  • Apply same pattern to set_constraint_bounds, set_objective_coefficients, and set_quadratic_objective_matrix

Additionally, the validate_positive_semi_definite parameter in set_quadratic_objective_matrix is accepted but never used; clarify intent or remove it.

🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/cpu_optimization_problem.cpp` around lines 43 -
60, The CPU setter cpu_optimization_problem_t::set_csr_constraint_matrix
currently dereferences A_values/A_indices/A_offsets without null checks; add
cuopt_expects-style validations so that when size_values>0 you assert A_values
!= nullptr, when size_indices>0 assert A_indices != nullptr, and when
size_offsets>0 assert A_offsets != nullptr (mirror the GPU
optimization_problem_t checks), then proceed to resize/copy; apply the same
pattern to set_constraint_bounds, set_objective_coefficients, and
set_quadratic_objective_matrix for their pointer/size pairs; finally address the
unused validate_positive_semi_definite parameter on
set_quadratic_objective_matrix by either removing it or implementing the
intended validation branch (choose one and make the signature/implementation
consistent).
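
For illustration, a sketch of the validation pattern this comment asks for; the exact cuopt_expects() signature and error enum are assumptions modeled on the cuopt::error_type_t::ValidationError usage shown elsewhere in this PR.

💡 Illustrative sketch (assumed cuopt_expects signature)
template <typename i_t, typename f_t>
void cpu_optimization_problem_t<i_t, f_t>::set_csr_constraint_matrix(
  const f_t* A_values, i_t size_values,
  const i_t* A_indices, i_t size_indices,
  const i_t* A_offsets, i_t size_offsets)
{
  // Mirror the GPU setter: reject null pointers when a non-zero size is given
  cuopt_expects(size_values == 0 || A_values != nullptr,
                error_type_t::ValidationError, "A_values cannot be null");
  cuopt_expects(size_indices == 0 || A_indices != nullptr,
                error_type_t::ValidationError, "A_indices cannot be null");
  cuopt_expects(size_offsets == 0 || A_offsets != nullptr,
                error_type_t::ValidationError, "A_offsets cannot be null");

  // Guard against the size_offsets == 0 underflow noted in a later comment
  n_constraints_ = size_offsets > 0 ? size_offsets - 1 : 0;

  A_.resize(size_values);
  A_indices_.resize(size_indices);
  A_offsets_.resize(size_offsets);

  std::copy(A_values, A_values + size_values, A_.begin());
  std::copy(A_indices, A_indices + size_indices, A_indices_.begin());
  std::copy(A_offsets, A_offsets + size_offsets, A_offsets_.begin());
}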

Comment on lines +90 to +108
template <typename i_t, typename f_t>
void cpu_optimization_problem_t<i_t, f_t>::set_quadratic_objective_matrix(
const f_t* Q_values,
i_t size_values,
const i_t* Q_indices,
i_t size_indices,
const i_t* Q_offsets,
i_t size_offsets,
bool validate_positive_semi_definite)
{
Q_values_.resize(size_values);
Q_indices_.resize(size_indices);
Q_offsets_.resize(size_offsets);

std::copy(Q_values, Q_values + size_values, Q_values_.begin());
std::copy(Q_indices, Q_indices + size_indices, Q_indices_.begin());
std::copy(Q_offsets, Q_offsets + size_offsets, Q_offsets_.begin());
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

validate_positive_semi_definite parameter is ignored.

The parameter validate_positive_semi_definite is accepted but never used. This could mislead callers who expect PSD validation when setting a quadratic objective. Either implement the validation, document that it's intentionally ignored on the CPU backend, or remove the parameter if validation is never needed here.

📝 Proposed fix: Add documentation or use the parameter
 template <typename i_t, typename f_t>
 void cpu_optimization_problem_t<i_t, f_t>::set_quadratic_objective_matrix(
   const f_t* Q_values,
   i_t size_values,
   const i_t* Q_indices,
   i_t size_indices,
   const i_t* Q_offsets,
   i_t size_offsets,
-  bool validate_positive_semi_definite)
+  [[maybe_unused]] bool validate_positive_semi_definite)
 {
+  // Note: PSD validation is deferred to GPU backend during conversion or remote solve
   Q_values_.resize(size_values);
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/cpu_optimization_problem.cpp` around lines 90 -
107, The parameter validate_positive_semi_definite on
cpu_optimization_problem_t<i_t,f_t>::set_quadratic_objective_matrix is currently
ignored; either implement PSD validation or explicitly mark it unused and
document the decision — to be minimal, add a single-line use
(void)validate_positive_semi_definite; plus a short comment inside
set_quadratic_objective_matrix noting the CPU backend does not perform PSD
validation (or alternatively implement a PSD check by converting
Q_values/Q_indices/Q_offsets to a dense symmetric matrix and verifying
non-negative eigenvalues via a helper like
validate_matrix_positive_semi_definite and call it from
set_quadratic_objective_matrix).

Comment on lines +638 to +645
// Set variable bounds
if (!variable_lower_bounds_.empty()) {
data_model_view.set_variable_lower_bounds(variable_lower_bounds_.data(),
variable_lower_bounds_.size());
data_model_view.set_variable_upper_bounds(variable_upper_bounds_.data(),
variable_upper_bounds_.size());
}

⚠️ Potential issue | 🟠 Major

Variable upper bounds incorrectly guarded by lower bounds check.

The upper bounds are only set when lower bounds are non-empty. If a problem has upper bounds defined but no explicit lower bounds, the upper bounds will not be written to the MPS file. Each bounds array should be checked independently.

🐛 Proposed fix
   // Set variable bounds
   if (!variable_lower_bounds_.empty()) {
     data_model_view.set_variable_lower_bounds(variable_lower_bounds_.data(),
                                               variable_lower_bounds_.size());
+  }
+  if (!variable_upper_bounds_.empty()) {
     data_model_view.set_variable_upper_bounds(variable_upper_bounds_.data(),
                                               variable_upper_bounds_.size());
   }
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/cpu_optimization_problem.cpp` around lines 638 -
644, The code only writes variable_upper_bounds_ when variable_lower_bounds_ is
non-empty; change the guards so each bound array is checked independently: call
data_model_view.set_variable_lower_bounds(...) inside an if
(!variable_lower_bounds_.empty()) block and call
data_model_view.set_variable_upper_bounds(...) inside a separate if
(!variable_upper_bounds_.empty()) block, referencing the existing symbols
variable_lower_bounds_, variable_upper_bounds_, and
data_model_view.set_variable_lower_bounds /
data_model_view.set_variable_upper_bounds to locate and update the logic.

@tmckayus tmckayus requested a review from Kh4ster February 1, 2026 21:02
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@cpp/include/cuopt/linear_programming/gpu_optimization_problem_solution.hpp`:
- Around line 333-337: The to_gpu_solution(rmm::cuda_stream_view) implementation
currently moves solution_ but is callable on lvalues, enabling a use-after-move;
change its signature to be rvalue-qualified (add &&) so it can only be invoked
on temporaries, and update the base class method declaration in
lp_solution_interface_t to match (rvalue-qualified) so the override remains
valid; ensure the return type stays optimization_problem_solution_t<i_t,f_t> and
that any callers are adjusted to call on rvalues or use std::move if they
deliberately transfer ownership.

In `@cpp/include/cuopt/linear_programming/optimization_problem_utils.hpp`:
- Around line 189-203: When target is CPU but has_gpu_warmstart_view is true and
has_gpu_warmstart_data is false, the code must first populate the GPU warmstart
buffers from the Python view (the same routine used in the GPU backend branch
that fills solver_settings->get_pdlp_settings().get_pdlp_warm_start_data())
before calling convert_to_cpu_warmstart; modify the CPU-target branch (around
the convert_to_cpu_warmstart call) to detect has_gpu_warmstart_view &&
!has_gpu_warmstart_data and invoke the same view-to-warmstart population logic
that the GPU branch uses, then call convert_to_cpu_warmstart(...) and assign to
get_cpu_pdlp_warm_start_data().
🧹 Nitpick comments (1)
cpp/include/cuopt/linear_programming/gpu_optimization_problem_solution.hpp (1)

59-64: const_cast usage indicates underlying API const-correctness issue.

The get_reduced_cost_size() method casts away const to call get_reduced_cost(). This pattern repeats throughout the class (lines 99-100, 146-150, 154-156, 162-163, etc.). While functional, this suggests optimization_problem_solution_t::get_reduced_cost() and get_pdlp_warm_start_data() should be const-qualified in the underlying class.

Consider marking get_reduced_cost() and get_pdlp_warm_start_data() as const in optimization_problem_solution_t to eliminate the need for const_cast throughout this wrapper.
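
A minimal sketch of the const-qualification suggested above; the member name and return type are assumptions.

💡 Illustrative sketch (assumed member and return type)
// A const overload lets wrappers read the buffer without const_cast,
// while the non-const overload still serves mutating callers.
rmm::device_uvector<f_t>&       get_reduced_cost()       { return reduced_cost_; }
const rmm::device_uvector<f_t>& get_reduced_cost() const { return reduced_cost_; }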

Comment on lines +333 to +337
optimization_problem_solution_t<i_t, f_t> to_gpu_solution(rmm::cuda_stream_view) override
{
// Already GPU, just move
return std::move(solution_);
}

⚠️ Potential issue | 🟡 Minor

to_gpu_solution() should be rvalue-qualified to prevent use-after-move.

This method moves solution_ and returns it, leaving the wrapper in an invalid state. However, unlike to_linear_programming_ret_t() (line 343) which is rvalue-qualified (&&), this method can be called on lvalues. Subsequent access to the wrapper after calling to_gpu_solution() would be undefined behavior.

🛠️ Proposed fix
-  optimization_problem_solution_t<i_t, f_t> to_gpu_solution(rmm::cuda_stream_view) override
+  optimization_problem_solution_t<i_t, f_t> to_gpu_solution(rmm::cuda_stream_view) && override
   {
     // Already GPU, just move
     return std::move(solution_);
   }

Note: This requires updating the base class interface lp_solution_interface_t to declare the method as rvalue-qualified as well.

🤖 Prompt for AI Agents
In `@cpp/include/cuopt/linear_programming/gpu_optimization_problem_solution.hpp`
around lines 333 - 337, The to_gpu_solution(rmm::cuda_stream_view)
implementation currently moves solution_ but is callable on lvalues, enabling a
use-after-move; change its signature to be rvalue-qualified (add &&) so it can
only be invoked on temporaries, and update the base class method declaration in
lp_solution_interface_t to match (rvalue-qualified) so the override remains
valid; ensure the return type stays optimization_problem_solution_t<i_t,f_t> and
that any callers are adjusted to call on rvalues or use std::move if they
deliberately transfer ownership.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/src/linear_programming/cuopt_c.cpp (1)

48-71: ⚠️ Potential issue | 🟠 Major

Ensure exception safety in cuOptReadProblem.

populate_from_mps_data_model(...) is outside the try/catch; if it throws, the exception crosses the C ABI and problem_and_stream leaks. Wrap the populate call and clean up on failure.

🛡️ Suggested fix
-  // Populate interface directly from MPS data model (avoids temporary GPU allocation)
-  populate_from_mps_data_model(problem_and_stream->get_problem(), *mps_data_model_ptr);
+  try {
+    // Populate interface directly from MPS data model (avoids temporary GPU allocation)
+    populate_from_mps_data_model(problem_and_stream->get_problem(), *mps_data_model_ptr);
+  } catch (const std::exception& e) {
+    CUOPT_LOG_INFO("Error populating problem data: %s", e.what());
+    delete problem_and_stream;
+    *problem_ptr = nullptr;
+    return CUOPT_INVALID_ARGUMENT;
+  }

As per coding guidelines “Check for unclosed file handles when reading MPS/QPS problem files; ensure RAII patterns or proper cleanup in exception paths.”

🤖 Fix all issues with AI agents
In `@cpp/src/linear_programming/cpu_optimization_problem.cpp`:
- Around line 52-56: The code computes n_constraints_ = size_offsets - 1 without
guarding for size_offsets == 0, which underflows and leaves the CSR arrays (A_,
A_indices_, A_offsets_) in an invalid state; update the beginning of the block
around n_constraints_ to validate size_offsets and either return/throw on zero
(or set n_constraints_ to 0 and clear the CSR vectors) before resizing A_,
A_indices_, and A_offsets_, ensuring any downstream logic sees a consistent
empty CSR state; reference n_constraints_, size_offsets, A_, A_indices_, and
A_offsets_ when applying the guard.

In `@cpp/src/linear_programming/gpu_optimization_problem.cu`:
- Around line 711-716: The current block only writes both bounds when
variable_lower_bounds is non-empty, causing valid upper-only bounds to be
dropped; change the logic so you call
data_model_view.set_variable_lower_bounds(...) only if variable_lower_bounds is
non-empty and call data_model_view.set_variable_upper_bounds(...) independently
if variable_upper_bounds is non-empty (i.e., check variable_upper_bounds.empty()
separately and invoke set_variable_upper_bounds when present), leaving the two
calls decoupled.

In `@cpp/src/linear_programming/solve.cu`:
- Around line 1338-1366: The remote-vs-local routing currently only checks
is_remote_execution_enabled(), ignoring the configured
backend/CUOPT_USE_GPU_MEM; update the branch in solve_lp to first respect
get_backend_type()/CUOPT_USE_GPU_MEM (or check backend_type_t::GPU) so that if
the backend is GPU we force local solve (call solve_lp<i_t,f_t> path) and only
route to problem_interface->solve_lp_remote(settings) when remote is enabled AND
the backend is CPU; make the analogous change in solve_mip to ensure backend
selection overrides remote routing (use symbols is_remote_execution_enabled(),
get_backend_type(), CUOPT_USE_GPU_MEM, solve_lp, solve_mip, and
problem_interface->solve_lp_remote to locate code).

In `@cpp/src/linear_programming/utilities/cython_solve.cu`:
- Around line 249-263: call_solve currently mutates the shared warm-start object
returned by solver_settings->get_pdlp_settings().get_pdlp_warm_start_data(),
causing data races when call_batch_solve runs call_solve in parallel; fix by
ensuring each thread uses its own warm-start instance or by serializing the
stream-reset: either (A) clone the warm-start data at the start of call_solve
(create a local copy of warmstart_data and call set_stream on that copy) so
per-thread state is mutated, or (B) wrap the set_stream sequence (the block
touching current_primal_solution_, current_dual_solution_,
initial_primal_average_, initial_dual_average_, current_ATY_,
sum_primal_solutions_, sum_dual_solutions_,
last_restart_duality_gap_primal_solution_,
last_restart_duality_gap_dual_solution_) in a mutex held by
call_batch_solve/call_solve to prevent concurrent mutations; choose the
per-thread copy approach if you want concurrent solves, and ensure any copies
preserve device memory semantics (deep copy or move) when implementing.

Comment on lines +52 to +56
n_constraints_ = size_offsets - 1;

A_.resize(size_values);
A_indices_.resize(size_indices);
A_offsets_.resize(size_offsets);

⚠️ Potential issue | 🟠 Major

Guard size_offsets before computing n_constraints_.
Line 52 underflows when size_offsets == 0, leaving invalid CSR state. Add a guard/early return or throw.

🐛 Suggested guard
-  n_constraints_ = size_offsets - 1;
+  if (size_offsets == 0) {
+    n_constraints_ = 0;
+    A_.clear();
+    A_indices_.clear();
+    A_offsets_.clear();
+    return;
+  }
+  n_constraints_ = size_offsets - 1;
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/cpu_optimization_problem.cpp` around lines 52 -
56, The code computes n_constraints_ = size_offsets - 1 without guarding for
size_offsets == 0, which underflows and leaves the CSR arrays (A_, A_indices_,
A_offsets_) in an invalid state; update the beginning of the block around
n_constraints_ to validate size_offsets and either return/throw on zero (or set
n_constraints_ to 0 and clear the CSR vectors) before resizing A_, A_indices_,
and A_offsets_, ensuring any downstream logic sees a consistent empty CSR state;
reference n_constraints_, size_offsets, A_, A_indices_, and A_offsets_ when
applying the guard.

Comment on lines +711 to +716
if (!variable_lower_bounds.empty()) {
data_model_view.set_variable_lower_bounds(variable_lower_bounds.data(),
variable_lower_bounds.size());
data_model_view.set_variable_upper_bounds(variable_upper_bounds.data(),
variable_upper_bounds.size());
}

⚠️ Potential issue | 🟠 Major

Write upper bounds even when lower bounds are absent.
Upper bounds are only set when lower bounds exist; this drops valid upper-only bounds.

🐛 Proposed fix
   if (!variable_lower_bounds.empty()) {
     data_model_view.set_variable_lower_bounds(variable_lower_bounds.data(),
                                               variable_lower_bounds.size());
-    data_model_view.set_variable_upper_bounds(variable_upper_bounds.data(),
-                                              variable_upper_bounds.size());
   }
+  if (!variable_upper_bounds.empty()) {
+    data_model_view.set_variable_upper_bounds(variable_upper_bounds.data(),
+                                              variable_upper_bounds.size());
+  }
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/gpu_optimization_problem.cu` around lines 711 -
716, The current block only writes both bounds when variable_lower_bounds is
non-empty, causing valid upper-only bounds to be dropped; change the logic so
you call data_model_view.set_variable_lower_bounds(...) only if
variable_lower_bounds is non-empty and call
data_model_view.set_variable_upper_bounds(...) independently if
variable_upper_bounds is non-empty (i.e., check variable_upper_bounds.empty()
separately and invoke set_variable_upper_bounds when present), leaving the two
calls decoupled.

Comment on lines 1338 to 1366
// Check if remote execution is enabled
if (is_remote_execution_enabled()) {
CUOPT_LOG_INFO("Remote LP solve requested");
return problem_interface->solve_lp_remote(settings);
} else {
// Local execution - convert to optimization_problem_t and call original solve_lp
CUOPT_LOG_INFO("Local LP solve");

// NOTE: We could theoretically allocate GPU resources here for a CPU problem,
// but we are not currently supporting local solve of a problem that has been
// built on the CPU. CPU problems are intended for remote execution only.
// If local solving is needed, create the problem with GPU backend from the start.
auto* cpu_prob = dynamic_cast<cpu_optimization_problem_t<i_t, f_t>*>(problem_interface);
if (cpu_prob != nullptr) {
CUOPT_LOG_ERROR("Attempted local solve of CPU-backed problem without CUDA resources");
throw cuopt::logic_error(
"Local solve of CPU-backed problems is not supported. "
"CPU problems are intended for remote execution only. "
"For local solving, create the problem with GPU backend (CUOPT_USE_GPU_MEM=true).",
cuopt::error_type_t::ValidationError);
}

auto op_problem = problem_interface->to_optimization_problem();
auto gpu_solution = solve_lp<i_t, f_t>(
op_problem, settings, problem_checking, use_pdlp_solver_mode, is_batch_mode);

// Wrap GPU solution in interface and return
return std::make_unique<gpu_lp_solution_t<i_t, f_t>>(std::move(gpu_solution));
}
@coderabbitai coderabbitai bot Feb 2, 2026

⚠️ Potential issue | 🟠 Major

Respect CUOPT_USE_GPU_MEM when deciding remote vs. local solve.

solve_lp/solve_mip only checks is_remote_execution_enabled(), so a user who sets CUOPT_USE_GPU_MEM=true still gets remote routing if remote env vars are present. This conflicts with get_backend_type() and makes the override ineffective.

🔧 Suggested fix (align with backend selection)
-  if (is_remote_execution_enabled()) {
+  if (is_remote_execution_enabled() && !force_gpu_memory()) {
     CUOPT_LOG_INFO("Remote LP solve requested");
     return problem_interface->solve_lp_remote(settings);
   } else {

(Apply the same change in solve_mip.)

Also applies to: 1374-1401

🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/solve.cu` around lines 1338 - 1366, The
remote-vs-local routing currently only checks is_remote_execution_enabled(),
ignoring the configured backend/CUOPT_USE_GPU_MEM; update the branch in solve_lp
to first respect get_backend_type()/CUOPT_USE_GPU_MEM (or check
backend_type_t::GPU) so that if the backend is GPU we force local solve (call
solve_lp<i_t,f_t> path) and only route to
problem_interface->solve_lp_remote(settings) when remote is enabled AND the
backend is CPU; make the analogous change in solve_mip to ensure backend
selection overrides remote routing (use symbols is_remote_execution_enabled(),
get_backend_type(), CUOPT_USE_GPU_MEM, solve_lp, solve_mip, and
problem_interface->solve_lp_remote to locate code).

Contributor Author

Taking a look at this. The intent is to retain a path for doing a remote solve of a local problem in GPU memory, in case we invent a way to manipulate problems or solutions on a GPU before/after submission but the user wants to execute the actual solve on a remote machine with a better (or specific) GPU.


@tmckayus, thank you for clarifying the design intent! That makes perfect sense—supporting remote execution of GPU-resident problems enables important workflows where users want to leverage local GPU resources for problem manipulation while offloading the actual solve to a more powerful remote GPU.

The current implementation correctly supports this by checking only is_remote_execution_enabled() without requiring CPU backend, which allows the GPU→remote→GPU pipeline you described.

To help future reviewers understand this design choice, consider adding a code comment near the remote execution check, something like:

// Check if remote execution is enabled
// Note: We intentionally don't check force_gpu_memory() here to support
// remote solving of GPU-resident problems (e.g., for GPU manipulation
// before/after submission with remote execution on a better GPU)
if (is_remote_execution_enabled()) {

This would document the rationale and prevent similar review comments in the future.


✏️ Learnings added
Learnt from: tmckayus
Repo: NVIDIA/cuopt PR: 819
File: cpp/src/linear_programming/solve.cu:1338-1366
Timestamp: 2026-02-02T17:16:48.886Z
Learning: In `cpp/src/linear_programming/solve.cu`, the remote vs. local solve routing intentionally checks only `is_remote_execution_enabled()` without checking `force_gpu_memory()` or backend type. This design supports remote solving of GPU-resident problems, enabling workflows where users manipulate problems/solutions on a local GPU but execute the solve on a remote machine with a better GPU.



🧠 Learnings used
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication

Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions

Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems

Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results

Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations

Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 721
File: cpp/src/linear_programming/optimization_problem.cu:106-205
Timestamp: 2026-01-29T11:25:28.213Z
Learning: The function `csr_matrices_equivalent_with_permutation` in `cpp/src/linear_programming/optimization_problem.cu` is only used in CI testing with known inputs, so strict problem size validation is not required.

Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Reduce tight coupling between solver components (presolve, simplex, basis, barrier); increase modularity and reusability of optimization algorithms

Comment on lines +249 to +263
// Reset warmstart data streams in solver_settings
auto& warmstart_data = solver_settings->get_pdlp_settings().get_pdlp_warm_start_data();
if (warmstart_data.current_primal_solution_.size() > 0) {
warmstart_data.current_primal_solution_.set_stream(rmm::cuda_stream_per_thread);
warmstart_data.current_dual_solution_.set_stream(rmm::cuda_stream_per_thread);
warmstart_data.initial_primal_average_.set_stream(rmm::cuda_stream_per_thread);
warmstart_data.initial_dual_average_.set_stream(rmm::cuda_stream_per_thread);
warmstart_data.current_ATY_.set_stream(rmm::cuda_stream_per_thread);
warmstart_data.sum_primal_solutions_.set_stream(rmm::cuda_stream_per_thread);
warmstart_data.sum_dual_solutions_.set_stream(rmm::cuda_stream_per_thread);
warmstart_data.last_restart_duality_gap_primal_solution_.set_stream(
rmm::cuda_stream_per_thread);
warmstart_data.last_restart_duality_gap_dual_solution_.set_stream(
rmm::cuda_stream_per_thread);
}

⚠️ Potential issue | 🟠 Major

Avoid data races on shared warm‑start data in batch mode.

call_batch_solve executes call_solve in parallel while call_solve mutates solver_settings->get_pdlp_settings().get_pdlp_warm_start_data(). This is a shared object and can lead to data races/UB in batch mode. Consider guarding this mutation or using per-thread settings.

🔧 Minimal guard to avoid cross‑thread mutation
-  // Reset warmstart data streams in solver_settings
-  auto& warmstart_data = solver_settings->get_pdlp_settings().get_pdlp_warm_start_data();
-  if (warmstart_data.current_primal_solution_.size() > 0) {
-    warmstart_data.current_primal_solution_.set_stream(rmm::cuda_stream_per_thread);
-    warmstart_data.current_dual_solution_.set_stream(rmm::cuda_stream_per_thread);
-    warmstart_data.initial_primal_average_.set_stream(rmm::cuda_stream_per_thread);
-    warmstart_data.initial_dual_average_.set_stream(rmm::cuda_stream_per_thread);
-    warmstart_data.current_ATY_.set_stream(rmm::cuda_stream_per_thread);
-    warmstart_data.sum_primal_solutions_.set_stream(rmm::cuda_stream_per_thread);
-    warmstart_data.sum_dual_solutions_.set_stream(rmm::cuda_stream_per_thread);
-    warmstart_data.last_restart_duality_gap_primal_solution_.set_stream(
-      rmm::cuda_stream_per_thread);
-    warmstart_data.last_restart_duality_gap_dual_solution_.set_stream(
-      rmm::cuda_stream_per_thread);
-  }
+  // Reset warmstart data streams in solver_settings (avoid cross-thread mutation in batch mode)
+  if (!is_batch_mode) {
+    auto& warmstart_data = solver_settings->get_pdlp_settings().get_pdlp_warm_start_data();
+    if (warmstart_data.current_primal_solution_.size() > 0) {
+      warmstart_data.current_primal_solution_.set_stream(rmm::cuda_stream_per_thread);
+      warmstart_data.current_dual_solution_.set_stream(rmm::cuda_stream_per_thread);
+      warmstart_data.initial_primal_average_.set_stream(rmm::cuda_stream_per_thread);
+      warmstart_data.initial_dual_average_.set_stream(rmm::cuda_stream_per_thread);
+      warmstart_data.current_ATY_.set_stream(rmm::cuda_stream_per_thread);
+      warmstart_data.sum_primal_solutions_.set_stream(rmm::cuda_stream_per_thread);
+      warmstart_data.sum_dual_solutions_.set_stream(rmm::cuda_stream_per_thread);
+      warmstart_data.last_restart_duality_gap_primal_solution_.set_stream(
+        rmm::cuda_stream_per_thread);
+      warmstart_data.last_restart_duality_gap_dual_solution_.set_stream(
+        rmm::cuda_stream_per_thread);
+    }
+  }

As per coding guidelines “Ensure race conditions are absent in multi‑GPU code and multi‑threaded server implementations; verify proper synchronization of shared state.”

🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/utilities/cython_solve.cu` around lines 249 - 263,
call_solve currently mutates the shared warm-start object returned by
solver_settings->get_pdlp_settings().get_pdlp_warm_start_data(), causing data
races when call_batch_solve runs call_solve in parallel; fix by ensuring each
thread uses its own warm-start instance or by serializing the stream-reset:
either (A) clone the warm-start data at the start of call_solve (create a local
copy of warmstart_data and call set_stream on that copy) so per-thread state is
mutated, or (B) wrap the set_stream sequence (the block touching
current_primal_solution_, current_dual_solution_, initial_primal_average_,
initial_dual_average_, current_ATY_, sum_primal_solutions_, sum_dual_solutions_,
last_restart_duality_gap_primal_solution_,
last_restart_duality_gap_dual_solution_) in a mutex held by
call_batch_solve/call_solve to prevent concurrent mutations; choose the
per-thread copy approach if you want concurrent solves, and ensure any copies
preserve device memory semantics (deep copy or move) when implementing.

constraint_matrix_row_offsets == nullptr || constraint_matrix_column_indices == nullptr ||
constraint_matrix_coefficent_values == nullptr || constraint_lower_bounds == nullptr ||
constraint_upper_bounds == nullptr || variable_lower_bounds == nullptr ||
variable_upper_bounds == nullptr || variable_types == nullptr) {
Contributor Author
@tmckayus tmckayus Feb 2, 2026

hmm need to check on why this change was made. variable_types null seems like a valid check.

Ah, right, below it defaults to Continuous if vtypes is null. Wdyt? Valid? Or flag it as an error

problem_and_stream_view->stream_view);
problem_and_stream_view->stream_view.synchronize();

// Use polymorphic method - no backend branching needed!
Contributor Author

comments like this can go, they're from the refactor :)

Collaborator
@rgsl888prabhu rgsl888prabhu left a comment

@tmckayus Changes look good on the system side, I have a few questions. But other than that it looks awesome.

Comment on lines +128 to +131
cdef extern from "<rmm/device_buffer.hpp>" namespace "rmm":
cdef cppclass device_buffer:
pass
Collaborator

@tmckayus this is a valid concern

settings.get_pdlp_warm_start_data().last_restart_duality_gap_dual_solution # noqa
)
)
ws_data = settings.get_pdlp_warm_start_data()
Collaborator

Maybe we should use a different name compared to ws_data?

Contributor Author

yes I can update.

data_model_obj = cuopt_mps_parser.ParseMps(file_path)

settings = solver_settings.SolverSettings()
settings.set_parameter(CUOPT_METHOD, SolverMethod.PDLP)
Collaborator

Is there a way we can test the CPU and GPU structures?

Contributor Author

yes, I am working on unit tests now; they should test the conversions between all the types we use, and also that when remote solve is enabled we can run with CUDA_VISIBLE_DEVICES=""

@anandhkb anandhkb added this to the 26.02 milestone Feb 2, 2026
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🤖 Fix all issues with AI agents
In `@ci/run_ctests.sh`:
- Around line 30-32: The added unconditional invocation of C_API_TEST using
CUOPT_USE_CPU_MEM_FOR_LOCAL can fail when the C_API_TEST binary isn't built;
update the script to check for the executable (e.g., test -x
"${GTEST_DIR}/C_API_TEST" || skip) before setting CUOPT_USE_CPU_MEM_FOR_LOCAL
and running the test, and emit a clear message or silently skip if missing;
ensure you still pass through "$@" and preserve the existing --gtest_filter
argument when guarded.

In `@cpp/include/cuopt/linear_programming/cpu_optimization_problem_solution.hpp`:
- Around line 355-374: to_gpu_solution() currently only copies
primal_objective_, dual_objective_ and solve_time_ into termination_stats[0],
dropping residual_, gap_, iterations_ and solved_by_pdlp; update the population
of termination_stats (type additional_info_t) so termination_stats[0].residual =
residual_, .gap = gap_, .iterations = iterations_, and .solved_by_pdlp =
solved_by_pdlp (in addition to the existing fields) before moving
termination_stats into the returned optimization_problem_solution_t to preserve
all termination metadata.

In `@cpp/src/mip/solve.cu`:
- Around line 323-345: The cpu_optimization_problem_t stores a pointer to a
stack-allocated raft::handle_t (set via cpu_prob->set_handle(&handle)), which
becomes dangling after this scope; before returning the CPU solution you must
clear or restore that handle to avoid crashes—call cpu_prob->set_handle(nullptr)
(or restore a saved previous handle if available) after solving (before
constructing/returning the CPU solution) in the block that uses
to_optimization_problem(), solve_mip(), and gpu_mip_solution_t so the cpu_prob
no longer points at the local stack handle.

In `@cpp/tests/linear_programming/c_api_tests/c_api_test.c`:
- Around line 1441-1643: Both CPU-only tests assume remote execution env vars
exist and at least one variable; add early skips if remote env is not set and
avoid accessing primal_solution[0] when num_variables == 0. In
test_cpu_only_execution and test_cpu_only_mip_execution, check
getenv("CUOPT_REMOTE_HOST") and getenv("CUOPT_REMOTE_PORT") (or
CUOPT_REMOTE_HOST only if PORT not required) at the start and return/skip the
test if unset; also after retrieving num_variables, if num_variables == 0 set
status to CUOPT_SUCCESS and goto DONE (or skip printing primal_solution[0]) and
only allocate/use primal_solution[0] when num_variables > 0 to prevent
out-of-bounds access.

In `@cpp/tests/linear_programming/c_api_tests/c_api_tests.cpp`:
- Around line 306-320: The tests c_api_cpu_only::lp_solve and mip_solve only
assert CUOPT_SUCCESS; update test_cpu_only_execution and
test_cpu_only_mip_execution to return or populate a result struct containing
termination status, objective value, and solution vector (or a summary), then
modify the tests to assert the solver's termination status (e.g.,
OPTIMAL/FEASIBLE/INFEASIBLE/UNBOUNDED) and compare objective/solution against
known baselines or tolerances for afiro_original.mps and bb_optimality.mps;
additionally add at least one new CPU-only unit test (e.g., TEST(c_api_cpu_only,
degenerate_case)) that uses a small degenerate model (infeasible or unbounded or
singleton) and asserts the correct termination status and numeric values;
reference the functions test_cpu_only_execution, test_cpu_only_mip_execution and
the CPUOnlyTestEnvironment to locate where to return/consume the result struct
and add the new degenerate test.

In `@python/cuopt/cuopt/tests/linear_programming/test_cpu_only_execution.py`:
- Around line 47-247: Update the CPU-only tests to assert numerical correctness
instead of only checking for "PASSED": for each test (test_lp_solve_cpu_only,
test_mip_solve_cpu_only, test_lp_dual_solution_cpu_only,
test_warmstart_cpu_only) call the existing methods
(solution.get_termination_status(), get_primal_objective(),
get_primal_solution(), get_dual_solution(), get_dual_objective(),
get_reduced_cost(), get_pdlp_warm_start_data()) and add assertions that the
termination status equals the expected enum, the objective matches a known
baseline within a small tolerance, and the solution vector length and a few key
entries match expected values (use approx tolerances for floating comparisons);
additionally add a new small degenerate-case test (e.g.,
test_cpu_only_degenerate_cases) that constructs or loads a trivial
infeasible/unbounded/singleton problem and asserts the solver returns the
correct status for each case; use the same subprocess pattern and cpu_only_env,
and prefer numeric comparisons with explicit tolerances rather than string
checks.
- Around line 385-452: Both tests (test_lp_solution_has_dual_info and
test_mip_solution_has_mip_info) must be guarded for CPU-only hosts: either reuse
the existing cpu_only_env fixture or explicitly skip when CUDA is not available
before launching the subprocess; add a check (e.g., a pytest skip or calling
cpu_only_env at the top of each test) so the subprocess scripts are not run on
GPU-less hosts. Also ensure test isolation by resetting any global GPU-related
state before/after the subprocess call (invoke the cpu_only_env fixture or
perform cleanup like calling any existing cuopt/tensor library cache-clear
routines and gc.collect) so cached memory and globals don’t leak between tests.
🧹 Nitpick comments (8)
cpp/src/mip/solve.cu (1)

342-343: Prefer CUOPT_LOG_* over std::cerr for consistency.

Using std::cerr bypasses configured logging sinks and levels. Consider switching to CUOPT_LOG_INFO to keep logs consistent.

♻️ Suggested tweak
-      std::cerr << "Test mode: Converting GPU solution back to CPU solution" << std::endl;
+      CUOPT_LOG_INFO("Test mode: Converting GPU solution back to CPU solution");
cpp/tests/linear_programming/unit_tests/solution_interface_test.cu (2)

410-467: Assert warm‑start data survives CPU→GPU conversion.

The test currently stops at size checks and a comment after conversion. Add explicit assertions that warm‑start data exists and matches sizes post‑conversion.

✅ Suggested enhancement
     auto gpu_solution = cpu_solution->to_gpu_solution(rmm::cuda_stream_per_thread);
 
-    // Warmstart data should be available in GPU solution
-    // (it gets copied during conversion)
+    EXPECT_TRUE(gpu_solution.has_warm_start_data());
+    EXPECT_EQ(gpu_solution.get_current_primal_solution_host().size(), current_primal.size());
+    EXPECT_EQ(gpu_solution.get_current_dual_solution_host().size(), current_dual.size());

As per coding guidelines “Add tests for problem transformations: verify correctness of original→transformed→postsolve mappings and index consistency across problem representations.”


473-554: Validate copied values, not just sizes.

The copy‑to‑host tests currently only assert vector sizes. Consider validating a few values against the source MPS data to ensure correctness.

✅ Suggested enhancement
   // Test copy_objective_coefficients_to_host
   std::vector<double> obj_coeffs(n_vars);
   problem->copy_objective_coefficients_to_host(obj_coeffs.data(), n_vars);
   EXPECT_EQ(obj_coeffs.size(), static_cast<size_t>(n_vars));
+  for (int i = 0; i < n_vars; ++i) {
+    EXPECT_NEAR(obj_coeffs[i], mps_data.get_objective_coefficients()[i], 1e-12);
+  }

As per coding guidelines “Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems).”

cpp/include/cuopt/linear_programming/optimization_problem_interface.hpp (2)

542-548: Mark to_optimization_problem() overrides to catch signature drift.

Adding override on the derived declarations helps the compiler catch signature mismatches.

♻️ Suggested change
-  optimization_problem_t<i_t, f_t> to_optimization_problem();
+  optimization_problem_t<i_t, f_t> to_optimization_problem() override;
-  optimization_problem_t<i_t, f_t> to_optimization_problem();
+  optimization_problem_t<i_t, f_t> to_optimization_problem() override;

Also applies to: 736-736


721-729: Clarify handle lifetime expectations for set_handle().

Consider documenting that the provided raft::handle_t must outlive any conversions and is not retained beyond the caller’s scope, to prevent accidental dangling pointers.

cpp/src/linear_programming/backend_selection.cpp (1)

31-50: Use a safe tolower conversion to avoid UB on signed chars.

::tolower expects unsigned char (or EOF). Using it directly on char can be undefined for non‑ASCII values.

♻️ Suggested tweak
-    std::transform(value.begin(), value.end(), value.begin(), ::tolower);
+    std::transform(value.begin(), value.end(), value.begin(),
+                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
python/cuopt/cuopt/tests/linear_programming/test_cpu_only_execution.py (1)

286-379: CLI CPU-only tests should verify solver output, not just absence of CUDA errors.

Consider parsing cuopt_cli output to assert termination status/objective (or known baselines) so regressions don’t slip through when exit code remains 0.

As per coding guidelines: Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems).
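
A rough sketch of the idea; the cuopt_cli argument list and the "Objective value:" output line below are assumptions that would need to be aligned with the CLI's real interface before use:

# Hypothetical helper: assert on cuopt_cli output rather than only its exit code.
import re
import subprocess


def cli_objective(mps_file: str, env: dict) -> float:
    """Run the CLI and extract the reported objective (output format assumed)."""
    result = subprocess.run(
        ["cuopt_cli", mps_file],  # assumed invocation; check the real CLI flags
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    match = re.search(r"Objective value:\s*([-+0-9.eE]+)", result.stdout)
    assert match is not None, f"no objective found in CLI output:\n{result.stdout}"
    return float(match.group(1))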

cpp/src/linear_programming/solve.cu (1)

1369-1371: Prefer CUOPT_LOG_* over std::cerr in library code.

This bypasses the logging system and makes output inconsistent.

📝 Suggested change
-      std::cerr << "Test mode: Converting GPU solution back to CPU solution" << std::endl;
+      CUOPT_LOG_INFO("Test mode: Converting GPU solution back to CPU solution");

Comment on lines +30 to +32
# Run C_API_TEST with CPU memory for local solves (excluding time limit tests)
echo "Running gtest C_API_TEST with CUOPT_USE_CPU_MEM_FOR_LOCAL"
CUOPT_USE_CPU_MEM_FOR_LOCAL=1 "${GTEST_DIR}/C_API_TEST" --gtest_filter=-c_api/TimeLimitTestFixture.* "$@"

⚠️ Potential issue | 🟠 Major

Guard the extra C_API_TEST run when the binary is absent.

The new unconditional invocation will fail in configurations that skip building C_API_TEST. Add an existence check to keep CI/dev flows resilient.

🛡️ Proposed guard
-echo "Running gtest C_API_TEST with CUOPT_USE_CPU_MEM_FOR_LOCAL"
-CUOPT_USE_CPU_MEM_FOR_LOCAL=1 "${GTEST_DIR}/C_API_TEST" --gtest_filter=-c_api/TimeLimitTestFixture.* "$@"
+if [[ -x "${GTEST_DIR}/C_API_TEST" ]]; then
+  echo "Running gtest C_API_TEST with CUOPT_USE_CPU_MEM_FOR_LOCAL"
+  CUOPT_USE_CPU_MEM_FOR_LOCAL=1 "${GTEST_DIR}/C_API_TEST" --gtest_filter=-c_api/TimeLimitTestFixture.* "$@"
+else
+  echo "Skipping C_API_TEST (binary not found)"
+fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Run C_API_TEST with CPU memory for local solves (excluding time limit tests)
echo "Running gtest C_API_TEST with CUOPT_USE_CPU_MEM_FOR_LOCAL"
CUOPT_USE_CPU_MEM_FOR_LOCAL=1 "${GTEST_DIR}/C_API_TEST" --gtest_filter=-c_api/TimeLimitTestFixture.* "$@"
# Run C_API_TEST with CPU memory for local solves (excluding time limit tests)
if [[ -x "${GTEST_DIR}/C_API_TEST" ]]; then
echo "Running gtest C_API_TEST with CUOPT_USE_CPU_MEM_FOR_LOCAL"
CUOPT_USE_CPU_MEM_FOR_LOCAL=1 "${GTEST_DIR}/C_API_TEST" --gtest_filter=-c_api/TimeLimitTestFixture.* "$@"
else
echo "Skipping C_API_TEST (binary not found)"
fi
🤖 Prompt for AI Agents
In `@ci/run_ctests.sh` around lines 30 - 32, The added unconditional invocation of
C_API_TEST using CUOPT_USE_CPU_MEM_FOR_LOCAL can fail when the C_API_TEST binary
isn't built; update the script to check for the executable (e.g., test -x
"${GTEST_DIR}/C_API_TEST" || skip) before setting CUOPT_USE_CPU_MEM_FOR_LOCAL
and running the test, and emit a clear message or silently skip if missing;
ensure you still pass through "$@" and preserve the existing --gtest_filter
argument when guarded.

Comment on lines +355 to +374
// Create termination information
using additional_info_t =
typename optimization_problem_solution_t<i_t, f_t>::additional_termination_information_t;
std::vector<additional_info_t> termination_stats(1);
termination_stats[0].primal_objective = primal_objective_;
termination_stats[0].dual_objective = dual_objective_;
termination_stats[0].solve_time = solve_time_;

std::vector<pdlp_termination_status_t> termination_status_vec = {termination_status_};

// Create GPU solution
return optimization_problem_solution_t<i_t, f_t>(primal_device,
dual_device,
reduced_cost_device,
"", // objective_name
{}, // var_names
{}, // row_names
std::move(termination_stats),
std::move(termination_status_vec));
}

⚠️ Potential issue | 🟠 Major

Preserve termination metadata when converting CPU → GPU.

to_gpu_solution() only fills objective and solve_time, so residuals/gap/iteration counts (and solved_by_pdlp) are silently dropped. Populate the additional termination info from the stored members to keep metadata consistent.

✅ Suggested completion of termination metadata
     std::vector<additional_info_t> termination_stats(1);
-    termination_stats[0].primal_objective = primal_objective_;
-    termination_stats[0].dual_objective   = dual_objective_;
-    termination_stats[0].solve_time       = solve_time_;
+    termination_stats[0].primal_objective      = primal_objective_;
+    termination_stats[0].dual_objective        = dual_objective_;
+    termination_stats[0].solve_time            = solve_time_;
+    termination_stats[0].l2_primal_residual    = l2_primal_residual_;
+    termination_stats[0].l2_dual_residual      = l2_dual_residual_;
+    termination_stats[0].gap                   = gap_;
+    termination_stats[0].number_of_steps_taken = num_iterations_;
+    termination_stats[0].total_number_of_attempted_steps = num_iterations_;
+    termination_stats[0].solved_by_pdlp        = solved_by_pdlp_;
🤖 Prompt for AI Agents
In `@cpp/include/cuopt/linear_programming/cpu_optimization_problem_solution.hpp`
around lines 355 - 374, to_gpu_solution() currently only copies
primal_objective_, dual_objective_ and solve_time_ into termination_stats[0],
dropping residual_, gap_, iterations_ and solved_by_pdlp; update the population
of termination_stats (type additional_info_t) so termination_stats[0].residual =
residual_, .gap = gap_, .iterations = iterations_, and .solved_by_pdlp =
solved_by_pdlp (in addition to the existing fields) before moving
termination_stats into the returned optimization_problem_solution_t to preserve
all termination metadata.

Comment on lines +323 to +394
// Check if this is a CPU problem (test mode: CUOPT_USE_CPU_MEM_FOR_LOCAL=true)
auto* cpu_prob = dynamic_cast<cpu_optimization_problem_t<i_t, f_t>*>(problem_interface);
if (cpu_prob != nullptr) {
CUOPT_LOG_INFO("Test mode: Converting CPU problem to GPU for local MIP solve");

// Create CUDA resources for the conversion
rmm::cuda_stream stream;
raft::handle_t handle(stream);

// Set the handle on the CPU problem so it can create GPU resources
cpu_prob->set_handle(&handle);

// Convert CPU problem to GPU problem
auto op_problem = cpu_prob->to_optimization_problem();

// Solve on GPU
auto gpu_solution = solve_mip<i_t, f_t>(op_problem, settings);

// Wrap in GPU solution interface and convert to CPU solution
std::cerr << "Test mode: Converting GPU solution back to CPU solution" << std::endl;
gpu_mip_solution_t<i_t, f_t> gpu_sol_interface(std::move(gpu_solution));
return gpu_sol_interface.to_cpu_solution();
}

⚠️ Potential issue | 🟠 Major

Clear the temporary CUDA handle to avoid a dangling pointer on CPU problems.

cpu_prob->set_handle(&handle) stores a pointer to a stack raft::handle_t. After returning, cpu_prob retains a dangling handle and can crash on later conversions. Clear (or restore) the handle before returning the CPU solution.

🛠️ Suggested fix
-      gpu_mip_solution_t<i_t, f_t> gpu_sol_interface(std::move(gpu_solution));
-      return gpu_sol_interface.to_cpu_solution();
+      gpu_mip_solution_t<i_t, f_t> gpu_sol_interface(std::move(gpu_solution));
+      auto cpu_solution = gpu_sol_interface.to_cpu_solution();
+      cpu_prob->set_handle(nullptr);  // avoid dangling handle after local conversion
+      return cpu_solution;
🤖 Prompt for AI Agents
In `@cpp/src/mip/solve.cu` around lines 323 - 345, The cpu_optimization_problem_t
stores a pointer to a stack-allocated raft::handle_t (set via
cpu_prob->set_handle(&handle)), which becomes dangling after this scope; before
returning the CPU solution you must clear or restore that handle to avoid
crashes—call cpu_prob->set_handle(nullptr) (or restore a saved previous handle
if available) after solving (before constructing/returning the CPU solution) in
the block that uses to_optimization_problem(), solve_mip(), and
gpu_mip_solution_t so the cpu_prob no longer points at the local stack handle.

Comment on lines +1441 to +1818
/**
* Test CPU-only execution with CUDA_VISIBLE_DEVICES="" and remote execution enabled.
* This simulates a CPU host without GPU access.
* Note: Environment variables must be set before calling this function.
*/
cuopt_int_t test_cpu_only_execution(const char* filename)
{
cuOptOptimizationProblem problem = NULL;
cuOptSolverSettings settings = NULL;
cuOptSolution solution = NULL;
cuopt_int_t status;
cuopt_int_t termination_status;
cuopt_float_t objective_value;
cuopt_float_t solve_time;
cuopt_int_t num_variables;
cuopt_int_t num_constraints;
cuopt_float_t* primal_solution = NULL;

printf("Testing CPU-only execution (simulated remote mode)...\n");
printf(" CUDA_VISIBLE_DEVICES=%s\n", getenv("CUDA_VISIBLE_DEVICES") ? getenv("CUDA_VISIBLE_DEVICES") : "(not set)");
printf(" CUOPT_REMOTE_HOST=%s\n", getenv("CUOPT_REMOTE_HOST") ? getenv("CUOPT_REMOTE_HOST") : "(not set)");

status = cuOptReadProblem(filename, &problem);
if (status != CUOPT_SUCCESS) {
printf("Error reading problem: %d\n", status);
goto DONE;
}

status = cuOptGetNumVariables(problem, &num_variables);
if (status != CUOPT_SUCCESS) {
printf("Error getting num variables: %d\n", status);
goto DONE;
}

status = cuOptGetNumConstraints(problem, &num_constraints);
if (status != CUOPT_SUCCESS) {
printf("Error getting num constraints: %d\n", status);
goto DONE;
}

printf(" Problem: %d variables, %d constraints\n", num_variables, num_constraints);

primal_solution = (cuopt_float_t*)malloc(num_variables * sizeof(cuopt_float_t));
if (primal_solution == NULL) {
printf("Error allocating primal solution\n");
status = -1;
goto DONE;
}

status = cuOptCreateSolverSettings(&settings);
if (status != CUOPT_SUCCESS) {
printf("Error creating solver settings: %d\n", status);
goto DONE;
}

status = cuOptSetIntegerParameter(settings, CUOPT_METHOD, CUOPT_METHOD_PDLP);
if (status != CUOPT_SUCCESS) {
printf("Error setting method: %d\n", status);
goto DONE;
}

status = cuOptSolve(problem, settings, &solution);
if (status != CUOPT_SUCCESS) {
printf("Error solving problem: %d\n", status);
goto DONE;
}

/* Verify we can retrieve all solution properties without CUDA errors */
status = cuOptGetTerminationStatus(solution, &termination_status);
if (status != CUOPT_SUCCESS) {
printf("Error getting termination status: %d\n", status);
goto DONE;
}

status = cuOptGetObjectiveValue(solution, &objective_value);
if (status != CUOPT_SUCCESS) {
printf("Error getting objective value: %d\n", status);
goto DONE;
}

status = cuOptGetSolveTime(solution, &solve_time);
if (status != CUOPT_SUCCESS) {
printf("Error getting solve time: %d\n", status);
goto DONE;
}

status = cuOptGetPrimalSolution(solution, primal_solution);
if (status != CUOPT_SUCCESS) {
printf("Error getting primal solution: %d\n", status);
goto DONE;
}

printf("CPU-only execution test passed\n");
printf(" Termination status: %s\n", termination_status_to_string(termination_status));
printf(" Objective value: %f\n", objective_value);
printf(" Solve time: %f\n", solve_time);
printf(" Primal solution[0]: %f\n", primal_solution[0]);

status = CUOPT_SUCCESS;

DONE:
free(primal_solution);
cuOptDestroyProblem(&problem);
cuOptDestroySolverSettings(&settings);
cuOptDestroySolution(&solution);
return status;
}

/**
* Test CPU-only MIP execution with CUDA_VISIBLE_DEVICES="" and remote execution enabled.
*/
cuopt_int_t test_cpu_only_mip_execution(const char* filename)
{
cuOptOptimizationProblem problem = NULL;
cuOptSolverSettings settings = NULL;
cuOptSolution solution = NULL;
cuopt_int_t status;
cuopt_int_t termination_status;
cuopt_float_t objective_value;
cuopt_float_t solve_time;
cuopt_float_t mip_gap;
cuopt_int_t num_variables;
cuopt_float_t* primal_solution = NULL;

printf("Testing CPU-only MIP execution (simulated remote mode)...\n");

status = cuOptReadProblem(filename, &problem);
if (status != CUOPT_SUCCESS) {
printf("Error reading MIP problem: %d\n", status);
goto DONE;
}

status = cuOptGetNumVariables(problem, &num_variables);
if (status != CUOPT_SUCCESS) {
printf("Error getting num variables: %d\n", status);
goto DONE;
}

primal_solution = (cuopt_float_t*)malloc(num_variables * sizeof(cuopt_float_t));

status = cuOptCreateSolverSettings(&settings);
if (status != CUOPT_SUCCESS) {
printf("Error creating solver settings: %d\n", status);
goto DONE;
}

status = cuOptSetFloatParameter(settings, CUOPT_TIME_LIMIT, 60.0);
if (status != CUOPT_SUCCESS) {
printf("Error setting time limit: %d\n", status);
goto DONE;
}

status = cuOptSolve(problem, settings, &solution);
if (status != CUOPT_SUCCESS) {
printf("Error solving MIP: %d\n", status);
goto DONE;
}

status = cuOptGetTerminationStatus(solution, &termination_status);
if (status != CUOPT_SUCCESS) {
printf("Error getting termination status: %d\n", status);
goto DONE;
}

status = cuOptGetObjectiveValue(solution, &objective_value);
if (status != CUOPT_SUCCESS) {
printf("Error getting objective value: %d\n", status);
goto DONE;
}

status = cuOptGetSolveTime(solution, &solve_time);
if (status != CUOPT_SUCCESS) {
printf("Error getting solve time: %d\n", status);
goto DONE;
}

status = cuOptGetMIPGap(solution, &mip_gap);
if (status != CUOPT_SUCCESS) {
printf("Error getting MIP gap: %d\n", status);
goto DONE;
}

status = cuOptGetPrimalSolution(solution, primal_solution);
if (status != CUOPT_SUCCESS) {
printf("Error getting primal solution: %d\n", status);
goto DONE;
}

printf("CPU-only MIP execution test passed\n");
printf(" Termination status: %s\n", termination_status_to_string(termination_status));
printf(" Objective value: %f\n", objective_value);
printf(" MIP gap: %f\n", mip_gap);
printf(" Solve time: %f\n", solve_time);

status = CUOPT_SUCCESS;

DONE:
free(primal_solution);
cuOptDestroyProblem(&problem);
cuOptDestroySolverSettings(&settings);
cuOptDestroySolution(&solution);
return status;
}

⚠️ Potential issue | 🟡 Minor

Guard CPU‑only tests for missing env vars and empty problems.

These tests assume remote execution is enabled and at least one variable exists. Add a fast precheck (skip/return) if CUOPT_REMOTE_HOST/PORT are unset, and guard primal_solution[0] when num_variables == 0.

🛠️ Suggested safeguards
   printf("Testing CPU-only execution (simulated remote mode)...\n");
   printf("  CUDA_VISIBLE_DEVICES=%s\n", getenv("CUDA_VISIBLE_DEVICES") ? getenv("CUDA_VISIBLE_DEVICES") : "(not set)");
   printf("  CUOPT_REMOTE_HOST=%s\n", getenv("CUOPT_REMOTE_HOST") ? getenv("CUOPT_REMOTE_HOST") : "(not set)");
+  if (getenv("CUOPT_REMOTE_HOST") == NULL || getenv("CUOPT_REMOTE_PORT") == NULL) {
+    printf("Skipping CPU-only test: remote execution not configured.\n");
+    return CUOPT_INVALID_ARGUMENT;
+  }
...
-  printf("  Primal solution[0]: %f\n", primal_solution[0]);
+  if (num_variables > 0) {
+    printf("  Primal solution[0]: %f\n", primal_solution[0]);
+  }
   primal_solution = (cuopt_float_t*)malloc(num_variables * sizeof(cuopt_float_t));
+  if (primal_solution == NULL && num_variables > 0) {
+    printf("Error allocating primal solution\n");
+    status = -1;
+    goto DONE;
+  }
🤖 Prompt for AI Agents
In `@cpp/tests/linear_programming/c_api_tests/c_api_test.c` around lines 1441 -
1643, Both CPU-only tests assume remote execution env vars exist and at least
one variable; add early skips if remote env is not set and avoid accessing
primal_solution[0] when num_variables == 0. In test_cpu_only_execution and
test_cpu_only_mip_execution, check getenv("CUOPT_REMOTE_HOST") and
getenv("CUOPT_REMOTE_PORT") (or CUOPT_REMOTE_HOST only if PORT not required) at
the start and return/skip the test if unset; also after retrieving
num_variables, if num_variables == 0 set status to CUOPT_SUCCESS and goto DONE
(or skip printing primal_solution[0]) and only allocate/use primal_solution[0]
when num_variables > 0 to prevent out-of-bounds access.

Comment on lines +306 to +324
TEST(c_api_cpu_only, lp_solve)
{
CPUOnlyTestEnvironment env;
const std::string& rapidsDatasetRootDir = cuopt::test::get_rapids_dataset_root_dir();
std::string lp_file = rapidsDatasetRootDir + "/linear_programming/afiro_original.mps";
EXPECT_EQ(test_cpu_only_execution(lp_file.c_str()), CUOPT_SUCCESS);
}

TEST(c_api_cpu_only, mip_solve)
{
CPUOnlyTestEnvironment env;
const std::string& rapidsDatasetRootDir = cuopt::test::get_rapids_dataset_root_dir();
std::string mip_file = rapidsDatasetRootDir + "/mip/bb_optimality.mps";
EXPECT_EQ(test_cpu_only_mip_execution(mip_file.c_str()), CUOPT_SUCCESS);
}

⚠️ Potential issue | 🟠 Major

CPU-only C API tests should assert numerical correctness.

These tests only check CUOPT_SUCCESS. Please extend test_cpu_only_execution() / test_cpu_only_mip_execution() to validate termination status and objective/solution values (or known baselines) and include at least one degenerate CPU-only case.

As per coding guidelines: Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems).

🤖 Prompt for AI Agents
In `@cpp/tests/linear_programming/c_api_tests/c_api_tests.cpp` around lines 306 -
320, The tests c_api_cpu_only::lp_solve and mip_solve only assert CUOPT_SUCCESS;
update test_cpu_only_execution and test_cpu_only_mip_execution to return or
populate a result struct containing termination status, objective value, and
solution vector (or a summary), then modify the tests to assert the solver's
termination status (e.g., OPTIMAL/FEASIBLE/INFEASIBLE/UNBOUNDED) and compare
objective/solution against known baselines or tolerances for afiro_original.mps
and bb_optimality.mps; additionally add at least one new CPU-only unit test
(e.g., TEST(c_api_cpu_only, degenerate_case)) that uses a small degenerate model
(infeasible or unbounded or singleton) and asserts the correct termination
status and numeric values; reference the functions test_cpu_only_execution,
test_cpu_only_mip_execution and the CPUOnlyTestEnvironment to locate where to
return/consume the result struct and add the new degenerate test.

Comment on lines +47 to +247
def test_lp_solve_cpu_only(self, cpu_only_env):
"""Test LP solving works in CPU-only mode."""
test_script = f'''
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""
os.environ["CUOPT_REMOTE_HOST"] = "localhost"
os.environ["CUOPT_REMOTE_PORT"] = "12345"

from cuopt import linear_programming
import cuopt_mps_parser

# Parse MPS file
mps_file = "{RAPIDS_DATASET_ROOT_DIR}/linear_programming/afiro_original.mps"
data_model = cuopt_mps_parser.ParseMps(mps_file)

# Create solver settings
settings = linear_programming.SolverSettings()

# Solve - this should use CPU backend due to remote execution mode
try:
solution = linear_programming.Solve(data_model, settings)

# Verify we can access solution properties
status = solution.get_termination_status()
objective = solution.get_primal_objective()
solve_time = solution.get_solve_time()
primal = solution.get_primal_solution()

print(f"Status: {{status}}")
print(f"Objective: {{objective}}")
print(f"Solve time: {{solve_time}}")
print(f"Primal size: {{len(primal)}}")
print("CPU-only LP solve: PASSED")
except Exception as e:
print(f"CPU-only LP solve: FAILED - {{e}}")
raise
'''
result = subprocess.run(
[sys.executable, "-c", test_script],
env=cpu_only_env,
capture_output=True,
text=True,
)
print(result.stdout)
print(result.stderr)
assert result.returncode == 0, f"Test failed: {result.stderr}"
assert "PASSED" in result.stdout

def test_mip_solve_cpu_only(self, cpu_only_env):
"""Test MIP solving works in CPU-only mode."""
test_script = f'''
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""
os.environ["CUOPT_REMOTE_HOST"] = "localhost"
os.environ["CUOPT_REMOTE_PORT"] = "12345"

from cuopt import linear_programming
from cuopt.linear_programming.solver.solver_parameters import CUOPT_TIME_LIMIT
import cuopt_mps_parser

# Parse MPS file (actual MIP problem with integer variables)
mps_file = "{RAPIDS_DATASET_ROOT_DIR}/mip/bb_optimality.mps"
data_model = cuopt_mps_parser.ParseMps(mps_file)

# Create solver settings
settings = linear_programming.SolverSettings()
settings.set_parameter(CUOPT_TIME_LIMIT, 60.0)

# Solve - MIP will be detected from integer variables in the problem
try:
solution = linear_programming.Solve(data_model, settings)

# Verify we can access solution properties
status = solution.get_termination_status()
objective = solution.get_primal_objective()
solve_time = solution.get_solve_time()
solution_values = solution.get_primal_solution()

print(f"Status: {{status}}")
print(f"Objective: {{objective}}")
print(f"Solve time: {{solve_time}}")
print(f"Solution size: {{len(solution_values)}}")
print("CPU-only MIP solve: PASSED")
except Exception as e:
print(f"CPU-only MIP solve: FAILED - {{e}}")
raise
'''
result = subprocess.run(
[sys.executable, "-c", test_script],
env=cpu_only_env,
capture_output=True,
text=True,
)
print(result.stdout)
print(result.stderr)
assert result.returncode == 0, f"Test failed: {result.stderr}"
assert "PASSED" in result.stdout

def test_lp_dual_solution_cpu_only(self, cpu_only_env):
"""Test LP dual solution retrieval works in CPU-only mode."""
test_script = f'''
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""
os.environ["CUOPT_REMOTE_HOST"] = "localhost"
os.environ["CUOPT_REMOTE_PORT"] = "12345"

from cuopt import linear_programming
import cuopt_mps_parser

# Parse MPS file
mps_file = "{RAPIDS_DATASET_ROOT_DIR}/linear_programming/afiro_original.mps"
data_model = cuopt_mps_parser.ParseMps(mps_file)

# Create solver settings
settings = linear_programming.SolverSettings()

try:
solution = linear_programming.Solve(data_model, settings)

# Verify LP-specific properties are accessible
dual_solution = solution.get_dual_solution()
dual_objective = solution.get_dual_objective()
reduced_costs = solution.get_reduced_cost()

print(f"Dual solution size: {{len(dual_solution)}}")
print(f"Dual objective: {{dual_objective}}")
print(f"Reduced costs size: {{len(reduced_costs)}}")
print("CPU-only LP dual: PASSED")
except Exception as e:
print(f"CPU-only LP dual: FAILED - {{e}}")
raise
'''
result = subprocess.run(
[sys.executable, "-c", test_script],
env=cpu_only_env,
capture_output=True,
text=True,
)
print(result.stdout)
print(result.stderr)
assert result.returncode == 0, f"Test failed: {result.stderr}"
assert "PASSED" in result.stdout

def test_warmstart_cpu_only(self, cpu_only_env):
"""Test warmstart data handling works in CPU-only mode."""
test_script = f'''
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""
os.environ["CUOPT_REMOTE_HOST"] = "localhost"
os.environ["CUOPT_REMOTE_PORT"] = "12345"

from cuopt import linear_programming
from cuopt.linear_programming.solver.solver_parameters import (
CUOPT_METHOD,
CUOPT_ITERATION_LIMIT,
)
from cuopt.linear_programming.solver_settings import SolverMethod
import cuopt_mps_parser

# Parse MPS file
mps_file = "{RAPIDS_DATASET_ROOT_DIR}/linear_programming/afiro_original.mps"
data_model = cuopt_mps_parser.ParseMps(mps_file)

# Create solver settings with iteration limit to get warmstart data
settings = linear_programming.SolverSettings()
settings.set_parameter(CUOPT_METHOD, SolverMethod.PDLP)
settings.set_parameter(CUOPT_ITERATION_LIMIT, 100)

try:
# First solve
solution1 = linear_programming.Solve(data_model, settings)
warmstart_data = solution1.get_pdlp_warm_start_data()

if warmstart_data is not None:
print(f"Got warmstart data")

# Second solve with warmstart
settings.set_pdlp_warm_start_data(warmstart_data)
settings.set_parameter(CUOPT_ITERATION_LIMIT, 200)
solution2 = linear_programming.Solve(data_model, settings)

print(f"Second solve completed")
print("CPU-only warmstart: PASSED")
else:
print("No warmstart data available (expected for small iteration count)")
print("CPU-only warmstart: PASSED")
except Exception as e:
print(f"CPU-only warmstart: FAILED - {{e}}")
raise
'''
result = subprocess.run(
[sys.executable, "-c", test_script],
env=cpu_only_env,
capture_output=True,
text=True,
)
print(result.stdout)
print(result.stderr)
assert result.returncode == 0, f"Test failed: {result.stderr}"
assert "PASSED" in result.stdout


⚠️ Potential issue | 🟠 Major

Add numerical-accuracy assertions for CPU-only solves.

These subprocess scripts only report “PASSED” and the outer test checks for that string, so incorrect objectives/solutions would still pass. Please assert termination status and solution/objective values (or compare against known baselines) and add at least one degenerate-case instance for the CPU-only path.

As per coding guidelines: Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems).

🤖 Prompt for AI Agents
In `@python/cuopt/cuopt/tests/linear_programming/test_cpu_only_execution.py`
around lines 47 - 247, Update the CPU-only tests to assert numerical correctness
instead of only checking for "PASSED": for each test (test_lp_solve_cpu_only,
test_mip_solve_cpu_only, test_lp_dual_solution_cpu_only,
test_warmstart_cpu_only) call the existing methods
(solution.get_termination_status(), get_primal_objective(),
get_primal_solution(), get_dual_solution(), get_dual_objective(),
get_reduced_cost(), get_pdlp_warm_start_data()) and add assertions that the
termination status equals the expected enum, the objective matches a known
baseline within a small tolerance, and the solution vector length and a few key
entries match expected values (use approx tolerances for floating comparisons);
additionally add a new small degenerate-case test (e.g.,
test_cpu_only_degenerate_cases) that constructs or loads a trivial
infeasible/unbounded/singleton problem and asserts the solver returns the
correct status for each case; use the same subprocess pattern and cpu_only_env,
and prefer numeric comparisons with explicit tolerances rather than string
checks.
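
For illustration, a minimal sketch of such numeric checks for the LP subprocess script, assuming the same imports and MPS file used above; EXPECTED_OBJECTIVE is a hypothetical placeholder to be filled from a trusted reference run, and the stringified-status check is only a stand-in for comparing against the real termination-status enum:

# Hypothetical sketch of numeric assertions for the CPU-only LP script.
# EXPECTED_OBJECTIVE is a placeholder, not a value taken from this PR.
import math

from cuopt import linear_programming
import cuopt_mps_parser

mps_file = "datasets/linear_programming/afiro_original.mps"  # adjust to the dataset root
data_model = cuopt_mps_parser.ParseMps(mps_file)
settings = linear_programming.SolverSettings()

solution = linear_programming.Solve(data_model, settings)

status = solution.get_termination_status()
objective = solution.get_primal_objective()
primal = solution.get_primal_solution()

# Compare against the real enum in practice; the string check is only illustrative.
assert "optimal" in str(status).lower(), f"unexpected termination status: {status}"

EXPECTED_OBJECTIVE = None  # fill in from a trusted reference solve of this model
if EXPECTED_OBJECTIVE is not None:
    assert math.isclose(objective, EXPECTED_OBJECTIVE, rel_tol=1e-4, abs_tol=1e-4), (
        f"objective {objective} differs from baseline {EXPECTED_OBJECTIVE}"
    )

# The solution vector should be non-empty and sized to the model's variables.
assert len(primal) > 0, "empty primal solution"
print("CPU-only LP solve: PASSED")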

Comment on lines +385 to +452
def test_lp_solution_has_dual_info(self):
"""Test that LP solutions have dual solution info."""
test_script = f'''
from cuopt import linear_programming
import cuopt_mps_parser

mps_file = "{RAPIDS_DATASET_ROOT_DIR}/linear_programming/afiro_original.mps"
data_model = cuopt_mps_parser.ParseMps(mps_file)
settings = linear_programming.SolverSettings()

solution = linear_programming.Solve(data_model, settings)

# LP solutions should have dual info
dual = solution.get_dual_solution()
assert dual is not None, "Dual solution is None"
assert len(dual) > 0, "Dual solution is empty"

reduced_costs = solution.get_reduced_cost()
assert reduced_costs is not None, "Reduced costs is None"
assert len(reduced_costs) > 0, "Reduced costs is empty"

print("LP solution dual info test: PASSED")
'''
result = subprocess.run(
[sys.executable, "-c", test_script],
capture_output=True,
text=True,
)
print(result.stdout)
print(result.stderr)
assert result.returncode == 0, f"Test failed: {result.stderr}"
assert "PASSED" in result.stdout

def test_mip_solution_has_mip_info(self):
"""Test that MIP solutions have MIP-specific info."""
test_script = f'''
from cuopt import linear_programming
from cuopt.linear_programming.solver.solver_parameters import CUOPT_TIME_LIMIT
import cuopt_mps_parser

# Use actual MIP file with integer variables
mps_file = "{RAPIDS_DATASET_ROOT_DIR}/mip/bb_optimality.mps"
data_model = cuopt_mps_parser.ParseMps(mps_file)
settings = linear_programming.SolverSettings()
settings.set_parameter(CUOPT_TIME_LIMIT, 60.0)

# Solve - MIP will be detected from integer variables
solution = linear_programming.Solve(data_model, settings)

# MIP solutions should have MIP-specific info via get_milp_stats()
milp_stats = solution.get_milp_stats()
mip_gap = milp_stats["mip_gap"]
solution_bound = milp_stats["solution_bound"]

# Just verify we can call these without errors
print(f"MIP gap: {{mip_gap}}")
print(f"Solution bound: {{solution_bound}}")
print("MIP solution info test: PASSED")
'''
result = subprocess.run(
[sys.executable, "-c", test_script],
capture_output=True,
text=True,
)
print(result.stdout)
print(result.stderr)
assert result.returncode == 0, f"Test failed: {result.stderr}"
assert "PASSED" in result.stdout

⚠️ Potential issue | 🟡 Minor

Guard solution-interface subprocess tests for CPU-only hosts.

These scripts run without the CPU-only/remote env, so they’ll fail on GPU-less hosts. Consider reusing cpu_only_env or skipping when CUDA is unavailable.

As per coding guidelines: Ensure test isolation: prevent GPU state, cached memory, and global variables from leaking between test cases; verify each test independently initializes its environment.

🤖 Prompt for AI Agents
In `@python/cuopt/cuopt/tests/linear_programming/test_cpu_only_execution.py`
around lines 385 - 452, Both tests (test_lp_solution_has_dual_info and
test_mip_solution_has_mip_info) must be guarded for CPU-only hosts: either reuse
the existing cpu_only_env fixture or explicitly skip when CUDA is not available
before launching the subprocess; add a check (e.g., a pytest skip or calling
cpu_only_env at the top of each test) so the subprocess scripts are not run on
GPU-less hosts. Also ensure test isolation by resetting any global GPU-related
state before/after the subprocess call (invoke the cpu_only_env fixture or
perform cleanup like calling any existing cuopt/tensor library cache-clear
routines and gc.collect) so cached memory and globals don’t leak between tests.
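
One possible shape for such a guard, sketched under the assumption that an empty CUDA_VISIBLE_DEVICES or a missing nvidia-smi binary is an acceptable proxy for "no usable GPU"; the gpu_available helper and requires_gpu marker are hypothetical names, not existing project utilities:

# Hypothetical guard for GPU-dependent subprocess tests on CPU-only hosts.
import os
import shutil

import pytest


def gpu_available() -> bool:
    """Rough proxy check; swap in the project's canonical CUDA check if one exists."""
    if os.environ.get("CUDA_VISIBLE_DEVICES") == "":
        return False
    return shutil.which("nvidia-smi") is not None


requires_gpu = pytest.mark.skipif(
    not gpu_available(), reason="test launches a GPU-backed solve in a subprocess"
)


@requires_gpu
def test_lp_solution_has_dual_info():
    ...  # existing subprocess-based body, unchanged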

@tmckayus
Contributor Author

tmckayus commented Feb 2, 2026

/ok to test 7bad723

const raft::handle_t handle_{stream};

// Determine memory backend based on execution mode
auto memory_backend = cuopt::linear_programming::get_memory_backend_type();
Member

Are memory_backend and is_remote_solve_enabled() the same GPU vs CPU checking system? Or can they differ in truth values?

Contributor Author

They can differ now, mostly for testing purposes and future-proofing. The default for local solves is GPU memory and the default for remote solves is CPU memory, but you can override either.
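
For reference, these knobs map onto environment variables used elsewhere in this PR; a small illustrative sketch (values are examples only):

# Illustrative only: the remote/local decision and the memory backend are
# controlled by separate environment variables and can be combined independently.
import os

# (a) Local solve, but build problem/solution data in host memory (test mode):
os.environ["CUOPT_USE_CPU_MEM_FOR_LOCAL"] = "1"

# (b) Remote solve (stubbed in this PR); the client side defaults to CPU memory:
# os.environ["CUOPT_REMOTE_HOST"] = "localhost"
# os.environ["CUOPT_REMOTE_PORT"] = "12345"

# (c) Remote solve but keep GPU memory on the client, overriding the default:
# os.environ["CUOPT_USE_GPU_MEM_FOR_REMOTE"] = "1"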

@tmckayus tmckayus modified the milestones: 26.02, 26.04 Feb 5, 2026
tmckayus and others added 8 commits February 6, 2026 10:40
- Created optimization_problem_interface_t with gpu_optimization_problem_t
  and cpu_optimization_problem_t implementations
- GPU backend uses rmm::device_uvector, CPU backend uses std::vector
- Added zero-copy move-based setters to optimization_problem_t for efficient
  GPU->GPU transfers via to_optimization_problem()
- Backend selection via environment variables: CUOPT_USE_GPU_MEM,
  CUOPT_REMOTE_HOST, CUOPT_REMOTE_PORT
- Integrated with C API (cuopt_c.cpp), CLI (cuopt_cli.cpp), and Python
  (cython_solve.cu) - all use backend selection and new problem classes
- Added remote execution stubs (solve_lp_remote, solve_mip_remote) for
  future gRPC implementation
- Implement cpu_pdlp_warm_start_data_t for CPU-based warmstart data
- Add cross-backend warmstart conversion (GPU<->CPU) in populate_from_data_model_view
- Refactor Python bindings to use std::variant and zero-copy data transfer
- Add conversion methods (to_*_ret_t) in solution classes for cleaner Cython interface
- Fix Cython compilation errors with unique_ptr handling

Co-authored-by: Cursor <cursoragent@cursor.com>
Cleanup:
- Remove unnecessary int64_t template instantiations
- Auto-detect problem category (LP/MIP/IP) in set_variable_types methods
- Simplify cuopt_c.cpp variable type handling

Defer CUDA initialization for CPU-only hosts with remote execution:
- cuopt_cli: Conditionally create raft::handle_t and RMM resources only for GPU backend
- Python: Add lazy module loading to avoid CUDA init on import
- Python: Break circular import by moving solver_parameters imports to function scope
Enable testing CPU problem/solution code paths without remote execution.
When CUOPT_USE_CPU_MEM_FOR_LOCAL=1 is set, local solves use CPU memory
structures: problems are built on CPU, converted to GPU for solving,
then solutions are converted back to CPU.

Changes:
- Add memory_backend_t enum and use_cpu_memory_for_local() function
- Add GPU↔CPU solution conversion methods (to_cpu_solution)
- Add solution_interface_test.cu for polymorphic interface testing
- Add test_cpu_only_execution.py for Python API CPU-only tests
- Add CI step to re-run C_API_TEST with CPU memory mode
- Extend C API tests for CPU memory backend coverage
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
python/cuopt/cuopt/utilities/utils.py (1)

47-53: ⚠️ Potential issue | 🟡 Minor

Pre-existing: operator precedence bug in bounds checks (works by coincidence).

Lines 47 and 53 both have len(data.get_variable_lower_bounds() > 0) — the > 0 binds to the array (element-wise comparison), then len() is called on the resulting boolean array. The intent is almost certainly len(data.get_variable_lower_bounds()) > 0. It "works" because the boolean array has the same length as the original, but it's misleading and fragile.

Not introduced by this PR, but since the function was touched, consider fixing while you're here:

Suggested fix
-    if len(data.get_variable_lower_bounds() > 0):
+    if len(data.get_variable_lower_bounds()) > 0:
         assert len(solution) == len(data.get_variable_lower_bounds())
         assert np.all(
             solution
             >= (data.get_variable_lower_bounds() - integrality_tolerance)
         )
-    if len(data.get_variable_upper_bounds() > 0):
+    if len(data.get_variable_upper_bounds()) > 0:
         assert len(solution) == len(data.get_variable_upper_bounds())
cpp/cuopt_cli.cpp (1)

176-187: ⚠️ Potential issue | 🔴 Critical

MIP solver implementation missing — solve_mip(optimization_problem_interface_t*) is declared but not implemented.

The interface-based solve_lp correctly writes sol_file internally (lines 1218–1221 of solve.cu), but the corresponding solve_mip overload declared in solve.hpp (lines 191–194) has no implementation in solve.cu. This will cause a linker error. Additionally, without an implementation that reads sol_file from settings, solution file output cannot be guaranteed for MIP problems via the CLI.

🤖 Fix all issues with AI agents
In `@cpp/include/cuopt/linear_programming/cpu_pdlp_warm_start_data.hpp`:
- Around line 39-96: Add a Doxygen comment to the public constructor
cpu_pdlp_warm_start_data_t(const pdlp_warm_start_data_view_t<i_t, f_t>& view) in
cpu_pdlp_warm_start_data.hpp that clearly states the input
pdlp_warm_start_data_view_t spans (e.g., current_primal_solution_,
current_dual_solution_, etc.) must reference host (CPU) memory because the
constructor uses std::vector::assign with raw pointers, and indicate callers
should use convert_to_cpu_warmstart() when starting from GPU/device memory;
place the comment immediately above the constructor declaration so it’s visible
in the public header.

In `@cpp/include/cuopt/linear_programming/optimization_problem_utils.hpp`:
- Around line 97-106: Before indexing the quadratic offsets vector, validate its
length: in the block guarded by data_model.has_quadratic_objective(), check that
data_model.get_quadratic_objective_offsets().size() >= (size_t)(n_vars + 1) (and
optionally validate related vectors sizes like get_quadratic_objective_values()
and get_quadratic_objective_indices() against q_nonzeros) and if the check
fails, return/throw a clear error or skip setting the quadratic matrix instead
of accessing get_quadratic_objective_offsets()[n_vars]; update the code around
the call to problem->set_quadratic_objective_matrix(...) to only compute
q_nonzeros and call that function after these bounds checks.

In `@cpp/src/linear_programming/cpu_optimization_problem.cpp`:
- Around line 690-695: The equality check currently uses exact comparisons for
objective_scaling_factor_ and objective_offset_ while arrays use an epsilon;
change the scalar comparisons to the same epsilon-based check (e.g., use
std::abs(objective_scaling_factor_ - other.get_objective_scaling_factor()) >
1e-9 and std::abs(objective_offset_ - other.get_objective_offset()) > 1e-9) so
floating-point rounding doesn't cause false negatives; ensure you reuse the same
tolerance value used elsewhere in this class so comparisons for
objective_scaling_factor_ and objective_offset_ match the array element checks.
- Around line 800-864: All nine copy_*_to_host methods
(copy_objective_coefficients_to_host, copy_constraint_matrix_to_host,
copy_row_types_to_host, copy_constraint_bounds_to_host,
copy_constraint_lower_bounds_to_host, copy_constraint_upper_bounds_to_host,
copy_variable_lower_bounds_to_host, copy_variable_upper_bounds_to_host,
copy_variable_types_to_host) must validate inputs before calling std::copy:
check that the provided size parameter is <= the corresponding internal
container's size (e.g., c_, A_, A_indices_, A_offsets_, row_types_, b_,
constraint_lower_bounds_, constraint_upper_bounds_, variable_lower_bounds_,
variable_upper_bounds_, variable_types_) and that output pointers are non-null;
if the check fails, return an error/throw a clear exception (e.g.,
std::invalid_argument or std::out_of_range) or handle it consistently with
project error handling instead of performing the copy, to prevent out-of-bounds
reads.
- Around line 775-794: Update the free-function signatures in solve_remote.hpp
so the remote solvers accept a const cpu_optimization_problem_t reference
(change parameters to cpu_optimization_problem_t<i_t, f_t> const& for both
solve_lp_remote and solve_mip_remote), then remove the const_cast in
cpu_optimization_problem_t::solve_lp_remote and
cpu_optimization_problem_t::solve_mip_remote and forward *this directly to
::cuopt::linear_programming::solve_lp_remote and
::cuopt::linear_programming::solve_mip_remote respectively.

In `@cpp/src/linear_programming/cpu_pdlp_warm_start_data.cu`:
- Around line 110-115: The explicit template instantiations only cover
<int,double> for convert_to_cpu_warmstart and convert_to_gpu_warmstart which
will cause linker errors when float support is enabled; add corresponding
explicit instantiations for <int,float> (i.e., template
cpu_pdlp_warm_start_data_t<int,float> convert_to_cpu_warmstart(...) and template
pdlp_warm_start_data_t<int,float> convert_to_gpu_warmstart(...)) and wrap them
in `#if` MIP_INSTANTIATE_FLOAT / `#endif`, or alternatively add a clear comment
documenting why <int,float> is intentionally omitted.

In `@cpp/src/linear_programming/gpu_optimization_problem.cu`:
- Around line 855-874: The current const member functions
gpu_optimization_problem_t<i_t,f_t>::solve_lp_remote and ::solve_mip_remote use
const_cast on *this which can cause undefined behavior if the free functions
mutate the object; remove the const_cast by either (A) making these member
methods non-const (change their signatures to drop const) so they legitimately
pass a mutable reference to ::cuopt::linear_programming::solve_lp_remote and
::cuopt::linear_programming::solve_mip_remote, or (B) add const overloads of the
free functions (::cuopt::linear_programming::solve_lp_remote and
::cuopt::linear_programming::solve_mip_remote) that accept a const
gpu_optimization_problem_t<i_t,f_t>& (and ensure they do not modify the object)
and call those from the const members; pick one approach and update the matching
function signatures and calls accordingly.

In `@cpp/src/linear_programming/solution_conversion.cu`:
- Around line 148-196: The CPU conversion method cpu_lp_solution_t<i_t,
f_t>::to_cpu_linear_programming_ret_t currently unconditionally moves
pdlp_warm_start_data_ vectors and copies scalar warm-start fields; add the same
presence guard used on the GPU path (check
pdlp_warm_start_data_.current_primal_solution_.size() > 0) and only move
current_primal_solution_, current_dual_solution_, initial_primal_average_,
initial_dual_average_, current_ATY_, sum_primal_solutions_, sum_dual_solutions_,
last_restart_duality_gap_primal_solution_,
last_restart_duality_gap_dual_solution_ and copy initial_primal_weight_,
initial_step_size_, total_pdlp_iterations_, total_pdhg_iterations_,
last_candidate_kkt_score_, last_restart_kkt_score_, sum_solution_weight_,
iterations_since_last_restart_ when that guard is true so scalars aren’t left
with uninitialized values in ret.
- Around line 96-108: Guard the access to
solution_.get_additional_termination_information(0) by checking that the
solution has termination info (e.g., termination_stats_ is not empty) before
calling it; if empty, skip calling get_additional_termination_information(0) and
set ret.l2_primal_residual_, ret.l2_dual_residual_, ret.primal_objective_,
ret.dual_objective_, ret.gap_, ret.nb_iterations_, ret.solve_time_, and
ret.solved_by_pdlp_ to sensible defaults (or leave them unchanged) while still
populating ret.termination_status_, ret.error_status_, and ret.error_message_;
use the same guard pattern used for warm-start checks to protect the call to
solution_.get_additional_termination_information and avoid undefined behavior.

In `@cpp/src/linear_programming/solve.cu`:
- Line 1369: Remove the stray debug output using std::cerr that prints "Test
mode: Converting GPU solution back to CPU solution" and replace it with the
existing logging macro (CUOPT_LOG_INFO) so diagnostics are consistent with
surrounding code; locate the std::cerr statement and either delete it or change
it to CUOPT_LOG_INFO("Test mode: Converting GPU solution back to CPU solution")
to match existing logging style.
- Around line 1354-1372: The GPU stream created as rmm::cuda_stream (used to
construct raft::handle_t handle and set via cpu_prob->set_handle(&handle)) must
be synchronized after calling solve_lp(...) and before performing D2H in
gpu_lp_solution_t::to_cpu_solution(); add an explicit stream.synchronize() on
that local rmm::cuda_stream instance after obtaining gpu_solution (i.e. after
the call to solve_lp<i_t,f_t>(...) and before constructing/returning via
gpu_lp_solution_t<i_t,f_t>::to_cpu_solution()) to ensure all GPU work on that
stream is complete.

In `@cpp/src/mip/solve.cu`:
- Line 391: Replace the direct std::cerr debug print that emits "Test mode:
Converting GPU solution back to CPU solution" with the project's structured
logger call CUOPT_LOG_INFO using the same message; locate the exact occurrence
of std::cerr << "Test mode: Converting GPU solution back to CPU solution" <<
std::endl; in cpp/src/mip/solve.cu and swap it for CUOPT_LOG_INFO("Test mode:
Converting GPU solution back to CPU solution"); (omit std::endl and any
surrounding iostream includes if no longer needed).
- Around line 359-402: Wrap the body of the interface-based solve_mip
(template<typename i_t, typename f_t>
std::unique_ptr<mip_solution_interface_t<i_t,f_t>> solve_mip(...)) in the same
try-catch used by the other overload: catch cuopt::logic_error and
std::bad_alloc (and optionally std::exception) and convert/log them to the
appropriate user-facing error mapping/return (preserve existing mapping behavior
used by the optimization_problem_t overload). Ensure the try block encloses
calls to problem_interface->to_optimization_problem(), the recursive
solve_mip(op_problem, settings) call, gpu_mip_solution_t<i_t,f_t> construction,
and cpu conversion via to_cpu_solution(), and rethrow or translate exceptions
consistent with the project's error-handling conventions.

In `@cpp/tests/linear_programming/c_api_tests/c_api_test.c`:
- Around line 1561-1562: The allocations for dual_solution and reduced_costs
must be checked for NULL after malloc to avoid passing invalid pointers to
cuOptGetDualSolution and cuOptGetReducedCosts; after the lines that allocate
dual_solution and reduced_costs, add NULL checks that log an error (or set an
errno-like status), free any previously allocated resources, and return/exit
with a non-zero status if either allocation fails. Ensure you reference and
validate the pointers named dual_solution and reduced_costs before any calls to
cuOptGetDualSolution or cuOptGetReducedCosts, and perform appropriate cleanup of
other related buffers on error.

In `@python/cuopt/cuopt/linear_programming/solver/solver_wrapper.pyx`:
- Line 404: The code dereferences lp_ptr.current_primal_solution_ without
verifying it's non-null; update the condition in the block around
solver_wrapper.pyx (where lp_ptr.current_primal_solution_ is checked) to first
ensure lp_ptr.current_primal_solution_ is truthy before accessing
.get()[0].size(), following the safe pattern used on the CPU path (see the CPU
path check near the other branch) so the condition becomes a null-check plus
size-check on lp_ptr.current_primal_solution_ to avoid undefined behavior.

In `@python/cuopt/cuopt/tests/linear_programming/test_cpu_only_execution.py`:
- Around line 1-4: Remove the duplicate SPDX license header by deleting one of
the repeated "SPDX-License-Identifier: Apache-2.0" lines (there are two
identical lines in the file header); ensure only a single
"SPDX-License-Identifier: Apache-2.0" remains at the top of the file alongside
the existing copyright line.
🧹 Nitpick comments (16)
cpp/src/linear_programming/backend_selection.cpp (2)

31-53: Consider extracting the duplicated env-var boolean parsing.

use_gpu_memory_for_remote() and use_cpu_memory_for_local() share identical logic for reading an env var and parsing a case-insensitive boolean. A small helper would reduce duplication:

♻️ Suggested helper
+static bool env_var_is_truthy(const char* var_name)
+{
+  const char* val = std::getenv(var_name);
+  if (val != nullptr) {
+    std::string s(val);
+    std::transform(s.begin(), s.end(), s.begin(),
+                   [](unsigned char c) { return std::tolower(c); });
+    return (s == "true" || s == "1");
+  }
+  return false;
+}
+
 bool use_gpu_memory_for_remote()
 {
-  const char* use_gpu_mem = std::getenv("CUOPT_USE_GPU_MEM_FOR_REMOTE");
-  if (use_gpu_mem != nullptr) {
-    std::string value(use_gpu_mem);
-    std::transform(value.begin(), value.end(), value.begin(), ::tolower);
-    return (value == "true" || value == "1");
-  }
-  return false;
+  return env_var_is_truthy("CUOPT_USE_GPU_MEM_FOR_REMOTE");
 }
 
 bool use_cpu_memory_for_local()
 {
-  const char* use_cpu_mem = std::getenv("CUOPT_USE_CPU_MEM_FOR_LOCAL");
-  if (use_cpu_mem != nullptr) {
-    std::string value(use_cpu_mem);
-    std::transform(value.begin(), value.end(), value.begin(), ::tolower);
-    return (value == "true" || value == "1");
-  }
-  return false;
+  return env_var_is_truthy("CUOPT_USE_CPU_MEM_FOR_LOCAL");
 }

Note: The lambda [](unsigned char c) { return std::tolower(c); } also avoids the technically-UB bare ::tolower on signed char values.


1-15: Unused include <cstdio>.

<cstdio> (Line 13) is not referenced anywhere in this file.

python/cuopt/cuopt/tests/linear_programming/test_cpu_only_execution.py (1)

36-45: Fixture duplication across test classes.

The cpu_only_env fixture is identically defined in both TestCPUOnlyExecution (Line 37) and TestCuoptCliCPUOnly (Line 253). Consider moving it to a module-level conftest.py fixture or a module-scoped fixture function to avoid maintaining two copies.
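
A sketch of what a shared module-level fixture could look like in a conftest.py next to this test file, built from the same environment variables the subprocess scripts already set; the body of the existing fixture is not shown in this diff, so this is an approximation rather than a drop-in copy:

# conftest.py: hypothetical shared replacement for the duplicated cpu_only_env fixtures.
import os

import pytest


@pytest.fixture
def cpu_only_env():
    """Environment that hides GPUs and enables the (stubbed) remote execution path."""
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = ""
    env["CUOPT_REMOTE_HOST"] = "localhost"
    env["CUOPT_REMOTE_PORT"] = "12345"
    return env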

cpp/include/cuopt/linear_programming/cpu_pdlp_warm_start_data.hpp (1)

102-104: Redundant forward declaration.

pdlp_warm_start_data_t is already fully defined via the #include on Line 10. This forward declaration is harmless but unnecessary.

cpp/src/linear_programming/gpu_optimization_problem.cu (2)

244-258: get_n_integers copies all types to host — prefer device-side counting.

This method allocates a host vector, copies D2H, syncs, and counts on CPU. Since set_variable_types already uses thrust::count_if on device, consider caching the integer count at set-time, or using thrust::count_if here as well.


880-949: C API copy helpers use synchronous cudaMemcpy without size validation.

These functions trust the caller-provided size and perform synchronous cudaMemcpy. If size exceeds the device buffer length, this is a device buffer overread. Since these are internal helpers, the risk is low, but a debug-mode assertion would add safety:

assert(size <= static_cast<i_t>(c_.size()));

As per coding guidelines: Verify correct problem size checks before expensive GPU/CPU operations.

cpp/tests/linear_programming/unit_tests/solution_interface_test.cu (2)

486-513: copy_to_host tests only verify pre-allocated vector sizes, not copied data.

The assertions like EXPECT_EQ(obj_coeffs.size(), static_cast<size_t>(n_vars)) are trivially true because obj_coeffs was just constructed as std::vector<double>(n_vars). These tests don't validate that the actual data was copied correctly.

Consider comparing at least a few elements against the known MPS source data (e.g., from mps_data), or verifying round-trip consistency (populate → copy_to_host → compare against original).


437-467: Warmstart CPU→GPU test lacks assertions on the converted GPU solution.

The test converts cpu_solution to a GPU solution at Line 462 but has no assertions verifying that warmstart data is actually preserved on the GPU side. The comment on Lines 464-466 acknowledges this but doesn't act on it.

Consider adding assertions similar to the LP/MIP conversion tests—e.g., check that the GPU solution has warmstart data available and its sizes match. As per coding guidelines, tests should validate correctness, not just "runs without error."

python/cuopt/cuopt/linear_programming/solver/solver_wrapper.pyx (2)

388-513: Significant code duplication between GPU and CPU LP paths.

The GPU LP path (Lines 388-513) and CPU LP path (Lines 515-617) follow identical structure and construct Solution with the same fields. Only the data extraction differs (DeviceBuffer + series_from_buf vs _vector_to_numpy). The same applies to the warmstart extraction blocks (Lines 402-457 vs 524-561), including the identical fallback blocks when warmstart is empty.

Consider extracting the common pattern: first extract data into local Python variables (numpy arrays), then construct Solution once. This would halve the maintenance surface for the LP path.

Also applies to: 515-617


459-493: Inconsistent argument passing style: positional vs keyword.

The non-batch Solution(...) calls at Lines 459-493 use positional arguments, while the batch-mode calls at Lines 496-513 use keyword arguments. If Solution.__init__ signature changes, the positional calls are fragile. Consider using keyword arguments consistently for all Solution constructor calls to improve maintainability.

Also applies to: 496-513

cpp/include/cuopt/linear_programming/utilities/cython_types.hpp (1)

24-61: Scalar members are not default-initialized.

Members like initial_primal_weight_, total_pdlp_iterations_, nb_iterations_, solve_time_, solved_by_pdlp_, nodes_, etc. have no default initializers. If a code path constructs one of these structs and forgets to set a field, reading it is undefined behavior.

Consider adding default member initializers (e.g., double solve_time_ = 0.0;, int nb_iterations_ = 0;, bool solved_by_pdlp_ = false;) to provide safe defaults, especially since these structs are populated from multiple code paths (local solve, remote solve, error paths).

Also applies to: 64-101, 104-122, 125-143

cpp/src/linear_programming/cpu_pdlp_warm_start_data.cu (1)

40-73: Excessive stream synchronization — 9 syncs where 1 would suffice.

Each device_to_host_vector call synchronizes the stream individually. In convert_to_cpu_warmstart, this means 9 back-to-back synchronizations for the 9 vector fields. Since all copies use the same stream and are serialized, a single stream.synchronize() after all copies would be functionally equivalent and reduce overhead.

The same applies to convert_to_gpu_warmstart (lines 75–108).

Consider removing the per-call sync from the helpers and adding a single sync at the end of each conversion function, or providing a no_sync variant for bulk-copy scenarios.
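
For illustration, a bulk-copy sketch against the raw CUDA runtime API (the real helpers presumably wrap rmm/raft; device_to_host_copy_no_sync is a hypothetical name, and error checking is omitted):

#include <cuda_runtime.h>

#include <cstddef>
#include <vector>

// Hypothetical no-sync helper: enqueue an async device-to-host copy only.
template <typename T>
void device_to_host_copy_no_sync(std::vector<T>& dst, const T* d_src, std::size_t n, cudaStream_t stream)
{
  dst.resize(n);
  cudaMemcpyAsync(dst.data(), d_src, n * sizeof(T), cudaMemcpyDeviceToHost, stream);
}

// Conversion sketch: enqueue all copies first, synchronize once at the end.
void convert_to_cpu_warmstart_sketch(const double* d_primal, std::size_t n_primal,
                                     const double* d_dual, std::size_t n_dual,
                                     std::vector<double>& h_primal,
                                     std::vector<double>& h_dual,
                                     cudaStream_t stream)
{
  device_to_host_copy_no_sync(h_primal, d_primal, n_primal, stream);
  device_to_host_copy_no_sync(h_dual, d_dual, n_dual, stream);
  // ... enqueue the remaining warm-start vectors the same way ...
  cudaStreamSynchronize(stream);  // single sync after all copies
}

Note that with pageable std::vector storage the copies may still block individually inside the driver; the point is only that the explicit per-call synchronizations are redundant.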

cpp/include/cuopt/linear_programming/utilities/cython_solve.hpp (1)

42-50: Raw pointer returns from call_solve_lp / call_solve_mip — clarify ownership.

Both functions return raw pointers (lp_solution_interface_t<int, double>* and mip_solution_interface_t<int, double>*). The ownership contract is unclear from the declaration alone — the caller needs to know whether they must delete the result or whether it's managed elsewhere.

If these are intended for Cython consumption where unique_ptr interop is awkward, consider at minimum adding a @note documenting ownership (e.g., "Caller takes ownership and must delete the returned pointer").
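
For example, the annotated declarations might read (a documentation sketch only; the actual parameters in cython_solve.hpp are unchanged and elided here):

/**
 * @note Ownership: the caller takes ownership of the returned pointer and must
 *       delete it (or wrap it in std::unique_ptr immediately). The raw return
 *       type is only a concession to Cython interop.
 */
lp_solution_interface_t<int, double>* call_solve_lp(/* existing parameters */);

/** @note Ownership: caller takes ownership and must delete the returned pointer. */
mip_solution_interface_t<int, double>* call_solve_mip(/* existing parameters */);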

cpp/src/linear_programming/cpu_optimization_problem.cpp (1)

452-516: Host getters copy data unnecessarily for the CPU backend.

All get_*_host() methods return std::vector by value, copying the internal vectors each time. For the CPU backend, the data is already in host memory. The interface signature forces return-by-value (needed for the GPU backend which copies from device), but consider adding const reference accessors specific to cpu_optimization_problem_t that callers within the CPU path can use directly, avoiding unnecessary copies for large problems.

This is minor since the interface constraint is understood and the copies are not on hot paths.
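
A self-contained sketch of the dual-accessor idea (names are illustrative, not the actual cuOpt API):

#include <utility>
#include <vector>

// The interface must return by value so a GPU-backed implementation can
// materialize a host copy on demand.
struct problem_interface_sketch {
  virtual ~problem_interface_sketch() = default;
  virtual std::vector<double> get_objective_coefficients_host() const = 0;
};

// CPU backend: data already lives in host memory, so it can also expose a
// zero-copy const-reference accessor alongside the interface method.
class cpu_problem_sketch : public problem_interface_sketch {
 public:
  explicit cpu_problem_sketch(std::vector<double> c) : c_(std::move(c)) {}

  std::vector<double> get_objective_coefficients_host() const override { return c_; }  // copies
  const std::vector<double>& objective_coefficients() const noexcept { return c_; }     // no copy

 private:
  std::vector<double> c_;
};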

cpp/include/cuopt/linear_programming/optimization_problem_interface.hpp (2)

547-547: to_optimization_problem() is missing the override keyword in both GPU and CPU classes.

Both gpu_optimization_problem_t::to_optimization_problem() (line 547) and cpu_optimization_problem_t::to_optimization_problem() (line 736) lack override, while all other virtual method implementations in these classes consistently use it. Although the signature matches and it works as an override, the missing keyword bypasses the compiler's signature-mismatch safety net.

♻️ Proposed fix
   // gpu_optimization_problem_t (line 547):
-  optimization_problem_t<i_t, f_t> to_optimization_problem();
+  optimization_problem_t<i_t, f_t> to_optimization_problem() override;

   // cpu_optimization_problem_t (line 736):
-  optimization_problem_t<i_t, f_t> to_optimization_problem();
+  optimization_problem_t<i_t, f_t> to_optimization_problem() override;

Also applies to: 736-736


721-729: set_handle() public method allows mutable state change on a problem object — document thread-safety expectations.

set_handle() mutates the internal handle_ptr_ and is public. If a cpu_optimization_problem_t is being accessed concurrently (e.g., solve in progress while handle is reassigned), this is a data race. The doc comment says it's for test mode, which is fine, but consider:

  1. Adding a note that this must not be called while the problem is being used elsewhere.
  2. Potentially restricting visibility (e.g., friend the test class) if this is strictly for testing.

This is a minor concern given the test-mode intent.
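
A sketch of option 2, restricting visibility through a friend test-access helper (all names here are hypothetical):

// set_handle() made private and reachable only through a dedicated test
// accessor, so production callers cannot reassign the handle mid-solve.
class cpu_problem_with_test_hook {
 private:
  // Test-mode only. Not thread-safe: must not be called while the problem is
  // being solved or read on another thread.
  void set_handle(void* handle) { handle_ptr_ = handle; }

  void* handle_ptr_ = nullptr;

  friend class CpuProblemTestAccess;
};

class CpuProblemTestAccess {
 public:
  static void set_handle(cpu_problem_with_test_hook& p, void* h) { p.set_handle(h); }
};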

Comment on lines +39 to +96
  // Constructor from view (copies span data to vectors)
  cpu_pdlp_warm_start_data_t(const pdlp_warm_start_data_view_t<i_t, f_t>& view)
    : initial_primal_weight_(view.initial_primal_weight_),
      initial_step_size_(view.initial_step_size_),
      total_pdlp_iterations_(view.total_pdlp_iterations_),
      total_pdhg_iterations_(view.total_pdhg_iterations_),
      last_candidate_kkt_score_(view.last_candidate_kkt_score_),
      last_restart_kkt_score_(view.last_restart_kkt_score_),
      sum_solution_weight_(view.sum_solution_weight_),
      iterations_since_last_restart_(view.iterations_since_last_restart_)
  {
    // Copy vector data from spans
    if (view.current_primal_solution_.data() != nullptr) {
      current_primal_solution_.assign(
        view.current_primal_solution_.data(),
        view.current_primal_solution_.data() + view.current_primal_solution_.size());
    }
    if (view.current_dual_solution_.data() != nullptr) {
      current_dual_solution_.assign(
        view.current_dual_solution_.data(),
        view.current_dual_solution_.data() + view.current_dual_solution_.size());
    }
    if (view.initial_primal_average_.data() != nullptr) {
      initial_primal_average_.assign(
        view.initial_primal_average_.data(),
        view.initial_primal_average_.data() + view.initial_primal_average_.size());
    }
    if (view.initial_dual_average_.data() != nullptr) {
      initial_dual_average_.assign(
        view.initial_dual_average_.data(),
        view.initial_dual_average_.data() + view.initial_dual_average_.size());
    }
    if (view.current_ATY_.data() != nullptr) {
      current_ATY_.assign(view.current_ATY_.data(),
                          view.current_ATY_.data() + view.current_ATY_.size());
    }
    if (view.sum_primal_solutions_.data() != nullptr) {
      sum_primal_solutions_.assign(
        view.sum_primal_solutions_.data(),
        view.sum_primal_solutions_.data() + view.sum_primal_solutions_.size());
    }
    if (view.sum_dual_solutions_.data() != nullptr) {
      sum_dual_solutions_.assign(
        view.sum_dual_solutions_.data(),
        view.sum_dual_solutions_.data() + view.sum_dual_solutions_.size());
    }
    if (view.last_restart_duality_gap_primal_solution_.data() != nullptr) {
      last_restart_duality_gap_primal_solution_.assign(
        view.last_restart_duality_gap_primal_solution_.data(),
        view.last_restart_duality_gap_primal_solution_.data() +
          view.last_restart_duality_gap_primal_solution_.size());
    }
    if (view.last_restart_duality_gap_dual_solution_.data() != nullptr) {
      last_restart_duality_gap_dual_solution_.assign(
        view.last_restart_duality_gap_dual_solution_.data(),
        view.last_restart_duality_gap_dual_solution_.data() +
          view.last_restart_duality_gap_dual_solution_.size());
    }
  }

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find pdlp_warm_start_data_view_t definition
rg -n "pdlp_warm_start_data_view_t" --type-add 'hdr:*.{hpp,h,cuh}' --type=hdr -A 10 | head -80

Repository: NVIDIA/cuopt

Length of output: 8892


🏁 Script executed:

#!/bin/bash
# Also search more broadly to understand the view type
rg "class pdlp_warm_start_data_view_t|struct pdlp_warm_start_data_view_t" -A 20

Repository: NVIDIA/cuopt

Length of output: 4891


🏁 Script executed:

#!/bin/bash
# Search for convert_to_cpu_warmstart and convert_to_gpu_warmstart to understand conversion
rg -n "convert_to_cpu_warmstart|convert_to_gpu_warmstart" -B 2 -A 10 | head -100

Repository: NVIDIA/cuopt

Length of output: 8072


🏁 Script executed:

#!/bin/bash
# Find the implementation of the view constructor in the .cu file
rg -n "cpu_pdlp_warm_start_data_t.*view" -A 30 cpp/src/linear_programming/cpu_pdlp_warm_start_data.cu

Repository: NVIDIA/cuopt

Length of output: 2111


🏁 Script executed:

#!/bin/bash
# Check if there are any calls to the view constructor from outside
rg "cpu_pdlp_warm_start_data_t.*\(.*view\)" --type cpp --type hpp | head -20

Repository: NVIDIA/cuopt

Length of output: 83


🏁 Script executed:

#!/bin/bash
# Look at the full cpu_pdlp_warm_start_data.cu implementation
wc -l cpp/src/linear_programming/cpu_pdlp_warm_start_data.cu
head -100 cpp/src/linear_programming/cpu_pdlp_warm_start_data.cu

Repository: NVIDIA/cuopt

Length of output: 4954


Document constructor's host-memory requirement in public header.

The constructor from pdlp_warm_start_data_view_t is public and inline but lacks documentation stating that the spans must reference host memory. The type system does not enforce memory location, and std::vector::assign() dereferences the raw span pointers on the host, so passing device pointers is undefined behavior. Add a Doxygen comment documenting that input spans must be host-resident, or note that convert_to_cpu_warmstart() should be used for GPU-to-CPU warmstart transfers.
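
For example, the documentation might read (only the comment is new; the signature is quoted from the diff above):

/**
 * @brief Construct CPU warm-start data by copying from a view.
 *
 * @param view View whose spans (current_primal_solution_, current_dual_solution_, ...)
 *             must reference host (CPU) memory: the constructor dereferences the raw
 *             span pointers via std::vector::assign, so passing device pointers is
 *             undefined behavior. For GPU-resident warm-start data, use
 *             convert_to_cpu_warmstart() instead.
 */
cpu_pdlp_warm_start_data_t(const pdlp_warm_start_data_view_t<i_t, f_t>& view);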

🤖 Prompt for AI Agents
In `@cpp/include/cuopt/linear_programming/cpu_pdlp_warm_start_data.hpp` around
lines 39 - 96, Add a Doxygen comment to the public constructor
cpu_pdlp_warm_start_data_t(const pdlp_warm_start_data_view_t<i_t, f_t>& view) in
cpu_pdlp_warm_start_data.hpp that clearly states the input
pdlp_warm_start_data_view_t spans (e.g., current_primal_solution_,
current_dual_solution_, etc.) must reference host (CPU) memory because the
constructor uses std::vector::assign with raw pointers, and indicate callers
should use convert_to_cpu_warmstart() when starting from GPU/device memory;
place the comment immediately above the constructor declaration so it’s visible
in the public header.

Comment on lines +97 to +106
  // Handle quadratic objective if present
  if (data_model.has_quadratic_objective()) {
    i_t q_nonzeros = data_model.get_quadratic_objective_offsets()[n_vars];
    problem->set_quadratic_objective_matrix(data_model.get_quadratic_objective_values().data(),
                                            q_nonzeros,
                                            data_model.get_quadratic_objective_indices().data(),
                                            q_nonzeros,
                                            data_model.get_quadratic_objective_offsets().data(),
                                            n_vars + 1);
  }

⚠️ Potential issue | 🟡 Minor

Missing bounds check on quadratic objective offsets vector before indexing.

Line 99 accesses data_model.get_quadratic_objective_offsets()[n_vars] without verifying the vector has at least n_vars + 1 elements. If the MPS data has a malformed quadratic objective section, this is an out-of-bounds access.

🛠️ Proposed fix
   if (data_model.has_quadratic_objective()) {
-    i_t q_nonzeros = data_model.get_quadratic_objective_offsets()[n_vars];
+    const auto& q_offsets = data_model.get_quadratic_objective_offsets();
+    if (q_offsets.size() <= static_cast<size_t>(n_vars)) {
+      // Malformed quadratic objective data - skip
+      return;
+    }
+    i_t q_nonzeros = q_offsets[n_vars];
     problem->set_quadratic_objective_matrix(data_model.get_quadratic_objective_values().data(),
                                             q_nonzeros,
🤖 Prompt for AI Agents
In `@cpp/include/cuopt/linear_programming/optimization_problem_utils.hpp` around
lines 97 - 106, Before indexing the quadratic offsets vector, validate its
length: in the block guarded by data_model.has_quadratic_objective(), check that
data_model.get_quadratic_objective_offsets().size() >= (size_t)(n_vars + 1) (and
optionally validate related vectors sizes like get_quadratic_objective_values()
and get_quadratic_objective_indices() against q_nonzeros) and if the check
fails, return/throw a clear error or skip setting the quadratic matrix instead
of accessing get_quadratic_objective_offsets()[n_vars]; update the code around
the call to problem->set_quadratic_objective_matrix(...) to only compute
q_nonzeros and call that function after these bounds checks.

Comment on lines +690 to +695
if (maximize_ != other.get_sense()) return false;
if (n_vars_ != other.get_n_variables()) return false;
if (n_constraints_ != other.get_n_constraints()) return false;
if (objective_scaling_factor_ != other.get_objective_scaling_factor()) return false;
if (objective_offset_ != other.get_objective_offset()) return false;
if (problem_category_ != other.get_problem_category()) return false;

⚠️ Potential issue | 🟡 Minor

Inconsistent floating-point comparison: exact equality for scalars vs. epsilon for arrays.

Lines 693–694 use != (exact comparison) for objective_scaling_factor_ and objective_offset_, while lines 752–754 use std::abs(...) > 1e-9 for array elements. If these scalars could ever differ by floating-point rounding (e.g., after serialization/deserialization through MPS), the exact check will produce false negatives.

Consider using the same epsilon tolerance consistently:

-  if (objective_scaling_factor_ != other.get_objective_scaling_factor()) return false;
-  if (objective_offset_ != other.get_objective_offset()) return false;
+  if (std::abs(objective_scaling_factor_ - other.get_objective_scaling_factor()) > 1e-9) return false;
+  if (std::abs(objective_offset_ - other.get_objective_offset()) > 1e-9) return false;

As per coding guidelines, "use epsilon comparisons for floating-point equality checks."
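
One way to keep the scalar and per-element checks on a single tolerance (a sketch; the constant name is illustrative):

#include <cmath>

// Shared tolerance so scalar and array comparisons cannot drift apart.
static constexpr double kEqualityTolerance = 1e-9;

inline bool nearly_equal(double a, double b, double tol = kEqualityTolerance)
{
  return std::abs(a - b) <= tol;
}

// Usage in the comparison:
//   if (!nearly_equal(objective_scaling_factor_, other.get_objective_scaling_factor())) return false;
//   if (!nearly_equal(objective_offset_, other.get_objective_offset())) return false;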

🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/cpu_optimization_problem.cpp` around lines 690 -
695, The equality check currently uses exact comparisons for
objective_scaling_factor_ and objective_offset_ while arrays use an epsilon;
change the scalar comparisons to the same epsilon-based check (e.g., use
std::abs(objective_scaling_factor_ - other.get_objective_scaling_factor()) >
1e-9 and std::abs(objective_offset_ - other.get_objective_offset()) > 1e-9) so
floating-point rounding doesn't cause false negatives; ensure you reuse the same
tolerance value used elsewhere in this class so comparisons for
objective_scaling_factor_ and objective_offset_ match the array element checks.

Comment on lines +775 to +794
template <typename i_t, typename f_t>
std::unique_ptr<lp_solution_interface_t<i_t, f_t>>
cpu_optimization_problem_t<i_t, f_t>::solve_lp_remote(
  pdlp_solver_settings_t<i_t, f_t> const& settings) const
{
  // Forward to the cpu_optimization_problem_t overload
  // Need to cast away const since solve functions take non-const reference
  auto& non_const_this = const_cast<cpu_optimization_problem_t<i_t, f_t>&>(*this);
  return ::cuopt::linear_programming::solve_lp_remote(non_const_this, settings);
}

template <typename i_t, typename f_t>
std::unique_ptr<mip_solution_interface_t<i_t, f_t>>
cpu_optimization_problem_t<i_t, f_t>::solve_mip_remote(
  mip_solver_settings_t<i_t, f_t> const& settings) const
{
  // Forward to the cpu_optimization_problem_t overload
  auto& non_const_this = const_cast<cpu_optimization_problem_t<i_t, f_t>&>(*this);
  return ::cuopt::linear_programming::solve_mip_remote(non_const_this, settings);
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the declarations/definitions of solve_lp_remote and solve_mip_remote free functions
rg -n "solve_lp_remote|solve_mip_remote" --type cpp --type cu -g '!*test*' -B2 -A10 | head -100

Repository: NVIDIA/cuopt

Length of output: 82


🏁 Script executed:

#!/bin/bash
# Search for solve_lp_remote and solve_mip_remote function definitions without file type filter
rg "solve_lp_remote|solve_mip_remote" -B2 -A10 | grep -E "(solve_lp_remote|solve_mip_remote|std::unique_ptr|cpu_optimization_problem_t)" | head -50

Repository: NVIDIA/cuopt

Length of output: 6116


🏁 Script executed:

#!/bin/bash
# Get the full implementations of solve_lp_remote and solve_mip_remote
cat -n cpp/src/linear_programming/solve_remote.cu | head -150

Repository: NVIDIA/cuopt

Length of output: 8381


🏁 Script executed:

#!/bin/bash
# Continue reading solve_remote.cu to see the rest of GPU solve
cat -n cpp/src/linear_programming/solve_remote.cu | sed -n '150,250p'

Repository: NVIDIA/cuopt

Length of output: 5658


🏁 Script executed:

#!/bin/bash
# Check the header file signatures
cat -n cpp/include/cuopt/linear_programming/solve_remote.hpp

Repository: NVIDIA/cuopt

Length of output: 2651


Make solve_lp_remote and solve_mip_remote free functions accept const reference parameters.

The const-cast is pragmatically safe because these functions only read from the problem object (via get_n_variables(), get_n_constraints(), etc.) and never mutate it. However, the non-const reference signatures violate const-correctness. Update the function declarations in solve_remote.hpp to accept const references instead:

std::unique_ptr<lp_solution_interface_t<i_t, f_t>> solve_lp_remote(
  cpu_optimization_problem_t<i_t, f_t> const& cpu_problem,  // Add const
  pdlp_solver_settings_t<i_t, f_t> const& settings);

std::unique_ptr<mip_solution_interface_t<i_t, f_t>> solve_mip_remote(
  cpu_optimization_problem_t<i_t, f_t> const& cpu_problem,  // Add const
  mip_solver_settings_t<i_t, f_t> const& settings);

Then the const-cast in the calling methods can be removed entirely.

🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/cpu_optimization_problem.cpp` around lines 775 -
794, Update the free-function signatures in solve_remote.hpp so the remote
solvers accept a const cpu_optimization_problem_t reference (change parameters
to cpu_optimization_problem_t<i_t, f_t> const& for both solve_lp_remote and
solve_mip_remote), then remove the const_cast in
cpu_optimization_problem_t::solve_lp_remote and
cpu_optimization_problem_t::solve_mip_remote and forward *this directly to
::cuopt::linear_programming::solve_lp_remote and
::cuopt::linear_programming::solve_mip_remote respectively.

Comment on lines +800 to +864
template <typename i_t, typename f_t>
void cpu_optimization_problem_t<i_t, f_t>::copy_objective_coefficients_to_host(f_t* output,
                                                                               i_t size) const
{
  // Already in host memory - just copy
  std::copy(c_.begin(), c_.begin() + size, output);
}

template <typename i_t, typename f_t>
void cpu_optimization_problem_t<i_t, f_t>::copy_constraint_matrix_to_host(
  f_t* values, i_t* indices, i_t* offsets, i_t num_values, i_t num_indices, i_t num_offsets) const
{
  // Already in host memory - just copy
  std::copy(A_.begin(), A_.begin() + num_values, values);
  std::copy(A_indices_.begin(), A_indices_.begin() + num_indices, indices);
  std::copy(A_offsets_.begin(), A_offsets_.begin() + num_offsets, offsets);
}

template <typename i_t, typename f_t>
void cpu_optimization_problem_t<i_t, f_t>::copy_row_types_to_host(char* output, i_t size) const
{
  std::copy(row_types_.begin(), row_types_.begin() + size, output);
}

template <typename i_t, typename f_t>
void cpu_optimization_problem_t<i_t, f_t>::copy_constraint_bounds_to_host(f_t* output,
                                                                          i_t size) const
{
  std::copy(b_.begin(), b_.begin() + size, output);
}

template <typename i_t, typename f_t>
void cpu_optimization_problem_t<i_t, f_t>::copy_constraint_lower_bounds_to_host(f_t* output,
                                                                                i_t size) const
{
  std::copy(constraint_lower_bounds_.begin(), constraint_lower_bounds_.begin() + size, output);
}

template <typename i_t, typename f_t>
void cpu_optimization_problem_t<i_t, f_t>::copy_constraint_upper_bounds_to_host(f_t* output,
                                                                                i_t size) const
{
  std::copy(constraint_upper_bounds_.begin(), constraint_upper_bounds_.begin() + size, output);
}

template <typename i_t, typename f_t>
void cpu_optimization_problem_t<i_t, f_t>::copy_variable_lower_bounds_to_host(f_t* output,
                                                                              i_t size) const
{
  std::copy(variable_lower_bounds_.begin(), variable_lower_bounds_.begin() + size, output);
}

template <typename i_t, typename f_t>
void cpu_optimization_problem_t<i_t, f_t>::copy_variable_upper_bounds_to_host(f_t* output,
                                                                              i_t size) const
{
  std::copy(variable_upper_bounds_.begin(), variable_upper_bounds_.begin() + size, output);
}

template <typename i_t, typename f_t>
void cpu_optimization_problem_t<i_t, f_t>::copy_variable_types_to_host(var_t* output,
                                                                       i_t size) const
{
  std::copy(variable_types_.begin(), variable_types_.begin() + size, output);
}

⚠️ Potential issue | 🟠 Major

copy_*_to_host functions lack bounds checking on size parameter.

All nine copy_*_to_host methods use std::copy(vec_.begin(), vec_.begin() + size, output) without verifying that size <= vec_.size(). If a caller passes a size larger than the internal vector, this reads past the end of the vector — undefined behavior.

Since these are C API support functions (called from external C code), defensive validation is especially important.

🛡️ Example fix for one method (apply pattern to all)
 template <typename i_t, typename f_t>
 void cpu_optimization_problem_t<i_t, f_t>::copy_objective_coefficients_to_host(f_t* output,
                                                                                i_t size) const
 {
-  // Already in host memory - just copy
-  std::copy(c_.begin(), c_.begin() + size, output);
+  cuopt_expects(size <= static_cast<i_t>(c_.size()),
+                "copy_objective_coefficients_to_host: size exceeds data size");
+  std::copy(c_.begin(), c_.begin() + size, output);
 }

As per coding guidelines, "Validate input sanitization to prevent buffer overflows and resource exhaustion attacks."

🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/cpu_optimization_problem.cpp` around lines 800 -
864, All nine copy_*_to_host methods (copy_objective_coefficients_to_host,
copy_constraint_matrix_to_host, copy_row_types_to_host,
copy_constraint_bounds_to_host, copy_constraint_lower_bounds_to_host,
copy_constraint_upper_bounds_to_host, copy_variable_lower_bounds_to_host,
copy_variable_upper_bounds_to_host, copy_variable_types_to_host) must validate
inputs before calling std::copy: check that the provided size parameter is <=
the corresponding internal container's size (e.g., c_, A_, A_indices_,
A_offsets_, row_types_, b_, constraint_lower_bounds_, constraint_upper_bounds_,
variable_lower_bounds_, variable_upper_bounds_, variable_types_) and that output
pointers are non-null; if the check fails, return an error/throw a clear
exception (e.g., std::invalid_argument or std::out_of_range) or handle it
consistently with project error handling instead of performing the copy, to
prevent out-of-bounds reads.

Comment on lines +359 to +402
template <typename i_t, typename f_t>
std::unique_ptr<mip_solution_interface_t<i_t, f_t>> solve_mip(
  optimization_problem_interface_t<i_t, f_t>* problem_interface,
  mip_solver_settings_t<i_t, f_t> const& settings)
{
  // Check if remote execution is enabled
  if (is_remote_execution_enabled()) {
    CUOPT_LOG_INFO("Remote MIP solve requested");
    return problem_interface->solve_mip_remote(settings);
  } else {
    // Local execution - convert to optimization_problem_t and call original solve_mip
    CUOPT_LOG_INFO("Local MIP solve");

    // Check if this is a CPU problem (test mode: CUOPT_USE_CPU_MEM_FOR_LOCAL=true)
    auto* cpu_prob = dynamic_cast<cpu_optimization_problem_t<i_t, f_t>*>(problem_interface);
    if (cpu_prob != nullptr) {
      CUOPT_LOG_INFO("Test mode: Converting CPU problem to GPU for local MIP solve");

      // Create CUDA resources for the conversion
      rmm::cuda_stream stream;
      raft::handle_t handle(stream);

      // Set the handle on the CPU problem so it can create GPU resources
      cpu_prob->set_handle(&handle);

      // Convert CPU problem to GPU problem
      auto op_problem = cpu_prob->to_optimization_problem();

      // Solve on GPU
      auto gpu_solution = solve_mip<i_t, f_t>(op_problem, settings);

      // Wrap in GPU solution interface and convert to CPU solution
      std::cerr << "Test mode: Converting GPU solution back to CPU solution" << std::endl;
      gpu_mip_solution_t<i_t, f_t> gpu_sol_interface(std::move(gpu_solution));
      return gpu_sol_interface.to_cpu_solution();
    }

    auto op_problem = problem_interface->to_optimization_problem();
    auto gpu_solution = solve_mip<i_t, f_t>(op_problem, settings);

    // Wrap GPU solution in interface and return
    return std::make_unique<gpu_mip_solution_t<i_t, f_t>>(std::move(gpu_solution));
  }
}

⚠️ Potential issue | 🟠 Major

Missing error handling in the new solve_mip overload.

The existing solve_mip(optimization_problem_t&, ...) overload at Lines 183-341 has comprehensive try-catch for cuopt::logic_error and std::bad_alloc. This new interface-based overload has no exception handling. If to_optimization_problem(), the recursive solve_mip call, or to_cpu_solution() throws, the exception will propagate uncaught and may crash the caller (especially from the C API or Python bindings).

🛠️ Proposed fix (wrap in try-catch)
 template <typename i_t, typename f_t>
 std::unique_ptr<mip_solution_interface_t<i_t, f_t>> solve_mip(
   optimization_problem_interface_t<i_t, f_t>* problem_interface,
   mip_solver_settings_t<i_t, f_t> const& settings)
 {
+  try {
   // Check if remote execution is enabled
   if (is_remote_execution_enabled()) {
     ...
   }
+  } catch (const cuopt::logic_error& e) {
+    CUOPT_LOG_ERROR("Error in solve_mip (interface): %s", e.what());
+    // Return an error solution via the interface
+    auto error_solution = mip_solution_t<i_t, f_t>{e, rmm::cuda_stream_per_thread};
+    return std::make_unique<gpu_mip_solution_t<i_t, f_t>>(std::move(error_solution));
+  } catch (const std::exception& e) {
+    CUOPT_LOG_ERROR("Error in solve_mip (interface): %s", e.what());
+    auto error = cuopt::logic_error(e.what(), cuopt::error_type_t::RuntimeError);
+    auto error_solution = mip_solution_t<i_t, f_t>{error, rmm::cuda_stream_per_thread};
+    return std::make_unique<gpu_mip_solution_t<i_t, f_t>>(std::move(error_solution));
+  }
 }

As per coding guidelines: "Verify error propagation from CUDA to user-facing APIs is complete; ensure CUDA errors are caught and mapped to meaningful user error codes."

🤖 Prompt for AI Agents
In `@cpp/src/mip/solve.cu` around lines 359 - 402, Wrap the body of the
interface-based solve_mip (template<typename i_t, typename f_t>
std::unique_ptr<mip_solution_interface_t<i_t,f_t>> solve_mip(...)) in the same
try-catch used by the other overload: catch cuopt::logic_error and
std::bad_alloc (and optionally std::exception) and convert/log them to the
appropriate user-facing error mapping/return (preserve existing mapping behavior
used by the optimization_problem_t overload). Ensure the try block encloses
calls to problem_interface->to_optimization_problem(), the recursive
solve_mip(op_problem, settings) call, gpu_mip_solution_t<i_t,f_t> construction,
and cpu conversion via to_cpu_solution(), and rethrow or translate exceptions
consistent with the project's error-handling conventions.

auto gpu_solution = solve_mip<i_t, f_t>(op_problem, settings);

// Wrap in GPU solution interface and convert to CPU solution
std::cerr << "Test mode: Converting GPU solution back to CPU solution" << std::endl;

🛠️ Refactor suggestion | 🟠 Major

Replace std::cerr with CUOPT_LOG_INFO.

Line 391 uses std::cerr for debug output while the rest of this function (and the file) uses CUOPT_LOG_INFO. This inconsistency breaks structured logging and observability.

🛠️ Proposed fix
-      std::cerr << "Test mode: Converting GPU solution back to CPU solution" << std::endl;
+      CUOPT_LOG_INFO("Test mode: Converting GPU solution back to CPU solution");
🤖 Prompt for AI Agents
In `@cpp/src/mip/solve.cu` at line 391, Replace the direct std::cerr debug print
that emits "Test mode: Converting GPU solution back to CPU solution" with the
project's structured logger call CUOPT_LOG_INFO using the same message; locate
the exact occurrence of std::cerr << "Test mode: Converting GPU solution back to
CPU solution" << std::endl; in cpp/src/mip/solve.cu and swap it for
CUOPT_LOG_INFO("Test mode: Converting GPU solution back to CPU solution"); (omit
std::endl and any surrounding iostream includes if no longer needed).

Comment on lines +1561 to +1562
dual_solution = (cuopt_float_t*)malloc(num_constraints * sizeof(cuopt_float_t));
reduced_costs = (cuopt_float_t*)malloc(num_variables * sizeof(cuopt_float_t));

⚠️ Potential issue | 🟡 Minor

Missing NULL checks for malloc results.

dual_solution and reduced_costs are allocated without checking for NULL. If the MPS file has many constraints/variables, these allocations could fail, leading to NULL being passed to cuOptGetDualSolution / cuOptGetReducedCosts.

🛠️ Proposed fix
   dual_solution = (cuopt_float_t*)malloc(num_constraints * sizeof(cuopt_float_t));
   reduced_costs = (cuopt_float_t*)malloc(num_variables * sizeof(cuopt_float_t));
+  if ((dual_solution == NULL && num_constraints > 0) ||
+      (reduced_costs == NULL && num_variables > 0)) {
+    printf("Error allocating buffers\n");
+    status = -1;
+    goto DONE;
+  }
🤖 Prompt for AI Agents
In `@cpp/tests/linear_programming/c_api_tests/c_api_test.c` around lines 1561 -
1562, The allocations for dual_solution and reduced_costs must be checked for
NULL after malloc to avoid passing invalid pointers to cuOptGetDualSolution and
cuOptGetReducedCosts; after the lines that allocate dual_solution and
reduced_costs, add NULL checks that log an error (or set an errno-like status),
free any previously allocated resources, and return/exit with a non-zero status
if either allocation fails. Ensure you reference and validate the pointers named
dual_solution and reduced_costs before any calls to cuOptGetDualSolution or
cuOptGetReducedCosts, and perform appropriate cleanup of other related buffers
on error.

# Extract warm start data if available (check if buffers are non-empty)
if not is_batch:
    # Check if warm start data exists by checking buffer size
    if lp_ptr.current_primal_solution_.get()[0].size() > 0:

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

rg -n "current_primal_solution_" --type=cpp -C 3 cpp/include/cuopt/linear_programming/utilities/cython_types.hpp cpp/include/cuopt/linear_programming/utilities/cython_solve.hpp

Repository: NVIDIA/cuopt

Length of output: 1684


🏁 Script executed:

# Also check the actual pyx file for context around line 404
rg -n "current_primal_solution_" --type=pyx -C 5 python/cuopt/cuopt/linear_programming/solver/solver_wrapper.pyx

Repository: NVIDIA/cuopt

Length of output: 83


🏁 Script executed:

# Search for where current_primal_solution_ is assigned or allocated in the C++ code
rg -n "current_primal_solution_\s*=" --type=cpp -B2 -A2 cpp/src/linear_programming/

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Find the solver_wrapper.pyx file
fd "solver_wrapper" --type f

Repository: NVIDIA/cuopt

Length of output: 119


🏁 Script executed:

# Search for current_primal_solution_ assignments in all C++ files
rg -n "current_primal_solution_\s*=" --type=cpp -B 2 -A 2

Repository: NVIDIA/cuopt

Length of output: 760


🏁 Script executed:

# Check the structure definitions more carefully - look at the full structs
cat -n cpp/include/cuopt/linear_programming/utilities/cython_types.hpp | head -100

Repository: NVIDIA/cuopt

Length of output: 4120


🏁 Script executed:

cat -n python/cuopt/cuopt/linear_programming/solver/solver_wrapper.pyx | sed -n '390,420p'

Repository: NVIDIA/cuopt

Length of output: 2239


🏁 Script executed:

# Also look for where this is being used and if there are null checks before it
rg -n "current_primal_solution_" python/cuopt/cuopt/linear_programming/solver/solver_wrapper.pyx -B5 -A5

Repository: NVIDIA/cuopt

Length of output: 1721


🏁 Script executed:

# Check if there's any initialization that guarantees non-null unique_ptr
rg -n "linear_programming_ret_t" --type=cpp -B5 -A5 | grep -A10 "current_primal_solution_"

Repository: NVIDIA/cuopt

Length of output: 1165


🏁 Script executed:

# Look for the constructor or initialization of linear_programming_ret_t
rg -n "linear_programming_ret_t\s*{" --type=cpp -A20

Repository: NVIDIA/cuopt

Length of output: 175


🏁 Script executed:

# Look for where linear_programming_ret_t is initialized/populated
rg -n "linear_programming_ret_t" --type=cpp -B2 -A10 | head -80

Repository: NVIDIA/cuopt

Length of output: 8879


🏁 Script executed:

# Check if unique_ptr members are always initialized with make_unique or can remain null
rg -n "make_unique.*device_buffer\|std::move.*solution" --type=cpp | grep -i "primal\|dual"

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Look at how PDLP warmstart data is set - is it optional?
rg -n "pdlp_warm_start_data_" --type=cpp -B3 -A3

Repository: NVIDIA/cuopt

Length of output: 38738


Add null-pointer check before dereferencing warmstart buffer.

Line 404 dereferences lp_ptr.current_primal_solution_ (a std::unique_ptr<rmm::device_buffer>) via .get()[0].size() without first verifying the pointer is non-null. If the unique_ptr is null, this causes undefined behavior.

Use the safe pattern from the CPU path (line 526) or add an explicit null check:

if lp_ptr.current_primal_solution_ and lp_ptr.current_primal_solution_.get()[0].size() > 0:
🤖 Prompt for AI Agents
In `@python/cuopt/cuopt/linear_programming/solver/solver_wrapper.pyx` at line 404,
The code dereferences lp_ptr.current_primal_solution_ without verifying it's
non-null; update the condition in the block around solver_wrapper.pyx (where
lp_ptr.current_primal_solution_ is checked) to first ensure
lp_ptr.current_primal_solution_ is truthy before accessing .get()[0].size(),
following the safe pattern used on the CPU path (see the CPU path check near the
other branch) so the condition becomes a null-check plus size-check on
lp_ptr.current_primal_solution_ to avoid undefined behavior.

Comment on lines +1 to +4
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES.
# SPDX-License-Identifier: Apache-2.0
# All rights reserved.
# SPDX-License-Identifier: Apache-2.0

⚠️ Potential issue | 🟡 Minor

Duplicate SPDX-License-Identifier header.

Lines 2 and 4 both declare SPDX-License-Identifier: Apache-2.0. Remove the duplicate.

🐛 Proposed fix
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES.
-# SPDX-License-Identifier: Apache-2.0
 # All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
🤖 Prompt for AI Agents
In `@python/cuopt/cuopt/tests/linear_programming/test_cpu_only_execution.py`
around lines 1 - 4, Remove the duplicate SPDX license header by deleting one of
the repeated "SPDX-License-Identifier: Apache-2.0" lines (there are two
identical lines in the file header); ensure only a single
"SPDX-License-Identifier: Apache-2.0" remains at the top of the file alongside
the existing copyright line.
