Add experimental PW LOBPCG eigensolver by amadeus725 · Pull Request #7406 · deepmodeling/abacus-develop

amadeus725 · 2026-05-30T13:24:02Z

Background

During recent code reading and testing, we found that the current PW bpcg
solver path does not appear to expose an explicit generalized eigenproblem
interface for the overlap matrix S. Further comparison on USPP cases showed
that bpcg can produce results noticeably different from cg and dav, where
generalized eigenproblem handling is required.

As a first step toward improving this area, this PR introduces an experimental
LOBPCG solver framework for PW standard eigenproblems.

Changes

Add DiagoLobpcg for CPU PW standard eigenproblems (S = I).
Add ks_solver = lobpcg support in the PW hsolver path.
Add input documentation for lobpcg.
Add SCF iteration print label LB.
Add unit tests for:
- Hermitian eigenvalue correctness
- orthonormality and residual checks
- generalized-interface not-implemented behavior

Current Limitations

This PR intentionally does not claim USPP/generalized eigenproblem support yet.

Currently supported:

PW basis
CPU
std::complex<double>
norm-conserving / standard eigenproblem path (S = I)

Currently not supported:

USPP / generalized eigenproblem (S != I)
GPU
float instantiation
band-parallel LOBPCG

Unsupported generalized usage exits explicitly with WARNING_QUIT.

Tests

cmake --build . --target hsolver MODULE_HSOLVER_lobpcg abacus_basic_para -j2
env I_MPI_FABRICS=shm ctest -R MODULE_HSOLVER_lobpcg --output-on-failure
git diff --check

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

mohanchen · 2026-05-31T00:52:06Z

Nice try. You can keep refining the implementation and use test cases to verify its correctness and performance.

Cstandardlib · 2026-05-31T02:17:28Z

+              Real* eigenvalue_in,
+              const std::vector<double>& ethr_band);
+
+    /// USPP (S≠I) diagonalization. NOT IMPLEMENTED — aborts.


Actually standard eigenproblem is just a special case(S=I) for generalized problem. Only one diag interface is necessary for both, and the latter is adequate.

Thanks for the suggestion. I will narrow down the interface.

Cstandardlib · 2026-05-31T02:36:06Z

+              Real* eigenvalue_in,
+              const std::vector<double>& ethr_band);
+
+    /// USPP (S≠I) diagonalization. NOT IMPLEMENTED — aborts.


The standard eigenvalue problem is just a special case (where S is I) for the generalized problem. Thus one diag interface (the latter) is adequate.

Cstandardlib · 2026-05-31T02:41:49Z

+namespace hsolver {
+
+// ============================================================================
+// Band-major explicit-loop helpers (CPU, n_band_l == n_band required)


Most loops can be achieved with a BLAS 3 op

amadeus725 · 2026-05-31T07:32:19Z

Updated LOBPCG to expose only the generalized diag(hpsi, spsi, ...) interface. The current implementation still only accepts S = I; non-identity overlap exits explicitly. Also replaced several explicit dense matrix operations with existing PGemmCN / PLinearTransform helpers.

amadeus725 · 2026-06-06T07:31:52Z

Update: PW LOBPCG band-parallel generalized path

This update moves the experimental PW LOBPCG solver beyond the previous
serial/S=I prototype. The current version now has a 2-rank validated
band-parallel path, including the generalized S != I / USPP case.

What changed

The LOBPCG PW path now always enters through the generalized
diag(hpsi, spsi, ...) interface. The standard problem is treated as the
S = I special case, while USPP uses the same interface with an explicit
overlap operator.

Band parallelism has been wired through the PW solver and input/global-state
setup:

ks_solver = lobpcg can now keep bndpar > 1.
ks_run, local band counts, and initial wavefunction preparation are
updated for LOBPCG band-parallel execution.
Initial PW wavefunctions are prepared globally when needed and then
scattered across band groups.

For the generalized band-parallel algorithm, the distributed subspace now
tracks the full set of objects needed by LOBPCG:

trial/search blocks: X, Z, P
Hamiltonian products: HX, HZ, HP
overlap products: SX, SZ, SP
projected subspace matrices: H_sub, S_sub
global coefficient rotation back to distributed wavefunctions

Stability changes

The generalized band-parallel path uses rank-compression when the projected
overlap matrix S_sub is ill-conditioned. This is important for USPP cases:
direct generalized diagonalization of the raw subspace can be unstable when
the local LOBPCG subspace contains nearly dependent directions.

The implementation also keeps explicit diagnostics for failed subspace solves,
non-finite vectors, and unconverged LOBPCG inner iterations. These diagnostics
include k-point, local/global band count, basis size, and USPP context.

Performance-related changes

Two low-risk optimizations are included:

Build Hermitian projected subspace blocks from the upper triangle and mirror
the conjugate block. This reduces PGemmCN calls in the tested
band-parallel cases.
Avoid redundant H*pdir / S*pdir recomputation after the generalized
band-parallel tail update.

There is also a small shared PLinearTransform::act cleanup: an unused
B_tmp copy in the MPI path was removed. The existing
MODULE_HSOLVER_para_linear_trans test was run after this change.

Validation

Unit and integration-style tests run locally:

cmake --build build --target MODULE_HSOLVER_lobpcg -j2
cmake --build build --target MODULE_HSOLVER_linear_trans MODULE_HSOLVER_lobpcg -j2
cmake --build build --target abacus_basic_para -j2
cmake --build build --target MODULE_IO_system_variable_test -j2

env I_MPI_FABRICS=shm ctest --test-dir build -R '^MODULE_HSOLVER_lobpcg$' --output-on-failure
env I_MPI_FABRICS=shm ctest --test-dir build -R '^MODULE_HSOLVER_lobpcg_parallel$' --output-on-failure
env I_MPI_FABRICS=shm ctest --test-dir build -R '^MODULE_HSOLVER_para_linear_trans$' --output-on-failure
ctest --test-dir build -R '^MODULE_IO_system_variable_test$' --output-on-failure
git diff --check

Real-case checks

The following 2-rank band-parallel USPP cases were rerun after the latest
cleanup:

Case	Final energy	Notes
NaCl USPP bp2	`-1220.8380554789837333 eV`	`PGemmCN multiply = 82`
Fe USPP tight bp2	`-2860.7186058101456183 eV`	`PGemmCN multiply = 160`, `PLinearTransform act = 272`

Previously validated reference matrix:

Case family	Parallel settings	Result
Si NC	bp1 / bp2 / bp4	energies agree to roundoff
NaCl USPP	bp1 / bp2 / bp4	energies agree to roundoff
Fe USPP tight	bp1 / bp2 / bp4	energies agree to roundoff

Current limitations

This is still an experimental PW LOBPCG implementation. The currently
validated scope is:

PW basis
CPU
std::complex<double>
standard S = I and generalized S != I / USPP paths
band-parallel validation on 2 ranks, with prior bp4 reference checks

Still not covered:

GPU
float instantiation
systematic bp4/bp8 stress testing on a larger machine
deeper PLinearTransform call-count reduction

The next performance step would likely require a batched or multi-RHS
PLinearTransform interface. That would touch lower-level band-parallel
communication, so I kept it out of this update.

Copilot

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

source/source_psi/psi_prepare.cpp:1

In the lobpcg_band_parallel case, scatter_source is taken from bp_global_evc_device->get_pointer(). When PARAM.inp.device == "gpu", that pointer is very likely device memory; passing it into Parallel_Common::send_data(...) (and also into syncmem_complex_op() as a source) will break unless the codebase guarantees CUDA-aware MPI and that send_data supports device buffers. A safer approach is to always use a host-accessible pointer for MPI sends (e.g., use bp_global_evc_cpu->get_pointer() / stage a host buffer and synchronize device→host explicitly before the sends), and keep device pointers confined to device-side operations."

#include "psi_prepare.h"

amadeus725 · 2026-06-06T07:54:55Z

+void DiagoLobpcg<T, Device>::validate_ethr_band(const std::vector<double>& ethr_band) const
+{
+    if (ethr_band.size() < static_cast<size_t>(this->n_band_l)) {
+        std::ostringstream oss;
+        oss << "LOBPCG ethr_band size mismatch: size=" << ethr_band.size()
+            << ", required=" << this->n_band_l;
+        if (!this->diag_context.empty()) {
+            oss << ", context={" << this->diag_context << "}";
+        }
+        throw std::invalid_argument(oss.str());
+    }
+}
+
+template <typename T, typename Device>
+bool DiagoLobpcg<T, Device>::test_error(
+    const ct::Tensor& err_in, const std::vector<double>& ethr_band)
+{
+    Real* _err_st = err_in.data<Real>();
+    bool  not_conv = false;
+    std::vector<Real> tmp_cpu;
+    if (err_in.device_type() == ct::DeviceType::GpuDevice) {
+        tmp_cpu.resize(this->n_band_l);
+        _err_st = tmp_cpu.data();
+        syncmem_var_d2h_op()(_err_st, err_in.data<Real>(), this->n_band_l);
+    }
+    for (int ii = 0; ii < this->n_band_l; ii++)
+        if (_err_st[ii] > ethr_band[ii]) not_conv = true;
+#ifdef __MPI
+    MPI_Allreduce(MPI_IN_PLACE, &not_conv, 1, MPI_C_BOOL, MPI_LOR, BP_WORLD);
+#endif
+    return not_conv;
+}


amadeus725 · 2026-06-06T07:54:19Z

+#ifdef __MPI
+    int nproc_in_pool, kpar = 1, mypool, rank_in_pool;
+    setupmpi(argc, argv, nproc, myrank);
+    divide_pools(nproc, myrank, nproc_in_pool, kpar, mypool, rank_in_pool);
+    const bool use_band_parallel_world = std::getenv("ABACUS_LOBPCG_TEST_BNDPAR") != nullptr;
+    MPI_Comm_split(MPI_COMM_WORLD, use_band_parallel_world ? 0 : myrank, 0, &BP_WORLD);
+    if (use_band_parallel_world)
+    {
+        GlobalV::MY_BNDGROUP = myrank;
+        GlobalV::NPROC_IN_BNDGROUP = nproc;
+        MPI_Comm_free(&POOL_WORLD);
+        MPI_Comm_split(MPI_COMM_WORLD, myrank, 0, &POOL_WORLD);
+    }
+    GlobalV::NPROC_IN_POOL = nproc;
+#else
+    MPI_Init(&argc, &argv);
+#endif


Add experimental PW LOBPCG eigensolver

578198f

Copilot AI review requested due to automatic review settings May 30, 2026 13:24

Fix hsolver test links for LOBPCG

45f7e0f

Copilot AI reviewed May 30, 2026

View reviewed changes

Merge branch 'develop' into feature/pw-lobpcg

bac1a01

mohanchen added the project_learning label May 31, 2026

Cstandardlib reviewed May 31, 2026

View reviewed changes

Refine LOBPCG interface and BLAS usage

01f1403

Enable PW LOBPCG band-parallel generalized path

7f01c2e

amadeus725 requested review from Cstandardlib and Copilot June 6, 2026 07:41

Copilot AI reviewed Jun 6, 2026

View reviewed changes

amadeus725 added 2 commits June 6, 2026 15:53

Clarify LOBPCG local threshold contract

3ea5ea8

Fix LOBPCG band-parallel finite checks

f34334d

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add experimental PW LOBPCG eigensolver#7406

Add experimental PW LOBPCG eigensolver#7406
amadeus725 wants to merge 7 commits into
deepmodeling:developfrom
amadeus725:feature/pw-lobpcg

amadeus725 commented May 30, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

mohanchen commented May 31, 2026

Uh oh!

Cstandardlib May 31, 2026

Uh oh!

amadeus725 May 31, 2026

Uh oh!

Cstandardlib May 31, 2026

Uh oh!

Cstandardlib May 31, 2026

Uh oh!

amadeus725 commented May 31, 2026

Uh oh!

amadeus725 commented Jun 6, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

amadeus725 Jun 6, 2026

Uh oh!

amadeus725 Jun 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

amadeus725 commented May 30, 2026

Background

Changes

Current Limitations

Tests

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

mohanchen commented May 31, 2026

Uh oh!

Cstandardlib May 31, 2026

Choose a reason for hiding this comment

Uh oh!

amadeus725 May 31, 2026

Choose a reason for hiding this comment

Uh oh!

Cstandardlib May 31, 2026

Choose a reason for hiding this comment

Uh oh!

Cstandardlib May 31, 2026

Choose a reason for hiding this comment

Uh oh!

amadeus725 commented May 31, 2026

Uh oh!

amadeus725 commented Jun 6, 2026

What changed

Stability changes

Performance-related changes

Validation

Real-case checks

Current limitations

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

amadeus725 Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

amadeus725 Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants