-
Notifications
You must be signed in to change notification settings - Fork 319
Optimize wholemolecules #1045
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Optimize wholemolecules #1045
Conversation
speedup 3%, but code is a mess
| // for(unsigned j=0; j<forcesForApply.size(); ++j) { | ||
| // forcesForApply[j]+=omp_f[j]; | ||
| // } |
Check notice
Code scanning / CodeQL
Commented-out code
| // if( nt>1 ) for(unsigned j=0; j<nder; ++j) omp_f[j] += ff*thisderiv[1+j]; | ||
| //else for(unsigned j=0; j<nder; ++j) forcesForApply[j] += ff*thisderiv[1+j]; |
Check notice
Code scanning / CodeQL
Commented-out code
| // for(const auto & a : atom_value_ind) { | ||
| // plumed_dbg_massert( ind<forcesToApply.size(), "problem setting forces in " + getLabel() ); | ||
| // std::size_t nn = a.first, kk = a.second; | ||
| // xpos[nn]->inputForce[kk] += forcesToApply[ind]; ind++; | ||
| // ypos[nn]->inputForce[kk] += forcesToApply[ind]; ind++; | ||
| // zpos[nn]->inputForce[kk] += forcesToApply[ind]; ind++; | ||
| // } |
Check notice
Code scanning / CodeQL
Commented-out code
| // for(const auto & a : atom_value_ind) { | ||
| // std::size_t nn = a.first, kk = a.second; | ||
| // positions[j][0] = xpos[nn]->data[kk]; | ||
| // positions[j][1] = ypos[nn]->data[kk]; | ||
| // positions[j][2] = zpos[nn]->data[kk]; | ||
| // charges[j] = chargev[nn]->data[kk]; | ||
| // masses[j] = masv[nn]->data[kk]; | ||
| // j++; | ||
| // } |
Check notice
Code scanning / CodeQL
Commented-out code
| // for(const auto & a : atom_value_ind) { | ||
| // std::size_t nn = a.first, kk = a.second; | ||
| // xpos[nn]->inputForce[kk] += xf*xval->data[1+k] + yf*yval->data[1+k] + zf*zval->data[1+k]; k++; | ||
| // ypos[nn]->inputForce[kk] += xf*xval->data[1+k] + yf*yval->data[1+k] + zf*zval->data[1+k]; k++; | ||
| // zpos[nn]->inputForce[kk] += xf*xval->data[1+k] + yf*yval->data[1+k] + zf*zval->data[1+k]; k++; | ||
| // } |
Check notice
Code scanning / CodeQL
Commented-out code
| atom_value_ind_grouped.back().second.push_back(kk); | ||
| auto prev_nn=nn; | ||
| for(unsigned i=1; i<atom_value_ind.size(); i++) { | ||
| auto nn = atom_value_ind[i].first; |
Check notice
Code scanning / CodeQL
Declaration hides variable
| auto prev_nn=nn; | ||
| for(unsigned i=1; i<atom_value_ind.size(); i++) { | ||
| auto nn = atom_value_ind[i].first; | ||
| auto kk = atom_value_ind[i].second; |
Check notice
Code scanning / CodeQL
Declaration hides variable
| // } else { | ||
| // for(unsigned j=1; j<p_groups[i].size(); ++j) { | ||
| // Vector first=getGlobalPosition(p_roots[i][j-1]); | ||
| // Vector second=getGlobalPosition(p_groups[i][j]); | ||
| // second=first+pbcDistance(first,second); | ||
| // setGlobalPosition(p_groups[i][j], second ); | ||
| // } | ||
| // } |
Check notice
Code scanning / CodeQL
Commented-out code
| // if(addref) { | ||
| // first = refs[i]+pbcDistance(refs[i],first); | ||
| // setGlobalPosition( p_groups[i][0], first ); | ||
| // } | ||
| // if(!doemst) { |
Check notice
Code scanning / CodeQL
Commented-out code
|
And here results for the intel compiler on my workstation. Reference is current master (just pulled), then I time master + this optimization of wholemolecules. My input: 28% -> 26% overhead Carlo's input: no measurable overhead in both cases (<1%) |
|
Hello @GiovanniBussi What you have done here seems sensible. I don't think I can do it better. |
|
@gtribello I really don't like the way it's done, because it's intrusive in what's supposed to be "user code" (wholemolecules), with modifications that are difficult to understand. I also want to repeat the timings, because there is an interplay between all the optimizations we are doing. This one might be not so relevant, so I would keep it on hold |
@gtribello here I tried a trick similar to #1044 to optimize wholemolecules. Notice that I didn't tough the EMST stuff, so tests are expected to fail. Anyway, tests with simple wholemolecules should work.
On my usual input I get some further speedup, with overhead going from 13% to 10%.
On @carlocamilloni 's input see here it's even better. Before this commit, I get same performance as v2.9. After this commit I gain something like 5%.
Maybe @gtribello you want to have a look at this code and think if there's some other reasonable (and simpler) solution. Notice that these PR are all stacked onto each other, so you should only look at the last commit (but the performance is measured including the previous PRs)
With my input:
With Carlo's input: