Tolerate linear solver failure in non-linear schemes#6825
Tolerate linear solver failure in non-linear schemes#6825lhy11009 wants to merge 1 commit intogeodynamics:mainfrom
Conversation
|
Here I’m experimenting with an option that allows the nonlinear solver to continue to the next iteration even if the cheap Stokes solve fails to reach its tolerance and skip the expensive one. From a strict numerical standpoint this isn’t ideal, but in practice I’ve found that a “failed” Stokes iteration can still move the solution into a better state for the next nonlinear iteration. In my runs, I use At this stage, I mainly want to get feedback on whether this makes sense to you all, and whether there are already existing mechanisms in ASPECT that achieve something similar. Depending on that, we can then decide whether it’s worth formalizing something along these lines. |
|
Hy Haoyuan, I think this is a very interesting idea. Is this mostly useful if you have strongly nonlinear material model properties that may lead to difficult / unrealistic results in the first nonlinear iterations? Conceptually this is similar to choosing a coarse linear solver tolerance inside the nonlinear solver under the assumption that the nonlinear solver will iterate until its tolerance is reached. Something like this (changing the linear solver tolerance between nonlinear iterations) is possible in the Newton solver, where the linear solver doesnt need to reach a very tight tolerance anyway in order to improve the solution. However, your approach would work for any nonlinear solver. We can check others opinions -- I would be curious to hear what @bangerth and @tjhei think about this -- but I wouldnt be opposed to adding a parameter like
What do you think? |
|
Hey Rene, Yes—this is very useful in cases with strongly nonlinear material behavior during the first nonlinear iterations. In practice, it saves a significant amount of wall-clock time because the cheap iterations are much faster (even though I haven’t quantified this yet). I’ve used this scheme across both 2-D and 3-D models, as well as setups with slightly different configurations (e.g., resolution changes), and it has worked consistently well. I think all of your suggestions make sense. In particular, it would be important to consider how this new parameter relates to “Maximum number of expensive Stokes solver steps” and “Nonlinear solver failure strategy.” Conceptually, I also agree that the final linear solve before exiting the nonlinear solver loop should still converge. I’d be curious to hear what others think as well. |
|
I discussed this with @YiminJin today, and he pointed out the similarity between my approach and the E–W method. However, in the E–W method, the tolerance of the linear solver is gradually tightened as the non-linear solver converges. In contrast, my approach effectively uses no explicit tolerance for the linear solver; instead, it relies on a fixed number of cheap Stokes iterations. |
|
I am not opposed to a strategy like the one @gassmoeller suggests whereby we tolerate the linear solver failing. I find that more appealing than futzing around with the tolerance or solver parameters of the linear solver because it would make sure that what you implement remains localized in the nonlinear solver, rather than spilling into the linear solver code. At the end of the day, you are exactly right: If you understand the E-W method as a way to say "We don't actually care about exactly solving the linear system while we're still far from the nonlinear solution", then choosing cheaper options for the linear solver or ignoring its non-convergence makes sense too. I get where @gassmoeller is coming from, but I'd even accept if that happens in the last nonlinear iteration -- all we really care about, and should care about, is the nonlinear residual. From a code perspective, this is about the nonlinear solver, and so it would be nice if we could contain the changes to that part of the code base. |
|
Both of you pointed out that what we should aim for is to “tolerate linear solver failure in non-linear schemes.” Based on this, I have changed the title of this PR. @gassmoeller suggests an implementation of storing and checking the Stokes solution from the last step as a safety check. This would make particular sense when the non-linear and linear solver tolerances are set to the same value. In several of my examples, the linear solver fails during the first few non-linear iterations but does converge in the final iteration, just before the overall non-linear solver converges. Which is kind of cool. Looking forward to the discussion next Monday. |
49f268e to
26f828c
Compare
|
I tried to follow the lines of discussion we had, and now the implementation works partially. I also added a test using this new strategy. In this test, the nonlinear solver converges in one step after ~40 iterations, despite the linear solver reporting a failure. However, I ran into an issue regarding the implementation. Ideally, we would only modify the nonlinear solver so that it continues the iteration even if the linear solver fails. But I realized that this does not work in practice: if we only change how the nonlinear solver handles exceptions, the linear solver will still throw the exception before updating the residual, so the nonlinear solver cannot proceed correctly. Instead, in this example I went to the lowest level of the linear solver, where the exception is first thrown, and modified it so that it continues from there. This might not be the best design choice, so I am very open to suggestions on how this should be handled. |
gassmoeller
left a comment
There was a problem hiding this comment.
I agree with your assessment, we need to indeed do this at the level of the linear solver. I think your general approach is correct, but I have some comments to the code (see below).
Also: could you model this approach after what we do with nonlinear_failure_strategy? I.e. use an enum type to allow more possible behaviors in the future (e.g. we could extend this at some point to cut the timestep if the linear solver fails).
Also please introduce a variable similar to nonlinear_solver_failures that keeps track of how often the linear solver failed, and then output that number at the end of the simulation like done in simulator/core.cc:1280.
7ef9e8b to
a7c84dc
Compare
|
@gassmoeller Done with replying to your comments. I also left a couple of open comments and questions. |
gassmoeller
left a comment
There was a problem hiding this comment.
Yes, I think this is the right behavior.
I would like to ask for the following changes, just to improve the safety of the functionality:
-
Add a variable
linear_solver_failuresin simulator.h like the variablenonlinear_solver_failures. Search for all places wherenonlinear_solver_failuresis used and letlinear_solver_failuresbehave the same way. Add a warning at the end of the model run (like in core.cc:2329) if any linear solver failures occured and report how many. This will document for the user how many times the linear solver failed (and hopefully push them to investigate if too many failures happened). Also I noticed thatnonlinear_solver_failuresis not serialized into the checkpoint for restarts. We should do that (and do the same forlinear_solver_failures). Can you add both to checkpoint_restart.cc:609? -
Add a function to parameters.h similar to
is_defect_correctionbut name itsolve_Stokes_iterativelyand check if any iterative Stokes solver scheme is selected. Then add something like the following in parameters.cc after reading the linear solver failure strategy:
if (linear_solver_failure_strategy == continue)
AssertThrow(Parameters<dim>::solve_Stokes_iteratively(nonlinear_solver) == true, ExcMessage("You are not allowed to select the linear solver failure strategy 'continue' with a solver scheme that does not iterate the Stokes equations."));
This will not be quite as safe as I originally hoped for (checking if the last Stokes solver converged), but at least it will make sure no one can continue with a non-converged solution if the solver scheme does not at least try to iterate.
c7ae7db to
3a84b6d
Compare
3a84b6d to
13f7e4e
Compare
| // if the solver fails trigger the post stokes solver signal and throw an exception | ||
| catch (const std::exception &exc) | ||
| { | ||
| // ++linear_solver_failures; |
There was a problem hiding this comment.
This is related to counting the linear_solver_failure number. However, Adding this here ("GMG" solver) doesn't work because this function is not part of the simulator. The other location of the "AMG" solver doesn't have this problem.
I have addressed these points, except for the issue I commented on in the code. |
Pull Request Checklist. Please read and check each box with an X. Delete any part not applicable. Ask on the forum if you need help with any step.
This is a tentative PR that I opened to first discuss it with the maintainers. The code change is merely for demonstrating the idea. See the comments below.
Before your first pull request:
For all pull requests:
For new features/models or changes of existing features: