At https://github.com/google/learned_optimization/blob/a49615fd9694f2a3088a8d5a60d6021936bb94f8/docs/notebooks/Part1_Introduction.ipynb#L779, `output_params` is used where `output_momentums` should be.
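To illustrate the expected behavior, here is a minimal pure-Python sketch of a momentum step that keeps the two outputs separate. It assumes the cell at #L779 builds a new optimizer state from the updated parameters and the updated momenta. `MomentumState`, `momentum_update`, `lr` and `decay` are hypothetical names chosen for this example and are not the notebook's actual code.

```python
from typing import List, NamedTuple


class MomentumState(NamedTuple):
    # Hypothetical stand-in for the notebook's optimizer state.
    params: List[float]
    momentums: List[float]


def momentum_update(state: MomentumState, grads: List[float],
                    lr: float = 0.1, decay: float = 0.9) -> MomentumState:
    # Updated momentum: exponential moving average of the gradients.
    output_momentums = [decay * m + (1.0 - decay) * g
                        for m, g in zip(state.momentums, grads)]
    # Updated parameters: step against the updated momentum.
    output_params = [p - lr * m
                     for p, m in zip(state.params, output_momentums)]
    # The reported bug corresponds to passing momentums=output_params here,
    # which would overwrite the momentum with the parameter values.
    return MomentumState(params=output_params, momentums=output_momentums)


state = MomentumState(params=[1.0], momentums=[0.0])
new_state = momentum_update(state, grads=[1.0])
print(new_state)
```

With the fix, the momentum after one step is about 0.1 and the parameter is about 0.99; with the bug, both fields would hold the parameter value.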