Slow start of SCM #9864
BerryOsterlund started this conversation in General
Replies: 0 comments
Hi all
I'm having problems with a slow restart of SCM. This is a running production cluster with HA SCM and OM.
The SCM on one node was restarted this morning. I'm doing a full rolling restart of the entire cluster right now, and this is the first service to restart. SCM starts up and I have access to the WebUI, but after that nothing more is written to the logs. The RPC server port for the DNs isn't up yet, so the DNs can't register. It has been sitting like this for almost 2 hours now with no progress.
I did the exact same thing earlier this week in a Dev environment, and one of the three SCMs there had the same issue. It sat idle in the same state for about 25 minutes, then the IPC port was initialized and everything started to work. I didn't do anything except wait in that case.
What is the SCM trying to do in this situation, between the point where the HTTPS port is listening and the point where the RPC port starts to listen? And more importantly, what can I do to resolve this and get the SCM up again?
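To put a number on the gap between the web port and the RPC port coming up, a small TCP probe along these lines could be run against the restarting SCM. This is only a sketch: the function names, host name, and port number are placeholders (assumptions), not values taken from this cluster or from Ozone's configuration, and a successful TCP connect only shows that the port is listening, not that the service behind it is healthy.

```python
import socket
import time


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def wait_for_port(host, port, deadline_s=60.0, interval_s=5.0):
    """Poll host:port until it accepts connections or deadline_s elapses.

    Returns the number of seconds waited if the port opened,
    or None if the deadline passed first.
    """
    start = time.monotonic()
    while True:
        if port_is_open(host, port):
            return time.monotonic() - start
        if time.monotonic() - start >= deadline_s:
            return None
        time.sleep(interval_s)
```

For example, `wait_for_port("scm1.example.internal", 9861, deadline_s=7200)` would report how long the (hypothetical) host took to open that port, assuming 9861 is the DN-facing RPC port configured on the cluster.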
Btw, is this the correct channel/place to ask these types of questions, or would it be better suited to, for example, Slack?