As noted in #43, @lee218llnl reported:
1. Occasionally I get hangs with STAT, particularly after running it multiple times. It appears to be in lmon__fe.cxx on line 4601 in a pthread_cond_timedwait. I don’t know if this is an actual effect or just a correlation, but it seems like if I subsequently attach TV to the job and detach TV, then I am able to attach again with STAT.
2. I have also seen hang-like behavior (looping) in cobo on cobo_connect_hostname. This also appears to happen if I aggressively attach/detach/attach STAT multiple times.
I suspect 1 is due to FIFO handling within `jsrun`, but I need a simple reproducer to confirm or rule this out.
I suspect 2 is due to a problem with the colocation service within `jsrun`, but again I need a simple reproducer to confirm or rule this out.
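
A harness for such a reproducer could be as small as the sketch below. It is only a starting point under assumptions: `count_hangs` is a hypothetical helper, the command it runs is a placeholder for whatever attach/detach sequence triggers the problem, and a run is counted as hung simply because it exceeded a timeout.

```python
import subprocess
import sys


def count_hangs(cmd, attempts=10, timeout_s=60.0):
    """Run cmd repeatedly and return the 1-based attempts that timed out.

    cmd is a placeholder argument list (hypothetical); substitute the
    real attach/detach command. A timeout is treated as a hang, which
    is an assumption: a slow but healthy run would also be counted.
    """
    hung = []
    for attempt in range(1, attempts + 1):
        try:
            # subprocess.run kills the child when the timeout expires.
            subprocess.run(
                cmd,
                timeout=timeout_s,
                check=False,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            hung.append(attempt)
    return hung
```

Calling `count_hangs([...], attempts=20)` with the real command filled in would show whether hangs cluster in later attempts, which would match the "after running it multiple times" observation. Only the direct child is killed on timeout, so any processes it leaves behind still need to be cleaned up separately.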