When we run kola in the pipeline we run many tests in a single run, but only 4 at a time (--parallel=4).
It appears, though, that some qemu processes are hanging around longer than expected. For example:
core@coreos-ppc64le-builder:~$ sudo ss -lutnp | grep qemu
udp UNCONN 0 0 0.0.0.0:32771 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=143))
udp UNCONN 0 0 0.0.0.0:33353 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=163))
udp UNCONN 0 0 0.0.0.0:33513 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=185))
udp UNCONN 0 0 0.0.0.0:35914 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=122))
udp UNCONN 0 0 0.0.0.0:36387 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=152))
udp UNCONN 0 0 0.0.0.0:36766 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=133))
udp UNCONN 0 0 0.0.0.0:36842 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=169))
udp UNCONN 0 0 0.0.0.0:36847 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=108))
udp UNCONN 0 0 0.0.0.0:37014 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=118))
udp UNCONN 0 0 0.0.0.0:37312 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=114))
udp UNCONN 0 0 0.0.0.0:37328 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=156))
udp UNCONN 0 0 0.0.0.0:37600 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=165))
udp UNCONN 0 0 0.0.0.0:37746 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=145))
udp UNCONN 0 0 0.0.0.0:37902 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=107))
udp UNCONN 0 0 0.0.0.0:38864 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=168))
udp UNCONN 0 0 0.0.0.0:38962 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=110))
udp UNCONN 0 0 0.0.0.0:40629 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=112))
udp UNCONN 0 0 0.0.0.0:41679 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=159))
udp UNCONN 0 0 0.0.0.0:41731 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=142))
udp UNCONN 0 0 0.0.0.0:42216 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=170))
udp UNCONN 0 0 0.0.0.0:43546 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=149))
udp UNCONN 0 0 0.0.0.0:44382 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=121))
udp UNCONN 0 0 0.0.0.0:44477 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=146))
udp UNCONN 0 0 0.0.0.0:44969 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=106))
udp UNCONN 0 0 0.0.0.0:45113 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=135))
udp UNCONN 0 0 0.0.0.0:45711 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=30))
udp UNCONN 0 0 0.0.0.0:45852 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=130))
udp UNCONN 0 0 0.0.0.0:46064 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=126))
udp UNCONN 0 0 0.0.0.0:47320 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=154))
udp UNCONN 0 0 0.0.0.0:47657 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=131))
udp UNCONN 0 0 0.0.0.0:48291 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=187))
udp UNCONN 0 0 0.0.0.0:48428 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=148))
udp UNCONN 0 0 0.0.0.0:50286 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=155))
udp UNCONN 0 0 0.0.0.0:50475 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=140))
udp UNCONN 0 0 0.0.0.0:50478 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=120))
udp UNCONN 0 0 0.0.0.0:51084 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=164))
udp UNCONN 0 0 0.0.0.0:51150 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=158))
udp UNCONN 0 0 0.0.0.0:51467 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=160))
udp UNCONN 0 0 0.0.0.0:51639 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=115))
udp UNCONN 0 0 0.0.0.0:51643 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=162))
udp UNCONN 0 0 0.0.0.0:51867 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=138))
udp UNCONN 0 0 0.0.0.0:51911 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=134))
udp UNCONN 0 0 0.0.0.0:52978 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=144))
udp UNCONN 0 0 0.0.0.0:53949 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=171))
udp UNCONN 0 0 0.0.0.0:54155 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=129))
udp UNCONN 0 0 0.0.0.0:54285 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=166))
udp UNCONN 0 0 0.0.0.0:54296 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=167))
udp UNCONN 0 0 0.0.0.0:54434 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=137))
udp UNCONN 0 0 0.0.0.0:55749 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=189))
udp UNCONN 0 0 0.0.0.0:55831 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=113))
udp UNCONN 0 0 0.0.0.0:55904 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=139))
udp UNCONN 0 0 0.0.0.0:56132 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=111))
udp UNCONN 0 0 0.0.0.0:56266 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=182))
udp UNCONN 0 0 0.0.0.0:56434 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=151))
udp UNCONN 0 0 0.0.0.0:56636 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=141))
udp UNCONN 0 0 0.0.0.0:56930 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=153))
udp UNCONN 0 0 0.0.0.0:57442 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=117))
udp UNCONN 0 0 0.0.0.0:57848 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=136))
udp UNCONN 0 0 0.0.0.0:57890 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=150))
udp UNCONN 0 0 0.0.0.0:58370 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=116))
udp UNCONN 0 0 0.0.0.0:58738 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=157))
udp UNCONN 0 0 0.0.0.0:59310 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=161))
udp UNCONN 0 0 0.0.0.0:59466 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=128))
udp UNCONN 0 0 0.0.0.0:60703 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=119))
udp UNCONN 0 0 *:37513 *:* users:(("qemu-system-ppc",pid=1975486,fd=127))
udp UNCONN 0 0 *:39929 *:* users:(("qemu-system-ppc",pid=1975486,fd=125))
udp UNCONN 0 0 *:46748 *:* users:(("qemu-system-ppc",pid=1975486,fd=124))
udp UNCONN 0 0 *:49099 *:* users:(("qemu-system-ppc",pid=1975486,fd=132))
udp UNCONN 0 0 *:50749 *:* users:(("qemu-system-ppc",pid=1975486,fd=123))
udp UNCONN 0 0 *:54827 *:* users:(("qemu-system-ppc",pid=1975486,fd=147))
tcp LISTEN 0 1 127.0.0.1:33359 0.0.0.0:* users:(("qemu-system-ppc",pid=1975455,fd=19))
tcp LISTEN 0 1 127.0.0.1:33229 0.0.0.0:* users:(("qemu-system-ppc",pid=1975987,fd=19))
tcp LISTEN 0 1 127.0.0.1:38673 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=19))
tcp LISTEN 0 1 127.0.0.1:41731 0.0.0.0:* users:(("qemu-system-ppc",pid=1975753,fd=19))
Most of those sockets are owned by a single process (pid 1975486 holds all of the UDP sockets), so there might be an opportunity for us to figure out how to clean them up, or maybe they are legitimate somehow?
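To see at a glance which qemu process holds most of the sockets, the listing can be tallied per PID. The following is only a rough sketch and not part of kola or cosa: the helper name count_sockets_by_pid is made up for illustration, and the sample input is a handful of lines copied from the output above.

```python
import re
from collections import Counter


def count_sockets_by_pid(ss_output: str) -> Counter:
    """Count sockets per owning PID in `ss -lutnp` output (hypothetical helper)."""
    counts = Counter()
    for line in ss_output.splitlines():
        # Only consider sockets owned by a qemu process.
        if "qemu" not in line:
            continue
        match = re.search(r"pid=(\d+)", line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# A few lines taken from the listing above.
sample = """\
udp UNCONN 0 0 0.0.0.0:32771 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=143))
udp UNCONN 0 0 0.0.0.0:33353 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=163))
tcp LISTEN 0 1 127.0.0.1:33359 0.0.0.0:* users:(("qemu-system-ppc",pid=1975455,fd=19))
tcp LISTEN 0 1 127.0.0.1:38673 0.0.0.0:* users:(("qemu-system-ppc",pid=1975486,fd=19))
"""

for pid, count in count_sockets_by_pid(sample).most_common():
    print(pid, count)
```

On the builder, the real output of sudo ss -lutnp could be passed to the same function instead of the sample string.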
The bigger problem, however, is that we had a port clash with the kdump.crash.nfs test. The test requires the use of the "Qemu Host" IP:Port 10.0.2.2:2049, so it can end up clashing if something else happens to be using that port already.
Something like this happened recently in the RHCOS pipeline:
Basically, for whatever reason, the host port clashed:
Could not set up host forwarding rule 'tcp:127.0.0.1:2049-:2049'
I don't know if another kdump.crash.nfs test was running for another pipeline (or if they would even clash).
You can also reproduce something similar with:
cosa kola run kdump.crash.nfs --multiply 2 --parallel 2
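One way to confirm the clash on a builder is to probe whether something already holds 127.0.0.1:2049, the host side of the forwarding rule in the error above. This is a minimal sketch, not something kola does today: port_is_free is a hypothetical helper, and the check is inherently racy because another process can grab the port between the probe and the qemu launch.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port.

    Hypothetical helper for illustration only; a True result does not
    guarantee that the port is still free a moment later.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: another process already holds the port.
            return False
        return True


if __name__ == "__main__":
    # 2049 is the host port from the forwarding rule in the error message.
    print("127.0.0.1:2049 free:", port_is_free(2049))
```

Running this while a first kdump.crash.nfs instance is up would be expected to report the port as busy, which matches the --multiply 2 --parallel 2 reproducer.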