process consuming pyhps wastes 5 minutes trying to clean up its child processes because HPS client responds to sigterm incorrectly #743

@ansAFinney

🔍 Before submitting the issue

  • I have searched among the existing issues
  • I am using a Python virtual environment

🐞 Description of the bug

Defect encountered when trying to use pyhps in a SAF solution

The data transfer client process restarts itself when the SAF framework tries to kill it at the end of a SAF transaction method that consumes the pyhps API, including some file transfers. SAF transaction methods kill all child processes created by the method when the method ends. It takes about 5 minutes before a SAF transaction method is "free" of the data transfer client. This is problematic for us, so I was wondering:

a) is there a way to configure the client so that the data transfer client isn't sticky and just dies when killed?
b) is there a way to explicitly shut down the data transfer client via the API?
c) is the data transfer client essential or are there alternatives that don't involve child processes etc?
d) is there an easy way to identify the data transfer client process so we can avoid trying to kill it?
e) what is the designed lifetime of the data transfer client?
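
Regarding (d): the worker command line in the log below shows an executable named `hpsdata-650849e76cee1d92.exe`, so one possible stopgap is to recognise the worker by its name prefix. This is only a sketch under the assumption that the `hpsdata` prefix is stable across versions, which nothing here confirms. The names `WORKER_PREFIX`, `is_data_transfer_worker` and `split_children` are hypothetical helpers, not part of pyhps.

```python
from collections.abc import Iterable
from typing import Protocol


class ProcessLike(Protocol):
    """Anything exposing name(), for example a psutil.Process."""

    def name(self) -> str: ...


# Assumption: the worker binary keeps the "hpsdata" prefix seen in the log
# ("hpsdata-650849e76cee1d92.exe"). This is not a documented guarantee.
WORKER_PREFIX = "hpsdata"


def is_data_transfer_worker(process_name: str) -> bool:
    """Return True if the executable name looks like the data transfer worker."""
    return process_name.lower().startswith(WORKER_PREFIX)


def split_children(children: Iterable[ProcessLike]) -> tuple[list, list]:
    """Partition child processes into (workers, others) by executable name."""
    workers, others = [], []
    for proc in children:
        if is_data_transfer_worker(proc.name()):
            workers.append(proc)
        else:
            others.append(proc)
    return workers, others
```

With psutil this could be fed from `psutil.Process().children(recursive=True)`, and only the `others` list passed on to the kill step. Note that `name()` can raise `psutil.NoSuchProcess` if a child exits in the meantime.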

The logs of a long-running SAF transaction method look like this when it terminates:

2026-02-03 15:45:25,706 DEBUG [ansys.saf.glow._executor.transaction] [transaction.py:351] [trace_id=50a1e11a81bc83b1f2cb44f257258d79 span_id=02322f527952135d resource.service.name=GLOW METHOD RUNNER]- Uploading field status with value type <class 'str'> and value evaluated
2026-02-03 15:45:26,010 INFO [ansys.saf.glow._utilities.procs] [procs.py:30] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Killing child process (pid: 170000)
2026-02-03 15:45:26,012 INFO [ansys.saf.glow._utilities.procs] [procs.py:30] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Killing child process (pid: 50036)
2026-02-03 15:45:26,016 DEBUG [ansys.hps.data_transfer.client.binary] [binary.py:322] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Worker log output stopped
2026-02-03 15:45:26,405 WARNING [ansys.hps.data_transfer.client.binary] [binary.py:377] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Worker exited with code 15, restarting ...
2026-02-03 15:45:27,415 DEBUG [ansys.hps.data_transfer.client.client] [client.py:495] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Port changed to 64287
2026-02-03 15:45:27,415 DEBUG [ansys.hps.data_transfer.client.binary] [binary.py:366] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Starting worker: C:\Users\afinney\AppData\Local\Ansys\hps\data-transfer\binaries\worker\hpsdata-650849e76cee1d92.exe --log-types diode --host 127.0.0.1 --port 64287 --dt-url https://hps.aapstejtgyp6r8f.win.ansys.com:8443/hps/dt/api/v1 --log-types console -v 3 --insecure --auth-type api-key -t "Bearer ***"
2026-02-03 15:50:04,581 WARNING [ansys.hps.data_transfer.client.client] [client.py:644] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Failed to send shutdown request: [WinError 10061] No connection could be made because the target machine actively refused it
2026-02-03 15:50:04,581 DEBUG [ansys.hps.data_transfer.client.binary] [binary.py:290] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Stopping worker ...
2026-02-03 15:50:04,834 DEBUG [ansys.hps.data_transfer.client.binary] [binary.py:388] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Worker monitor stopped
2026-02-03 15:50:05,321 DEBUG [ansys.hps.data_transfer.client.client] [client.py:698] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Worker status monitor stopped
2026-02-03 15:50:09,612 WARNING [ansys.hps.data_transfer.client.binary] [binary.py:299] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Worker did not stop in time, killing ...
2026-02-03 15:50:09,613 INFO [ansys.hps.client.client] [client.py:250] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Stopping the data transfer client gracefully.
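
For reference, the stall can be measured directly from the timestamps above: nothing is logged between the worker restart at 15:45:27,415 and the failed shutdown request at 15:50:04,581. The snippet below is just that arithmetic; `seconds_between` is a hypothetical helper name.

```python
from datetime import datetime

# Timestamp layout used by the log lines above (comma before milliseconds).
FMT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(start: str, end: str) -> float:
    """Return the elapsed seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# Silent gap: "Starting worker" until "Failed to send shutdown request".
silent_gap = seconds_between("2026-02-03 15:45:27,415", "2026-02-03 15:50:04,581")

# Whole teardown: first "Killing child process" until the final log line.
teardown = seconds_between("2026-02-03 15:45:26,010", "2026-02-03 15:50:09,613")

print(f"{silent_gap:.3f}")  # 277.166
print(f"{teardown:.3f}")    # 283.603
```

So roughly 4 min 37 s of the 4 min 44 s teardown passes with no log output at all.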

📝 Steps to reproduce

Via the pyhps API, using a single Python process, do the following:

  1. create an HPS project, job, task etc
  2. upload files to support the job
  3. run the job to successful completion
  4. kill all the child processes of the calling python process

BUG: doing this causes the kill step to stall for about 5 minutes with no visible activity.

The kill is performed by a call to:

kill_proc_tree(os.getpid(), timeout=1)

which is adapted from the psutil recipe:
https://psutil.readthedocs.io/en/latest/#kill-process-tree

Our copy is:

# ©2023, ANSYS Inc. Unauthorized use, distribution or duplication is prohibited.
from collections.abc import Callable
import logging
import signal

import psutil

logger = logging.getLogger(__name__)


# Taken from https://psutil.readthedocs.io/en/latest/#kill-process-tree
def kill_proc_tree(
    pid: int,
    sig: signal.Signals = signal.SIGTERM,
    include_parent: bool = False,
    timeout: float | None = None,
    on_terminate: Callable[[psutil.Process], None] | None = None,
):
    """Kill a process tree (including grandchildren) with signal
    "sig" and return a (gone, still_alive) tuple.
    "on_terminate", if specified, is a callback function which is
    called as soon as a child terminates.
    """
    parent = psutil.Process(pid)
    children = parent.children(recursive=True)
    if include_parent:
        children.append(parent)
    for p in children:
        try:
            logger.info(f"Killing child process (pid: {p.pid})")
            p.send_signal(sig)
        except psutil.NoSuchProcess:
            pass
    gone, alive = psutil.wait_procs(children, timeout=timeout, callback=on_terminate)
    return (gone, alive)
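
Because the log shows the worker being restarted about a second after it is killed, a snapshot of child PIDs before and after the kill can confirm whether a new child appeared. This is a diagnostic sketch only, not a fix; `find_respawned` and its parameters are hypothetical, and the `settle` delay is a guess based on the roughly one-second restart seen in the log.

```python
import time
from collections.abc import Callable, Iterable


def find_respawned(
    list_child_pids: Callable[[], Iterable[int]],
    kill: Callable[[], object],
    settle: float = 0.0,
) -> set[int]:
    """Return child PIDs that exist after the kill but did not exist before it.

    list_child_pids: callable returning the current child PIDs.
    kill: callable that performs the kill (its return value is ignored).
    settle: seconds to wait after the kill so a restart has time to happen.
    """
    before = set(list_child_pids())
    kill()
    if settle > 0:
        time.sleep(settle)
    after = set(list_child_pids())
    # PIDs can be reused by the OS, so treat the result as a hint, not proof.
    return after - before
```

With psutil, `list_child_pids` could be `lambda: [p.pid for p in psutil.Process().children(recursive=True)]` and `kill` could be `lambda: kill_proc_tree(os.getpid(), timeout=1)`, with `settle=2.0` or similar.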

💻 Which operating system are you using?

Windows

📀 Which ANSYS version are you using?

this failure doesn't involve any flagship products

🐍 Which Python version are you using?

3.10

📦 Installed packages

accessible-pygments==0.0.4
aiofiles==23.2.1
aiohappyeyeballs==2.6.1
aiohttp==3.13.3
aioshutil==1.3
aiosignal==1.4.0
aiosqlite==0.19.0
alabaster==1.0.0
annotated-doc==0.0.3
annotated-types==0.6.0
ansys-api-dbu==0.3.28
ansys-api-discovery==1.0.20
ansys-api-edb==0.2.1
ansys-api-fluent==0.3.36
ansys-api-geometry==0.4.90
ansys-api-mapdl==0.5.2
ansys-api-mechanical==0.1.3
ansys-api-platform-instancemanagement==1.0.0
ansys-api-tools-filetransfer==0.1.2
ansys-bdm-api==0.3.1
ansys-bdm-shared-volume==0.2.0
ansys-edb-core==0.2.1
ansys-fluent-core==0.37.1
ansys-geometry-core==0.14.2
ansys-hps-client==0.9.0
ansys-iam-oidc==0.6.0
ansys-mapdl-core==0.71.3
ansys-mapdl-reader==0.55.2
ansys-math-core==0.2.4
ansys-mechanical-core==0.12.0
ansys-mechanical-env==0.1.6
ansys-mechanical-stubs==0.1.9
ansys-minerva-python-client==0.3.1
ansys-optislang-core==0.9.4
ansys-platform-instancemanagement==1.1.2
ansys-pythonnet==3.1.0rc6
-e git+https://github.com/ansys-internal/glow-engine.git@bc35724e8bb17555a06022f54baf4dce908124d3#egg=ansys_saf_glow_engine
ansys-saf-pim-light-server==0.3.11.dev1
ansys-saf-product-configuration==0.11.dev2
ansys-saf-product-manager==0.1.dev2
ansys-saf-testing==0.4.0
ansys-sphinx-theme==1.4.2
ansys-theia-viewer==0.2.5b0
ansys-tools-common==0.4.0
ansys-tools-filetransfer==0.2.1
ansys-tools-path==0.8.1
ansys-translation-utilities==0.1.0
ansys-units==0.9.1
anyio==3.7.1
appdirs==1.4.4
ariadne==0.23.0
asgi-lifespan==2.1.0
asgiref==3.10.0
async-timeout==4.0.3
asyncpg==0.29.0
attrs==23.2.0
autodoc_pydantic==2.2.0
Babel==2.14.0
backoff==2.2.1
basedpyright==1.32.1
beartype==0.22.9
beautifulsoup4==4.12.3
bidict==0.23.1
black==24.10.0
bson==0.5.10
build==0.8.0
cachelib==0.9.0
cachetools==5.3.2
cattrs==23.2.3
certifi==2024.7.4
cffi==1.16.0
chardet==5.2.0
charset-normalizer==3.3.2
click==8.1.7
clr-loader==0.2.6
codespell==2.4.1
colorama==0.4.6
contourpy==1.2.0
coverage==7.6.4
cryptography==44.0.2
cycler==0.12.1
dash==2.18.2
dash-core-components==2.0.0
dash-extensions==1.0.19rc1
dash-html-components==2.0.0
dash-table==5.0.0
dataclass-wizard==0.22.3
debugpy==1.8.0
defusedxml==0.7.1
Deprecated==1.3.1
dill==0.3.8
diskcache==5.6.3
distlib==0.3.8
Django==5.2.9
docker==7.1.0
docutils==0.21.2
EditorConfig==0.12.3
exceptiongroup==1.2.0
execnet==2.1.1
ezdxf==1.4.3
fastapi==0.121.3
filelock==3.20.3
Flask==2.2.5
Flask-Caching==2.3.0
flexcache==0.3
flexparser==0.4
fonttools==4.61.1
fpdf2==2.7.9
frozenlist==1.4.1
geomdl==5.4.0
googleapis-common-protos==1.62.0
gql==3.5.0
graphql-core==3.2.3
greenlet==3.1.1
grpcio==1.60.1
grpcio-health-checking==1.48.2
grpcio-status==1.60.1
gunicorn==23.0.0
h11==0.16.0
httpcore==1.0.9
httpx==0.27.2
idna==3.7
imagesize==1.4.1
importlib-metadata==6.11.0
importlib-resources==6.1.1
iniconfig==2.0.0
itsdangerous==2.1.2
jaraco.classes==3.3.1
Jinja2==3.1.6
joblib==1.5.3
joserfc==1.2.2
jsbeautifier==1.14.11
jsonschema==4.22.0
jsonschema-specifications==2023.12.1
keyring==24.3.0
kiwisolver==1.4.5
livereload==2.7.1
lsprotocol==2023.0.1
lxml==5.1.0
markdown-it-py==3.0.0
MarkupSafe==2.1.5
marshmallow==3.20.2
marshmallow-oneofschema==3.1.1
matplotlib==3.8.3
mdurl==0.1.2
mistune==2.0.5
mock==4.0.3
more-itertools==10.3.0
msgpack==1.1.2
multidict==6.0.5
multiprocess==0.70.16
mypy-extensions==1.0.0
narwhals==2.14.0
nest-asyncio==1.6.0
networkx==3.1
nh3==0.2.15
nltk==3.9.2
nodeenv==1.9.1
nodejs-wheel-binaries==22.20.0
numpy==1.26.4
numpydoc==1.8.0
oauthlib==3.2.2
opentelemetry-api==1.27.0
opentelemetry-exporter-otlp==1.27.0
opentelemetry-exporter-otlp-proto-common==1.27.0
opentelemetry-exporter-otlp-proto-grpc==1.27.0
opentelemetry-exporter-otlp-proto-http==1.27.0
opentelemetry-instrumentation==0.48b0
opentelemetry-instrumentation-asgi==0.48b0
opentelemetry-instrumentation-fastapi==0.48b0
opentelemetry-instrumentation-flask==0.48b0
opentelemetry-instrumentation-httpx==0.48b0
opentelemetry-instrumentation-logging==0.48b0
opentelemetry-instrumentation-wsgi==0.48b0
opentelemetry-proto==1.27.0
opentelemetry-sdk==1.27.0
opentelemetry-semantic-conventions==0.48b0
opentelemetry-util-http==0.48b0
outcome==1.3.0.post0
packaging==26.0
pandas==2.2.1
pathspec==0.12.1
pdf2image==1.17.0
pep517==0.13.1
pillow==10.3.0
Pint==0.24.4
pkg-about==1.0.8
pkginfo==1.9.6
platformdirs==4.2.0
plotly==6.5.0
pluggy==1.5.0
plumbum==1.10.0
pooch==1.8.2
prettytable==3.17.0
propcache==0.4.1
protobuf==4.25.8
psutil==6.0.0
pyaedt==0.22.2
pyansys-tools-report==0.8.2
pyansys-tools-versioning==0.7.0
pyc-wheel==1.2.7
pycparser==2.21
pycryptodome==3.23.0
pydantic==2.10.6
pydantic-settings==2.5.2
pydantic_core==2.27.2
pydata-sphinx-theme==0.16.1
pyedb==0.63.0
pygls==1.3.0
Pygments==2.17.2
pyiges==0.3.2
PyJWT==2.8.0
pyparsing==3.1.2
pyproject-api==1.6.1
pyright==1.1.407
PySocks==1.7.1
pytest==8.3.4
pytest-asyncio==0.23.7
pytest-cov==4.1.0
pytest-html==4.1.1
pytest-md==0.2.0
pytest-metadata==3.1.0
pytest-mock==3.15.0
pytest-rerunfailures==15.0
pytest-timeout==2.3.1
pytest-xdist==3.6.1
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-engineio==4.12.3
python-multipart==0.0.19
python-socketio==5.14.3
pytomlpp==1.0.13
pytz==2024.1
pyvista==0.46.4
pywin32==306
pywin32-ctypes==0.2.2
PyYAML==6.0.1
readme-renderer==42.0
referencing==0.35.1
regex==2026.1.15
requests==2.32.4
requests-oauthlib==2.0.0
requests-toolbelt==1.0.0
retrying==1.3.4
rfc3986==1.5.0
rich==13.7.0
rpds-py==0.18.1
rpyc==6.0.2
rtree==1.4.1
ruff==0.14.2
ruff-lsp==0.0.58
scikit-rf==1.8.0
scipy==1.15.2
scooby==0.11.0
selenium==4.32.0
semver==3.0.4
shapely==2.1.2
simple-websocket==1.1.0
six==1.16.0
sniffio==1.3.0
snowballstemmer==2.2.0
sortedcontainers==2.4.0
soupsieve==2.5
Sphinx==8.1.3
sphinx-autobuild==2021.3.14
sphinx-autodoc-typehints==2.5.0
sphinx-code-tabs==0.5.5
sphinx-copybutton==0.5.2
sphinx-gallery==0.15.0
sphinx-notfound-page==0.8.3
sphinx-tabs==3.4.7
sphinx_design==0.6.1
sphinx_mdinclude==0.5.4
sphinxcontrib-applehelp==2.0.0
sphinxcontrib-devhelp==2.0.0
sphinxcontrib-htmlhelp==2.1.0
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==2.0.0
sphinxcontrib-serializinghtml==2.0.0
sphinxcontrib-websupport==1.2.7
sphinxemoji==0.2.0
SQLAlchemy==2.0.36
sqlparse==0.5.3
starlette==0.49.1
stream-zip==0.0.83
tabulate==0.9.0
tenacity==8.5.0
toml==0.10.2
tomli==2.0.1
tomli_w==1.2.0
tornado==6.5.4
tox==4.15.1
tqdm==4.67.1
trame==3.11.0
trame-client==3.11.2
trame-common==1.1.1
trame-server==3.10.0
trame-vtklocal==0.15.2
trio==0.32.0
trio-websocket==0.12.2
twine==4.0.2
typing_extensions==4.14.0
tzdata==2024.1
urllib3==2.6.3
uvicorn==0.24.0.post1
virtualenv==20.36.1
vtk==9.5.2
waitress==3.0.1
wcwidth==0.2.14
websocket-client==1.9.0
Werkzeug==3.0.6
wrapt==1.16.0
wslink==2.5.0
wsproto==1.2.0
yarl==1.22.0
zipp==3.19.1

Labels: bug