🔍 Before submitting the issue
🐞 Description of the bug
Defect encountered when trying to use pyhps in a SAF solution
The data transfer client worker process restarts itself when the SAF framework tries to kill it at the end of a SAF transaction method that uses the pyhps API, including some file transfers. SAF transaction methods kill all the child processes created by the method when the method ends. It takes about 5 minutes before a SAF transaction method is "free" of the data transfer client (in the log below, nothing is logged between 15:45:27 and 15:50:04). This is a problem for us, so I was wondering:
a) is there a way to configure the client so that the data transfer client isn't sticky and just dies when killed?
b) is there a way to explicitly shut down the data transfer client via the API?
c) is the data transfer client essential or are there alternatives that don't involve child processes etc?
d) is there an easy way to identify the data transfer client process so we can avoid trying to kill it?
e) what is the designed lifetime of the data transfer client?
The logs of a long-running SAF transaction method look like this when it terminates:
2026-02-03 15:45:25,706 DEBUG [ansys.saf.glow._executor.transaction] [transaction.py:351] [trace_id=50a1e11a81bc83b1f2cb44f257258d79 span_id=02322f527952135d resource.service.name=GLOW METHOD RUNNER]- Uploading field status with value type <class 'str'> and value evaluated
2026-02-03 15:45:26,010 INFO [ansys.saf.glow._utilities.procs] [procs.py:30] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Killing child process (pid: 170000)
2026-02-03 15:45:26,012 INFO [ansys.saf.glow._utilities.procs] [procs.py:30] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Killing child process (pid: 50036)
2026-02-03 15:45:26,016 DEBUG [ansys.hps.data_transfer.client.binary] [binary.py:322] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Worker log output stopped
2026-02-03 15:45:26,405 WARNING [ansys.hps.data_transfer.client.binary] [binary.py:377] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Worker exited with code 15, restarting ...
2026-02-03 15:45:27,415 DEBUG [ansys.hps.data_transfer.client.client] [client.py:495] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Port changed to 64287
2026-02-03 15:45:27,415 DEBUG [ansys.hps.data_transfer.client.binary] [binary.py:366] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Starting worker: C:\Users\afinney\AppData\Local\Ansys\hps\data-transfer\binaries\worker\hpsdata-650849e76cee1d92.exe --log-types diode --host 127.0.0.1 --port 64287 --dt-url https://hps.aapstejtgyp6r8f.win.ansys.com:8443/hps/dt/api/v1 --log-types console -v 3 --insecure --auth-type api-key -t "Bearer ***"
2026-02-03 15:50:04,581 WARNING [ansys.hps.data_transfer.client.client] [client.py:644] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Failed to send shutdown request: [WinError 10061] No connection could be made because the target machine actively refused it
2026-02-03 15:50:04,581 DEBUG [ansys.hps.data_transfer.client.binary] [binary.py:290] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Stopping worker ...
2026-02-03 15:50:04,834 DEBUG [ansys.hps.data_transfer.client.binary] [binary.py:388] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Worker monitor stopped
2026-02-03 15:50:05,321 DEBUG [ansys.hps.data_transfer.client.client] [client.py:698] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Worker status monitor stopped
2026-02-03 15:50:09,612 WARNING [ansys.hps.data_transfer.client.binary] [binary.py:299] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Worker did not stop in time, killing ...
2026-02-03 15:50:09,613 INFO [ansys.hps.client.client] [client.py:250] [trace_id=0 span_id=0 resource.service.name=GLOW METHOD RUNNER]- Stopping the data transfer client gracefully.
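To put a number on the stall, here is a small sketch that computes the gap between the "Starting worker" line and the "Failed to send shutdown request" line in the log above. The function name `gap_seconds` is just for illustration; the two timestamps are copied from the log.

```python
from datetime import datetime

# Timestamp layout used by the log lines above, e.g. "2026-02-03 15:45:27,415".
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def gap_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


# "Starting worker" line versus the next logged line.
stall = gap_seconds("2026-02-03 15:45:27,415", "2026-02-03 15:50:04,581")
print(f"Silent gap: {stall:.3f} s")
```

That is roughly 4 minutes 37 seconds during which nothing is logged.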
📝 Steps to reproduce
Using the pyhps API from a single Python process, do the following:
- create an HPS project, job, task etc
- upload files to support the job
- run the job to successful completion
- kill all the child processes of the calling python process
BUG: the kill step then stalls for about 5 minutes while apparently doing nothing.
The kill is performed by a call to
kill_proc_tree(os.getpid(), timeout=1)
which is based on the psutil recipe at
https://psutil.readthedocs.io/en/latest/#kill-process-tree
Our copy is:
# ©2023, ANSYS Inc. Unauthorized use, distribution or duplication is prohibited.
from collections.abc import Callable
import logging
import signal

import psutil

logger = logging.getLogger(__name__)


# Taken from https://psutil.readthedocs.io/en/latest/#kill-process-tree
def kill_proc_tree(
    pid: int,
    sig: signal.Signals = signal.SIGTERM,
    include_parent: bool = False,
    timeout: float | None = None,
    on_terminate: Callable[[psutil.Process], None] | None = None,
):
    """Kill a process tree (including grandchildren) with signal
    "sig" and return a (gone, still_alive) tuple.
    "on_terminate", if specified, is a callback function which is
    called as soon as a child terminates.
    """
    parent = psutil.Process(pid)
    children = parent.children(recursive=True)
    if include_parent:
        children.append(parent)
    for p in children:
        try:
            logger.info(f"Killing child process (pid: {p.pid})")
            p.send_signal(sig)
        except psutil.NoSuchProcess:
            pass
    gone, alive = psutil.wait_procs(children, timeout=timeout, callback=on_terminate)
    return (gone, alive)
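
Regarding question d), one workaround we could try on our side is to recognise the worker by its executable name before killing anything. This is only a sketch: the `hpsdata` prefix is an assumption taken from the "Starting worker" log line above and may not be stable across versions, and `split_workers` and `HasName` are hypothetical names, not part of pyhps or psutil. The helper is duck-typed, so it accepts `psutil.Process` objects or anything else with a `name()` method.

```python
from collections.abc import Iterable
from typing import Protocol, TypeVar

# Assumption: the worker executable name starts with "hpsdata", as in the
# "Starting worker" log line (hpsdata-650849e76cee1d92.exe).
WORKER_NAME_PREFIX = "hpsdata"


class HasName(Protocol):
    """Anything exposing name(), for example a psutil.Process."""

    def name(self) -> str: ...


P = TypeVar("P", bound=HasName)


def split_workers(
    procs: Iterable[P], prefix: str = WORKER_NAME_PREFIX
) -> tuple[list[P], list[P]]:
    """Split processes into (data transfer workers, everything else)."""
    workers: list[P] = []
    others: list[P] = []
    for proc in procs:
        try:
            name = proc.name()
        except Exception:
            # The process may have exited or be inaccessible by the time
            # we ask for its name (psutil raises psutil.Error subclasses).
            continue
        if name.lower().startswith(prefix):
            workers.append(proc)
        else:
            others.append(proc)
    return workers, others
```

With psutil this could be called as `split_workers(psutil.Process(os.getpid()).children(recursive=True))`, with only the second list passed on to the kill loop. Whether it is safe to leave the worker running is exactly what question e) is asking.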
💻 Which operating system are you using?
Windows
📀 Which ANSYS version are you using?
this failure doesn't involve any flagship products
🐍 Which Python version are you using?
3.10
📦 Installed packages
accessible-pygments==0.0.4
aiofiles==23.2.1
aiohappyeyeballs==2.6.1
aiohttp==3.13.3
aioshutil==1.3
aiosignal==1.4.0
aiosqlite==0.19.0
alabaster==1.0.0
annotated-doc==0.0.3
annotated-types==0.6.0
ansys-api-dbu==0.3.28
ansys-api-discovery==1.0.20
ansys-api-edb==0.2.1
ansys-api-fluent==0.3.36
ansys-api-geometry==0.4.90
ansys-api-mapdl==0.5.2
ansys-api-mechanical==0.1.3
ansys-api-platform-instancemanagement==1.0.0
ansys-api-tools-filetransfer==0.1.2
ansys-bdm-api==0.3.1
ansys-bdm-shared-volume==0.2.0
ansys-edb-core==0.2.1
ansys-fluent-core==0.37.1
ansys-geometry-core==0.14.2
ansys-hps-client==0.9.0
ansys-iam-oidc==0.6.0
ansys-mapdl-core==0.71.3
ansys-mapdl-reader==0.55.2
ansys-math-core==0.2.4
ansys-mechanical-core==0.12.0
ansys-mechanical-env==0.1.6
ansys-mechanical-stubs==0.1.9
ansys-minerva-python-client==0.3.1
ansys-optislang-core==0.9.4
ansys-platform-instancemanagement==1.1.2
ansys-pythonnet==3.1.0rc6
-e git+https://github.com/ansys-internal/glow-engine.git@bc35724e8bb17555a06022f54baf4dce908124d3#egg=ansys_saf_glow_engine
ansys-saf-pim-light-server==0.3.11.dev1
ansys-saf-product-configuration==0.11.dev2
ansys-saf-product-manager==0.1.dev2
ansys-saf-testing==0.4.0
ansys-sphinx-theme==1.4.2
ansys-theia-viewer==0.2.5b0
ansys-tools-common==0.4.0
ansys-tools-filetransfer==0.2.1
ansys-tools-path==0.8.1
ansys-translation-utilities==0.1.0
ansys-units==0.9.1
anyio==3.7.1
appdirs==1.4.4
ariadne==0.23.0
asgi-lifespan==2.1.0
asgiref==3.10.0
async-timeout==4.0.3
asyncpg==0.29.0
attrs==23.2.0
autodoc_pydantic==2.2.0
Babel==2.14.0
backoff==2.2.1
basedpyright==1.32.1
beartype==0.22.9
beautifulsoup4==4.12.3
bidict==0.23.1
black==24.10.0
bson==0.5.10
build==0.8.0
cachelib==0.9.0
cachetools==5.3.2
cattrs==23.2.3
certifi==2024.7.4
cffi==1.16.0
chardet==5.2.0
charset-normalizer==3.3.2
click==8.1.7
clr-loader==0.2.6
codespell==2.4.1
colorama==0.4.6
contourpy==1.2.0
coverage==7.6.4
cryptography==44.0.2
cycler==0.12.1
dash==2.18.2
dash-core-components==2.0.0
dash-extensions==1.0.19rc1
dash-html-components==2.0.0
dash-table==5.0.0
dataclass-wizard==0.22.3
debugpy==1.8.0
defusedxml==0.7.1
Deprecated==1.3.1
dill==0.3.8
diskcache==5.6.3
distlib==0.3.8
Django==5.2.9
docker==7.1.0
docutils==0.21.2
EditorConfig==0.12.3
exceptiongroup==1.2.0
execnet==2.1.1
ezdxf==1.4.3
fastapi==0.121.3
filelock==3.20.3
Flask==2.2.5
Flask-Caching==2.3.0
flexcache==0.3
flexparser==0.4
fonttools==4.61.1
fpdf2==2.7.9
frozenlist==1.4.1
geomdl==5.4.0
googleapis-common-protos==1.62.0
gql==3.5.0
graphql-core==3.2.3
greenlet==3.1.1
grpcio==1.60.1
grpcio-health-checking==1.48.2
grpcio-status==1.60.1
gunicorn==23.0.0
h11==0.16.0
httpcore==1.0.9
httpx==0.27.2
idna==3.7
imagesize==1.4.1
importlib-metadata==6.11.0
importlib-resources==6.1.1
iniconfig==2.0.0
itsdangerous==2.1.2
jaraco.classes==3.3.1
Jinja2==3.1.6
joblib==1.5.3
joserfc==1.2.2
jsbeautifier==1.14.11
jsonschema==4.22.0
jsonschema-specifications==2023.12.1
keyring==24.3.0
kiwisolver==1.4.5
livereload==2.7.1
lsprotocol==2023.0.1
lxml==5.1.0
markdown-it-py==3.0.0
MarkupSafe==2.1.5
marshmallow==3.20.2
marshmallow-oneofschema==3.1.1
matplotlib==3.8.3
mdurl==0.1.2
mistune==2.0.5
mock==4.0.3
more-itertools==10.3.0
msgpack==1.1.2
multidict==6.0.5
multiprocess==0.70.16
mypy-extensions==1.0.0
narwhals==2.14.0
nest-asyncio==1.6.0
networkx==3.1
nh3==0.2.15
nltk==3.9.2
nodeenv==1.9.1
nodejs-wheel-binaries==22.20.0
numpy==1.26.4
numpydoc==1.8.0
oauthlib==3.2.2
opentelemetry-api==1.27.0
opentelemetry-exporter-otlp==1.27.0
opentelemetry-exporter-otlp-proto-common==1.27.0
opentelemetry-exporter-otlp-proto-grpc==1.27.0
opentelemetry-exporter-otlp-proto-http==1.27.0
opentelemetry-instrumentation==0.48b0
opentelemetry-instrumentation-asgi==0.48b0
opentelemetry-instrumentation-fastapi==0.48b0
opentelemetry-instrumentation-flask==0.48b0
opentelemetry-instrumentation-httpx==0.48b0
opentelemetry-instrumentation-logging==0.48b0
opentelemetry-instrumentation-wsgi==0.48b0
opentelemetry-proto==1.27.0
opentelemetry-sdk==1.27.0
opentelemetry-semantic-conventions==0.48b0
opentelemetry-util-http==0.48b0
outcome==1.3.0.post0
packaging==26.0
pandas==2.2.1
pathspec==0.12.1
pdf2image==1.17.0
pep517==0.13.1
pillow==10.3.0
Pint==0.24.4
pkg-about==1.0.8
pkginfo==1.9.6
platformdirs==4.2.0
plotly==6.5.0
pluggy==1.5.0
plumbum==1.10.0
pooch==1.8.2
prettytable==3.17.0
propcache==0.4.1
protobuf==4.25.8
psutil==6.0.0
pyaedt==0.22.2
pyansys-tools-report==0.8.2
pyansys-tools-versioning==0.7.0
pyc-wheel==1.2.7
pycparser==2.21
pycryptodome==3.23.0
pydantic==2.10.6
pydantic-settings==2.5.2
pydantic_core==2.27.2
pydata-sphinx-theme==0.16.1
pyedb==0.63.0
pygls==1.3.0
Pygments==2.17.2
pyiges==0.3.2
PyJWT==2.8.0
pyparsing==3.1.2
pyproject-api==1.6.1
pyright==1.1.407
PySocks==1.7.1
pytest==8.3.4
pytest-asyncio==0.23.7
pytest-cov==4.1.0
pytest-html==4.1.1
pytest-md==0.2.0
pytest-metadata==3.1.0
pytest-mock==3.15.0
pytest-rerunfailures==15.0
pytest-timeout==2.3.1
pytest-xdist==3.6.1
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-engineio==4.12.3
python-multipart==0.0.19
python-socketio==5.14.3
pytomlpp==1.0.13
pytz==2024.1
pyvista==0.46.4
pywin32==306
pywin32-ctypes==0.2.2
PyYAML==6.0.1
readme-renderer==42.0
referencing==0.35.1
regex==2026.1.15
requests==2.32.4
requests-oauthlib==2.0.0
requests-toolbelt==1.0.0
retrying==1.3.4
rfc3986==1.5.0
rich==13.7.0
rpds-py==0.18.1
rpyc==6.0.2
rtree==1.4.1
ruff==0.14.2
ruff-lsp==0.0.58
scikit-rf==1.8.0
scipy==1.15.2
scooby==0.11.0
selenium==4.32.0
semver==3.0.4
shapely==2.1.2
simple-websocket==1.1.0
six==1.16.0
sniffio==1.3.0
snowballstemmer==2.2.0
sortedcontainers==2.4.0
soupsieve==2.5
Sphinx==8.1.3
sphinx-autobuild==2021.3.14
sphinx-autodoc-typehints==2.5.0
sphinx-code-tabs==0.5.5
sphinx-copybutton==0.5.2
sphinx-gallery==0.15.0
sphinx-notfound-page==0.8.3
sphinx-tabs==3.4.7
sphinx_design==0.6.1
sphinx_mdinclude==0.5.4
sphinxcontrib-applehelp==2.0.0
sphinxcontrib-devhelp==2.0.0
sphinxcontrib-htmlhelp==2.1.0
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==2.0.0
sphinxcontrib-serializinghtml==2.0.0
sphinxcontrib-websupport==1.2.7
sphinxemoji==0.2.0
SQLAlchemy==2.0.36
sqlparse==0.5.3
starlette==0.49.1
stream-zip==0.0.83
tabulate==0.9.0
tenacity==8.5.0
toml==0.10.2
tomli==2.0.1
tomli_w==1.2.0
tornado==6.5.4
tox==4.15.1
tqdm==4.67.1
trame==3.11.0
trame-client==3.11.2
trame-common==1.1.1
trame-server==3.10.0
trame-vtklocal==0.15.2
trio==0.32.0
trio-websocket==0.12.2
twine==4.0.2
typing_extensions==4.14.0
tzdata==2024.1
urllib3==2.6.3
uvicorn==0.24.0.post1
virtualenv==20.36.1
vtk==9.5.2
waitress==3.0.1
wcwidth==0.2.14
websocket-client==1.9.0
Werkzeug==3.0.6
wrapt==1.16.0
wslink==2.5.0
wsproto==1.2.0
yarl==1.22.0
zipp==3.19.1