easy fix for MP in frozen MacOS binaries#1185

Merged
vasole merged 7 commits into master from macos_fix
Feb 13, 2026

Conversation

@sergey-yaroslavtsev
Collaborator

Closes #1181
Just a proper freeze and hidden imports
Does not affect Windows.

@sergey-yaroslavtsev
Collaborator Author

sergey-yaroslavtsev commented Feb 12, 2026

The build with tests can be found here:
https://github.com/sergey-yaroslavtsev/pymca_ci_mod/actions/runs/21921825599
Changes compared to the original master in that fork:
[screenshot of the changes]

Now the HDF5 tree is showing (tested on two ARM Macs) and all tests are passing. I do not see any other issue currently.
MacOS universal and Windows (cx_freeze): 114 tests with 4 skipped, as it should be 👍

@sergey-yaroslavtsev
Collaborator Author

sergey-yaroslavtsev commented Feb 12, 2026

We could probably accumulate this with other open PRs and make a release, since this one is crucial for MacOS.

@vasole
Member

vasole commented Feb 12, 2026

I feel uncomfortable importing a module that PyMca does not need at all when frozen. The proof is that it was never an issue until last release.

The multiprocess module is only needed to access HDF5 files while being written. To spawn or to fork a process each time an HDF5 file is opened/accessed is overkill.

I am much more comfortable with not shipping the multiprocess module with the frozen binary (as it was before this release) and simply not trying to import it when sys.frozen exists and is true.

@vasole
Member

vasole commented Feb 12, 2026

Besides my comment above, your modification should be made for each frozen application, not just PyMcaMain. You have the list in pyinstaller_github.spec:

[screenshot of the application list in pyinstaller_github.spec]

@sergey-yaroslavtsev
Collaborator Author

The multiprocess module is only needed to access HDF5 files while being written. To spawn or to fork a process each time an HDF5 file is opened/accessed is overkill.

The general point is that if we do not import MP, then every method which utilizes MP needs a workaround and will run in the main process. Since MP is part of the standard library and freeze_support with hooks works well (it is the recommended solution), I do not see why it should not be used. If any third-party library uses (or will use) MP and we do not add support, that will also create a problem.

I am much more comfortable with not shipping the multiprocess module with frozen binary (as it was before this release) and simply not trying to import it when sys.frozen exists and is true.

However, I do agree with the concept "it works - do not touch". So if we decide to stay on the very safe side, a workaround could do for now.

Besides my comment above, your modification should be made for each frozen application, not just PyMcaMain. You have the list in pyinstaller_github.spec:

You are 100% right. I will add it there as well, since those are other entry points.

Regarding the last point: do MacOS users really use them? Because it is not trivial to open a .app as an archive and find the executables to run. From such advanced users I would expect use of PyMca as a library, not as an app. I mean, are these executables really needed in the DMG release?

@vasole
Member

vasole commented Feb 12, 2026

Regarding the last point: do MacOS users really use them? Because it is not trivial to open a .app as an archive and find the executables to run. From such advanced users I would expect use of PyMca as a library, not as an app. I mean, are these executables really needed in the DMG release?

The user does not use them, but the code does. Perhaps not QStackWidget.py, but all the others are called as separate processes when performing batches.

@vasole
Member

vasole commented Feb 12, 2026

The general point is that if we do not import MP, then every method which utilizes MP needs a workaround and will run in the main process.

As far as I know, prior to this version there was only one place and it was handled in HDF5Widget.py

@vasole
Member

vasole commented Feb 12, 2026

Is it possible to protect each import of and access to multiprocess by a try/except or a check for sys.frozen and its value?

Without that the previous freezing pipeline will not work and there will be no fallback for easy testing. In fact, we are modifying 9 files instead of just two (multiprocess was only imported in a single place).

@sergey-yaroslavtsev
Collaborator Author

sergey-yaroslavtsev commented Feb 12, 2026

Is it possible to protect each import of and access to multiprocess by a try/except or a check for sys.frozen and its value?

You mean:

try:
    import multiprocessing
except Exception:
    pass
...
if __name__ == '__main__':
    if hasattr(sys, 'frozen'):
        multiprocessing.freeze_support()

I do not see a problem with it.
Just to note: (1) MP is a standard Python library; (2) freeze_support only does something when frozen anyway.

Without that the previous freezing pipeline will not work and there will be no fallback for easy testing.

We can adjust accordingly.

In fact, we are modifying 9 files instead of just two (multiprocess was only imported at a single place).

True. If we are sure there will be no other MP usage in the future at all (and we do not want to "clean" HDF5Widget), this solution is worse than a workaround. However, from my perspective this solution is a bit more future proof and provides some flexibility. I could be wrong due to lack of experience - do not hesitate to point it out.

P.S.
A dry run to verify other imports:
https://github.com/silx-kit/pymca/actions/runs/21943365525

@vasole
Member

vasole commented Feb 12, 2026

It should be if getattr(sys, 'frozen', False) to make it future proof.
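In code, combining the guarded import with that future-proof check would look something like this (a minimal sketch; the entry-point structure is illustrative, not PyMca's actual code):

```python
import sys

try:
    import multiprocessing
except Exception:
    multiprocessing = None  # e.g. excluded from a frozen build

def frozen():
    # getattr with a default never raises, even on interpreters
    # where sys.frozen does not exist at all.
    return getattr(sys, "frozen", False)

if __name__ == "__main__":
    if multiprocessing is not None and frozen():
        multiprocessing.freeze_support()
```

Unlike `hasattr(sys, 'frozen')`, the `getattr(sys, 'frozen', False)` form also handles tools that set `sys.frozen = False` explicitly.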

@woutdenolf
Collaborator

Same comment as I have for argparse: let's try to have a common CLI base and re-use it everywhere. #1183 (comment)

@sergey-yaroslavtsev
Collaborator Author

sergey-yaroslavtsev commented Feb 12, 2026

Without that the previous freezing pipeline will not work and there will be no fallback for easy testing.

I realized that if we need it to work, then HDF5Utils needs to be fixed anyway to have a workaround for MP, otherwise running the software will crash immediately...

Thus,

try:
    import multiprocessing
except Exception:
    pass

probably makes the whole fix make much less sense... Because if we do it this way, we are not allowed to directly import MP in other modules, making this fix much less useful for development, since one of the ideas was to avoid workarounds and be able to use MP directly.

I think we need to decide between two options:

  1. we directly import MP with freeze_support
    pros:
    a) easier further development - if we ever need a new module with MP
    b) the frozen version will work faster (the parts which can use MP will actually use it)
    c) it is kind of the standard solution for MP
    cons:
    a) it was stable without it - so we break the rule "it works - do not touch" - which could potentially lead to new bugs
    b) MP will become part of PyMca (I do not think it is a problem, but worth mentioning)

  2. we do not implement this fix and we have a workaround for every case which uses MP, as Armando suggested from the beginning.
    pros:
    a) it was stable before, so no new bugs will appear
    b) no changes to the PyMca core functionality are foreseen, so the chance of needing MP is low.
    cons:
    a) slower operation of the frozen version (in the parts which could utilize MP) - but there are just a few.
    b) if we ever need a new module with MP - it will require a workaround every time.
    c) we should be careful with third-party libraries

Options in between seem to combine the cons of both and do not make a lot of sense, unfortunately.
Hard choice.

If after this discussion you have a strong opinion, please let me know.

@vasole
Member

vasole commented Feb 12, 2026

Without that the previous freezing pipeline will not work and there will be no fallback for easy testing.

I realized that if we need it to work, then HDF5Utils needs to be fixed anyway to have a workaround for MP, otherwise running the software will crash immediately...

Thus,

try:
    import multiprocessing
except Exception:
    pass

That was one of the two files to fix. The other one is pyinstaller_github.spec

I think we need to decide between two options:

  1. we directly import MP with freeze_support
    pros:
    a) easier further development - if we ever need a new module with MP
    b) the frozen version will work faster (the parts which can use MP will actually use it)

MP does not add any speedup to PyMca. PyMca uses subprocess and not the multiprocess module. Whatever works standalone works with subprocess.

The multiprocess approach is the method @woutdenolf found to achieve robustness when trying to access HDF5 files while being written.

c) it is kind of the standard solution for MP
cons:
a) it was stable without it - so we break the rule "it works - do not touch" - which could potentially lead to new bugs
b) MP will become part of PyMca (I do not think it is a problem, but worth mentioning)

The fewer dependencies, the better. If I can manage without using a module, I prefer to avoid it. Then I do not have to care about whether the module works on some exotic platform or under unusual conditions.

  2. we do not implement this fix and we have a workaround for every case which uses MP, as Armando suggested from the beginning.
    pros:
    a) it was stable before, so no new bugs will appear
    b) no changes to the PyMca core functionality are foreseen, so the chance of needing MP is low.
    cons:
    a) slower operation of the frozen version (in the parts which could utilize MP) - but there are just a few.

See above. MP does not add any speed to PyMca. I would expect the opposite (spawning or forking a process just to access a file).

b) if we ever need a new module with MP - it will require a workaround every time.
c) we should be careful with third-party libraries

Options in between seem to combine the cons of both and do not make a lot of sense, unfortunately. Hard choice.

If after this discussion you have a strong opinion, please let me know.

I'm afraid you will have to decide. I expect @woutdenolf and myself to have different opinions on this.

@woutdenolf
Collaborator

Two options for me:

  • PyMca uses multiprocessing and any shell command (the main ones and the -m ones) uses multiprocessing.freeze_support(). We can have a common CLI main which implements all common CLI things: multiprocessing.freeze_support, ArgumentDefaultsHelpFormatter etc.

  • PyMca does not use multiprocessing so multiprocessing.freeze_support() is not needed.

I don't agree with anything bespoke.
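A rough sketch of what such a common CLI base could look like (all names here are invented for illustration; PyMca would define its own helper):

```python
import argparse
import multiprocessing

def make_parser(description):
    # Every command gets the same formatter, so defaults show up in --help.
    return argparse.ArgumentParser(
        description=description,
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )

def cli_main(run, description, argv=None):
    # Shared entry wrapper: freeze_support() first (it is a no-op when the
    # process is not a frozen/spawned child), then parse, then dispatch.
    multiprocessing.freeze_support()
    parser = make_parser(description)
    parser.add_argument("--verbose", action="store_true",
                        help="enable verbose output")
    args = parser.parse_args(argv)
    return run(args)
```

Each shell command would then be a one-line `if __name__ == "__main__": cli_main(...)`, so the freeze_support and argparse conventions live in one place.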

@woutdenolf
Collaborator

woutdenolf commented Feb 12, 2026

If you want to replace the multiprocessing approach with subprocess, I'm sure it can be done. However, you'll need to pickle over a pipe or a file.

LLM slop because I don't have time:

import pickle
import subprocess
import sys

def run_in_subprocess_popen(func, *args, default=None, **kwargs):
    # Pickle the callable and its arguments, pipe them to the worker
    # process, and read the pickled result back from its stdout.
    payload = (func, args, kwargs)

    proc = subprocess.Popen(
        [sys.executable, "worker.py"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )

    try:
        # Send pickled payload
        pickle.dump(payload, proc.stdin)
        proc.stdin.close()

        # Read result
        result = pickle.load(proc.stdout)

        proc.wait()

        if proc.returncode != 0:
            return default

        return result

    except Exception:
        proc.kill()
        return default


# worker.py (a separate file, spawned by the function above)
import pickle
import sys

def main():
    # Read the pickled (func, args, kwargs) from stdin, run the call,
    # and write the pickled result back on stdout.
    func, args, kwargs = pickle.load(sys.stdin.buffer)
    result = func(*args, **kwargs)
    pickle.dump(result, sys.stdout.buffer)
    sys.stdout.buffer.flush()

if __name__ == "__main__":
    main()
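One caveat worth noting with the pickle-over-a-pipe sketch above: pickling the callable itself only works for top-level, importable functions; lambdas and locally defined functions are not picklable, so the caller could not pass them to the worker:

```python
import pickle

def increment(x):
    # A top-level function: picklable by qualified name.
    return x + 1

# Round-tripping a top-level function works...
roundtripped = pickle.loads(pickle.dumps(increment))
result = roundtripped(1)

# ...but a lambda cannot be pickled at all.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_picklable = True
except Exception:
    lambda_picklable = False
```

This is also why the second sketch below switches to passing a function *name* and resolving it against a whitelist inside the worker.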

@woutdenolf
Collaborator

More LLM slop. Anyway, you get the idea: the point here is the data serialization to and from the subprocess, which is why I used multiprocessing - it already takes care of all that.

import pickle
import struct
import subprocess
import sys

def run_in_subprocess(func_name, *args, default=None, **kwargs):
    # Serialize the request as a length-prefixed pickle and hand it to a
    # worker module run with "-m"; read the length-prefixed reply back.
    payload = (func_name, args, kwargs)
    data = pickle.dumps(payload)

    proc = subprocess.Popen(
        [sys.executable, "-m", "yourpackage.worker"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )

    try:
        # Send length-prefixed pickle (prevents framing issues)
        proc.stdin.write(struct.pack("!I", len(data)))
        proc.stdin.write(data)
        proc.stdin.flush()
        proc.stdin.close()

        # Read response length
        raw_len = proc.stdout.read(4)
        if not raw_len:
            return default

        msg_len = struct.unpack("!I", raw_len)[0]
        result_data = proc.stdout.read(msg_len)

        result = pickle.loads(result_data)

        proc.wait()

        if proc.returncode != 0:
            return default

        return result

    except Exception:
        proc.kill()
        return default


# yourpackage/worker.py (run as "python -m yourpackage.worker")
import pickle
import struct
import sys
from yourpackage.module import get_hdf5_group_keys

# Whitelist of functions the worker is allowed to run, keyed by name.
FUNCTIONS = {
    "get_hdf5_group_keys": get_hdf5_group_keys,
}

def main():
    # Read length-prefixed pickle
    raw_len = sys.stdin.buffer.read(4)
    if not raw_len:
        return

    msg_len = struct.unpack("!I", raw_len)[0]
    data = sys.stdin.buffer.read(msg_len)

    func_name, args, kwargs = pickle.loads(data)

    result = FUNCTIONS[func_name](*args, **kwargs)

    result_data = pickle.dumps(result)

    sys.stdout.buffer.write(struct.pack("!I", len(result_data)))
    sys.stdout.buffer.write(result_data)
    sys.stdout.buffer.flush()

if __name__ == "__main__":
    main()

@vasole
Member

vasole commented Feb 12, 2026

@woutdenolf

I am more comfortable with PyMca not using multiprocessing. It is only used for the robust HDF5 access and it was never available in frozen binaries. I see it as adding a dependency that can make things crash or misbehave at application start for no gain.

Concerning the subprocess alternative: is it worth replacing multiprocess with subprocess just to have that functionality available in frozen binaries? My feeling is that it is overkill and that we would be taking the risk of replacing something that works at beamlines with something untested.

I really see the least risky approach as leaving things as they were prior to the last release. That guarantees things work at beamlines and with frozen binaries.

@sergey-yaroslavtsev
Collaborator Author

sergey-yaroslavtsev commented Feb 12, 2026

If we do something like this in HDF5Utils:

import sys
from queue import Empty

# subprocess_main and _logger are the existing helpers defined elsewhere in HDF5Utils

def run_in_subprocess(target, *args, context=None, default=None, **kwargs):
    try:
        import multiprocessing
        ctx = multiprocessing.get_context(context)
        queue = ctx.Queue(maxsize=1)
        p = ctx.Process(
            target=subprocess_main,
            args=(queue, target) + args,
            kwargs=kwargs,
        )
        p.start()
        try:
            p.join()
            try:
                return queue.get(block=False)
            except Empty:
                return default
        finally:
            try:
                p.kill()
            except AttributeError:
                p.terminate()
    except Exception:
        if getattr(sys, 'frozen', False):
            _logger.debug("Frozen executable. Using standard approach")
        else:
            _logger.warning("Multiprocessing is not available. Using standard approach")
        try:
            return target(*args, **kwargs)
        except Exception:
            return default

and exclude MP from the frozen binaries, it works, but one particular test will fail for the frozen binaries: HDF5UtilsTest.py::testHDF5Utils::testSegFault

Of course, this test could be skipped in the frozen binary tests.
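If that route were taken, the skip itself is small. A sketch (the class and method names mirror the test mentioned above, but the body here is a placeholder, not the real test):

```python
import sys
import unittest

class testHDF5Utils(unittest.TestCase):
    # Skip when running from a frozen binary built without multiprocessing.
    @unittest.skipIf(getattr(sys, "frozen", False),
                     "multiprocessing excluded from frozen binaries")
    def testSegFault(self):
        self.assertTrue(True)  # placeholder for the real segfault check

# Run the case programmatically to show the skip logic in action.
result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(testHDF5Utils).run(result)
```

In a normal (non-frozen) interpreter the test runs; in a frozen binary it would be reported as skipped rather than failed.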

@sergey-yaroslavtsev
Collaborator Author

If I understood correctly, that was @vasole's idea:

My suggestion would be to lazy import multiprocessing where it is needed or not import it at all in frozen versions.

@sergey-yaroslavtsev
Collaborator Author

Concerning the subprocess alternative. Is it worth to replace multiprocess by subprocess just to have that functionality available for frozen binaries? My feeling is that it is overkill and that we are taking the risk of replacing something that works at beamlines by something untested.

It would require importing pickle instead of multiprocessing - yes, pickle does not require something like freeze_support, but it can also create problems - I have not used it that way myself (I have only used multiprocessing with freeze_support).

@woutdenolf
Collaborator

I'm not going to die on this hill. I gave my opinion and you are free not to follow it.

@vasole
Member

vasole commented Feb 13, 2026

@sergey-yaroslavtsev

Can you please check whether the frozen binaries generated after my modifications still work?

Member

@vasole vasole left a comment


Successfully tested on MacOS BigSur

@vasole vasole merged commit ed8739d into master Feb 13, 2026
43 checks passed
@vasole vasole deleted the macos_fix branch February 13, 2026 10:47
@sergey-yaroslavtsev
Collaborator Author

sergey-yaroslavtsev commented Feb 13, 2026

MP was not excluded in pyinstaller_github.spec and cx_setup_github, thus the current frozen binaries still include MP.
pyinstaller_github.spec lines 81–82
cx_setup_github.py lines 69–70

@vasole Is it intended (just to be sure)?

To be noted:
If one excludes them and freezes the binaries, it should probably work (the same way as #1188, but better to test it explicitly), but the test HDF5UtilsTest : testSegFault will probably fail (also as in #1188).
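For reference, excluding the module would look roughly like this in the two build configurations (the `excludes` option exists in both PyInstaller and cx_Freeze; the surrounding spec content is elided here, so treat this as a sketch, not the actual files):

```python
# pyinstaller_github.spec - sketch; in the real file this is one argument
# of the Analysis(...) call:
#     excludes=["multiprocessing"],

# cx_setup_github.py - sketch of the cx_Freeze build_exe options dict:
build_exe_options = {
    "excludes": ["multiprocessing"],
}
```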

@vasole
Member

vasole commented Feb 16, 2026

@vasole Is it intended (just to be sure)?

Yes, it is.



Development

Successfully merging this pull request may close these issues.

[Packaging] MacOS 5.9.5 frozen application does not read HDF5 read by 5.9.4

3 participants