-
-
Notifications
You must be signed in to change notification settings - Fork 3k
Ensure subtest's context kwargs are JSON serializable #13963
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Ensure subtest's context kwargs are JSON serializable #13963
Conversation
Convert all the values of `SubtestContext.kwargs` to strings using `saferepr`. This complies with the requirement that the returned dict from `pytest_report_to_serializable` is serializable to JSON, at the cost of losing type information for objects that are natively supported by JSON. Fixes pytest-dev/pytest-xdist#1273
a3cf834 to
7585f45
Compare
bluetech
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of comments for your consideration.
src/_pytest/subtests.py
Outdated
| return dataclasses.asdict(self) | ||
| result = dataclasses.asdict(self) | ||
| # Brute-force the returned kwargs dict to be JSON serializable (pytest-dev/pytest-xdist#1273). | ||
| result["kwargs"] = {k: saferepr(v) for (k, v) in result["kwargs"].items()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we should strive to have the property that from_json(to_json(report)) == report. In this case it means that we should have self.kwargs itself should be serializable, not just in to_json.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we should strive to have the property that from_json(to_json(report)) == report.
I agree this is desirable, but I don't see how we can accomplish this in a general manner, for any type of object.
In unittest subtests, kwargs is permitted to contain any type of object, because it is not serialized to anything and used only for reporting.
I don't see how we can support that in functions that serialize to JSON.
Perhaps I'm missing something, could you elaborate a bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I'm thinking is that kwargs itself contain strings, not the objects themselves, that would of course make it serializable. It is not exposed in the API so should be fine? Unless the original objects are needed for some reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well once they are serialized and come back unserialized on the other side, unfortunately they might not be useful depending on the intent. For reporting this should be fine, but if a plugin is using the serialization hooks and expecting the actual objects to be returned this would be a problem.
We can perhaps try to keep the concrete types for objects we know are safe (int, float, etc.), but for other objects I don't see another solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can perhaps try to keep the concrete types for objects we know are safe (int, float, etc.)
Implemented that, take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I defer to your judgment on the representation.
However, I still think that from_json(to_json(report)) == report is important.
Well once they are serialized and come back unserialized on the other side, unfortunately they might not be useful depending on the intent. For reporting this should be fine, but if a plugin is using the serialization hooks and expecting the actual objects to be returned this would be a problem.
If the report itself keeps the kwargs as dict[str, json-serializable] (i.e. performs the conversion in init, not in to_serializable), then the property holds. Is there a need to keep the "raw" objects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the report itself keeps the kwargs as dict[str, json-serializable] (i.e. performs the conversion in init, not in to_serializable), then the property holds. Is there a need to keep the "raw" objects?
I think we should keep the original objects, because doesn't seem right to me for us to arbitrarily convert the values to something else during __init__ just on the off-chance to_serializable might be called eventually. We lose data this way. Nothing prevents a custom plugin to expect the report objects to be intact in their own pytest_runtest_logreport implementation.
But something occurred to me: there's nothing in pytest_report_to_serializable that says the data returned needs to be JSON-native, only JSON-compatible. This means we can convert the data in the report in whenever format we want, as long as the resulting data can be converted into JSON.
With this in mind, I think we can simply use pickle to convert the kwargs into a str, which we can then easily deserialize during pytest_report_from_serializable. This will hold the from_serializable(to_serializable(report)) == report property, and should support most data types. pickle has its problems, but is the standard way to serialize/deserialize data in Python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, please take a look.
One downside that did not occur to me is that local objects (such as MyEnum used in the test) normally cannot be pickled. I don't think this will be a problem in production, however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We lose data this way. Nothing prevents a custom plugin to expect the report objects to be intact in their own pytest_runtest_logreport implementation.
I somewhat think that an arbitrary Python value would be harder for a plugin to handle than a JSON-native value. AFAIK the main/intended use case for the pytest_runtest_logreport hook is to report the outcome to some other place (terminal, junitxml, etc.) and so what the plugin mostly cares about is the nodeid and outcome and metadata. In case of subtests, I'm not sure how many plugins would want to do something special with the kwargs formatting beyond repring it.
Regarding pickle, I never really used it, but I assumed it wasn't used in pytest for a reason. E.g. execnet doesn't use it for serialization, and pytest also went with json-like for report serialization. Maybe because it's Python specific, or binary, or has multiple versions, or not secure? For this reason my vague sense is that we should avoid it but I don't have a concrete argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I somewhat think that an arbitrary Python value would be harder for a plugin to handle than a JSON-native value.
Plugins are supposed to be "looking" at the value returned by pytest_report_to_serializable, the value is only meant to be JSON serializable so plugins have some guidance on how it is supposed to be implemented. Ultimately, plugins are expected to pass the returned value over to pytest_report_from_serializable and then use the returned object.
AFAIK the main/intended use case for the pytest_runtest_logreport hook is to report the outcome to some other place (terminal, junitxml, etc.) and so what the plugin mostly cares about is the nodeid and outcome and metadata. In case of subtests, I'm not sure how many plugins would want to do something special with the kwargs formatting beyond repring it.
Perhaps, but I wouldn't just assume that: there is nothing in the docs or the design that says that we will convert the kwargs using saferepr (and lose the original information). I can envision a plugin saving that data to some other logging system, and attempting to recover it later using pytest_report_from_serializable. If we convert things using saferepr from under it we lose the original metadata.
Regarding pickle, I never really used it, but I assumed it wasn't used in pytest for a reason. E.g. execnet doesn't use it for serialization, and pytest also went with json-like for report serialization. Maybe because it's Python specific, or binary, or has multiple versions, or not secure? For this reason my vague sense is that we should avoid it but I don't have a concrete argument.
The reasons are:
- It is Python specific; but here we are ultimately implementing hooks in Python, so I don't see this as an issue.
- It has multiple versions, and does not play well with Python 2 -> 3 objects; I believe this was the main motivator for execnet using a custom system.
- It is not secure, because it can execute arbitrary code (like
__init__of some classes if they are pickled,__setstate__, etc). But ultimately we are executing custom Python code anyway directly (the hooks), so this point is moot (we are already executing arbitrary code).
I think we are running in circles here:
- We want
from_json(to_json(kwargs)) == kwargsand I agree with that. - You seem to suggest to use
safereprwhen creating* the SubTestReport, which I disagree with: we shouldn't be changing the data like that just because eventually we might call a separate hook for serialization.
Perhaps @RonnyPfannschmidt @Pierre-Sassoulas @The-Compiler want to chime in and have other suggestions or share their opinions on what a correct approach would look like?
53f50a5 to
f8da2cb
Compare
f8da2cb to
fdfa22f
Compare
Convert all the values of
SubtestContext.kwargsto strings usingsaferepr.This complies with the requirement that the returned dict from
pytest_report_to_serializableis serializable to JSON, at the cost of losing type information for objects that are natively supported by JSON.Fixes pytest-dev/pytest-xdist#1273