layers: Do not remove substates on Destroy #11501
Conversation
Instead, remove substates when the object is actually destructed (in the destructor). This matches the pre-substate behavior, where a state object could retain some data after Destroy (for example, syncval uses this for error messages). This assumes that a shared_ptr reference keeps the object alive. In the post-substate version, Destroy removes the registered substates, so the above scenario no longer works. In the case of syncval, this results in losing resource handle information during submit-time validation. Originally, the syncval command buffer state object stored the list of handles, and QueueBatchContext held a shared_ptr to the submitted command buffers.
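The lifecycle change can be sketched as follows. This is a minimal illustration with hypothetical type names (`CommandBufferState`, `SyncValSubState`), not the actual Vulkan-ValidationLayers classes: `Destroy()` marks the substate destroyed but keeps it registered, and only the destructor releases it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a syncval substate that records resource
// handles used in error messages.
struct SyncValSubState {
    std::vector<std::string> handles;
    bool destroyed = false;
    void Destroy() { destroyed = true; }
};

// Hypothetical stand-in for a command buffer state object.
struct CommandBufferState {
    std::shared_ptr<SyncValSubState> sub_state = std::make_shared<SyncValSubState>();

    // Called when the handle is destroyed (vkDestroyCommandBuffer):
    // marks the substate destroyed but does NOT remove it.
    void Destroy() { sub_state->Destroy(); }

    // The substate is released only when the last shared_ptr owner
    // (e.g. a queue batch holding submitted command buffers) drops
    // the state object.
    ~CommandBufferState() { sub_state.reset(); }
};
```

A QueueBatchContext-like holder keeping a `shared_ptr<CommandBufferState>` can therefore still read `sub_state->handles` after `Destroy()`, which is what submit-time validation relies on.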
CI Vulkan-ValidationLayers build queued with queue ID 624744.
Still trying to come up with a test that reproduces the issue, but it's not that easy... (tested directly on the app)
CI Vulkan-ValidationLayers build # 22191 running.
CI Vulkan-ValidationLayers build # 22191 passed.
I definitely want to reproduce this with a test. I understand that this error happens due to non-synchronized writes from two queues (at least that's how VVL detects it), but the crash happens because the command buffer is deleted (the deletion looks correct, no in-use error). In order to delete the command buffer you have to wait for it, and waiting for the command buffer prevents the hazard. I have a reproducible gfxr capture, so I can figure this out sooner or later.
```cpp
for (auto &item : sub_states_) {
    item.second->Destroy();
}
sub_states_.clear();
```
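For context, a sketch of where this cleanup moves under the PR, using hypothetical names (`StateObject`, `SubState`) rather than the real VVL types: the loop stays in `Destroy()`, but the `clear()` moves to the destructor so substates outlive `Destroy()` without outliving the object.

```cpp
#include <map>
#include <memory>

// Hypothetical substate; only illustrates where the cleanup loop moves.
struct SubState {
    bool destroyed = false;
    void Destroy() { destroyed = true; }
};

class StateObject {
  public:
    // Before this PR, Destroy() also cleared sub_states_, dropping
    // substate data while shared_ptr holders were still alive.
    void Destroy() {
        for (auto &item : sub_states_) {
            item.second->Destroy();
        }
        // sub_states_.clear();  // removed: deferred to the destructor
    }

    // After this PR: substates are removed only on actual destruction.
    ~StateObject() { sub_states_.clear(); }

    std::map<int, std::shared_ptr<SubState>> sub_states_;
};
```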
What about in Pipeline::Destroy()?
Please clarify
So currently we have a use case with command buffers; whether other state objects need this probably depends on their logic (command buffers are referenced via shared_ptr and used after deletion). This results in a hard crash, so if other objects have such scenarios it will be hard to miss.
Why do we now not clear here, but only for pipeline?
I can't tell if this is an issue because command buffers (and queues) are the only state we have that we don't derive from CoreChecks.
Is it because these are dispatchable handles?
If we are clearing it in Pipeline but not here, I'm curious what the "rule" to follow is.
Ideally we'd have logic that doesn't need this, but syncval relied on it before, so this just restores the old behavior (it's a regression from the substates change).
> Why do we now not clear here, but only for pipeline?

Ah, I see, indeed. Not sure why we cleared only for command buffers and pipelines and not for the others.
Reproduced the original crash with a test. Now I need to figure out whether the scenario from the test is valid behavior (even after this fix). If it is valid, then we also need to fix an additional false-positive issue (can be a separate PR).
This is from a different example. The interesting part here is that my acquire barrier uses AllCommands + MemoryRead. I can imagine this could be specified somewhere as invalid, because I'm going to be writing to the image even though it hasn't been made ready for writing yet.
@MennoVink I found the root cause of the issue; it's in syncval internals (we also have an internal test that reproduces the same behavior as the app). Now we need to fix one part of the validation. The crash itself is not the root cause (e.g. it's not that there is a missing null check at that point). If the false positive is fixed, the crash will go away and there will be no false-positive error message either. I'm trying to come up with a solution this week. P.S. This isn't really about QFOT; it's mostly about the internal mechanism for tracking accesses from different queues in general.
Closes #11490