Contributor

@devreal devreal commented Nov 8, 2024

Add a function `parsec_data_discard` that releases the data such that the host copy remains intact but does not prevent destruction of the data once all device copies have been released. This keeps the host copy available for device copies to inspect and avoids potential race conditions in the release process. During an eviction, copies of data with a discarded host copy are not transferred but put directly into the LRU.

Replaces some duplicated code with a call to parsec_device_release_gpu_copy.
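
To illustrate the intended semantics, here is a minimal toy model, not the PaRSEC implementation. All `toy_*` names and `TOY_FLAG_DISCARDED` are hypothetical stand-ins; only the idea (the discard flag on the host copy, destruction deferred until the last device copy is released, no transfer on eviction) comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PARSEC_DATA_FLAG_DISCARDED. */
#define TOY_FLAG_DISCARDED 0x1u

/* Toy data object: one host copy plus a count of attached device copies. */
typedef struct {
    uint32_t host_flags;     /* flags of the host copy */
    int      device_copies;  /* device copies still attached */
    bool     destroyed;      /* set once the data has been destroyed */
} toy_data_t;

/* Destroy the data only if it was discarded and no device copy remains. */
static void toy_maybe_destroy(toy_data_t *d)
{
    if ((d->host_flags & TOY_FLAG_DISCARDED) && d->device_copies == 0) {
        d->destroyed = true;
    }
}

/* Mark the host copy as discarded; it stays readable for device copies. */
void toy_data_discard(toy_data_t *d)
{
    d->host_flags |= TOY_FLAG_DISCARDED;
    toy_maybe_destroy(d);
}

/* Release one device copy; the last release triggers destruction. */
void toy_device_copy_release(toy_data_t *d)
{
    assert(d->device_copies > 0);
    d->device_copies--;
    toy_maybe_destroy(d);
}

/* During eviction, a discarded host copy means no transfer back to host. */
bool toy_evict_needs_transfer(const toy_data_t *d)
{
    return (d->host_flags & TOY_FLAG_DISCARDED) == 0;
}
```

In this model, discarding data that still has two device copies leaves it alive until both copies are released, and an eviction in between skips the transfer.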

@devreal devreal requested a review from a team as a code owner November 8, 2024 19:27
@devreal devreal force-pushed the parsec-data-discard branch 2 times, most recently from bbd1448 to 1dd2d54 Compare November 8, 2024 19:59
item = (parsec_list_item_t*)item->list_next; /* conversion needed for volatile */
if( 0 == gpu_copy->readers ) {
if (cpu_copy->flags & PARSEC_DATA_FLAG_DISCARDED) {
parsec_list_item_ring_chop((parsec_list_item_t*)gpu_copy);
Contributor

And here.

Contributor Author

Same as above. The flag isn't changed and the device copy maintains a reference on the data_t so if we miss the update of the flags here we will evict into the host copy and then release everything.

@devreal devreal force-pushed the parsec-data-discard branch 2 times, most recently from dde7129 to 14934a5 Compare November 20, 2024 23:14
Add a function `parsec_data_discard` that releases the data
such that the host copy remains intact but does not prevent
destruction of the data once all device copies have been released.
This keeps the host copy available for device copies to inspect
and avoids potential race conditions in the release process.
During an eviction, copies of data with a discarded host copy
are not transferred but put directly into the LRU.

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Otherwise we cannot destroy empty or discarded data.

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Also OR the flag instead of assigning it.

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
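
A small sketch of why OR-ing matters, assuming the flags field holds several independent bits (the flag names and values here are hypothetical): assigning would wipe any bit that was already set, while OR-ing only adds the discard bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define TOY_FLAG_OWNED     0x1u
#define TOY_FLAG_DISCARDED 0x2u

/* Set the discard bit while preserving all other bits. */
void toy_mark_discarded(uint32_t *flags)
{
    assert(flags != NULL);
    *flags |= TOY_FLAG_DISCARDED;  /* not: *flags = TOY_FLAG_DISCARDED; */
}
```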
Discarded data may never be pushed back so don't warn about it
still being owned by the device.

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Discarded data sit toward the end of the LRU while the data
to be evicted is at the front. We walk both forward and backward
to collect the discarded data from the back, until we either meet the
pivot or have found enough data to evict. If we found discarded data,
we don't evict.

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
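
The two-ended walk described in this commit message could look roughly like the sketch below. It is a simplified model over a plain array rather than the real list type, and every name in it (`toy_copy_t`, `toy_reclaim`, the fields) is hypothetical. It assumes that a discarded copy met by either cursor is released, that non-discarded copies met by the forward cursor are eviction candidates, and that the cursors meeting plays the role of the pivot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy LRU entry: index 0 is the front (oldest), n-1 is the back. */
typedef struct {
    size_t bytes;
    bool   discarded;  /* host copy was discarded */
    bool   released;   /* set when this sketch releases the copy */
} toy_copy_t;

typedef struct {
    size_t freed_bytes;  /* reclaimed from discarded copies, no transfer */
    size_t evict_bytes;  /* selected for eviction; 0 if anything was freed */
} toy_reclaim_t;

toy_reclaim_t toy_reclaim(toy_copy_t *lru, size_t n, size_t needed)
{
    toy_reclaim_t r = { 0, 0 };
    size_t evictable = 0;
    size_t front = 0, back = n;

    assert(lru != NULL || n == 0);
    while (front < back && r.freed_bytes < needed && evictable < needed) {
        /* Forward step: gather eviction candidates from the front. */
        toy_copy_t *f = &lru[front++];
        if (f->discarded) {
            f->released = true;
            r.freed_bytes += f->bytes;
        } else {
            evictable += f->bytes;
        }
        if (front >= back) {
            break;  /* the two cursors met */
        }
        /* Backward step: collect discarded copies from the back. */
        toy_copy_t *b = &lru[--back];
        if (b->discarded) {
            b->released = true;
            r.freed_bytes += b->bytes;
        }
    }
    /* If any discarded data was reclaimed, skip the eviction. */
    r.evict_bytes = (r.freed_bytes > 0) ? 0 : evictable;
    return r;
}
```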
Contributor

@therault therault left a comment

Discussed during the call: the issues raised are addressed, and this is performance- and feature-critical for MRA, so we should merge.

@abouteiller
Contributor

will review that it doesn't break dplasma and merge

@devreal devreal force-pushed the parsec-data-discard branch from be66039 to 4f6b4c8 Compare February 14, 2025 21:23
@devreal
Contributor Author

devreal commented Feb 14, 2025

I modified this PR so that `parsec_data_discard` notifies the device(s) about discarded data. If the device finds that it has discarded data, it will try to release that data. Otherwise we don't pay the cost of iterating through the LRU. This simplified the w2r task creation back to something sane(r) I had earlier.

We only try to find discarded data if we know that there is discarded data.
If no one discarded data (e.g., DPLASMA) we don't go look for it.
This is also needed to properly clean up discarded data before releasing
the zone allocator.

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
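
One way to picture the "only look if someone discarded" idea is a per-device counter that the discard path bumps and the device drains. This is a sketch under that assumption, not the code in this PR; `toy_device_t`, its fields, and the functions below are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy device: a counter of pending discards plus a scan statistic. */
typedef struct {
    atomic_int pending_discards;  /* bumped by the discard path */
    size_t     lru_scans;         /* how often the LRU was actually walked */
} toy_device_t;

void toy_device_init(toy_device_t *dev)
{
    assert(dev != NULL);
    atomic_init(&dev->pending_discards, 0);
    dev->lru_scans = 0;
}

/* Called from the discard path to tell the device it holds discarded data. */
void toy_notify_discard(toy_device_t *dev)
{
    atomic_fetch_add(&dev->pending_discards, 1);
}

/* Called by the device: walk the LRU only if a discard was reported.
 * Returns the number of notifications consumed. */
int toy_release_discarded(toy_device_t *dev)
{
    int pending = atomic_exchange(&dev->pending_discards, 0);
    if (pending == 0) {
        return 0;  /* fast path: nobody discarded anything, no LRU walk */
    }
    dev->lru_scans++;  /* stands in for the actual LRU traversal */
    return pending;
}
```

An application that never discards data (the DPLASMA case mentioned above) would always take the fast path in this model. The same drain could be run once more at shutdown, before the allocator is released.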
@devreal devreal force-pushed the parsec-data-discard branch from 4f6b4c8 to 9c7b42b Compare February 14, 2025 22:53
#endif
};

static inline void release_discarded_data(parsec_device_gpu_module_t *gpu_device, parsec_gpu_data_copy_t* gpu_copy)
Contributor

compiler says this function is not used.

@devreal
Contributor Author

devreal commented Feb 25, 2025

This may not be needed anymore if we get parsec_data_release_self_contained_data from #671: https://github.com/ICLDisco/parsec/pull/671/files#diff-b76b62ea20f19d97740a2221cabf57210f9f97991c0ef635ca0cafcc4c3c40d1R596
