Skip to content

Conversation

@ormsbee
Copy link
Contributor

@ormsbee ormsbee commented Aug 25, 2025

Status: Data models and fundamental logic are implemented. I need to clean up a lot and add tests and some admin debug screens. I also need to do the data migration for existing containers data.

This PR introduces a publishing app concept of dependency tracking, as well as adding a PublishingSideEffect analog of the existing DraftSideEffect. The high-level goal is to be able to efficiently do operations like, "publish this entity and all its children" or answer queries like "when was the last time the published state of this entity changed (including its children)?".

The publish_from_drafts function has been modified to publish all dependencies.

Follow-on to #328, broadly implements what was discussed in #317, though I found it simpler to model all dependencies at the PublishableEntity level, rather than splitting them out into separate Draft and Published dependencies. That being said, the dependencies_hash_digest must be computed separately, since the published and draft versions of any given dependency can be different at any given time.

@ormsbee ormsbee marked this pull request as draft August 25, 2025 22:42
@ormsbee ormsbee changed the title Simpler side effects Publishing dependency tracking and side-effect calculation for Draft and Published Aug 25, 2025
@ormsbee ormsbee changed the title Publishing dependency tracking and side-effect calculation for Draft and Published Publishing dependency tracking and side-effect calculation + dependency state summary for published and draft Aug 27, 2025
@ormsbee ormsbee force-pushed the simpler-side-effects branch from 72688d7 to 51511f6 Compare August 28, 2025 15:49
@ormsbee
Copy link
Contributor Author

ormsbee commented Sep 3, 2025

@bradenmacdonald, @kdmccormick: Based on your feedback today, I reworked the admin page entries to look like this:

image

It's not quite obvious, but the version information for the draft is italicized if it's different from the published version + dependency state.

@ormsbee ormsbee force-pushed the simpler-side-effects branch from 0b452cf to 9323705 Compare September 3, 2025 14:41
Create the PublishSideEffect model as the publishing analog for
the DraftSideEffect model, and create these side effects for
parent containers when one of the children is published. This
allows us to efficiently query when any subtree of content was
affected by a publish.

Also introduces PublishableEntityVersionDependency as a way of
tracking unpinned dependencies of a version, e.g. the children
of a version of a Unit.

Also introduces a new dependencies_hash_digest field to Draft
and Published models to track dependency state.

This PR adds Django admin functionality for this new data, as
well as optimizing some existing calls that used to have to do
more complex container -> entity list traversal.
@ormsbee ormsbee force-pushed the simpler-side-effects branch from 9323705 to 2e9cc4c Compare September 3, 2025 14:42
@bradenmacdonald
Copy link
Contributor

@ormsbee Cool, I think that's a bit more useful and clear.

@ormsbee ormsbee marked this pull request as ready for review September 3, 2025 17:20
@ormsbee
Copy link
Contributor Author

ormsbee commented Sep 3, 2025

@kdmccormick, @bradenmacdonald: I still have to write more tests, but I need to put this down for a day or two to attend to some long-overdue reviews. The code should be in a reviewable state, minus some union-related type annotation stuff that I have to look into later. Please review if you have time.

@ormsbee
Copy link
Contributor Author

ormsbee commented Sep 11, 2025

Annoying late night thought: It sounds like we may at some point elevate the state summary hash to something that external clients would use, e.g. for history comparison or caching purposes. But it's not stored in the history, since it's calculated on the Draft and Published tables. But it doesn't have to be--since any change to a Draft or a Published (even side-effects) has to be represented in the change logs, that means that we could put those hash summaries on the DraftChangeLogRecord and PublishChangeLogRecord instead. That way, you could see where a particular state was in the history of that thing, and what caused it to become that way.

@kdmccormick
Copy link
Member

@ormsbee I think it's a great idea

@kdmccormick
Copy link
Member

Before locking ourselves into UUIDs, I'm still curious if there are any other ways to generate a dependency hash.

A bad idea (just writing it down for posterity)

What about generating the hash from the intrinsic data of the PE, so that a hash fully captures that state of some entity in a package? For example, in pseudocode:

def hash(pev) = combine_hashes(
    hash(pev.publishable_entity.key),
    hash(pev.title),
    *pev.component_version.contents.hashes if pev.component_version,
    *hash(child for child in pev.container_version.entity_list.children) if pev.container,
)

I think this would work OK with libraries as they are today. But, they would not capture any data that is hung off of the LC models in the future. It would break down as soon as someone, for example, made a custom Container subclass with some critical metadata on the ContainerVersion subclass... updating that metadata would fail to modify the dependency hash.

Maybe this could be resolved by allowing PE subclasses to register custom hashing functions? Probably not worth it.

A half-baked idea

What about just hashing just the PEV by its key and version number? We already guarantee that (key, version number) uniquely identifies a PEV in a package, so it should be sufficient input for a hash whose job is to capture whether any changes have happened to PE's dependencies (which are in that same package).

(key, version_num) only breaks down across Open edX instances, but I don't think we need to worry about that for the dependency hash, right?

@ormsbee
Copy link
Contributor Author

ormsbee commented Sep 11, 2025

What about generating the hash from the intrinsic data of the PE, so that a hash fully captures that state of some entity in a package?

I know you state this is a bad idea, but I do want to explicitly highlight the tradeoffs we're making here for other people to read later, because in many ways this sort of hashing would be the gold standard for how to do this. As you point out, it would require anyone extending the data models to properly calculate a hash for their extended data. Not only would we need to create API hooks for that, it's asking a lot of people to get that hashing right.

Instead of this, both the UUID approach in this PR and the (identifier + version num) approach suggested below try to reduce the burden on developers by taking the shortcut that any PublishableEntityVersion and all the models that may hang off of it are intended to be immutable. Therefore, whatever the actual content is, we can use any shorthand identifier for it, so long as those identifiers won't conflict with each other. This means that we aren't quite as accurate as hashing in the intrinsic content. If we add a piece of text to a problem in v2 and remove it again in v3, there is nothing in the system that marks v1 and v3 as being equivalent to each other. In exchange for this loss of accuracy, we get a simpler API and fast, predictable hash generation.

Going to reply to the (key, version) comment in a bit...

@kdmccormick kdmccormick self-requested a review October 24, 2025 14:40
Copy link
Member

@kdmccormick kdmccormick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still reviewing, just a couple quick thoughts so far.

@bradenmacdonald
Copy link
Contributor

What's left to do before we can merge this PR? We need the full descendant publishing to fix some libraries bugs - see #421

@ormsbee
Copy link
Contributor Author

ormsbee commented Oct 28, 2025

@bradenmacdonald: I did a major shift over of where I'm storing the hashes (onto the log records), but I'm still chasing down some transaction related bugs with some very unhelpful traces.

@ormsbee
Copy link
Contributor Author

ormsbee commented Oct 30, 2025

Addressed all comments so far.

Copy link
Member

@kdmccormick kdmccormick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two more minor comments. the models all look good to me 👍🏻

currently reviewing the publishing-cascade, side-effect-calculation, and backfill code...

Comment on lines +21 to +31
@cached_property
def rows(self):
"""
Convenience method to iterate rows.
I'd normally make this the reverse lookup name for the EntityListRow ->
EntityList foreign key relation, but we already have references to
entitylistrow_set in various places, and I thought this would be better
than breaking compatibility.
"""
return self.entitylistrow_set.order_by("order_num")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] the only external references to entitylistrow_set are in a single test file in edx-platform. Since we're still <v1.0, imo now is the time to make these little breaking change that make the API nicer.

Comment on lines 76 to 78
it almost always will), then publishing one of those Components will alter
the published state of the Unit, even if the UnitVersion does not change. In
that case, we still consider the Unit to have been "published".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
it almost always will), then publishing one of those Components will alter
the published state of the Unit, even if the UnitVersion does not change. In
that case, we still consider the Unit to have been "published".
it almost always will), then publishing one of those Components will alter
the published state of the Unit, even if the UnitVersion does not change.

[optional] IMO what we "consider to have been have been published" is a fuzzy UX-level concept that is subject to change as the product evolves. I suggest focusing on what's literally happening in the data model, and let the UX decide what it means for something to "have been published" from a user's perspective.

@kdmccormick kdmccormick self-requested a review October 31, 2025 15:10
Copy link
Member

@kdmccormick kdmccormick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(still reviewing)

]

# These are the Published or Draft objects where we need to repoint the
# log_record (publish_log_record or draft_change_log) to point to the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# log_record (publish_log_record or draft_change_log) to point to the
# log_record (publish_log_record or draft_change_log_record) to point to the

branch_objs_to_update_with_side_effects = []

while changes_and_affected:
original_change, affected = changes_and_affected.pop()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename this inner variable to something like change or cause, because:

  • If we take "original change" to mean "a change coming directly from the change_log" (as is implied by affected_by_original_change), then I think anything at the 2nd layer of affect or lower is no longer "original".
  • It's shadowing the outer loop's original_change, which is confusing. I don't think it makes the code any more complicated for the inner loop (the one that descends the layers) to have its own variable.

Copy link
Member

@kdmccormick kdmccormick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked at most of the code. As always, thanks for the careful docs, they've helped a lot. I'd like to give it at least one more pass early next week.

Comment on lines 860 to 863
# If the Draft or Published that is affected by this change is not
# already in the change_log, then we have to add it.
affected_version_pk = affected.version_id

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# If the Draft or Published that is affected by this change is not
# already in the change_log, then we have to add it.
affected_version_pk = affected.version_id

I think this comment is already covered (better) by your comment right above defaults={ and the extra variable here adds cognitive overhead-- I would just inline affected.version_id.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I think this was some left-over from a previous iteration where I was doing something weirder.

Comment on lines 883 to 884
'old_version_id': affected_version_pk,
'new_version_id': affected_version_pk
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'old_version_id': affected_version_pk,
'new_version_id': affected_version_pk
'old_version_id': affected.version_id,
'new_version_id': affected.version_id,

(this goes with my previous suggestion)

Comment on lines 464 to 465
By default, this will also publish all dependencies (e.g. children) of the
Drafts that are passed in.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
By default, this will also publish all dependencies (e.g. children) of the
Drafts that are passed in.
By default, this will also publish all dependencies (e.g. unpinned children) of the
Drafts that are passed in.

Took me a minute to realize why this function wasn't specially handling pinned children.

@ormsbee
Copy link
Contributor Author

ormsbee commented Nov 3, 2025

@kdmccormick: Addressed all comments except the EntityList.entitylistrow refactoring, which I made a separate ticket for.

Co-authored-by: Kyle McCormick <kyle@kylemccormick.me>
Copy link
Member

@kdmccormick kdmccormick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I tested the backfill, tested that publish side affects percolate upwards, and tested the reverse migration.

Notes for future work, none of it critical for Ulmo AFAICT:

  • We should get rip out any code which checked for a container's unpublished changes the old way (walking down the tree), if there is any of that.
  • I think it could be good to add more cross-links in the admin interface, in support of devs and operators who are browsing around and trying to understand how the log changes and entities are related to one another. I didn't have the time to enumerate these places.
  • As we discussed earlier, I think a lot of this code could become more readable if common abstract mixins were factored out from DraftChange{Log,LogRecord,SideEffect} and Publish{Log,LogRecord,SideEffect}.

@ormsbee ormsbee merged commit 72d082c into openedx:main Nov 7, 2025
11 checks passed
@ormsbee ormsbee deleted the simpler-side-effects branch November 7, 2025 00:59
kdmccormick added a commit to openedx/edx-platform that referenced this pull request Nov 7, 2025
This pulls in publishing dependency changes from:
openedx/openedx-learning#369

This fixes a bug where publishing a Content Library v2 container would
publish only its direct children instead of publishing all ancestors.

Co-authored-by: Kyle McCormick <kyle@axim.org>
kdmccormick added a commit to ormsbee/edx-platform that referenced this pull request Nov 7, 2025
This pulls in publishing dependency changes from:
openedx/openedx-learning#369

This fixes a bug where publishing a Content Library v2 container would
publish only its direct children instead of publishing all ancestors.

Backports: 190a8b8

Co-authored-by: Kyle McCormick <kyle@axim.org>
kdmccormick added a commit to openedx/edx-platform that referenced this pull request Nov 7, 2025
This pulls in publishing dependency changes from:
openedx/openedx-learning#369

This fixes a bug where publishing a Content Library v2 container would
publish only its direct children instead of publishing all ancestors.

Backports: 190a8b8

Co-authored-by: Kyle McCormick <kyle@axim.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants