-
Notifications
You must be signed in to change notification settings - Fork 17
Publishing dependency tracking and side-effect calculation + dependency state summary for published and draft #369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
72688d7 to
51511f6
Compare
|
@bradenmacdonald, @kdmccormick: Based on your feedback today, I reworked the admin page entries to look like this:
It's not quite obvious, but the version information for the draft is italicized if it's different from the published version + dependency state. |
0b452cf to
9323705
Compare
Create the PublishSideEffect model as the publishing analog for the DraftSideEffect model, and create these side effects for parent containers when one of the children is published. This allows us to efficiently query when any subtree of content was affected by a publish. Also introduces PublishableEntityVersionDependency as a way of tracking unpinned dependencies of a version, e.g. the children of a version of a Unit. Also introduces a new dependencies_hash_digest field to Draft and Published models to track dependency state. This PR adds Django admin functionality for this new data, as well as optimizing some existing calls that used to have to do more complex container -> entity list traversal.
9323705 to
2e9cc4c
Compare
|
@ormsbee Cool, I think that's a bit more useful and clear. |
|
@kdmccormick, @bradenmacdonald: I still have to write more tests, but I need to put this down for a day or two to attend to some long-overdue reviews. The code should be in a reviewable state, minus some union-related type annotation stuff that I have to look into later. Please review if you have time. |
|
Annoying late night thought: It sounds like we may at some point elevate the state summary hash to something that external clients would use, e.g. for history comparison or caching purposes. But it's not stored in the history, since it's calculated on the Draft and Published tables. But it doesn't have to be--since any change to a Draft or a Published (even side-effects) has to be represented in the change logs, that means that we could put those hash summaries on the DraftChangeLogRecord and PublishChangeLogRecord instead. That way, you could see where a particular state was in the history of that thing, and what caused it to become that way. |
|
@ormsbee I think it's a great idea |
|
Before locking ourselves into UUIDs, I'm still curious if there are any other ways to generate a dependency hash. A bad idea (just writing it down for posterity)What about generating the hash from the intrinsic data of the PE, so that a hash fully captures that state of some entity in a package? For example, in pseudocode: I think this would work OK with libraries as they are today. But, they would not capture any data that is hung off of the LC models in the future. It would break down as soon as someone, for example, made a custom Container subclass with some critical metadata on the ContainerVersion subclass... updating that metadata would fail to modify the dependency hash. Maybe this could be resolved by allowing PE subclasses to register custom hashing functions? Probably not worth it. A half-baked ideaWhat about just hashing just the PEV by its key and version number? We already guarantee that (key, version number) uniquely identifies a PEV in a package, so it should be sufficient input for a hash whose job is to capture whether any changes have happened to PE's dependencies (which are in that same package). (key, version_num) only breaks down across Open edX instances, but I don't think we need to worry about that for the dependency hash, right? |
I know you state this is a bad idea, but I do want to explicitly highlight the tradeoffs we're making here for other people to read later, because in many ways this sort of hashing would be the gold standard for how to do this. As you point out, it would require anyone extending the data models to properly calculate a hash for their extended data. Not only would we need to create API hooks for that, it's asking a lot of people to get that hashing right. Instead of this, both the UUID approach in this PR and the (identifier + version num) approach suggested below try to reduce the burden on developers by taking the shortcut that any Going to reply to the (key, version) comment in a bit... |
kdmccormick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still reviewing, just a couple quick thoughts so far.
openedx_learning/apps/authoring/publishing/models/publishable_entity.py
Outdated
Show resolved
Hide resolved
openedx_learning/apps/authoring/publishing/models/publishable_entity.py
Outdated
Show resolved
Hide resolved
|
What's left to do before we can merge this PR? We need the full descendant publishing to fix some libraries bugs - see #421 |
openedx_learning/apps/authoring/publishing/models/publishable_entity.py
Outdated
Show resolved
Hide resolved
|
@bradenmacdonald: I did a major shift over of where I'm storing the hashes (onto the log records), but I'm still chasing down some transaction related bugs with some very unhelpful traces. |
|
Addressed all comments so far. |
kdmccormick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two more minor comments. the models all look good to me 👍🏻
currently reviewing the publishing-cascade, side-effect-calculation, and backfill code...
| @cached_property | ||
| def rows(self): | ||
| """ | ||
| Convenience method to iterate rows. | ||
| I'd normally make this the reverse lookup name for the EntityListRow -> | ||
| EntityList foreign key relation, but we already have references to | ||
| entitylistrow_set in various places, and I thought this would be better | ||
| than breaking compatibility. | ||
| """ | ||
| return self.entitylistrow_set.order_by("order_num") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional] the only external references to entitylistrow_set are in a single test file in edx-platform. Since we're still <v1.0, imo now is the time to make these little breaking change that make the API nicer.
| it almost always will), then publishing one of those Components will alter | ||
| the published state of the Unit, even if the UnitVersion does not change. In | ||
| that case, we still consider the Unit to have been "published". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| it almost always will), then publishing one of those Components will alter | |
| the published state of the Unit, even if the UnitVersion does not change. In | |
| that case, we still consider the Unit to have been "published". | |
| it almost always will), then publishing one of those Components will alter | |
| the published state of the Unit, even if the UnitVersion does not change. |
[optional] IMO what we "consider to have been have been published" is a fuzzy UX-level concept that is subject to change as the product evolves. I suggest focusing on what's literally happening in the data model, and let the UX decide what it means for something to "have been published" from a user's perspective.
kdmccormick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(still reviewing)
| ] | ||
|
|
||
| # These are the Published or Draft objects where we need to repoint the | ||
| # log_record (publish_log_record or draft_change_log) to point to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # log_record (publish_log_record or draft_change_log) to point to the | |
| # log_record (publish_log_record or draft_change_log_record) to point to the |
| branch_objs_to_update_with_side_effects = [] | ||
|
|
||
| while changes_and_affected: | ||
| original_change, affected = changes_and_affected.pop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rename this inner variable to something like change or cause, because:
- If we take "original change" to mean "a change coming directly from the change_log" (as is implied by
affected_by_original_change), then I think anything at the 2nd layer of affect or lower is no longer "original". - It's shadowing the outer loop's
original_change, which is confusing. I don't think it makes the code any more complicated for the inner loop (the one that descends the layers) to have its own variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've looked at most of the code. As always, thanks for the careful docs, they've helped a lot. I'd like to give it at least one more pass early next week.
| # If the Draft or Published that is affected by this change is not | ||
| # already in the change_log, then we have to add it. | ||
| affected_version_pk = affected.version_id | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # If the Draft or Published that is affected by this change is not | |
| # already in the change_log, then we have to add it. | |
| affected_version_pk = affected.version_id |
I think this comment is already covered (better) by your comment right above defaults={ and the extra variable here adds cognitive overhead-- I would just inline affected.version_id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I think this was some left-over from a previous iteration where I was doing something weirder.
| 'old_version_id': affected_version_pk, | ||
| 'new_version_id': affected_version_pk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| 'old_version_id': affected_version_pk, | |
| 'new_version_id': affected_version_pk | |
| 'old_version_id': affected.version_id, | |
| 'new_version_id': affected.version_id, |
(this goes with my previous suggestion)
| By default, this will also publish all dependencies (e.g. children) of the | ||
| Drafts that are passed in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| By default, this will also publish all dependencies (e.g. children) of the | |
| Drafts that are passed in. | |
| By default, this will also publish all dependencies (e.g. unpinned children) of the | |
| Drafts that are passed in. |
Took me a minute to realize why this function wasn't specially handling pinned children.
|
@kdmccormick: Addressed all comments except the |
Co-authored-by: Kyle McCormick <kyle@kylemccormick.me>
kdmccormick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I tested the backfill, tested that publish side affects percolate upwards, and tested the reverse migration.
Notes for future work, none of it critical for Ulmo AFAICT:
- We should get rip out any code which checked for a container's unpublished changes the old way (walking down the tree), if there is any of that.
- I think it could be good to add more cross-links in the admin interface, in support of devs and operators who are browsing around and trying to understand how the log changes and entities are related to one another. I didn't have the time to enumerate these places.
- As we discussed earlier, I think a lot of this code could become more readable if common abstract mixins were factored out from DraftChange{Log,LogRecord,SideEffect} and Publish{Log,LogRecord,SideEffect}.
This pulls in publishing dependency changes from: openedx/openedx-learning#369 This fixes a bug where publishing a Content Library v2 container would publish only its direct children instead of publishing all ancestors. Co-authored-by: Kyle McCormick <kyle@axim.org>
This pulls in publishing dependency changes from: openedx/openedx-learning#369 This fixes a bug where publishing a Content Library v2 container would publish only its direct children instead of publishing all ancestors. Backports: 190a8b8 Co-authored-by: Kyle McCormick <kyle@axim.org>
This pulls in publishing dependency changes from: openedx/openedx-learning#369 This fixes a bug where publishing a Content Library v2 container would publish only its direct children instead of publishing all ancestors. Backports: 190a8b8 Co-authored-by: Kyle McCormick <kyle@axim.org>

Status: Data models and fundamental logic are implemented. I need to clean up a lot and add tests and some admin debug screens. I also need to do the data migration for existing containers data.
This PR introduces a publishing app concept of dependency tracking, as well as adding a
PublishingSideEffectanalog of the existingDraftSideEffect. The high-level goal is to be able to efficiently do operations like, "publish this entity and all its children" or answer queries like "when was the last time the published state of this entity changed (including its children)?".The
publish_from_draftsfunction has been modified to publish all dependencies.Follow-on to #328, broadly implements what was discussed in #317, though I found it simpler to model all dependencies at the PublishableEntity level, rather than splitting them out into separate Draft and Published dependencies. That being said, the
dependencies_hash_digestmust be computed separately, since the published and draft versions of any given dependency can be different at any given time.