128 changes: 70 additions & 58 deletions src/openedx_content/applets/collections/models.py
@@ -2,67 +2,73 @@
Core models for Collections

TLDR Guidelines:

1. DO NOT modify these models to store full version snapshots.
2. DO NOT use these models to try to reconstruct historical versions of
Collections for fast querying.

If you're trying to do either of these things, you probably want a new model or
app. For more details, read on.
If you're trying to do either of these things, you probably want a new model
or app. For more details, read on.

The goal of these models is to provide a lightweight method of organizing
PublishableEntities. The first use case for this is modeling the structure of a
v1 Content Library within a LearningPackage. This is what we'll use the
Collection model for.
:class:`PublishableEntity` objects. The first use case for this is modeling
the structure of a v1 Content Library within a :class:`LearningPackage`.
This is what we'll use the :class:`Collection` model for.

An important thing to note here is that Collections are *NOT* publishable
entities themselves. They have no "Draft" or "Published" versions. Collections
are never "published", though the things inside of them are.
entities themselves. They have no "Draft" or "Published" versions.
Collections are never "published", though the things inside of them are.

When a LibraryContentBlock makes use of a Content Library, it copies all of
the items it will use into the Course itself. It will also store a version
on the LibraryContentBlock -- this is a MongoDB ObjectID in v1 and an integer in
v2 Libraries. Later on, the LibraryContentBlock will want to check back to see
if any updates have been made, using its version as a key. If a new version
exists, the course team has the option of re-copying data from the Library.
on the LibraryContentBlock -- this is a MongoDB ObjectID in v1 and an
integer in v2 Libraries. Later on, the LibraryContentBlock will want to
check back to see if any updates have been made, using its version as a
key. If a new version exists, the course team has the option of re-copying
data from the Library.

ModuleStore-based v1 Libraries and OeXCore-based v2 libraries both version
the entire library in a series of snapshots. This makes it difficult to have
very large libraries, which is an explicit goal for Modular Learning. In
Open edX Core, we've moved to tracking the versions of individual Components to
address this issue. But that means we no longer have a single version indicator
for "has anything here changed?"

We *could* have put that version in the ``publishing`` app's PublishLog, but
that would make it too broad. We want the ability to eventually collapse many v1
Libraries into a single OeXCore backed v2 Library. If we tracked the
versioning in only a central location, then we'd have many false positives where
the version was bumped because something else in the Learning Package changed.
So instead, we're creating a new Collection model inside the LearningPackage to
track that concept.

A critical takeaway is that we don't have to store snapshots of every version of
a Collection, because that data has been copied over by the LibraryContentBlock.
We only need to store the current state of the Collection, and increment the
version numbers when changes happen. This will allow the LibraryContentBlock to
check in and re-copy over the latest version if the course team desires.

That's why these models only store the current state of a Collection. Unlike the
``components`` app, ``collections`` does not store fully materialized snapshots
of past versions. This is done intentionally in order to save space and reduce
the cost of writes. Collections may grow to be very large, and we don't want to
be writing N rows with every version, where N is the number of
PublishableEntities in a Collection.

MVP of these models does not store changesets, but we can add this when there's a
use case for it. The number of rows in these changesets would grow in proportion
to the number of things that are actually changing (instead of copying over
everything on every version). This could be used to make it easier to figure out
what changed between two given versions of a Collection. A LibraryContentBlock
in a course would have stored the version number of the last time it copied data
from the Collection, and we can eventually surface this data to the user. Note that
while it may be possible to reconstruct past versions of Collections based off of
this changeset data, it's going to be a very slow process to do so, and it is
strongly discouraged.
the entire library in a series of snapshots. This makes it difficult to
have very large libraries, which is an explicit goal for Modular Learning.
In Open edX Core, we've moved to tracking the versions of individual
Components to address this issue. But that means we no longer have a single
version indicator for "has anything here changed?"

We *could* have put that version in the :mod:`publishing` app's
:class:`PublishLog`, but that would make it too broad. We want the ability
to eventually collapse many v1 Libraries into a single OeXCore-backed v2
Library. If we tracked the versioning in only a central location, then we'd
have many false positives where the version was bumped because something
else in the Learning Package changed. So instead, we're creating a new
:class:`Collection` model inside the :class:`LearningPackage` to track that
concept.

A critical takeaway is that we don't have to store snapshots of every
version of a Collection, because that data has been copied over by the
LibraryContentBlock. We only need to store the current state of the
Collection, and increment the version numbers when changes happen. This
will allow the LibraryContentBlock to check in and re-copy over the latest
version if the course team desires.

That's why these models only store the current state of a Collection.
Unlike the :mod:`components` app, :mod:`collections` does not store fully
materialized snapshots of past versions. This is done intentionally in
order to save space and reduce the cost of writes. Collections may grow to
be very large, and we don't want to be writing N rows with every version,
where N is the number of :class:`PublishableEntity` objects in a
Collection.

MVP of these models does not store changesets, but we can add this when
there's a use case for it. The number of rows in these changesets would
grow in proportion to the number of things that are actually changing
(instead of copying over everything on every version). This could be
used to make it easier to figure out what changed between two given
versions of a Collection. A LibraryContentBlock in a course would have
stored the version number of the last time it copied data from the
Collection, and we can eventually surface this data to the user. Note that
while it may be possible to reconstruct past versions of Collections based
off of this changeset data, it's going to be a very slow process to do so,
and it is strongly discouraged.
"""
from __future__ import annotations

@@ -83,11 +89,12 @@

class CollectionManager(models.Manager):
    """
    Custom manager for Collection class.
    Custom manager for :class:`Collection`.
    """
    def get_by_code(self, learning_package_id: int, collection_code: str):
        """
        Get the Collection for the given Learning Package + collection code.
        Get the :class:`Collection` for the given :class:`LearningPackage` +
        collection code.
        """
        return self.select_related('learning_package') \
            .get(learning_package_id=learning_package_id, collection_code=collection_code)
@@ -101,16 +108,21 @@ class Collection(models.Model):

    id = models.AutoField(primary_key=True)

    # Each collection belongs to a learning package
    learning_package = models.ForeignKey(LearningPackage, on_delete=models.CASCADE)
    """
    Each collection belongs to a :class:`LearningPackage`.
    """

    # Every collection is uniquely and permanently identified within its learning package
    # by a 'code' that is set during creation. Both will appear in the
    # collection's opaque key:
    # e.g. "lib-collection:{org_code}:{library_code}:{collection_code}"
    # is the opaque key for a library collection.
    # TODO: Consider supporting unicode https://github.com/openedx/openedx-platform/issues/38413
    # TODO: Consider supporting unicode
    # https://github.com/openedx/openedx-platform/issues/38413
    collection_code = code_field(unicode=False)
    """
    Every collection is uniquely and permanently identified within its
    learning package by a 'code' that is set during creation. Both will
    appear in the collection's opaque key: e.g.
    ``lib-collection:{org_code}:{library_code}:{collection_code}`` is the
    opaque key for a library collection.
    """

    title = case_insensitive_char_field(
        null=False,
@@ -204,7 +216,7 @@ def __str__(self) -> str:

class CollectionPublishableEntity(models.Model):
    """
    Collection -> PublishableEntity association.
    :class:`Collection` → :class:`PublishableEntity` association.
    """
    collection = models.ForeignKey(
        Collection,
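
Note on the module docstring: the "store only the current state and bump a version" idea it describes can be illustrated with a small, framework-free sketch. This is not the code in this PR and not the Django models themselves. `CollectionState`, `CopiedLibraryContent`, `copy_from`, and `has_updates` are hypothetical names invented for illustration, and the `version` counter is an assumption based on the docstring's wording rather than a confirmed field on `Collection`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CollectionState:
    """Hypothetical stand-in for a Collection: current membership only, no snapshots."""
    version: int = 1
    entity_ids: set[int] = field(default_factory=set)

    def add_entity(self, entity_id: int) -> None:
        # Bump the version only when membership actually changes.
        if entity_id not in self.entity_ids:
            self.entity_ids.add(entity_id)
            self.version += 1

    def remove_entity(self, entity_id: int) -> None:
        if entity_id in self.entity_ids:
            self.entity_ids.remove(entity_id)
            self.version += 1


@dataclass(frozen=True)
class CopiedLibraryContent:
    """Hypothetical stand-in for what a LibraryContentBlock keeps after copying."""
    copied_version: int
    copied_entity_ids: frozenset[int]


def copy_from(collection: CollectionState) -> CopiedLibraryContent:
    # The consumer keeps its own copy of the data plus the version it copied at,
    # which is why the collection never needs to store past snapshots.
    return CopiedLibraryContent(collection.version, frozenset(collection.entity_ids))


def has_updates(block: CopiedLibraryContent, collection: CollectionState) -> bool:
    # The stored version acts as the key for "has anything here changed?"
    return collection.version != block.copied_version
```

In this sketch a write touches one counter and one membership entry, rather than N rows per version, which is the cost trade-off the docstring argues for. Reconstructing an older membership set is deliberately not supported.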