Refactor entity models and enhance document metadata management#64
Conversation
- Moved `NoActiveRevisionsError` import to `include.exceptions.misc` for better organization. - Updated import paths for `Document`, `DocumentRevision`, and `Folder` to reflect new structure in `obj.py`. - Created `base.py` for shared base functionality among entity models. - Introduced `metadata.py` for document metadata handling. - Added `obj.py` to define core entity models including `Document`, `DocumentRevision`, `Folder`, and their access rules. - Implemented `batch_count_other_revisions` in `count.py` for centralized reference counting. - Refactored `purge.py` to utilize new counting method. - Adjusted `document.py` and `search.py` to align with new import paths.
Reviewer's GuideRefactors shared entity model logic into a new BaseObject, extracts batch file-reference counting into a reusable utility, and introduces a full document metadata subsystem (creator, last-modified, tags) with API, permissions, migrations, and tests. ER diagram for new document metadata modelserDiagram
Document {
string id PK
string title
}
DocumentMetadata {
string document_id PK
string creator_username
string last_modified_by_username
}
DocumentMetadataTag {
string document_id PK
string tag PK
int position
}
User {
string username PK
}
Document ||--o| DocumentMetadata : has_metadata
DocumentMetadata ||--o{ DocumentMetadataTag : has_tags
User ||--o{ DocumentMetadata : created_by
User ||--o{ DocumentMetadata : last_modified_by
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
| else: | ||
| raise ValueError('the value of "match" must be "all" or "any"') | ||
|
|
||
| def match_primary_sub_group(per_match_group): |
|
|
||
| require_auth = True | ||
|
|
||
| def handle(self, handler: ConnectionHandler): |
There was a problem hiding this comment.
Hey - I've found 2 issues
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="src/include/handlers/document.py" line_range="1139" />
<code_context>
+ {"tags": normalized_tags},
+ "Document metadata tags updated successfully",
+ )
+ return 0, document_id, {"tags": normalized_tags}, handler.username
</code_context>
<issue_to_address>
**issue (bug_risk):** The return tuple shape for RequestSetDocumentMetadataTagsHandler is inconsistent with other handlers and may break upstream consumers.
This handler returns a 4‑tuple `(0, document_id, {"tags": normalized_tags}, handler.username)` while others in this module return a 3‑tuple like `(code, document_id, username)`. If callers assume a fixed 3‑tuple structure or a specific index for `username`, this can cause runtime errors or incorrect logging. Please match the existing convention (e.g. `return 0, document_id, handler.username`) and rely on `conclude_request` to convey `tags` in the response body.
</issue_to_address>
### Comment 2
<location path="tests/test_documents.py" line_range="41" />
<code_context>
data = assert_success(response)
assert isinstance(data, dict)
+ @pytest.mark.asyncio
+ async def test_document_metadata_tags(
+ self, authenticated_client: CFMSTestClient, test_document: dict
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test to verify that document metadata is only returned when the caller has VIEW_METADATA permission
Right now the tests only cover an admin/sysop user who has `Permissions.VIEW_METADATA`. Please also add a case with a user who lacks `VIEW_METADATA` (but still has read access) and assert that `get_document_info` either omits `metadata` or returns the sanitized form, per the expected behavior, to guard against metadata access-control regressions.
Suggested implementation:
```python
@pytest.mark.asyncio
async def test_document_metadata_tags(
self, authenticated_client: CFMSTestClient, test_document: dict
):
document_id = test_document["document_id"]
info_response = await authenticated_client.get_document_info(document_id)
info = assert_success(info_response)
assert info["metadata"] == {
"tags": [],
"creator": "admin",
"last_modified_by": "admin",
}
@pytest.mark.asyncio
async def test_document_metadata_hidden_without_view_metadata_permission(
self,
read_only_client: CFMSTestClient,
test_document: dict,
):
"""
A user who can read the document but does not have VIEW_METADATA must
not see full metadata in get_document_info.
"""
document_id = test_document["document_id"]
info_response = await read_only_client.get_document_info(document_id)
info = assert_success(info_response)
# Depending on implementation, either metadata is omitted entirely,
# or present but sanitized. Adjust expectations to match behavior.
if "metadata" in info:
# Minimal / sanitized form – adjust keys/values as appropriate
assert isinstance(info["metadata"], dict)
# Example: tags may be visible but creator/last_modified_by hidden
assert "tags" in info["metadata"]
assert "creator" not in info["metadata"]
assert "last_modified_by" not in info["metadata"]
else:
# Metadata is fully hidden from callers without VIEW_METADATA
assert "metadata" not in info
```
To make this test pass and accurately reflect your access-control semantics, you may need to:
1. Ensure there is a `read_only_client` (or similarly named) fixture in your test suite that:
- Authenticates as a non-admin user.
- Has read access to `test_document`.
- Explicitly does **not** have the `VIEW_METADATA` permission.
If the fixture has a different name (e.g. `authenticated_reader_client`, `user_client`, etc.), update the test signature accordingly.
2. Align the assertions in the test with the actual behavior of `get_document_info` for users without `VIEW_METADATA`:
- If your implementation omits `metadata` entirely, simplify the test to `assert "metadata" not in info`.
- If your implementation returns a sanitized `metadata` object, replace the example checks inside the `if "metadata" in info:` block with the exact expected shape (keys and values) for that sanitized form.
3. If your permission model requires explicitly granting read permission to the document for the non-admin user (e.g. sharing or group membership), ensure that is done in the fixture or a helper invoked by the fixture so that `get_document_info` returns a successful response instead of a 403/404.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| {"tags": normalized_tags}, | ||
| "Document metadata tags updated successfully", | ||
| ) | ||
| return 0, document_id, {"tags": normalized_tags}, handler.username |
There was a problem hiding this comment.
issue (bug_risk): The return tuple shape for RequestSetDocumentMetadataTagsHandler is inconsistent with other handlers and may break upstream consumers.
This handler returns a 4‑tuple (0, document_id, {"tags": normalized_tags}, handler.username) while others in this module return a 3‑tuple like (code, document_id, username). If callers assume a fixed 3‑tuple structure or a specific index for username, this can cause runtime errors or incorrect logging. Please match the existing convention (e.g. return 0, document_id, handler.username) and rely on conclude_request to convey tags in the response body.
| data = assert_success(response) | ||
| assert isinstance(data, dict) | ||
|
|
||
| @pytest.mark.asyncio |
There was a problem hiding this comment.
suggestion (testing): Add a test to verify that document metadata is only returned when the caller has VIEW_METADATA permission
Right now the tests only cover an admin/sysop user who has Permissions.VIEW_METADATA. Please also add a case with a user who lacks VIEW_METADATA (but still has read access) and assert that get_document_info either omits metadata or returns the sanitized form, per the expected behavior, to guard against metadata access-control regressions.
Suggested implementation:
@pytest.mark.asyncio
async def test_document_metadata_tags(
self, authenticated_client: CFMSTestClient, test_document: dict
):
document_id = test_document["document_id"]
info_response = await authenticated_client.get_document_info(document_id)
info = assert_success(info_response)
assert info["metadata"] == {
"tags": [],
"creator": "admin",
"last_modified_by": "admin",
}
@pytest.mark.asyncio
async def test_document_metadata_hidden_without_view_metadata_permission(
self,
read_only_client: CFMSTestClient,
test_document: dict,
):
"""
A user who can read the document but does not have VIEW_METADATA must
not see full metadata in get_document_info.
"""
document_id = test_document["document_id"]
info_response = await read_only_client.get_document_info(document_id)
info = assert_success(info_response)
# Depending on implementation, either metadata is omitted entirely,
# or present but sanitized. Adjust expectations to match behavior.
if "metadata" in info:
# Minimal / sanitized form – adjust keys/values as appropriate
assert isinstance(info["metadata"], dict)
# Example: tags may be visible but creator/last_modified_by hidden
assert "tags" in info["metadata"]
assert "creator" not in info["metadata"]
assert "last_modified_by" not in info["metadata"]
else:
# Metadata is fully hidden from callers without VIEW_METADATA
assert "metadata" not in infoTo make this test pass and accurately reflect your access-control semantics, you may need to:
-
Ensure there is a
read_only_client(or similarly named) fixture in your test suite that:- Authenticates as a non-admin user.
- Has read access to
test_document. - Explicitly does not have the
VIEW_METADATApermission.
If the fixture has a different name (e.g.authenticated_reader_client,user_client, etc.), update the test signature accordingly.
-
Align the assertions in the test with the actual behavior of
get_document_infofor users withoutVIEW_METADATA:- If your implementation omits
metadataentirely, simplify the test toassert "metadata" not in info. - If your implementation returns a sanitized
metadataobject, replace the example checks inside theif "metadata" in info:block with the exact expected shape (keys and values) for that sanitized form.
- If your implementation omits
-
If your permission model requires explicitly granting read permission to the document for the non-admin user (e.g. sharing or group membership), ensure that is done in the fixture or a helper invoked by the fixture so that
get_document_inforeturns a successful response instead of a 403/404.
… and RequestSetDocumentMetadataTagsHandler
…ocumentTagsHandler for consistency
Summary by Sourcery
Refactor entity models and introduce persistent document metadata with tag management and permission-controlled access.
New Features:
Enhancements:
Tests: