
Refactor entity models and enhance document metadata management#64

Merged
Creeper19472 merged 12 commits into master from feat-metadata on Jun 16, 2026

Creeper19472 (Collaborator) commented Jun 16, 2026

Summary by Sourcery

Refactor entity models and introduce persistent document metadata with tag management and permission-controlled access.

New Features:

  • Add document metadata and tag models linked to documents, including creator and last-modified tracking.
  • Expose document metadata (tags, creator, last_modified_by) via the document info API when the caller has metadata view permission.
  • Introduce a handler and client API to set document metadata tags with validation and dedicated permissions.
  • Enable SQLite foreign key enforcement at connection time to honor relational constraints.

Enhancements:

  • Extract the shared BaseObject entity logic into a dedicated module and centralize batch file-reference counting utilities.
  • Automatically create metadata records for existing and initial documents and update last-modified metadata on key document and revision operations.
  • Extend Alembic migrations to create and evolve document metadata tables and grant related permissions to the sysop group.
  • Clarify contributor guidance on generating Alembic revisions in AGENTS documentation.
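The backfill for documents that predate the feature can be written as a single idempotent `INSERT ... SELECT` with an anti-join. The sketch below is an illustration, not the PR's migration code, and it makes a few assumptions:

- It uses the standard-library `sqlite3` module.
- The table names `documents` and `document_metadata` are assumed.
- The creator columns are assumed to be nullable.

```python
import sqlite3


def backfill_document_metadata(conn: sqlite3.Connection) -> int:
    """Insert an empty metadata row for every document that lacks one.

    Assumed schema: documents(id, ...) and
    document_metadata(document_id, creator_username, last_modified_by_username).
    Returns the number of rows inserted; a second run inserts nothing.
    """
    before = conn.total_changes
    conn.execute(
        """
        INSERT INTO document_metadata
            (document_id, creator_username, last_modified_by_username)
        SELECT d.id, NULL, NULL
        FROM documents AS d
        LEFT JOIN document_metadata AS m ON m.document_id = d.id
        WHERE m.document_id IS NULL
        """
    )
    return conn.total_changes - before


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database where only doc-1 already has metadata."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE documents (id TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE document_metadata (
            document_id TEXT PRIMARY KEY REFERENCES documents(id),
            creator_username TEXT,
            last_modified_by_username TEXT
        );
        INSERT INTO documents VALUES ('doc-1', 'A'), ('doc-2', 'B');
        INSERT INTO document_metadata VALUES ('doc-1', 'admin', 'admin');
        """
    )
    return conn
```

On the demo database, the first call inserts one row (for `doc-2`) and a second call inserts none. That idempotence matters if a migration is re-run after a partial failure.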

Tests:

  • Add integration tests covering document metadata exposure, tag updates, and permission enforcement in the document handlers and client API.

- Moved `NoActiveRevisionsError` import to `include.exceptions.misc` for better organization.
- Updated import paths for `Document`, `DocumentRevision`, and `Folder` to reflect the new structure in `obj.py`.
- Created `base.py` for shared base functionality among entity models.
- Introduced `metadata.py` for document metadata handling.
- Added `obj.py` to define core entity models including `Document`, `DocumentRevision`, `Folder`, and their access rules.
- Implemented `batch_count_other_revisions` in `count.py` for centralized reference counting.
- Refactored `purge.py` to use the new counting function.
- Adjusted `document.py` and `search.py` to align with the new import paths.

sourcery-ai Bot commented Jun 16, 2026

Reviewer's Guide

Refactors shared entity model logic into a new BaseObject, extracts batch file-reference counting into a reusable utility, and introduces a full document metadata subsystem (creator, last-modified, tags) with API, permissions, migrations, and tests.

ER diagram for new document metadata models

erDiagram
    Document {
        string id PK
        string title
    }

    DocumentMetadata {
        string document_id PK
        string creator_username
        string last_modified_by_username
    }

    DocumentMetadataTag {
        string document_id PK
        string tag PK
        int position
    }

    User {
        string username PK
    }

    Document ||--o| DocumentMetadata : has_metadata
    DocumentMetadata ||--o{ DocumentMetadataTag : has_tags
    User ||--o{ DocumentMetadata : created_by
    User ||--o{ DocumentMetadata : last_modified_by

File-Level Changes

Change Details Files
Refactor entity models to use a shared BaseObject and split entity definitions into separate modules.
  • Extract BaseObject (status fields and check_access_requirements) from the monolithic entity model into include/database/models/entity/base.py
  • Move Folder, Document, and related ORM mappings into include/database/models/entity/obj.py and update imports accordingly
  • Replace direct references to BaseObject in entity code with imports from the new base module, introducing TYPE_CHECKING guards for forward references
  Files: src/include/database/models/entity.py, src/include/database/models/entity/obj.py, src/include/database/models/entity/base.py

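The `TYPE_CHECKING` guard mentioned above lets `base.py` refer to classes defined in `obj.py` in annotations only, so the two modules can import each other without a runtime cycle. This is a minimal sketch of the pattern, and several parts of it are hypothetical:

- The `BaseObject` body (an `active` flag plus a subset-based `check_access_requirements`) does not reproduce the real access rules.
- `document_label` is made up for illustration.
- The import path inside the guard is only for the type checker and is not executed.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers, never at runtime, so obj.py can
    # import base.py without creating an import cycle.
    from include.database.models.entity.obj import Document


class BaseObject:
    """Hypothetical shared base: a status flag plus an access check."""

    def __init__(self, active: bool = True) -> None:
        self.active = active

    def check_access_requirements(
        self, user_permissions: set[str], required: set[str]
    ) -> bool:
        # Illustrative rule: inactive objects deny access; otherwise every
        # required permission must be held by the caller.
        return self.active and required <= user_permissions


def document_label(doc: Document) -> str:
    # With postponed evaluation (PEP 563) the annotation stays a string, so
    # Document never has to be imported at runtime.
    return f"document {doc.id}"
```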
Extract batch file-reference counting logic into a reusable bulk utility and update call sites.
  • Move _batch_count_other_revisions implementation into new include/util/bulk/count.py as batch_count_other_revisions
  • Update Document.delete_all_revisions and purge_documents_bulk to call the new utility function instead of the in-model helper
  Files: src/include/database/models/entity/obj.py, src/include/util/bulk/purge.py, src/include/util/bulk/count.py

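Centralizing the batch count answers one question in a single pass: for each file referenced by the revisions being deleted, how many other revisions still point at it? Only files with a count of zero can be purged from storage.

The plain-Python sketch below illustrates that logic. The function name `count_other_references` and its in-memory `revision_id -> file_id` input are hypothetical. The real `batch_count_other_revisions` presumably runs a grouped database query, and its signature is not shown in this PR.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Mapping


def count_other_references(
    revision_files: Mapping[str, str],
    deleting: Iterable[str],
) -> dict[str, int]:
    """For every file referenced by a revision in `deleting`, count how many
    revisions outside `deleting` still reference that file.

    `revision_files` maps revision_id -> file_id and stands in for the
    revisions table (hypothetical input shape).
    """
    deleting_set = set(deleting)
    # Files whose remaining reference count matters for the purge decision.
    candidates = {revision_files[r] for r in deleting_set if r in revision_files}
    # A single pass over all revisions, instead of one lookup per file.
    counts = Counter(
        file_id
        for rev_id, file_id in revision_files.items()
        if rev_id not in deleting_set and file_id in candidates
    )
    # Counter returns 0 for missing keys, so unreferenced files map to 0.
    return {file_id: counts[file_id] for file_id in candidates}
```

For example, deleting revisions `r1` and `r3` from `{"r1": "f1", "r2": "f1", "r3": "f2", "r4": "f3"}` yields `{"f1": 1, "f2": 0}`. `f1` is still used by `r2`, while `f2` is safe to purge.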
Add a document metadata model (creator, last modified by, tags) and wire it to Document.
  • Introduce DocumentMetadata and DocumentMetadataTag ORM models with relationships to Document and User, including ordered tag list and cascade delete
  • Add a one-to-one metadata_record relationship on Document and ensure the initial seeded document gets a metadata record
  • Expose metadata models to Alembic env and module exports
  Files: src/include/database/models/entity/metadata.py, src/include/database/models/entity/obj.py, src/alembic/env.py, src/main.py, src/include/database/models/entity/__init__.py

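At the database level, the relationships in the ER diagram map onto two tables:

- a one-to-one metadata row keyed by `document_id`;
- a tag table with a composite primary key `(document_id, tag)` plus a `position` column that preserves list order.

The standard-library sketch below approximates this with assumed table and column names; the actual names live in `metadata.py` and the Alembic revision, which are not shown here. It demonstrates ordered tag reads and the cascade delete.

```python
from __future__ import annotations

import sqlite3

SCHEMA = """
CREATE TABLE users (username TEXT PRIMARY KEY);
CREATE TABLE documents (id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE document_metadata (
    document_id TEXT PRIMARY KEY
        REFERENCES documents(id) ON DELETE CASCADE,
    creator_username TEXT REFERENCES users(username),
    last_modified_by_username TEXT REFERENCES users(username)
);
CREATE TABLE document_metadata_tags (
    document_id TEXT NOT NULL
        REFERENCES document_metadata(document_id) ON DELETE CASCADE,
    tag TEXT NOT NULL,
    position INTEGER NOT NULL,
    PRIMARY KEY (document_id, tag)
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Cascades only fire when foreign-key enforcement is on for this connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def set_tags(conn: sqlite3.Connection, document_id: str, tags: list[str]) -> None:
    """Replace a document's tags, storing list order in `position`."""
    conn.execute(
        "DELETE FROM document_metadata_tags WHERE document_id = ?", (document_id,)
    )
    conn.executemany(
        "INSERT INTO document_metadata_tags (document_id, tag, position) "
        "VALUES (?, ?, ?)",
        [(document_id, tag, i) for i, tag in enumerate(tags)],
    )


def get_tags(conn: sqlite3.Connection, document_id: str) -> list[str]:
    rows = conn.execute(
        "SELECT tag FROM document_metadata_tags "
        "WHERE document_id = ? ORDER BY position",
        (document_id,),
    )
    return [tag for (tag,) in rows]
```

Deleting a row from `documents` removes its metadata row and, transitively, its tags. This mirrors the cascade delete described for the ORM models.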
Expose document metadata (including tags, creator, last modified) through handlers and enforce permissions.
  • Add helper functions to create/serialize/mark document metadata in document handler
  • Include metadata in get_document_info responses when the caller has VIEW_METADATA
  • Initialize metadata (creator and last_modified_by) on document creation and update last_modified_by on modifications such as upload, delete, rename, move, access-rule changes, revision changes, and document restore
  Files: src/include/handlers/document.py, src/include/handlers/revision.py

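The permission-gated serialization can be sketched under two hypothetical stand-ins: a simple dataclass replaces the ORM row, and a plain set of permission strings replaces the real `Permissions` enum. The sketch omits the `metadata` key entirely for callers without the view permission. Whether the real handler omits or sanitizes the field is exactly what the second review comment asks the tests to pin down.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MetadataRecord:
    """Hypothetical in-memory stand-in for a DocumentMetadata row."""

    creator: Optional[str]
    last_modified_by: Optional[str]
    tags: list[str] = field(default_factory=list)


def serialize_metadata(record: MetadataRecord) -> dict:
    # Same keys as the metadata dict asserted in the PR's tests.
    return {
        "tags": list(record.tags),
        "creator": record.creator,
        "last_modified_by": record.last_modified_by,
    }


def mark_modified(record: MetadataRecord, username: str) -> None:
    # Would be called after upload, rename, move, delete, access-rule and
    # revision changes, and restore.
    record.last_modified_by = username


def build_document_info(
    base_info: dict,
    record: Optional[MetadataRecord],
    caller_permissions: set[str],
) -> dict:
    info = dict(base_info)
    # "view_metadata" is an assumed string form of Permissions.VIEW_METADATA.
    if record is not None and "view_metadata" in caller_permissions:
        info["metadata"] = serialize_metadata(record)
    return info
```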
Implement a new API for setting document metadata tags with access checks and client support.
  • Add Permissions.VIEW_METADATA and Permissions.SET_METADATA_TAGS and grant them to sysop in server_init and the Alembic migration
  • Introduce RequestSetDocumentMetadataTagsHandler with validation, deduplication/normalization, permission checks, and ordered tag persistence
  • Register the new request type in the router and add a CFMSTestClient helper and tests for metadata behavior and permission enforcement
  Files: src/include/classes/enum/permissions.py, src/include/handlers/document.py, src/include/router.py, tests/test_documents.py, tests/test_client.py, src/alembic/versions/a50674184a2c_document_metadata.py, src/main.py

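The validation and deduplication/normalization step can be sketched as a pure function. The exact rules in `RequestSetDocumentMetadataTagsHandler` are not shown here, so the following rules are assumptions:

- Tags must be strings.
- Surrounding whitespace is stripped, and empty tags are dropped.
- Duplicates are removed case-sensitively while keeping first-seen order, which would then be persisted as each tag's position.
- `MAX_TAGS` and `MAX_TAG_LENGTH` are hypothetical limits.

```python
from __future__ import annotations

MAX_TAGS = 32          # hypothetical limit
MAX_TAG_LENGTH = 64    # hypothetical limit


def normalize_tags(raw: object) -> list[str]:
    """Validate and normalize a tag list, preserving first-seen order.

    Raises ValueError on invalid input. All rules here are assumptions.
    """
    if not isinstance(raw, list):
        raise ValueError("tags must be a list")
    seen: set[str] = set()
    result: list[str] = []
    for item in raw:
        if not isinstance(item, str):
            raise ValueError("each tag must be a string")
        tag = item.strip()
        if not tag:
            continue  # drop empty or whitespace-only tags
        if len(tag) > MAX_TAG_LENGTH:
            raise ValueError(f"tag longer than {MAX_TAG_LENGTH} characters")
        if tag not in seen:  # case-sensitive, order-preserving dedup
            seen.add(tag)
            result.append(tag)
    if len(result) > MAX_TAGS:
        raise ValueError(f"at most {MAX_TAGS} tags are allowed")
    return result
```

Returning the normalized list (rather than mutating the request) lets the handler both persist it and echo it back in the response body.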
Improve SQLite behavior by enforcing foreign key constraints at connection time.
  • Register a SQLAlchemy engine connect event for SQLite that enables PRAGMA foreign_keys=ON
  Files: src/include/database/handler.py
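SQLite keeps foreign-key enforcement off by default, and the setting applies per connection. That is why the PR hooks the engine's connect event rather than issuing the PRAGMA once. In SQLAlchemy this is usually a listener registered with `event.listens_for(engine, "connect")` that runs the PRAGMA on the raw DBAPI connection.

The standard-library sketch below shows the effect such a listener has. The helper names and the two-table schema are illustrative only.

```python
import sqlite3


def connect(path: str = ":memory:", enforce_fks: bool = True) -> sqlite3.Connection:
    """Open a connection, optionally enabling foreign-key enforcement.

    This mimics a per-connection 'connect' listener: the PRAGMA only affects
    the connection it runs on, so it has to be issued for every new one.
    """
    conn = sqlite3.connect(path)
    if enforce_fks:
        conn.execute("PRAGMA foreign_keys = ON")
    return conn


def orphan_insert_accepted(conn: sqlite3.Connection) -> bool:
    """Create a parent/child pair of tables and insert a child whose parent
    does not exist. Returns True if SQLite accepted the orphan row."""
    conn.executescript(
        """
        CREATE TABLE documents (id TEXT PRIMARY KEY);
        CREATE TABLE document_metadata (
            document_id TEXT PRIMARY KEY REFERENCES documents(id)
        );
        """
    )
    try:
        conn.execute("INSERT INTO document_metadata VALUES ('missing-doc')")
    except sqlite3.IntegrityError:
        return False
    return True
```

On a default SQLite build, the orphan row is accepted without the PRAGMA. With the PRAGMA enabled, the insert raises `sqlite3.IntegrityError`.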



sourcery-ai Bot left a comment

Hey, I've found 2 issues.

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="src/include/handlers/document.py" line_range="1139" />
<code_context>
+                {"tags": normalized_tags},
+                "Document metadata tags updated successfully",
+            )
+            return 0, document_id, {"tags": normalized_tags}, handler.username
</code_context>
<issue_to_address>
**issue (bug_risk):** The return tuple shape for RequestSetDocumentMetadataTagsHandler is inconsistent with other handlers and may break upstream consumers.

This handler returns a 4‑tuple `(0, document_id, {"tags": normalized_tags}, handler.username)` while others in this module return a 3‑tuple like `(code, document_id, username)`. If callers assume a fixed 3‑tuple structure or a specific index for `username`, this can cause runtime errors or incorrect logging. Please match the existing convention (e.g. `return 0, document_id, handler.username`) and rely on `conclude_request` to convey `tags` in the response body.
</issue_to_address>
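To see why the extra element matters, consider a consumer that unpacks the handler result positionally. The consumer and the two result builders below are hypothetical stand-ins, since the router code that actually consumes the tuple is not shown in this review. They only illustrate that a 4-tuple breaks 3-way unpacking while the suggested 3-tuple does not.

```python
from __future__ import annotations


def log_handler_result(result: tuple) -> str:
    """Hypothetical consumer that expects (code, document_id, username)."""
    code, document_id, username = result  # ValueError for any other length
    return f"code={code} document={document_id} user={username}"


def set_tags_result_old(document_id: str, tags: list[str], username: str) -> tuple:
    # Shape flagged in the review: 4 elements.
    return 0, document_id, {"tags": tags}, username


def set_tags_result_fixed(document_id: str, username: str) -> tuple:
    # Shape matching the other handlers; the tags travel in the response body
    # built by conclude_request instead.
    return 0, document_id, username
```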

### Comment 2
<location path="tests/test_documents.py" line_range="41" />
<code_context>
         data = assert_success(response)
         assert isinstance(data, dict)

+    @pytest.mark.asyncio
+    async def test_document_metadata_tags(
+        self, authenticated_client: CFMSTestClient, test_document: dict
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test to verify that document metadata is only returned when the caller has VIEW_METADATA permission

Right now the tests only cover an admin/sysop user who has `Permissions.VIEW_METADATA`. Please also add a case with a user who lacks `VIEW_METADATA` (but still has read access) and assert that `get_document_info` either omits `metadata` or returns the sanitized form, per the expected behavior, to guard against metadata access-control regressions.

Suggested implementation:

```python
    @pytest.mark.asyncio
    async def test_document_metadata_tags(
        self, authenticated_client: CFMSTestClient, test_document: dict
    ):
        document_id = test_document["document_id"]

        info_response = await authenticated_client.get_document_info(document_id)
        info = assert_success(info_response)
        assert info["metadata"] == {
            "tags": [],
            "creator": "admin",
            "last_modified_by": "admin",
        }

    @pytest.mark.asyncio
    async def test_document_metadata_hidden_without_view_metadata_permission(
        self,
        read_only_client: CFMSTestClient,
        test_document: dict,
    ):
        """
        A user who can read the document but does not have VIEW_METADATA must
        not see full metadata in get_document_info.
        """
        document_id = test_document["document_id"]

        info_response = await read_only_client.get_document_info(document_id)
        info = assert_success(info_response)

        # Depending on implementation, either metadata is omitted entirely,
        # or present but sanitized. Adjust expectations to match behavior.
        if "metadata" in info:
            # Minimal / sanitized form – adjust keys/values as appropriate
            assert isinstance(info["metadata"], dict)
            # Example: tags may be visible but creator/last_modified_by hidden
            assert "tags" in info["metadata"]
            assert "creator" not in info["metadata"]
            assert "last_modified_by" not in info["metadata"]
        # Otherwise metadata is omitted entirely, which is the fully hidden
        # case for callers without VIEW_METADATA: nothing further to assert.

```

To make this test pass and accurately reflect your access-control semantics, you may need to:

1. Ensure there is a `read_only_client` (or similarly named) fixture in your test suite that:
   - Authenticates as a non-admin user.
   - Has read access to `test_document`.
   - Explicitly does **not** have the `VIEW_METADATA` permission.
   If the fixture has a different name (e.g. `authenticated_reader_client`, `user_client`, etc.), update the test signature accordingly.

2. Align the assertions in the test with the actual behavior of `get_document_info` for users without `VIEW_METADATA`:
   - If your implementation omits `metadata` entirely, simplify the test to `assert "metadata" not in info`.
   - If your implementation returns a sanitized `metadata` object, replace the example checks inside the `if "metadata" in info:` block with the exact expected shape (keys and values) for that sanitized form.

3. If your permission model requires explicitly granting read permission to the document for the non-admin user (e.g. sharing or group membership), ensure that is done in the fixture or a helper invoked by the fixture so that `get_document_info` returns a successful response instead of a 403/404.
</issue_to_address>



Creeper19472 merged commit 71a0cf0 into master Jun 16, 2026
6 checks passed
Creeper19472 deleted the feat-metadata branch June 16, 2026 02:41