
Introduce orm for task tag #353

Merged
PGijsbers merged 9 commits into main from introduce-orm-for-tag
Jul 2, 2026


@PGijsbers

Contributor

Description

Start using the ORM layer for simple things; first up: tags.

@PGijsbers PGijsbers changed the title Introduce orm for tag Introduce orm for task tag Jul 2, 2026
@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Contributor


Warning

Review limit reached

@PGijsbers, you've reached your PR review limit, so we couldn't start this review.


Review details
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5b8f31db-839d-4e29-8374-c1b132c02fcd

📥 Commits

Reviewing files that changed from the base of the PR and between 24bf030 and 9541aab.

📒 Files selected for processing (3)
  • src/database/setups.py
  • src/routers/dependencies.py
  • tests/conftest.py

Walkthrough

This PR replaces SQLAlchemy Row typing throughout database and router modules with a new UntypedRow alias, adds ORM schema modules (Base, ExpDBReflected, UserDBReflected, Tag, TaskTag) plus a reflect_db_schemas() startup hook invoked in main.py, and refactors task tagging (get_tags/tag) to use AsyncSession and ORM operations instead of raw SQL. It adds a new expdb_session dependency, removes optional defaults from several router dependency parameters (making them required), makes User._database optional with a guarded get_groups(), hardens IntegrityError handling in dataset tagging, and updates test fixtures/tests accordingly.

Changes

| Area | Change |
| --- | --- |
| Typing | Replaced Row/RowMapping with UntypedRow across datasets, evaluations, flows, runs, setups, studies, tasks, formatting, and run/study routers |
| ORM schema | New src/database/schema/base.py (Base, ExpDBReflected, UserDBReflected, reflect_db_schemas) and src/database/schema/tags.py (Tag, TaskTag) |
| Startup | main.py invokes reflect_db_schemas() in lifespan |
| Task tagging | get_tags/tag in src/database/tasks.py migrated to ORM + AsyncSession; IntegrityError guard for e.orig is None in datasets |
| Dependencies | New expdb_session() dependency; task/study/dataset/run router endpoints made expdb/user_db required (no None default) |
| User model | User._database optional; get_groups() caches and raises RuntimeError if database is missing |
| Tests | New expdb_session fixture, autouse php_api, updated task_tag_test.py, tests/users.py, and runs_get_test.py |
| Tooling | Added sqlalchemy to mirrors-mypy pre-commit dependencies |
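The walkthrough describes `reflect_db_schemas()` running in the application's lifespan, so table definitions are read from the live database at startup instead of being spelled out column by column. The PR's own code is not shown on this page; the block below is only a stdlib sketch of that idea, using `sqlite3` and `contextlib.asynccontextmanager` in place of SQLAlchemy and FastAPI. The `task_tag` layout and the `REFLECTED` registry are hypothetical. The `ExpDBReflected`/`UserDBReflected` names suggest something along the lines of SQLAlchemy's `DeferredReflection`, but that is an assumption.

```python
import asyncio
import sqlite3
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager

# Hypothetical registry filled at startup: table name -> reflected column names.
REFLECTED: dict[str, list[str]] = {}

# Hypothetical in-memory database standing in for the expdb connection.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_tag (id INTEGER, tag TEXT, uploader INTEGER, PRIMARY KEY (id, tag))"
)


def reflect_db_schemas(connection: sqlite3.Connection, tables: list[str]) -> None:
    """Read column names from the live database instead of declaring them in code."""
    for table in tables:
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        # Table names come from a trusted constant list, never from user input.
        rows = connection.execute(f"PRAGMA table_info({table})").fetchall()
        REFLECTED[table] = [row[1] for row in rows]


@asynccontextmanager
async def lifespan(app: object) -> AsyncIterator[None]:
    # Startup: reflect once, before any request is handled.
    reflect_db_schemas(conn, ["task_tag"])
    yield
    # Shutdown: drop the reflected metadata (hypothetical cleanup).
    REFLECTED.clear()


async def main() -> None:
    async with lifespan(app=None):
        print(REFLECTED["task_tag"])  # ['id', 'tag', 'uploader']


asyncio.run(main())
```

Reflecting once at startup keeps per-request code free of schema lookups; the cost is that schema changes need a restart to be picked up.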

Sequence Diagram(s)

sequenceDiagram
  participant TagTaskEndpoint
  participant DatabaseTasks
  participant AsyncSession
  participant TaskTag
  TagTaskEndpoint->>DatabaseTasks: tag(id_, tag_, user_id, session)
  DatabaseTasks->>TaskTag: construct TaskTag instance
  DatabaseTasks->>AsyncSession: session.add(task_tag)
  DatabaseTasks->>AsyncSession: session.flush()
  AsyncSession-->>DatabaseTasks: IntegrityError on duplicate
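The diagram leaves duplicate detection to the database: `session.flush()` sends the INSERT, and a repeated (task, tag) pair surfaces as `IntegrityError` at that point rather than at commit. A minimal stdlib analog of that pattern, using `sqlite3` in place of an `AsyncSession` and assuming a composite primary key on (task id, tag); the table layout and `TagAlreadyExistsError` are hypothetical:

```python
import sqlite3


class TagAlreadyExistsError(Exception):
    """Hypothetical domain error for a duplicate (task, tag) pair."""


def tag_task(conn: sqlite3.Connection, task_id: int, tag: str, user_id: int) -> None:
    try:
        # The composite primary key makes the database the single authority on
        # duplicates, so there is no SELECT-then-INSERT race (analogous to add + flush).
        conn.execute(
            "INSERT INTO task_tag (id, tag, uploader) VALUES (?, ?, ?)",
            (task_id, tag, user_id),
        )
    except sqlite3.IntegrityError as e:
        raise TagAlreadyExistsError(f"Task {task_id} already has tag {tag!r}") from e


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_tag (id INTEGER, tag TEXT, uploader INTEGER, PRIMARY KEY (id, tag))"
)

tag_task(conn, task_id=1, tag="study_14", user_id=16)
try:
    tag_task(conn, task_id=1, tag="study_14", user_id=16)
except TagAlreadyExistsError as err:
    print(err)  # Task 1 already has tag 'study_14'
```

With SQLAlchemy, a session whose flush failed must be rolled back before it can be used again.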

Possibly related PRs

  • openml/server-api#219: Both PRs change how row results are typed/cast in src/database/evaluations.py and src/database/flows.py.
  • openml/server-api#246: Both PRs modify src/database/setups.py's get/get_tags functions and their row-return typing.
  • openml/server-api#350: Both PRs modify the tag(...)/get_tags(...) behavior and signatures in src/database/tasks.py, including IntegrityError handling.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title is concise and matches the main change: introducing ORM support for task tags. |
| Description check | ✅ Passed | The description is clearly related to the change and correctly notes the initial ORM rollout for tags. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@sourcery-ai sourcery-ai Bot left a comment

Contributor

Hey - I've found 2 issues


## Individual Comments

### Comment 1
<location path="src/database/users.py" line_range="138-139" />
<code_context>
         return None

     async def get_groups(self) -> list[UserGroup]:
-        if self._groups is None:
-            self._groups = await get_user_groups_for(
-                user_id=self.user_id,
-                connection=self._database,
-            )
+        if self._groups:
+            return self._groups
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Caching of user groups breaks for users without groups.

Because `[]` is falsy, this guard causes users with no groups to query the DB on every `get_groups` call instead of caching the empty list. Using a `None` sentinel preserves the distinction between “not yet loaded” and “loaded but empty”:

```python
async def get_groups(self) -> list[UserGroup]:
    if self._groups is not None:
        return self._groups

    if self._database is None:
        msg = "`get_groups` can only be used when `connection` is provided on instantiation."
        raise RuntimeError(msg)

    self._groups = await get_user_groups_for(
        user_id=self.user_id,
        connection=self._database,
    )
    return self._groups
```

This avoids redundant queries for users without groups while keeping the caching behavior intact.
</issue_to_address>
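The difference shows up directly if you count fetches for a user who belongs to no groups. This is a stripped-down, hypothetical `CachedUser` with an injected fetch function in place of `get_user_groups_for` and the database handle; it only models the caching logic:

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable


class CachedUser:
    """Hypothetical reduction of the User model to its group cache."""

    def __init__(self, user_id: int, fetch_groups: Callable[[int], Awaitable[list[str]]]) -> None:
        self.user_id = user_id
        self._fetch_groups = fetch_groups  # stands in for get_user_groups_for
        self._groups: list[str] | None = None  # None means "not loaded yet"

    async def get_groups(self) -> list[str]:
        if self._groups is not None:  # [] is a valid, cached answer
            return self._groups
        self._groups = await self._fetch_groups(self.user_id)
        return self._groups


calls = 0


async def fetch_no_groups(user_id: int) -> list[str]:
    global calls
    calls += 1
    return []  # a user who belongs to no groups


async def main() -> None:
    user = CachedUser(user_id=2, fetch_groups=fetch_no_groups)
    await user.get_groups()
    await user.get_groups()


asyncio.run(main())
print(calls)  # 1; with `if self._groups:` it would be 2
```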

### Comment 2
<location path="src/routers/openml/runs.py" line_range="181-183" />
<code_context>
         setup_id=run.setup,
         setup_string=ctx.setup.setup_string if ctx.setup else None,
         parameter_setting=[
-            ParameterSetting(name=p["name"], value=p["value"], component=p["flow_id"])
+            ParameterSetting(name=p.name, value=p.value, component=p.flow_id)
             for p in ctx.parameter_rows
         ],
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Switching from mapping-style access to attribute-style access on ORM rows may be brittle.

Using `p.name` / `p.value` / `p.flow_id` relies on `UntypedRow` consistently exposing these as attributes, which can be fragile if column labels change or collide. Since this feeds public response construction, consider keeping mapping-style access, or ensure the upstream query defines explicit, stable labels so attribute names remain reliable.

```suggestion
        parameter_setting=[
            ParameterSetting(name=p["name"], value=p["value"], component=p["flow_id"])
            for p in ctx.parameter_rows
        ],
```
</issue_to_address>
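The trade-off between the two access styles can be illustrated without SQLAlchemy. SQLAlchemy's `Row` exposes columns as attributes named after the result labels and offers a mapping view through `row._mapping`; either way, the names are only as stable as the labels in the query. The stdlib sketch below builds named-tuple rows from `sqlite3` cursor metadata and pins every label with `AS`. SQLite itself documents result-column names without an `AS` clause as unspecified, which is the same brittleness the comment describes. The tables and columns are hypothetical.

```python
import sqlite3
from collections import namedtuple


def namedtuple_factory(cursor: sqlite3.Cursor, row: tuple) -> tuple:
    # Attribute names are taken verbatim from the result-set column labels,
    # so they are only as stable as the labels written in the SQL.
    fields = [column[0] for column in cursor.description]
    return namedtuple("Row", fields)(*row)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE input (id INTEGER, implementation_id INTEGER, name TEXT);
    CREATE TABLE input_setting (setup INTEGER, input_id INTEGER, value TEXT);
    INSERT INTO input VALUES (7, 42, 'max_depth');
    INSERT INTO input_setting VALUES (1, 7, '5');
    """
)
conn.row_factory = namedtuple_factory

# Explicit AS labels pin the attribute names used when building the response.
parameter_rows = conn.execute(
    """
    SELECT i.name AS name, s.value AS value, i.implementation_id AS flow_id
    FROM input_setting AS s JOIN input AS i ON s.input_id = i.id
    WHERE s.setup = ?
    """,
    (1,),
).fetchall()

for p in parameter_rows:
    print(p.name, p.value, p.flow_id)  # max_depth 5 42
    print(p._asdict()["name"])  # mapping-style access: max_depth
```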

@codecov

codecov Bot commented Jul 2, 2026


Codecov Report

❌ Patch coverage is 91.48936% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.54%. Comparing base (8adb636) to head (9541aab).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/database/users.py | 57.14% | 2 Missing and 1 partial ⚠️ |
| src/database/datasets.py | 60.00% | 1 Missing and 1 partial ⚠️ |
| src/database/setups.py | 60.00% | 1 Missing and 1 partial ⚠️ |
| src/database/tasks.py | 90.47% | 1 Missing and 1 partial ⚠️ |
| src/database/schema/tags.py | 93.33% | 1 Missing ⚠️ |
| src/database/studies.py | 85.71% | 1 Missing ⚠️ |
| src/routers/openml/study.py | 66.66% | 1 Missing ⚠️ |

❌ Your patch check has failed because the patch coverage (91.48%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #353      +/-   ##
==========================================
- Coverage   95.17%   94.54%   -0.64%     
==========================================
  Files          74       76       +2     
  Lines        3710     3775      +65     
  Branches      244      248       +4     
==========================================
+ Hits         3531     3569      +38     
- Misses        114      139      +25     
- Partials       65       67       +2     
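The headline figures can be reproduced from the raw counts in the coverage diff. A sketch, assuming Codecov computes coverage as hits / (hits + misses + partials) and rounds down to two decimals (which appears to be its default); under those assumptions it reproduces the reported 95.17%, 94.54% and -0.64%, and the 12 uncovered patch lines are the 8 "Missing" plus 4 "partial" lines in the per-file table.

```python
import math


def coverage_pct(hits: int, misses: int, partials: int) -> float:
    # Partially covered lines count as not covered.
    lines = hits + misses + partials
    return 100 * hits / lines


def round_down(value: float, places: int = 2) -> float:
    # Floor toward negative infinity at the given precision (assumed Codecov behaviour).
    scale = 10**places
    return math.floor(value * scale) / scale


base = coverage_pct(hits=3531, misses=114, partials=65)  # main (8adb636)
head = coverage_pct(hits=3569, misses=139, partials=67)  # head (9541aab)

print(round_down(base))  # 95.17
print(round_down(head))  # 94.54
print(round_down(head - base))  # -0.64

# Per-file "Missing" and "partial" counts from the patch table.
missing = [2, 1, 1, 1, 1, 1, 1]
partial = [1, 1, 1, 1, 0, 0, 0]
print(sum(missing) + sum(partial))  # 12
```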


@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/conftest.py (1)

140-150: 🎯 Functional Correctness | 🔴 Critical | ⚡ Quick win

Monkeypatched routers.dependencies.expdb_connection is never restored — leaks into later tests.

Line 140 reassigns the module attribute routers.dependencies.expdb_connection = override_expdb, but the teardown at lines 149-150 only resets app.dependency_overrides[...], not the module attribute itself. After this fixture's teardown, routers.dependencies.expdb_connection permanently points to override_expdb, which closes over a stale, already-rolled-back expdb_test connection from this test. Any later call to expdb_session() (which reads the module-level expdb_connection at call time) — in a later test or even in production code paths sharing this process — will use that dead connection.

🛠️ Proposed fix: restore the module attribute on teardown
     app.dependency_overrides[expdb_connection] = expdb_connection
     app.dependency_overrides[userdb_connection] = userdb_connection
+    routers.dependencies.expdb_connection = expdb_connection

Note: refactoring expdb_session() to take connection: Annotated[AsyncConnection, Depends(expdb_connection)] (flagged separately in src/routers/dependencies.py) would remove the need for this monkeypatch entirely.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/conftest.py` around lines 140 - 150, The test fixture leaves the
monkeypatched routers.dependencies.expdb_connection in place, so later tests can
pick up a stale override. In the fixture that sets
routers.dependencies.expdb_connection = override_expdb, add teardown to restore
the original module-level expdb_connection after the async client context exits,
alongside the existing app.dependency_overrides cleanup. Keep the fix localized
to this fixture and use the expdb_connection and override_expdb symbols to
ensure the module attribute is reset reliably.
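One way to make this teardown hard to forget is to let a context manager own the swap. A minimal sketch with the standard library's `unittest.mock.patch.object`, which restores the original attribute when the `with` block exits, including on errors. The `dependencies` module and both connection functions are hypothetical stand-ins for `routers.dependencies`, `expdb_connection` and `override_expdb`:

```python
import types
from collections.abc import Iterator
from unittest.mock import patch

# Hypothetical stand-in for the routers.dependencies module.
dependencies = types.ModuleType("dependencies")


def expdb_connection() -> str:
    return "real connection"


def override_expdb() -> str:
    return "test connection"


dependencies.expdb_connection = expdb_connection


def client_fixture() -> Iterator[str]:
    # patch.object swaps the module attribute and puts the original back
    # when the with-block exits, whether the test passes or raises.
    with patch.object(dependencies, "expdb_connection", override_expdb):
        yield dependencies.expdb_connection()


fixture = client_fixture()
print(next(fixture))  # test connection
fixture.close()  # GeneratorExit unwinds the with-block, which restores the attribute
print(dependencies.expdb_connection())  # real connection
```

pytest's built-in `monkeypatch` fixture (`monkeypatch.setattr(...)`) gives the same undo-at-teardown guarantee without an explicit `with` block.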
🧹 Nitpick comments (4)
src/routers/openml/datasets.py (2)

335-352: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

_get_dataset_raise_otherwise still typed as Row[Any] instead of UntypedRow.

The rest of this cohort (e.g. study.py's _get_study_raise_otherwise) migrated equivalent helpers to the new UntypedRow alias, but this helper's return type annotation is left as Row[Any].

♻️ Suggested fix
+from database.schema.base import UntypedRow
+
 async def _get_dataset_raise_otherwise(
     dataset_id: Identifier,
     user: User | None,
     expdb: AsyncConnection,
-) -> Row[Any]:
+) -> UntypedRow:
🤖 Prompt for AI Agents

In `@src/routers/openml/datasets.py` around lines 335 - 352, Update
`_get_dataset_raise_otherwise` to match the cohort’s new typing by replacing the
`Row[Any]` return annotation with `UntypedRow`. Keep the function logic
unchanged, and ensure any related imports in `datasets.py` are adjusted so the
helper aligns with `_get_study_raise_otherwise` and other migrated helpers.

356-362: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Stale assert ... is not None left over from optional-dependency era.

expdb/user_db/expdb_db are now required Depends(...) parameters (no = None default), so these asserts can never fail and are dead code. list_datasets above already had its analogous assert removed when the parameter became required; these two spots were missed.

🧹 Suggested cleanup
 async def get_dataset_features(
     dataset_id: Identifier,
     expdb: Annotated[AsyncConnection, Depends(expdb_connection)],
     user: Annotated[User | None, Depends(fetch_user)] = None,
 ) -> list[Feature]:
-    assert expdb is not None  # noqa: S101
     await _get_dataset_raise_otherwise(dataset_id, user, expdb)
 async def get_dataset(
     dataset_id: Identifier,
     user_db: Annotated[AsyncConnection, Depends(userdb_connection)],
     expdb_db: Annotated[AsyncConnection, Depends(expdb_connection)],
     user: Annotated[User | None, Depends(fetch_user)] = None,
 ) -> DatasetMetadata:
-    assert user_db is not None  # noqa: S101
-    assert expdb_db is not None  # noqa: S101
     dataset = await _get_dataset_raise_otherwise(dataset_id, user, expdb_db)

Also applies to: 453-461

🤖 Prompt for AI Agents

In `@src/routers/openml/datasets.py` around lines 356 - 362, Remove the stale
non-null assertion in get_dataset_features, since expdb is now a required
Depends parameter and the assert can never fail. Apply the same cleanup to the
analogous required-dependency check in the other affected router function
mentioned in the review (the one using user_db/expdb_db), following the pattern
already used in list_datasets.
src/routers/openml/study.py (1)

66-72: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Stale assert expdb is not None left over from optional-dependency era.

expdb is now a required Depends(expdb_connection) parameter in all three endpoints (no = None default), so these asserts are unreachable dead code. Same pattern flagged in src/routers/openml/datasets.py.

🧹 Suggested cleanup
 async def attach_to_study(
     ...
     expdb: Annotated[AsyncConnection, Depends(expdb_connection)],
 ) -> AttachDetachResponse:
-    assert expdb is not None  # noqa: S101
     if user is None:
 async def create_study(
     ...
     expdb: Annotated[AsyncConnection, Depends(expdb_connection)],
 ) -> dict[Literal["study_id"], int]:
-    assert expdb is not None  # noqa: S101
     if study.main_entity_type == StudyType.RUN and study.tasks:
 async def get_study(
     alias_or_id: Identifier | str,
     expdb: Annotated[AsyncConnection, Depends(expdb_connection)],
     user: Annotated[User | None, Depends(fetch_user)] = None,
 ) -> Study:
-    assert expdb is not None  # noqa: S101
     study = await _get_study_raise_otherwise(alias_or_id, user, expdb)

Also applies to: 121-126, 158-163

🤖 Prompt for AI Agents

In `@src/routers/openml/study.py` around lines 66 - 72, Remove the stale `assert
expdb is not None` checks from `attach_to_study` and the other OpenML endpoints
using the same `expdb_connection` dependency pattern, since `expdb` is now
always required and the asserts are dead code. Keep the
`Depends(expdb_connection)` parameter on each handler and clean up the redundant
null-checks in the affected `study.py` and matching `datasets.py` endpoint
functions.
src/database/datasets.py (1)

113-121: 🩺 Stability & Availability | 🔵 Trivial | 💤 Low value

Good defensive fix for e.orig is None.

Prevents an AttributeError when unpacking `e.orig.args` if `orig` is unexpectedly `None`. Note that the same `code, msg = e.orig.args` unpacking pattern is likely present in `flows.tag`, `runs.tag`, and `setups.tag` (per graph references); consider applying the same guard there for consistency if it is not already covered.

🤖 Prompt for AI Agents

In `@src/database/datasets.py` around lines 113 - 121, Add the same defensive null
check used in datasets.py before unpacking `e.orig.args` in the other
IntegrityError handlers mentioned by the graph references, especially in
`flows.tag`, `runs.tag`, and `setups.tag`. Locate the matching exception blocks
that do `code, msg = e.orig.args`, guard against `e.orig is None` there as well,
and keep the existing ForeignKeyConstraintError/DuplicatePrimaryKeyError mapping
unchanged.
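The guard matters mostly for type checking: in SQLAlchemy 2.x, `orig` (inherited from `StatementError`) is annotated as `Optional[BaseException]`, which is presumably why the check appeared once sqlalchemy was added to the mypy hook. Below is a stdlib sketch of the guarded translation. The exception classes are hypothetical stand-ins, and the MySQL codes 1062 (`ER_DUP_ENTRY`) and 1452 (`ER_NO_REFERENCED_ROW_2`) are an assumption about what the driver reports in `orig.args`:

```python
from __future__ import annotations


class DuplicatePrimaryKeyError(Exception):
    """Hypothetical stand-in for the project's duplicate-key error."""


class ForeignKeyConstraintError(Exception):
    """Hypothetical stand-in for the project's foreign-key error."""


class IntegrityErrorStandIn(Exception):
    """Mimics sqlalchemy.exc.IntegrityError, whose `.orig` holds the DBAPI error or None."""

    def __init__(self, orig: BaseException | None) -> None:
        super().__init__("integrity error")
        self.orig = orig


# MySQL server error codes (assumed to arrive as orig.args[0]).
ER_DUP_ENTRY = 1062
ER_NO_REFERENCED_ROW_2 = 1452


def translate(e: IntegrityErrorStandIn) -> Exception:
    """Map a driver-level integrity error to a domain error, or return it unchanged."""
    if e.orig is None:
        # Nothing to inspect; the caller re-raises the original error.
        return e
    code, msg = e.orig.args
    if code == ER_DUP_ENTRY:
        return DuplicatePrimaryKeyError(msg)
    if code == ER_NO_REFERENCED_ROW_2:
        return ForeignKeyConstraintError(msg)
    return e


dup = IntegrityErrorStandIn(Exception(ER_DUP_ENTRY, "Duplicate entry"))
print(type(translate(dup)).__name__)  # DuplicatePrimaryKeyError
```

A test can hit the `None` branch simply by constructing the error with `orig=None`, which is the kind of mock the author mentions below.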
🤖 Prompt for all review comments with AI agents

Inline comments:
In `@src/database/users.py`:
- Around line 139-141: The caching check in the user groups lookup is using a
truthy test, which causes an already-cached empty list to be treated as missing.
Update the groups accessor in the user model to use an explicit None check on
the cached field (for example, in the method that returns self._groups) so an
empty list is returned from cache instead of re-fetching or raising again.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0780e173-dcec-4820-935c-f5780119637f

📥 Commits

Reviewing files that changed from the base of the PR and between 8adb636 and 24bf030.

📒 Files selected for processing (23)
  • .pre-commit-config.yaml
  • src/core/formatting.py
  • src/database/datasets.py
  • src/database/evaluations.py
  • src/database/flows.py
  • src/database/runs.py
  • src/database/schema/__init__.py
  • src/database/schema/base.py
  • src/database/schema/tags.py
  • src/database/setups.py
  • src/database/studies.py
  • src/database/tasks.py
  • src/database/users.py
  • src/main.py
  • src/routers/dependencies.py
  • src/routers/openml/datasets.py
  • src/routers/openml/runs.py
  • src/routers/openml/study.py
  • src/routers/openml/tasks.py
  • tests/conftest.py
  • tests/routers/openml/runs_get_test.py
  • tests/routers/openml/task_tag_test.py
  • tests/users.py

@PGijsbers

Contributor Author

Code coverage decrease explained by:

  • introducing a check for a missing underlying exception (`e.orig is None`). Other than forcing it with a mock, I wouldn't even know in what scenario it would occur.
  • some uncovered lines of code that were moved
  • some files becoming smaller (decreasing % of file but having stable number of lines covered)

@PGijsbers PGijsbers merged commit eb59ffd into main Jul 2, 2026
6 of 9 checks passed
@PGijsbers PGijsbers deleted the introduce-orm-for-tag branch July 2, 2026 06:58