diff --git a/.github/agents/architecture-domain-specialist.md b/.github/agents/architecture-domain-specialist.md index 8bb7d5bf8b..33b4570296 100644 --- a/.github/agents/architecture-domain-specialist.md +++ b/.github/agents/architecture-domain-specialist.md @@ -131,6 +131,7 @@ Marshmallow schemas define the canonical format for parameter dictionaries. All - [ ] **Dictionary access**: Verify code uses dict keys from `data_key`, not Python attributes - [ ] **Parameter modification**: Check `pop()`, `del`, assignment operations use correct keys - [ ] **Storage consistency**: Ensure DataSource.attributes, job.meta use schema format +- [ ] **Schema parity**: When adding a filter/parameter to `Sensor.search_beliefs`, verify it is added to BOTH `Input` (io.py) AND `BeliefsSearchConfigSchema` (reporting/__init__.py). These two schemas serve overlapping purposes but are distinct classes — omitting one creates a gap where documented features silently fail at schema validation time. **Domain Pattern: Schema Format Migrations** @@ -588,3 +589,18 @@ After each assignment: Change: - Added guidance on ``` + +### Lessons Learned + +**Session 2026-03-24 (PR #2058 — add account_id to DataSource)**: + +- **New domain invariant**: User-type DataSources now have `account_id` populated. Document new FK relationships and invariants in the Domain Knowledge section immediately. +- **Migration checklist**: Added an explicit Alembic migration checklist after reviewing the migration for this PR. Key patterns: correlated subquery for bulk backfill, SQLAlchemy Core stubs (no ORM imports), `batch_alter_table` for all ALTER operations, exact constraint name matching. +- **Missed API Specialist coordination**: The PR changed endpoint behavior (POST sensor data sets account_id on the created data source). The API Specialist should have been engaged to verify backward compatibility. When domain model changes affect how endpoints behave or what they return, flag for API Specialist review.
+- **Self-improvement failure**: Despite having explicit self-improvement requirements, no agent updated its instructions during this PR session. This was caught by the Coordinator post-hoc. The agent must update its own instructions as the LAST step of every assignment, not skip it. + +**Session 2026-04 (PR #2065 — add account_id filter to search_beliefs)**: + +- **Schema parity gap**: The PR added `account_id` to `BeliefsSearchConfigSchema` but not to `Input` (io.py). These two schemas both expose `Sensor.search_beliefs` parameters; omitting a parameter from one creates a silent gap. The architecture agent must check both schemas on any search_beliefs parameter addition. +- **Documentation vs. implementation mismatch**: The `reporting.rst` docs stated reporters can filter by `account_id`, but this only works if `Input` also has the field. Docs that outrun schema support mislead users. Always verify the full schema chain before documenting a feature. +- **DataSource account_id=None for non-user sources**: The existing invariant (reporters/schedulers/forecasters have `account_id=None`) limits the usefulness of `account_id` filtering: it only matches user-type sources. PRs adding `account_id` filters should either document this limitation explicitly or reconsider the invariant.
diff --git a/.github/agents/coordinator.md b/.github/agents/coordinator.md index b287670ab4..bdfe744c62 100644 --- a/.github/agents/coordinator.md +++ b/.github/agents/coordinator.md @@ -366,30 +366,30 @@ Agents should escalate to the Coordinator when: - Encourage agent autonomy and expertise - Provide actionable feedback via review comments -### Review Lead Delegation Pattern Monitoring +### Lead Delegation Pattern Monitoring -**The Coordinator MUST verify Review Lead delegation patterns during governance reviews.** +**The Coordinator MUST verify Lead delegation patterns during governance reviews.** -**Context:** Review Lead has a recurring failure mode of working solo instead of delegating to specialists (observed in session 2026-02-08). +**Context:** Lead has a recurring failure mode of working solo instead of delegating to specialists (observed in session 2026-02-08). **What to check:** -When reviewing a session where Review Lead was involved: +When reviewing a session where Lead was involved: -- [ ] **Delegation occurred**: Did Review Lead invoke appropriate specialists? -- [ ] **No solo execution**: Did Review Lead make code/API/docs changes itself? -- [ ] **Git commit author check**: Are there Review Lead commits with production code? -- [ ] **Request interpretation**: Did Review Lead parse user intent correctly? +- [ ] **Delegation occurred**: Did Lead invoke appropriate specialists? +- [ ] **No solo execution**: Did Lead make code/API/docs changes itself? +- [ ] **Git commit author check**: Are there Lead commits with production code? +- [ ] **Request interpretation**: Did Lead parse user intent correctly? - [ ] **Regression indicators**: Any signs of "too simple to delegate" thinking? 
**Red flags (immediate governance concern):** -- 🚩 Review Lead commits containing code changes (should be specialist commits) -- 🚩 Review Lead commits containing test changes (should be Test Specialist) -- 🚩 Review Lead commits containing doc changes (should be Documentation Specialist) +- 🚩 Lead commits containing code changes (should be specialist commits) +- 🚩 Lead commits containing test changes (should be Test Specialist) +- 🚩 Lead commits containing doc changes (should be Documentation Specialist) - 🚩 User says "You are regressing" or "You must handle requests as a team" - 🚩 Session closed without specialist involvement on implementation tasks -- 🚩 Review Lead justifies solo work with "too simple to delegate" +- 🚩 Lead justifies solo work with "too simple to delegate" **Verification commands:** @@ -397,27 +397,27 @@ When reviewing a session where Review Lead was involved: # Check who made commits git log --oneline --all --since="1 day ago" --format="%h %an %s" -# Check Review Lead commit types -git log --author="Review Lead" --oneline -10 +# Check Lead commit types +git log --author="Lead" --oneline -10 -# Look for code changes by Review Lead (should be empty or synthesis only) -git log --author="Review Lead" --stat -5 +# Look for code changes by Lead (should be empty or synthesis only) +git log --author="Lead" --stat -5 ``` **When delegation failure detected:** 1. **Document in session review** - What was the failure? -2. **Check Review Lead instructions** - Were they followed? +2. **Check Lead instructions** - Were they followed? 3. **Identify gap** - What prevented proper delegation? 4. **Recommend fix** - How to prevent recurrence? -5. **Update Review Lead instructions** - Add enforcement mechanism +5. **Update Lead instructions** - Add enforcement mechanism 6.
**Verify fix works** - Test with hypothetical scenario **Escalation pattern:** -If Review Lead repeatedly violates delegation requirements: +If Lead repeatedly violates delegation requirements: - This is a systemic issue requiring Coordinator intervention -- Review Lead instructions need stronger enforcement +- Lead instructions need stronger enforcement - Consider adding mandatory checkpoints before work execution - May need explicit blockers to prevent solo execution @@ -425,20 +425,20 @@ If Review Lead repeatedly violates delegation requirements: | Pattern | Indicator | Action | |---------|-----------|--------| -| Solo execution | Review Lead makes code commits | Flag as regression | -| "Too simple" trap | Review Lead justifies not delegating | Update instructions with example | -| Request misinterpretation | Review Lead confirms instead of implements | Strengthen request parsing guidance | +| Solo execution | Lead makes code commits | Flag as regression | +| "Too simple" trap | Lead justifies not delegating | Update instructions with example | +| Request misinterpretation | Lead confirms instead of implements | Strengthen request parsing guidance | | Delegation omission | Specialists not invoked on implementation | Verify Session Close Checklist followed | **Success indicators:** -- ✅ Review Lead invoked appropriate specialists +- ✅ Lead invoked appropriate specialists - ✅ Specialists made the actual changes -- ✅ Review Lead synthesized findings +- ✅ Lead synthesized findings - ✅ Team-based execution pattern maintained - ✅ Session Close Checklist verified delegation -**This monitoring ensures Review Lead maintains its orchestration role and doesn't regress to solo execution.** +**This monitoring ensures Lead maintains its orchestration role and doesn't regress to solo execution.** ## Self-Improvement Notes @@ -625,25 +625,25 @@ Lead should now invoke Coordinator as subagent.
- Governance process shown to be optional (dangerous precedent) **Solution implemented**: -1. ✅ Added mandatory "Session Close Checklist" to Review Lead (commit 3ad8908) +1. ✅ Added mandatory "Session Close Checklist" to Lead (commit 3ad8908) 2. ✅ Added "Full Test Suite Requirement" to Test Specialist (commit 8d67f3c) 3. ✅ Added "Pre-commit Hook Enforcement" to Tooling & CI Specialist (commit dfe67e8) 4. ✅ Added "Session Close Verification" pattern to Coordinator (this commit) **Structural changes**: -- Review Lead now has comprehensive checklist before closing any session +- Lead now has comprehensive checklist before closing any session - Test Specialist must execute full suite, not just feature-specific tests - Tooling & CI Specialist must verify pre-commit execution -- Coordinator enforces Review Lead checklist completion +- Coordinator enforces Lead checklist completion **New Coordinator pattern (Pattern #7)**: When invoked for governance review, Coordinator must verify: -- [ ] Review Lead followed session close checklist +- [ ] Lead followed session close checklist - [ ] No checklist items were skipped without justification - [ ] Evidence provided for each checklist item **Enforcement escalation**: -If Review Lead repeatedly closes sessions without completing checklist: +If Lead repeatedly closes sessions without completing checklist: 1. First occurrence: Document and update instructions (this session) 2. Second occurrence: Require explicit justification for skips 3. Third occurrence: Escalate to architectural solution (automated checks) @@ -658,11 +658,63 @@ If Review Lead repeatedly closes sessions without completing checklist: These patterns must not repeat. Agent instructions have been updated to prevent recurrence.
+### Additional Pattern Discovered (2026-03-24) + +**Pattern**: Persistent self-improvement failure and missing API Specialist agent selection + +**Session**: PR #2058 — Add `account_id` to DataSource table + +**Observation**: After three sessions now, the same two failures recur: +1. Coordinator is not invoked at end of session (despite MUST requirement in Lead instructions) +2. No agent updates its own instructions (despite MUST requirement in all agents) + +**Root cause analysis**: +- "Coordinator invocation" and "self-improvement" are both documented as mandatory last steps +- But the session ends before they are reached — they are treated as optional epilogue, not gating requirements +- The Lead agent selection is ad-hoc, with no explicit checklist forcing API Specialist engagement when endpoints change + +**What was missed in PR #2058**: +- API Specialist not engaged: POST sensor data now sets `account_id` on the resulting data source — this is an endpoint behavior change that should be reviewed for backward compatibility +- Zero agent instruction updates across all three participating agents (Architecture Specialist, Test Specialist, Lead) +- No Coordinator invocation despite explicit user request in the original prompt + +**Solutions implemented**: +- Architecture Specialist: Added Alembic migration checklist + DataSource domain invariants +- Test Specialist: Added DataSource property testing pattern + lessons learned +- Lead: Added Agent Selection Checklist mapping code change types to required agents; documented 3rd recurrence of these failures +- Coordinator (this file): Documented case study + +**Governance escalation**: The Lead's "Must Always Run Coordinator" requirement has now been documented in three sessions without being followed. If it fails a fourth time, consider structural changes — e.g., making Coordinator invocation the FIRST step of a session rather than the last, so it sets context rather than being a forgotten epilogue.
+ +**Code observation from PR #2058 worth tracking**: +- An early draft used `user.account_id or (user.account.id if user.account else None)` — the `or` pattern is fragile for `account_id=0` (unrealistic but worth noting). The final implementation correctly uses `if user.account_id is not None` (see `data_sources.py` lines 340-343) — this is the right pattern to follow. +- Empty "Initial plan" commit adds git history noise. When orchestrating agents, the first commit should be functional code, not a planning marker. + +### Additional Pattern Discovered (2026-03-25) + +**Pattern**: No-FK columns for data lineage preservation + +**Session**: PR #2058 continued — Drop FK constraints on `data_source.user_id` and `data_source.account_id` + +**Design decision documented**: +FlexMeasures now intentionally drops DB-level FK constraints on `DataSource.user_id` and `DataSource.account_id` so that historical lineage references survive user/account deletion. The ORM uses `passive_deletes="all"` to prevent auto-nullification.
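The lineage-preservation design above can be sketched with minimal stand-in models (these are not the actual FlexMeasures models): the lineage column carries no ForeignKey, and `passive_deletes="all"` tells SQLAlchemy not to null it out when the parent row is deleted:

```python
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class User(Base):
    __tablename__ = "user"
    id = Column(Integer, primary_key=True)
    # passive_deletes="all": on user deletion, the ORM must NOT touch
    # data_source.user_id (no nullification, no cascade)
    data_sources = relationship(
        "DataSource",
        primaryjoin="foreign(DataSource.user_id) == User.id",
        passive_deletes="all",
    )

class DataSource(Base):
    __tablename__ = "data_source"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    user_id = Column(Integer)  # lineage column: deliberately no ForeignKey

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add_all([User(id=7), DataSource(id=1, name="meter", user_id=7)])
    session.commit()
    session.delete(session.get(User, 7))
    session.commit()
    # Lineage survives: the orphaned user_id keeps its historical value
    kept_user_id = session.execute(select(DataSource.user_id)).scalar_one()
```

Without `passive_deletes="all"`, the default one-to-many behavior would null out `user_id` on delete, which is exactly the auto-nullification this design avoids.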
+ +**Checklist implication for future PRs**: +When reviewing schema changes that affect FK constraints: +- [ ] If a FK is dropped intentionally for lineage: verify `passive_deletes="all"` on the ORM relationship AND its backref +- [ ] Verify tests check that the orphaned column values are NOT nullified after parent deletion +- [ ] Verify changelog describes the *behavior change* (lineage preservation), not just the schema change (column added) + +**Changelog completeness check** — lessons from this session: +- The initial changelog entry for PR #2058 only described adding `account_id`; it omitted the FK drop and behavior change +- When a migration both adds a column AND changes deletion semantics (e.g., drops a FK), the changelog must cover BOTH aspects +- Coordinator caught this and updated the entry to read: "...also drop FK constraints on `data_source.user_id` and `data_source.account_id` to preserve data lineage (historical user/account IDs are no longer nullified when users or accounts are deleted)" + ### Session 2026-02-10: Annotation API Implementation (#470) **Pattern**: Systemic self-improvement failure across all agents -**Observation**: Five agents completed substantial work (Architecture, API, Test, Documentation, Review Lead): +**Observation**: Five agents completed substantial work (Architecture, API, Test, Documentation, Lead): - Created new API endpoints (3 POST endpoints) - Wrote 17 comprehensive test functions - Created 494-line feature guide documentation @@ -679,17 +731,17 @@ These patterns must not repeat. Agent instructions have been updated to prevent **Root causes identified**: 1. **Self-improvement not enforced**: No blocking requirement, agents treat as optional 2. **Unclear triggers**: Agents don't know when to update instructions ("after completing work" too vague) -3. **No verification**: Review Lead doesn't check if agents self-improved +3. **No verification**: Lead doesn't check if agents self-improved 4.
**Invisible requirement**: Self-improvement not in task completion checklist **Secondary violations observed**: - Temporary file committed (`API_REVIEW_ANNOTATIONS.md`, 575 lines) then removed - Non-atomic commits mixing multiple concerns - Test claims without execution evidence -- Review Lead didn't invoke Coordinator despite governance request +- Lead didn't invoke Coordinator despite governance request **Solution implemented**: -1. Added self-improvement enforcement to Review Lead checklist (see below) +1. Added self-improvement enforcement to Lead checklist (see below) 2. Documented temporary file prevention patterns 3. Added test execution evidence requirement 4. Strengthened Coordinator invocation triggers @@ -702,14 +754,14 @@ These patterns must not repeat. Agent instructions have been updated to prevent **Future sessions**: Monitor whether self-improvement enforcement works. If pattern recurs 3+ times, escalate to architectural solution (e.g., automated checks, mandatory prompts). -**Session 2026-02-10 (Annotation API Tests)**: Pattern recurred despite Review Lead update. Test Specialist fixed 32 annotation API tests (100% passing), made excellent technical commits, but did NOT update instructions with learned lessons (permission semantics, fixture selection, error code expectations). Review Lead enforcement unclear—may not have been involved in session. **Status**: Pattern persists. Approaching threshold for architectural solution. +**Session 2026-02-10 (Annotation API Tests)**: Pattern recurred despite Lead update. Test Specialist fixed 32 annotation API tests (100% passing), made excellent technical commits, but did NOT update instructions with learned lessons (permission semantics, fixture selection, error code expectations). Lead enforcement unclear—may not have been involved in session. **Status**: Pattern persists. Approaching threshold for architectural solution.
### Enforcement Mechanism Added -**New requirement for Review Lead**: Before marking task complete, verify: +**New requirement for Lead**: Before marking task complete, verify: ```markdown -## Task Completion Checklist (Review Lead) +## Task Completion Checklist (Lead) - [ ] Code review completed and feedback addressed - [ ] Security scan completed and alerts investigated @@ -719,10 +771,31 @@ These patterns must not repeat. Agent instructions have been updated to prevent - [ ] No temporary analysis files committed ``` -If any agent hasn't self-improved, Review Lead must: +If any agent hasn't self-improved, Lead must: 1. Request agent update their instructions 2. Wait for update 3. Review update for quality 4. Then mark task complete **This makes self-improvement blocking, not optional.** + +### Pattern: Schema Surface Parity (2026-04 — PR #2065) + +**Context**: PR #2065 adds `account_id` as a filter parameter to `Sensor.search_beliefs` / `TimedBelief.search` / `GenericAsset.search_beliefs`. + +**Gap discovered**: `account_id` was added to `BeliefsSearchConfigSchema` (used in `StatusSchema.staleness_search`) but NOT to `Input` in `flexmeasures/data/schemas/io.py` (used by reporters like `PandasReporter` and `AggregatorReporter` for their input parameter lists). The `reporting.rst` documentation stated reporters can filter by `account_id`, but users attempting to do so would receive a Marshmallow `ValidationError`. + +**Root cause**: Two Marshmallow schemas expose `Sensor.search_beliefs` parameters to users, but they are distinct classes maintained separately: +- `Input` (io.py): used for reporter/forecaster `input` parameter list +- `BeliefsSearchConfigSchema` (reporting/__init__.py): used for sensor status config + +**Coordinator rule added**: When any PR adds a parameter to `Sensor.search_beliefs`, the Coordinator should check that **all schemas exposing search_beliefs parameters** receive the same addition.
Current list: +- `flexmeasures/data/schemas/io.py` → `Input` (reporter parameters) +- `flexmeasures/data/schemas/reporting/__init__.py` → `BeliefsSearchConfigSchema` (status config) + +**Agent coordination implication**: +- Architecture Specialist owns the schema parity checklist (added to its review checklist) +- Documentation Specialist must verify documentation examples are exercisable via the actual available schemas +- Test Specialist must cover all model classes that receive the new parameter + +**Additional gap**: `account_id` for non-user DataSources (reporters, schedulers, forecasters) remains `None`. The filter therefore only matches user-type sources. This architectural constraint (documented in Architecture Specialist) limits the feature's utility and should be prominently noted in documentation whenever `account_id` filtering is described. diff --git a/.github/agents/test-specialist.md b/.github/agents/test-specialist.md index 05d048758c..00ef4f5bbe 100644 --- a/.github/agents/test-specialist.md +++ b/.github/agents/test-specialist.md @@ -118,11 +118,11 @@ If ANY test fails during full suite execution: **Click context errors**: - Check IdField decorators (`@with_appcontext` vs `@with_appcontext_if_needed()`) - Compare against SensorIdField pattern -- See Review Lead's Click context error pattern +- See Lead's Click context error pattern -### Integration with Review Lead +### Integration with Lead -The Test Specialist MUST provide evidence of full test suite execution to Review Lead. +The Test Specialist MUST provide evidence of full test suite execution to Lead.
**Required evidence format:** ``` @@ -134,14 +134,14 @@ Full test suite execution: - Coverage: 87.2% (unchanged) ``` -**Review Lead verification:** -Review Lead's session close checklist includes: +**Lead verification:** +Lead's session close checklist includes: - [ ] Test Specialist confirmed full test suite execution - [ ] All tests pass (100%) - [ ] Test output captured and reviewed **Enforcement:** -Review Lead cannot close session until Test Specialist provides evidence of full test suite execution with 100% pass rate. +Lead cannot close session until Test Specialist provides evidence of full test suite execution with 100% pass rate. ## Testing Patterns for flexmeasures @@ -880,6 +880,30 @@ flask db current - **Assertions**: Use descriptive assertion messages for failures - **Mocking**: Use pytest fixtures and mocking when testing external dependencies +### Testing DataSource Properties After API Calls + +When writing tests that verify data source properties (e.g. `account_id`, `user`, `type`) after an API call: + +1. **Use `fresh_db` fixture** — tests that POST data and then query the resulting data source are modifying the DB and must use the function-scoped `fresh_db` fixture. Place these tests in a `_fresh_db` module. + +2. **Query by user, not just name** — data sources created by the same user across test runs may collide; use `filter_by(user=user)` or `filter_by(user_id=user.id)` for precision. + +3. **Pattern** (from `test_post_sensor_data_sets_account_id_on_data_source`): + ```python + # Fetch the user that made the request + user = db.session.execute( + select(User).filter_by(email="test_supplier_user_4@seita.nl") + ).scalar_one() + # Fetch the data source created for that user + data_source = db.session.execute( + select(Source).filter_by(user=user) + ).scalar_one_or_none() + assert data_source is not None + assert data_source.account_id == user.account_id + ``` + +4.
**Check both existence and value** — don't just assert `data_source is not None`; also assert the specific field value you're testing. + ## Understanding Test Design Intent (CRITICAL) **Before changing a test, understand WHY it's designed that way.** @@ -1060,14 +1084,14 @@ After each assignment: - Added guidance on ``` -Example: +### Lessons Learned -``` -agents/test-specialist: learned to verify claims with actual test runs -Context: -- Session #456 claimed tests passed but they were never actually run -- Led to bug slipping through to production -Change: -- Added "Actually Run Tests" section with verification steps -- Emphasized checking test output and coverage -``` \ No newline at end of file +**Session 2026-03-24 (PR #2058 — add account_id to DataSource)**: + +- **Self-improvement failure**: Despite having explicit instructions to update this agent file after each assignment, no update was made during this PR session. This was caught by the Coordinator post-hoc. The agent must treat instruction updates as the LAST mandatory step of any assignment. +- **DataSource property testing**: Added guidance in "Testing DataSource Properties After API Calls" above. When testing properties set by the API on a data source (like `account_id`), use `fresh_db`, query by user to avoid ambiguity, and assert both existence and the specific field value. + +**Session 2026-04 (PR #2065 — add account_id filter to search_beliefs)**: + +- **Model layer coverage gap**: The PR added `account_id` parameter to three model methods: `Sensor.search_beliefs`, `TimedBelief.search`, and `GenericAsset.search_beliefs`. Tests covered only the first two. When a parameter is added to multiple model-layer methods, each method must be tested independently — especially when they use different code paths (e.g., `GenericAsset.search_beliefs` delegates through `Sensor.search_beliefs` but may not if the delegation chain changes).
Add a checklist item: "When a filter/param is added to `search_beliefs` across multiple models, ensure at least one test exercises each model class." +- **Empty-list edge case**: `account_id=[]` is schema-valid but semantically means "match nothing" (SQL `IN ()` returns zero rows). Tests should include this edge case and assert the expected empty result rather than letting it pass as a valid no-op. diff --git a/.github/agents/tooling-ci-specialist.md b/.github/agents/tooling-ci-specialist.md index fb9d178a2f..b30995e71c 100644 --- a/.github/agents/tooling-ci-specialist.md +++ b/.github/agents/tooling-ci-specialist.md @@ -163,12 +163,12 @@ Code that bypasses pre-commit: **Who runs pre-commit:** - **During code changes**: Agent making changes runs pre-commit before committing -- **Before PR close**: Review Lead verifies pre-commit execution +- **Before PR close**: Lead verifies pre-commit execution - **In PR review**: Tooling & CI Specialist validates config matches CI **Enforcement:** -- Review Lead's session close checklist includes pre-commit verification -- Review Lead cannot close session without pre-commit evidence +- Lead's session close checklist includes pre-commit verification +- Lead cannot close session without pre-commit evidence - If pre-commit fails, agent must fix all issues before proceeding #### Common Failures and Fixes @@ -206,9 +206,9 @@ black . ci/run_mypy.sh ``` -#### Integration with Review Lead +#### Integration with Lead -**Review Lead checklist items:** +**Lead checklist items:** - [ ] Pre-commit hooks installed - [ ] All hooks pass: `pre-commit run --all-files` - [ ] Zero failures from flake8, black, mypy @@ -219,7 +219,7 @@ ci/run_mypy.sh - Or confirm: "Pre-commit verified: all hooks passed" **Enforcement:** -Review Lead MUST verify pre-commit execution before closing session. +Lead MUST verify pre-commit execution before closing session. 
### Agent Environment Setup diff --git a/AGENTS.md b/AGENTS.md index 57034f25fa..540d6fa319 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -98,7 +98,7 @@ This avoids "agent spam" and ensures unified results. ## Quick Navigation for Critical Sections -**Before starting ANY session, Review Lead MUST consult:** +**Before starting ANY session, Lead MUST consult:** 1. **Parse user intent** → Section 1.1 (Request Interpretation) 2. **Check delegation requirements** → Section 2.1 (Mandatory Delegation Triggers) @@ -139,7 +139,7 @@ The Lead: ### 1.1. Parse User Intent (FIRST STEP - ALWAYS DO THIS) -**Before selecting agents or doing ANY work, Review Lead MUST verify understanding.** +**Before selecting agents or doing ANY work, Lead MUST verify understanding.** This prevents misinterpreting requests and working on the wrong thing. @@ -191,11 +191,11 @@ Or do you want me to [Y] instead?" User: "migrate endpoints to /api/v3_0/accounts//annotations" ❌ **Wrong interpretation:** User wants confirmation of their migration -→ Review Lead confirms work, doesn't do migration +→ Lead confirms work, doesn't do migration → User: "That was rather useless... you basically ignored my request" ✅ **Correct interpretation:** "migrate" = implementation verb = action request -→ Review Lead delegates to specialists to DO the migration +→ Lead delegates to specialists to DO the migration → Test Specialist, API Specialist, Documentation Specialist all participate * * * @@ -226,9 +226,9 @@ Notably: ### 2.1. Delegation Requirements (NON-NEGOTIABLE) -**The Review Lead MUST NEVER work alone on implementation tasks.** +**The Lead MUST NEVER work alone on implementation tasks.** -This is the most critical anti-pattern to avoid: Review Lead working solo instead of delegating. +This is the most critical anti-pattern to avoid: Lead working solo instead of delegating.
**Mandatory Delegation Triggers:** @@ -259,9 +259,9 @@ This is the most critical anti-pattern to avoid: Review Lead working solo instea - ✅ ALL endpoint changes → Test + API + Documentation Specialists - ✅ ALL agent/process changes → Coordinator governance -**Review Lead's role in implementation:** +**Lead's role in implementation:** -The Review Lead: +The Lead: - ✅ Orchestrates specialists - ✅ Synthesizes their findings - ✅ Manages coordination @@ -284,13 +284,13 @@ Correct pattern: - [ ] Test Specialist made code changes and verified tests ✅ - [ ] API Specialist reviewed backward compatibility ✅ - [ ] Documentation Specialist updated docs ✅ -- [ ] Review Lead synthesized findings ✅ +- [ ] Lead synthesized findings ✅ **Example from session 2026-02-08 (failure):** User: "migrate endpoints to /api/v3_0/accounts//annotations" -❌ **What Review Lead did:** +❌ **What Lead did:** - Migrated AccountAPI, AssetAPI, SensorAPI endpoints ALONE - Updated test URLs ALONE - Ran pre-commit hooks ALONE @@ -299,7 +299,7 @@ User: "migrate endpoints to /api/v3_0/accounts//annotations" ❌ **Result:** User: "You are regressing. You must handle my requests as a team" -✅ **What Review Lead should have done:** +✅ **What Lead should have done:** ```python # Delegate to Test Specialist task(agent_type="test-specialist", @@ -453,9 +453,9 @@ The coordinator will: ### Must Enforce Agent Self-Improvement (CRITICAL) -**The Review Lead MUST ensure all participating agents update their instructions.** +**The Lead MUST ensure all participating agents update their instructions.** -This is the Review Lead's responsibility from the coordinator's governance review. When agents complete work, the Review Lead must: +This is the Lead's responsibility from the coordinator's governance review. When agents complete work, the Lead must: #### 1.
Identify Participating Agents After work is complete, identify which agents contributed: @@ -516,11 +516,11 @@ If an agent doesn't update or provides insufficient update: - Instructions stay current and relevant **Example from Session 2026-02-10:** -- 5 agents participated (Architecture, API, Test, Documentation, Review Lead) -- Only Review Lead updated instructions initially +- 5 agents participated (Architecture, API, Test, Documentation, Lead) +- Only Lead updated instructions initially - Coordinator flagged 100% failure rate -- Review Lead should have prompted all 4 other agents -- Review Lead should have verified updates before closing +- Lead should have prompted all 4 other agents +- Lead should have verified updates before closing ### Must Not Create PRs Prematurely @@ -993,13 +993,13 @@ Before completing an assignment and closing the session: ### Regression Prevention (CRITICAL) -**The Review Lead can backslide to solo execution mode.** +**The Lead can backslide to solo execution mode.** This is the primary failure pattern observed in session 2026-02-08. 
**What regression looks like:** -When Review Lead starts working alone instead of delegating to specialists: +When Lead starts working alone instead of delegating to specialists: - Writing code directly - Updating tests without Test Specialist - Modifying docs without Documentation Specialist @@ -1016,9 +1016,9 @@ When Review Lead starts working alone instead of delegating to specialists: **Regression indicators (how to detect):** -- 🚩 Review Lead making code commits (should be specialist commits) -- 🚩 Review Lead updating tests (should be Test Specialist) -- 🚩 Review Lead modifying docs (should be Documentation Specialist) +- 🚩 Lead making code commits (should be specialist commits) +- 🚩 Lead updating tests (should be Test Specialist) +- 🚩 Lead modifying docs (should be Documentation Specialist) - 🚩 User says "You are regressing" - 🚩 User says "You must handle my requests as a team" - 🚩 Session closes without specialist involvement @@ -1062,40 +1062,40 @@ Ask these questions before ANY work execution: **The correct workflow:** 1. User requests implementation -2. Review Lead parses intent (section 1.1) -3. Review Lead identifies required specialists (section 2.1) -4. **Review Lead delegates to specialists** ← THIS IS THE JOB +2. Lead parses intent (section 1.1) +3. Lead identifies required specialists (section 2.1) +4. **Lead delegates to specialists** ← THIS IS THE JOB 5. Specialists do the actual work -6. Review Lead synthesizes findings -7. Review Lead runs session close checklist +6. Lead synthesizes findings +7.
Lead runs session close checklist **Example from session 2026-02-08 (regression case study):** **Request:** "migrate endpoints to /api/v3_0/accounts//annotations" -**What Review Lead did (WRONG):** +**What Lead did (WRONG):** ``` -✗ Review Lead migrated AccountAPI endpoints -✗ Review Lead updated AssetAPI endpoints -✗ Review Lead modified SensorAPI endpoints -✗ Review Lead changed test URLs -✗ Review Lead ran pre-commit hooks +✗ Lead migrated AccountAPI endpoints +✗ Lead updated AssetAPI endpoints +✗ Lead modified SensorAPI endpoints +✗ Lead changed test URLs +✗ Lead ran pre-commit hooks ✗ NO specialist involvement ``` **User response:** "You are regressing. You must handle my requests as a team" -**What Review Lead should have done (CORRECT):** +**What Lead should have done (CORRECT):** ``` -✓ Review Lead parsed intent: Implementation request -✓ Review Lead identified specialists needed: +✓ Lead parsed intent: Implementation request +✓ Lead identified specialists needed: - Test Specialist (test URL updates) - API Specialist (backward compatibility) - Documentation Specialist (doc updates) -✓ Review Lead delegated to each specialist +✓ Lead delegated to each specialist ✓ Specialists did the actual work -✓ Review Lead synthesized findings +✓ Lead synthesized findings ✓ Team-based execution ``` @@ -1103,7 +1103,7 @@ Ask these questions before ANY work execution: "Simple task" is a cognitive trap. **NO task is too simple to delegate.** -The Review Lead's job is orchestration, not execution. +The Lead's job is orchestration, not execution.
### Learning from Failures @@ -1154,7 +1154,7 @@ Track and document when the Lead: **Specific lesson learned (2026-02-10 follow-up)**: - **Session**: Implementing coordinator's governance review recommendations -- **Failure**: Review Lead updated own instructions but didn't ensure other agents did the same +- **Failure**: Lead updated own instructions but didn't ensure other agents did the same - **What went wrong**: Didn't take ownership of follow-through on coordinator recommendations - **Impact**: 4 out of 5 participating agents didn't update their instructions (80% failure rate) - **Root cause**: No enforcement mechanism; assumed agents would self-update without prompting @@ -1165,7 +1165,7 @@ Track and document when the Lead: 3. Verify updates are substantive and committed 4. Re-prompt if necessary 5. Don't close session until all agents have updated -- **Key insight**: "Review Lead owns follow-through on coordinator recommendations" +- **Key insight**: "Lead owns follow-through on coordinator recommendations" - **Test execution learning**: Test specialist couldn't run tests because PostgreSQL setup was skipped; must follow copilot-setup-steps.yml workflow **Specific lesson learned (2026-02-10 test fixes)**: @@ -1201,24 +1201,24 @@ Track and document when the Lead: **Specific lesson learned (2026-02-08 endpoint migration)**: - **Session**: Annotation API endpoint migration (flat to nested RESTful pattern) -- **Failures identified**: Review Lead worked solo instead of delegating to specialists +- **Failures identified**: Lead worked solo instead of delegating to specialists - **Root cause**: Treated "simple" endpoint URL changes as not requiring delegation - **Impact**: User intervention required ("You are regressing. You must handle my requests as a team") - **Failure pattern**: 1. User: "migrate endpoints to /api/v3_0/accounts//annotations" - 2. Review Lead misunderstood as confirmation request (Failure #1) + 2. 
Lead misunderstood as confirmation request (Failure #1) 3. User corrected: "That was rather useless... you basically ignored my request" - 4. Review Lead did entire migration alone without delegation (Failure #2): + 4. Lead did entire migration alone without delegation (Failure #2): - Migrated AccountAPI, AssetAPI, SensorAPI endpoints - Updated test URLs - Ran pre-commit hooks - NO delegation to Test/API/Documentation specialists 5. User: "You are regressing. You must handle my requests as a team" - 6. Review Lead then properly delegated after explicit user checklist + 6. Lead then properly delegated after explicit user checklist - **Key insights**: - "Simple task" is a cognitive trap that triggers solo execution mode - - NO task is too simple to delegate - delegation is the Review Lead's core job - - Regression pattern: Review Lead forgets team-based model under time pressure + - NO task is too simple to delegate - delegation is the Lead's core job + - Regression pattern: Lead forgets team-based model under time pressure - Request interpretation MUST happen before work starts - **Prevention**: Added sections to this file: 1. **Request Interpretation** (Section 1.1) - Parse intent before work @@ -1226,25 +1226,25 @@ Track and document when the Lead: 3. **Regression Prevention** - How to detect and correct backsliding 4. **Delegation Verification** - Session close checklist item 5. **Quick Navigation** - Prominent links to critical sections -- **Verification**: Review Lead must now answer "Am I working solo?" before ANY execution +- **Verification**: Lead must now answer "Am I working solo?" before ANY execution Update this file to prevent repeating the same mistakes. ## Session Close Checklist (MANDATORY) -**Before closing ANY session, the Review Lead MUST verify ALL items in this checklist.** +**Before closing ANY session, the Lead MUST verify ALL items in this checklist.** This is non-negotiable. 
Skipping items without explicit justification and user approval is a governance failure. ### Delegation Verification (CRITICAL - NEW) -**Before closing session, verify Review Lead did NOT work solo:** +**Before closing session, verify Lead did NOT work solo:** - [ ] **Task type identified**: Code/API/docs/time/performance/governance changes -- [ ] **Specialists involved**: Appropriate specialists were invoked (not Review Lead alone) +- [ ] **Specialists involved**: Appropriate specialists were invoked (not Lead alone) - [ ] **Evidence of delegation**: Show task() calls that invoked specialists -- [ ] **No solo execution**: Review Lead did NOT make code/API/docs changes itself +- [ ] **No solo execution**: Lead did NOT make code/API/docs changes itself - [ ] **Synthesis provided**: Combined specialist findings into unified output **Evidence required:** @@ -1254,7 +1254,7 @@ List which specialists were invoked and what each did: ✓ Test Specialist - Updated test URLs, verified 32 tests pass ✓ API Specialist - Verified backward compatibility ✓ Documentation Specialist - Updated API docs with new structure -✓ Review Lead - Synthesized findings, managed coordination +✓ Lead - Synthesized findings, managed coordination ``` **FORBIDDEN patterns (immediate governance failure):** @@ -1262,14 +1262,14 @@ List which specialists were invoked and what each did: - ❌ "I handled it myself" (regression to solo mode) - ❌ "Too simple to delegate" (invalid justification) - ❌ "No specialists needed" (delegation always needed for code/API/docs) -- ❌ Review Lead commits containing code changes (should be specialist commits) -- ❌ Review Lead commits containing test changes (should be Test Specialist) -- ❌ Review Lead commits containing doc changes (should be Documentation Specialist) +- ❌ Lead commits containing code changes (should be specialist commits) +- ❌ Lead commits containing test changes (should be Test Specialist) +- ❌ Lead commits containing doc changes (should be
Documentation Specialist) **Git commit check:** ```bash -git log --oneline -10 --author="Review Lead" +git log --oneline -10 --author="Lead" ``` Should show ONLY: @@ -1348,7 +1348,7 @@ This is a regression (see Regression Prevention section). You MUST: ### Enforcement -**The Review Lead MUST NOT close a session until ALL checklist items are verified.** +**The Lead MUST NOT close a session until ALL checklist items are verified.** If you cannot complete an item: 1. Document why in session notes @@ -1358,7 +1358,7 @@ If you cannot complete an item: If you close without completing checklist: - This is a governance failure - Coordinator will document it -- Review Lead instructions will be updated to prevent recurrence +- Lead instructions will be updated to prevent recurrence ### Continuous Improvement diff --git a/documentation/changelog.rst b/documentation/changelog.rst index e4db3f6516..e883f119c3 100644 --- a/documentation/changelog.rst +++ b/documentation/changelog.rst @@ -24,6 +24,7 @@ New features Infrastructure / Support ---------------------- +* Support data source assignment/filtering by account, and preserve user ID and account ID references in audit logs and data sources for traceability and compliance [see `PR #2058 `_ and `PR #2065 `_] * Stop creating new toy assets when restarting the docker-compose stack [see `PR #2018 `_] * Migrate from ``pip`` to ``uv`` for dependency management, and from ``make`` to ``poe`` [see `PR #1973 `_] * Improve contact information to get in touch with the FlexMeasures community [see `PR #2022 `_] diff --git a/documentation/features/reporting.rst b/documentation/features/reporting.rst index 0796d351c3..9576b035bd 100644 --- a/documentation/features/reporting.rst +++ b/documentation/features/reporting.rst @@ -69,6 +69,14 @@ This parameterizes the computation (from which sensors does data come from, whic "end" : "2023-01-03T00:00:00+00:00", } +.. 
note:: + In addition to filtering by specific data source IDs (``source`` / ``sources``), reporter input data can be filtered using: + + - ``source_types``: list of source type names to include (e.g. ``["forecaster", "scheduler"]``) + - ``exclude_source_types``: list of source type names to exclude + - ``account_id``: account ID (or list of account IDs) to include only data from sources belonging to those accounts (note: this only matches user-type data sources; DataSources created by reporters, schedulers, and forecasters have no account and will not be matched by this filter) + + These correspond to the same filters available on ``Sensor.search_beliefs``. Example: Profits & losses diff --git a/flexmeasures/api/v3_0/tests/test_sensor_data_fresh_db.py b/flexmeasures/api/v3_0/tests/test_sensor_data_fresh_db.py index 3912bc0cca..f297e97ef6 100644 --- a/flexmeasures/api/v3_0/tests/test_sensor_data_fresh_db.py +++ b/flexmeasures/api/v3_0/tests/test_sensor_data_fresh_db.py @@ -162,3 +162,39 @@ def test_post_sensor_instantaneous_data_round( assert data.reset_index().event_start[0] == pd.Timestamp( "2021-06-06 22:00:00+0000", tz="UTC" ) + + +@pytest.mark.parametrize("requesting_user", ["improper_user@seita.nl"], indirect=True) +def test_post_sensor_data_sets_account_id_on_data_source( + client, + setup_api_fresh_test_data, + setup_user_without_data_source, + requesting_user, + db, +): + """When sensor data is posted, the resulting data source should have account_id + set to the posting user's account_id.
+ """ + sensor = setup_api_fresh_test_data["some gas sensor"] + post_data = make_sensor_data_request_for_gas_sensor( + num_values=6, unit="mΒ³/h", include_a_null=False + ) + + # Make sure the user is not yet registered as a data source + data_source = db.session.execute( + select(Source).filter_by(user=setup_user_without_data_source) + ).scalar_one_or_none() + assert data_source is None + + response = client.post( + url_for("SensorAPI:post_data", id=sensor.id), + json=post_data, + ) + assert response.status_code == 200 + + # Make sure the user is now registered as a data source with account_id set + data_source = db.session.execute( + select(Source).filter_by(user=setup_user_without_data_source) + ).scalar_one_or_none() + assert data_source is not None + assert data_source.account_id == setup_user_without_data_source.account_id diff --git a/flexmeasures/cli/data_add.py b/flexmeasures/cli/data_add.py index d0bde2cb59..112b21dd4c 100755 --- a/flexmeasures/cli/data_add.py +++ b/flexmeasures/cli/data_add.py @@ -516,13 +516,23 @@ def add_initial_structure(): type=str, help=f"Type of source (free, but FlexMeasures has support for {DEFAULT_DATASOURCE_TYPES}).", ) -def add_source(name: str, model: str, version: str, source_type: str): +@click.option( + "--account", + "account", + required=False, + type=AccountIdField(), + help="Organisation account associated with the source.", +) +def add_source( + name: str, model: str, version: str, source_type: str, account: Account | None +): """Add a data source.""" source = get_or_create_source( source=name, model=model, version=version, source_type=source_type, + account=account, ) db.session.commit() click.secho(f"Added source {source.__repr__()}", **MsgStyle.SUCCESS) diff --git a/flexmeasures/cli/tests/test_data_delete.py b/flexmeasures/cli/tests/test_data_delete.py index 44ace78500..7fd9a2b558 100644 --- a/flexmeasures/cli/tests/test_data_delete.py +++ b/flexmeasures/cli/tests/test_data_delete.py @@ -9,15 +9,29 @@ def 
test_delete_account( fresh_db, setup_roles_users_fresh_db, setup_assets_fresh_db, app ): - """Check account is deleted + old audit log entries get affected_account_id set to None.""" + """Check account is deleted + old audit log entries keep affected_account_id preserved (not nullified). + Also check that data source lineage is preserved: account_id is NOT nullified after account deletion. + """ from flexmeasures.cli.data_delete import delete_account + from flexmeasures.data.models.data_sources import DataSource prosumer: User = find_user_by_email("test_prosumer_user@seita.nl") prosumer_account_id = prosumer.account_id num_accounts = fresh_db.session.scalar(select(func.count()).select_from(Account)) - # Add creation audit log record + # Find data sources belonging to the account's users (for lineage check after deletion) + data_sources_before = fresh_db.session.scalars( + select(DataSource).filter_by(account_id=prosumer_account_id) + ).all() + assert ( + len(data_sources_before) > 0 + ), "Data sources linked to the account should exist before deletion." + data_source_ids_and_lineage = [ + (ds.id, ds.user_id, ds.account_id) for ds in data_sources_before + ] + + # Add creation audit log record, as that has not automatically been done when setting up test data user_creation_audit_log = AuditLog( event="User Test Prosumer User created test", affected_user_id=prosumer.id, @@ -46,4 +60,102 @@ def test_delete_account( .filter_by(event="User Test Prosumer User created test") .one_or_none() ) - assert user_creation_audit_log.affected_account_id is None + assert user_creation_audit_log.affected_account_id == prosumer_account_id, ( + "Audit log affected_account_id should be preserved (not nullified) " + "after account deletion for lineage purposes." + ) + assert user_creation_audit_log.affected_user_id == prosumer.id, ( + "Audit log affected_user_id should be preserved (not nullified) " + "after account deletion for lineage purposes."
+ ) + + # Check that data source lineage is preserved: account_id and user_id are NOT nullified after account deletion + for ds_id, original_user_id, original_account_id in data_source_ids_and_lineage: + data_source = fresh_db.session.get(DataSource, ds_id) + assert ( + data_source is not None + ), f"Data source {ds_id} should still exist after account deletion." + assert data_source.account_id == original_account_id, ( + f"Data source {ds_id} account_id should be preserved (not nullified) " + "after account deletion for lineage purposes." + ) + if original_user_id is not None: + assert data_source.user_id == original_user_id, ( + f"Data source {ds_id} user_id should be preserved (not nullified) " + "after account deletion for lineage purposes." + ) + + +def test_delete_user(fresh_db, setup_roles_users_fresh_db, setup_assets_fresh_db, app): + """Check user is deleted + old audit log entries get affected_user_id preserved. + Also check that data source lineage is preserved: user_id is NOT nullified after user deletion. 
+ """ + from flexmeasures.cli.data_delete import delete_a_user + from flexmeasures.data.models.data_sources import DataSource + + prosumer: User = find_user_by_email("test_prosumer_user@seita.nl") + prosumer_id = prosumer.id + prosumer_email = prosumer.email + prosumer_account_id = prosumer.account_id + + num_users = fresh_db.session.scalar(select(func.count()).select_from(User)) + + # Find data sources belonging to the user (for lineage check after deletion) + data_source_before = fresh_db.session.execute( + select(DataSource).filter_by(user_id=prosumer_id) + ).scalar_one_or_none() + if data_source_before is not None: + data_source_id = data_source_before.id + data_source_user_id_before = data_source_before.user_id + data_source_account_id_before = data_source_before.account_id + else: + data_source_id = None + + # Add creation audit log record + user_creation_audit_log = AuditLog( + event="User Test Prosumer User created test", + affected_user_id=prosumer_id, + affected_account_id=prosumer_account_id, + ) + fresh_db.session.add(user_creation_audit_log) + fresh_db.session.commit() + + # Delete the user via CLI + cli_input = { + "email": prosumer_email, + } + runner = app.test_cli_runner() + result = runner.invoke(delete_a_user, to_flags(cli_input), input="y\n") + check_command_ran_without_error(result) + + # Check user is deleted + assert find_user_by_email(prosumer_email) is None + assert ( + fresh_db.session.scalar(select(func.count()).select_from(User)) == num_users - 1 + ) + + # Check that old audit log entries preserve affected_user_id (not set to None) + user_creation_audit_log_after = ( + fresh_db.session.query(AuditLog) + .filter_by(event="User Test Prosumer User created test") + .one_or_none() + ) + assert user_creation_audit_log_after.affected_user_id == prosumer_id, ( + "Audit log affected_user_id should be preserved (not nullified) " + "after user deletion for lineage purposes." 
+ ) + + # Check that data source lineage is preserved: user_id is NOT nullified after user deletion + if data_source_id is not None: + data_source_after = fresh_db.session.get(DataSource, data_source_id) + assert ( + data_source_after is not None + ), f"Data source {data_source_id} should still exist after user deletion." + assert data_source_after.user_id == data_source_user_id_before, ( + f"Data source {data_source_id} user_id should be preserved (not nullified) " + "after user deletion for lineage purposes." + ) + assert data_source_after.account_id == data_source_account_id_before, ( + f"Data source {data_source_id} account_id should be preserved (not nullified) " + "after user deletion for lineage purposes." + ) diff --git a/flexmeasures/conftest.py b/flexmeasures/conftest.py index d4cf0fe766..06767abeee 100644 --- a/flexmeasures/conftest.py +++ b/flexmeasures/conftest.py @@ -1754,6 +1754,69 @@ def setup_multiple_sources(db, add_battery_assets): return test_sensor, s1, s2, s3 +@pytest.fixture(scope="module") +def setup_sources_with_accounts(db, add_battery_assets, setup_accounts): + """Set up a sensor with beliefs from data sources that belong to different accounts. 
+ + Returns a tuple of (sensor, source_account_a, source_account_b, source_no_account) + where each source has beliefs at different time slots: + - source_account_a: account "Prosumer", belief at 2024-02-01T10:00+01:00 + - source_account_b: account "Supplier", belief at 2024-02-01T11:00+01:00 + - source_no_account: no account, belief at 2024-02-01T12:00+01:00 + """ + battery = add_battery_assets["Test battery with dynamic power capacity"] + + test_sensor = Sensor( + name="test sensor for account filtering", + generic_asset=battery, + unit="kW", + event_resolution=timedelta(minutes=15), + ) + db.session.add(test_sensor) + + source_account_a = DataSource( + name="Source Account A", + type="demo script", + account_id=setup_accounts["Prosumer"].id, + ) + source_account_b = DataSource( + name="Source Account B", + type="demo script", + account_id=setup_accounts["Supplier"].id, + ) + source_no_account = DataSource( + name="Source No Account", + type="demo script", + ) + db.session.add_all([source_account_a, source_account_b, source_no_account]) + + add_beliefs( + db=db, + sensor=test_sensor, + time_slots=[pd.Timestamp("2024-02-01T10:00:00+01:00")], + values=[1.0], + source=source_account_a, + ) + add_beliefs( + db=db, + sensor=test_sensor, + time_slots=[pd.Timestamp("2024-02-01T11:00:00+01:00")], + values=[2.0], + source=source_account_b, + ) + add_beliefs( + db=db, + sensor=test_sensor, + time_slots=[pd.Timestamp("2024-02-01T12:00:00+01:00")], + values=[3.0], + source=source_no_account, + ) + + db.session.commit() + + return test_sensor, source_account_a, source_account_b, source_no_account + + def add_beliefs( db, sensor: Sensor, diff --git a/flexmeasures/data/migrations/versions/9877450113f6_add_account_id_to_data_source.py b/flexmeasures/data/migrations/versions/9877450113f6_add_account_id_to_data_source.py new file mode 100644 index 0000000000..f0e046d120 --- /dev/null +++ b/flexmeasures/data/migrations/versions/9877450113f6_add_account_id_to_data_source.py @@ -0,0 
+1,101 @@ +"""Add account_id to data_source table and drop FK constraints for lineage preservation + +Adds account_id to data_source (without a DB-level FK constraint, so that referenced +accounts can be deleted while the historical account_id value is preserved for lineage). +Also drops the existing user_id FK constraint for the same reason: when a user is deleted, +the data_source.user_id should remain intact rather than being cascaded or nullified. + +Revision ID: 9877450113f6 +Revises: e26d02ed1621 +Create Date: 2026-03-24 22:10:00.000000 + +""" + +from alembic import op +import sqlalchemy as sa +from sqlalchemy import inspect + + +# revision identifiers, used by Alembic. +revision = "9877450113f6" +down_revision = "e26d02ed1621" +branch_labels = None +depends_on = None + +# Minimal table definitions for the data migration (SQLAlchemy Core only, no ORM) +t_data_source = sa.Table( + "data_source", + sa.MetaData(), + sa.Column("id", sa.Integer), + sa.Column("user_id", sa.Integer), + sa.Column("account_id", sa.Integer), +) + +t_fm_user = sa.Table( + "fm_user", + sa.MetaData(), + sa.Column("id", sa.Integer), + sa.Column("account_id", sa.Integer), +) + + +def upgrade(): + # 1. Add the account_id column (nullable, no DB-level FK so lineage is preserved) + with op.batch_alter_table("data_source", schema=None) as batch_op: + batch_op.add_column(sa.Column("account_id", sa.Integer(), nullable=True)) + + # 2. Data migration: populate account_id from the related user's account. + # Use a correlated subquery to avoid N+1 queries and ensure portability. + connection = op.get_bind() + connection.execute( + sa.update(t_data_source) + .values( + account_id=sa.select(t_fm_user.c.account_id) + .where(t_fm_user.c.id == t_data_source.c.user_id) + .scalar_subquery() + ) + .where(t_data_source.c.user_id.isnot(None)) + ) + + # 3.
Drop old UniqueConstraint and recreate it with account_id included + with op.batch_alter_table("data_source", schema=None) as batch_op: + batch_op.drop_constraint("data_source_name_key", type_="unique") + batch_op.create_unique_constraint( + "data_source_name_key", + ["name", "user_id", "account_id", "model", "version", "attributes_hash"], + ) + + # 4. Drop the user_id FK constraint so that deleting a user preserves the lineage + # reference in data_source rows (no cascade, no SET NULL). + bind = op.get_bind() + inspector = inspect(bind) + existing_fks = inspector.get_foreign_keys("data_source") + existing_fk_names = [fk["name"] for fk in existing_fks] + with op.batch_alter_table("data_source", schema=None) as batch_op: + for fk_name in existing_fk_names: + # Guard against unnamed constraints, for which the inspector reports None + if fk_name and "user_id" in fk_name: + batch_op.drop_constraint(fk_name, type_="foreignkey") + break + + +def downgrade(): + # 1. Re-add the user_id FK constraint + with op.batch_alter_table("data_source", schema=None) as batch_op: + batch_op.create_foreign_key( + "data_source_user_id_fkey", + "fm_user", + ["user_id"], + ["id"], + ) + + # 2. Restore the original UniqueConstraint without account_id + with op.batch_alter_table("data_source", schema=None) as batch_op: + batch_op.drop_constraint("data_source_name_key", type_="unique") + batch_op.create_unique_constraint( + "data_source_name_key", + ["name", "user_id", "model", "version", "attributes_hash"], + ) + + # 3.
Drop the account_id column + with op.batch_alter_table("data_source", schema=None) as batch_op: + batch_op.drop_column("account_id") diff --git a/flexmeasures/data/migrations/versions/a5b26c3f8e91_recompute_attributes_hash_with_sort_keys.py b/flexmeasures/data/migrations/versions/a5b26c3f8e91_recompute_attributes_hash_with_sort_keys.py index 7d06e7df75..4a708932e4 100644 --- a/flexmeasures/data/migrations/versions/a5b26c3f8e91_recompute_attributes_hash_with_sort_keys.py +++ b/flexmeasures/data/migrations/versions/a5b26c3f8e91_recompute_attributes_hash_with_sort_keys.py @@ -2,22 +2,22 @@ Previously the hash was computed without sorting JSON object keys, so a PostgreSQL JSONB round-trip (which always returns keys in alphabetical order) -produced a different hash than the one stored in the database. This caused +produced a different hash than the one stored in the database. This caused get_or_create_source() to silently create duplicate DataSource rows when it was called with attributes that had been loaded back from the database. The upgrade also handles the case where the bug already produced duplicate rows (same logical content, but saved with different key-insertion-order hashes). -For each group of duplicates the newest row (highest ID) is kept as-is. Older +For each group of duplicates the newest row (highest ID) is kept as-is. Older duplicates receive a synthetic ``{"flexmeasures-hash-conflict": N}`` attribute so that their hashes remain unique without touching the timed_belief table. Downgrade note: since PostgreSQL JSONB already serialises all object keys in alphabetical order when storing, ``json.dumps(attrs)`` and ``json.dumps(attrs, sort_keys=True)`` produce identical strings for any data -that has gone through JSONB. Therefore recomputing the hash without +that has gone through JSONB. Therefore, recomputing the hash without ``sort_keys`` would yield the same bytes as the upgrade, making a downgrade -data-migration a no-op. 
The downgrade function is intentionally left empty. +data-migration a no-op. The downgrade function is intentionally left empty. Revision ID: a5b26c3f8e91 Revises: 8b62f8129f34 @@ -51,7 +51,7 @@ def upgrade(): Duplicate rows (same logical content, different key-order hashes) are resolved by tagging older rows with a synthetic conflict-marker attribute so - that each row still gets a unique hash. The newest row (highest id) is + that each row still gets a unique hash. The newest row (highest id) is always kept clean. """ bind = op.get_bind() @@ -64,7 +64,7 @@ def upgrade(): ).fetchall() # Group rows by their normalised unique key (name, user_id, model, version, - # sorted-attributes hash). Any group with >1 member means the bug created + # sorted-attributes hash). Any group with >1 member means the bug created # duplicate rows. groups: dict = defaultdict(list) attrs_by_id: dict = {} @@ -112,7 +112,7 @@ def downgrade(): """No data migration needed on downgrade. PostgreSQL JSONB always serialises object keys in alphabetical order when - storing. This means ``json.dumps(attrs)`` and + storing. This means ``json.dumps(attrs)`` and ``json.dumps(attrs, sort_keys=True)`` produce identical strings for any attributes that have been round-tripped through the database, so recomputing hashes without ``sort_keys`` would yield the same bytes. @@ -120,7 +120,7 @@ def downgrade(): Note: rows that were tagged with ``flexmeasures-hash-conflict`` during upgrade are NOT cleaned up here, because doing so would require knowing which rows were duplicates and which one was the "canonical" row -- that - information is not reliably recoverable. After a downgrade, those rows + information is not reliably recoverable. After a downgrade, those rows will have a slightly different ``attributes`` dict than before the upgrade, but ``get_or_create_source`` will still find them correctly via their hash. 
""" diff --git a/flexmeasures/data/migrations/versions/b2c3d4e5f6a7_drop_fk_constraint_from_audit_log_active_user_id.py b/flexmeasures/data/migrations/versions/b2c3d4e5f6a7_drop_fk_constraint_from_audit_log_active_user_id.py new file mode 100644 index 0000000000..6eac719b54 --- /dev/null +++ b/flexmeasures/data/migrations/versions/b2c3d4e5f6a7_drop_fk_constraint_from_audit_log_active_user_id.py @@ -0,0 +1,58 @@ +"""Drop FK constraint from audit_log.active_user_id for data lineage preservation + +When users are deleted, we want to preserve the historical active_user_id +values in audit_log rows for lineage purposes, rather than cascade-deleting +or nullifying them. + +Revision ID: b2c3d4e5f6a7 +Revises: a1b2c3d4e5f6 +Create Date: 2026-03-27 00:00:00.000000 + +""" + +from alembic import op + + +# revision identifiers, used by Alembic. +revision = "b2c3d4e5f6a7" +down_revision = "9877450113f6" +branch_labels = None +depends_on = None + + +def upgrade(): + with op.batch_alter_table("audit_log", schema=None) as batch_op: + batch_op.drop_constraint( + op.f("audit_log_active_user_id_fm_user_fkey"), type_="foreignkey" + ) + batch_op.drop_constraint( + op.f("audit_log_affected_user_id_fm_user_fkey"), type_="foreignkey" + ) + batch_op.drop_constraint( + op.f("audit_log_affected_account_id_account_fkey"), type_="foreignkey" + ) + + +def downgrade(): + with op.batch_alter_table("audit_log", schema=None) as batch_op: + batch_op.create_foreign_key( + op.f("audit_log_affected_account_id_account_fkey"), + "account", + ["affected_account_id"], + ["id"], + ondelete="SET NULL", + ) + batch_op.create_foreign_key( + op.f("audit_log_affected_user_id_fm_user_fkey"), + "fm_user", + ["affected_user_id"], + ["id"], + ondelete="SET NULL", + ) + batch_op.create_foreign_key( + op.f("audit_log_active_user_id_fm_user_fkey"), + "fm_user", + ["active_user_id"], + ["id"], + ondelete="SET NULL", + ) diff --git a/flexmeasures/data/models/audit_log.py b/flexmeasures/data/models/audit_log.py index 
3e213fd2e4..86c0b250f6 100644 --- a/flexmeasures/data/models/audit_log.py +++ b/flexmeasures/data/models/audit_log.py @@ -30,14 +30,35 @@ class AuditLog(db.Model, AuthModelMixin): event_datetime = Column(DateTime()) event = Column(String(500)) active_user_name = Column(String(255)) - active_user_id = Column( - "active_user_id", Integer(), ForeignKey("fm_user.id", ondelete="SET NULL") + # No DB-level FK with cascade for any user_id or account_id so that deleting a user preserves the lineage reference in this column. + active_user_id = Column("active_user_id", Integer(), nullable=True) + affected_user_id = Column("affected_user_id", Integer(), nullable=True) + affected_account_id = Column("affected_account_id", Integer(), nullable=True) + + # Relationships to navigate to User and Account without database-level FK constraints + # This allows audit logs to maintain references to deleted users/accounts for lineage purposes + # The foreign_keys= parameter inside db.relationship(...) is a SQLAlchemy ORM hint only; it has zero effect on the database schema.
+ # It's needed here because SQLAlchemy can't automatically infer which column is the "FK side" of the join when there's no actual ForeignKey() in the column definition + active_user = db.relationship( + "User", + primaryjoin="AuditLog.active_user_id == User.id", + foreign_keys="[AuditLog.active_user_id]", + backref=db.backref("active_audit_logs", lazy=True, passive_deletes="all"), + passive_deletes="all", ) - affected_user_id = Column( - "affected_user_id", Integer(), ForeignKey("fm_user.id", ondelete="SET NULL") + affected_user = db.relationship( + "User", + primaryjoin="AuditLog.affected_user_id == User.id", + foreign_keys="[AuditLog.affected_user_id]", + backref=db.backref("affected_audit_logs", lazy=True, passive_deletes="all"), + passive_deletes="all", ) - affected_account_id = Column( - "affected_account_id", Integer(), ForeignKey("account.id", ondelete="SET NULL") + affected_account = db.relationship( + "Account", + primaryjoin="AuditLog.affected_account_id == Account.id", + foreign_keys="[AuditLog.affected_account_id]", + backref=db.backref("affected_audit_logs", lazy=True, passive_deletes="all"), + passive_deletes="all", ) @classmethod diff --git a/flexmeasures/data/models/data_sources.py b/flexmeasures/data/models/data_sources.py index 46c81d79fb..7da583679e 100644 --- a/flexmeasures/data/models/data_sources.py +++ b/flexmeasures/data/models/data_sources.py @@ -271,18 +271,36 @@ class DataSource(db.Model, tb.BeliefSourceDBMixin): __tablename__ = "data_source" __table_args__ = ( - db.UniqueConstraint("name", "user_id", "model", "version", "attributes_hash"), + db.UniqueConstraint( + "name", "user_id", "account_id", "model", "version", "attributes_hash" + ), ) # The type of data source (e.g. user, forecaster or scheduler) # just a string, but preferably one of DEFAULT_DATASOURCE_TYPES type = db.Column(db.String(80), default="") - # The id of the user source (can link e.g. 
to fm_user table) - user_id = db.Column( - db.Integer, db.ForeignKey("fm_user.id"), nullable=True, unique=True + # The id of the user source (can link e.g. to fm_user table). + # No DB-level FK so that deleting a user preserves the lineage reference in this column. + user_id = db.Column(db.Integer, nullable=True, unique=True) + user = db.relationship( + "User", + primaryjoin="DataSource.user_id == User.id", + foreign_keys="[DataSource.user_id]", + backref=db.backref("data_source", lazy=True, passive_deletes="all"), + passive_deletes="all", + ) + + # The account this data source belongs to (populated from user.account for user-type sources). + # No DB-level FK so that deleting an account preserves the lineage reference in this column. + account_id = db.Column(db.Integer, nullable=True) + account = db.relationship( + "Account", + primaryjoin="DataSource.account_id == Account.id", + foreign_keys="[DataSource.account_id]", + backref=db.backref("data_sources", lazy=True, passive_deletes="all"), + passive_deletes="all", ) - user = db.relationship("User", backref=db.backref("data_source", lazy=True)) attributes = db.Column(MutableDict.as_mutable(JSONB), nullable=False, default={}) @@ -316,6 +334,13 @@ def __init__( name = user.username type = "user" self.user = user + # Prefer the FK column directly (avoids triggering a lazy load/autoflush). + # Fall back to the account relationship for users not yet flushed to the DB + # (where account_id may not be set on the column yet). 
+ if user.account_id is not None: + self.account_id = user.account_id + elif user.account is not None: + self.account_id = user.account.id elif user is None and type == "user": raise TypeError("A data source cannot have type 'user' but no user set.") self.type = type diff --git a/flexmeasures/data/models/generic_assets.py b/flexmeasures/data/models/generic_assets.py index 08827df779..8613989cc5 100644 --- a/flexmeasures/data/models/generic_assets.py +++ b/flexmeasures/data/models/generic_assets.py @@ -1067,6 +1067,7 @@ def search_beliefs( # noqa C901 source: ( DataSource | list[DataSource] | int | list[int] | str | list[str] | None ) = None, + account_id: int | list[int] | None = None, use_latest_version_per_event: bool = True, most_recent_beliefs_only: bool = True, most_recent_events_only: bool = False, @@ -1086,6 +1087,7 @@ def search_beliefs( # noqa C901 :param horizons_at_least: only return beliefs with a belief horizon equal or greater than this timedelta (for example, use timedelta(0) to get ante knowledge time beliefs) :param horizons_at_most: only return beliefs with a belief horizon equal or less than this timedelta (for example, use timedelta(0) to get post knowledge time beliefs) :param source: search only beliefs by this source (pass the DataSource, or its name or id) or list of sources + :param account_id: Optional account id (or list thereof) to filter sources by their owning account :param use_latest_version_per_event: only return the belief from the latest version of a source, for each event :param most_recent_events_only: only return (post knowledge time) beliefs for the most recent event (maximum event start) :param as_json: return beliefs in JSON format (e.g. 
for use in charts) rather than as BeliefsDataFrame @@ -1110,6 +1112,7 @@ def search_beliefs( # noqa C901 horizons_at_least=horizons_at_least, horizons_at_most=horizons_at_most, source=source, + account_id=account_id, use_latest_version_per_event=use_latest_version_per_event, most_recent_beliefs_only=most_recent_beliefs_only, most_recent_events_only=most_recent_events_only, diff --git a/flexmeasures/data/models/time_series.py b/flexmeasures/data/models/time_series.py index b4bed45d85..d6df1633e6 100644 --- a/flexmeasures/data/models/time_series.py +++ b/flexmeasures/data/models/time_series.py @@ -397,6 +397,7 @@ def search_beliefs( # noqa: C901 user_source_ids: int | list[int] | None = None, source_types: list[str] | None = None, exclude_source_types: list[str] | None = None, + account_id: int | list[int] | None = None, use_latest_version_per_event: bool = True, most_recent_beliefs_only: bool = True, most_recent_events_only: bool = False, @@ -421,6 +422,7 @@ def search_beliefs( # noqa: C901 :param user_source_ids: Optional list of user source ids to query only specific user sources :param source_types: Optional list of source type names to query only specific source types * :param exclude_source_types: Optional list of source type names to exclude specific source types * + :param account_id: Optional account id (or list thereof) to filter sources by their owning account :param use_latest_version_per_event: only return the belief from the latest version of a source, for each event :param most_recent_beliefs_only: only return the most recent beliefs for each event from each source (minimum belief horizon). Defaults to True. :param most_recent_events_only: only return (post knowledge time) beliefs for the most recent event (maximum event start). Defaults to False. 
@@ -444,6 +446,7 @@ def search_beliefs( # noqa: C901 user_source_ids=user_source_ids, source_types=source_types, exclude_source_types=exclude_source_types, + account_id=account_id, use_latest_version_per_event=use_latest_version_per_event, most_recent_beliefs_only=most_recent_beliefs_only, most_recent_events_only=most_recent_events_only, @@ -857,6 +860,7 @@ def search( user_source_ids: int | list[int] | None = None, source_types: list[str] | None = None, exclude_source_types: list[str] | None = None, + account_id: int | list[int] | None = None, use_latest_version_per_event: bool = True, most_recent_beliefs_only: bool = True, most_recent_events_only: bool = False, @@ -881,6 +885,7 @@ def search( :param user_source_ids: Optional list of user source ids to query only specific user sources :param source_types: Optional list of source type names to query only specific source types * :param exclude_source_types: Optional list of source type names to exclude specific source types * + :param account_id: Optional account id (or list thereof) to filter sources by their owning account :param use_latest_version_per_event: only return the belief from the latest version of a source, for each event :param most_recent_beliefs_only: only return the most recent beliefs for each event from each source (minimum belief horizon). Defaults to True. 
:param most_recent_events_only: only return (post knowledge time) beliefs for the most recent event (maximum event start) @@ -919,7 +924,7 @@ def search( parsed_sources = parse_source_arg(source) source_criteria = get_source_criteria( - cls, user_source_ids, source_types, exclude_source_types + cls, user_source_ids, source_types, exclude_source_types, account_id ) custom_join_targets = [] if parsed_sources else [DataSource] diff --git a/flexmeasures/data/queries/utils.py b/flexmeasures/data/queries/utils.py index c439f41da1..b23845957a 100644 --- a/flexmeasures/data/queries/utils.py +++ b/flexmeasures/data/queries/utils.py @@ -87,6 +87,7 @@ def get_source_criteria( user_source_ids: int | list[int], source_types: list[str], exclude_source_types: list[str], + account_id: int | list[int] | None = None, ) -> list[BinaryExpression]: source_criteria: list[BinaryExpression] = [] if user_source_ids is not None: @@ -99,6 +100,8 @@ def get_source_criteria( if user_source_ids and "user" in exclude_source_types: exclude_source_types.remove("user") source_criteria.append(source_type_exclusion_criterion(exclude_source_types)) + if account_id is not None: + source_criteria.append(source_account_criterion(account_id)) return source_criteria @@ -141,6 +144,18 @@ def source_type_exclusion_criterion(source_types: list[str]) -> BinaryExpression return DataSource.type.not_in(source_types) +def source_account_criterion(account_id) -> BinaryExpression: + """Criterion to collect only data from sources belonging to the given account(s). + + Accepts account IDs as integers or Account model instances (or a list of either). 
+ """ + if not isinstance(account_id, list): + account_id = [account_id] + # Support both integer IDs and Account model instances + account_ids = [a.id if not isinstance(a, int) else a for a in account_id] + return DataSource.account_id.in_(account_ids) + + def get_belief_timing_criteria( cls: "Type[ts.TimedValue]", asset_class: db.Model, diff --git a/flexmeasures/data/schemas/__init__.py b/flexmeasures/data/schemas/__init__.py index 8f56be0ab2..0e79ace9bb 100644 --- a/flexmeasures/data/schemas/__init__.py +++ b/flexmeasures/data/schemas/__init__.py @@ -2,7 +2,7 @@ Data schemas (Marshmallow) """ -from .account import AccountIdField +from .account import AccountIdField, AccountIdOrListField from .generic_assets import GenericAssetIdField as AssetIdField from .locations import LatitudeField, LongitudeField from .sensors import SensorIdField, VariableQuantityField diff --git a/flexmeasures/data/schemas/account.py b/flexmeasures/data/schemas/account.py index b71af2f86c..9bbf156c89 100644 --- a/flexmeasures/data/schemas/account.py +++ b/flexmeasures/data/schemas/account.py @@ -79,3 +79,30 @@ def _deserialize(self, value: Any, attr, data, **kwargs) -> Account: def _serialize(self, value: Account, attr, obj, **kwargs): """Turn an Account into a source id.""" return value.id + + +class AccountIdOrListField(fields.Field): + """Field that accepts a single account ID or a non-empty list of account IDs. + + Both ``42`` and ``[42, 99]`` are accepted. Always deserializes to a list of + :class:`~flexmeasures.data.models.user.Account` instances. + + The field is intentionally expressed as a union of ``integer`` and + ``array[integer]`` rather than always requiring a list, so that future + OpenAPI generation can emit a ``oneOf`` schema for it. 
+ """ + + def _deserialize(self, value: Any, attr, data, **kwargs) -> list[Account]: + _item_field = AccountIdField() + if isinstance(value, list): + if len(value) == 0: + raise FMValidationError("Must be a non-empty list of account IDs.") + return [_item_field._deserialize(v, attr, data, **kwargs) for v in value] + return [_item_field._deserialize(value, attr, data, **kwargs)] + + def _serialize(self, value: Any, attr, obj, **kwargs): + if value is None: + return None + if isinstance(value, list): + return [a.id if hasattr(a, "id") else a for a in value] + return value.id if hasattr(value, "id") else value diff --git a/flexmeasures/data/schemas/io.py b/flexmeasures/data/schemas/io.py index afecb4b5b9..d8754fd75e 100644 --- a/flexmeasures/data/schemas/io.py +++ b/flexmeasures/data/schemas/io.py @@ -3,6 +3,7 @@ from flexmeasures.data.schemas.sensors import SensorIdField from flexmeasures.data.schemas import AwareDateTimeField, DurationField from flexmeasures.data.schemas.sources import DataSourceIdField +from flexmeasures.data.schemas.account import AccountIdOrListField from flask import current_app @@ -37,6 +38,7 @@ class Input(Schema): user_source_ids = fields.List(DataSourceIdField()) source_types = fields.List(fields.Str()) exclude_source_types = fields.List(fields.Str()) + account_id = AccountIdOrListField() most_recent_beliefs_only = fields.Boolean() most_recent_events_only = fields.Boolean() diff --git a/flexmeasures/data/schemas/reporting/__init__.py b/flexmeasures/data/schemas/reporting/__init__.py index 94257599ef..d97ef7d9ba 100644 --- a/flexmeasures/data/schemas/reporting/__init__.py +++ b/flexmeasures/data/schemas/reporting/__init__.py @@ -1,6 +1,7 @@ from marshmallow import Schema, fields, validate from flexmeasures.data.schemas.sources import DataSourceIdField +from flexmeasures.data.schemas.account import AccountIdOrListField from flexmeasures.data.schemas import AwareDateTimeField, DurationField from flexmeasures.data.schemas.io import Input, Output 
@@ -58,6 +59,7 @@ class BeliefsSearchConfigSchema(Schema): source_types = fields.List(fields.Str()) exclude_source_types = fields.List(fields.Str()) + account_id = AccountIdOrListField() most_recent_beliefs_only = fields.Boolean() most_recent_events_only = fields.Boolean() diff --git a/flexmeasures/data/schemas/tests/test_reporting.py b/flexmeasures/data/schemas/tests/test_reporting.py index efa1ab2c59..cf7427186b 100644 --- a/flexmeasures/data/schemas/tests/test_reporting.py +++ b/flexmeasures/data/schemas/tests/test_reporting.py @@ -6,6 +6,7 @@ ProfitOrLossReporterConfigSchema, ProfitOrLossReporterParametersSchema, ) +from flexmeasures.data.schemas.reporting import BeliefsSearchConfigSchema from marshmallow.exceptions import ValidationError import pytest @@ -228,3 +229,59 @@ def test_profit_reporter_parameters_schema( else: with pytest.raises(ValidationError): schema.load(parameters) + + +@pytest.mark.parametrize( + "make_config, is_valid", + [ + ( + lambda p, s: {"account_id": p}, + True, + ), + ( + lambda p, s: {"account_id": [p]}, + True, + ), + ( + lambda p, s: {"account_id": [p, s]}, + True, + ), + ( + lambda p, s: {"account_id": []}, + False, + ), + ( + lambda p, s: {"account_id": "not-a-list"}, + False, + ), + ( + lambda p, s: {"account_id": ["a", "b"]}, + False, + ), + ], + ids=[ + "single-int", + "single-item-list", + "multi-item-list", + "empty-list", + "string", + "list-of-strings", + ], +) +def test_beliefs_search_config_schema_account_id( + make_config, is_valid, db, app, setup_accounts +): + """Check that BeliefsSearchConfigSchema accepts account_id as a single int or list of ints, + rejects empty lists and non-integer values, and validates accounts exist in the DB. 
+ """ + prosumer_id = setup_accounts["Prosumer"].id + supplier_id = setup_accounts["Supplier"].id + config = make_config(prosumer_id, supplier_id) + schema = BeliefsSearchConfigSchema() + if is_valid: + result = schema.load(config) + assert isinstance(result["account_id"], list) + assert len(result["account_id"]) >= 1 + else: + with pytest.raises(ValidationError): + schema.load(config) diff --git a/flexmeasures/data/services/data_sources.py b/flexmeasures/data/services/data_sources.py index c2f260806b..d8260b022e 100644 --- a/flexmeasures/data/services/data_sources.py +++ b/flexmeasures/data/services/data_sources.py @@ -6,7 +6,7 @@ from sqlalchemy import select from typing import Type, TypeVar -from flexmeasures import User, Source +from flexmeasures import Account, Source, User from flexmeasures.data import db from flexmeasures.data.models.data_sources import DataSource, DataGenerator from flexmeasures.data.models.user import is_user @@ -22,6 +22,7 @@ def get_or_create_source( model: str | None = None, version: str | None = None, attributes: dict | None = None, + account: Account | None = None, flush: bool = True, ) -> DataSource: if is_user(source): @@ -35,6 +36,8 @@ def get_or_create_source( query = query.filter( DataSource.attributes_hash == DataSource.hash_attributes(attributes) ) + if account is not None: + query = query.filter(DataSource.account == account) if is_user(source): query = query.filter(DataSource.user == source) elif isinstance(source, str): @@ -54,6 +57,7 @@ def get_or_create_source( version=version, type=source_type, attributes=attributes, + account=account, ) current_app.logger.info(f"Setting up {_source} as new data source...") db.session.add(_source) diff --git a/flexmeasures/data/services/users.py b/flexmeasures/data/services/users.py index f33dbfd87f..ce96ba13b6 100644 --- a/flexmeasures/data/services/users.py +++ b/flexmeasures/data/services/users.py @@ -235,7 +235,7 @@ def delete_user(user: User): event=f"User {user.username} deleted", 
active_user_id=active_user_id, active_user_name=active_user_name, - affected_user_id=None, # add the audit log record even if the user is gone + affected_user_id=user.id, # add the audit log record even if the user will be gone affected_account_id=user.account_id, ) db.session.add(user_audit_log) diff --git a/flexmeasures/data/tests/test_queries.py b/flexmeasures/data/tests/test_queries.py index 0fb4a248b1..a51ad46ea3 100644 --- a/flexmeasures/data/tests/test_queries.py +++ b/flexmeasures/data/tests/test_queries.py @@ -305,3 +305,115 @@ def get_sources_names(vec: list[DataSource]) -> list[str]: event_ends_before="2024-01-02T00:00:00+01:00", ) ) == ["S1", "S2"] + + +def test_search_beliefs_account_id_filter(db, setup_sources_with_accounts): + """Check that search_beliefs with account_id only returns beliefs from sources + belonging to the specified account. + + We set up three sources: + - source_account_a: belongs to the Prosumer account + - source_account_b: belongs to the Supplier account + - source_no_account: no account assigned + + Filtering by account_a should return only source_account_a's belief, + filtering by account_b should return only source_account_b's belief, + filtering by both accounts should return beliefs from both sources, + and filtering with no account_id should return all three beliefs. 
+ """ + sensor, source_account_a, source_account_b, source_no_account = ( + setup_sources_with_accounts + ) + + # Filter by Prosumer account: should return only 1 belief + bdf_a = sensor.search_beliefs( + account_id=source_account_a.account_id, + most_recent_beliefs_only=False, + ) + assert len(bdf_a) == 1 + assert bdf_a.index.get_level_values("source")[0] == source_account_a + + # Filter by Supplier account: should return only 1 belief + bdf_b = sensor.search_beliefs( + account_id=source_account_b.account_id, + most_recent_beliefs_only=False, + ) + assert len(bdf_b) == 1 + assert bdf_b.index.get_level_values("source")[0] == source_account_b + + # Filter by both accounts: should return 2 beliefs + bdf_ab = sensor.search_beliefs( + account_id=[source_account_a.account_id, source_account_b.account_id], + most_recent_beliefs_only=False, + ) + assert len(bdf_ab) == 2 + sources_ab = set(bdf_ab.index.get_level_values("source")) + assert source_account_a in sources_ab + assert source_account_b in sources_ab + + # No account_id filter: should return all 3 beliefs (from all 3 sources) + bdf_all = sensor.search_beliefs(most_recent_beliefs_only=False) + assert len(bdf_all) == 3 + + # Empty list: matches nothing (consistent with SQL IN () semantics) + bdf_empty_list = sensor.search_beliefs( + account_id=[], + most_recent_beliefs_only=False, + ) + assert len(bdf_empty_list) == 0 + + +def test_timed_belief_search_account_id_filter(db, setup_sources_with_accounts): + """Check that TimedBelief.search with account_id returns only beliefs from + sources belonging to the specified account. 
+ """ + sensor, source_account_a, source_account_b, source_no_account = ( + setup_sources_with_accounts + ) + + # Filter by Prosumer account + bdf = TimedBelief.search( + sensor, + account_id=source_account_a.account_id, + most_recent_beliefs_only=False, + ) + assert len(bdf) == 1 + assert bdf.index.get_level_values("source")[0] == source_account_a + + # Filter by a non-existent account id: should return no beliefs + bdf_empty = TimedBelief.search( + sensor, + account_id=999999, + most_recent_beliefs_only=False, + ) + assert len(bdf_empty) == 0 + + +def test_generic_asset_search_beliefs_account_id_filter( + db, setup_sources_with_accounts +): + """Check that GenericAsset.search_beliefs with account_id returns only beliefs + from sources belonging to the specified account. + """ + sensor, source_account_a, source_account_b, source_no_account = ( + setup_sources_with_accounts + ) + asset = sensor.generic_asset + + # Filter by Prosumer account: should return only 1 belief + bdf_dict_a = asset.search_beliefs( + sensors=[sensor], + account_id=source_account_a.account_id, + most_recent_beliefs_only=False, + ) + bdf_a = bdf_dict_a[sensor] + assert len(bdf_a) == 1 + assert bdf_a.index.get_level_values("source")[0] == source_account_a + + # No account_id filter: should return all 3 beliefs + bdf_dict_all = asset.search_beliefs( + sensors=[sensor], + most_recent_beliefs_only=False, + ) + bdf_all = bdf_dict_all[sensor] + assert len(bdf_all) == 3 diff --git a/flexmeasures/data/tests/test_user_services.py b/flexmeasures/data/tests/test_user_services.py index eb08734b5f..8d90fae4dd 100644 --- a/flexmeasures/data/tests/test_user_services.py +++ b/flexmeasures/data/tests/test_user_services.py @@ -133,11 +133,25 @@ def test_create_invalid_user( def test_delete_user(fresh_db, setup_roles_users_fresh_db, setup_assets_fresh_db, app): """Check that deleting a user does not lead to deleting their organisation's (asset/sensor/beliefs) data. 
- Also check that an audit log entry is created + old audit log entries get affected_user_id set to None. + Also check that an audit log entry is created + audit log entries keep their affected_user_id (preserved for lineage). + Also check that the data source's user_id is preserved (not nullified) after user deletion for lineage. """ prosumer: User = find_user_by_email("test_prosumer_user@seita.nl") + prosumer_id = prosumer.id prosumer_account_id = prosumer.account_id num_users_before = fresh_db.session.scalar(select(func.count(User.id))) + # Find the data source linked to the prosumer user (for lineage check after deletion) + data_source_before = fresh_db.session.execute( + select(DataSource).filter_by(user_id=prosumer_id) + ).scalar_one_or_none() + assert ( + data_source_before is not None + ), "A data source linked to the prosumer user should exist before deletion." + data_source_account_id_before = data_source_before.account_id + assert ( + data_source_account_id_before is not None + ), "The data source linked to the prosumer user should have an account_id before deletion." + # Find assets belonging to the user's organisation asset_query = select(GenericAsset).filter_by(account_id=prosumer_account_id) assets_before = fresh_db.session.scalars(asset_query).all() @@ -184,9 +198,25 @@ def test_delete_user(fresh_db, setup_roles_users_fresh_db, setup_assets_fresh_db .filter_by(event="User Test Prosumer User deleted") .one_or_none() ) - assert user_deletion_audit_log.affected_user_id is None + assert user_deletion_audit_log.affected_user_id == prosumer_id, ( + "Audit log affected_user_id should be preserved (not nullified) " + "after user deletion for lineage purposes." + ) assert user_deletion_audit_log.affected_account_id == prosumer_account_id assert user_deletion_audit_log.active_user_id is None fresh_db.session.refresh(user_creation_audit_log) - assert user_creation_audit_log.affected_user_id is None + assert user_creation_audit_log.affected_user_id == prosumer_id, ( + "Audit log affected_user_id should be preserved (not nullified) " + "after user deletion for lineage purposes." 
+ ) + + # Check that data source lineage is preserved: user_id and account_id are NOT nullified after user deletion + fresh_db.session.expire(data_source_before) + fresh_db.session.refresh(data_source_before) + assert ( + data_source_before.user_id == prosumer_id + ), "Data source user_id should be preserved (not nullified) after user deletion for lineage purposes." + assert ( + data_source_before.account_id == data_source_account_id_before + ), "Data source account_id should be preserved (not nullified) after user deletion for lineage purposes."
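The filter added in `source_account_criterion` above boils down to one normalization step: accept a single account id, an `Account` instance, or a list of either, and compare `DataSource.account_id` against the resulting list of ids. A minimal standalone sketch of that normalization (names here are illustrative; the real criterion wraps the list in a SQLAlchemy `in_()` expression rather than returning it):

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Stand-in for the ORM Account model (illustrative only)."""

    id: int


def normalize_account_ids(account_id) -> list[int]:
    """Mirror the coercion inside source_account_criterion:
    wrap a bare value in a list, then map Account instances to their ids."""
    if not isinstance(account_id, list):
        account_id = [account_id]
    return [a if isinstance(a, int) else a.id for a in account_id]


# The real criterion is then: DataSource.account_id.in_(normalize_account_ids(...))
print(normalize_account_ids(5))  # [5]
print(normalize_account_ids([Account(3), 7]))  # [3, 7]
print(normalize_account_ids([]))  # [] -> IN () matches nothing, as the test asserts
```

This keeps an empty list meaning "match nothing", consistent with the `account_id=[]` case covered in `test_search_beliefs_account_id_filter`.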
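Likewise, the shape handling in `AccountIdOrListField` (a bare id or a non-empty list of ids, always deserializing to a list) can be sketched outside marshmallow. This is a simplified stand-in with a hypothetical function name: the real field delegates each element to `AccountIdField`, which also validates that the account exists in the database, and raises `FMValidationError` rather than `ValueError`:

```python
def load_account_id_union(value):
    """Sketch of AccountIdOrListField's shape handling: accept 42 or [42, 99],
    reject empty lists and non-integers, and always return a list of ints."""
    if isinstance(value, list):
        if len(value) == 0:
            raise ValueError("Must be a non-empty list of account IDs.")
        if not all(isinstance(v, int) for v in value):
            raise ValueError("Account IDs must be integers.")
        return value
    if not isinstance(value, int):
        raise ValueError("Account ID must be an integer.")
    return [value]
```

Always returning a list means downstream code (the query-building in `get_source_criteria`) never has to branch on whether one or several accounts were requested.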