
perf(spp_programs): replace Python uniqueness checks with SQL constraints#116

Open
kneckinator wants to merge 1 commit into 19.0 from ken/optimize_spp_program

Conversation

@kneckinator
Contributor

Summary

Replace Python @api.constrains uniqueness checks with SQL UNIQUE constraints on spp.program.membership, spp.cycle.membership, and spp.entitlement. Add a pre-migration script to deduplicate existing data before constraints apply. Bump module version to 19.0.2.1.0.

The Python constrains methods performed per-record search() calls during bulk create, causing O(N²) behavior. With 1M records, this made bulk enrollment and cycle membership creation prohibitively slow. SQL UNIQUE constraints enforce the same invariants at the database level in O(1) per row.
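For readers unfamiliar with the mechanism: in Odoo, SQL constraints are declared on the model as `_sql_constraints`, a list of `(name, sql_definition, user_message)` tuples that the ORM emits as `ALTER TABLE ... ADD CONSTRAINT` on install/upgrade. The sketch below shows the rough shape of what this PR describes; the constraint names and messages are illustrative, not copied from the diff.

```python
# Illustrative sketch only: constraint names and error messages are invented,
# not taken from this PR's diff. Odoo turns each (name, definition, message)
# tuple into an ALTER TABLE ... ADD CONSTRAINT at module install/upgrade.
SQL_CONSTRAINTS = {
    "spp.program.membership": [
        ("unique_partner_program", "UNIQUE(partner_id, program_id)",
         "A registrant can only be enrolled once per program."),
    ],
    "spp.cycle.membership": [
        ("unique_partner_cycle", "UNIQUE(partner_id, cycle_id)",
         "A registrant can only be added once per cycle."),
    ],
    "spp.entitlement": [
        ("unique_code", "UNIQUE(code)",
         "Entitlement codes must be unique."),
    ],
}

for model, constraints in SQL_CONSTRAINTS.items():
    for name, definition, message in constraints:
        print(f"{model}: {definition}")
```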


Changes

  • models/program_membership.py: added UNIQUE(partner_id, program_id); removed _check_unique_partner_per_program()
  • models/cycle_membership.py: added UNIQUE(partner_id, cycle_id); removed _check_unique_partner_per_cycle()
  • models/entitlement_base_model.py: added UNIQUE(code); removed _check_unique_code()
  • models/entitlement.py: removed duplicate _check_unique_code()
  • migrations/19.0.2.1.0/pre-migration.py: deduplicates existing data using ROW_NUMBER() OVER (PARTITION BY ...) before the constraints apply. Memberships: deletes duplicates, keeping the oldest row. Entitlements: appends a -{id} suffix to duplicate codes (preserves financial records).
  • tests/test_sql_constraints.py: 10 tests verifying constraint existence at the pg_constraint level, duplicate blocking via raw SQL INSERT, and legitimate multi-record scenarios
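The entitlement-code dedup can be sketched outside Odoo. The demo below uses an in-memory SQLite database as a stand-in for PostgreSQL (table name and data are invented; the real migration runs against spp_entitlement in Postgres): the oldest row per code is left untouched, and newer duplicates get a -{id} suffix rather than being deleted.

```python
import sqlite3

# Stand-alone sketch of the entitlement dedup described above. SQLite stands
# in for PostgreSQL; the data is invented. Duplicate codes keep the oldest row
# as-is and append "-{id}" to newer rows, so no financial record is deleted.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE spp_entitlement (id INTEGER PRIMARY KEY, code TEXT)")
con.executemany(
    "INSERT INTO spp_entitlement (id, code) VALUES (?, ?)",
    [(1, "ENT-001"), (2, "ENT-001"), (3, "ENT-002")],  # ids 1 and 2 collide
)
con.execute(
    """
    UPDATE spp_entitlement
    SET code = code || '-' || id
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY code ORDER BY id) AS rn
            FROM spp_entitlement
        ) sub
        WHERE rn > 1
    )
    """
)
codes = [row[0] for row in con.execute("SELECT code FROM spp_entitlement ORDER BY id")]
print(codes)  # ['ENT-001', 'ENT-001-2', 'ENT-002']
```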

Context

This is Phase 1 of 9 in the spp_programs performance optimization effort. SQL constraints are a prerequisite for Phase 7 (INSERT ON CONFLICT bulk membership creation).


Test Plan

  • ./scripts/test_single_module.sh spp_programs passes
  • Verify constraints exist:
    SELECT conname FROM pg_constraint
    WHERE conrelid = 'spp_program_membership'::regclass
      AND contype = 'u';
  • Test migration on a database with existing duplicate data
  • Verify ORM-level duplicate creation raises ValidationError (Odoo wraps IntegrityError)
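The duplicate-blocking check above can also be demonstrated outside Odoo. This sketch uses sqlite3 with invented data as a stand-in for the Postgres-level behavior: the second INSERT violates the UNIQUE constraint and raises IntegrityError, which Odoo then surfaces at the ORM level as a ValidationError.

```python
import sqlite3

# SQLite stand-in (invented data) for the raw-SQL duplicate test: the
# database itself rejects the second identical (partner_id, program_id) row.
con = sqlite3.connect(":memory:")
con.execute(
    """
    CREATE TABLE spp_program_membership (
        id INTEGER PRIMARY KEY,
        partner_id INT,
        program_id INT,
        UNIQUE (partner_id, program_id)
    )
    """
)
con.execute("INSERT INTO spp_program_membership (partner_id, program_id) VALUES (10, 1)")
try:
    con.execute("INSERT INTO spp_program_membership (partner_id, program_id) VALUES (10, 1)")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True
print(blocked)  # True: the duplicate is rejected at the database level
```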

Commit message:

Python @api.constrains methods performed per-record search() calls during
bulk create, causing O(N^2) behavior. SQL UNIQUE constraints enforce
uniqueness at the database level in O(1) per row.

Includes pre-migration to deduplicate existing data before constraints apply.
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the performance and data integrity of the spp_programs module by shifting uniqueness enforcement from inefficient Python application-level checks to robust and performant SQL database constraints. This change addresses a critical performance bottleneck in bulk data operations and includes a safe migration path for existing data, ensuring that the system can handle large datasets more efficiently and reliably.

Highlights

  • Performance Optimization: Replaced Python @api.constrains uniqueness checks with SQL UNIQUE constraints for spp.program.membership, spp.cycle.membership, and spp.entitlement to improve performance, especially during bulk operations.
  • Data Deduplication: Introduced a pre-migration script to deduplicate existing data in the database before the new SQL constraints are applied. This script handles program and cycle memberships by deleting duplicates (keeping the oldest) and entitlement codes by appending -{id} to duplicate codes.
  • Module Version Update: Bumped the spp_programs module version from 19.0.2.0.0 to 19.0.2.1.0.
  • Testing: Added comprehensive tests to verify the existence and correct functionality of the new SQL UNIQUE constraints, ensuring duplicates are blocked and valid entries are allowed.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request is a great performance improvement, correctly replacing inefficient Python constraints with robust SQL UNIQUE constraints. The inclusion of a pre-migration script to handle existing data demonstrates thoroughness and foresight. The new tests are also excellent, verifying the database-level changes directly. I have one suggestion in the migration script to improve maintainability by reducing code duplication.

Comment on lines +21 to +62
def _deduplicate_program_memberships(cr):
    """Remove duplicate (partner_id, program_id) rows from spp_program_membership."""
    cr.execute(
        """
        DELETE FROM spp_program_membership
        WHERE id IN (
            SELECT id FROM (
                SELECT id,
                       ROW_NUMBER() OVER (
                           PARTITION BY partner_id, program_id
                           ORDER BY id
                       ) AS rn
                FROM spp_program_membership
            ) sub
            WHERE rn > 1
        )
        """
    )
    if cr.rowcount:
        _logger.info("Deduplicated %d duplicate program membership rows", cr.rowcount)


def _deduplicate_cycle_memberships(cr):
    """Remove duplicate (partner_id, cycle_id) rows from spp_cycle_membership."""
    cr.execute(
        """
        DELETE FROM spp_cycle_membership
        WHERE id IN (
            SELECT id FROM (
                SELECT id,
                       ROW_NUMBER() OVER (
                           PARTITION BY partner_id, cycle_id
                           ORDER BY id
                       ) AS rn
                FROM spp_cycle_membership
            ) sub
            WHERE rn > 1
        )
        """
    )
    if cr.rowcount:
        _logger.info("Deduplicated %d duplicate cycle membership rows", cr.rowcount)


Severity: medium

The functions _deduplicate_program_memberships and _deduplicate_cycle_memberships contain nearly identical SQL logic. To improve maintainability and reduce code duplication, consider refactoring them into a single, generic helper function. This would make the script more concise and provide a reusable pattern for similar deduplication tasks in future migrations.

Here is an example of how you could refactor it:

def _deduplicate_records(cr, table_name, partition_by_cols, log_message):
    """Generic helper to delete duplicate records from a table."""
    # Using .format() for table/column names is safe here as they are not from user input.
    query = """
        DELETE FROM {table}
        WHERE id IN (
            SELECT id FROM (
                SELECT id,
                       ROW_NUMBER() OVER (
                           PARTITION BY {columns}
                           ORDER BY id
                       ) AS rn
                FROM {table}
            ) sub
            WHERE rn > 1
        )
    """.format(table=table_name, columns=', '.join(partition_by_cols))
    cr.execute(query)
    if cr.rowcount:
        _logger.info(log_message, cr.rowcount)

def _deduplicate_program_memberships(cr):
    """Remove duplicate (partner_id, program_id) rows from spp_program_membership."""
    _deduplicate_records(
        cr,
        "spp_program_membership",
        ["partner_id", "program_id"],
        "Deduplicated %d duplicate program membership rows",
    )

def _deduplicate_cycle_memberships(cr):
    """Remove duplicate (partner_id, cycle_id) rows from spp_cycle_membership."""
    _deduplicate_records(
        cr,
        "spp_cycle_membership",
        ["partner_id", "cycle_id"],
        "Deduplicated %d duplicate cycle membership rows",
    )

This approach centralizes the deletion logic, making the script easier to read and maintain.
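If the `.format()` interpolation in the suggested helper is a concern, a lightweight guard can make the safety assumption explicit. The helper below is hypothetical, not part of the PR or the bot's suggestion: it rejects anything that is not a plain lowercase identifier before it ever reaches the query template. For fully general identifier quoting against PostgreSQL, psycopg2's `sql.Identifier` is the standard tool.

```python
import re

# Hypothetical hardening for the generic dedup helper: table and column names
# are module-internal constants, but this whitelist documents that assumption
# and fails fast if an unexpected string ever reaches .format().
_IDENTIFIER_RE = re.compile(r"^[a-z_][a-z0-9_]*$")

def check_identifiers(*names):
    """Raise ValueError unless every name is a plain lowercase SQL identifier."""
    for name in names:
        if not _IDENTIFIER_RE.match(name):
            raise ValueError(f"unsafe SQL identifier: {name!r}")
    return True

print(check_identifiers("spp_program_membership", "partner_id", "program_id"))  # True
```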

@codecov

codecov bot commented Mar 18, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 69.95%. Comparing base (1a91296) to head (3a21bfe).
⚠️ Report is 4 commits behind head on 19.0.

Files with missing lines | Patch % | Lines
spp_programs/models/entitlement_base_model.py | 0.00% | 1 Missing ⚠️
Additional details and impacted files


@@            Coverage Diff             @@
##             19.0     #116      +/-   ##
==========================================
- Coverage   70.14%   69.95%   -0.20%     
==========================================
  Files         739      783      +44     
  Lines       43997    46568    +2571     
==========================================
+ Hits        30863    32578    +1715     
- Misses      13134    13990     +856     
Flag | Coverage Δ
spp_api_v2 79.96% <ø> (ø)
spp_api_v2_change_request 66.66% <ø> (ø)
spp_api_v2_cycles 71.12% <ø> (ø)
spp_api_v2_data 64.41% <ø> (ø)
spp_api_v2_entitlements 70.19% <ø> (ø)
spp_api_v2_gis 71.52% <ø> (ø)
spp_api_v2_products 66.27% <ø> (ø)
spp_api_v2_service_points 70.94% <ø> (ø)
spp_api_v2_simulation 71.12% <ø> (ø)
spp_api_v2_vocabulary 57.26% <ø> (ø)
spp_audit 64.19% <ø> (ø)
spp_base_common 90.26% <ø> (ø)
spp_case_entitlements 97.61% <ø> (ø)
spp_case_programs 97.14% <ø> (ø)
spp_cel_event 85.11% <ø> (?)
spp_claim_169 58.11% <ø> (?)
spp_dci_client_dr 55.87% <ø> (?)
spp_dci_client_ibr 60.17% <ø> (?)
spp_programs 45.50% <80.00%> (-0.02%) ⬇️
spp_security 66.66% <ø> (ø)

Flags with carried forward coverage won't be shown.

Files with missing lines | Coverage Δ
spp_programs/__manifest__.py 0.00% <ø> (ø)
spp_programs/models/cycle_membership.py 32.60% <100.00%> (-8.14%) ⬇️
spp_programs/models/entitlement.py 45.88% <ø> (+0.57%) ⬆️
spp_programs/models/program_membership.py 28.77% <100.00%> (-3.20%) ⬇️
spp_programs/models/entitlement_base_model.py 0.00% <0.00%> (ø)

... and 44 files with indirect coverage changes

