Skip to content

perf(analyses): optimize queries to reduce memory usage#322

Merged
sylvainkalache merged 1 commit intomainfrom
perf/analyses-load-only
Feb 6, 2026
Merged

perf(analyses): optimize queries to reduce memory usage#322
sylvainkalache merged 1 commit intomainfrom
perf/analyses-load-only

Conversation

@sylvainkalache
Copy link
Contributor

Summary

Two query optimizations in the get_analysis endpoint to reduce memory usage:

  1. Select only email column from UserCorrelation instead of loading full objects (was loading all columns for 10K+ members when only .email is needed)
  2. Use load_only(id, uuid) on the 404 fallback query to avoid loading the potentially 30MB+ results JSON column just to build an error message

Expected Impact

  • Memory: Reduces peak memory by 5-50MB per request (depending on org size and analysis results)
  • Query time: ~50-200ms savings on UserCorrelation query for large orgs

Test plan

  • All 18 analysis-related tests pass
  • No regressions in full test suite

Two optimizations in the get_analysis endpoint:

1. Select only email column from UserCorrelation instead of loading
   full objects (was loading all columns for 10K+ members when only
   email is needed)

2. Use load_only(id, uuid) on the 404 fallback query to avoid loading
   the potentially 30MB+ results JSON column just for an error message
@vercel
Copy link

vercel bot commented Feb 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
rootly-burnout-detector-dev Ready Ready Preview, Comment Feb 6, 2026 1:55am
rootly-burnout-detector-web Ready Ready Preview, Comment Feb 6, 2026 1:55am

Request Review

@claude
Copy link

claude bot commented Feb 6, 2026

Claude finished @sylvainkalache's task —— View job


PR Review: Query Optimization Analysis

Todo List:

  • Read and analyze the modified file
  • Check for bugs and logic errors
  • Review for security issues
  • Assess code quality concerns
  • Provide final assessment

Analysis Complete

Code Quality Issue Found:

In backend/app/api/endpoints/analyses.py:677-682, the email filtering logic has a subtle bug:

member_emails = [
    row[0] for row in db.query(UserCorrelation.email).filter(
        UserCorrelation.organization_id == analysis.organization_id,
        UserCorrelation.organization_id.isnot(None),
        UserCorrelation.email.isnot(None)
    ).all()
]

Issue: The list comprehension directly uses row[0] without checking if the email is actually non-null. While the query filters for UserCorrelation.email.isnot(None), this only ensures the database column isn't NULL, but doesn't prevent empty strings or other falsy values from being included.

Original code was more robust:

member_emails = [c.email for c in correlations if c.email]

The if c.email condition properly filtered out both NULL and empty string values.

Recommended fix: Add a proper null/empty check:

member_emails = [
    row[0] for row in db.query(UserCorrelation.email).filter(
        UserCorrelation.organization_id == analysis.organization_id,
        UserCorrelation.organization_id.isnot(None),
        UserCorrelation.email.isnot(None)
    ).all()
    if row[0]  # Filter out empty strings
]

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 6, 2026

Greptile Overview

Greptile Summary

This PR implements two query optimizations to reduce memory usage in the get_analysis endpoint:

  1. UserCorrelation query (lines 677-683): Changed from loading full objects to selecting only the email column, avoiding loading unnecessary data for potentially 10K+ members
  2. 404 fallback query (lines 591-593): Added load_only(Analysis.id, Analysis.uuid) to avoid loading the potentially 30MB+ results JSON column when building error messages

Key Changes

  • Memory reduction of 5-50MB per request depending on organization size
  • Query time improvements of 50-200ms for large organizations
  • All 18 analysis-related tests pass

Issues Found

  • The new SQL-level null check on UserCorrelation.email won't filter empty string emails like the old Python-level check did. If any empty emails exist in the database, they would now be included in results.

Confidence Score: 4/5

  • This PR is safe to merge with one minor consideration about empty email handling
  • Score reflects that the optimization is solid and well-tested, but there's a subtle behavioral change in empty string email filtering that should be addressed
  • backend/app/api/endpoints/analyses.py needs attention for the empty email filtering logic

Important Files Changed

Filename Overview
backend/app/api/endpoints/analyses.py Optimized two queries to reduce memory usage by selecting only needed columns and avoiding loading large JSON results column

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment on lines +677 to +683
member_emails = [
row[0] for row in db.query(UserCorrelation.email).filter(
UserCorrelation.organization_id == analysis.organization_id,
UserCorrelation.organization_id.isnot(None),
UserCorrelation.email.isnot(None)
).all()
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the new SQL-level null check won't filter empty string emails '' like the old Python-level if c.email check did. if empty emails exist in the DB, they'll now be included

Suggested change
member_emails = [
row[0] for row in db.query(UserCorrelation.email).filter(
UserCorrelation.organization_id == analysis.organization_id,
UserCorrelation.organization_id.isnot(None),
UserCorrelation.email.isnot(None)
).all()
]
member_emails = [
row[0] for row in db.query(UserCorrelation.email).filter(
UserCorrelation.organization_id == analysis.organization_id,
UserCorrelation.organization_id.isnot(None),
UserCorrelation.email.isnot(None),
UserCorrelation.email != ''
).all()
]
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/app/api/endpoints/analyses.py
Line: 677:683

Comment:
the new SQL-level null check won't filter empty string emails `''` like the old Python-level `if c.email` check did. if empty emails exist in the DB, they'll now be included

```suggestion
    member_emails = [
        row[0] for row in db.query(UserCorrelation.email).filter(
            UserCorrelation.organization_id == analysis.organization_id,
            UserCorrelation.organization_id.isnot(None),
            UserCorrelation.email.isnot(None),
            UserCorrelation.email != ''
        ).all()
    ]
```

How can I resolve this? If you propose a fix, please make it concise.

@sylvainkalache sylvainkalache merged commit 0f7b8a2 into main Feb 6, 2026
10 checks passed
@sylvainkalache sylvainkalache deleted the perf/analyses-load-only branch February 6, 2026 20:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant