Skip to content

fix(user): Add email-based user lookup to prevent duplicates (#719)#721

Merged
manavgup merged 1 commit intomainfrom
fix/user-email-lookup-719
Dec 5, 2025
Merged

fix(user): Add email-based user lookup to prevent duplicates (#719)#721
manavgup merged 1 commit intomainfrom
fix/user-email-lookup-719

Conversation

@manavgup
Copy link
Owner

@manavgup manavgup commented Dec 5, 2025

Summary

  • Added get_by_email() method to UserRepository for email-based user lookup
  • Added get_user_by_email() and get_or_create_by_email() methods to UserService
  • Updated MCP auth to use service layer instead of direct DB queries
  • Changed exception handling to catch specific exceptions (DomainError, RepositoryError) instead of general Exception

Problem

When using trusted proxy authentication with the X-Authenticated-User header, users were being looked up by ibm_id instead of email. This caused duplicate users when the existing user had a different ibm_id than their email address.

Solution

The get_or_create_by_email() method now:

  1. First looks up the user by email address
  2. Only creates a new user if not found by email
  3. Uses email as the ibm_id for new users created via trusted proxy

This prevents duplicate user creation while maintaining backward compatibility with existing users.

Test plan

  • Verify existing user lookup works by email
  • Verify new user creation when no matching email exists
  • Verify MCP authentication returns correct user data
  • Verify no duplicate users are created during authentication

Fixes #719

🤖 Generated with Claude Code

When using trusted proxy authentication with X-Authenticated-User header,
users were being looked up by ibm_id instead of email. This caused duplicate
users when the existing user had a different ibm_id than their email.

Changes:
- Add get_by_email() method to UserRepository for email-based lookup
- Add get_user_by_email() and get_or_create_by_email() to UserService
- Update MCP auth to use service layer instead of direct DB queries
- Catch specific exceptions (DomainError, RepositoryError) instead of general Exception

The get_or_create_by_email() method first checks if a user exists by email,
and only creates a new user if not found. This prevents duplicates when
authenticating via trusted proxy.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@github-actions
Copy link
Contributor

github-actions bot commented Dec 5, 2025

🚀 Development Environment Options

This repository supports Dev Containers for a consistent development environment.

Option 1: GitHub Codespaces (Recommended)

Create a cloud-based development environment:

  1. Click the green Code button above
  2. Select the Codespaces tab
  3. Click Create codespace on fix/user-email-lookup-719
  4. Wait 2-3 minutes for environment setup
  5. Start coding with all tools pre-configured!

Option 2: VS Code Dev Containers (Local)

Use Dev Containers on your local machine:

  1. Install Docker Desktop
  2. Install VS Code
  3. Install the Dev Containers extension
  4. Clone this PR branch locally
  5. Open in VS Code and click "Reopen in Container" when prompted

Option 3: Traditional Local Setup

Set up the development environment manually:

# Clone the repository
git clone https://github.com/manavgup/rag_modulo.git
cd rag_modulo
git checkout fix/user-email-lookup-719

# Initialize development environment
make dev-init
make dev-build
make dev-up
make dev-validate

Available Commands

Once in your development environment:

make help           # Show all available commands
make dev-validate   # Validate environment setup
make test-atomic    # Run atomic tests
make test-unit      # Run unit tests
make lint          # Run linting

Services Available

When running make dev-up:


This automated message helps reviewers quickly set up the development environment.

@github-actions
Copy link
Contributor

github-actions bot commented Dec 5, 2025

Code Review - PR #721: Email-Based User Lookup

Overview

This PR adds email-based user lookup functionality to prevent duplicate user creation in trusted proxy authentication scenarios. The implementation follows the repository pattern and service layer architecture consistently.


Strengths

  1. Excellent Architecture Adherence

    • Properly follows the repository → service → controller pattern
    • Removes direct DB queries from mcp_server/auth.py (lines 345-374)
    • Implements separation of concerns consistently
  2. Robust Error Handling

    • Uses specific exception types (DomainError, RepositoryError) instead of generic Exception (auth.py:366)
    • Proper exception propagation through layers
    • Clear error messages with context
  3. Defensive Programming

    • get_or_create_by_email() includes defensive checks for user defaults (user_service.py:99-123)
    • Automatic recovery when users exist but lack required templates
    • Follows the same pattern as existing get_or_create_user() method
  4. Good Documentation

    • Clear docstrings explaining the "why" (trusted proxy scenarios)
    • Helpful inline comments about email-based lookup logic
    • PR description clearly explains the problem and solution

⚠️ Critical Issue: Missing Test Coverage

The PR currently has ZERO test coverage for the new methods. This is a critical gap that must be addressed.

Required Tests:

1. Repository Layer Tests (test_user_repository.py):

@pytest.mark.unit
def test_get_by_email_success():
    # Test successful user lookup by email
    
@pytest.mark.unit
def test_get_by_email_not_found():
    # Test NotFoundError when email doesn't exist
    
@pytest.mark.unit
def test_get_by_email_database_error():
    # Test RepositoryError on database failures

2. Service Layer Tests (tests/unit/services/test_user_service.py):

@pytest.mark.unit
def test_get_user_by_email_success():
    # Test successful email lookup via service
    
@pytest.mark.unit
def test_get_or_create_by_email_existing_user():
    # Test finding existing user by email
    
@pytest.mark.unit
def test_get_or_create_by_email_creates_new_user():
    # Test creating new user when email not found
    
@pytest.mark.unit
def test_get_or_create_by_email_recovery_on_missing_defaults():
    # Test defensive reinitialization when user lacks templates

3. Integration Tests (tests/integration/test_user_database.py):

@pytest.mark.integration
def test_get_by_email_with_real_database():
    # Test email lookup with actual database
    
@pytest.mark.integration
def test_get_or_create_by_email_prevents_duplicates():
    # Test that calling twice with same email doesn't create duplicates

4. MCP Auth Tests (tests/unit/test_mcp_auth.py):

@pytest.mark.unit
async def test_authenticate_trusted_user_existing():
    # Test trusted proxy auth finds existing user by email
    
@pytest.mark.unit
async def test_authenticate_trusted_user_creates_new():
    # Test trusted proxy auth creates new user if not found

According to the project's CLAUDE.md:

"Testing Strategy: 947+ automated tests... Write tests for new features (unit + integration)"


🔍 Code Quality Observations

1. Potential Race Condition (user_service.py:96-132)

In get_or_create_by_email(), there's a TOCTOU (time-of-check-time-of-use) race condition:

try:
    existing_user = self.user_repository.get_by_email(email)  # Check
    # ... defensive initialization ...
    return existing_user
except NotFoundError:
    return self.create_user(...)  # Create

Issue: If two requests call this method simultaneously with the same new email, both could pass the get_by_email() check and attempt to create a user, causing an AlreadyExistsError.

Recommendation: Add exception handling for AlreadyExistsError in the except block:

except NotFoundError:
    try:
        return self.create_user(...)
    except AlreadyExistsError:
        # Another request created the user, fetch it
        return self.user_repository.get_by_email(email)

2. Email Validation (user_service.py:74, 128)

The method accepts EmailStr (Pydantic validated) but also calls email.split("@")[0] without checking for empty result.

Edge Case: If somehow an invalid email like "@example.com" passes validation, split("@")[0] returns empty string.

Recommendation: Add defensive check:

display_name = name if name else (email.split("@")[0] or "User")

3. Logging Consistency (user_service.py:71, 129)

Minor: Use structured logging with context for better traceability:

# Current
logger.info("Fetching user with email: %s", email)

# Better
logger.info("Fetching user by email", extra={"email": email})

Per CLAUDE.md:

"Enhanced Logging (Issue #218): Use structured logging with context tracking"

4. Transaction Management (user_service.py:96-132)

The defensive recovery logic calls initialize_user_defaults() which may modify the database, but there's no explicit transaction management. If the recovery fails, the user is returned in an inconsistent state.

Recommendation: Consider adding transaction rollback on recovery failure:

except Exception as e:
    self.db.rollback()  # Ensure clean state
    logger.error("Failed to recover user %s: %s", existing_user.id, str(e))
    raise ValidationError(...)

📊 Performance Considerations

  1. Additional Query Overhead

    • The defensive check in get_or_create_by_email() adds 1 extra query to fetch templates (line 100)
    • This is acceptable for correctness but adds ~10-20ms per call
    • Consider caching user defaults if this becomes a bottleneck
  2. Index Usage (user_repository.py:98)

    • filter(User.email == email) should use the ix_users_email index
    • Verify index exists: CREATE INDEX ix_users_email ON users(email);

🔒 Security Review

  1. ✅ No SQL Injection: Uses ORM parameterized queries
  2. ✅ Exception Handling: Properly catches and logs auth failures
  3. ✅ Input Validation: EmailStr validates email format
  4. ⚠️ Timing Attacks: Not a concern here (no secrets compared)

📝 Documentation & Style

  1. ✅ Follows CLAUDE.md conventions:

    • 120 character line length respected
    • Type hints present
    • Docstrings follow project style
  2. Minor: Consider adding a migration note to CHANGELOG.md about the new email-based lookup behavior


🎯 Recommendations Summary

Must Fix (Blocking):

  1. Add comprehensive test coverage (repository, service, integration, MCP auth)
  2. ⚠️ Handle race condition in get_or_create_by_email()

Should Fix (Non-blocking):
3. 💡 Improve email split edge case handling
4. 💡 Add explicit transaction rollback on recovery failure
5. 💡 Use structured logging for better traceability

Nice to Have:
6. 📖 Add CHANGELOG entry
7. 🔍 Verify ix_users_email index exists in migrations


Approval Status

Status: ⏸️ Changes Requested

This is high-quality code that solves a real problem with a clean architectural approach. However, the complete lack of test coverage is a critical issue that must be addressed before merging.

According to the project's testing standards (CLAUDE.md):

"947+ automated tests... Write tests for new features (unit + integration)"

Once tests are added, this PR will be ready to merge. The code itself is well-structured and follows best practices.


📚 References

@manavgup manavgup merged commit 172e467 into main Dec 5, 2025
24 checks passed
@manavgup manavgup deleted the fix/user-email-lookup-719 branch December 5, 2025 15:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add get_by_email lookup to UserService for consistent user resolution

1 participant

Comments