Skip to content

refactor(services): preserve specific exceptions in service error handlers #1591

@crivetimihai

Description

@crivetimihai

Summary

Service methods like delete_server have broad exception handlers that catch specific exceptions (e.g., ServerNotFoundError) and convert them to generic errors (e.g., ServerError), causing incorrect HTTP status codes.

Related

Problem

The delete_server method in mcpgateway/services/server_service.py has a broad exception handler that catches ServerNotFoundError and converts it to a generic ServerError:

# mcpgateway/services/server_service.py:1305-1375
async def delete_server(self, db: Session, server_id: str, user_email: Optional[str] = None) -> None:
    try:
        server = db.get(DbServer, server_id)
        if not server:
            raise ServerNotFoundError(f"Server not found: {server_id}")  # Line 1308
        # ... deletion logic ...
    except PermissionError as pe:
        db.rollback()
        raise pe
    except Exception as e:  # Line 1361 - catches ServerNotFoundError!
        db.rollback()
        raise ServerError(f"Failed to delete server: {str(e)}")  # Converts to ServerError

Since ServerNotFoundError extends ServerError, and the except Exception block catches everything, the specific ServerNotFoundError is swallowed and re-raised as a generic ServerError, which maps to HTTP 400 instead of HTTP 404.

Current Workaround (PR #1571)

The workaround calls get_server() before delete_server() in the endpoint:

# mcpgateway/main.py:1969
await server_service.get_server(db, server_id)  # Raises ServerNotFoundError -> 404
await server_service.delete_server(db, server_id, user_email=user_email)

This works because get_server() raises ServerNotFoundError outside the delete_server try/except block.

Proposed Fix

Add explicit exception handlers for specific exceptions before the generic except Exception handler:

async def delete_server(self, db: Session, server_id: str, user_email: Optional[str] = None) -> None:
    try:
        server = db.get(DbServer, server_id)
        if not server:
            raise ServerNotFoundError(f"Server not found: {server_id}")
        # ... deletion logic ...
    except PermissionError as pe:
        db.rollback()
        # ... logging ...
        raise pe
    except ServerNotFoundError as snfe:  # NEW: Handle explicitly
        db.rollback()
        self._structured_logger.log(
            level="WARNING",
            message="Server deletion failed - server not found",
            event_type="server_deletion_not_found",
            component="server_service",
            server_id=server_id,
            user_email=user_email,
        )
        raise snfe  # Re-raise as-is to preserve HTTP 404
    except Exception as e:
        db.rollback()
        # ... logging ...
        raise ServerError(f"Failed to delete server: {str(e)}")

Scope

This pattern should be audited across all service methods. Potential candidates:

  1. server_service.py - delete_server, update_server, toggle_server_status
  2. gateway_service.py - Similar methods
  3. tool_service.py - Similar methods
  4. resource_service.py - Similar methods
  5. prompt_service.py - Similar methods

Acceptance Criteria

  • delete_server service method preserves ServerNotFoundError
  • Endpoint returns HTTP 404 for non-existent servers without needing get_server() call
  • Remove redundant get_server() call from delete_server endpoint (optional cleanup)
  • Audit other service methods for same pattern
  • Add unit tests for service-level exception handling

Priority

Low - The workaround in PR #1571 is functional and follows existing patterns (delete_gateway uses the same approach). This is a code quality improvement rather than a bug fix.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions