Skip to content

Replace Bare Exception Handling with Specific Exception Types #7

@Huneybaked

Description

@Huneybaked

The codebase contains 25+ instances of overly broad except Exception: blocks that catch all exceptions indiscriminately. This makes debugging difficult and can mask serious bugs.

Examples

Example 1: Image Web Task (image/web.py)

try:
    dataset = load_dataset(dataset_id)
except Exception:  # ❌ TOO BROAD
    logger.error(f"Failed to load dataset {dataset_id}")
    return None

Problems:

  • Network errors, auth errors, and malformed data all handled the same way
  • Swallows keyboard interrupts and system exits
  • No actionable error information logged

Example 2: Document Parsing

try:
    parsed_doc = parse_pdf(file_path)
except Exception as e:  # ❌ TOO BROAD
    logger.warning(f"Parse failed: {e}")
    pass  # ❌ Silent failure

Impact

  • Debugging Nightmare - "Something failed" doesn't help
  • Silent Failures - Critical errors hidden
  • Incorrect Recovery - Can't distinguish recoverable vs. fatal errors
  • Masks Bugs - Programming errors treated like expected failures

Affected Files

Based on codebase analysis:

  • sdgsystem/tasks/image/web.py (multiple instances)
  • sdgsystem/documents/parse.py
  • sdgsystem/documents/load.py
  • sdgsystem/huggingface/download.py
  • sdgsystem/parallel.py
  • Others across the codebase

Recommended Fix Pattern

Pattern 1: Specific Exception Types

from requests.exceptions import RequestException, Timeout
from datasets.exceptions import DatasetNotFoundError

try:
    dataset = load_dataset(dataset_id)
except DatasetNotFoundError:
    logger.error(f"Dataset not found: {dataset_id}")
    return None
except Timeout:
    logger.error(f"Timeout loading {dataset_id} - retrying...")
    # Implement retry logic
except RequestException as e:
    logger.error(f"Network error: {e}", exc_info=True)
    raise

Pattern 2: Custom Exceptions

# Define in exceptions.py
class DocumentParseError(Exception):
    """Raised when document parsing fails."""
    pass

class InvalidDatasetError(Exception):
    """Raised when dataset format is invalid."""
    pass

# Use in code
try:
    doc = parse_pdf(path)
except PyMuPDFError as e:
    raise DocumentParseError(f"Failed to parse {path}: {e}") from e

Pattern 3: Fail Fast on Unexpected Errors

try:
    critical_operation()
except KnownRecoverableError:
    handle_gracefully()
except Exception as e:
    # Log full traceback for unexpected errors
    logger.exception(f"Unexpected error in critical_operation: {e}")
    raise  # ✅ Don't swallow unknown errors

Suggested Exception Hierarchy

# sdgsystem/exceptions.py
class SynDataError(Exception):
    """Base exception for all SynData errors."""
    pass

class ConfigurationError(SynDataError):
    """Invalid configuration."""
    pass

class DataGenerationError(SynDataError):
    """Error during data generation."""
    pass

class ModelClientError(SynDataError):
    """LLM/VLM client error."""
    pass

class DocumentProcessingError(SynDataError):
    """Document parsing/chunking error."""
    pass

class DatasetError(SynDataError):
    """Dataset loading/validation error."""
    pass

Implementation Plan

  1. Create sdgsystem/exceptions.py with custom exception hierarchy
  2. Audit all except Exception: occurrences
  3. Replace with specific exception types
  4. Add proper logging with exc_info=True for tracebacks
  5. Document exception handling in CONTRIBUTING.md

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions