-
Notifications
You must be signed in to change notification settings - Fork 28
Open
Description
The codebase contains 25+ instances of overly broad except Exception: blocks that catch all exceptions indiscriminately. This makes debugging difficult and can mask serious bugs.
Examples
Example 1: Image Web Task (image/web.py)
try:
dataset = load_dataset(dataset_id)
except Exception: # ❌ TOO BROAD
logger.error(f"Failed to load dataset {dataset_id}")
return NoneProblems:
- Network errors, auth errors, and malformed data all handled the same way
- Swallows keyboard interrupts and system exits
- No actionable error information logged
Example 2: Document Parsing
try:
parsed_doc = parse_pdf(file_path)
except Exception as e: # ❌ TOO BROAD
logger.warning(f"Parse failed: {e}")
pass # ❌ Silent failureImpact
- Debugging Nightmare - "Something failed" doesn't help
- Silent Failures - Critical errors hidden
- Incorrect Recovery - Can't distinguish recoverable vs. fatal errors
- Masks Bugs - Programming errors treated like expected failures
Affected Files
Based on codebase analysis:
sdgsystem/tasks/image/web.py(multiple instances)sdgsystem/documents/parse.pysdgsystem/documents/load.pysdgsystem/huggingface/download.pysdgsystem/parallel.py- Others across the codebase
Recommended Fix Pattern
Pattern 1: Specific Exception Types
from requests.exceptions import RequestException, Timeout
from datasets.exceptions import DatasetNotFoundError
try:
dataset = load_dataset(dataset_id)
except DatasetNotFoundError:
logger.error(f"Dataset not found: {dataset_id}")
return None
except Timeout:
logger.error(f"Timeout loading {dataset_id} - retrying...")
# Implement retry logic
except RequestException as e:
logger.error(f"Network error: {e}", exc_info=True)
raisePattern 2: Custom Exceptions
# Define in exceptions.py
class DocumentParseError(Exception):
"""Raised when document parsing fails."""
pass
class InvalidDatasetError(Exception):
"""Raised when dataset format is invalid."""
pass
# Use in code
try:
doc = parse_pdf(path)
except PyMuPDFError as e:
raise DocumentParseError(f"Failed to parse {path}: {e}") from ePattern 3: Fail Fast on Unexpected Errors
try:
critical_operation()
except KnownRecoverableError:
handle_gracefully()
except Exception as e:
# Log full traceback for unexpected errors
logger.exception(f"Unexpected error in critical_operation: {e}")
raise # ✅ Don't swallow unknown errorsSuggested Exception Hierarchy
# sdgsystem/exceptions.py
class SynDataError(Exception):
"""Base exception for all SynData errors."""
pass
class ConfigurationError(SynDataError):
"""Invalid configuration."""
pass
class DataGenerationError(SynDataError):
"""Error during data generation."""
pass
class ModelClientError(SynDataError):
"""LLM/VLM client error."""
pass
class DocumentProcessingError(SynDataError):
"""Document parsing/chunking error."""
pass
class DatasetError(SynDataError):
"""Dataset loading/validation error."""
passImplementation Plan
- Create
sdgsystem/exceptions.pywith custom exception hierarchy - Audit all
except Exception:occurrences - Replace with specific exception types
- Add proper logging with
exc_info=Truefor tracebacks - Document exception handling in CONTRIBUTING.md
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels