fix: replace bare except with except Exception in tokenizer #840

Open
harshadkhetpal wants to merge 1 commit into lightly-ai:main from harshadkhetpal:fix/bare-except-tokenizer

@harshadkhetpal harshadkhetpal commented Mar 25, 2026

Summary

Replace bare except: with except Exception: in tokenizer.py (line 199).

Bare except: catches SystemExit, KeyboardInterrupt, and GeneratorExit in addition to regular exceptions. This except clause catches a ValueError from list.index(), so except Exception: is the correct narrower alternative.

Change:

# Before
                except:
                    new_word.extend(word[i:])

# After
                except Exception:
                    new_word.extend(word[i:])

Testing

No behavior change for normal exceptions — style/correctness fix only.

Summary by CodeRabbit

  • Refactor
    • Enhanced exception handling in the vision encoder tokenizer for improved application stability and reliability.

coderabbitai bot commented Mar 25, 2026

📝 Walkthrough

The exception handler in SimpleTokenizer.bpe was refined to catch only Exception instances instead of all throwables via bare except:. This prevents suppression of system-level interrupts like KeyboardInterrupt and SystemExit while maintaining existing fallback behavior for word tokenization errors.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Exception Handling Tightening: lightly_studio/src/lightly_studio/vendor/perception_encoder/vision_encoder/tokenizer.py | Narrowed exception handler in SimpleTokenizer.bpe from bare except: to except Exception:, allowing system interrupts to propagate while preserving error recovery logic. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 A tokenizer's heart beats true,
No more bare excepts to catch the dew,
System signals now slip through,
Exceptions caught—just the right few! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely describes the primary change: replacing a bare except clause with a more specific exception handler in the tokenizer module. |
| Description check | ✅ Passed | The description covers the change summary, rationale, and code examples clearly. However, it does not address the CHANGELOG.md update requirement or testing details from the template. |

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lightly_studio/src/lightly_studio/vendor/perception_encoder/vision_encoder/tokenizer.py (1)

195-201: ⚠️ Potential issue | 🟡 Minor

Catch ValueError instead of broad Exception.

The word.index(first, i) call on line 196 raises ValueError when the element is not found. Using except Exception: can unintentionally swallow unrelated errors.

Proposed fix
-                except Exception:
+                except ValueError:
                     new_word.extend(word[i:])
                     break
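
To make the suggestion concrete, here is a rough sketch of a pair-merging loop with the narrower handler. It is only loosely modeled on the loop described in this review; the function name `merge_pair` and its exact structure are assumptions for illustration, not the vendored `SimpleTokenizer.bpe` code.

```python
def merge_pair(word, first, second):
    """Merge every adjacent (first, second) pair in `word` into one symbol.

    Hypothetical stand-in for the inner loop discussed above.
    """
    new_word = []
    i = 0
    while i < len(word):
        try:
            # tuple.index raises ValueError when `first` is absent from word[i:].
            j = word.index(first, i)
        except ValueError:
            # Expected "not found" case: copy the remainder and stop.
            new_word.extend(word[i:])
            break
        # Copy everything before the match unchanged.
        new_word.extend(word[i:j])
        i = j
        if i < len(word) - 1 and word[i + 1] == second:
            new_word.append(first + second)
            i += 2
        else:
            new_word.append(word[i])
            i += 1
    return tuple(new_word)


print(merge_pair(("l", "o", "w", "e", "r"), "l", "o"))
```

Keeping only the `index` call inside the `try` and catching `ValueError` means any other failure in the loop surfaces instead of being treated as "no more matches".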
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@lightly_studio/src/lightly_studio/vendor/perception_encoder/vision_encoder/tokenizer.py`
around lines 195 - 201, Replace the broad except Exception around the
word.index(first, i) call with except ValueError so only the "not found" case is
caught; update the try/except that uses variables word, first, new_word, and i
to catch ValueError and otherwise let other exceptions propagate, ensuring you
still extend new_word with word[i:] and break in the ValueError branch.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 227f4e01-4cbb-43ba-9fc0-ebd0e46d7185

📥 Commits

Reviewing files that changed from the base of the PR and between c46c3f7 and 9955aee.

📒 Files selected for processing (1)
  • lightly_studio/src/lightly_studio/vendor/perception_encoder/vision_encoder/tokenizer.py

2 participants