feat: load the model once and keep it loaded #82
Pull Request Overview
This PR refactors the translation service from using a context manager pattern to a persistent model loading approach, improving performance by loading the model once at thread initialization rather than for each translation request.
Key changes:
- Introduced `load_model()` method to initialize tokenizer and translator as instance attributes
- Removed the `translate_context` context manager and its resource cleanup logic
- Modified `translate()` to use persistent `self.tokenizer` and `self.translator` instead of context manager resources
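The pattern described in the key changes can be sketched roughly as follows. This is a minimal illustration, not the actual contents of `lib/Service.py`: the `Service` constructor arguments, the factory callables, and the `load_count` counter are hypothetical stand-ins, since the PR summary does not show which tokenizer or translator library is used.

```python
class Service:
    """Illustrative stand-in for the refactored service (hypothetical)."""

    def __init__(self, tokenizer_factory, translator_factory):
        # Factories are assumptions standing in for the real, expensive
        # model-loading calls, which the PR summary does not show.
        self._tokenizer_factory = tokenizer_factory
        self._translator_factory = translator_factory
        self.tokenizer = None
        self.translator = None
        self.load_count = 0  # demo-only counter of how often we loaded

    def load_model(self):
        # Load once and keep the resources as instance attributes.
        self.tokenizer = self._tokenizer_factory()
        self.translator = self._translator_factory()
        self.load_count += 1

    def translate(self, text):
        # Reuse the persistent attributes instead of a per-call context manager.
        if self.tokenizer is None or self.translator is None:
            raise RuntimeError("load_model() must be called before translate()")
        tokens = self.tokenizer(text)
        return self.translator(tokens)


def make_toy_service():
    # Toy "model": split on whitespace, then upper-case each token.
    return Service(
        tokenizer_factory=lambda: str.split,
        translator_factory=lambda: (lambda toks: " ".join(t.upper() for t in toks)),
    )
```

One trade-off of this design is that the model stays in memory for the lifetime of the service, since the cleanup that the context manager used to perform on exit no longer runs after each request.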
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| lib/main.py | Adds call to service.load_model() in the task fetch thread to initialize the model before processing tasks |
| lib/Service.py | Refactors from context manager pattern to persistent model loading with new load_model() method and updated translate() method to use instance attributes |
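The `lib/main.py` change summarized in the table, calling `service.load_model()` in the task fetch thread before any tasks are processed, might look something like the sketch below. Everything here apart from the `load_model()` and `translate()` method names is an assumption made for illustration: `FakeService`, `task_fetch_thread`, the queue-based task source, and the `None` sentinel are hypothetical and not taken from the PR.

```python
import queue
import threading


class FakeService:
    """Hypothetical stand-in for the real service; upper-cases text."""

    def __init__(self):
        self.load_calls = 0
        self.model = None

    def load_model(self):
        self.load_calls += 1
        self.model = str.upper  # placeholder for an expensive model load

    def translate(self, text):
        return self.model(text)


def task_fetch_thread(service, tasks, results):
    # Load the model once at thread start, before entering the task loop.
    service.load_model()
    while True:
        task = tasks.get()
        if task is None:  # sentinel (an assumption) that stops the worker
            break
        results.append(service.translate(task))


def run_demo(inputs):
    service = FakeService()
    tasks = queue.Queue()
    results = []
    worker = threading.Thread(
        target=task_fetch_thread, args=(service, tasks, results)
    )
    worker.start()
    for item in inputs:
        tasks.put(item)
    tasks.put(None)
    worker.join()
    return service.load_calls, results
```

Calling `run_demo(["a", "b", "c"])` processes three tasks while `load_model()` runs only once, which is the behavior the PR title describes.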
oleksandr-nc
left a comment
this PR needs to be rebased after a fix for ROCM PR
I was hoping merging this first would have solved this but probably not.
Signed-off-by: Anupam Kumar <kyteinsky@gmail.com>
a157e0e to d676939
oleksandr-nc
left a comment
Changes look OK (but I did not test this; hope it works well)
works on my machine (TM)