feat: custom callbacks for train() and return TrainOutput plus model_load_time #49
…load_time Signed-off-by: Vassilis Vassiliadis <vassilis.vassiliadis@ibm.com>
@VassilisVassiliadis can this be closed if you agree with the design and direction of PR #50?
    train_output: "transformers.trainer.TrainOutput" = trainer.train()

    return EnhancedTrainOutput(
        train_output=train_output,
        model_load_time=model_load_time,
    )
@dushyantbehl I need the train() method to return the TrainOutput + model_load_time.
Do you have a specific need for that one metric to come out of the train() function?
Could you not fetch it from the tracker?
We'd like to programmatically run a large number of fms-hf-tuning experiments to collect data (model performance, system metrics, etc.). Some of these runs may take place on machines without network connectivity. Others we may not want to register with AIM at all, because they involve experimental code, models, or datasets, and we don't want to pollute the AIM database with data we may not keep.
As a result, we need to collect the measured metrics (TrainOutput + model_load_time) straight from the return value of the train() method.
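To make the use case concrete, here is a minimal sketch of how a caller could gather those metrics offline, with no tracker involved. It is an illustration under assumptions, not the code in this PR: `TrainOutput` below is a local stand-in that mirrors the fields of the transformers NamedTuple (`global_step`, `training_loss`, `metrics`), and `train`, `load_model`, `run_training`, and `collect_metrics` are hypothetical names. Only `EnhancedTrainOutput` and its two fields come from the diff above.

```python
import time
from typing import Any, Callable, Dict, NamedTuple


class TrainOutput(NamedTuple):
    """Local stand-in mirroring the fields of transformers' TrainOutput."""
    global_step: int
    training_loss: float
    metrics: Dict[str, float]


class EnhancedTrainOutput(NamedTuple):
    """TrainOutput plus the time spent loading the model (field names from the diff)."""
    train_output: TrainOutput
    model_load_time: float


def train(
    load_model: Callable[[], Any],
    run_training: Callable[[Any], TrainOutput],
) -> EnhancedTrainOutput:
    """Hypothetical train(): times the model load, then runs training."""
    start = time.perf_counter()
    model = load_model()
    model_load_time = time.perf_counter() - start

    train_output = run_training(model)
    return EnhancedTrainOutput(
        train_output=train_output,
        model_load_time=model_load_time,
    )


def collect_metrics(result: EnhancedTrainOutput) -> Dict[str, float]:
    """Flatten one run's return value into a single record for offline analysis."""
    row = dict(result.train_output.metrics)
    row["global_step"] = result.train_output.global_step
    row["training_loss"] = result.train_output.training_loss
    row["model_load_time"] = result.model_load_time
    return row
```

With a return value shaped like this, an experiment driver can call `collect_metrics(train(...))` for each run and write the rows to a local file, which works on machines without network access and leaves the AIM database untouched.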
This PR resolves #33
Changes:
model_load_time