Skip to content

Added per epoch time in log files#1

Closed
quic-abhamidi wants to merge 27 commits intomainfrom
ft_experimental
Closed

Added per epoch time in log files#1
quic-abhamidi wants to merge 27 commits intomainfrom
ft_experimental

Conversation

@quic-abhamidi
Copy link
Owner

Added a TrainerCallback function called epoch timer to log the per epoch timing.

quic-meetkuma and others added 26 commits January 2, 2026 13:14
- Added a logger which will log onto console and file. This code is
similar to existing QEff. Finetuning logger code.
- Also added dist_utils which serves as utility code when dealing with
distributed training.
- Added logger test cases for sanity checks.

---------

Signed-off-by: meetkuma <meetkuma@qti.qualcomm.com>
…quic#645)

- Added functionality to register dataset, model, optimizer, trainer
objects in a registry and fetch the class of given object based on
configuration provided.
- Also, added simple test cases to verify the functionality.

---------

Signed-off-by: meetkuma <meetkuma@qti.qualcomm.com>
)

Adding a Script for Registering and Retrieving Optimizer Classes
The script includes:


get_optimizer()
Returns the optimizer class and kwargs.
Additionally, there is a test_optimizer.py script that validates the
functionality of the optimizer registration and retrieval process.

---------

Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
…ong with its test cases. (quic#647)

Edited the SFTDataset class to enable custom dataset loading.
Updated the dataset.py file to only enable support for SFTDataset type.
Created test file to check the functionalities.

---------

Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Adding a Script for Registering and Retrieving Callback Classes
It has create_callback() function which creates an instance of callback.
Additionally, there is a test_callbacks.py script that validates the
functionality and retrieval process.

---------

Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
)

Added Config_manager to parse the training, model and dataset related
arguments.

---------

Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
-  Added Base Model class and HF model class.
- Base Model class will support FT for any custom model and will be a
common skeleton for any model, including any HF model. 
- Added unit tests for these.

---------

Signed-off-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Co-authored-by: Swati Allabadi <sallabad@qti.qualcomm.com>
This PR contains all the changes of PR quic#660 along with all the comments
being addressed. The new PR was created due a rebase issue.

Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Added Readme file for the parameters used in sample config.

---------

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
Signed-off-by: Mohit Soni <mohisoni@qti.qualcomm.com>
Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
Signed-off-by: Vahid Janfaza <vjanfaza@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
Signed-off-by: Asmita Goswami <asmigosw@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <quic_akuruvil@quicinc.com>
Signed-off-by: Abukhoyer Shaik <abukhoye@qti.qualcomm.com>
Signed-off-by: Amit Raj <amitraj@qti.qualcomm.com>
Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek kumar singh <sabhis@qti.qualcomm.com>
Signed-off-by: asmigosw <asmigosw@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: meetkuma <meetkuma@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Co-authored-by: Onkar Chougule <168134249+ochougul@users.noreply.github.com>
Co-authored-by: Mohit Soni <quic_mohisoni@quicinc.com>
Co-authored-by: Mohit Soni <mohisoni@qti.qualcomm.com>
Co-authored-by: vtirumal <vtirumal@qti.qualcomm.com>
Co-authored-by: vjanfaza <vjanfaza@qti.qualcomm.com>
Co-authored-by: Ann Kuruvilla <quic_akuruvil@quicinc.com>
Co-authored-by: smedhe <smedhe@qti.qualcomm.com>
Co-authored-by: asmigosw <asmigosw@qti.qualcomm.com>
Co-authored-by: Abukhoyer Shaik <abukhoye@qti.qualcomm.com>
Co-authored-by: Amit Raj <amitraj@qti.qualcomm.com>
Co-authored-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Co-authored-by: Rishin Raj <rishinr@qti.qualcomm.com>
Co-authored-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Co-authored-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Co-authored-by: Meet Patel <meetkuma@qti.qualcomm.com>
Co-authored-by: Swati Allabadi <quic_sallabad@quicinc.com>
Co-authored-by: Swati Allabadi <sallabad@qti.qualcomm.com>
…l config (quic#747)

This PR contain:
1.documentation for new finetune experimental stack.
2. Updates inconfig_manager.py

---------

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
Signed-off-by: Vahid Janfaza <vjanfaza@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <quic_akuruvil@quicinc.com>
Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Mohit Soni <mohisoni@qti.qualcomm.com>
Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
Signed-off-by: Asmita Goswami <asmigosw@qti.qualcomm.com>
Signed-off-by: Abukhoyer Shaik <abukhoye@qti.qualcomm.com>
Signed-off-by: Amit Raj <amitraj@qti.qualcomm.com>
Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Signed-off-by: Abhishek kumar singh <sabhis@qti.qualcomm.com>
Signed-off-by: asmigosw <asmigosw@qti.qualcomm.com>
Signed-off-by: meetkuma <meetkuma@qti.qualcomm.com>
Signed-off-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Co-authored-by: Onkar Chougule <168134249+ochougul@users.noreply.github.com>
Co-authored-by: vjanfaza <vjanfaza@qti.qualcomm.com>
Co-authored-by: Ann Kuruvilla <quic_akuruvil@quicinc.com>
Co-authored-by: smedhe <smedhe@qti.qualcomm.com>
Co-authored-by: Rishin Raj <rishinr@qti.qualcomm.com>
Co-authored-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Co-authored-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Co-authored-by: Mohit Soni <quic_mohisoni@quicinc.com>
Co-authored-by: Mohit Soni <mohisoni@qti.qualcomm.com>
Co-authored-by: vtirumal <vtirumal@qti.qualcomm.com>
Co-authored-by: asmigosw <asmigosw@qti.qualcomm.com>
Co-authored-by: Abukhoyer Shaik <abukhoye@qti.qualcomm.com>
Co-authored-by: Amit Raj <amitraj@qti.qualcomm.com>
Co-authored-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Co-authored-by: Meet Patel <meetkuma@qti.qualcomm.com>
Co-authored-by: Swati Allabadi <quic_sallabad@quicinc.com>
Co-authored-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
Signed-off-by: Mohit Soni <mohisoni@qti.qualcomm.com>
Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
Signed-off-by: Vahid Janfaza <vjanfaza@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
Signed-off-by: Asmita Goswami <asmigosw@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <quic_akuruvil@quicinc.com>
Signed-off-by: Abukhoyer Shaik <abukhoye@qti.qualcomm.com>
Signed-off-by: Amit Raj <amitraj@qti.qualcomm.com>
Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: abhishek-singh591 <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek kumar singh <sabhis@qti.qualcomm.com>
Signed-off-by: asmigosw <asmigosw@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: meetkuma <meetkuma@qti.qualcomm.com>
Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Co-authored-by: Onkar Chougule <168134249+ochougul@users.noreply.github.com>
Co-authored-by: Mohit Soni <quic_mohisoni@quicinc.com>
Co-authored-by: Mohit Soni <mohisoni@qti.qualcomm.com>
Co-authored-by: vtirumal <vtirumal@qti.qualcomm.com>
Co-authored-by: vjanfaza <vjanfaza@qti.qualcomm.com>
Co-authored-by: smedhe <smedhe@qti.qualcomm.com>
Co-authored-by: asmigosw <asmigosw@qti.qualcomm.com>
Co-authored-by: Abukhoyer Shaik <abukhoye@qti.qualcomm.com>
Co-authored-by: Amit Raj <amitraj@qti.qualcomm.com>
Co-authored-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Co-authored-by: Rishin Raj <rishinr@qti.qualcomm.com>
Co-authored-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Co-authored-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Co-authored-by: Meet Patel <meetkuma@qti.qualcomm.com>
Co-authored-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Co-authored-by: Swati Allabadi <quic_sallabad@quicinc.com>
Co-authored-by: Swati Allabadi <sallabad@qti.qualcomm.com>
…t file (quic#787)

1) Adding text field required by TRL's scripts.
2) Passing config_name in the load_dataset_builder
3) Updated test_dataset accordingly.

Signed-off-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Co-authored-by: Swati Allabadi <sallabad@qti.qualcomm.com>
….py) and related code (quic#791)

1) Added FinetuningPipeline (finetune_experiemental.py) which integrates
all the components added for HF-trainer and enable running fine tuning
through it.
2) Added files to handle PEFT and training config. 
3) Made changes in the config_manager and callbacks files. 
4) Added unit tests for the FinetuningPipeline (test_finetune.py)
5) Updated tests in test_callback and test_config_manager based on above
changes.

Finetuning on openai/gsm8k for 5 epochs on single SOC gave the following
numbers:

{"eval_loss":1.0224987268447876,"eval_runtime":484.8933,"eval_samples_per_second":2.72,"eval_steps_per_second":2.72,"eval_entropy":0.9871161538059735,"eval_num_tokens":6525025.0,"eval_mean_token_accuracy":0.7452040632806826,"epoch":5.0,"num_input_tokens_seen":6525025,"global_step":37365}

{"train_runtime":32856.1501,"train_samples_per_second":1.137,"train_steps_per_second":1.137,"total_flos":3.8132170931712e+16,"train_loss":1.0178058738101043,"epoch":5.0,"num_input_tokens_seen":6525025,"global_step":37365}

Training loss at the start of training :1.5146,

Signed-off-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Co-authored-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Signed-off-by: Ann <quic_akuruvil@quicinc.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
* Added PP support in HF trainer stack. 
* Updated the documentation for the same. 
* Sample command to test PP : QAIC_VISIBLE_DEVICES=0,1 python -m
QEfficient.cloud.finetune_experimental
QEfficient/finetune/experimental/configs/sample_pp_config.yaml

---------

Signed-off-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Co-authored-by: Swati Allabadi <sallabad@qti.qualcomm.com>
Co-authored-by: Swati Allabadi <sallabad@qti.qualcomm>
To run integrated_test for DDP use following command:
QAIC_VISIBLE_DEVICES=0,1 torchrun --nproc-per-node=2 -m pytest -q
QEfficient/finetune/experimental/tests/test_integrated.py

---------

Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Modified test_finetune.py
Changed optimizer names

---------

Signed-off-by: Tanisha Chawada <tchawada@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
Signed-off-by: Ann Kuruvilla <akuruvil@qti.qualcomm.com>
…set registration (quic#835)

Added example script for registering seq_completion dataset_type and
also updated the hf_finetune.md.

---------

Signed-off-by: Sharvari Medhe <smedhe@qti.qualcomm.com>
Signed-off-by: Anusha Bhamidipati <abhamidi@qti.qualcomm.com>
@quic-abhamidi quic-abhamidi force-pushed the ft_experimental branch 2 times, most recently from 753fab8 to 35f2ff1 Compare March 16, 2026 10:12
Signed-off-by: Anusha Bhamidipati <abhamidi@qti.qualcomm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants