Vendor feedback: the README loss baseline for qwen2.5vl was obtained with GBS64 TP1PP1DP8 and 8 gradient accumulation steps.
With GBS64 TP1PP4DP2 and 32 gradient accumulation steps, the loss deviates from the README baseline by more than 5%.
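As a quick sanity check on the batch arithmetic, the sketch below assumes the common Megatron-style relation GBS = micro batch size × DP × gradient accumulation steps, with DP = world size / (TP × PP). Both configurations multiply out to 8 GPUs (1×1×8 and 1×4×2), so a world size of 8 is used. The helper names (`data_parallel_size`, `micro_batch_size`, `within_tolerance`) are hypothetical and not part of any qwen2.5vl or Megatron API.

```python
def data_parallel_size(world_size: int, tp: int, pp: int) -> int:
    # Ranks not consumed by tensor or pipeline parallelism form data-parallel replicas.
    assert world_size % (tp * pp) == 0, "world size must be divisible by TP*PP"
    return world_size // (tp * pp)


def micro_batch_size(gbs: int, dp: int, grad_accum: int) -> int:
    # Solve GBS = MBS * DP * grad_accum for the per-rank micro batch size.
    assert gbs % (dp * grad_accum) == 0, "GBS must be divisible by DP*grad_accum"
    return gbs // (dp * grad_accum)


def within_tolerance(loss: float, baseline: float, tol: float = 0.05) -> bool:
    # Relative deviation of a logged loss value from the README baseline.
    return abs(loss - baseline) / abs(baseline) <= tol


# The two configurations from the report, assuming 8 GPUs in total.
configs = {
    "README": {"tp": 1, "pp": 1, "ga": 8},
    "vendor": {"tp": 1, "pp": 4, "ga": 32},
}
for name, cfg in configs.items():
    dp = data_parallel_size(8, cfg["tp"], cfg["pp"])
    mbs = micro_batch_size(64, dp, cfg["ga"])
    print(name, "DP", dp, "MBS", mbs)
```

Under these assumptions both setups resolve to a micro batch size of 1 and the same global batch of 64, so the batch arithmetic by itself would not account for the gap; `within_tolerance` can then be applied to the actual logged loss values of each run.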