Why is there a quote? #87

@LukeLIN-web

--gradient_merge_steps $(expr 67584 \/ $batch_size \/ 8)"

This line ends with an unmatched quote.
I modified the command as follows:

$CMD        --max_predictions_per_seq 80 \
            --learning_rate 5e-5 \
            --weight_decay 0.0 \
            --adam_epsilon 1e-8 \
            --warmup_steps 0 \
            --output_dir ./tmp2/ \
            --logging_steps 10 \
            --save_steps 20000 \
            --input_dir=$DATA_DIR \
            --model_type bert \
            --model_name_or_path bert-base-uncased \
            --batch_size ${batch_size} \
            --use_amp ${use_amp} \
            --gradient_merge_steps $(expr 67584 \/ $batch_size \/ 8)
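For reference, the `--gradient_merge_steps` value is plain integer division. The sketch below works it through for one case; `batch_size=32` is an assumed value chosen only for illustration, and the divisor 8 is presumably the number of GPUs. It also shows that the backslashes before `/` are unnecessary, since `/` is not special to the shell.

```shell
#!/usr/bin/env bash
# ASSUMPTION: batch_size=32 is an illustrative value, not taken from the script.
batch_size=32

# Same computation as in the command above, without the redundant backslashes.
steps_expr=$(expr 67584 / "$batch_size" / 8)

# Equivalent form using shell arithmetic expansion (no external process).
steps_arith=$(( 67584 / batch_size / 8 ))

# 67584 / 32 = 2112, and 2112 / 8 = 264
echo "expr: $steps_expr, arithmetic expansion: $steps_arith"
```

Both forms truncate toward zero, so a batch size that does not divide 67584 evenly gives a rounded-down step count.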

That led to another problem:
Traceback (most recent call last):
  File "./run_pretrain.py", line 439, in <module>
    do_train(args)
  File "./run_pretrain.py", line 316, in do_train
    train_data_loader) * args.num_train_epochs
UnboundLocalError: local variable 'train_data_loader' referenced before assignment

I also used https://github.com/PaddlePaddle/Perf/blob/master/Bert/scripts/paddle_base_pre_training.sh
That shell script worked.

What's more, I wonder how to get the training throughput (sequences/sec) for eight GPUs.
Do I add up the values from all eight worklogs? Is there a quick way to sum them?
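One possible way to add the per-worker numbers quickly is sketched below. It rests on assumptions that are not confirmed anywhere in this issue: that each worker writes its own log file, and that throughput lines contain a field of the form `ips: <number>`. The function name `sum_throughput` and the file names in the usage comment are hypothetical, and the pattern needs adjusting to the real log format.

```shell
#!/usr/bin/env bash
# Sum the average throughput reported in each worker log.
# ASSUMPTION: throughput appears as "ips: <number>" in the log lines;
# this pattern is hypothetical and must match the actual log format.
sum_throughput() {
    awk '
        match($0, /ips: *[0-9.]+/) {
            # Extract the numeric part of the matched "ips: <number>" field.
            value = substr($0, RSTART, RLENGTH)
            sub(/ips: */, "", value)
            sum[FILENAME] += value
            count[FILENAME]++
        }
        END {
            # Average within each file, then add the per-file averages.
            total = 0
            for (f in sum) total += sum[f] / count[f]
            printf "%.2f\n", total
        }
    ' "$@"
}

# Usage (hypothetical file names):
#   sum_throughput log/workerlog.*
```

Averaging within each file before summing keeps a worker that logged more lines from being counted more heavily than the others.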
