I have tried to re-implement the architecture described in the paper exactly, just in TensorFlow, but I don't get the same number of trainable parameters. I can't find a breakdown of how this number is calculated, so I was hoping someone could help me out.
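For reference, this is roughly how I'm counting the parameters — a minimal sketch assuming the network is built as a `tf.keras.Model` (the `model` argument is a stand-in for whatever my build code returns):

```python
import tensorflow as tf

def count_trainable_params(model: tf.keras.Model) -> int:
    # Sum the element counts of every trainable variable
    # (conv kernels, biases, batch-norm gammas/betas, ...).
    return sum(v.shape.num_elements() for v in model.trainable_variables)
```

`model.summary()` reports the same split between trainable and non-trainable parameters. One thing I'm unsure about: Keras counts batch-norm moving means/variances as non-trainable, so if the paper's figures include them, the totals wouldn't line up.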
Paper:
56 layers: 1.5 million
103 layers: 9.4 million
My implementation:
56 layers: 1.4 million
103 layers: 9.2 million
The discrepancy is small, so normally I wouldn't care, but I also can't quite reproduce the performance reported in the paper, so perhaps tracking down the parameter mismatch could reveal a bug in my code.
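To localize the difference, I've been looking at a per-layer dump along these lines — again just a sketch assuming a `tf.keras.Model`, with layer names being whatever my build code assigns:

```python
import tensorflow as tf

def print_param_breakdown(model: tf.keras.Model) -> None:
    # Print each layer's trainable-parameter count so the missing
    # ~0.1-0.2 million parameters can be traced to a specific block.
    for layer in model.layers:
        n = sum(w.shape.num_elements() for w in layer.trainable_weights)
        if n > 0:
            print(f"{layer.name:<40} {n:>12,}")
```

Comparing this output block by block against the architecture table in the paper is how I'd hoped to spot where my count falls short, but so far nothing has jumped out.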