Conversation
The original README does have some notes about achievable accuracy, but I don't believe anyone verified those numbers. One thing that could help is using the built-in ConvMixer model in Metalhead, because that has undergone basic convergence testing.
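For illustration, a minimal sketch of that suggestion, assuming Metalhead exports a `ConvMixer` constructor taking a size symbol and an `nclasses` keyword (check the current Metalhead docs for the exact signature):

```julia
using Metalhead

# Assumed API: ConvMixer(config; nclasses), as in recent Metalhead releases.
model = ConvMixer(:small; nclasses = 10)
```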
(force-pushed from 1333b75 to 7f13463)
```julia
function create_loss_function(dataloader, device)

    function loss(model)
```
Functions which create functions seem like an anti-pattern.
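A minimal sketch of the flatter style this suggests, with hypothetical names (not code from this PR): pass the model and data to an ordinary function instead of returning a closure.

```julia
using Flux

# Hypothetical replacement: an ordinary function of (model, dataloader, device),
# rather than a factory that returns a loss closure.
function epoch_loss(model, dataloader, device)
    total = 0f0
    for (x, y) in dataloader
        x, y = device(x), device(y)
        total += Flux.logitcrossentropy(model(x), y)
    end
    return total / length(dataloader)
end
```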
```julia
if use_cuda
    device = gpu
    @info "Training on GPU"
else
```
This `@info` will lie to you, unless you edit the file to change `use_cuda`... simpler to just call `gpu`, which will do nothing if you don't have one.
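A sketch of the simpler pattern, relying on the fact that `gpu` falls back to the identity when no functional GPU is available:

```julia
using Flux, CUDA

# No use_cuda flag, no @info that can lie: gpu is a no-op without a working GPU.
model = model |> gpu
```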
```julia
BatchNorm(dim),
[
    Chain(
        SkipConnection(Chain(Conv((kernel_size, kernel_size), dim => dim, gelu; pad = SamePad(), groups = dim), BatchNorm(dim)), +),
```
No functional change to the model, just indents etc.
```diff
 opt = OptimiserChain(
     WeightDecay(1f-3),
-    ClipNorm(1.0),
-    ADAM(η)
+    ClipNorm(1f0),
+    Adam(η),
 )
```
This exposes a real error: composing ClipNorm after another optimiser means it tries to take `norm(::Broadcasted)`, which fails:
```
julia> train(epochs=10, images=128)
ERROR: Scalar indexing is disallowed.
...
Stacktrace:
...
 [12] norm(itr::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4}, NTuple{4, Base.OneTo{Int64}}, typeof(+), Tuple{CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4}, Nothing, typeof(*), Tuple{Float32, CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}}}}}, p::Float32)
    @ LinearAlgebra ~/julia-9ded051e9f/share/julia/stdlib/v1.10/LinearAlgebra/src/generic.jl:596
 [13] apply!(o::Optimisers.ClipNorm{Float32}, state::Nothing, x::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, dx::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4}, NTuple{4, Base.OneTo{Int64}}, typeof(+), Tuple{CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4}, Nothing, typeof(*), Tuple{Float32, CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}}}}})
    @ Optimisers ~/.julia/packages/Optimisers/kPdJV/src/rules.jl:584
```
Should be fixed, but not yet tested here.
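Until that fix lands, one possible workaround (an assumption about the failure mode, not the fix referenced above) is to reorder the chain so that `ClipNorm` receives the raw gradient array rather than `WeightDecay`'s lazy broadcasted output; note this clips before applying decay, which changes the semantics slightly:

```julia
using Optimisers

# ClipNorm first, so it sees a plain array instead of a Broadcasted.
opt = OptimiserChain(
    ClipNorm(1f0),
    WeightDecay(1f-3),
    Adam(3f-4),
)
```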
(force-pushed from 7f13463 to 3a2580b)
```diff
-model = model |> cpu
-@save "model.bson" model
+@save "model.bson" cpu(model)
```
Should really be updated to `Flux.state` too.
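A sketch of what that could look like, assuming the `Flux.state` / JLD2 pattern from the Flux saving docs:

```julia
using Flux, JLD2

# Save only the functional state, not the model object itself.
model_state = Flux.state(cpu(model))
jldsave("model_state.jld2"; model_state)
```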
(force-pushed from 3a2580b to ca9b934)
```julia
opt = OptimiserChain(
    WeightDecay(1f-3),  # L2 regularisation
    ClipNorm(1f0),
    Adam(3f-4),  # learning rate
)
```
Current practice would be to use AdamW and set the L2 regularisation there.
Yes, but maybe it's good to show how to chain rules?
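For comparison, a sketch of the AdamW alternative; the positional arguments are assumed to be `(η, β, λ)` as in the Optimisers.jl docstring, with `λ` the decoupled weight decay:

```julia
using Optimisers

# AdamW folds the weight decay into the rule, so WeightDecay drops out.
opt = OptimiserChain(
    ClipNorm(1f0),
    AdamW(3f-4, (9f-1, 9.99f-1), 1f-3),
)
```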
```julia
for (x1, y1) in train_loader
    # move one batch at a time to GPU; gpu(train_loader) would be another way
    x, y = gpu(x1), gpu(y1)
    grads = gradient(m -> Flux.logitcrossentropy(m(x), y; agg=sum), model)
```
I just left this, as it probably doesn't hurt... no idea why it's there; maybe it was like that in whatever someone was following. Showing what the keyword is doesn't seem awful.
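For reference, the keyword in question: `agg` chooses how the per-sample losses are combined, and `mean` is the default.

```julia
Flux.logitcrossentropy(model(x), y)             # agg = mean (the default)
Flux.logitcrossentropy(model(x), y; agg = sum)  # as written in the loop above
```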
```julia
cpu_model = cpu(model)
BSON.@save "model.bson" cpu_model
BSON.@save "losses.bson" train_save test_save
# it's generally more robust to save just the state, like this, but
```
but what? I would remove the BSON model saving entirely, in favour of state saving with JLD2.
I reverted this because maybe it's not a terrible idea to show some variety of approaches, rather than repeating exactly the same thing in every model?
The way this model is set up, you'd have to copy-paste the code with the sizes etc. to re-build the model before you can load the state back in. So perhaps leaving this is OK?
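A sketch of that round trip, with `build_model` standing in (hypothetically) for the copy-pasted construction code:

```julia
using Flux, JLD2

model = build_model()  # hypothetical: must rebuild the same architecture/sizes
Flux.loadmodel!(model, JLD2.load("model_state.jld2", "model_state"))
```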
```julia
train_save[epoch, :] = [train_loss, train_acc]
test_save[epoch, :] = [test_loss, test_acc]
```
Can we remove this history saving entirely? It seems useless.
It's not a crazy way to collect some numbers; maybe this pattern will make sense to someone where the others don't?
This previously didn't have a manifest.
Needs a correction to the translation rule for Optimisers.ClipNorm in Flux, fixed in FluxML/Flux.jl#2145. Also, a simple test showed increasing loss, so perhaps something else is broken?