Conversation
The original README does have some notes about achievable accuracy, but I don't believe anyone verified those numbers. One thing that could help is using the built-in ConvMixer model in Metalhead, because that has undergone basic convergence testing.
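For illustration, a minimal sketch of that suggestion, assuming Metalhead exports a `ConvMixer` constructor taking a size symbol and an `nclasses` keyword (check the current Metalhead docs for the exact signature):

```julia
using Metalhead

# Assumed API: ConvMixer(config; nclasses), as in recent Metalhead releases.
model = ConvMixer(:small; nclasses = 10)
```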
(force-pushed from 1333b75 to 7f13463)
```julia
function create_loss_function(dataloader, device)

    function loss(model)
```
Functions which create functions seem like an anti-pattern.
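A minimal sketch of the flatter style this suggests, with hypothetical names (not code from this PR): pass the model and data to an ordinary function instead of returning a closure.

```julia
using Flux

# Hypothetical replacement: an ordinary function of (model, dataloader, device),
# rather than a factory that returns a loss closure.
function epoch_loss(model, dataloader, device)
    total = 0f0
    for (x, y) in dataloader
        x, y = device(x), device(y)
        total += Flux.logitcrossentropy(model(x), y)
    end
    return total / length(dataloader)
end
```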
```julia
if use_cuda
    device = gpu
    @info "Training on GPU"
else
```
This `@info` will lie to you, unless you edit the file to change `use_cuda`... simpler to just call `gpu`, which will do nothing if you don't have one.
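A sketch of the simpler pattern, relying on the fact that `gpu` falls back to the identity when no functional GPU is available:

```julia
using Flux, CUDA

# No use_cuda flag, no @info that can lie: gpu is a no-op without a working GPU.
model = model |> gpu
```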
```julia
BatchNorm(dim),
[
    Chain(
        SkipConnection(Chain(Conv((kernel_size, kernel_size), dim => dim, gelu; pad = SamePad(), groups = dim), BatchNorm(dim)), +),
```
No functional change to the model, just indents etc.
```diff
 opt = OptimiserChain(
     WeightDecay(1f-3),
-    ClipNorm(1.0),
-    ADAM(η)
+    ClipNorm(1f0),
+    Adam(η),
 )
```
This exposes a real error: composing ClipNorm after another optimiser means it tries to take `norm(::Broadcasted)`, which fails:
```
julia> train(epochs=10, images=128)
ERROR: Scalar indexing is disallowed.
...
Stacktrace:
...
 [12] norm(itr::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4}, NTuple{4, Base.OneTo{Int64}}, typeof(+), Tuple{CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4}, Nothing, typeof(*), Tuple{Float32, CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}}}}}, p::Float32)
    @ LinearAlgebra ~/julia-9ded051e9f/share/julia/stdlib/v1.10/LinearAlgebra/src/generic.jl:596
 [13] apply!(o::Optimisers.ClipNorm{Float32}, state::Nothing, x::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, dx::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4}, NTuple{4, Base.OneTo{Int64}}, typeof(+), Tuple{CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{4}, Nothing, typeof(*), Tuple{Float32, CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}}}}})
    @ Optimisers ~/.julia/packages/Optimisers/kPdJV/src/rules.jl:584
```
Should be fixed, but not yet tested here.
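Until that fix lands, one possible workaround (an assumption about the failure mode, not the fix referenced above) is to reorder the chain so that `ClipNorm` receives the raw gradient array rather than `WeightDecay`'s lazy broadcasted output; note this clips before applying decay, which changes the semantics slightly:

```julia
using Optimisers

# ClipNorm first, so it sees a plain array instead of a Broadcasted.
opt = OptimiserChain(
    ClipNorm(1f0),
    WeightDecay(1f-3),
    Adam(3f-4),
)
```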
(force-pushed from 7f13463 to 3a2580b)
```diff
-model = model |> cpu
-@save "model.bson" model
+@save "model.bson" cpu(model)
```
Should really be updated to `Flux.state` too.
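A sketch of what that could look like, assuming the `Flux.state` / JLD2 pattern from the Flux saving docs:

```julia
using Flux, JLD2

# Save only the functional state, not the model object itself.
model_state = Flux.state(cpu(model))
jldsave("model_state.jld2"; model_state)
```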
(force-pushed from 3a2580b to ca9b934)
```julia
opt = OptimiserChain(
    WeightDecay(1f-3),  # L2 regularisation
    ClipNorm(1f0),
    Adam(3f-4),  # learning rate
)
```
Current practice would be to use AdamW and set the L2 regularisation there.
Yes, but maybe it's good to show how to chain rules?
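For comparison, a sketch of the AdamW alternative; the positional arguments are assumed to be `(η, β, λ)` as in the Optimisers.jl docstring, with `λ` the decoupled weight decay:

```julia
using Optimisers

# AdamW folds the weight decay into the rule, so WeightDecay drops out.
opt = OptimiserChain(
    ClipNorm(1f0),
    AdamW(3f-4, (9f-1, 9.99f-1), 1f-3),
)
```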
```julia
for (x1, y1) in train_loader
    # move one batch at a time to GPU; gpu(train_loader) would be another way
    x, y = gpu(x1), gpu(y1)
    grads = gradient(m -> Flux.logitcrossentropy(m(x), y; agg=sum), model)
```
I just left this, as it probably doesn't hurt... no idea why it's there; maybe it was like that in whatever someone was following. Showing what the keyword is doesn't seem awful.
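For reference, the keyword in question: `agg` chooses how the per-sample losses are combined, and `mean` is the default.

```julia
Flux.logitcrossentropy(model(x), y)             # agg = mean (the default)
Flux.logitcrossentropy(model(x), y; agg = sum)  # as written in the loop above
```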
```julia
cpu_model = cpu(model)
BSON.@save "model.bson" cpu_model
BSON.@save "losses.bson" train_save test_save
# it's generally more robust to save just the state, like this, but
```
but what? I would remove the BSON model saving entirely, in favour of state saving with JLD2.
I reverted this because maybe it's not a terrible idea to show some variety of approaches, rather than repeating exactly the same thing in every model?
The way this model is set up, you'd have to copy-paste the code with the sizes etc. to re-build the model before you can load the state back in. So perhaps leaving this is OK?
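A sketch of that round trip, with `build_model` standing in (hypothetically) for the copy-pasted construction code:

```julia
using Flux, JLD2

model = build_model()  # hypothetical: must rebuild the same architecture/sizes
Flux.loadmodel!(model, JLD2.load("model_state.jld2", "model_state"))
```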
```julia
train_save[epoch, :] = [train_loss, train_acc]
test_save[epoch, :] = [test_loss, test_acc]
```
Can we remove this history saving entirely? It seems useless.
It's not a crazy way to collect some numbers; maybe this pattern will make sense to someone where the others don't?
This previously didn't have a manifest.
Needs a correction to the translation rule for Optimisers.ClipNorm in Flux, fixed in FluxML/Flux.jl#2145. Also, a simple test showed increasing loss, so perhaps something else is broken?