
Locally connected layer #1271

Closed
jackculpepper wants to merge 9 commits into BVLC:dev from jackculpepper:local

Conversation

@jackculpepper
Contributor

This PR implements a locally connected layer, as described in [1], for example.

It's similar to convolution, but there is no weight sharing.

Ye Lu and I worked on this together. Working code is mostly his. Bugs are mine.

We wrote this a while back, before the convolution layer supported non-square filters and strides. I can add support for that if there is interest.

Looking forward to reading your comments.

[1] Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf. DeepFace: Closing the Gap to Human-Level Performance in Face Verification. CVPR 2014.
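A minimal NumPy sketch of the idea (an illustration, not the PR's C++ implementation): where a convolution slides one shared filter over all positions, a locally connected layer keeps a separate filter for every output position.

```python
import numpy as np

def local_forward(x, weights, bias, k, stride=1):
    """Locally connected forward pass over one single-channel plane.

    x:       (H, W) input
    weights: (out_h, out_w, k, k) -- one k x k filter PER output position
             (an ordinary convolution would reuse a single shared (k, k) filter)
    bias:    (out_h, out_w) -- likewise one bias per position
    """
    H, W = x.shape
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    y = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + k, j * stride:j * stride + k]
            # same sliding-window structure as convolution, but the filter
            # depends on (i, j), so no parameters are shared across positions
            y[i, j] = np.sum(patch * weights[i, j]) + bias[i, j]
    return y
```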

@okn2020

okn2020 commented Oct 17, 2014

@jackculpepper I have tried this layer in a siamese network. It seems to work absolutely fine with one such layer in the network (one on each side of the siamese pair), but if I add another locally connected layer the network stops learning. Am I missing something? I have tried different combinations (inserting a relu or another ordinary convolution layer between the locally connected layers, varying the learning rate), but the result is the same: accuracy drops and the network does not learn, and I see very small values at the end of the network.
btw, regarding the paper you quoted, shouldn't the locally connected L5 layer there have stride 2?

@jackculpepper
Contributor Author

@okn2020 Thanks for trying! Can you share your prototxt files?

I just committed an example prototxt for mnist with two local layers chained together. It gets 98.53%.

I1017 18:16:55.952603  5675 net.cpp:103] Top shape: 100 20 24 24 (1152000)
I1017 18:16:55.952630  5675 layer_factory.hpp:78] Creating layer pool1
I1017 18:16:55.952656  5675 net.cpp:67] Creating Layer pool1
I1017 18:16:55.952667  5675 net.cpp:394] pool1 <- conv1
I1017 18:16:55.952682  5675 net.cpp:356] pool1 -> pool1
I1017 18:16:55.952699  5675 net.cpp:96] Setting up pool1
I1017 18:16:55.952713  5675 net.cpp:103] Top shape: 100 20 12 12 (288000)
I1017 18:16:55.952726  5675 layer_factory.hpp:78] Creating layer local1
I1017 18:16:55.952747  5675 net.cpp:67] Creating Layer local1
I1017 18:16:55.952759  5675 net.cpp:394] local1 <- pool1
I1017 18:16:55.952775  5675 net.cpp:356] local1 -> local1
I1017 18:16:55.952791  5675 net.cpp:96] Setting up local1
I1017 18:16:55.954891  5675 net.cpp:103] Top shape: 100 5 8 8 (32000)
I1017 18:16:55.954913  5675 layer_factory.hpp:78] Creating layer relu1
I1017 18:16:55.954923  5675 net.cpp:67] Creating Layer relu1
I1017 18:16:55.954931  5675 net.cpp:394] relu1 <- local1
I1017 18:16:55.954941  5675 net.cpp:345] relu1 -> local1 (in-place)
I1017 18:16:55.954952  5675 net.cpp:96] Setting up relu1
I1017 18:16:55.954957  5675 net.cpp:103] Top shape: 100 5 8 8 (32000)
I1017 18:16:55.954964  5675 layer_factory.hpp:78] Creating layer local2
I1017 18:16:55.954972  5675 net.cpp:67] Creating Layer local2
I1017 18:16:55.954978  5675 net.cpp:394] local2 <- local1
I1017 18:16:55.954990  5675 net.cpp:356] local2 -> local2
I1017 18:16:55.955000  5675 net.cpp:96] Setting up local2
I1017 18:16:55.955235  5675 net.cpp:103] Top shape: 100 10 4 4 (16000)
I1017 18:16:55.955250  5675 layer_factory.hpp:78] Creating layer relu2
I1017 18:16:55.955260  5675 net.cpp:67] Creating Layer relu2
I1017 18:16:55.955267  5675 net.cpp:394] relu2 <- local2
I1017 18:16:55.955276  5675 net.cpp:345] relu2 -> local2 (in-place)
I1017 18:16:55.955283  5675 net.cpp:96] Setting up relu2
I1017 18:16:55.955289  5675 net.cpp:103] Top shape: 100 10 4 4 (16000)
I1017 18:16:55.955296  5675 layer_factory.hpp:78] Creating layer ip1
I1017 18:16:55.955306  5675 net.cpp:67] Creating Layer ip1
I1017 18:16:55.955312  5675 net.cpp:394] ip1 <- local2
I1017 18:16:55.955322  5675 net.cpp:356] ip1 -> ip1
I1017 18:16:55.955333  5675 net.cpp:96] Setting up ip1
I1017 18:16:55.956409  5675 net.cpp:103] Top shape: 100 500 1 1 (50000)
...
I1017 18:30:06.766150  5675 solver.cpp:419] Iteration 9800, lr = 0.00599102
I1017 18:30:13.340733  5675 solver.cpp:207] Iteration 9900, loss = 0.0082955
I1017 18:30:13.340785  5675 solver.cpp:222]     Train net output #0: loss = 0.0082955 (* 1 = 0.0082955 loss)
I1017 18:30:13.340795  5675 solver.cpp:419] Iteration 9900, lr = 0.00596843
I1017 18:30:19.859530  5675 solver.cpp:333] Snapshotting to examples/mnist/lenet_iter_10000.caffemodel
I1017 18:30:19.864722  5675 solver.cpp:340] Snapshotting solver state to examples/mnist/lenet_iter_10000.solverstate
I1017 18:30:19.914976  5675 solver.cpp:244] Iteration 10000, loss = 0.0103549
I1017 18:30:19.915000  5675 solver.cpp:263] Iteration 10000, Testing net (#0)
I1017 18:30:27.247264  5675 solver.cpp:314]     Test net output #0: accuracy = 0.9853
I1017 18:30:27.247316  5675 solver.cpp:314]     Test net output #1: loss = 0.046849 (* 1 = 0.046849 loss)
I1017 18:30:27.247326  5675 solver.cpp:249] Optimization Done.
I1017 18:30:27.247333  5675 caffe.cpp:121] Optimization Done.

@okn2020

okn2020 commented Oct 17, 2014

The network is almost the same as in the paper you quoted; I am not sure about the fillers and the first MVN layer, so maybe you could suggest the right numbers or correct layers?
In any case, the net below seems to converge (at least to 89% on my data), but if I replace l5 or l6 or both with type LOCAL_WEIGHTED_CONVOLUTION (with local_weighted_convolution_param instead of convolution_param), keeping everything else untouched, the net reports a loss of 0.25 all the time; accuracy rises to 60%, then drops to 50% and stays there..

name: "pt_train"
layers {
  name: "pair_data"
  type: DATA
  top: "pair_data"
  top: "sim"
  data_param {
    source: "dual_train"
    scale: 0.00390625
    batch_size: 128
  }
  include: { phase: TRAIN }
}
layers {
  name: "pair_data"
  type: DATA
  top: "pair_data"
  top: "sim"
  data_param {
    source: "dual_test"
    scale: 0.00390625
    batch_size: 100
  }
  include: { phase: TEST }
}
layers {
  name: "slice_pair"
  type: SLICE
  bottom: "pair_data"
  top: "data"
  top: "data_p"
  slice_param {
    slice_dim: 1
    slice_point: 1
  }
}

layers {
  name: "norm_1"
  bottom: "data"
  top: "data_norm"
  type: MVN
}
layers {
  name: "conv_1"
  bottom: "data_norm"
  top: "conv1"
  type: CONVOLUTION
  convolution_param {
    num_output: 32
    kernel_size: 11
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  blobs_lr: 1
  blobs_lr: 2
  param: "conv_1_w"
  param: "conv_1_b"
}
layers {
  name: "relu_1"
  bottom: "conv1"
  top: "conv1"
  type: RELU
}
layers {
  name: "maxp_2"
  bottom: "conv1"
  top: "pool2"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "norm2"
  type: LRN
  bottom: "pool2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layers {
  name: "c_3"
  bottom: "norm2"
  top: "conv3"
  type: CONVOLUTION
  convolution_param {
    num_output: 16
    kernel_size: 9
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  blobs_lr: 1
  blobs_lr: 2
  param: "c_3_w"
  param: "c_3_b"
}
layers {
  name: "relu_3"
  bottom: "conv3"
  top: "conv3"
  type: RELU
}
layers {
  name: "norm3"
  type: LRN
  bottom: "conv3"
  top: "norm3"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layers {
  name: "l_4"
  bottom: "norm3"
  top: "lconv4"
  type: CONVOLUTION
  convolution_param {
    num_output: 16
    kernel_size: 9
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  blobs_lr: 1
  blobs_lr: 2
  param: "l_4_w"
  param: "l_4_b"
}
layers {
  name: "relu_4"
  bottom: "lconv4"
  top: "lconv4"
  type: RELU
}
layers {
  name: "l_5"
  bottom: "lconv4"
  top: "lconv5"
  type: CONVOLUTION
  convolution_param {
    num_output: 16
    kernel_size: 7
    stride: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  blobs_lr: 1
  blobs_lr: 2
  param: "l_5_w"
  param: "l_5_b"
}
layers {
  name: "relu_5"
  bottom: "lconv5"
  top: "lconv5"
  type: RELU
}
layers {
  name: "l_6"
  bottom: "lconv5"
  top: "lconv6"
  type: LOCAL_WEIGHTED_CONVOLUTION
  local_weighted_convolution_param {
    num_output: 16
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  blobs_lr: 1
  blobs_lr: 2
  param: "l_6_w"
  param: "l_6_b"
}
layers {
  name: "relu_6"
  bottom: "lconv6"
  top: "lconv6"
  type: RELU
}
layers {
  name: "f_7"
  bottom: "lconv6"
  top: "features7"
  type: INNER_PRODUCT
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
    }
  }
  blobs_lr: 1
  blobs_lr: 2
  param: "f_7_w"
  param: "f_7_b"
}
layers {
  bottom: "features7"
  top: "features7"
  name: "relu5"
  type: RELU
}
# layers {
#   name: "dropout7"
#   type: DROPOUT
#   bottom: "features7"
#   top: "features7"
#   dropout_param {
#     dropout_ratio: 0.5
#   }
# }

layers {
  name: "norm_1_p"
  bottom: "data_p"
  top: "data_norm_p"
  type: MVN
}
layers {
  name: "conv_1_p"
  bottom: "data_norm_p"
  top: "conv1_p"
  type: CONVOLUTION
  convolution_param {
    num_output: 32
    kernel_size: 11
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  blobs_lr: 1
  blobs_lr: 2
  param: "conv_1_w"
  param: "conv_1_b"
}
layers {
  name: "relu_1_p"
  bottom: "conv1_p"
  top: "conv1_p"
  type: RELU
}
layers {
  name: "maxp_2_p"
  bottom: "conv1_p"
  top: "pool2_p"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "norm2_p"
  type: LRN
  bottom: "pool2_p"
  top: "norm2_p"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layers {
  name: "c_3_p"
  bottom: "norm2_p"
  top: "conv3_p"
  type: CONVOLUTION
  convolution_param {
    num_output: 16
    kernel_size: 9
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  blobs_lr: 1
  blobs_lr: 2
  param: "c_3_w"
  param: "c_3_b"
}
layers {
  name: "relu_3_p"
  bottom: "conv3_p"
  top: "conv3_p"
  type: RELU
}
layers {
  name: "norm3_p"
  type: LRN
  bottom: "conv3_p"
  top: "norm3_p"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layers {
  name: "l_4_p"
  bottom: "norm3_p"
  top: "lconv4_p"
  type: CONVOLUTION
  convolution_param {
    num_output: 16
    kernel_size: 9
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  blobs_lr: 1
  blobs_lr: 2
  param: "l_4_w"
  param: "l_4_b"
}
layers {
  name: "relu_4_p"
  bottom: "lconv4_p"
  top: "lconv4_p"
  type: RELU
}
layers {
  name: "l_5_p"
  bottom: "lconv4_p"
  top: "lconv5_p"
  type: CONVOLUTION
  convolution_param {
    num_output: 16
    kernel_size: 7
    stride: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  blobs_lr: 1
  blobs_lr: 2
  param: "l_5_w"
  param: "l_5_b"
}
layers {
  name: "relu_5_p"
  bottom: "lconv5_p"
  top: "lconv5_p"
  type: RELU
}
layers {
  name: "l_6_p"
  bottom: "lconv5_p"
  top: "lconv6_p"
  type: LOCAL_WEIGHTED_CONVOLUTION
  local_weighted_convolution_param {
    num_output: 16
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
  blobs_lr: 1
  blobs_lr: 2
  param: "l_6_w"
  param: "l_6_b"
}
layers {
  name: "relu_6_p"
  bottom: "lconv6_p"
  top: "lconv6_p"
  type: RELU
}
layers {
  name: "f_7_p"
  bottom: "lconv6_p"
  top: "features7_p"
  type: INNER_PRODUCT
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
    }
  }
  blobs_lr: 1
  blobs_lr: 2
  param: "f_7_w"
  param: "f_7_b"
}
layers {
  bottom: "features7_p"
  top: "features7_p"
  name: "relu5_p"
  type: RELU
}
# layers {
#   name: "dropout7_p"
#   type: DROPOUT
#   bottom: "features7_p"
#   top: "features7_p"
#   dropout_param {
#     dropout_ratio: 0.5
#   }
# }

layers {
  name: "loss"
  type: CONTRASTIVE_LOSS
  contrastive_loss_param {
    margin: 1.0
  }
  bottom: "features7"
  bottom: "features7_p"
  bottom: "sim"
  top: "loss"
}

@okn2020

okn2020 commented Oct 17, 2014

dropout was commented out

@okn2020

okn2020 commented Oct 17, 2014

@jackculpepper see prototxt above

@okn2020

okn2020 commented Oct 17, 2014

@jackculpepper "but if I replace l5 or l6 or both" - I mean l4 or l5 or both

@okn2020

okn2020 commented Oct 17, 2014

@jackculpepper solver:

test_initialization: false
debug_info: true
net: "train.prototxt"
test_iter: 2129
test_interval: 3117
momentum: 0.95
weight_decay: 0.0005
base_lr: 0.0005
lr_policy: "fixed"
display: 312
max_iter: 10000000
snapshot: 15586
snapshot_prefix: "siamese"
solver_mode: GPU
solver_type: NESTEROV

I also added another contrastive loss after the first local layer, because it seems to plateau otherwise.

@jackculpepper
Contributor Author

Thanks.

Have you tried MNIST?

I've had the problem you describe before. I am not 100% sure, but I don't think it's a bug in the local layer. I think it's a plateau.

I just committed a version with two local layers and two contrastive losses. One contrastive loss is at the top. The other is after the first local layer, to try and pull those weights into a good regime. This kind of thing is described in [1].

It descends, but it does take a while. See the snippets from the log below.

[1] Going deeper with convolutions. Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. http://arxiv.org/abs/1409.4842
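A minimal sketch of how the two losses combine (illustrative Python, not the PR's code): Caffe sums every loss top, each scaled by its loss_weight, so the auxiliary contrastive loss simply adds its gradient signal to the top one.

```python
def total_loss(loss_top, loss_aux, aux_weight=1.0):
    """Deep-supervision objective: top loss plus a weighted auxiliary loss.

    With both weights at 1 this matches the training log, e.g. at iteration 0
    loss1 + loss2 = 0.405212 + 0.40625 = 0.811462.
    """
    return loss_top + aux_weight * loss_aux
```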

...
I1017 19:05:48.546738 25233 solver.cpp:41] Solver scaffolding done.
I1017 19:05:48.546759 25233 solver.cpp:160] Solving mnist_siamese_train_test
I1017 19:05:48.546847 25233 solver.cpp:263] Iteration 0, Testing net (#0)
I1017 19:06:03.534447 25233 solver.cpp:314]     Test net output #0: loss1 = 0.453118 (* 1 = 0.453118 loss)
I1017 19:06:03.534485 25233 solver.cpp:314]     Test net output #1: loss2 = 0.45435 (* 1 = 0.45435 loss)
I1017 19:06:03.663257 25233 solver.cpp:207] Iteration 0, loss = 0.811462
I1017 19:06:03.663280 25233 solver.cpp:222]     Train net output #0: loss1 = 0.405212 (* 1 = 0.405212 loss)
I1017 19:06:03.663291 25233 solver.cpp:222]     Train net output #1: loss2 = 0.40625 (* 1 = 0.40625 loss)
I1017 19:06:03.663326 25233 solver.cpp:419] Iteration 0, lr = 0.01
I1017 19:06:17.022279 25233 solver.cpp:207] Iteration 100, loss = 0.552313
I1017 19:06:17.022305 25233 solver.cpp:222]     Train net output #0: loss1 = 0.107 (* 1 = 0.107 loss)
I1017 19:06:17.022316 25233 solver.cpp:222]     Train net output #1: loss2 = 0.445312 (* 1 = 0.445312 loss)
I1017 19:06:17.022325 25233 solver.cpp:419] Iteration 100, lr = 0.00992565
I1017 19:06:30.377449 25233 solver.cpp:207] Iteration 200, loss = 0.564269
I1017 19:06:30.377717 25233 solver.cpp:222]     Train net output #0: loss1 = 0.111144 (* 1 = 0.111144 loss)
I1017 19:06:30.377732 25233 solver.cpp:222]     Train net output #1: loss2 = 0.453125 (* 1 = 0.453125 loss)
I1017 19:06:30.377740 25233 solver.cpp:419] Iteration 200, lr = 0.00985258
...
I1017 19:44:53.017202 25233 solver.cpp:419] Iteration 14300, lr = 0.00513801
I1017 19:45:06.370012 25233 solver.cpp:207] Iteration 14400, loss = 0.463566
I1017 19:45:06.370069 25233 solver.cpp:222]     Train net output #0: loss1 = 0.0053643 (* 1 = 0.0053643 loss)
I1017 19:45:06.370080 25233 solver.cpp:222]     Train net output #1: loss2 = 0.458201 (* 1 = 0.458201 loss)
I1017 19:45:06.370090 25233 solver.cpp:419] Iteration 14400, lr = 0.00512221
I1017 19:45:19.590240 25233 solver.cpp:263] Iteration 14500, Testing net (#0)
I1017 19:45:34.537138 25233 solver.cpp:314]     Test net output #0: loss1 = 0.0253807 (* 1 = 0.0253807 loss)
I1017 19:45:34.537160 25233 solver.cpp:314]     Test net output #1: loss2 = 0.427178 (* 1 = 0.427178 loss)
I1017 19:45:34.663530 25233 solver.cpp:207] Iteration 14500, loss = 0.44025
I1017 19:45:34.663552 25233 solver.cpp:222]     Train net output #0: loss1 = 0.0181287 (* 1 = 0.0181287 loss)
I1017 19:45:34.663563 25233 solver.cpp:222]     Train net output #1: loss2 = 0.422121 (* 1 = 0.422121 loss)
I1017 19:45:34.663573 25233 solver.cpp:419] Iteration 14500, lr = 0.00510653
I1017 19:45:48.026561 25233 solver.cpp:207] Iteration 14600, loss = 0.137522
I1017 19:45:48.026602 25233 solver.cpp:222]     Train net output #0: loss1 = 0.00983484 (* 1 = 0.00983484 loss)
I1017 19:45:48.026612 25233 solver.cpp:222]     Train net output #1: loss2 = 0.127687 (* 1 = 0.127687 loss)
I1017 19:45:48.026621 25233 solver.cpp:419] Iteration 14600, lr = 0.00509095
I1017 19:46:01.385957 25233 solver.cpp:207] Iteration 14700, loss = 0.106923
I1017 19:46:01.386195 25233 solver.cpp:222]     Train net output #0: loss1 = 0.00539626 (* 1 = 0.00539626 loss)
I1017 19:46:01.386212 25233 solver.cpp:222]     Train net output #1: loss2 = 0.101527 (* 1 = 0.101527 loss)
I1017 19:46:01.386224 25233 solver.cpp:419] Iteration 14700, lr = 0.00507548
I1017 19:46:14.776785 25233 solver.cpp:207] Iteration 14800, loss = 0.117109
I1017 19:46:14.776810 25233 solver.cpp:222]     Train net output #0: loss1 = 0.0148551 (* 1 = 0.0148551 loss)
I1017 19:46:14.776821 25233 solver.cpp:222]     Train net output #1: loss2 = 0.102254 (* 1 = 0.102254 loss)
I1017 19:46:14.776830 25233 solver.cpp:419] Iteration 14800, lr = 0.00506012
I1017 19:46:28.166108 25233 solver.cpp:207] Iteration 14900, loss = 0.096353
I1017 19:46:28.166164 25233 solver.cpp:222]     Train net output #0: loss1 = 0.00688672 (* 1 = 0.00688672 loss)
I1017 19:46:28.166175 25233 solver.cpp:222]     Train net output #1: loss2 = 0.0894662 (* 1 = 0.0894662 loss)
I1017 19:46:28.166184 25233 solver.cpp:419] Iteration 14900, lr = 0.00504488
I1017 19:46:41.403934 25233 solver.cpp:333] Snapshotting to examples/siamese/mnist_siamese_iter_15000.caffemodel
I1017 19:46:41.413354 25233 solver.cpp:340] Snapshotting solver state to examples/siamese/mnist_siamese_iter_15000.solverstate
I1017 19:46:41.419075 25233 solver.cpp:263] Iteration 15000, Testing net (#0)
I1017 19:46:56.359097 25233 solver.cpp:314]     Test net output #0: loss1 = 0.026753 (* 1 = 0.026753 loss)
I1017 19:46:56.359120 25233 solver.cpp:314]     Test net output #1: loss2 = 0.0932524 (* 1 = 0.0932524 loss)
I1017 19:46:56.485534 25233 solver.cpp:207] Iteration 15000, loss = 0.131988
I1017 19:46:56.485556 25233 solver.cpp:222]     Train net output #0: loss1 = 0.0195205 (* 1 = 0.0195205 loss)
I1017 19:46:56.485569 25233 solver.cpp:222]     Train net output #1: loss2 = 0.112468 (* 1 = 0.112468 loss)
I1017 19:46:56.485580 25233 solver.cpp:419] Iteration 15000, lr = 0.00502973
...
I1017 20:25:18.461911 25233 solver.cpp:419] Iteration 29100, lr = 0.0035964
I1017 20:25:31.816725 25233 solver.cpp:207] Iteration 29200, loss = 0.0284481
I1017 20:25:31.816905 25233 solver.cpp:222]     Train net output #0: loss1 = 0.0118015 (* 1 = 0.0118015 loss)
I1017 20:25:31.816918 25233 solver.cpp:222]     Train net output #1: loss2 = 0.0166465 (* 1 = 0.0166465 loss)
I1017 20:25:31.816927 25233 solver.cpp:419] Iteration 29200, lr = 0.00358951
I1017 20:25:45.171653 25233 solver.cpp:207] Iteration 29300, loss = 0.0156576
I1017 20:25:45.171679 25233 solver.cpp:222]     Train net output #0: loss1 = 0.00427775 (* 1 = 0.00427775 loss)
I1017 20:25:45.171691 25233 solver.cpp:222]     Train net output #1: loss2 = 0.0113797 (* 1 = 0.0113797 loss)
I1017 20:25:45.171700 25233 solver.cpp:419] Iteration 29300, lr = 0.00358266
I1017 20:25:58.525778 25233 solver.cpp:207] Iteration 29400, loss = 0.0237713
I1017 20:25:58.525811 25233 solver.cpp:222]     Train net output #0: loss1 = 0.00566836 (* 1 = 0.00566836 loss)
I1017 20:25:58.525822 25233 solver.cpp:222]     Train net output #1: loss2 = 0.0181028 (* 1 = 0.0181028 loss)
I1017 20:25:58.525831 25233 solver.cpp:419] Iteration 29400, lr = 0.00357584

@okn2020

okn2020 commented Oct 17, 2014

@jackculpepper No, I have not tried MNIST; currently I am trying to get as much as I can out of my data. I will try your approach with two losses, thank you!
btw, also in the fb paper you quoted they first trained on a k-way softmax and only after that fine-tuned on the contrastive loss/siamese net, fixing most of the layers. In some papers I read that it can be beneficial to combine a k-way (n classes) softmax loss together with a contrastive loss in one net, since the multi-class signal "is much stronger than contrastive". I couldn't figure out how to translate that into a prototxt though..

@jackculpepper
Contributor Author

Yeah, I have used the same approach with k-way softmax. I hang k-way softmax losses off each of the local layers. I'll check in an example of this on the MNIST siamese network in my #1278, which is exactly the point of the changes I made there. Let's continue this conversation over there.

@okn2020

okn2020 commented Oct 17, 2014

@jackculpepper ok, just one more question regarding your latest commit of the siamese example with two losses: shouldn't the IP layers with 500 outputs be used as features? I think the IP with 10 outputs was used for softmax in mnist (?) and the IP with 2 outputs is not needed (?).
I have to merge your changes to the contrastive loss function and try it out; it actually seems much more flexible now. Could you post your prototxt there, or a short example of a siamese net with combined softmax and contrastive loss? That would be really great; such networks are becoming quite complex

@jackculpepper
Contributor Author

@shelhamer Could you take a look? I think this is pretty close to being ready to merge. I've integrated these examples to run:

examples/mnist/train_lenet_local.sh

and

examples/siamese/train_mnist_siamese_local.sh

@futurely

The locally connected layer proposed here is a generalization of the convolution layer. The kernels in the original implementation all share the same set of parameters. The other extreme is that each kernel has its own independent parameters. In the middle ground, parameters are shared by the kernels in a local region [3]. Using these layers together carefully, near perfect 99.15% face verification accuracy is achieved on the classic LFW dataset, showing that the new method is much better than DeepFace.

Therefore, the ConvolutionLayer should be extended to support all the cases in a configurable way.

[3] Y. Sun, X. Wang, and X. Tang. Deep Learning Face Representation by Joint Identification-Verification. Technical report, arXiv:1406.4773, 2014.
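A rough way to see that spectrum is by filter parameter count. This small sketch (illustrative only, not from the PR) counts weights for the three sharing schemes:

```python
def n_params(out_h, out_w, k, c_in, c_out, regions=1):
    """Filter parameters for an out_h x out_w output map with k x k kernels.

    regions = 1                -> full sharing (ordinary convolution)
    regions = out_h * out_w    -> no sharing (locally connected, this PR)
    1 < regions < out_h*out_w  -> local-region sharing, as in Sun et al. [3]
    """
    # each region owns one independent (k x k x c_in x c_out) filter bank
    return regions * k * k * c_in * c_out
```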

@tjusxh

tjusxh commented Nov 13, 2014

How do I set the weights to be totally unshared?

@tjusxh

tjusxh commented Nov 13, 2014

@jackculpepper I ran into a problem like this:
I1113 15:30:38.244654 33431 softmax_loss_layer.cpp:47] num = 100
I1113 15:30:38.244674 33431 softmax_loss_layer.cpp:48] count = 16000
I1113 15:30:38.244688 33431 softmax_loss_layer.cpp:49] dim = 160
I1113 15:30:38.244714 33431 softmax_loss_layer.cpp:50] label_value = 162
I1113 15:30:38.244735 33431 softmax_loss_layer.cpp:51] spatial_dim = 1
F1113 15:30:38.244763 33431 softmax_loss_layer.cpp:52] Check failed: dim > label_value * spatial_dim (160 vs. 162)

Maybe because I have too many categories. In DeepFace: Closing the Gap to Human-Level Performance in Face Verification, there are many categories. Please help me!
Thanks
zjusxh

@rmanor
Contributor

rmanor commented Dec 19, 2014

Hi, will this PR be merged?
It seems very useful. Thanks.

@caron-lee

The speed seems a little slower than the fb version. In my experiment, jackculpepper's version takes ~700ms to extract features, but the fb paper says it takes only ~180ms using SIMD and caches. Why?

@Prasanna1991

Does one need to update caffe to use layers like "LOCAL_WEIGHTED_CONVOLUTION"? And is this layer the same as the locally connected layers L4, L5, L6 used by DeepFace? Thanks in advance!

melgor added a commit to melgor/caffe that referenced this pull request Feb 4, 2015
Locally connected layer

Conflicts:
	src/caffe/layer_factory.cpp
	src/caffe/proto/caffe.proto
@singetta

@jackculpepper thank you for your project on the locally connected layer. I also want to develop what is described in [1].
I installed your project but I don't know how to use these layers.
I saw your examples

  • examples/mnist/train_lenet_local.sh
  • examples/siamese/train_mnist_siamese_local.sh

but these examples only show the difference between conv layers and locally connected layers.
Anyway, I used your locally connected layers and wrote a train.prototxt for training caffe as described in [1], and the resulting accuracy is just 0.23...
So I don't know how to reproduce this paper. If possible, could you please comment and upload your train.prototxt? Thank you.

My train.prototxt is below.

name: "FaceNet"
layers {
  name: "data"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source:
    backend: LMDB
    batch_size: 5
  }
  include: { phase: TRAIN }
}
layers {
  name: "data"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source:
    backend: LMDB
    batch_size: 5
  }
  include: { phase: TEST }
}
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 32
    kernel_size: 11
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "relu1"
  type: RELU
  bottom: "conv1"
}
layers {
  name: "pool2"
  type: POOLING
  bottom: "conv1"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "norm1"
  type: LRN
  bottom: "pool2"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layers {
  name: "conv3"
  type: CONVOLUTION
  bottom: "norm1"
  top: "conv3"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 16
    kernel_size: 9
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 1
    }
  }
}
layers {
  name: "relu2"
  bottom: "conv3"
  top: "conv3"
  type: RELU
}
layers {
  name: "norm2"
  type: LRN
  bottom: "conv3"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layers {
  name: "local4"
  type: LOCAL
  bottom: "norm2"
  top: "local4"
  blobs_lr: 1
  blobs_lr: 2
  local_param {
    num_output: 16
    kernel_size: 9
    stride: 1
    pad: 0
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "relu3"
  bottom: "local4"
  top: "local4"
  type: RELU
}
layers {
  name: "local5"
  type: LOCAL
  bottom: "local4"
  top: "local5"
  blobs_lr: 1
  blobs_lr: 2
  local_param {
    num_output: 16
    kernel_size: 7
    stride: 2
    pad: 0
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "relu4"
  type: RELU
  bottom: "local5"
  top: "local5"
}
layers {
  name: "local6"
  type: LOCAL
  bottom: "local5"
  top: "local6"
  blobs_lr: 1
  blobs_lr: 2
  local_param {
    num_output: 16
    kernel_size: 5
    stride: 1
    pad: 0
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "relu5"
  bottom: "local6"
  top: "local6"
  type: RELU
}
layers {
  name: "fc7"
  type: INNER_PRODUCT
  bottom: "local6"
  top: "fc7"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "relu6"
  type: RELU
  bottom: "fc7"
  top: "fc7"
}
layers {
  name: "drop7"
  type: DROPOUT
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layers {
  name: "fc8"
  type: INNER_PRODUCT
  bottom: "fc7"
  top: "fc8"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 6
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "fc8"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}
layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}

@shelhamer shelhamer added the ES label Mar 10, 2015
@jasonustc

@jackculpepper @okn2020 @futurely @tjusxh @rmanor Has anybody run into the speed issue? I built a model that has 2 conv layers and 2 local layers and use a K80 GPU, but the training speed is very slow, only 50 epochs in 3 minutes; it seems a bit weird... Here is my prototxt for CIFAR-10

name: "CIFAR10_do"
layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param{
    mirror: true
    crop_size: 24
    mean_file: "D:\\Users\\v-xushe\\caffe\\cifar10\\data\\train_cifar10_mean"
  }
  data_param {
    source: "D:\\Users\\v-xushe\\caffe\\cifar10\\data\\cifar10_train_leveldb_2"
    batch_size: 128
#default database type is leveldb
  }
}
layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  data_param {
    source: "D:\\Users\\v-xushe\\caffe\\cifar10\\data\\cifar10_test_leveldb_2"
    batch_size: 100
#default database type is leveldb
  }
  transform_param{
    crop_size: 24
    mean_file: "D:\\Users\\v-xushe\\caffe\\cifar10\\data\\train_cifar10_mean"
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.0001
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "pool1"
  top: "pool1"
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 9
    alpha: 0.001
    beta: 0.75
    norm_region: WITHIN_CHANNEL
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
    group: 8
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}

layer {
  name: "norm2"
  type: "LRN"
  bottom: "conv2"
  top: "norm2"
  lrn_param {
    local_size: 3
    alpha: 0.001
    beta: 0.75
    norm_region: WITHIN_CHANNEL
  }
}

layer {
  name: "pool2"
  type: "Pooling"
  bottom: "norm2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}


layer {
  name: "lc1"
  type: "Local"
  bottom: "pool2"
  top: "lc1"
  local_param {
    num_output: 64
    pad: 1
    stride: 1
    kernel_size: 3

    weight_filler {
      type: "gaussian"
      std: 0.04
    }
    bias_filler {
      type: "constant"
    }
  }
}

layer {
  name: "relu_ip1"
  type: "ReLU"
  bottom: "lc1"
  top: "lc1"
}

layer {
  name: "lc2"
  type: "Local"
  bottom: "lc1"
  top: "lc2"
  local_param {
    num_output: 64
    pad: 1
    stride: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.04
    }
    bias_filler {
      type: "constant"
    }
  }
}

layer {
  name: "relu_lc2"
  type: "ReLU"
  bottom: "lc2"
  top: "lc2"
}

layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "lc2"
  top: "ip1"
  inner_product_param {
    num_output: 128
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}

layer {
    name: "relu_ip1"
    type: "ReLU"
    bottom: "ip1"
    top: "ip1"
}

layer{
    name: "drop1"
    type: "Dropout"
    bottom: "ip1"
    top: "drop1"
    dropout_param {
        dropout_ratio: 0.5
    }
}

layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "drop1"
  top: "ip2"
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
    }
  }
}


layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}

@bamos
Contributor

bamos commented Jun 12, 2015

Hi, what's the status on this issue? What does the ES tag mean?

I'm interested in merging this into the master branch to use some of the features there.
Is anybody else also interested in this?

I'm happy to help with the merge, but I'm not familiar with Caffe's internals.

-Brandon.

@swamiviv

I want to use the locally connected layers. Can anyone help with a merge ?

-Swami

@shelhamer
Member

Closing since the dev branch is deprecated. Please send PRs to master.

@kailizhao

@jackculpepper I went through your code and merged it into caffe (the 2015.05 version). I want to know how your local layer differs from the convolution layer in the code. Are the parameters of caffe_cpu_gemm different?
Thanks

@nitish11

nitish11 commented Apr 4, 2016

@jackculpepper : I am unable to build this PR.
I get the following error:
-- Installing: DeepFace/caffe-b4600f5f84771a3c39d78e31b2d1a3ba75544d9e/build/install/python/caffe/proto/init.py
CMake Error at src/caffe/proto/cmake_install.cmake:40 (FILE):
file INSTALL cannot find
"DeepFace/caffe-b4600f5f84771a3c39d78e31b2d1a3ba75544d9e/build/src/caffe/proto/caffe_pb2.py".
Call Stack (most recent call first):
src/caffe/cmake_install.cmake:41 (INCLUDE)
cmake_install.cmake:42 (INCLUDE)
make: *** [install] Error 1

Any check on this?

@nitish11

nitish11 commented Apr 4, 2016

I got it resolved, I used the build below and it worked.
https://github.com/BVLC/caffe/tree/96b20185f77d17e81efead841a18aa509f9f7c4f

@tanfei2007

tanfei2007 commented Apr 17, 2016

@jackculpepper
is there any parameter to set overlapping regions for your locally connected layers, as mentioned in paper [1]?
In your code, it seems that each unit in a feature map has a unique filter.

[1] http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6247968

@knsong
Contributor

knsong commented Jul 7, 2016

It seems that this implementation does not support the local-conv manner you mentioned, and this local convolution may be what you need. @tanfei2007

@chenxinhua

in caffe, can we stack splitsLayer, cropLayer, convolutionLayer, flattenLayer, and concatLayer to produce a locally connected convolution layer?

