
Add label pair functionality to contrastive loss function #1278

Closed
jackculpepper wants to merge 9 commits into BVLC:dev from jackculpepper:contrastive_loss_from_label_pair

Conversation

@jackculpepper
Contributor

This PR adds functionality to the contrastive loss function.

Now, in addition to being able to pass a 1/0 label that specifies similar or dissimilar pairs, you can pass in two labels, and similarity will be calculated on the fly.

I do this by keeping another buffer, similar_, which is computed at feed-forward time.

This is useful, for example, if you want to combine a contrastive loss with a softmax loss, or if you want to use the same data loaders for both siamese and N-way classification networks.

The way I'm doing it adds a small amount of overhead, but I think it's negligible.

If you hook up only one label, the 1/0 behavior still happens.

I also added a check to make sure there are either 3 or 4 bottom blobs.

@shelhamer
Member

Nice generalization -- could you add another model to the siamese network example that makes use of this new feature?

@jackculpepper
Contributor Author

These lines:

here, here, here, and here

...should they be .gpu_data() instead of .cpu_data()?

No.

(Removed my earlier comments to prevent someone from reading this and getting confused.)

@jeffdonahue
Contributor

The code you linked is correct; using gpu_data instead would cause a segfault. gpu_data pointers can only be read/written inside a GPU kernel; all of those examples (or at least the first two I checked) read or write the data immediately from host code and therefore need to use cpu_data.

@jackculpepper
Contributor Author

Heh..yeah. Thanks. Forgot my brain this morning.

changes include code to generate shared png file based storage
current setup should give roughly the same performance as before
seems to achieve approximately the same performance
could 'shuffle' in image_data_layer to generate extra train pairs
@jackculpepper
Contributor Author

Here's a comparison of the MNIST "siamese" test error between the dev branch and this branch with shuffle either off or on. When shuffle is on, each epoch gets a different same/not-same pairing.

[figure: MNIST siamese test-error curves for "dev", "no shuffle", and "shuffle"]

Note that the test set is identical for "shuffle" and "no shuffle" but different for "dev" (different random pairing). This may be why "no shuffle" does worse than "dev".

@amiralush

@jackculpepper thanks for this PR. I've been trying to use the contrastive loss on the ImageNet dataset with no success. The loss seems to get stuck in a local minimum right from the beginning. Have you tried using it with an ImageNet-scale network?

@jackculpepper
Contributor Author

@amiralush I haven't run it on imagenet yet, no. However, I have seen the problem you are describing. Instead of using a single contrastive loss at the top, you can put contrastive loss layers in the middle, too. Have you tried that?
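For concreteness, a mid-network contrastive loss might look like the fragment below. This is a hypothetical sketch in the old prototxt syntax, with blob names borrowed from the MNIST siamese example (ip1/ip1_p as the intermediate features, sim as the similarity label) and an illustrative margin:

```
layers {
  name: "loss_mid"
  type: CONTRASTIVE_LOSS
  contrastive_loss_param {
    margin: 1.0
  }
  bottom: "ip1"
  bottom: "ip1_p"
  bottom: "sim"
  top: "loss_mid"
}
```

The intermediate loss gives the lower layers a more direct gradient signal, in the same spirit as deep supervision.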

Incidentally, for anyone else following this thread, I merged the label pair func idea into the hinge loss and accuracy layers, too. I'd be happy to submit PRs for those if anyone's interested.

@futurely

@jackculpepper, all your changes are essential to implement a full-blown network using two kinds of supervised labels such as multi-class classification and pair-wise similarity [1]. Please share them too. Thank you very much!

[1] Y. Sun, X. Wang, and X. Tang. Deep Learning Face Representation by Joint Identification-Verification. Technical report, arXiv:1406.4773, 2014.


To be cross platform, boost::filesystem::create_directories is a better option.
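boost::filesystem::create_directories creates every missing component of a nested path portably. A minimal sketch of the pattern is below; it uses the C++17 standard equivalent, std::filesystem::create_directories, purely so the snippet needs no extra link flags (the boost call behaves the same way but requires linking boost_filesystem):

```cpp
#include <filesystem>
#include <string>

// Portable nested-directory creation: makes every missing component of
// "path" and returns true only if at least one directory was created.
// boost::filesystem::create_directories has the same semantics.
bool MakeNestedDir(const std::string& path) {
  return std::filesystem::create_directories(path);
}
```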

@amiralush

@jackculpepper a PR would be very helpful.
Concerning your suggestion to use more than a single loss along the net: using the ImageNet architecture, I tried adding an additional contrastive loss on fc6, which caused the second loss (on fc8) to output NaN after a couple of iterations, while the first loss seems to be going along OK. I don't fully understand it yet.

add enum to params so that choice of label type is more explicit
refactor label computation code into a private function
add binary function generator and "caffe_cpu_same()" functions to math
@jackculpepper
Contributor Author

@futurely Thank you for reviewing my code. All of your suggestions were good. I made the changes necessary to address them in the previous two commits. Note that using boost to create directories requires linking against boost_filesystem, so I had to add that to the Makefile.

@amiralush The changes I am making in this branch will be easily transferred over to accuracy and hinge loss layers, so it's probably better to wait. However, here are two PRs off dev from about a week ago:

accuracy
hinge

@kloudkl
Contributor

kloudkl commented Oct 23, 2014

The changes look good to me. The other two PRs are also very useful. Thanks!

@jackculpepper
Contributor Author

The "same" label can be computed using existing layers. Here is an example of how to use THRESHOLD and ELTWISE to compute the same/not same label from a pair of k-way labels:

layers {
  name: "diff_label_ab"
  type: ELTWISE
  eltwise_param {
    coeff: 1
    coeff: -1
  }
  bottom: "label_a"
  bottom: "label_b"
  top: "diff_label_ab"
}

layers {
  name: "diff_label_ba"
  type: ELTWISE
  eltwise_param {
    coeff: 1
    coeff: -1
  }
  bottom: "label_b"
  bottom: "label_a"
  top: "diff_label_ba"
}

layers {
  name: "diff_label_ab_thresh"
  type: THRESHOLD
  bottom: "diff_label_ab"
  top: "diff_label_ab_thresh"
}

layers {
  name: "diff_label_ba_thresh"
  type: THRESHOLD
  bottom: "diff_label_ba"
  top: "diff_label_ba_thresh"
}

layers {
  name: "label_same"
  type: ELTWISE
  bottom: "diff_label_ab_thresh"
  bottom: "diff_label_ba_thresh"
  top: "label_same"
}
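As a sanity check, the construction above can be modeled on scalars, assuming Caffe's defaults (THRESHOLD fires for inputs greater than 0, and an ELTWISE with no operation set defaults to SUM):

```cpp
// Scalar model of the THRESHOLD + ELTWISE graph above, under the
// assumed defaults: threshold = 0 and SUM as the eltwise operation.
float Threshold(float x) { return x > 0.0f ? 1.0f : 0.0f; }

float PairLabel(float label_a, float label_b) {
  // ELTWISE layers with coeffs 1 and -1 compute signed differences.
  float diff_ab = 1.0f * label_a + (-1.0f) * label_b;
  float diff_ba = 1.0f * label_b + (-1.0f) * label_a;
  // Final ELTWISE (SUM) of the two thresholded differences.
  return Threshold(diff_ab) + Threshold(diff_ba);
}
```

Under those defaults the top blob is 0 when the two labels match and 1 when they differ, so if the loss expects 1 for similar pairs, the sense would need flipping (e.g. a POWER layer computing 1 - x with shift: 1, scale: -1).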
