
Conversation

@shapovalov

Hi Andreas,

I needed a more general wrapper for the GCO library, so I made several changes:

  1. GCO allows redefining the energy type it uses, and a floating-point energy is often useful. In theory it may be much slower (augmenting-path max-flow algorithms are inherently designed for integer capacities), but in practice it is often fine. Now changing a single compile-time definition in gco_python.pyx sets the type.

  2. I’ve added pairwise potential callbacks to the C++ wrapper, so that arbitrary pairwise potentials suitable for expansion/swap can be specified (a sketch follows this list). I thought this was the only way to implement generalized Potts potentials (i.e. ones that vary across edges), but later I noticed that for general graphs I can specify the edge weights directly.
     However, this feature can still be useful for anyone who needs more general pairwise potentials (e.g. associative potentials that are learned separately for different labels).
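
For illustration, here is a rough Python-level sketch of what such a callback can compute; the function name and the variables are hypothetical, but the signature mirrors GCO’s C++ smooth-cost function, which receives two site indices and two labels:

```python
import numpy as np

n_sites = 4
# Per-edge disagreement penalties (generalized Potts): the cost of a
# label disagreement depends on which edge it occurs on.
edge_strength = np.ones((n_sites, n_sites))

def pairwise_cost(site1, site2, label1, label2):
    # Hypothetical callback; GCO's C++ smooth-cost function has the
    # signature fn(site1, site2, label1, label2).
    if label1 == label2:
        return 0.0                          # agreement is free
    return edge_strength[site1, site2]      # edge-dependent penalty
```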

I don’t know whether you consider these changes useful. Please look at the diffs. Also, I am new to Cython, so some of the code may not be idiomatic.

Regards,
Roman

@amueller
Owner

Hi Roman.
Thanks for the PR. I'll try to look into it soon.
I recently added cut_from_graph, which can take a weight per edge, as you said. Having a callable might also be useful, though, and so would float support. As I said, I'll try to find some time ;)
Cheers,
Andy

@shapovalov
Author

Hi Andy,

I’ve committed a few follow-up edits. They concern double energies for varying edge potentials. Also, the condition e.shape[0] == 3 in cut_from_graph() seemed buggy, so I’ve changed it to e.shape[1] == 3 (see the example below).
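
For reference, a minimal example of the weighted-edge case; it assumes cut_from_graph keeps its current (edges, unary_cost, pairwise_cost) calling convention. With per-edge weights, each row of the edge array is (from, to, weight), so the array has three columns, which is why the check belongs on shape[1]:

```python
import numpy as np
from pygco import cut_from_graph

# Three nodes in a chain, two labels.
unary_cost = np.array([[0, 1],
                       [1, 0],
                       [0, 1]], dtype=np.int32)
pairwise_cost = (1 - np.eye(2)).astype(np.int32)  # Potts

# Each row is (from, to, weight): three *columns*, any number of rows,
# so the weighted case is detected by e.shape[1] == 3.
edges = np.array([[0, 1, 2],
                  [1, 2, 1]], dtype=np.int32)

labels = cut_from_graph(edges, unary_cost, pairwise_cost)
```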

Roman

@shapovalov
Author

Andy,

Two more follow-up edits:

  1. Fixed a memory leak, which was crucial when the functions were called repeatedly, e.g. during learning.
  2. Added the energy value returned by the library to the wrapper’s return value (a sketch follows this list). This breaks the API, so you might want to ignore this edit.
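
Under the changed API a call would then unpack roughly like this; the exact return structure is whatever the diff settles on:

```python
import numpy as np
from pygco import cut_from_graph

edges = np.array([[0, 1]], dtype=np.int32)
unary_cost = np.array([[0, 1],
                       [1, 0]], dtype=np.int32)
pairwise_cost = (1 - np.eye(2)).astype(np.int32)

# Hypothetical post-change unpacking: the final energy reported by the
# library comes back alongside the labeling.
labels, energy = cut_from_graph(edges, unary_cost, pairwise_cost)
```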

Cheers,
Roman

@amueller
Owner

Thanks a lot. I'll try to get it merged on the weekend.
Do you use svm-struct for learning? If so, how do you include the submodularity constraint?

I am not using submodular energies for learning any more...

@shapovalov
Author

I tested it with your implementations of structured SVM, both cutting-plane and subgradient (thanks, BTW!). I switched from Joachims' SVM^struct code because it was messy, used a QP solver that behaved weirdly, and was unlikely to be substantially faster, since the bottleneck was CRF inference anyway.

The standard way to learn a submodular/associative function is to use non-negative pairwise features and non-negative pairwise weights (assuming you maximize the scoring function). Since I tested only on a toy example, the weights stayed non-negative throughout learning. In practice they eventually go negative; in that case you can take a projection (a sketch follows). For the subgradient method this is sound, but for cutting-plane it is just a heuristic; in the latter case you are better off adding the non-negativity constraints to the QP. If there are not many pairwise weights, that works well.
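
A minimal sketch of that projection step inside a subgradient update; the weight layout (a pairwise_idx slice) and the step rule are assumptions for illustration, not anyone's actual API:

```python
import numpy as np

def projected_subgradient_step(w, grad, lr, pairwise_idx):
    # Plain subgradient step...
    w = w - lr * grad
    # ...followed by projection onto the feasible set: clip only the
    # pairwise block at zero so the learned potentials stay
    # submodular/associative.
    w[pairwise_idx] = np.maximum(w[pairwise_idx], 0.0)
    return w
```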

@amueller
Owner

amueller commented Feb 1, 2013

Cool, I didn't know you were using my code. There is actually an option in the n-slack version to add the non-negativity constraint to the QP (there is an example in one of the tests; git grep submodular).

I was just wondering whether you managed to get the constraint into SVM^struct, which I didn't get to work :-/

@amueller
Owner

amueller commented Feb 1, 2013

Btw, I recently made some changes to the subgradient and n-slack solvers to use mini-batches. In the subgradient method the mini-batches have size n_jobs (i.e. the number of cores), so you can do more inference in parallel. For the n-slack version you can choose the mini-batch size (it should be > n_jobs, though), which lets you balance work between the QP and the inference: smaller mini-batches mean more frequent updates of the QP and therefore fewer calls to inference. Might be worth giving it a shot ;)
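
Not pystruct's actual API, just the shape of the idea: run loss-augmented inference for a mini-batch of n_jobs samples in parallel, then make one update from the summed subgradients:

```python
import numpy as np
from joblib import Parallel, delayed

def minibatch_subgradient(X, Y, w, inference, subgradient,
                          lr=0.01, n_jobs=4, n_epochs=10):
    for _ in range(n_epochs):
        for start in range(0, len(X), n_jobs):
            batch = list(range(start, min(start + n_jobs, len(X))))
            # One loss-augmented inference call per core.
            y_hats = Parallel(n_jobs=n_jobs)(
                delayed(inference)(X[i], Y[i], w) for i in batch)
            # A single update from the summed subgradients of the batch.
            g = sum(subgradient(X[i], Y[i], y_hat, w)
                    for i, y_hat in zip(batch, y_hats))
            w = w - lr * g
    return w
```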

@shapovalov
Author

> I was just wondering whether you managed to get the constraint into SVM^struct,

Well, lately I have used Yu’s latent SVM^struct code, which implements an approximate variant of the cutting-plane method. Besides, it allows using the MOSEK solver for the QP, which seems more stable. When there are on the order of 10 non-negative weights, it works well.

> Btw, I recently made some changes to the subgradient and n-slack solvers to use mini-batches.

This is definitely worth trying, thanks!
