
Commit 7dcfebb

entropic mostly done, starting general regularization
1 parent 982ee83 commit 7dcfebb

1 file changed: docs/source/quickstart.rst (133 additions, 11 deletions)
@@ -5,6 +5,9 @@ Quick start guide
In the following we provide some pointers about which functions and classes
to use for different problems related to optimal transport (OT).

This document is not a tutorial on numerical optimal transport. For that, we
strongly recommend reading the very nice book [15]_.


Optimal transport and Wasserstein distance
------------------------------------------
@@ -20,10 +23,11 @@ Solving optimal transport

The optimal transport problem between discrete distributions is often expressed
as

.. math::
    \gamma^* = arg\min_\gamma \quad \sum_{i,j}\gamma_{i,j}M_{i,j}

    s.t. \gamma 1 = a; \gamma^T 1 = b; \gamma\geq 0

where :

@@ -120,8 +124,6 @@ distributions. In this case when the finite sample dataset is supposed gaussian,
mapping.


Regularized Optimal Transport
-----------------------------

@@ -146,6 +148,7 @@ We discuss in the following specific algorithms that can be used depending on
the regularization term.


Entropic regularized OT
^^^^^^^^^^^^^^^^^^^^^^^

@@ -168,23 +171,107 @@ solution of the resulting optimization problem can be expressed as:
    \gamma_\lambda^*=\text{diag}(u)K\text{diag}(v)

where :math:`u` and :math:`v` are vectors and :math:`K=\exp(-M/\lambda)`, where
the :math:`\exp` is taken component-wise. In order to solve the optimization
problem, one can use an alternating projection algorithm that can be very
efficient for large values of regularization.

The main functions in POT are :any:`ot.sinkhorn` and :any:`ot.sinkhorn2`, which
return respectively the OT matrix and the value of the linear term. Note that
the regularization parameter :math:`\lambda` in the equation above is given to
those functions with the parameter :code:`reg`.

>>> import ot
>>> a = [.5, .5]
>>> b = [.5, .5]
>>> M = [[0., 1.], [1., 0.]]
>>> ot.sinkhorn(a, b, M, 1)
array([[ 0.36552929,  0.13447071],
       [ 0.13447071,  0.36552929]])
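
The value of the entropic OT loss (the linear term above) can be obtained in
the same way with :any:`ot.sinkhorn2`; a minimal sketch reusing the variables
above (the name :code:`W` is only illustrative):

>>> W = ot.sinkhorn2(a, b, M, 1)  # value of the linear term <gamma*, M>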


More details about the algorithms used are given in the following note.

.. note::
    The main function to solve entropic regularized OT is :any:`ot.sinkhorn`.
    This function is a wrapper and the parameter :code:`method` helps you select
    the actual algorithm used to solve the problem:

    + :code:`method='sinkhorn'` calls :any:`ot.bregman.sinkhorn_knopp`, the
      classic algorithm [2]_.
    + :code:`method='sinkhorn_stabilized'` calls :any:`ot.bregman.sinkhorn_stabilized`,
      the log-stabilized version of the algorithm [9]_.
    + :code:`method='sinkhorn_epsilon_scaling'` calls
      :any:`ot.bregman.sinkhorn_epsilon_scaling`, the epsilon-scaling version
      of the algorithm [9]_.
    + :code:`method='greenkhorn'` calls :any:`ot.bregman.greenkhorn`, the
      greedy Sinkhorn version of the algorithm [22]_.

    In addition to all those variants of Sinkhorn, we have another
    implementation solving the problem in the smooth dual or semi-dual in
    :any:`ot.smooth`. This solver uses the :any:`scipy.optimize.minimize`
    function to solve the smooth problem with the :code:`L-BFGS` algorithm. To
    use this solver, use the functions :any:`ot.smooth.smooth_ot_dual` or
    :any:`ot.smooth.smooth_ot_semi_dual` with parameter :code:`reg_type='kl'` to
    choose entropic/Kullback-Leibler regularization.

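For instance, a minimal sketch of calling the :any:`ot.smooth` solver mentioned
above on the same small problem (the regularization value and the variable name
:code:`G_kl` are arbitrary, and the inputs are given as numpy arrays):

>>> import numpy as np
>>> import ot
>>> import ot.smooth
>>> a = np.array([.5, .5])
>>> b = np.array([.5, .5])
>>> M = np.array([[0., 1.], [1., 0.]])
>>> # entropic/KL-regularized OT solved in the smooth dual
>>> G_kl = ot.smooth.smooth_ot_dual(a, b, M, 1., reg_type='kl')
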
.. hint::
    Examples of use for :any:`ot.sinkhorn` are available in the following examples:

    - :any:`auto_examples/plot_OT_2D_samples`
    - :any:`auto_examples/plot_OT_1D`
    - :any:`auto_examples/plot_OT_1D_smooth`
    - :any:`auto_examples/plot_stochastic`

Finally, note that we also provide in :any:`ot.stochastic` several
implementations of stochastic solvers for entropic regularized OT [18]_ [19]_.


Other regularization
^^^^^^^^^^^^^^^^^^^^

While entropic OT is the most common and favored in practice, there exist other
kinds of regularization. We provide in POT two specific solvers for other
regularization terms, namely quadratic regularization and group lasso
regularization. We also provide in :any:`ot.optim` two generic solvers that
allow solving any smooth regularization in practice.

The first general regularization term we can solve is the quadratic
regularization of the form

.. math::
    \Omega(\gamma)=\sum_{i,j} \gamma_{i,j}^2

This regularization term has a similar effect to entropic regularization in
densifying the OT matrix, but it keeps some sort of sparsity that is lost with
entropic regularization as soon as :math:`\lambda>0` [17]_. This problem can be
solved with POT using the solvers from :any:`ot.smooth`, more specifically the
functions :any:`ot.smooth.smooth_ot_dual` or
:any:`ot.smooth.smooth_ot_semi_dual` with parameter :code:`reg_type='l2'` to
choose the quadratic regularization.
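
As a minimal sketch (the regularization strength is arbitrary, and the helper
functions :code:`f`/:code:`df` below are just one way to pass the quadratic
term to the generic conditional gradient solver in :any:`ot.optim`):

>>> import numpy as np
>>> import ot
>>> import ot.smooth
>>> import ot.optim
>>> a = np.array([.5, .5])
>>> b = np.array([.5, .5])
>>> M = np.array([[0., 1.], [1., 0.]])
>>> # dedicated solver for the quadratic (l2) regularization
>>> G_l2 = ot.smooth.smooth_ot_dual(a, b, M, 1., reg_type='l2')
>>> # generic solver: pass the regularization term and its gradient explicitly
>>> def f(G):
...     return 0.5 * np.sum(G ** 2)
>>> def df(G):
...     return G
>>> G_cg = ot.optim.cg(a, b, M, 1., f, df)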

Another regularization that has been used in recent years is the group lasso
regularization

.. math::
    \Omega(\gamma)=\sum_{j,G\in\mathcal{G}} \|\gamma_{G,j}\|_p^q

where :math:`\mathcal{G}` contains non-overlapping groups of lines in the OT
matrix. This regularization, proposed in [5]_, promotes sparsity at the group
level and will, for instance, force target samples to get mass from a small
number of groups. Note that the exact OT solution is already sparse, so this
regularization does not make sense unless it is combined with others such as
the entropic one.
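
A minimal sketch of combining the entropic and group terms with the
domain-adaptation solver :any:`ot.da.sinkhorn_lpl1_mm` (assuming the signature
:code:`sinkhorn_lpl1_mm(a, labels_a, b, M, reg, eta)`, where the groups are
given by the source labels; the toy data and regularization values below are
arbitrary):

>>> import numpy as np
>>> import ot
>>> import ot.da
>>> a = ot.unif(4)                     # source weights
>>> labels_a = np.array([0, 0, 1, 1])  # groups = classes of the source samples
>>> b = ot.unif(2)                     # target weights
>>> M = np.array([[0., 1.], [1., 0.], [1., 0.], [0., 1.]])  # 4 x 2 cost matrix
>>> # entropic term (reg) combined with the group term (eta)
>>> G_group = ot.da.sinkhorn_lpl1_mm(a, labels_a, b, M, reg=1., eta=0.1)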


Wasserstein Barycenters
-----------------------

Monge mapping and Domain adaptation with Optimal transport
----------------------------------------------------------


Other applications
@@ -207,7 +294,6 @@ FAQ
the OT transport matrix. If you want to solve a regularized OT you can
use :py:mod:`ot.sinkhorn`.

Here is a simple use case:

@@ -222,7 +308,43 @@ FAQ
    :doc:`auto_examples/plot_OT_2D_samples`


2. **pip install POT fails with error: ImportError: No module named Cython.Build**

    As discussed briefly in the README file, POT requires :code:`numpy` and
    :code:`cython` to be installed in order to build. This corner case is not
    yet handled by :code:`pip` and for now you need to install both libraries
    prior to installing POT.

    Note that this problem does not occur when using conda-forge, since the
    packages there are pre-compiled.

    See `Issue #59 <https://github.com/rflamary/POT/issues/59>`__ for more
    details.

3. **Why is Sinkhorn slower than EMD ?**

    This might come from the choice of the regularization term. The speed of
    convergence of Sinkhorn depends directly on this term [22]_, and when the
    regularization gets very small the problem tries to approximate the exact
    OT, which leads to slow convergence in addition to numerical problems. In
    other words, for large regularization Sinkhorn will converge very quickly,
    while for small regularization (when you need an OT matrix close to the
    true OT) it might be quicker to use the EMD solver.

    Also note that the numpy implementation of Sinkhorn can use parallel
    computation depending on the configuration of your system, but a very
    important speedup can be obtained by using a GPU implementation since all
    operations are matrix/vector products.

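    As a rough, minimal sketch of this trade-off (the problem size, random seed
    and regularization value below are arbitrary), one can simply time both
    solvers on the same data:

    >>> import time
    >>> import numpy as np
    >>> import ot
    >>> rng = np.random.RandomState(0)
    >>> xs, xt = rng.randn(500, 2), rng.randn(500, 2)
    >>> a, b = ot.unif(500), ot.unif(500)
    >>> M = ot.dist(xs, xt)
    >>> M /= M.max()
    >>> start = time.perf_counter()
    >>> G_emd = ot.emd(a, b, M)                  # exact OT (linear program)
    >>> t_emd = time.perf_counter() - start
    >>> start = time.perf_counter()
    >>> G_sink = ot.sinkhorn(a, b, M, reg=1e-2)  # small reg -> slower convergence
    >>> t_sinkhorn = time.perf_counter() - start
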
4. **Using GPU fails with error: module 'ot' has no attribute 'gpu'**

    In order to limit import time and hard dependencies in POT, we do not
    import some sub-modules automatically with :code:`import ot`. In order to
    use the acceleration in :any:`ot.gpu` you first need to import it with
    :code:`import ot.gpu`.

    See `Issue #85 <https://github.com/rflamary/POT/issues/85>`__ and
    :any:`ot.gpu` for more details.


References
