|
23 | 23 | \min_{\mathbf{x} \in \mathcal{X}} p(\mathbf{x}) |
24 | 24 | $$ |
25 | 25 |
|
26 | | -where $p(\mathbf{x})$ is a polynomial and $\mathcal{X}$ is a set defined by polynomial equalities, for instance, $\mathcal{X} = \\{ \mathbf{x} \in \mathbb{R}^d \mid q_i(\mathbf{x}) = 0, \,i\in[n] \\}$, where we introduced the shorthand $[n]$ for $\\{1, \ldots, n\\}$. This problem may in general be hard to solve due to the non-convexity of the objective and the feasible set. |
| 26 | +where $p(\mathbf{x})$ is a polynomial and $\mathcal{X}$ is a set defined by polynomial equalities, for instance, $\mathcal{X} = \\{ \mathbf{x} \in \mathbb{R}^d \mid q_i(\mathbf{x}) = 0, \,i\in[n] \\}$, where we introduced the shorthand $[n]$ for $\\{1, \ldots, n\\}$. This problem may in general be hard to solve when the objective and/or feasible set are non-convex. |
27 | 27 |
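For concreteness, here is a minimal instance of this form (purely illustrative, and not necessarily the toy example used later in this post): a quadratic objective restricted to the unit circle,

$$
\min_{\mathbf{x} \in \mathbb{R}^2} x_1 x_2 \quad \text{subject to} \quad q_1(\mathbf{x}) = x_1^2 + x_2^2 - 1 = 0.
$$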
|
28 | | -A powerful technique to tackle such problems is to solve a series of convex relaxations. To do so, we first rewrite the problem using "lifted" variables. We define a vector of monomials, $\phi(\mathbf{x})^\top$, which in machine learning would be called the feature vector. The objective can then be written as an inner product $p(\mathbf{x}) = \langle \mathbf{C}, \phi(\mathbf{x})\phi(\mathbf{x})^\top \rangle$ for some matrix $\mathbf{C}$, where $\langle, \rangle$ is trace inner product. Our problem becomes: |
| 28 | +A powerful technique to tackle such problems is to solve a series of increasingly tight convex relaxations. To do so, we first rewrite the problem using "lifted" variables. We define a vector of monomials, $\phi(\mathbf{x})$, akin to a feature vector in machine learning. We choose the feature vector such that the objective can be written as an inner product $p(\mathbf{x}) = \langle \mathbf{C}, \phi(\mathbf{x})\phi(\mathbf{x})^\top \rangle$ for some matrix $\mathbf{C}$, where $\langle \cdot, \cdot \rangle$ denotes the trace inner product. Our problem becomes:
29 | 29 |
|
30 | 30 | $$ |
31 | 31 | \min_{\mathbf{x} \in \mathcal{X}} \langle \mathbf{C}, \phi(\mathbf{x})\phi(\mathbf{x})^\top \rangle |
|
66 | 66 | \mathcal{V} = \text{span} \{ \phi(\mathbf{x})\phi(\mathbf{x})^\top \mid \mathbf{x} \in \mathcal{X} \} |
67 | 67 | $$ |
68 | 68 |
|
69 | | -Every moment matrix $\mathbf{M}(\mathbf{x})$ corresponding to a feasible point of our problem lies within this subspace $\mathcal{V}$. In other words, we can define a basis $\\{ \mathbf{B}_i \\}\_{i\in[n\_b]}$, so that every element $\mathbf{X}$ of $\mathcal{V}$ can be written as |
| 69 | +Every moment matrix $\mathbf{M}(\mathbf{x})$ corresponding to a feasible point of our problem lies within this subspace $\mathcal{V}$. In other words, we can define a basis $\\{ \mathbf{B}_i \\}\_{i\in[n\_b]}$, so that every element $\mathbf{M}$ in $\mathcal{V}$ can be written as |
70 | 70 |
|
71 | 71 | $$ |
72 | | -\mathbf{X} = \sum_i \alpha_i \mathbf{B}_i, |
| 72 | +\mathbf{M} = \sum_i \alpha_i \mathbf{B}_i, |
73 | 73 | $$ |
74 | 74 |
|
75 | | -for some choices $\alpha_i$. In particular, there exist some $\alpha$ that allow to characterize each element of the feasible set $\mathcal{X}$. |
| 75 | +for some coefficients $\alpha_i$.
76 | 76 |
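As a rough numerical sketch of how such a basis can be obtained in practice, one can stack the vectorized moment matrices of sampled feasible points and read a basis of their span (and of its orthogonal complement, used further below) off an SVD. Everything in this snippet is an assumption for illustration: the feature map `phi`, the sampled set, and the rank tolerance are stand-ins rather than the post's actual toy example, and the full vectorization is used here instead of the half-vectorization introduced later.

```python
import numpy as np

def phi(x):
    # Stand-in monomial feature vector for a scalar x (illustrative only).
    return np.array([1.0, x, x**2, x**3])

# Sample feasible points; here a plain interval stands in for the set X.
samples = np.linspace(-1.0, 1.0, 50)

# Stack the vectorized moment matrices phi(x) phi(x)^T as rows.
rows = np.array([np.outer(phi(x), phi(x)).ravel() for x in samples])

# Right singular vectors with non-negligible singular values span V;
# the remaining directions are orthogonal to every sampled moment matrix.
_, s, Vt = np.linalg.svd(rows, full_matrices=True)
rank = int(np.sum(s > 1e-9 * s[0]))
B_basis = [Vt[i].reshape(4, 4) for i in range(rank)]
U_basis = [Vt[i].reshape(4, 4) for i in range(rank, Vt.shape[0])]

print(f"dim of span: {len(B_basis)}, dim of complement: {len(U_basis)}")
```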
|
77 | | -If we call $\mathcal{K}$ the space of all admissible moment matrices, i.e., matrices $\mathbf{X}$ for which there exists a positive measure $\mu$ such that $\mathbf{X}=\int \phi(\mathbf{x})\phi(\mathbf{x})^\top d\mu(\mathbf{x})$, that space corresponds to the closure of the convex hull of all $\\{\phi(\mathbf{x})\phi(\mathbf{x})$ for $\mathbf{x}\in\mathcal{X}\\}$ --- we call this set for convenience $\bar{\mathcal{X}}$ (see [this post](https://francisbach.com/sums-of-squares-for-dummies/) for more details, and below for the visualization of our toy example). |
| 77 | +We call $\mathcal{K}$ the space of all admissible (pseudo) moment matrices. |
| 78 | +<!--i.e., matrices $\mathbf{X}$ for which there exists a positive measure $\mu$ such that $\mathbf{X}=\int \phi(\mathbf{x})\phi(\mathbf{x})^\top d\mu(\mathbf{x})$,--> This space corresponds to the closure of the convex hull of $\\{\phi(\mathbf{x})\phi(\mathbf{x})^\top \mid \mathbf{x}\in\mathcal{X}\\}$ --- for convenience, we call this set $\bar{\mathcal{X}}$ (see [this post](https://francisbach.com/sums-of-squares-for-dummies/) for more details, and below for the visualization of our toy example).
78 | 79 |
|
79 | 80 | {% include figure.liquid |
80 | 81 | path="/assets/images/blog/2025-10-27/subspaces-export.svg" |
|
108 | 109 | The subspace $\mathcal{V}$ is the span of these two matrices, $\mathcal{V} = \text{span}(\mathbf{B}_1, \mathbf{B}_2)$. This is a 2-dimensional subspace within the ambient space of $4 \times 4$ symmetric matrices, which has dimension $\frac{4 \times 5}{2} = 10$. For a more compact notation, we use the half-vectorization operator and define $\mathbf{b}_i:=\mathrm{vech}(\mathbf{B}_i)\in\mathbb{R}^{10}$, where we scale off-diagonal elements by $\sqrt{2}$ to ensure $\langle \mathbf{A}, \mathbf{B}\rangle = \mathbf{a}^\top\mathbf{b}$. |
109 | 110 | </div> |
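Below is a small sketch of this scaled half-vectorization; the function name `vech` and the random check are my own, under the assumption that inputs are symmetric.

```python
import numpy as np

def vech(A):
    # Half-vectorization of a symmetric matrix, with off-diagonal entries
    # scaled by sqrt(2) so that the trace inner product is preserved.
    idx = np.triu_indices(A.shape[0])
    scale = np.where(idx[0] == idx[1], 1.0, np.sqrt(2.0))
    return scale * A[idx]

# Quick check that <A, B> = vech(A)^T vech(B) for symmetric A, B.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)); A = (A + A.T) / 2
B = rng.standard_normal((4, 4)); B = (B + B.T) / 2
assert np.isclose(np.trace(A @ B), vech(A) @ vech(B))
```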
110 | 111 |
|
111 | | -We will also need a basis for $\mathcal{V}^{\perp}$, the nullspace of the span of $\\{ \phi(\mathbf{x})\phi(\mathbf{x})^T, \mathbf{x} \in \mathcal{X} \\}$. Let's call the basis vectors $\\{\mathbf{U}_j\\}\_{j\in [n\_u]}$. By definition of the nullspace, for any $\mathbf{x} \in \mathcal{X}$, we must have: |
| 112 | +We will also need a basis for $\mathcal{V}^{\perp}$, the nullspace of the span of $\\{ \phi(\mathbf{x})\phi(\mathbf{x})^T, \mathbf{x} \in \mathcal{X} \\}$. Let's call the basis vectors $\\{\mathbf{U}_j\\}\_{j\in [n\_u]}$. By definition of the nullspace, for any $\mathbf{x} \in \mathcal{X}$, we must have:
112 | 113 |
|
113 | 114 | $$ |
114 | 115 | \langle \mathbf{U}_j, \phi(\mathbf{x})\phi(\mathbf{x})^T \rangle = 0 |
|
152 | 153 | Now we want to find a computationally tractable outer approximation of the set |
153 | 154 |
|
154 | 155 | $$ |
155 | | -\mathcal{K} = \{\mathbf{X} \;| \mathbf{M} = \int_\mathcal{X} \phi(\mathbf{x})\phi(\mathbf{x})^\top d\mu(x)\ \text{for some measure $\mu$.}\} |
| 156 | +\mathcal{K} = \{\mathbf{X} \;|\; \mathbf{X} = \int_\mathcal{X} \phi(\mathbf{x})\phi(\mathbf{x})^\top d\mu(\mathbf{x})\ \text{for some positive measure $\mu$}\}
156 | 157 | $$ |
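A helpful special case: for a discrete measure $\mu = \sum_k w_k \delta_{\mathbf{x}_k}$ with $\mathbf{x}_k \in \mathcal{X}$ and $w_k \geq 0$, the integral reduces to

$$
\mathbf{X} = \sum_k w_k \, \phi(\mathbf{x}_k)\phi(\mathbf{x}_k)^\top,
$$

a nonnegative combination of sample moment matrices; with the normalization $\langle \mathbf{A}_0, \mathbf{X}\rangle = 1$ imposed below, these are exactly convex combinations, which is how $\mathcal{K}$ connects to the convex hull $\bar{\mathcal{X}}$ mentioned earlier.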
157 | 158 |
|
158 | 159 | An intuitive choice is to impose all properties of this set that are computationally easy to handle:
159 | 160 |
|
160 | 161 | $$ |
161 | | -\mathcal{\widehat{K}} = \{\mathbf{M} \;| \begin{cases} |
| 162 | +\mathcal{\widehat{K}} = \{\mathbf{X} \;| \begin{cases} |
162 | 163 | & \mathbf{X} = \sum_i \alpha_i \mathbf{B}_i & \text{(want to lie in same subspace)} \\ |
163 | 164 | & \mathbf{X} \succeq 0 & \text{(an integral of PSD outer products $\phi(\mathbf{x})\phi(\mathbf{x})^\top$ against $\mu \geq 0$)} \\
164 | 165 | & \langle \mathbf{A}_0, \mathbf{X} \rangle = 1 & \text{(we assume normalization and that $\phi(\mathbf{x})_0=1$)} |
@@ -375,9 +376,9 @@ This kernelization of the problem allows for different kernels to be applied, an |
375 | 376 |
|
376 | 377 | ### Conclusion and Discussion |
377 | 378 |
|
378 | | -We've seen that by taking a subspace perspective, we can derive alternative but equivalent relaxations for polynomial optimization problems. Depending on the dimensions of the subspace $\mathcal{V}$ and its complement $\mathcal{V}^\perp$, one form might be more computationally efficient than the other. For instance, if the nullspace $\mathcal{V}^\perp$ has a very small dimension, the primal kernel form might have far fewer constraints than the image form. |
| 379 | +We've seen that by taking a subspace perspective, we can derive relaxations for polynomial optimization problems using only information from feasible samples. Depending on the dimensions of the subspace $\mathcal{V}$ and its complement $\mathcal{V}^\perp$, one form might be more computationally efficient than the other. For instance, if the nullspace $\mathcal{V}^\perp$ has a very small dimension, the primal kernel form might have far fewer constraints than the image form. |
379 | 380 |
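To make this comparison concrete, here is a rough sketch of both forms using cvxpy (my choice of modeling tool, not one prescribed by the post). The inputs `C`, `A0`, the basis matrices `B_list` of $\mathcal{V}$, and the nullspace matrices `U_list` of $\mathcal{V}^\perp$ are assumed to have been computed already (e.g., as in the earlier numerical sketch), and the naming of the two forms follows my reading of the text above.

```python
import cvxpy as cp

def solve_image_form(C, B_list, A0, n):
    # X is forced into V by writing it in the basis {B_i}: one scalar
    # variable per basis element, plus the PSD and normalization constraints.
    alpha = cp.Variable(len(B_list))
    X = cp.Variable((n, n), symmetric=True)
    constraints = [X == sum(alpha[i] * B_list[i] for i in range(len(B_list))),
                   X >> 0,
                   cp.trace(A0 @ X) == 1]
    return cp.Problem(cp.Minimize(cp.trace(C @ X)), constraints).solve()

def solve_kernel_form(C, U_list, A0, n):
    # X is a free symmetric matrix, forced into V by orthogonality to every
    # nullspace basis element U_j: more linear constraints, no alpha variables.
    X = cp.Variable((n, n), symmetric=True)
    constraints = [X >> 0, cp.trace(A0 @ X) == 1]
    constraints += [cp.trace(U @ X) == 0 for U in U_list]
    return cp.Problem(cp.Minimize(cp.trace(C @ X)), constraints).solve()
```

Both calls require an SDP-capable solver (e.g., SCS, which ships with cvxpy); the kernel form trades the explicit parameterization by $\alpha$ for $n_u$ linear constraints, which is exactly the dimension trade-off discussed above.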
|
380 | | -An obvious limitation of this approach is that it cannot easily deal with inequality constraints. However, at least the equality-constrained part of polynomial optimization problems can handled in a very elegant way through this subspace view. |
| 381 | +An obvious limitation of this approach is that it cannot easily deal with inequality constraints. However, at least the equality-constrained part of polynomial optimization problems can be handled in a very elegant way through this subspace view. To some extent, inequalities can be dealt with by introducing slack variables (as sketched below), but this may not scale easily to many inequalities.
381 | 382 |
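For reference, the slack-variable trick mentioned above turns an inequality into an equality in one extra variable,

$$
g(\mathbf{x}) \geq 0 \quad \Longleftrightarrow \quad g(\mathbf{x}) - s^2 = 0 \ \text{for some } s \in \mathbb{R},
$$

at the cost of enlarging the variable (and hence the monomial vector $\phi$) with every added inequality, which is one reason this does not scale gracefully to many inequalities.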
|
382 | 383 | ### Appendix 1: Verification via the Duals {#appendix1} |
383 | 384 |
|
|