"Deep Noether's Theorem": Building in Symmetry Constraints

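If I have a learning problem that should have an inherent symmetry, is there a way to impose a symmetry constraint on the learning problem in order to enhance learning?

For example, if I am doing image recognition, I might want 2D rotational symmetry, meaning that a rotated version of an image should get the same result as the original.

Or, if I am learning to play tic-tac-toe, rotating the board by 90 degrees should yield the same game play.

Has any research been done on this?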

machine-learning

— aidan.plenert.macdonald

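Yes, some. For example: Group Equivariant Convolutional Networks (code), Harmonic Networks: Deep Translation and Rotation Equivariance, Deep Rotation Equivariant Network, and Exploiting Cyclic Symmetry in Convolutional Neural Networks. I don't see it much in practice yet.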

— Emre

@Emre Thanks! Do you know of any work outside of CNNs?

— aidan.plenert.macdonald 2017

No, I have only superficial knowledge of this niche. Nevertheless, CNNs seem like a natural setting...

— Emre

Risi Kondor's PhD thesis, Group Theoretical Methods in Machine Learning (pdf)

— Emre

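Following Emre's comment above: Section 4.4 of Group Theoretical Methods in Machine Learning by Risi Kondor has detailed information and proofs about creating kernel methods that inherently have symmetries. I will summarize it in a hopefully intuitive way (I am a physicist, not a mathematician!).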

Most ML algorithms contain a matrix multiplication like

$\begin{align} s_i &= \sum_j W_{ij}~x_j \\ &= \sum_j W_{ij}~(\vec{e}_j \cdot \vec{x}) \end{align}$

with $\vec{x}$ being the input and $W_{ij}$ being the weights we wish to train.

Kernel Method

Enter the realm of kernel methods and let the algorithm handle its input via

$\begin{align} s_i &= \sum_j W_{ij}~k(e_j,~x) \end{align}$

where we now generalize to $x, e_j \in \mathcal{X}$.
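The substitution of a kernel for the dot product can be sketched as follows. This is a minimal illustration assuming plain Python lists for vectors; the names `kernel_layer` and `anchors` are hypothetical and do not come from the thesis.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product, the kernel implicit in a matrix multiplication."""
    return sum(ai * bi for ai, bi in zip(a, b))


def kernel_layer(W: List[List[float]], anchors: List[Vector],
                 x: Vector, k: Kernel = dot) -> List[float]:
    """Compute s_i = sum_j W[i][j] * k(anchors[j], x)."""
    feats = [k(e, x) for e in anchors]
    return [dot(row, feats) for row in W]


# With the standard basis as anchors and k = dot, this reduces to W times x.
W = [[1.0, 2.0], [3.0, 4.0]]
basis = [[1.0, 0.0], [0.0, 1.0]]
x = [5.0, 6.0]
s = kernel_layer(W, basis, x)
```

Passing a different `k` changes how the input is compared with each anchor while leaving the trainable weights `W` untouched.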

Consider a group $G$ that acts on $\mathcal{X}$ via $x \rightarrow T_g(x)$ for $g \in G$. A simple way to make our algorithm invariant under this group is to build the kernel

$\begin{align} k^G(x, y) &= \frac{1}{|G|} \sum_{g \in G} k(x, T_g(y)) \end{align}$

from a base kernel that satisfies $k(x, y) = k(T_g(x), T_g(y))$.
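A minimal sketch of this group averaging, under two assumptions that are not part of the text above: the group is the four quarter-turn rotations of the plane, and the base kernel is a Gaussian (chosen only because it depends on the distance between its arguments, so it satisfies the invariance condition for rotations).

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def rotate(v: Point, quarter_turns: int) -> Point:
    """Rotate a 2D point by quarter_turns * 90 degrees (exact, no trig)."""
    x, y = v
    for _ in range(quarter_turns % 4):
        x, y = -y, x
    return (x, y)


def gaussian(a: Point, b: Point) -> float:
    """Base kernel (an assumption); depends only on the distance |a - b|."""
    return math.exp(-((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2))


def averaged_kernel(k: Callable[[Point, Point], float],
                    actions: List[Callable[[Point], Point]]):
    """Return k^G(x, y) = (1/|G|) * sum over g of k(x, T_g(y))."""
    def k_G(x: Point, y: Point) -> float:
        return sum(k(x, T(y)) for T in actions) / len(actions)
    return k_G


# The four group actions T_g: rotations by 0, 90, 180 and 270 degrees.
C4 = [lambda v, n=n: rotate(v, n) for n in range(4)]
k_G = averaged_kernel(gaussian, C4)

x, y = (0.3, -1.2), (2.0, 0.5)
base = k_G(x, y)
rotated = [k_G(x, rotate(y, n)) for n in range(4)]  # all equal to base up to rounding
```

Rotating either argument only reorders the terms of the sum, which is why the values in `rotated` agree with `base`.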

So,

$\begin{align} k^G(x, T_h(y)) &= \frac{1}{|G|} \sum_{g \in G} k(x, T_{gh}(y)) \\ &= \frac{1}{|G|} \sum_{g \in G} k(x, T_{g}(y)) \\ &= \frac{1}{|G|} \sum_{g \in G} k(T_{g}(x), y) \end{align}$

The second line relabels $g \rightarrow g h^{-1}$, which only reorders the sum over the group, so $k^G(x, T_h(y)) = k^G(x, y)$. The third line uses $k(x, T_g(y)) = k(T_{g^{-1}}(x), y)$ and then relabels $g \rightarrow g^{-1}$.

For $k(x, y) = x \cdot y$, which satisfies $k(x, y) = k(T_g(x), T_g(y))$ whenever every $T_g$ is an orthogonal (or unitary) linear map, this becomes

$\begin{align} k^G(x, T_h(y)) &= \left[ \frac{1}{|G|} \sum_{g \in G} T_{g}(x) \right] \cdot y \end{align}$

This gives a transformation matrix that can symmetrize the input to the algorithm.
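As a small worked instance of such a symmetrizing matrix (an illustrative choice of group, not taken from the thesis), let $G = \{e, s\}$ act on $\mathbb{R}^2$ by swapping the two coordinates, $T_s(x_1, x_2) = (x_2, x_1)$, with $S$ the corresponding permutation matrix. Then

$\begin{align} P &= \frac{1}{|G|} \sum_{g \in G} T_g = \frac{1}{2}(I + S) = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \\ k^G(x, y) &= (P x) \cdot y = \frac{1}{2} (x_1 + x_2)(y_1 + y_2) \end{align}$

which is unchanged when the coordinates of either $x$ or $y$ are swapped, and $P^2 = P$, so applying the symmetrization twice changes nothing. The average can also be degenerate: for the four quarter-turn rotations of the plane, $\frac{1}{4}(I + R + R^2 + R^3) = 0$ because $R^2 = -I$, so the dot-product kernel averages to zero in that case.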

SO(2) Example

Actually, just the subgroup of rotations by multiples of $\frac{\pi}{2}$, for simplicity.

Consider a least-squares fit to data $(\vec{x}_i, y_i) \in \mathbb{R}^2 \times \mathbb{R}$ for which we expect a rotational symmetry. The optimization problem becomes

$\begin{align} \min_{W_{j}} &\sum_i \frac{1}{2} (y_i - \tilde{y}_i)^2 \\ \tilde{y}_i &= \sum_j W_{j} k^G(e_j, x_i) + b_i \end{align}$

The kernel $k(x, y) = \| x - y \|^2$ satisfies $k(x, y) = k(T_g(x), T_g(y))$, as does $k(x, y) = x \cdot y$.

Using the squared-distance kernel with $\vec{e}_j = (1, 0)$ (the other basis vector, $(0, 1)$, is rotated through the same four points and so gives the same sum),

$\begin{align} k^G(e_j, x_i) &= \frac{1}{4} \sum_{n=1}^4 \| R(n\pi/2)~\vec{e}_j - \vec{x}_i \|^2 \\ &= \frac{1}{4} \sum_{n=1}^4 \left[ ( \cos(n\pi/2) - \vec{x}_{i1} )^2 + ( \sin(n\pi/2) - \vec{x}_{i2} )^2 \right] \\ &= \frac{1}{4} \left[ 2 \vec{x}_{i1}^2 + 2 \vec{x}_{i2}^2 + (1 - \vec{x}_{i1} )^2 + (1 - \vec{x}_{i2} )^2 + (1 + \vec{x}_{i1} )^2 + (1 + \vec{x}_{i2} )^2 \right] \\ &= \vec{x}_{i1}^2 + \vec{x}_{i2}^2 + 1 \end{align}$
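The closed form of the averaged squared-distance kernel can be checked numerically. The following is a quick sketch using only the standard library; the test point is arbitrary and the function names are illustrative.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def rot(v: Point, angle: float) -> Point:
    """Rotate a 2D point counter-clockwise by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def sq_dist(a: Point, b: Point) -> float:
    """Base kernel k(a, b) = |a - b|^2."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def k_G(e: Point, x: Point) -> float:
    """(1/4) * sum over n = 1..4 of |R(n*pi/2) e - x|^2."""
    return sum(sq_dist(rot(e, n * math.pi / 2), x) for n in range(1, 5)) / 4


x = (0.7, -1.9)  # arbitrary test point
closed_form = x[0] ** 2 + x[1] ** 2 + 1
# Both unit basis vectors should reproduce the closed form up to rounding.
values = [k_G(e, x) for e in ((1.0, 0.0), (0.0, 1.0))]
```

Both entries of `values` match `closed_form` to floating-point precision, which is consistent with the sum being the same for either basis vector.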

Note that we needn't sum over $j$ because it is the same for both. So our problem becomes,

$\begin{align} \min_{W} &\sum_i \frac{1}{2} (y_i - \tilde{y}_i)^2 \\ \tilde{y}_i &= W \left[ \vec{x}_{i1}^2 + \vec{x}_{i2}^2 + 1 \right] + b_i \end{align}$

Which yields the expected spherical symmetry!
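The resulting fit can be sketched in a few lines. Two assumptions are made here that are not in the text: a single shared intercept `b` replaces the per-sample $b_i$, and the data are synthetic values invented for the illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def feature(x: Point) -> float:
    """Symmetrized feature from the averaged kernel: x1^2 + x2^2 + 1."""
    return x[0] ** 2 + x[1] ** 2 + 1.0


def fit(xs: List[Point], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ~ W * feature(x) + b (one shared b)."""
    phis = [feature(x) for x in xs]
    n = len(xs)
    mean_phi, mean_y = sum(phis) / n, sum(ys) / n
    cov = sum((p - mean_phi) * (y - mean_y) for p, y in zip(phis, ys))
    var = sum((p - mean_phi) ** 2 for p in phis)
    W = cov / var
    return W, mean_y - W * mean_phi


def predict(W: float, b: float, x: Point) -> float:
    return W * feature(x) + b


# Made-up, rotationally symmetric targets: y = 2 * |x|^2 - 1.
xs = [(1.0, 0.0), (0.0, 2.0), (-1.0, 1.0), (3.0, -1.0), (0.5, 0.5)]
ys = [2.0 * (x[0] ** 2 + x[1] ** 2) - 1.0 for x in xs]
W, b = fit(xs, ys)

# A point and its 90-degree rotation receive the same prediction.
p, p_rot = (1.0, 2.0), (-2.0, 1.0)
same = abs(predict(W, b, p) - predict(W, b, p_rot)) < 1e-9
```

Because the model sees the input only through `feature(x)`, the invariance holds by construction for any rotation angle, whatever values `W` and `b` take after fitting.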

Tic-Tac-Toe

Example code can be seen here. It shows how we can create a matrix that encodes the symmetry and use it. Note that this is really bad when I actually run it! Working with other kernels at the moment.
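The linked code is not reproduced here. As a stand-in, the following is a minimal sketch of one way such a symmetry-encoding matrix could be built for 90-degree board rotations. The board encoding (a row-major list of 9 numbers with X = 1, O = -1, empty = 0) and all function names are assumptions made for this illustration.

```python
from typing import List

Board = List[float]  # 9 cells, row-major


def rotate_board(board: Board) -> Board:
    """Rotate a row-major 3x3 board by 90 degrees clockwise."""
    return [board[3 * (2 - c) + r] for r in range(3) for c in range(3)]


def symmetrizer() -> List[List[float]]:
    """9x9 matrix P = (1/4) * sum of the four rotation permutation matrices."""
    P = [[0.0] * 9 for _ in range(9)]
    for j in range(9):
        basis = [0.0] * 9
        basis[j] = 1.0
        for _ in range(4):  # visits R, R^2, R^3 and R^4 = identity
            basis = rotate_board(basis)
            for i in range(9):
                P[i][j] += basis[i] / 4.0
    return P


def apply(P: List[List[float]], board: Board) -> Board:
    """Matrix-vector product P @ board."""
    return [sum(P[i][j] * board[j] for j in range(9)) for i in range(9)]


P = symmetrizer()
board = [1.0, 0.0, 0.0,
         0.0, -1.0, 0.0,
         0.0, 0.0, 0.0]
sym = apply(P, board)
sym_rot = apply(P, rotate_board(board))  # same features as sym
```

Feeding `apply(P, board)` rather than `board` into a linear model makes its output identical for all four rotations of a position, at the cost of discarding any information that distinguishes them.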

— aidan.plenert.macdonald

Good job, Aidan! If you have time, you can write a more detailed blog post. The community will be most interested.

— Emre

Not sure what community you are referring to, but I started writing more. I wanted to find a way to estimate the optimal kernel given a set of data. So I optimized entropy on kernel space to intuitively get a new set of features that are symmetrically constrained and also maximally entropic (i.e. informative). Now, whether that is the right approach, I can't say. Just a warning, the math is a bit of a hack job right now and kind of straight out of stat mech. overleaf.com/read/kdfzdbyhpbbq

— aidan.plenert.macdonald

Is there any meaningful approach when the symmetry group is not known?

— leitasat

@leitasat How do you know it's symmetric if you don't know the group?

— aidan.plenert.macdonald

@aidan.plenert.macdonald from the data. Let's say we have 1000 sets of 100 pictures each, and within each set there are pictures of one object from different viewpoints. Can any algorithm "learn the idea" of SO(3) symmetry and use it on previously unseen objects?

— leitasat

Turns out this is just the study of Invariant Theory applied to Machine Learning

— aidan.plenert.macdonald