Can someone explain the back-propagation algorithm? [duplicate]


What is the back-propagation algorithm?

algorithms optimization neural-networks

— あみ

In case anyone is interested, I have summarized an answer to this question here (I didn't want to repost it).

— フィリイダ


The back-propagation algorithm is a gradient descent algorithm for fitting a neural network model (as @Dikran mentioned). Let me explain how.

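Formally: plugging the gradient of the loss [1], computed at the end of this post, into the gradient descent iteration gives the back-propagation algorithm as a particular case of gradient descent.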

Neural network model

Formally, to fix ideas, consider a simple single-layer model:

$$f(x)=g(A^1(s(A^2(x))))$$

where $g:\mathbb{R} \rightarrow \mathbb{R}$ and $s:\mathbb{R}^M\rightarrow \mathbb{R}^M$ are known, with $s(x)[m]=\sigma(x[m])$ for all $m=1,\dots,M$, and $A^1:\mathbb{R}^M\rightarrow \mathbb{R}$ and $A^2:\mathbb{R}^p\rightarrow \mathbb{R}^M$ are unknown affine functions. The function $\sigma:\mathbb{R}\rightarrow \mathbb{R}$ is called the activation function in the classification setting.
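As a minimal sketch of the forward pass, the Python below evaluates $f(x)=g(A^1(s(A^2(x))))$ with plain lists. The representation is an assumption for illustration: $A^2(x)=Wx+c$ with `W2` an $M\times p$ list of rows and `c2` of length $M$, and $A^1(h)=w^\top h+b$ with `w1` of length $M$ and scalar `b1`. By default $g$ is the identity and $\sigma$ is the logistic sigmoid; all function and parameter names are hypothetical.

```python
import math

def sigmoid(t):
    """Logistic activation sigma: R -> R (an assumed choice of sigma)."""
    return 1.0 / (1.0 + math.exp(-t))

def affine(W, c, x):
    """Apply the affine map x -> W x + c, with W given as a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + c_i
            for row, c_i in zip(W, c)]

def forward(x, W2, c2, w1, b1, g=lambda z: z, sigma=sigmoid):
    """f(x) = g(A^1(s(A^2(x)))) for one hidden layer of width M."""
    u = affine(W2, c2, x)                              # A^2(x), a vector in R^M
    h = [sigma(u_m) for u_m in u]                      # s(A^2(x)): sigma coordinatewise
    z = sum(w * h_m for w, h_m in zip(w1, h)) + b1     # A^1(h), a scalar
    return g(z)
```

For example, with all coefficients of $A^2$ set to zero every hidden unit outputs $\sigma(0)=0.5$, so `forward([1.0, -2.0], [[0, 0], [0, 0], [0, 0]], [0, 0, 0], [1, 1, 1], 0.0)` returns `1.5`.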

A quadratic loss function is used to fix ideas. The input vectors $(x_1,\dots,x_n)$ in $\mathbb{R}^p$ can then be fitted to the real outputs $(y_1,\dots,y_n)$ in $\mathbb{R}$ (which could also be vectors) by minimizing the empirical loss

$$\mathcal{R}_n(A^1,A^2)=\sum_{i=1}^n (y_i-f(x_i))^2\;\;\;\;\;\;\; [1]$$

with respect to the choice of $A^1$ and $A^2$.
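The empirical loss [1] translates directly into code. In this sketch `f` is any callable playing the role of the model with $A^1$ and $A^2$ held fixed, so minimizing over $A^1$ and $A^2$ amounts to changing which `f` is passed in; the name `empirical_risk` is hypothetical.

```python
def empirical_risk(f, xs, ys):
    """Quadratic empirical loss R_n = sum_i (y_i - f(x_i))^2, as in [1]."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))
```

For instance, `empirical_risk(lambda x: 2 * x[0], [[1.0], [2.0], [3.0]], [2.0, 5.0, 6.0])` gives `1.0`, since only the second residual is nonzero.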

Gradient descent

A gradient descent algorithm for minimizing $\mathcal{R}$ iterates

$$\mathbf{a}_{l+1}=\mathbf{a}_l-\gamma_l \nabla \mathcal{R}(\mathbf{a}_l),\ l \ge 0,$$

for well-chosen step sizes $(\gamma_l)_l$ (also called the learning rate in the back-propagation framework). This requires computing the gradient of $\mathcal{R}$. In the case considered here, $\mathbf{a}_l=(A^1_{l},A^2_{l})$.
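A minimal sketch of the iteration above, assuming the parameters $\mathbf{a}_l$ are flattened into a single list of floats (for this model, the coefficients of $A^1$ and $A^2$) and that `grad` returns $\nabla\mathcal{R}$ as a list of the same length. The name `gradient_descent` is hypothetical, and the step sizes are supplied by the caller.

```python
def gradient_descent(grad, a0, step_sizes):
    """Iterate a_{l+1} = a_l - gamma_l * grad(a_l) for each gamma_l in step_sizes."""
    a = list(a0)
    for gamma in step_sizes:
        g = grad(a)                                        # gradient at the current point
        a = [a_k - gamma * g_k for a_k, g_k in zip(a, g)]  # step against the gradient
    return a
```

On the toy objective $(a_1-3)^2+(a_2+1)^2$, `gradient_descent(lambda a: [2 * (a[0] - 3), 2 * (a[1] + 1)], [0.0, 0.0], [0.1] * 200)` converges to approximately `[3, -1]`: with a constant step of 0.1 the distance to the minimizer shrinks by a factor of 0.8 per iteration.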

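Gradient of $\mathcal{R}$ (for the simple neural network model considered)

Denote by $\nabla_1 \mathcal{R}$ the gradient of $\mathcal{R}$ with respect to $A^1$ and by $\nabla_2\mathcal{R}$ the gradient of $\mathcal{R}$ with respect to $A^2$. Write $A^1(h)=w^\top h+b$ with $w\in\mathbb{R}^M$, and let $h_i=s(A^2(x_i))\in\mathbb{R}^M$ and $z_i=A^1(h_i)=A^1(s(A^2(x_i)))$. A standard calculation using the chain rule (the derivative of a composition of functions) gives, for the linear coefficients of $A^1$,

$$\nabla_1 \mathcal{R}[1:M] =-2\times \sum_{i=1}^n h_i\, g'(z_i)\, (y_i-f(x_i))$$

and, for all $m=1,\dots,M$, for the $p$ coefficients of $A^2$ that feed hidden unit $m$,

$$\nabla_2 \mathcal{R}[1:p,m] =-2\times \sum_{i=1}^n x_i\, g'(z_i)\, w[m]\,\sigma'(A^2(x_i)[m])\, (y_i-f(x_i))$$

The gradients with respect to the intercepts of $A^1$ and $A^2$ are obtained by replacing $h_i$ and $x_i$ with $1$. Here I used the R notation: $x[a:b]$ is the vector composed of the coordinates of $x$ from index $a$ to index $b$.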
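The sketch below computes $\mathcal{R}_n$ and its gradients for the model above by the chain rule: a forward pass stores the intermediate values, then a backward pass accumulates the derivatives. The derivative with respect to the coefficients of $A^1$ uses the hidden activations $s(A^2(x_i))$, and the derivative with respect to $A^2$ passes through the coefficients of $A^1$ and $\sigma'$. Assumptions for illustration: $g$ is the identity (so $g'=1$), $\sigma$ is the logistic sigmoid (so $\sigma'(u)=\sigma(u)(1-\sigma(u))$), and the affine maps are stored as hypothetical `W2` ($M\times p$), `c2`, `w1`, `b1`. Intercept gradients are returned alongside the linear ones.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def risk_and_gradients(xs, ys, W2, c2, w1, b1):
    """Return (R_n, dR/dW2, dR/dc2, dR/dw1, dR/db1) for f(x) = A^1(s(A^2(x))),
    with g = identity and sigma = logistic sigmoid (both assumed here).
    Row m of W2 holds the coefficients of A^2 feeding hidden unit m."""
    M, p = len(W2), len(W2[0])
    R = 0.0
    gW2 = [[0.0] * p for _ in range(M)]
    gc2 = [0.0] * M
    gw1 = [0.0] * M
    gb1 = 0.0
    for x, y in zip(xs, ys):
        # Forward pass, keeping the intermediate values the backward pass needs.
        u = [sum(W2[m][j] * x[j] for j in range(p)) + c2[m] for m in range(M)]
        h = [sigmoid(u_m) for u_m in u]
        z = sum(w1[m] * h[m] for m in range(M)) + b1
        f = z                                  # g = identity, so g'(z) = 1
        r = y - f
        R += r * r
        # Backward pass: d(y - g(z))^2 / dz = -2 (y - g(z)) g'(z).
        delta_out = -2.0 * r * 1.0
        gb1 += delta_out
        for m in range(M):
            gw1[m] += delta_out * h[m]
            # Chain rule through A^1 and sigma: delta_out * w1[m] * sigma'(u_m).
            delta_m = delta_out * w1[m] * h[m] * (1.0 - h[m])
            gc2[m] += delta_m
            for j in range(p):
                gW2[m][j] += delta_m * x[j]
    return R, gW2, gc2, gw1, gb1
```

A cheap way to validate hand-derived gradients like these is to compare each entry with a central finite difference $(\mathcal{R}(\theta+\epsilon)-\mathcal{R}(\theta-\epsilon))/(2\epsilon)$ for a small $\epsilon$.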

— robin girard

Back-propagation is a way of working out the derivative of the error function with respect to the weights, so that the model can be trained by gradient descent optimisation methods; it is basically just the application of the "chain rule". There isn't really much more to it than that, so if you are comfortable with calculus, that is basically the best way to look at it.
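To make the "just the chain rule" point concrete, here is a sketch for the smallest possible network: one input, one tanh hidden unit, one tanh output unit and squared error. The architecture and the names are assumptions for illustration. The forward pass stores each intermediate quantity; the backward pass multiplies one local derivative at a time, from the output back to each weight.

```python
import math

def backprop_scalar_chain(x, y, w1, w2):
    """E = (y - tanh(w2 * tanh(w1 * x)))^2 and (dE/dw1, dE/dw2) via the chain rule."""
    # Forward pass: keep every intermediate value.
    a1 = w1 * x
    h = math.tanh(a1)
    a2 = w2 * h
    o = math.tanh(a2)
    E = (y - o) ** 2
    # Backward pass: each line applies one local derivative.
    dE_do = -2.0 * (y - o)
    dE_da2 = dE_do * (1.0 - o ** 2)    # tanh'(a2) = 1 - tanh(a2)^2
    dE_dw2 = dE_da2 * h                # a2 = w2 * h
    dE_dh = dE_da2 * w2
    dE_da1 = dE_dh * (1.0 - h ** 2)    # tanh'(a1) = 1 - tanh(a1)^2
    dE_dw1 = dE_da1 * x                # a1 = w1 * x
    return E, dE_dw1, dE_dw2
```

The returned derivatives should match a central finite difference of `E` in `w1` or `w2` to several decimal places, which is a quick way to confirm that nothing beyond the chain rule is involved.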

If you are not comfortable with calculus, a better way to put it is this: we know how badly the output units are doing because we have a desired output with which to compare the actual output. However, we don't have a desired output for the hidden units, so what do we do? The back-propagation rule is basically a way of spreading the blame for the error of the output units onto the hidden units. The more influence a hidden unit has on a particular output unit, the more blame it gets for that unit's error. The total blame associated with a hidden unit then gives an indication of how much the input-to-hidden weights need changing. The two things that govern how much blame is passed back are the weight connecting the hidden and output units (obviously) and the output of the hidden unit (if it is shouting rather than whispering, it is likely to have a larger influence). The rest is just the mathematical niceties that turn that intuition into the derivative of the training criterion.
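The blame-spreading intuition can be sketched in code for one training example. Assumptions for illustration: sigmoid hidden units, linear output units, no biases, and error $E=\tfrac12\sum_k (o_k-t_k)^2$; the function name and weight layout are hypothetical. Each output unit's error is known directly; hidden unit $j$ collects blame from every output $k$ in proportion to the connecting weight, scaled by the slope $\sigma'(a_j)$ of its activation. The hidden unit's output $h_j$ enters the gradient of the hidden-to-output weights, and the collected blame sets the gradient of the input-to-hidden weights.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def spread_blame(x, target, W_hid, W_out):
    """One backward pass. W_hid[j][i]: input i -> hidden j; W_out[k][j]: hidden j -> output k.
    Returns (output_errors, hidden_blame, grad_W_hid, grad_W_out) for E = 0.5 * sum (o - t)^2."""
    a = [sum(w * xi for w, xi in zip(row, x)) for row in W_hid]    # hidden pre-activations
    h = [sigmoid(a_j) for a_j in a]                                # hidden outputs
    o = [sum(w * h_j for w, h_j in zip(row, h)) for row in W_out]  # linear outputs
    # We know how badly each output unit is doing: compare it with its target.
    out_err = [o_k - t_k for o_k, t_k in zip(o, target)]
    # Hidden unit j is blamed by each output k in proportion to W_out[k][j],
    # scaled by sigma'(a_j) = h_j * (1 - h_j).
    blame = [h[j] * (1.0 - h[j]) * sum(W_out[k][j] * out_err[k] for k in range(len(o)))
             for j in range(len(h))]
    grad_W_out = [[out_err[k] * h[j] for j in range(len(h))] for k in range(len(o))]
    grad_W_hid = [[blame[j] * x[i] for i in range(len(x))] for j in range(len(h))]
    return out_err, blame, grad_W_hid, grad_W_out
```

With all input-to-hidden weights at zero, both hidden units output 0.5 and have the same slope, so the hidden unit with larger weights to the erring outputs receives more blame.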

For a proper answer, I'd also recommend Bishop's book! ;o)

— Dikran Marsupial

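It is an algorithm for training feedforward multilayer neural networks (multilayer perceptrons). There are several nice Java applets on the web that illustrate what is going on, such as http://neuron.eng.wayne.edu/bpFunctionApprox/bpFunctionApprox.html. Also, Bishop's book on neural networks is the standard desk reference for anything related to NNs.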

— Stephen Turner

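We try to avoid link-only answers, since we are trying to build a permanent repository of high-quality statistical information in question-and-answer form. If possible, could you expand this answer, perhaps by giving a summary of the information at the link?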

— Glen_b -Reinstate Monica