N-D convolution backpropagation

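For my own education, I am trying to implement an N-dimensional convolutional layer in a convolutional neural network.

I would like to implement a backpropagation function. However, I am not sure what the most efficient way to do so is.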

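Currently, I am using signal.fftconvolve to:

In the forward step, convolve the input with each kernel, looping over all filters.
In the backpropagation step, convolve the derivatives (reversed in all dimensions with the FlipAllAxes function) with the array (https://jefkine.com/general/2016/09/05/backpropagation-in-convolutional-neural-networks/) over all filters and sum them. The output I take is the sum of each image convolved with each derivative for each filter.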

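I am particularly confused about how to convolve the derivatives. When I backpropagate with the class below, the magnitude of the weights explodes.

What is the correct way to program the convolution of the derivative with the output and the filters?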

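EDIT:

According to this paper (Fast Training of Convolutional Networks through FFTs), which sets out to do exactly what I want to do:

The derivative for the previous layer is given by the convolution of the current layer's derivative with the weights: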

dL / dy_f = dL / dx * w_f ^ T

The derivative for the weights is the piecewise sum of the convolution of the derivatives with the original input:

dL / dw_f = dL / dx * x
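The two formulas above can be checked numerically. The following is a minimal sketch and not the paper's code: it assumes a single channel, "valid" boundaries, and a layer that computes a cross-correlation. The names `forward`, `grad_input`, `grad_weights` and `numeric_grad` are hypothetical helpers, and the notation differs from the formulas above: here `x` is the layer input, `w` the kernel, and `d_out` the gradient arriving from the next layer.

```python
import numpy as np
from scipy import signal

def forward(x, w):
    """Valid cross-correlation of x with w (convolution with the flipped kernel)."""
    return signal.fftconvolve(x, np.flip(w), mode="valid")

def grad_input(d_out, w):
    """dL/dx: full convolution of the upstream gradient with the kernel."""
    return signal.fftconvolve(d_out, w, mode="full")

def grad_weights(x, d_out):
    """dL/dw: valid cross-correlation of the input with the upstream gradient."""
    return signal.fftconvolve(x, np.flip(d_out), mode="valid")

def numeric_grad(f, a, eps=1e-6):
    """Central finite differences of the scalar function f with respect to a."""
    g = np.zeros_like(a)
    for idx in np.ndindex(*a.shape):
        old = a[idx]
        a[idx] = old + eps
        hi = f()
        a[idx] = old - eps
        lo = f()
        a[idx] = old  # restore the original value
        g[idx] = (hi - lo) / (2 * eps)
    return g

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 5, 4))   # a 3-D input
w = rng.standard_normal((3, 3, 3))   # a 3-D kernel
d_out = rng.standard_normal(forward(x, w).shape)  # stand-in upstream gradient

# Scalar loss whose gradient with respect to the layer output is exactly d_out
loss = lambda: float(np.sum(forward(x, w) * d_out))

gx_ok = np.allclose(grad_input(d_out, w), numeric_grad(loss, x), atol=1e-5)
gw_ok = np.allclose(grad_weights(x, d_out), numeric_grad(loss, w), atol=1e-5)
print(gx_ok, gw_ok)
```

Note that under these assumptions the weight gradient has the shape of the kernel and the input gradient has the shape of the input, so no windowing or summing step is needed afterwards.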

As far as I can tell, I have implemented this below. However, it does not seem to give the intended result, because a network built with this layer fluctuates wildly during training.

    import numbers

    import numpy as np
    from numpy.lib.stride_tricks import as_strided
    from scipy import signal

    # scipy.signal.fftconvolve operates on NumPy arrays, so NumPy is used throughout.

    class ConvNDLayer:
        def __init__(self, channels, kernel_size, dim):
            self.channels = channels
            self.kernel_size = kernel_size
            self.dim = dim

            self.last_input = None

            # One kernel of shape (kernel_size,) * dim per output channel
            self.filt_dims = np.ones(dim + 1).astype(int)
            self.filt_dims[1:] = self.filt_dims[1:] * kernel_size
            self.filt_dims[0] = self.filt_dims[0] * channels
            self.filters = np.random.randn(*self.filt_dims) / (kernel_size) ** dim

        def FlipAllAxes(self, array):
            # Reverse the array along every axis
            sl = slice(None, None, -1)
            return array[tuple([sl] * array.ndim)]

        def ViewAsWindows(self, array, window_shape, step=1):
            # -- basic checks on arguments
            if not isinstance(array, np.ndarray):
                raise TypeError("`array` must be a NumPy ndarray")
            ndim = array.ndim
            if isinstance(window_shape, numbers.Number):
                window_shape = (window_shape,) * ndim
            if not (len(window_shape) == ndim):
                raise ValueError("`window_shape` is incompatible with `arr_in.shape`")

            if isinstance(step, numbers.Number):
                if step < 1:
                    raise ValueError("`step` must be >= 1")
                step = (step,) * ndim
            if len(step) != ndim:
                raise ValueError("`step` is incompatible with `arr_in.shape`")

            arr_shape = np.array(array.shape)
            window_shape = np.array(window_shape, dtype=arr_shape.dtype)

            if ((arr_shape - window_shape) < 0).any():
                raise ValueError("`window_shape` is too large")

            if ((window_shape - 1) < 0).any():
                raise ValueError("`window_shape` is too small")

            # -- build rolling window view
            slices = tuple(slice(None, None, st) for st in step)
            window_strides = array.strides
            indexing_strides = array[slices].strides
            win_indices_shape = ((arr_shape - window_shape) // np.array(step)) + 1

            new_shape = tuple(list(win_indices_shape) + list(window_shape))
            strides = tuple(list(indexing_strides) + list(window_strides))

            arr_out = as_strided(array, shape=new_shape, strides=strides)

            return arr_out

        def UnrollAxis(self, array, axis):
            # This so it works with a single dimension or a sequence of them
            axis = tuple(int(a) for a in np.atleast_1d(axis))
            axis2 = tuple(range(len(axis)))

            # Put unrolled axes at the beginning
            array = np.moveaxis(array, axis, axis2)
            # Unroll
            return array.reshape((-1,) + array.shape[len(axis):])

        def Convolve(self, array, kernel):
            # Assumed definition (the original method is not shown): an FFT
            # convolution in "same" mode, so the output keeps the input's shape.
            return signal.fftconvolve(array, kernel, "same")

        def Forward(self, array):
            # One output map per filter, each with the same shape as the input
            output = np.zeros((self.channels,) + array.shape)

            self.last_input = array

            for i, kernel in enumerate(self.filters):
                conv = self.Convolve(array, kernel)
                output[i] = conv

            return output

        def Backprop(self, d_L_d_out, learn_rate):
            d_A = np.zeros_like(self.last_input)
            d_W = np.zeros_like(self.filters)

            for i, (kernel, d_L_d_out_f) in enumerate(zip(self.filters, d_L_d_out)):
                # Gradient with respect to the input, accumulated over filters
                d_A += signal.fftconvolve(d_L_d_out_f, kernel.T, "same")
                # Gradient with respect to the weights: convolve with the input,
                # then sum every kernel-sized window of the result
                conv = signal.fftconvolve(d_L_d_out_f, self.last_input, "same")
                conv = self.ViewAsWindows(conv, kernel.shape)
                axes = np.arange(kernel.ndim)
                conv = self.UnrollAxis(conv, axes)
                d_W[i] = np.sum(conv, axis=0)

            output = d_A * learn_rate
            self.filters = self.filters - d_W * learn_rate
            return output
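Two details of `Backprop` may be worth checking, and the sketch below illustrates them in isolation. It assumes that the `w_f ^ T` in the formula means the kernel reversed along every axis rather than a matrix transpose, which is an assumption about the notation. The variable names are illustrative only.

```python
import numpy as np
from scipy import signal

kernel = np.arange(6).reshape(2, 3)

transposed = kernel.T       # axes reordered: shape becomes (3, 2)
flipped = np.flip(kernel)   # elements reversed along every axis: shape stays (2, 3)

# np.flip with no axis argument matches slicing every axis with ::-1,
# which is what FlipAllAxes does
sl = slice(None, None, -1)
same_as_flip_all_axes = np.array_equal(flipped, kernel[tuple([sl] * kernel.ndim)])

# "same" mode returns an array shaped like the first argument,
# so it is input-sized rather than kernel-sized
x = np.ones((8, 8))
d_out = np.ones((8, 8))
same = signal.fftconvolve(d_out, x, mode="same")

# In "valid" mode, correlating the input with a smaller gradient map
# gives a kernel-sized result directly: 8 - 6 + 1 = 3 along each axis
d_out_valid = np.ones((6, 6))
valid = signal.fftconvolve(x, np.flip(d_out_valid), mode="valid")

print(transposed.shape, flipped.shape, same.shape, valid.shape)
```

If the flip interpretation is right, `kernel.T` is not equivalent to `FlipAllAxes(kernel)`, and summing windows of an input-sized "same" result is a different quantity from the kernel-sized correlation.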

— Jack Rolph
Source

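Usually, just multiplying the gradients by learn_rate is not enough.

To get better performance and reduce wild fluctuations, the gradients are scaled by an optimizer, for example by dividing them by a running root-mean-square of recent gradients (RMSprop).

The updates also depend on the error. Passing the error for each sample individually usually makes the updates noisy, so it is considered better to average over multiple samples (mini-batches).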
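A minimal sketch of that kind of scaling, assuming the usual RMSprop update rule. The class name `RMSprop`, its `step` method, and the default values for `decay` and `eps` are illustrative choices and do not come from the question's code.

```python
import numpy as np

class RMSprop:
    """Scales each gradient by a running RMS of its recent values."""

    def __init__(self, learn_rate=1e-3, decay=0.9, eps=1e-8):
        self.learn_rate = learn_rate
        self.decay = decay
        self.eps = eps
        self.mean_sq = None  # running average of squared gradients

    def step(self, params, grad):
        if self.mean_sq is None:
            self.mean_sq = np.zeros_like(grad)
        # Exponential moving average of the squared gradient
        self.mean_sq = self.decay * self.mean_sq + (1 - self.decay) * grad ** 2
        # Divide by the root of that average so large gradients are damped
        return params - self.learn_rate * grad / (np.sqrt(self.mean_sq) + self.eps)

opt = RMSprop(learn_rate=1e-3)
weights = np.zeros(2)
weights = opt.step(weights, np.array([2.0, -4.0]))
print(weights)
```

In a layer like the one in the question, one such optimizer instance would replace the plain `self.filters - d_W*learn_rate` update, for example `self.filters = opt.step(self.filters, d_W)`.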
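Averaging over a mini-batch can be sketched as follows. The function `minibatch_update` is a hypothetical helper that assumes the per-sample weight gradients have already been computed and stacked along the first axis.

```python
import numpy as np

def minibatch_update(weights, per_sample_grads, learn_rate):
    """Apply one update using the mean gradient over the batch."""
    # per_sample_grads has shape (batch_size,) + weights.shape
    mean_grad = np.mean(per_sample_grads, axis=0)
    return weights - learn_rate * mean_grad

weights = np.zeros(2)
grads = np.array([[1.0, 3.0],
                  [3.0, 5.0]])  # gradients from two samples
weights = minibatch_update(weights, grads, learn_rate=0.5)
print(weights)
```

The layer in the question updates its filters inside `Backprop` for every sample, so using this approach would mean returning `d_W` from `Backprop` and applying the update once per batch instead.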

— SajanGohil
Source