Pytorch CNNで損失が減少しない

私はPytorchでタスクのCNNを実行していますが、学習して精度を向上させることはできません。ここに投稿できるように、MNISTデータセットを使用するバージョンを作成しました。なぜ機能しないのかについての答えを探しています。アーキテクチャはすばらしいです。私はそれをKerasで実装し、3エポック後に92％以上の精度がありました。注：私がMNISTを60x60の画像に再形成したのは、それが私の「実際の」問題にあるためです。

import numpy as np
from PIL import Image
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.autograd import Variable
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()


def resize(pics):
    pictures = []
    for image in pics:
        image = Image.fromarray(image).resize((dim, dim))
        image = np.array(image)
        pictures.append(image)
    return np.array(pictures)


dim = 60

x_train, x_test = resize(x_train), resize(x_test) # because my real problem is in 60x60

x_train = x_train.reshape(-1, 1, dim, dim).astype('float32') / 255
x_test = x_test.reshape(-1, 1, dim, dim).astype('float32') / 255
y_train, y_test = y_train.astype('float32'), y_test.astype('float32') 

if torch.cuda.is_available():
    x_train = torch.from_numpy(x_train)[:10_000]
    x_test = torch.from_numpy(x_test)[:4_000] 
    y_train = torch.from_numpy(y_train)[:10_000] 
    y_test = torch.from_numpy(y_test)[:4_000]


class ConvNet(nn.Module):

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.conv3 = nn.Conv2d(64, 128, 3)

        self.fc1 = nn.Linear(5*5*128, 1024) 
        self.fc2 = nn.Linear(1024, 2048)
        self.fc3 = nn.Linear(2048, 1)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv3(x)), (2, 2))

        x = x.view(x.size(0), -1) 
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.dropout(x, 0.5)
        x = torch.sigmoid(self.fc3(x))
        return x


net = ConvNet()

optimizer = optim.Adam(net.parameters(), lr=0.03)

loss_function = nn.BCELoss()


class FaceTrain:

    def __init__(self):
        self.len = x_train.shape[0]
        self.x_train = x_train
        self.y_train = y_train

    def __getitem__(self, index):
        return x_train[index], y_train[index].unsqueeze(0)

    def __len__(self):
        return self.len


class FaceTest:

    def __init__(self):
        self.len = x_test.shape[0]
        self.x_test = x_test
        self.y_test = y_test

    def __getitem__(self, index):
        return x_test[index], y_test[index].unsqueeze(0)

    def __len__(self):
        return self.len


train = FaceTrain()
test = FaceTest()

train_loader = DataLoader(dataset=train, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset=test, batch_size=64, shuffle=True)

epochs = 10
steps = 0
train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    for images, labels in train_loader: 
        optimizer.zero_grad()
        log_ps = net(images)
        loss = loss_function(log_ps, labels)
        loss.backward()
        optimizer.step()        
        running_loss += loss.item()        
    else:
        test_loss = 0
        accuracy = 0        

        with torch.no_grad():
            for images, labels in test_loader: 
                log_ps = net(images)
                test_loss += loss_function(log_ps, labels)                
                ps = torch.exp(log_ps)
                top_p, top_class = ps.topk(1, dim=1)
                equals = top_class.type('torch.LongTensor') == labels.type(torch.LongTensor).view(*top_class.shape)
                accuracy += torch.mean(equals.type('torch.FloatTensor'))
        train_losses.append(running_loss/len(train_loader))
        test_losses.append(test_loss/len(test_loader))
        print("[Epoch: {}/{}] ".format(e+1, epochs),
              "[Training Loss: {:.3f}] ".format(running_loss/len(train_loader)),
              "[Test Loss: {:.3f}] ".format(test_loss/len(test_loader)),
              "[Test Accuracy: {:.3f}]".format(accuracy/len(test_loader)))

python conv-neural-network pytorch

— ニコラ・ジェルヴェ
ソース

comp.ai.neural-netsよくあるご質問は、お使いのニューラルネットが学習されていない場合に見える場所に関するいくつかの素晴らしい提案を持っています。そこから始めることをお勧めします。

— Ari Cooper-Davis、

損失関数、ネットワークの出力形状、ターゲットラベルはここでは意味がありません（この組み合わせは少なくとも間違っています）。MNISTには10個のクラスがあり、ラベルは0から9までの整数です。BCELossは、ターゲットごとに0から1までの単一の値を想定しています。代わりに10ロジットを出力する必要があり（必ずしもS字型である必要はありません）、nn.CrossEntropyLoss単一クラスの分類問題に使用します。nn.CrossEntropyLosssoftmaxとNLLLossの両方を1つの操作として適用するため、最初にsoftmaxを実行しないでください。

— jodag

または、回帰問題、つまり出力として実際の数を推定する場合（分類タイプの問題には推奨されません）を試すこともできますがnn.MSELoss、シグモイド出力の範囲内に収まるようにターゲットを調整するか、またはしないでください。 t最終層の後にシグモイドを適用します。

— jodag

このように動作するようにしましたか？今、それは私に言っていRuntimeError: multi-target not supportedます。

— Nicolas Gervais

ラベルの次元を絞る必要があります（バッチサイズの整数の1Dテンソルである必要があります）

— jodag

回答:

まず主要な問題...

1.このコードの主な問題は、分類に間違った出力形状と間違った損失関数を使用していることです。

nn.BCELossバイナリクロスエントロピー損失を計算します。これは、0または1（したがってバイナリ）のいずれかである1つ以上のターゲットがある場合に適用されます。あなたの場合、ターゲットは0から9までの単一の整数です。潜在的なターゲット値は少数しかないため、最も一般的なアプローチは、カテゴリカルクロスエントロピー損失（nn.CrossEntropyLoss）を使用することです。クロスエントロピー損失の「理論的」定義では、ネットワーク出力とターゲットの両方が10次元ベクトルであることが想定されており、ターゲットは1つの場所（ワンホットエンコード）を除いてすべてゼロです。ただし、計算の安定性とスペース効率の理由から、pytorchはnn.CrossEntropyLoss直接整数をターゲットとして使用します。しかしながら、ネットワークからの10次元の出力ベクトルを提供する必要があります。

# pseudo code (ignoring batch dimension)
loss = nn.functional.cross_entropy_loss(<output 10d vector>, <integer target>)

コードでこの問題を修正するにはfc3、10次元の機能を出力する必要があり、ラベルは（floatではなく）整数である必要があります。また、.sigmoidpytorchのクロスエントロピー損失関数は、最終的な損失値を計算する前に内部的にlog-softmaxを適用するため、fc3で使用する必要はありません。

2. Serget Dymchenkoによって指摘されたように、ネットワークをeval推論train中にモードに、列車中にモードに切り替える必要があります。これは主にドロップアウトレイヤーとbatch_normレイヤーに影響します。これらは、トレーニング中と推論中に異なる動作をするためです。

3.学習率0.03はおそらく少し高すぎます。これは0.001の学習率で問題なく機能し、2、3の実験でトレーニングが0.03で発散するのを見ました。

これらの修正に対応するには、いくつかの変更を行う必要があります。コードの最小限の修正を以下に示します。変更された行にはコメントを付け、####その後に変更の簡単な説明を続けました。

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.autograd import Variable
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()


def resize(pics):
    pictures = []
    for image in pics:
        image = Image.fromarray(image).resize((dim, dim))
        image = np.array(image)
        pictures.append(image)
    return np.array(pictures)


dim = 60

x_train, x_test = resize(x_train), resize(x_test) # because my real problem is in 60x60

x_train = x_train.reshape(-1, 1, dim, dim).astype('float32') / 255
x_test = x_test.reshape(-1, 1, dim, dim).astype('float32') / 255
#### float32 -> int64
y_train, y_test = y_train.astype('int64'), y_test.astype('int64')

#### no reason to test for cuda before converting to numpy

#### I assume you were taking a subset for debugging? No reason to not use all the data
x_train = torch.from_numpy(x_train)
x_test = torch.from_numpy(x_test)
y_train = torch.from_numpy(y_train)
y_test = torch.from_numpy(y_test)


class ConvNet(nn.Module):

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.conv3 = nn.Conv2d(64, 128, 3)

        self.fc1 = nn.Linear(5*5*128, 1024)
        self.fc2 = nn.Linear(1024, 2048)
        #### 1 -> 10
        self.fc3 = nn.Linear(2048, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv3(x)), (2, 2))

        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.dropout(x, 0.5)
        #### removed sigmoid
        x = self.fc3(x)
        return x


net = ConvNet()

#### 0.03 -> 1e-3
optimizer = optim.Adam(net.parameters(), lr=1e-3)

#### BCELoss -> CrossEntropyLoss
loss_function = nn.CrossEntropyLoss()


class FaceTrain:

    def __init__(self):
        self.len = x_train.shape[0]
        self.x_train = x_train
        self.y_train = y_train

    def __getitem__(self, index):
        #### .unsqueeze(0) removed
        return x_train[index], y_train[index]

    def __len__(self):
        return self.len


class FaceTest:

    def __init__(self):
        self.len = x_test.shape[0]
        self.x_test = x_test
        self.y_test = y_test

    def __getitem__(self, index):
        #### .unsqueeze(0) removed
        return x_test[index], y_test[index]

    def __len__(self):
        return self.len


train = FaceTrain()
test = FaceTest()

train_loader = DataLoader(dataset=train, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset=test, batch_size=64, shuffle=True)

epochs = 10
steps = 0
train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    #### put net in train mode
    net.train()
    for idx, (images, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        log_ps = net(images)
        loss = loss_function(log_ps, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    else:
        test_loss = 0
        accuracy = 0

        #### put net in eval mode
        net.eval()
        with torch.no_grad():
            for images, labels in test_loader:
                log_ps = net(images)
                test_loss += loss_function(log_ps, labels)
                #### removed torch.exp() since exponential is monotone, taking it doesn't change the order of outputs. Similarly with torch.softmax()
                top_p, top_class = log_ps.topk(1, dim=1)
                #### convert to float/long using proper methods. what you have won't work for cuda tensors.
                equals = top_class.long() == labels.long().view(*top_class.shape)
                accuracy += torch.mean(equals.float())
        train_losses.append(running_loss/len(train_loader))
        test_losses.append(test_loss/len(test_loader))
        print("[Epoch: {}/{}] ".format(e+1, epochs),
              "[Training Loss: {:.3f}] ".format(running_loss/len(train_loader)),
              "[Test Loss: {:.3f}] ".format(test_loss/len(test_loader)),
              "[Test Accuracy: {:.3f}]".format(accuracy/len(test_loader)))

トレーニングの結果は今です...

[Epoch: 1/10]  [Training Loss: 0.139]  [Test Loss: 0.046]  [Test Accuracy: 0.986]
[Epoch: 2/10]  [Training Loss: 0.046]  [Test Loss: 0.042]  [Test Accuracy: 0.987]
[Epoch: 3/10]  [Training Loss: 0.031]  [Test Loss: 0.040]  [Test Accuracy: 0.988]
[Epoch: 4/10]  [Training Loss: 0.022]  [Test Loss: 0.029]  [Test Accuracy: 0.990]
[Epoch: 5/10]  [Training Loss: 0.017]  [Test Loss: 0.066]  [Test Accuracy: 0.987]
[Epoch: 6/10]  [Training Loss: 0.015]  [Test Loss: 0.056]  [Test Accuracy: 0.985]
[Epoch: 7/10]  [Training Loss: 0.018]  [Test Loss: 0.039]  [Test Accuracy: 0.991]
[Epoch: 8/10]  [Training Loss: 0.012]  [Test Loss: 0.057]  [Test Accuracy: 0.988]
[Epoch: 9/10]  [Training Loss: 0.012]  [Test Loss: 0.041]  [Test Accuracy: 0.991]
[Epoch: 10/10]  [Training Loss: 0.007]  [Test Loss: 0.048]  [Test Accuracy: 0.992]

パフォーマンスとコードを改善する他のいくつかの問題。

4.モデルをGPUに移動することはありません。つまり、GPUアクセラレーションは利用できません。

5. torchvisionすべての標準変換とデータセットを使用して設計され、PyTorchで使用するように構築されています。使用をお勧めします。これにより、コード内のケラへの依存も削除されます。

6.ネットワークのパフォーマンスを向上させるために、平均を差し引き、標準偏差で除算してデータを正規化します。トーチビジョンで使用できますtransforms.Normalize。MNISTはすでに簡単すぎるため、MNISTに大きな違いはありません。しかし、より難しい問題では、それが重要であることが判明しました。

さらに改善されたコードを以下に示します（GPUでははるかに高速です）。

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision import transforms

dim = 60

class ConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.conv3 = nn.Conv2d(64, 128, 3)

        self.fc1 = nn.Linear(5 * 5 * 128, 1024)
        self.fc2 = nn.Linear(1024, 2048)
        self.fc3 = nn.Linear(2048, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv3(x)), (2, 2))

        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.dropout(x, 0.5)
        x = self.fc3(x)
        return x


net = ConvNet()
if torch.cuda.is_available():
    net.cuda()

optimizer = optim.Adam(net.parameters(), lr=1e-3)

loss_function = nn.CrossEntropyLoss()

train_dataset = MNIST('./data', train=True, download=True,
                      transform=transforms.Compose([
                          transforms.Resize((dim, dim)),
                          transforms.ToTensor(),
                          transforms.Normalize((0.1307,), (0.3081,))
                      ]))
test_dataset = MNIST('./data', train=False, download=True,
                     transform=transforms.Compose([
                         transforms.Resize((dim, dim)),
                         transforms.ToTensor(),
                         transforms.Normalize((0.1307,), (0.3081,))
                     ]))

train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True, num_workers=8)
test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False, num_workers=8)

epochs = 10
steps = 0
train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    net.train()
    for images, labels in train_loader:
        if torch.cuda.is_available():
            images, labels = images.cuda(), labels.cuda()
        optimizer.zero_grad()
        log_ps = net(images)
        loss = loss_function(log_ps, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    else:
        test_loss = 0
        accuracy = 0

        net.eval()
        with torch.no_grad():
            for images, labels in test_loader:
                if torch.cuda.is_available():
                    images, labels = images.cuda(), labels.cuda()
                log_ps = net(images)
                test_loss += loss_function(log_ps, labels)
                top_p, top_class = log_ps.topk(1, dim=1)
                equals = top_class.flatten().long() == labels
                accuracy += torch.mean(equals.float()).item()
        train_losses.append(running_loss/len(train_loader))
        test_losses.append(test_loss/len(test_loader))
        print("[Epoch: {}/{}] ".format(e+1, epochs),
              "[Training Loss: {:.3f}] ".format(running_loss/len(train_loader)),
              "[Test Loss: {:.3f}] ".format(test_loss/len(test_loader)),
              "[Test Accuracy: {:.3f}]".format(accuracy/len(test_loader)))

トレーニング結果の更新...

[Epoch: 1/10]  [Training Loss: 0.125]  [Test Loss: 0.045]  [Test Accuracy: 0.987]
[Epoch: 2/10]  [Training Loss: 0.043]  [Test Loss: 0.031]  [Test Accuracy: 0.991]
[Epoch: 3/10]  [Training Loss: 0.030]  [Test Loss: 0.030]  [Test Accuracy: 0.991]
[Epoch: 4/10]  [Training Loss: 0.024]  [Test Loss: 0.046]  [Test Accuracy: 0.990]
[Epoch: 5/10]  [Training Loss: 0.020]  [Test Loss: 0.032]  [Test Accuracy: 0.992]
[Epoch: 6/10]  [Training Loss: 0.017]  [Test Loss: 0.046]  [Test Accuracy: 0.991]
[Epoch: 7/10]  [Training Loss: 0.015]  [Test Loss: 0.034]  [Test Accuracy: 0.992]
[Epoch: 8/10]  [Training Loss: 0.011]  [Test Loss: 0.048]  [Test Accuracy: 0.992]
[Epoch: 9/10]  [Training Loss: 0.012]  [Test Loss: 0.037]  [Test Accuracy: 0.991]
[Epoch: 10/10]  [Training Loss: 0.013]  [Test Loss: 0.038]  [Test Accuracy: 0.992]

— ジョダグ
ソース

動いた！私は実際に大きな間違いを犯しました。このMNIST簡略化された問題は10クラスあり、私の問題は2クラスしかありませんでした。だから私はあなたがしたすべてを使うことができませんでした。しかし、あなたの推奨事項をいじくり回して、私はそれを機能させることができたので、ありがとう！

— Nicolas Gervais

モデルをトレインモードでテストしていることに1つ気づきました。net.eval()ドロップアウトを無効にするために呼び出す必要があります（その後、net.train()再度トレインモードに戻すには）。

多分他の問題があります。トレーニングの損失は減少していますか？単一の例にオーバーフィットしようとしましたか？

— セルギ・ディムチェンコ
ソース