Bag of Visual Words

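What I am trying to do:

I am trying to classify some images using local and global features.

What I have done so far:

I have extracted SIFT descriptors for each image, and I use them as input to k-means to build a vocabulary from all the features of every image. From there, I build a histogram for each image: I pass the image's SIFT features to the k-means predict method to get the cluster labels, then count the number of labels in each bin. This gives an n x m matrix, where n is the number of images and m is the number of clusters (features/words) for each image.

I feed this matrix to a classifier to get the classification of my images.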

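The steps in a nutshell:

Extract SIFT feature descriptors, which gives an n x 128 matrix for each image
Stack all the feature descriptors into one big list
Fit a k-means model with k = 100 to all of these features
For every image, use its SIFT features to predict the cluster labels with the same trained k-means model
Build a histogram from the clusters using k as the number of bins, adding 1 to the bin for each label from the model. (If an image has 10 features from SIFT, it gets 10 labels; these 10 labels are in the range of k, so for each label I add 1 to the corresponding histogram bin.)
This gives an n x k matrix, where n is the number of images and k is the number of clusters.
Then feed the histograms to a classifier and ask it to predict the test data.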
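
The histogram step on its own can be sketched as follows. This is only a minimal illustration, assuming the cluster labels for one image are already available; the helper name `labels_to_histogram` and the sample labels are hypothetical.

```python
import numpy as np

def labels_to_histogram(labels, k):
    """Count how many descriptors were assigned to each of the k clusters."""
    # minlength=k keeps the histogram length fixed even if some clusters are unused
    return np.bincount(np.asarray(labels, dtype=np.intp), minlength=k)

# Hypothetical image with 10 descriptors and a vocabulary of k = 5 words
labels = [0, 3, 3, 1, 4, 3, 0, 1, 3, 4]
hist = labels_to_histogram(labels, k=5)
print(hist)  # [2 2 0 4 2]
```

Every descriptor contributes exactly one count, so each histogram sums to the number of descriptors in that image.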

Question:

Am I performing Bag of Visual Words correctly?

Here is my code:

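import cv2
import numpy as np
import pandas as pd
from sklearn.cluster import MiniBatchKMeans
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import MinMaxScaler

# imageFeatures is my own helper class (not shown here)
def extract_features(df):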
    IF = imageFeatures()
    global_features = []
    sift_features = []
    labels = []
    for i, (index, sample) in enumerate(df.iterrows()):
        image = cv2.imread(sample["location"])
        image = cv2.resize(image, shape)
        hist = IF.fd_histogram(image)
        haralick = IF.fd_haralick(image)
        hu = IF.fd_hu_moments(image)
        lbp = IF.LocalBinaryPatterns(image, 24, 8)
        kp, des = IF.SIFT(image)
        if len(kp) == 0:
            #print (i)
            #print (index)
            #print (sample)
            #return 0
            des = np.zeros(128)
        sift_features.append(des)
        global_feature = np.hstack([hist, haralick, hu, lbp])
        global_features.append(global_feature)
        labels.append(sample["class_id"])
    scaler = MinMaxScaler(feature_range=(0, 1))
    rescaled = scaler.fit_transform(global_features)
    return sift_features, rescaled, labels

def BOVW(feature_descriptors, n_clusters = 100):
    print("Bag of visual words with {} clusters".format(n_clusters))
    #take all features and put it into a giant list
    combined_features = np.vstack(feature_descriptors)
    #train kmeans on giant list
    print("Starting K-means training")
    kmeans = MiniBatchKMeans(n_clusters=n_clusters, random_state=0).fit(combined_features)
    print("Finished K-means training, moving on to prediction")
    bovw_vector = np.zeros([len(feature_descriptors), n_clusters])#number of images x number of clusters. initiate matrix of histograms
    for index, features in enumerate(feature_descriptors):#sift descriptors in each image
        try:
            for i in kmeans.predict(features):#get cluster label for each descriptor
                bovw_vector[index, i] += 1#create individual histogram vector
        except:
            pass
    return bovw_vector#this should be our histogram

if __name__ == '__main__':
    n_clusters = 100
    #set model
    model = GaussianNB()
    image_list = pd.read_csv("image_list.csv")
    image_list_subset = image_list.groupby('class_id').head(80)#image_list.loc[(image_list["class_id"] == 0) | (image_list["class_id"] == 19)]
    shape = (330,230)
    train, test = train_test_split(image_list_subset, test_size=0.1, random_state=42)

    train_sift_features, train_global_features, y_train = extract_features(train)
    train_histogram = BOVW(train_sift_features, n_clusters)
    import matplotlib.pyplot as plt
    plt.plot(train_histogram[100], 'o')
    plt.ylabel('frequency');
    plt.xlabel('features');

    test_sift_features, test_global_features, y_test = extract_features(test)
    test_histogram = BOVW(test_sift_features, n_clusters)

    '''Naive Bayes'''
    y_hat = model.fit(train_histogram, y_train).predict(test_histogram)
    print("Number of correctly labeled points out of a total {} points : {}. An accuracy of {}"
          .format(len(y_hat), sum(np.equal(y_hat,np.array(y_test))), 
                  sum(np.equal(y_hat,np.array(y_test)))/len(y_hat)))

— Kevin

If you downvoted, please explain why.

— Kevin

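I did not downvote, but it would be very helpful to know why you think there is a problem with your code. Are you asking for a general code review? Can you provide sample input to the BOVW function for testing purposes?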

— E_net4, Rustacean, '18

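@E_net4 I am trying to confirm that my concept is correct. The reason is that BOVW does not seem to improve my results much. There could be many reasons for this: the data may be bad, there may not be enough clusters, or the features may not be good. I just want to confirm that my approach is correct. I can provide a more concise example, but I would also need to provide data. Is there a way to do that? Maybe I could generate some data with numpy?

— Kevin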

What are you comparing the performance against? And what data are you using?

— Tony Knapp

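The best way to answer your question is to go to the original paper that introduced the method:

"Visual Categorization with Bags of Keypoints" (2004)

The paper is not long and is written in an accessible way. For your question, reading only the first six pages is enough.

Quoting from "Visual Categorization with Bags of Keypoints":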

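The main steps of the method are:

• Detection and description of image patches

• Assigning patch descriptors to a set of predetermined clusters (a vocabulary) with a vector quantization algorithm

• Constructing a bag of keypoints, which counts the number of patches assigned to each cluster

• Applying a multi-class classifier, treating the bag of keypoints as the feature vector, and thus determining which category to assign to the image.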
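
As a rough sketch of the vector quantization and bag construction steps, the snippet below uses random arrays as stand-ins for SIFT descriptors and a plain NumPy k-means in place of `MiniBatchKMeans`. The function names (`fit_vocabulary`, `quantize`, `bag_of_keypoints`) and all sizes are assumptions made for illustration. One thing it tries to show: the vocabulary is fitted once, on the training descriptors, and the same centroids are reused for the test images. In the question's code, `BOVW` fits a new k-means model every time it is called, so the training and test histograms end up counted against different vocabularies.

```python
import numpy as np

def quantize(descriptors, centroids):
    """Assign each descriptor to its nearest centroid (vector quantization)."""
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def fit_vocabulary(descriptors, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means on stacked descriptors; returns (k, d) centroids."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), size=k, replace=False)
    centroids = descriptors[start].copy()
    for _ in range(n_iter):
        labels = quantize(descriptors, centroids)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    return centroids

def bag_of_keypoints(per_image_descriptors, centroids):
    """One histogram row per image, counted against a fixed vocabulary."""
    k = len(centroids)
    bags = np.zeros((len(per_image_descriptors), k))
    for row, des in enumerate(per_image_descriptors):
        if des is None or len(des) == 0:
            continue  # no keypoints: leave an all-zero histogram
        bags[row] = np.bincount(quantize(des, centroids), minlength=k)
    return bags

# Synthetic stand-ins for SIFT output: a varying number of 128-d descriptors per image
rng = np.random.default_rng(0)
train_des = [rng.random((n, 128)) for n in (30, 45, 25)]
test_des = [rng.random((20, 128)), np.empty((0, 128))]

vocab = fit_vocabulary(np.vstack(train_des), k=8)  # fitted on training descriptors only
train_bags = bag_of_keypoints(train_des, vocab)
test_bags = bag_of_keypoints(test_des, vocab)      # same vocabulary reused for test images
```

The resulting `train_bags` and `test_bags` have one row per image and one column per visual word, so they can be passed to a classifier's `fit` and `predict` respectively.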

— Mark.F