How to convert a (Euclidean) distance into a similarity score

13

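I am using $k$-means clustering to cluster speaker voices. When I compare an utterance against the clustered speaker data, I get an average distortion (based on Euclidean distance). This distance lies in the range $[0,\infty)$. I want to convert it into a similarity score in $[0,1]$. How can I achieve this?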

— Muhammad

15

If $d(p_1,p_2)$ denotes the Euclidean distance from point $p_1$ to point $p_2$, then

$\frac{1}{1 + d(p_1, p_2)}$

is commonly used.
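As a rough illustration of the $\frac{1}{1 + d(p_1, p_2)}$ mapping, here is a minimal Python sketch. The function names `euclidean` and `inverse_distance_similarity` are hypothetical, chosen only for this example, and points are assumed to be plain lists of floats.

```python
import math

def euclidean(p1, p2):
    """Euclidean distance between two equal-length sequences of numbers."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

def inverse_distance_similarity(p1, p2):
    """Map a distance in [0, inf) to a similarity in (0, 1] via 1 / (1 + d)."""
    return 1.0 / (1.0 + euclidean(p1, p2))

# Identical points: d = 0, so the similarity is exactly 1.
sim_same = inverse_distance_similarity([0.0, 0.0], [0.0, 0.0])

# Points at distance 5 (a 3-4-5 triangle): similarity is 1 / 6.
sim_far = inverse_distance_similarity([0.0, 0.0], [3.0, 4.0])
```

The score equals 1 only at zero distance and decays toward 0 as the distance grows, without ever reaching it.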

— TrynnaDoStat

Please correct me if I am wrong: if we have $X = (x_1,x_2,x_3,...,x_t)$ and $Y = (y_1,y_2,y_3,...,y_n)$, where each $x$ and $y$ has dimension $D$, then we can define the similarity as

$Similarity = \frac{1}{t} \sum\limits_{i=1}^t \frac{1}{ 1+ minDistance(x_i, Y)}$
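The averaged score above might be sketched as follows. This assumes that $minDistance(x_i, Y)$ means the Euclidean distance from $x_i$ to its nearest vector in $Y$, which the comment does not spell out; the names `min_distance` and `set_similarity` are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def min_distance(x, Y):
    """Distance from vector x to its nearest vector in Y (assumed meaning)."""
    return min(euclidean(x, y) for y in Y)

def set_similarity(X, Y):
    """Average of 1 / (1 + minDistance(x_i, Y)) over all x_i in X."""
    return sum(1.0 / (1.0 + min_distance(x, Y)) for x in X) / len(X)

X = [[0.0, 0.0], [1.0, 0.0]]   # e.g. feature vectors of an utterance
Y = [[0.0, 0.0], [4.0, 0.0]]   # e.g. cluster centres of one speaker

# First vector matches a centre exactly (term 1.0); the second is at
# distance 1 from its nearest centre (term 0.5); the average is 0.75.
score = set_similarity(X, Y)
```

Note that this score is not symmetric: swapping the roles of `X` and `Y` generally gives a different value.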

— Muhammad

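I understand that the plus 1 in the denominator is there to avoid a division-by-zero error. However, I have found that it disproportionately affects $d(p_1, p_2)$ values greater than 1 and ends up reducing the similarity score significantly. Is there another way to do this? Maybe $s = 1 - d(p_1, p_2)$?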

— aamir23

9

You can also use $\frac{1}{e^{dist}}$, where dist is your distance function of choice.
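A minimal Python sketch of the $\frac{1}{e^{dist}}$ mapping, assuming the distance has already been computed as a nonnegative float; the name `exp_similarity` is hypothetical.

```python
import math

def exp_similarity(dist):
    """Map a nonnegative distance to a similarity in (0, 1] via exp(-dist)."""
    if dist < 0:
        raise ValueError("distance must be nonnegative")
    return math.exp(-dist)

# Similarity is 1 at distance 0 and decays monotonically as distance grows.
values = [exp_similarity(d) for d in (0.0, 1.0, 5.0)]
```

Compared with $\frac{1}{1 + d}$, this mapping falls off much faster for large distances, so which of the two is preferable depends on the scale of the distances involved.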

— Unhandled exception

Could you please share a reference book or document related to this equation, where you found it? @Dougal

— Justlife

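@AnimeshKumarPaul I didn't write this answer; I only improved its formatting. But it is frequently used, for example as a version of the "generalized RBF kernel"; see here. That question is about whether the output is a positive-definite kernel. If you don't care about that, though, it at least satisfies the intuitive notion that more distant points are less similar.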

— Dougal

@Justlife: Google "Encyclopedia of Distances" and pick the result with the PDF document.

— Unhandled exception

6

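It sounds like you want something akin to cosine similarity, which is itself a bounded similarity score (it lies in $[-1,1]$ in general, and in the unit interval when all vector components are nonnegative). In fact, a direct relationship exists between Euclidean distance and cosine similarity!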

Observe that

$||x-x^\prime||^2=(x-x^\prime)^T(x-x^\prime)=||x||^2+||x^\prime||^2-2x^Tx^\prime.$

The cosine similarity is

$f(x,x^\prime)=\frac{x^T x^\prime}{||x||||x^\prime||}=\cos(\theta)$

where $\theta$ is the angle between $x$ and $x^\prime$.

When $||x||=||x^\prime||=1,$ we have

$||x-x^\prime||^2=2(1-f(x,x^\prime))$

and $f(x,x^\prime)=x^T x^\prime,$ so

$1-\frac{||x-x^\prime||^2}{2}=f(x,x^\prime)=\cos(\theta)$

in this special case.

From a computational perspective, it may be more efficient to just compute the cosine, rather than Euclidean distance and then perform the transformation.
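The unit-norm special case can be checked numerically with a short Python sketch. The helper names (`normalize`, `cosine_similarity`, `squared_distance`) are hypothetical, and the two vectors are arbitrary examples that are rescaled to unit length before the comparison.

```python
import math

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(c * c for c in v))

def normalize(v):
    """Rescale a nonzero vector to unit Euclidean norm."""
    n = norm(v)
    return [c / n for c in v]

def cosine_similarity(x, y):
    """x^T y / (||x|| ||y||), i.e. the cosine of the angle between x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (norm(x) * norm(y))

def squared_distance(x, y):
    """Squared Euclidean distance ||x - y||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

x = normalize([3.0, 4.0])   # becomes [0.6, 0.8]
y = normalize([1.0, 0.0])   # already unit length

# For unit vectors, 1 - ||x - y||^2 / 2 should equal the cosine similarity.
from_distance = 1.0 - squared_distance(x, y) / 2.0
direct = cosine_similarity(x, y)
```

Both quantities come out as 0.6 here (up to floating-point rounding). The identity relies on the normalization step; for vectors of arbitrary length the two values generally differ.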

— Sycorax says Reinstate Monica

I'm confused by your notation here. Is $\lVert x, x' \rVert^2$ supposed to be $\lVert x - x' \rVert^2$ (in which case I think the relation is incorrect, as it doesn't account for $\lVert x \rVert$ or $\lVert x' \rVert$), or something based on $\langle x, x' \rangle$? The cosine similarity I'm familiar with is simply $x^T x' / (\lVert x \rVert \lVert x' \rVert)$, though Wikipedia says the "angular similarity" $1 - \frac2\pi \arccos\left(\frac{x^T x'}{\lVert x \rVert \lVert x' \rVert}\right)$ is also sometimes called that.

— Dougal

@Dougal Blah. Correct. I've revised to make it intelligible.

— Sycorax says Reinstate Monica

Cool. Note though that since the OP said distances are unbounded, it seems like we don't have $\lVert x \rVert = 1$. Also, your expansion of $\lVert x - x' \rVert^2$ is mistaken; it should be $\lVert x \rVert^2 + \lVert x' \rVert^2 - 2 x^T x'$, though the rest of your post handles it correctly. :)

— Dougal

3

How about a Gaussian kernel?

$K(x, x') = \exp\left( -\frac{\| x - x' \|^2}{2\sigma^2} \right)$

The distance $\|x - x'\|$ appears in the exponent. The kernel value lies in the range $(0, 1]$, reaching 1 only when $x = x'$. There is one tuning parameter, $\sigma$. If $\sigma$ is large, $K(x, x')$ will be close to 1 for any $x, x'$. If $\sigma$ is small, even a slight distance from $x$ to $x'$ will lead to $K(x,x')$ being close to 0.
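To illustrate how $\sigma$ controls the score, here is a minimal Python sketch of the Gaussian kernel. The function name `gaussian_kernel` and the particular $\sigma$ values are hypothetical choices for this example only.

```python
import math

def gaussian_kernel(x, y, sigma):
    """K(x, y) = exp(-||x - y||^2 / (2 * sigma^2)), with sigma > 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

x, y = [0.0, 0.0], [3.0, 4.0]   # Euclidean distance 5

# Large sigma relative to the distance: the kernel stays close to 1.
k_wide = gaussian_kernel(x, y, sigma=50.0)

# Small sigma relative to the distance: the kernel is essentially 0.
k_narrow = gaussian_kernel(x, y, sigma=0.5)
```

In practice $\sigma$ would be chosen relative to the typical distances in the data, for example some fraction of the average distortion observed for matching speakers; that choice is an assumption here, not something the answer prescribes.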

— wij

1

Note that this answer and @Unhandled exception's are very related: this is $\exp\left( - \gamma d(x, x')^2 \right)$, where that one [introducing a scaling factor] is $\exp\left( - \gamma d(x, x') \right)$, a Gaussian kernel with $\sqrt{d}$ as the metric. This will still be a valid kernel, though the OP doesn't necessarily care about that.

— Dougal

0

If you are using a distance metric that is naturally between 0 and 1, such as the Hellinger distance, then you can use 1 - distance to obtain a similarity.
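As a sketch of the 1 - distance idea, the following Python example uses the Hellinger distance between two discrete probability distributions, assuming each is given as a list of nonnegative probabilities that sum to 1. The function names are hypothetical.

```python
import math

def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions; lies in [0, 1]."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2.0)

def hellinger_similarity(p, q):
    """Similarity in [0, 1] obtained as 1 - Hellinger distance."""
    return 1.0 - hellinger_distance(p, q)

# Identical distributions: distance 0, similarity 1.
same = hellinger_similarity([0.5, 0.5], [0.5, 0.5])

# Distributions with no overlapping mass: distance 1, similarity 0.
disjoint = hellinger_similarity([1.0, 0.0], [0.0, 1.0])
```

This only applies when the inputs really are probability distributions; it does not bound an arbitrary Euclidean distance such as the average distortion in the question.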

— Brad