Why are superscripts used instead of subscripts in machine learning?

I am taking Andrew Ng's machine learning course through Coursera. In the equations, superscripts are used instead of subscripts. For example, in the following equation $x^{(i)}$ is used instead of $x_i$:

$J(\theta_0, \theta_1) = \frac{1}{2m} \sum\limits_{i=1}^{m}{(h_\theta(x^{(i)}) - y^{(i)})^2}$

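Apparently this is common practice. My question is: why use superscripts instead of subscripts? Superscripts are already used for exponentiation. I can tell the index use apart from exponentiation by checking whether the parentheses are present, but it still seems confusing.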
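For readers who think in code, the cost function above can be sketched as follows. This is only an illustration: the linear hypothesis `h` (with $h_\theta(x) = \theta_0 + \theta_1 x$) and the toy data are assumptions, not something stated in the question. The point is that `x[i]` plays the role of $x^{(i)}$, the $i$-th training example, and `** 2` is the only real exponent.

```python
# Training set: x[i] and y[i] play the role of x^{(i)} and y^{(i)}.
# (Python indexes from 0, the formula from 1.)

def h(theta0, theta1, x_i):
    """Assumed linear hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x_i

def cost(theta0, theta1, x, y):
    """J(theta0, theta1) = 1/(2m) * sum over i of (h(x^{(i)}) - y^{(i)})^2."""
    m = len(x)
    return sum((h(theta0, theta1, x[i]) - y[i]) ** 2 for i in range(m)) / (2 * m)

# Toy data (made up for illustration): y = 2x exactly.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

print(cost(0.0, 2.0, x, y))  # parameters that fit the toy data exactly
print(cost(0.0, 1.0, x, y))  # a worse fit gives a larger cost
```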

machine-learning notation

— エンターナード

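Probably because some computer science people are not familiar with standard mathematical notation and so make up their own. Actuaries do this at times too, and it gets frustrating once you reach more complex concepts.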

— -rocinante

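Is the index $i$ over the size of the data set, or over the elements of a vector $x$? If the former, it is completely standard. If the latter, it is completely nonstandard. And the reason superscripts are used is that you may want to use subscripts to refer to the elements of the vector.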

— Rex Kerr

@rocinante lol no, it's because subscripts are already used for indexing the vector.

— Neil G

@rocinante That's pretty rich. What about contravariant vectors / Einstein notation?

— ウィルヴーデン16

@rocinante I have to echo the others in underlining that your wording is unfortunate. We all tend to regard what is locally familiar as standard.

— Nick Cox

Answers:

If $x$ denotes a vector $x \in \mathbb R^m$, then $x_i$ is the standard notation for the $i$-th coordinate of $x$, i.e.

$x = (x_1, x_2, \ldots, x_m)\in\mathbb R^m.$

If you have a collection of $n$ such vectors, how do you denote the $i$-th vector? You cannot write $x_i$, because that already has a standard meaning. So people sometimes write $x^{(i)}$, and that is why Andrew Ng does it.

I.e.

$\begin{aligned} x^{(1)} &= (x_1^{(1)}, x_2^{(1)}, \ldots, x_m^{(1)}) \in \mathbb R^m\\ x^{(2)} &= (x_1^{(2)}, x_2^{(2)}, \ldots, x_m^{(2)}) \in \mathbb R^m\\ &\;\;\vdots \\ x^{(n)} &= (x_1^{(n)}, x_2^{(n)}, \ldots, x_m^{(n)}) \in \mathbb R^m. \end{aligned}$
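In code the same two-level indexing might look like the sketch below. The helper names `example` and `coordinate` and the numbers are made up for illustration. The parenthesized superscript picks a whole vector out of the collection, and the subscript then picks a coordinate inside it.

```python
# n = 3 vectors, each in R^m with m = 2 (values are illustrative only).
X = [
    [5.1, 3.5],  # x^{(1)}
    [4.9, 3.0],  # x^{(2)}
    [6.2, 2.9],  # x^{(3)}
]

def example(X, i):
    """Return x^{(i)}, the i-th vector (1-based, as in the formulas)."""
    return X[i - 1]

def coordinate(X, i, j):
    """Return x_j^{(i)}, the j-th coordinate of the i-th vector (both 1-based)."""
    return X[i - 1][j - 1]

print(example(X, 2))        # the whole second vector
print(coordinate(X, 2, 1))  # its first coordinate
```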

— amoeba says Reinstate Monica

I'm not disagreeing, but $x_{ij}$ is often used for repeated measures.

— Cliff AB

Yes, but $x_{ij}$ is equivalent to my $x^{(i)}_j$. What would be the equivalent of $x^{(i)}$?

— amoeba says Reinstate Monica

Yes, that is an advantage. I think $x_{i.}$ is sometimes used, but this can be confused with $\sum_{j= 1}^n x_{ij}/m$.

— Cliff AB

When iterating over matrices, $x_{mn}^{(i)}$ is the most intuitive way to write it. That way the notation stays consistent when moving from vectors to matrices.

— -josh

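@JAB Yes, it is about making the notation more explicit ("type hinting", as you say). Of course one can agree to use $x_i$ for the $i$-th vector and $x_{ij}$ for the $j$-th element of the $i$-th vector. There are different conventions, and this is just one of them. I am not saying it is the best one, only explaining the rationale behind it.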

— amoeba says Reinstate Monica

I believe the use of superscripts you describe is not very common in the machine learning literature. I'd have to review Ng's course notes to confirm, but if he uses them that way, he may well be the origin of this notation's spread. Either way, not to be too unkind, but I don't think many online course students are publishing machine learning literature, so this notation is not very common in the actual literature. After all, these are introductory courses in machine learning, not PhD-level courses.

A very common use of superscripts is to denote the iteration of an algorithm. For example, you could write an iteration of Newton's method as

$\theta^{(t+1)} = \theta^{(t)} - H(\theta^{(t)}) ^{-1} \nabla \theta^{(t)}$

where $H(\theta^{(t)})$ is the Hessian and $\nabla \theta^{(t)}$ is the gradient.

(...yes this is not quite the best way to implement Newton's method due to the inversion of the Hessian matrix...)

Here, $\theta^{(t)}$ represents the value of $\theta$ in the $t^{th}$ iteration. This is the most common (but certainly not the only) use of superscripts that I am aware of.
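As a rough sketch of the iteration-superscript convention, the snippet below runs Newton's method in one dimension, where the Hessian is just the second derivative. The objective $f(\theta) = e^{\theta} - 2\theta$ and the function names are assumptions chosen for illustration; its minimizer is $\ln 2$. The list entry `thetas[t]` corresponds to $\theta^{(t)}$.

```python
import math

def grad(theta):
    """First derivative of the illustrative objective f(theta) = exp(theta) - 2*theta."""
    return math.exp(theta) - 2.0

def hess(theta):
    """Second derivative of f; in one dimension the Hessian is this scalar."""
    return math.exp(theta)

def newton(theta0, steps):
    """Return the iterates [theta^{(0)}, ..., theta^{(steps)}]."""
    thetas = [theta0]
    for t in range(steps):
        # theta^{(t+1)} = theta^{(t)} - H(theta^{(t)})^{-1} * gradient at theta^{(t)}
        thetas.append(thetas[t] - grad(thetas[t]) / hess(thetas[t]))
    return thetas

thetas = newton(0.0, 8)
print(thetas[-1], math.log(2.0))  # the last iterate should be close to ln 2
```

Dividing by the scalar second derivative stands in for the Hessian inverse here. In higher dimensions one would normally solve a linear system rather than invert the matrix, as the answer itself notes.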

EDIT: To clarify, the original question appeared to suggest that in ML notation, $x^{(i)}$ is equivalent to the $x_i$ notation of statistics. In my answer, I state that this is not truly prevalent in ML literature. This is true. However, as pointed out by @amoeba, there is plenty of superscript notation in ML literature for data, but in these cases $x^{(i)}$ does not typically mean the $i^{th}$ observation of a single vector $x$.

— Cliff AB

The clash with the use of parenthesized/bracketed superscripts for iteration counts (a notation that is in common use across a wide range of areas) is a really important thing to raise.

— Glen_b -Reinstate Monica

It is also commonly used to indicate the index of the sample in the training set, which is like the iteration but not exactly the same because you usually end up iterating through your training set many times.

— Rex Kerr

I've also seen iteration counts noted using subscripts ($a_{n+1} = a_n + 1$) as well as inline ($a(n+1) = a(n) + 1$). Which is why, when using some specific notation, I'll usually put something at the start to disambiguate (e.g. saying "in the following series, blah blah blah" and then putting the math). Thus, whatever notation is in use, readers can (hopefully) intuit the meaning for potentially ambiguous cases rather than having to guess based on the conventions they know.

— JAB

I agree with @JAB. More generally, I don't think it's heinous for people who will be writing and using code to borrow notation from software in mathematical treatments. For example, and contentiously, computing people are way ahead of many mathematical groups in using clean notation such as $(x > 0)$, to be evaluated as 1 if true and 0 if false, instead of unnecessary formalisms such as $I(x > 0)$; here I am merely following behind Donald Knuth.

— Nick Cox
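The bracket convention in the comment above maps directly onto Python, where a comparison yields a `bool` and `bool` is a subclass of `int`. The sketch below uses made-up values and a hypothetical `indicator` helper to count positive entries both ways.

```python
xs = [-2.0, 0.0, 1.5, 3.0]

# (x > 0) already behaves as 1 when true and 0 when false,
# so summing the comparisons counts the positive entries.
count_positive = sum(x > 0 for x in xs)

# The same quantity written with an explicit indicator function I(.).
def indicator(condition):
    """Return 1 if the condition holds, 0 otherwise."""
    return 1 if condition else 0

count_with_indicator = sum(indicator(x > 0) for x in xs)

print(count_positive, count_with_indicator)
```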

@NickCox I generally only see the $I(x > 0)$ form when it comes to probability; otherwise, $x > 0$ is just an inequality constraint. When it comes to mathematical equations, they're either broken up into piecewise representations or they just represent the equation itself as an inequality, as doing otherwise would induce ambiguity. (It's similar to how $=$ in math is more subtle than either = or == in most programming languages; it introduces a constraint or definition rather than an actual assignment or equality check.)

— JAB

"Superscripts are already used for exponentiation."

In mathematics superscripts are used left and right depending on the field. The choice is always historical legacy, nothing more. Whoever first got into the field set the convention of using sub- or superscripts.

Two examples. Superscripts are used to denote derivatives: $f^{(n)}(x)$

In tensor algebra both superscripts and subscripts are used heavily for the same thing; for example, $R^i_j$ could mean $i$ rows and $j$ columns. It's quite expressive: $T_i^k=R_i^jC_j^k$
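To make the summation in $T_i^k = R_i^j C_j^k$ concrete, here is a minimal sketch in which the repeated index $j$ is summed over. It treats upper and lower indices simply as positions in nested lists, ignoring any covariant/contravariant distinction. The function name `contract` and the numbers are illustrative assumptions.

```python
def contract(R, C):
    """T[i][k] = sum over j of R[i][j] * C[j][k] (the repeated index j is summed)."""
    n_i, n_j, n_k = len(R), len(C), len(C[0])
    return [
        [sum(R[i][j] * C[j][k] for j in range(n_j)) for k in range(n_k)]
        for i in range(n_i)
    ]

R = [[1, 2], [3, 4]]
C = [[0, 1], [1, 0]]
T = contract(R, C)
print(T)  # [[2, 1], [4, 3]]
```

With two-index objects like these the contraction is ordinary matrix multiplication.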

Also I remember using scripts before letters (prescripts) in Physics, e.g. $^i_jB_k^l$ . I think it was with tensors.

Hence, the choice of superscripts by Ng is purely historical too. There's no real reason to use or not use them, or prefer them to subscripts. Actually, I believe that here ML people are using tensor notation. They definitely are well versed in the subject, e.g. see this paper.

— Aksakal

Another example for your point: Einstein notation

— Neil G