Understanding logistic regression and likelihood



How does the parameter estimation / training of a logistic regression actually work? I'll try to put down what I've got so far.

  1. The output is y, given as a probability by the logistic function of the value of x:
    $P(y=1|x) = \frac{1}{1+e^{-\omega^T x}} = \sigma(\omega^T x)$
    $P(y=0|x) = 1 - P(y=1|x) = 1 - \frac{1}{1+e^{-\omega^T x}}$
  2. For one dimension, the so-called odds are defined as follows:
    $\frac{p(y=1|x)}{1-p(y=1|x)} = \frac{p(y=1|x)}{p(y=0|x)} = e^{\omega_0 + \omega_1 x}$
  3. Then we apply the $\log$ function, which gives us $\omega_0$ and $\omega_1$ in linear form:
    $\operatorname{Logit}(y) = \log\!\left(\frac{p(y=1|x)}{1-p(y=1|x)}\right) = \omega_0 + \omega_1 x$
  4. Now to the tricky part: the use of the likelihood (Big X is y); see the sketch after this list:
    $L(X|P) = \prod_{i=1,\, y_i=1}^{N} P(x_i) \cdot \prod_{i=1,\, y_i=0}^{N} \left(1 - P(x_i)\right)$
    Can anyone tell me why we consider the probability of y=1 twice? Since:
    $P(y=0|x) = 1 - P(y=1|x)$
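
To make steps 1 and 4 concrete, here is a minimal sketch in Python (the toy data and the names `sigma` and `likelihood` are my own, not part of the question):

```python
import numpy as np

def sigma(z):
    """Logistic function: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def likelihood(w0, w1, x, y):
    """L = (product of P(x_i) over y_i = 1) * (product of 1 - P(x_i) over y_i = 0)."""
    p = sigma(w0 + w1 * x)                  # P(y=1 | x_i) for every observation
    return np.prod(p[y == 1]) * np.prod(1.0 - p[y == 0])

# Made-up one-dimensional data: inputs x_i and binary outcomes y_i.
x = np.array([-2.0, -1.0, 0.5, 1.5, 3.0])
y = np.array([0, 0, 1, 0, 1])

print(likelihood(0.0, 1.0, x, y))           # likelihood of one candidate (w0, w1)
```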

And how do I then get the values of ω from that?

Answers:



In general, suppose that you decided to take a model of the form

$P(y=1|X=x) = h(x;\Theta)$

for some parameter $\Theta$. Then you simply write down the likelihood for it, i.e.

$L(\Theta) = \prod_{i \in \{1,\dots,N\},\, y_i=1} P(y=1|X=x_i;\Theta) \cdot \prod_{i \in \{1,\dots,N\},\, y_i=0} P(y=0|X=x_i;\Theta)$

which is the same as

$L(\Theta) = \prod_{i \in \{1,\dots,N\},\, y_i=1} P(y=1|X=x_i;\Theta) \cdot \prod_{i \in \{1,\dots,N\},\, y_i=0} \left(1 - P(y=1|X=x_i;\Theta)\right)$

Now you have decided to 'assume' (model)

$P(y=1|X=x) = \sigma(\Theta_0 + \Theta_1 x)$

where

$\sigma(z) = 1/(1 + e^{-z})$

so you just compute the formula for the likelihood and do some kind of optimization algorithm to find $\operatorname{argmax}_\Theta L(\Theta)$, for example Newton's method or any other gradient-based method.
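
For instance, a plain gradient-ascent sketch on $\log L(\Theta)$ (the step size, iteration count, and toy data are my own choices, not from the answer):

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(x, y, lr=0.1, n_steps=5000):
    """Find argmax_Theta L(Theta) by gradient ascent on the log-likelihood.

    With X = [1, x] rows, the gradient of log L is X^T (y - sigma(X Theta)).
    """
    X = np.column_stack([np.ones_like(x), x])    # intercept column plus x
    theta = np.zeros(2)                          # start at Theta = (0, 0)
    for _ in range(n_steps):
        grad = X.T @ (y - sigma(X @ theta))      # direction of steepest ascent
        theta += lr * grad                       # step uphill
    return theta

x = np.array([-2.0, -1.0, 0.5, 1.5, 3.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0])
print(fit_logistic(x, y))                        # approximate (Theta_0, Theta_1)
```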

Notice that sometimes people say that when they are doing logistic regression they do not maximize a likelihood (as we/you did above), but rather minimize a loss function

$l(\Theta) = -\sum_{i=1}^{N} \left[ y_i \log(P(Y_i=1|X=x_i;\Theta)) + (1-y_i)\log(P(Y_i=0|X=x_i;\Theta)) \right]$

but notice that $-\log(L(\Theta)) = l(\Theta)$.
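
One can verify this identity numerically; a small sketch (arbitrary data and parameter values of my own):

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([-2.0, -1.0, 0.5, 1.5, 3.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0])
theta0, theta1 = 0.3, 0.8                 # arbitrary parameter values

p = sigma(theta0 + theta1 * x)            # P(Y_i = 1 | X = x_i; Theta)

# Likelihood: product over the y_i = 1 cases times product over the y_i = 0 cases.
L = np.prod(p[y == 1]) * np.prod(1.0 - p[y == 0])

# Loss: negative sum of per-observation log-probabilities (cross-entropy).
l = -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

print(-np.log(L), l)                      # identical: -log L(Theta) = l(Theta)
```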

This is a general pattern in machine learning: the practical side (minimizing loss functions that measure how 'wrong' a heuristic model is) is in fact equal to the 'theoretical side' (modelling explicitly with the P-symbol and maximizing statistical quantities like likelihoods). In fact, many models that do not look probabilistic at all (SVMs, for example) can be re-understood in a probabilistic context and turn out to be maximizations of likelihoods.


@Werner thanks for your answer. But I still need a bit of clarification. 1st, can you please explain what on earth the 2 ∏ stand for in the definition of $L(\Theta)$, since as far as I understood it I'm interested in the case of $y_i=1$? And how can I get the values of $\omega_1$ and $\omega_0$? Thanks a lot for your help!
Engine

@Engine: The big 'pi' is a product... like a big sigma Σ is a sum... do you understand, or do you need more clarification on that as well? On the second question: let's say we want to minimize a function $f(x)=x^2$ and we start at $x=3$, but let us assume that we do not know / cannot express / cannot visualize $f$ because it is too complicated. Now the derivative of $f$ is $f'(x)=2x$. Interestingly, if we are to the right of the minimum $x=0$ it points to the right, and if we are to the left of it, it points to the left. Mathematically, the derivative points in the direction of the 'steepest ascent'.
Fabian Werner

@Engine: In more dimensions you replace the derivative by the gradient, i.e. you start off at a random point $x_0$ and compute the gradient $\nabla f$ at $x_0$; if you want to maximize, then your next point $x_1$ is $x_1 = x_0 + \nabla f(x_0)$. Then you compute $\nabla f(x_1)$, and your next $x$ is $x_2 = x_1 + \nabla f(x_1)$, and so forth. This is called gradient ascent/descent and is the most common technique for maximizing a function. You then do that with $L(\Theta)$, or in your notation $L(\omega)$, in order to find the $\omega$ that maximizes $L$; a toy version of the iteration is sketched below.
Fabian Werner
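
A toy version of the iteration described above, minimizing $f(x)=x^2$ from $x_0=3$ (the step size 0.1 is an arbitrary choice of mine; for descent the update subtracts the derivative):

```python
def f_prime(x):
    return 2.0 * x               # derivative of f(x) = x**2

x = 3.0                          # start at x0 = 3, as in the comment
for _ in range(100):
    x = x - 0.1 * f_prime(x)     # descent: step against the derivative
print(x)                         # close to the minimizer x = 0
```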

@Engine: You are not at all interested in the case $y=1$! You are interested in 'the' ω that 'best explains your data'. From that ω you let the model 'speak for itself' and get back to the case of $y=1$, but first of all you need to set up a model! Here, 'best explains' means 'having the highest likelihood', because that is what people came up with (and I think it is very natural)... however, there are other metrics (different loss functions and so on) that one could use! There are two products because we want the model to explain the $y=1$ cases as well as the $y=0$ cases 'well'!
Fabian Werner


Your likelihood function (4) consists of two parts: the product of the probability of success for only those people in your sample who experienced a success, and the product of the probability of failure for only those people in your sample who experienced a failure. Given that each individual experiences either a success or a failure, but never both, the probability appears for each individual only once. That is what the $y_i=1$ and $y_i=0$ at the bottom of the product signs mean.

The coefficients are included in the likelihood function by substituting (1) into (4). That way the likelihood function becomes a function of ω. The point of maximum likelihood is to find the ω that will maximize the likelihood.
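
As an illustration of that substitution, a sketch that plugs the logistic form (1) into the likelihood (4) and hands the negative log-likelihood to a generic optimizer (scipy's BFGS; the data are made up):

```python
import numpy as np
from scipy.optimize import minimize

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([-2.0, -1.0, 0.5, 1.5, 3.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0])

def neg_log_likelihood(w):
    """Substitute P(y=1|x) = sigma(w0 + w1 x) into the likelihood;
    minimizing -log L(w) is the same as maximizing L(w)."""
    p = sigma(w[0] + w[1] * x)
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

result = minimize(neg_log_likelihood, x0=np.zeros(2), method="BFGS")
print(result.x)                  # the (w0, w1) that maximizes the likelihood
```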


Thanks so much for your answer; sorry, but I still don't get it. Doesn't $y_i=0$ mean the probability that $y=0$ [doesn't occur] for all y's of the product, and vice versa for $y_i=1$? And still, after the substituting, how can I find the ω values: by calculating the 2nd derivative, or the gradient? Thanks a lot for your help!
Engine

$\prod_{i=1,\, y_i=1}^{N}$ should be read as "product over persons $i=1$ to $N$, but only those with $y_i=1$". So the first part only applies to those persons in your data that experienced the event. Similarly, the second part only refers to persons who did not experience the event.
Maarten Buis

There are many possible algorithms for maximizing the likelihood function. The most common one, the Newton-Raphson method, indeed involves computing the first and second derivatives.
Maarten Buis
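
A minimal Newton-Raphson sketch for this likelihood (my own illustration: the gradient of $\log L$ is $X^T(y-p)$ and the Hessian is $-X^T W X$ with $W = \operatorname{diag}(p_i(1-p_i))$; data are made up):

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(x, y, n_steps=25):
    """Newton-Raphson on the log-likelihood of logistic regression.

    Each step solves the Hessian system, so it converges in far fewer
    iterations than a fixed-step gradient method.
    """
    X = np.column_stack([np.ones_like(x), x])    # [1, x] rows
    w = np.zeros(2)
    for _ in range(n_steps):
        p = sigma(X @ w)
        grad = X.T @ (y - p)                     # first derivative of log L
        H = -(X.T * (p * (1.0 - p))) @ X         # second derivative: -X^T W X
        w = w - np.linalg.solve(H, grad)         # Newton update
    return w

x = np.array([-2.0, -1.0, 0.5, 1.5, 3.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0])
print(newton_logistic(x, y))                     # maximum-likelihood (w0, w1)
```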
Licensed under cc by-sa 3.0 with attribution required.