How many lags to use in the Ljung-Box test of a time series?


20

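After an ARMA model is fit to a time series, it is common to check the residuals via the Ljung-Box portmanteau test (among other tests). The Ljung-Box test returns a p-value. It has a parameter, h, which is the number of lags to be tested. Some texts recommend using h = 20; others recommend using h = ln(n); most do not say what h to use.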
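To make the role of h concrete, here is a minimal sketch of the Ljung-Box statistic in plain Python, evaluated at the two rule-of-thumb lags mentioned above. The helper names (`autocorr`, `ljung_box_q`) and the simulated `resid` series are assumptions for illustration; in practice a library routine (for example `Box.test` in R) would be used.

```python
import math
import random

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag (lag >= 1)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    cj = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return cj / c0

def ljung_box_q(x, h):
    """Ljung-Box statistic Q = n(n+2) * sum_{j=1..h} r_j^2 / (n - j)."""
    n = len(x)
    return n * (n + 2) * sum(autocorr(x, j) ** 2 / (n - j) for j in range(1, h + 1))

random.seed(0)
# Stand-in for ARMA residuals: simulated white noise (hypothetical data).
resid = [random.gauss(0.0, 1.0) for _ in range(200)]

# Evaluate Q at h = round(ln n) and at h = 20.
for h in (round(math.log(len(resid))), 20):
    print(h, ljung_box_q(resid, h))
```

Q can only grow as h increases, because each extra lag adds a non-negative term; what changes is the chi-squared reference distribution it is compared against.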

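Rather than using a single value for h, suppose that I do the Ljung-Box test for all h < 50, and then pick the h which gives the minimum p-value. Is that approach reasonable? What are the advantages and disadvantages? (One obvious disadvantage is increased computation time, but that is not a problem here.) Is there literature on this?

To elaborate slightly: if the test gives p > 0.05 for all h, then obviously the time series (residuals) pass the test. My question concerns how to interpret the test if p < 0.05 for some values of h and not for other values.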
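One way to probe the "minimum p-value over all h < 50" idea is a small simulation under the null. The sketch below is an illustration, not something taken from the thread: it generates pure white noise, computes Ljung-Box p-values for h = 1..49, and compares how often a fixed h and the minimum-p rule reject at the 5% level. The function names, the fixed lag of 10, and the sample sizes are all assumptions; `chi2_sf` is a hand-rolled chi-squared tail probability so that the block needs only the standard library.

```python
import math
import random

def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-squared(df) variable, via the series
    expansion of the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = a
    while term > 1e-16 * total:
        k += 1.0
        term *= z / k
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def ljung_box_pvalues(x, hmax):
    """Ljung-Box p-values for h = 1..hmax (raw series, no df correction)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    q, pvals = 0.0, []
    for j in range(1, hmax + 1):
        r = sum((x[t] - mean) * (x[t - j] - mean) for t in range(j, n)) / c0
        q += n * (n + 2) * r * r / (n - j)   # running Ljung-Box sum
        pvals.append(chi2_sf(q, j))
    return pvals

random.seed(1)
reps, n, hmax, fixed_h = 200, 100, 49, 10    # illustrative settings
reject_fixed = reject_min = 0
for _ in range(reps):
    noise = [random.gauss(0.0, 1.0) for _ in range(n)]  # the null is true
    p = ljung_box_pvalues(noise, hmax)
    reject_fixed += p[fixed_h - 1] < 0.05
    reject_min += min(p) < 0.05
rate_fixed = reject_fixed / reps
rate_min = reject_min / reps
print(rate_fixed, rate_min)
```

By construction the minimum-p rule rejects whenever any single h in the scanned range rejects, so its rejection rate under the null can never be lower than that of a fixed h. That is the multiple-comparisons cost of choosing h after looking at the p-values.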


1
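@user2875, I have deleted my answer. The fact is that for large h the test is not reliable. So the answer really depends on for which h you get p < 0.05. Furthermore, what is the exact value of p? If we lower the threshold to 0.01, does the result of the test change? Personally, in the case of conflicting hypotheses, I look for other indicators of whether the model is good or not. How well does the model fit? How does the model compare to alternative models? Do the alternative models have the same problems? For what other violations does the test reject the null?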
mpiktas

1
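@mpiktas, the Ljung-Box test is based on a statistic whose distribution is asymptotically (as h becomes large) chi-squared. As h gets large relative to n, though, the power of the test decreases to 0. Hence the desire to choose h large enough that the distribution is close to chi-squared but small enough to have useful power. (I do not know what the risk of a false negative is when h is small.)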
user2875

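@user2875, this is the third time you have changed the question. First you asked about the strategy of picking the h with the smallest p-value, then how to interpret the test if p < 0.05 for some values of h, and now which h is optimal. All three questions have different answers, and may even have different answers depending on the context of the particular problem.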
mpiktas

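@mpiktas, the questions are all the same, just different ways of looking at it. (As pointed out, if p > 0.05 for all h, then we know how to interpret the smallest p; if we knew the optimal h, which we do not, then we would not be concerned with choosing the smallest p.)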
user2875

Answers:


9

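The answer definitely depends on: what are you actually trying to use the Q test for?

The common reason is: to be more or less confident about the joint statistical significance of the null hypothesis of no autocorrelation up to lag h (alternatively, assuming that you have something close to weak white noise), and to build a parsimonious model with as few parameters as possible.

Time series data usually have a natural seasonal pattern, so a practical rule of thumb is to set h to twice the seasonal period. Another choice is the forecasting horizon, if you use the model for forecasting. Finally, if you find some significant departures at later lags, think about corrections (could they be due to seasonal effects, or to data that were not corrected for outliers?).

Rather than using a single value for h, suppose that I do the Ljung-Box test for all h < 50, and then pick the h which gives the minimum p-value.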

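It is a joint significance test, so if the choice of h is data-driven, why care about small (occasional) departures at individual lags below h, supposing of course that h is much smaller than n (the power issue you mentioned)? To find a simple yet relevant model, I suggest the information criteria described below.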

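My question concerns how to interpret the test if p < 0.05 for some values of h and not for other values.

So it depends on how far from the present the departure occurs. Disadvantages of distant departures: more parameters to estimate, fewer degrees of freedom, and worse predictive power of the model.

Try to estimate a model that includes MA and/or AR terms at the lag where the departure occurs, and additionally look at an information criterion (either AIC or BIC, depending on the sample size). This will give you more insight into which model is more parsimonious. Out-of-sample prediction exercises are also welcome here.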


+1, this is what I was trying to express but could not :)
mpiktas

8

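Assume that we specify a simple AR(1) model, with all the usual properties,

$$y_t = \beta y_{t-1} + u_t$$

Denote the theoretical covariance of the error term as

$$\gamma_j \equiv E(u_t u_{t-j})$$

If we could observe the error term, then the sample autocorrelation of the error term would be defined as

$$\tilde\rho_j \equiv \frac{\tilde\gamma_j}{\tilde\gamma_0}$$

where

$$\tilde\gamma_j \equiv \frac{1}{n}\sum_{t=j+1}^{n} u_t u_{t-j}, \qquad j = 0, 1, 2, \ldots$$

But in practice, we do not observe the error term. So the sample autocorrelation related to the error term is estimated using the residuals from estimation, as

$$\hat\gamma_j \equiv \frac{1}{n}\sum_{t=j+1}^{n} \hat u_t \hat u_{t-j}, \qquad j = 0, 1, 2, \ldots$$

The Box-Pierce Q statistic (the Ljung-Box Q is just an asymptotically neutral scaled version of it) is

$$Q_{BP} = n\sum_{j=1}^{p}\hat\rho_j^2 = \sum_{j=1}^{p}\left[\sqrt{n}\,\hat\rho_j\right]^2 \xrightarrow{d} \chi^2_p$$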
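For reference, the Ljung-Box statistic in the same notation is shown here; this is the standard textbook form and is not written out in the answer itself:

$$Q_{LB} = n(n+2)\sum_{j=1}^{p}\frac{\hat\rho_j^2}{n-j}$$

Each term equals the corresponding Box-Pierce term multiplied by $(n+2)/(n-j)$, which tends to 1 as $n$ grows for fixed $j$. That is the sense in which the scaling is asymptotically neutral.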

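Our issue is exactly whether $Q_{BP}$ can be said to have asymptotically a chi-square distribution (under the null of no autocorrelation in the error term) in this model. For this to happen, each and every $\sqrt{n}\,\hat\rho_j$ must be asymptotically standard Normal. A way to check this is to examine whether $\sqrt{n}\,\hat\rho_j$ has the same asymptotic distribution as $\sqrt{n}\,\tilde\rho_j$ (which is constructed using the true errors, and so has the desired asymptotic behavior under the null).

We have that

$$\hat u_t = y_t - \hat\beta y_{t-1} = u_t - (\hat\beta - \beta)\,y_{t-1}$$

where $\hat\beta$ is a consistent estimator. So

$$\hat\gamma_j \equiv \frac{1}{n}\sum_{t=j+1}^{n}\left[u_t - (\hat\beta-\beta)y_{t-1}\right]\left[u_{t-j} - (\hat\beta-\beta)y_{t-j-1}\right]$$

$$= \tilde\gamma_j - \frac{1}{n}\sum_{t=j+1}^{n}(\hat\beta-\beta)\left[u_t y_{t-j-1} + u_{t-j}y_{t-1}\right] + \frac{1}{n}\sum_{t=j+1}^{n}(\hat\beta-\beta)^2 y_{t-1}y_{t-j-1}$$

The sample is assumed to be stationary and ergodic, and moments are assumed to exist up to the desired order. Since the estimator $\hat\beta$ is consistent, this is enough for the two sums to go to zero. So we conclude

$$\hat\gamma_j \xrightarrow{p} \tilde\gamma_j$$

This implies that

$$\hat\rho_j \xrightarrow{p} \tilde\rho_j \xrightarrow{p} \rho_j$$

But this does not automatically guarantee that $\sqrt{n}\,\hat\rho_j$ converges to $\sqrt{n}\,\tilde\rho_j$ (in distribution). Note that the continuous mapping theorem does not apply here, because the transformation applied to the random variables depends on $n$. For this to happen, we need

$$\sqrt{n}\,\hat\gamma_j \xrightarrow{d} \sqrt{n}\,\tilde\gamma_j$$

(the denominator $\gamma_0$, tilde or hat, converges to the variance of the error term in both cases, so it is neutral to our issue).

We have

$$\sqrt{n}\,\hat\gamma_j = \sqrt{n}\,\tilde\gamma_j - \frac{1}{n}\sum_{t=j+1}^{n}\sqrt{n}(\hat\beta-\beta)\left[u_t y_{t-j-1} + u_{t-j}y_{t-1}\right] + \frac{1}{n}\sum_{t=j+1}^{n}\sqrt{n}(\hat\beta-\beta)^2 y_{t-1}y_{t-j-1}$$

So the question is: do these two sums, now multiplied by $\sqrt{n}$, go to zero in probability, so that we are left with $\sqrt{n}\,\hat\gamma_j = \sqrt{n}\,\tilde\gamma_j$ asymptotically?

For the second sum we have

$$\frac{1}{n}\sum_{t=j+1}^{n}\sqrt{n}(\hat\beta-\beta)^2 y_{t-1}y_{t-j-1} = \frac{1}{n}\sum_{t=j+1}^{n}\left[\sqrt{n}(\hat\beta-\beta)\right]\left[(\hat\beta-\beta)\,y_{t-1}y_{t-j-1}\right]$$

Since $[\sqrt{n}(\hat\beta-\beta)]$ converges to a random variable and $\hat\beta$ is consistent, this goes to zero.

For the first sum, here too $[\sqrt{n}(\hat\beta-\beta)]$ converges to a random variable, and we have that

$$\frac{1}{n}\sum_{t=j+1}^{n}\left[u_t y_{t-j-1} + u_{t-j}y_{t-1}\right] \xrightarrow{p} E[u_t y_{t-j-1}] + E[u_{t-j}y_{t-1}]$$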

The first expected value, $E[u_t y_{t-j-1}]$, is zero by the assumptions of the standard AR(1) model. But the second expected value, $E[u_{t-j}y_{t-1}]$, is not, since the dependent variable depends on past errors.

So $\sqrt{n}\,\hat\rho_j$ won't have the same asymptotic distribution as $\sqrt{n}\,\tilde\rho_j$. But the asymptotic distribution of the latter is standard Normal, which is the one leading to a chi-squared distribution when squaring the r.v.

Therefore we conclude, that in a pure time series model, the Box-Pierce Q and the Ljung-Box Q statistic cannot be said to have an asymptotic chi-square distribution, so the test loses its asymptotic justification.
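The gap between the two asymptotic distributions can be probed numerically. The following is a minimal simulation sketch, not part of the answer: it simulates an AR(1) process, estimates the coefficient by OLS, and compares n times the sampling variance of the lag-1 autocorrelation computed from the true errors with the same quantity computed from the residuals. The values beta = 0.5, n = 200 and 500 replications, and all function names, are assumptions chosen for illustration.

```python
import random

def lag1_autocorr(e):
    """Lag-1 sample autocorrelation (non-centred, as in the derivation)."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(v * v for v in e)
    return num / den

def simulate_once(beta, n, rng):
    """Return (rho_tilde_1, rho_hat_1) for one simulated AR(1) path."""
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y_prev = rng.gauss(0.0, 1.0) / (1.0 - beta ** 2) ** 0.5  # stationary start
    y_lag, y = [], []
    for t in range(n):
        y_lag.append(y_prev)
        y_prev = beta * y_prev + u[t]
        y.append(y_prev)
    # OLS estimate of beta from regressing y_t on y_{t-1} (no intercept)
    b_hat = sum(a * b for a, b in zip(y, y_lag)) / sum(b * b for b in y_lag)
    resid = [a - b_hat * b for a, b in zip(y, y_lag)]
    return lag1_autocorr(u), lag1_autocorr(resid)

def variance(v):
    """Unbiased sample variance."""
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)

rng = random.Random(42)
beta, n, reps = 0.5, 200, 500
pairs = [simulate_once(beta, n, rng) for _ in range(reps)]
var_true = n * variance([p[0] for p in pairs])   # near 1 if sqrt(n)*rho_tilde ~ N(0,1)
var_resid = n * variance([p[1] for p in pairs])  # based on estimated residuals
print(var_true, var_resid)
```

If the residual-based autocorrelation behaved like the one built from true errors, both numbers would be close to 1. The classical large-sample approximation for AR(1) residuals puts the lag-1 variance near beta^2 / n instead, so the second number should come out well below the first.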

This happens because the right-hand side variable (here the lag of the dependent variable) by design is not strictly exogenous to the error term, and we have found that such strict exogeneity is required for the BP/LB Q-statistic to have the postulated asymptotic distribution.

Here the right-hand-side variable is only "predetermined", and the Breusch-Godfrey test is then valid. (For the full set of conditions required for an asymptotically valid test, see Hayashi 2000, p. 146-149.)


1
You wrote "But the second expected value is not, since the dependent variable depends on past errors." That's called strict exogeneity. I agree that it's a strong assumption, and you can build an AR(p) framework without it, just by using weak exogeneity. This is the reason why the Breusch-Godfrey test is better in some sense: if the null is not true, then L-B loses power. B-G is based on weak exogeneity. Both tests are not good for some common econometric applications, see e.g. this Stata presentation, p. 4/44.
Aksakal

3
@Aksakal Thanks for the reference. The point exactly is that without strict exogeneity, the Box-Pierce/Ljung-Box do not have an asymptotic chi-square distribution, this is what the mathematics above show. Weak exogeneity (which holds in the above model) is not enough for them. This is exactly what the presentation you link to says in p. 3/44.
Alecos Papadopoulos

2
@AlecosPapadopoulos, an amazing post!!! Among the few best ones I have encountered here at Cross Validated. I just wish it would not disappear in this long thread and many users would find and benefit from it in the future.
Richard Hardy

3

Before you zero-in on the "right" h (which appears to be more of an opinion than a hard rule), make sure the "lag" is correctly defined.

http://www.stat.pitt.edu/stoffer/tsa2/Rissues.htm

Quoting the section below Issue 4 in the above link:

"....The p-values shown for the Ljung-Box statistic plot are incorrect because the degrees of freedom used to calculate the p-values are lag instead of lag - (p+q). That is, the procedure being used does NOT take into account the fact that the residuals are from a fitted model. And YES, at least one R core developer knows this...."

Edit (01/23/2011): Here's an article by Burns that might help:

http://lib.stat.cmu.edu/S/Spoetry/Working/ljungbox.pdf
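The degrees-of-freedom point in the quoted passage can be illustrated with a short sketch: the same Q statistic gives a smaller p-value once the number of fitted ARMA parameters is subtracted from the lag. The function `ljung_box_pvalue` and its `fitdf` argument (named after the corresponding argument of R's `Box.test`) are illustrative, the Q value is made up, and `chi2_sf` is a hand-rolled chi-squared tail probability so the block runs on the standard library alone.

```python
import math

def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-squared(df) variable, via the series
    expansion of the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = a
    while term > 1e-16 * total:
        k += 1.0
        term *= z / k
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def ljung_box_pvalue(q_stat, lag, fitdf=0):
    """p-value of a Ljung-Box statistic with df = lag - fitdf,
    where fitdf = p + q for residuals of a fitted ARMA(p, q)."""
    df = lag - fitdf
    if df <= 0:
        raise ValueError("lag must exceed the number of fitted ARMA parameters")
    return chi2_sf(q_stat, df)

q_stat, lag = 18.307, 10                               # illustrative, not real data
p_naive = ljung_box_pvalue(q_stat, lag)                # df = 10
p_adjusted = ljung_box_pvalue(q_stat, lag, fitdf=2)    # df = 8, e.g. ARMA(1,1)
print(p_naive, p_adjusted)
```

With these illustrative numbers the uncorrected test sits right at the 5% boundary while the corrected one rejects, which shows how an uncorrected p-value can make residuals look cleaner than they are.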


@bill_080, the OP does not mention R, and the help page for Box.test in R mentions the correction and has an argument to allow for it, although you need to supply it manually.
mpiktas

@mpiktas, Oops, you're right. I assumed this was an R question. As for the second part of your comment, there are several R packages that use Ljung-Box stats. So, it's a good idea to make sure the user understands what the package's "lag" means.
bill_080

Thanks--I am using R, but the question is a general one. Just to be safe, I was doing the test with the LjungBox function in the portes package, as well as Box.test.
user2875

2

The thread "Testing for autocorrelation: Ljung-Box versus Breusch-Godfrey" shows that the Ljung-Box test is essentially inapplicable in the case of an autoregressive model. It also shows that Breusch-Godfrey test should be used instead. That limits the relevance of your question and the answers (although the answers may include some generally good points).


The trouble with the LB test arises when autoregressive models have other regressors, i.e. ARMAX, not ARMA, models. The OP explicitly states ARMA, not ARMAX, in the question. Hence, I think that your answer is incorrect.
Aksakal

@Aksakal, I clearly see from Alecos Papadopoulos answer (and comments under it) in the above-mentioned thread that Ljung-Box test is inapplicable in both cases, i.e. pure AR/ARMA and ARX/ARMAX. Therefore, I cannot agree with you.
Richard Hardy

Alecos Papadopoulos's answer is good, but incomplete. It points out to Ljung-Box test's assumption of strict exogeneity but it fails to mention that if you're fine with the assumption, then L-B test is Ok to use. B-G test, which he and I favor over L-B, relies on weak exogeneity. It's better to use tests with weaker assumptions in general, of course. However, even B-G test's assumptions are too strong in many cases.
Aksakal

@Aksakal, the setting of this question is quite definite: it considers residuals from an ARMA model. The important thing here is that L-B does not work (as shown explicitly in Alecos' post in this thread as well as in the above-cited thread) while the B-G test does work. Of course, things can happen in other settings ("even B-G test's assumptions are too strong in many cases"), but that is not the concern in this thread. Also, I did not get what the assumption is in your statement "if you're fine with the assumption, then L-B test is Ok to use". Is that supposed to invalidate Alecos' point?
Richard Hardy

1

Escanciano and Lobato constructed a portmanteau test with automatic, data-driven lag selection based on the Box-Pierce test and its refinements (which include the Ljung-Box test).

The gist of their approach is to combine the AIC and BIC criteria, common in the identification and estimation of ARMA models, to select the optimal number of lags to be used. In the introduction they suggest that, intuitively, "test conducted using the BIC criterion are able to properly control for type I error and are more powerful when serial correlation is present in the first order". Tests based on AIC, by contrast, are more powerful against high-order serial correlation. Their procedure thus chooses a BIC-type lag selection when autocorrelations seem to be small and present only at low order, and an AIC-type lag selection otherwise.

The test is implemented in the R package vrtest (see function Auto.Q).


1

The two most common settings are $\min(20, T-1)$ and $\ln T$, where $T$ is the length of the series, as you correctly noted.
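As a quick illustration of how differently the two settings scale with the series length, here is a small sketch. The function names are made up, and rounding ln T to the nearest integer (with a floor of 1) is an assumption, since neither source says how to turn ln T into a whole number of lags.

```python
import math

def lag_box_jenkins(T):
    """The min(20, T - 1) setting."""
    return min(20, T - 1)

def lag_log(T):
    """The ln(T) setting, rounded to the nearest integer (assumption)."""
    return max(1, round(math.log(T)))

for T in (30, 100, 1000):
    print(T, lag_box_jenkins(T), lag_log(T))
```

For any series longer than about 20 observations the first setting is a constant 20, while the second grows only logarithmically, so the two rules can differ by a factor of three or more on typical sample sizes.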

The first one is supposed to be from the authoritative book by Box, Jenkins, and Reinsel, Time Series Analysis: Forecasting and Control, 3rd ed., Englewood Cliffs, NJ: Prentice Hall, 1994. However, here's all they say about the lags on p. 314: [image: excerpt from p. 314]

It's not a strong argument or suggestion by any means, yet people keep repeating it from one place to another.

The second setting for a lag is from Tsay, R. S. Analysis of Financial Time Series. 2nd Ed. Hoboken, NJ: John Wiley & Sons, Inc., 2005, here's what he wrote on p.33:

Several values of m are often used. Simulation studies suggest that the choice of m ≈ ln(T ) provides better power performance.

This is a somewhat stronger argument, but there's no description of what kind of study was done. So, I wouldn't take it at a face value. He also warns about seasonality:

This general rule needs modification in analysis of seasonal time series for which autocorrelations with lags at multiples of the seasonality are more important.

Summarizing, if you just need to plug some lag into the test and move on, then you can use either of these settings, and that's fine, because that's what most practitioners do. We're either lazy or, more likely, don't have time for this stuff. Otherwise, you'd have to conduct your own research on the power and properties of the statistics for the series that you deal with.

UPDATE.

Here's my answer to Richard Hardy's comment and his answer, which refers to another thread on CV started by him. You can see that the exposition in the accepted (by Richard Hardy himself) answer in that thread is clearly based on an ARMAX model, i.e. a model with exogenous regressors $x_t$:

$$y_t = x_t\beta + \phi(L)y_t + u_t$$

However, the OP did not indicate that he's doing ARMAX; to the contrary, he explicitly mentions ARMA:

After an ARMA model is fit to a time series, it is common to check the residuals via the Ljung-Box portmanteau test

One of the first papers that pointed to a potential issue with the LB test was Dezhbaksh, Hashem (1990), "The Inappropriate Use of Serial Correlation Tests in Dynamic Linear Models," Review of Economics and Statistics, 72, 126-132. Here's the excerpt from the paper:

[image: excerpt from the paper]

As you can see, he doesn't object to using LB test for pure time series models such as ARMA. See also the discussion in the manual to a standard econometrics tool EViews:

If the series represents the residuals from ARIMA estimation, the appropriate degrees of freedom should be adjusted to represent the number of autocorrelations less the number of AR and MA terms previously estimated. Note also that some care should be taken in interpreting the results of a Ljung-Box test applied to the residuals from an ARMAX specification (see Dezhbaksh, 1990, for simulation evidence on the finite sample performance of the test in this setting)

Yes, you have to be careful with ARMAX models and LB test, but you can't make a blanket statement that LB test is always wrong for all autoregressive series.

UPDATE 2

Alecos Papadopoulos's answer shows why the Ljung-Box test requires the strict exogeneity assumption. He doesn't show it in his post, but the Breusch-Godfrey test (an alternative test) requires only weak exogeneity, which is better, of course. This is what Greene, Econometrics, 7th ed., says on the differences between the tests, p. 923:

The essential difference between the Godfrey–Breusch and the Box–Pierce tests is the use of partial correlations (controlling for X and the other variables) in the former and simple correlations in the latter. Under the null hypothesis, there is no autocorrelation in $\varepsilon_t$, and no correlation between $x_t$ and $\varepsilon_s$ in any event, so the two tests are asymptotically equivalent. On the other hand, because it does not condition on $x_t$, the Box–Pierce test is less powerful than the LM test when the null hypothesis is false, as intuition might suggest.


I suppose that you decided to answer the question as it was bumped to the top of the active threads by my recent answer. Curiously, I argue that the test is inappropriate in the setting under consideration, making the whole thread problematic and the answers in it especially so. Do you think it is good practice to post yet another answer that ignores this problem without even mentioning it (just like all the previous answers do)? Or do you think my answer does not make sense (which would justify posting an answer like yours)?
Richard Hardy

Thank you for an update! I am not an expert, but the argumentation by Alecos Papadopoulos in "Testing for autocorrelation: Ljung-Box versus Breusch-Godfrey" and in the comments under his answer suggests that Ljung-Box is indeed inapplicable on residuals from pure ARMA (as well as ARMAX) models. If the wording is confusing, check the maths there, it seems fine. I think this is a very interesting and important question, so I would really like to find agreement between all of us here.
Richard Hardy

0

... h should be as small as possible to preserve whatever power the LB test may have under the circumstances. As h increases the power drops. The LB test is a dreadfully weak test; you must have a lot of samples, and n must be roughly 100 or more to be meaningful. Unfortunately I have never seen a better test, but perhaps one exists. Does anyone know of one?

Paul3nt


0

There's no correct answer to this that works in all situations; for the reasons others have given, it will depend on your data.

That said, after trying to figure out how to reproduce a Stata result in R, I can tell you that by default the Stata implementation uses $\min(n/2 - 2, 40)$: half the number of data points minus 2, or 40, whichever is smaller.
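A one-line sketch of that default, for use when reproducing Stata output elsewhere. The function name is made up, and flooring n/2 to an integer is an assumption of this sketch; the answer only states the formula.

```python
def stata_default_lags(n):
    """Default lag count min(n/2 - 2, 40); flooring n/2 is an assumption."""
    return min(n // 2 - 2, 40)

for n in (30, 50, 84, 200):
    print(n, stata_default_lags(n))
```

Under this rule the cap of 40 starts to bind at n = 84, so for longer series the default no longer depends on the sample size.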

All defaults are wrong, of course, and this one will definitely be wrong in some situations. In many situations, though, it might not be a bad place to start.


0

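I'd suggest the R package hwwntest. It implements wavelet-based white noise tests that do not require any tuning parameters and have good statistical size and power.

Additionally, I have recently found "Thoughts on the Ljung-Box test", which is an excellent discussion of the topic by Rob Hyndman.

Update: considering the separate discussion in this thread regarding ARMAX, another incentive to look at hwwntest is the availability of a theoretical power function for one of the tests against an alternative hypothesis of an ARMA(p, q) model.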

Licensed under cc by-sa 3.0 with attribution required.