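Estimating a gravity model in R and Stata: why are the coefficients the same but the standard errors different?

I estimated a gravity model in both R and Stata.

I used the standard tools for this: glm in R (with family = quasipoisson) and ppml in Stata.

The call in R is: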

summary(glmm<-glm(formula=exports ~ ln_GDPimporter + ln_GDPexporter + 
    ln_GDPimppc + ln_GDPexppc + ln_Distance + ln_Tariff + ln_ExchangeRate +
    Contig + Comlang + Colony_CIS + EAEU_CIS + EU_European_Union, 
    family=quasipoisson(link="log"),data=data_pua))

The results in R are as follows.

For the same data, I ran the estimation in Stata using the ppml command:

ppml exports ln_gdpimporter ln_gdpexporter ln_gdpimppc ln_gdpexppc ln_distance ln_tariff ln_exchangerate contig comlang colony_cis eaeu_cis eu_european_union

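The results in Stata are as follows.

As you can see, the model coefficients (the second column of the results tables) agree to at least the fourth decimal place.

However, the other results (from the third column of the results tables onward) are not the same.

Could you explain the differences in the results?
In particular, why are the coefficients the same (the first results column of the tables) while the standard errors differ?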

r generalized-linear-model stata

— Sergey S.

Answers:

The Stata coefficient table is labeled "Robust Std. Err.", whereas glm is probably not using robust errors. That would explain the difference in the SEs.
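To make the point concrete, here is a minimal sketch in base R of what summary() reports for a quasipoisson fit. The data are simulated and the names (x, y, fit_p, fit_q) are illustrative assumptions, not the asker's data. It shows that the poisson and quasipoisson families give the same coefficients, and that the quasipoisson standard errors are the model-based Poisson ones rescaled by the estimated dispersion, which is not the same thing as a robust (sandwich) standard error.

```r
# Illustrative simulated data (assumption: overdispersed counts)
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rnbinom(n, mu = exp(0.5 + 0.3 * x), size = 1.5)

fit_p <- glm(y ~ x, family = poisson(link = "log"))
fit_q <- glm(y ~ x, family = quasipoisson(link = "log"))

# Same estimating equations, hence the same coefficients
print(all.equal(coef(fit_p), coef(fit_q)))

# Estimated dispersion: Pearson chi-squared divided by residual df
phi <- summary(fit_q)$dispersion
phi_by_hand <- sum(residuals(fit_q, type = "pearson")^2) / df.residual(fit_q)

# quasipoisson SEs are the Poisson model-based SEs times sqrt(phi)
se_p <- sqrt(diag(vcov(fit_p)))
se_q <- sqrt(diag(vcov(fit_q)))
print(cbind(poisson = se_p, quasipoisson = se_q, rescaled = se_p * sqrt(phi)))
```

A single scalar rescaling like this assumes the variance is proportional to the mean; a robust covariance does not make that assumption, which is why the two sets of standard errors can differ.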

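Also, ppml actually appears to be dropping "non-significant" regressors, and R's quasipoisson family allows for overdispersion differently than, say, negative binomial regression, which in turn is different from ppml.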

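I notice that you have asked in several places which R package gives results equivalent to ppml for (economics) gravity models, and that you have not received an answer. Sorry about that. I wish I could offer an informed recommendation. It sounds as though what you need is a Poisson regression with robust standard errors that handles zero values. I don't know which R packages support that. (And I'm not sure whether ppml handles overdispersion.)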

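A Bayesian regression package such as rstanarm might handle heteroskedasticity more robustly, but I'm not sure. I tend to use something like the student_t family, but you have to use poisson, so I don't know the answer there. You could try a negative binomial family (neg_binomial_2 in rstanarm's stan_glm), which also handles overdispersion and may be more robust than quasipoisson.

See also: When to use robust standard errors in Poisson regression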

— Wayne

If you want to use rstanarm but it lacks a feature you need, just code the model in rstan.

— Sycorax says Reinstate Monica, 2016

PPML is consistent even when there is underdispersion or overdispersion. See the Log of Gravity page for details.

— Dimitriy V. Masterov, 2016
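A small simulation can illustrate (not prove) this. The sketch below assumes a continuous, non-count outcome whose conditional mean is exp(b0 + b1 * x), once with very noisy and once with very quiet multiplicative errors. All names and parameter values are made up for illustration. It uses base R's glm with the quasipoisson family, which solves the same first-order conditions as a Poisson pseudo-maximum-likelihood fit.

```r
set.seed(123)
n    <- 5000
x    <- rnorm(n)
beta <- c(0.2, 0.5)                      # hypothetical true coefficients
mu   <- exp(beta[1] + beta[2] * x)       # correctly specified conditional mean

# Multiplicative errors with mean 1, so E[y | x] = mu in both cases
y_over  <- mu * rgamma(n, shape = 0.5, rate = 0.5)  # error variance 2
y_under <- mu * rgamma(n, shape = 50,  rate = 50)   # error variance 0.02

fit_over  <- glm(y_over  ~ x, family = quasipoisson(link = "log"))
fit_under <- glm(y_under ~ x, family = quasipoisson(link = "log"))

# Point estimates should land near beta in both cases,
# while the estimated dispersion is far from 1 in opposite directions
print(rbind(truth = beta,
            over  = coef(fit_over),
            under = coef(fit_under)))
print(c(over  = summary(fit_over)$dispersion,
        under = summary(fit_under)$dispersion))
```

Only the conditional mean needs to be right for the point estimates; the variance assumption matters for the standard errors, which is where the robust covariance comes in.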

Dimitriy, thank you! Could you possibly repeat the answer in Russian? :)

— Sergey S.

@salnsg: Sorry, Dimitriy, unfortunately I only know two words of Russian. You could try translate.google.com, but I doubt it would handle the technical parts properly.

— Wayne

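To expand on Wayne's excellent answer: ppml uses a robust (heteroskedasticity-consistent) variance-covariance matrix, together with a finite-sample adjustment to that matrix to reduce bias.

These are very similar to what sandwich(), from the package of the same name, computes in R. The only difference is how the finite-sample adjustment is done. With sandwich(...), no finite-sample adjustment is made by default; that is, the sandwich is scaled by 1/n, where n is the number of observations. Alternatively, sandwich(..., adjust = TRUE) scales it by 1/(n - k), where k is the number of regressors. Stata, however, scales it by 1/(n - 1).

Here is how to make R match Stata, using a custom sandwich variance with the 1/(n - 1) adjustment factor: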
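The three scalings can be compared by building the sandwich by hand. What follows is a minimal base R sketch for a log-link Poisson-type fit on simulated data. The variable names and data are assumptions for illustration, and the hand-rolled formula is the textbook sandwich, not a copy of either package's internals.

```r
set.seed(42)
n <- 100
x <- rnorm(n)
w <- rbinom(n, 1, 0.3)
y <- rnbinom(n, mu = exp(0.5 + 0.1 * x), size = 2)   # illustrative outcome

fit <- glm(y ~ x + w, family = quasipoisson(link = "log"))

X  <- model.matrix(fit)
mu <- fitted(fit)
k  <- ncol(X)

# Log-link Poisson scores are x_i * (y_i - mu_i);
# the expected Hessian is sum_i mu_i * x_i x_i'
A_inv <- solve(crossprod(X, X * mu))   # inverse of X' diag(mu) X
meat  <- crossprod(X * (y - mu))       # sum of outer products of the scores

V_hc0   <- A_inv %*% meat %*% A_inv    # no finite-sample adjustment
V_nk    <- V_hc0 * n / (n - k)         # n / (n - k) adjustment
V_stata <- V_hc0 * n / (n - 1)         # n / (n - 1) adjustment

se <- function(V) sqrt(diag(V))
print(cbind(unadjusted  = se(V_hc0),
            n_minus_1   = se(V_stata),
            n_minus_k   = se(V_nk)))
```

The adjustments only rescale the whole matrix, so with n = 100 the three sets of standard errors differ in the third significant digit or so; that is small, but enough to prevent an exact match between packages.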

. clear

. set more off

. capture ssc install rsource

. use http://personal.lse.ac.uk/tenreyro/mock, clear

. saveold ~/Desktop/mock, version(12) replace
(saving in Stata 12 format, which can be read by Stata 11 or 12)
file ~/Desktop/mock.dta saved

. rsource, terminator(XXX) rpath("/usr/local/bin/R") roptions("--vanilla")
Assumed R program path: "/usr/local/bin/R"

Loading required package: zoo

Attaching package: 'zoo'

The following objects are masked from 'package:base':

    as.Date, as.Date.numeric

Beginning of R output

R version 3.2.4 (2016-03-10) -- "Very Secure Dishes"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.4.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>   library("foreign")
>   library("sandwich")
>   library("lmtest")
>   mock<-read.dta("~/Desktop/mock.dta")
>   glmm<-glm(formula=y ~ x + w, family=quasipoisson(link="log"),data=mock)
> 
>   sandwich1 <- function(object, ...) sandwich(object) * nobs(object) / (nobs(object) - 1)
>   coeftest(glmm,vcov=sandwich1)

z test of coefficients:

            Estimate Std. Error z value  Pr(>|z|)    
(Intercept) 0.516969   0.098062  5.2718 1.351e-07 ***
x           0.125657   0.101591  1.2369    0.2161    
w           0.013410   0.710752  0.0189    0.9849    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> 
End of R output

. 
. ppml y x w

note: checking the existence of the estimates

Number of regressors excluded to ensure that the estimates exist: 0
Number of observations excluded: 0

note: starting ppml estimation
note: y has noninteger values

Iteration 1:   deviance =  139.7855
Iteration 2:   deviance =  137.7284
Iteration 3:   deviance =  137.7222
Iteration 4:   deviance =  137.7222

Number of parameters: 3
Number of observations: 100
Pseudo log-likelihood: -173.89764
R-squared: .01628639
Option strict is: off
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .1256565   .1015913     1.24   0.216    -.0734588    .3247718
           w |   .0134101   .7107518     0.02   0.985    -1.379638    1.406458
       _cons |   .5169689   .0980624     5.27   0.000     .3247702    .7091676
------------------------------------------------------------------------------

Here is the Stata/R code that produced the output above. I am using rsource to run R from Stata (and you will need to tweak rpath() below to match your setup), but rsource is not really necessary: you can simply run the R part from R.

clear
set more off
capture ssc install rsource

use http://personal.lse.ac.uk/tenreyro/mock, clear
saveold ~/Desktop/mock, version(12) replace

rsource, terminator(XXX) rpath("/usr/local/bin/R") roptions("--vanilla")
  library("foreign")
  library("sandwich")
  library("lmtest")
  mock<-read.dta("~/Desktop/mock.dta")
  glmm<-glm(formula=y ~ x + w, family=quasipoisson(link="log"),data=mock)

  sandwich1 <- function(object, ...) sandwich(object) * nobs(object) / (nobs(object) - 1)
  coeftest(glmm,vcov=sandwich1)  
XXX 

ppml y x w

— Dimitriy V. Masterov

@salnsg You're welcome. Write if there is anything I can clarify. Unfortunately, I don't know how to describe all of this in my native language.

— Dimitriy V. Masterov, 2016