データサイエンス probability

XGBoost出力は極端になる傾向があります

私は現在リスク予測にXGBoostを使用していますが、バイナリ分類部門ではうまく機能しているようですが、確率出力はかなりずれています。つまり、観測値の特徴の値を少しだけ変更すると、確率が高くなります。 0.5から0.99への出力ジャンプ。 0.6〜0.8の範囲の出力はほとんど見えません。すべての場合において、確率は0.99または1未満です。 Platt ScalingやLogistic Correctionなどのトレーニング後のキャリブレーション方法は知っていますが、XGBoostトレーニングプロセスで調整できることがあるかどうか疑問に思っていました。私はFFIを使用してさまざまな言語からXGBoostを呼び出します。そのため、他のキャリブレーションライブラリを導入せずにこの問題を修正できると便利です。たとえば、評価メトリックをAUCからログ損失に変更します。

15 machine-learning classification xgboost probability probability-calibration

LSTMセルはいくつ使用すればよいですか？

使用する必要があるLSTMセルの最小、最大、および「妥当な」量に関する経験則（または実際の規則）はありますか？具体的には、TensorFlowとプロパティのBasicLSTMCellに関連していnum_unitsます。私が定義する分類問題があると仮定してください： t - number of time steps n - length of input vector in each time step m - length of output vector (number of classes) i - number of training examples たとえば、トレーニングの例の数は次の数よりも多い必要がありますか？ 4*((n+1)*m + m*m)*c cセルの数はどこですか？これに基づいています：LSTMネットワークのパラメーターの数を計算する方法？私が理解しているように、これはパラメータの総数を与えるはずであり、トレーニング例の数よりも少なくなければなりません。

12 rnn machine-learning r predictive-modeling random-forest python language-model sentiment-analysis encoding machine-learning deep-learning neural-network dataset caffe classification xgboost multiclass-classification unbalanced-classes time-series descriptive-statistics python r clustering machine-learning python deep-learning tensorflow machine-learning python predictive-modeling probability scikit-learn svm machine-learning python classification gradient-descent regression research python neural-network deep-learning convnet keras python tensorflow machine-learning deep-learning tensorflow python r bigdata visualization rstudio pandas pyspark dataset time-series multilabel-classification machine-learning neural-network ensemble-modeling kaggle machine-learning linear-regression cnn convnet machine-learning tensorflow association-rules machine-learning predictive-modeling training model-selection neural-network keras deep-learning deep-learning convnet image-classification predictive-modeling prediction machine-learning python classification predictive-modeling scikit-learn machine-learning python random-forest sampling training recommender-system books python neural-network nlp deep-learning tensorflow python matlab information-retrieval search search-engine deep-learning convnet keras machine-learning python cross-validation sampling machine-learning

タグ付けされた質問 「probability」

タグ付けされた質問「probability」