Expected number I will be on after drawing cards until I get an ace, 2, 3, and so forth

12

I am having some trouble solving the following.

You draw cards from a standard 52-card deck without replacement until you get an ace. You draw from what is left until you get a 2. You continue with 3. What is the expected number you will be on after the whole deck runs out?

It was natural to let

$T_i = \text{first position of card whose value is }i$
$U_i = \text{last position of card whose value is }i$

So the problem essentially amounts to figuring out the probability that you will be on $k$ when the deck runs out, namely

$Pr(T_1<\cdots<T_k \cap U_{k+1} < T_k)$

I can see that

$Pr(T_1<\cdots<T_k) = 1/k! \\ \text{and} \\ Pr(U_{k+1} < T_k) = 1/70$

but I could not get any further...

— Bill

1

What happens if you have drawn all the $2$s by the time you draw your first ace?

— gung - Reinstate Monica

Does "expected" number really mean "most likely" number?

— whuber

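This is an interesting problem, but I am not sure about the maths you write after "the problem essentially amounts to". In the first statement, did you mean to write $\cap$ rather than $\cup$? Even then, though, I am not sure the statement is correct. Consider a sequence beginning 2AAA2. We have $T_1=2, T_2=1$ and so $T_1 > T_2$, but if I understand your text description correctly, we can still pick the ace at the second position and then the 2 at the fifth position? So $T_1 < T_2$ is not a necessary condition?

— TooTone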

@TooTone Oh, I meant $\cap$ as you said, and you are right: $T_1 < T_2$ is not a necessary condition.

— Bill

@gung In that case, your deck will run out and you will still be on 2.

— Bill

0

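Following @gung's idea, I believe the expected value would be 5.84? From my interpretation of the comments, I am assuming that "A" is an almost impossible value (unless the last four cards in the deck are all aces). Here are the results of a 100,000-iteration Monte Carlo simulation: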

results
    2     3     4     5     6     7     8     9     J     K     Q     T 
 1406  7740 16309 21241 19998 15127  9393  4906   976   190   380  2334

And here is the R code in case you would like to play with it.

# monte carlo card-drawing functions from here
# http://streaming.stat.iastate.edu/workshops/r-intro/lectures/5-Rprogramming.pdf

# create a straightforward deck of cards
create_deck <-
    function( ){
        suit <- c( "H" , "C" , "D" , "S" )
        rank <- c( "A" , 2:9 , "T" , "J" , "Q" , "K" )
        deck <- NULL
        for ( r in rank ) deck <- c( deck , paste( r , suit ) )
        deck
    }

# construct a function to shuffle everything
shuffle <- function( deck ){ sample( deck , length( deck ) ) }

# draw one card at a time
draw_cards <-
    function( deck , start , n = 1 ){
        cards <- NULL

        for ( i in start:( start + n - 1 ) ){
            if ( i <= length( deck ) ){
                cards <- c( cards , deck[ i ] )
            }
        }

        return( cards )
    }

# create an empty vector for your results
results <- NULL

# run your simulation this many times..
for ( i in seq( 100000 ) ){
    # create a new deck
    sdeck <- shuffle( create_deck() )

    d <- sdeck[ grep('A|2' , sdeck ) ]
    e <- identical( grep( "2" , d ) , 1:4 )

    # loop through ranks in this order
    rank <- c( "A" , 2:9 , "T" , "J" , "Q" , "K" )

    # start at this position
    card.position <- 0

    # start with a blank current.draw
    current.draw <- ""

    # start with a blank current rank
    this.rank <- NULL

    # start with the first rank
    rank.position <- 1

    # keep drawing until you find the rank you wanted
    while( card.position < 52 ){

        # increase the position by one every time
        card.position <- card.position + 1

        # store the current draw for testing next time
        current.draw <- draw_cards( sdeck , card.position )

        # if you draw the current rank, move to the next.
        if ( grepl( rank[ rank.position ] , current.draw ) ) rank.position <- rank.position + 1

        # if you have gone through every rank and are still not out of cards,
        # should it still be a king?  this assumes yes.
        if ( rank.position == length( rank ) ) break        

    }

    # store the rank for this iteration.
    this.rank <- rank[ rank.position ]

    # at the end of the iteration, store the result
    results <- c( results , this.rank )

}

# print the final results
table( results )

# make A, T, J, Q, K numerics
results[ results == 'A' ] <- 1
results[ results == 'T' ] <- 10
results[ results == 'J' ] <- 11
results[ results == 'Q' ] <- 12
results[ results == 'K' ] <- 13
results <- as.numeric( results )

# and here's your expected value after 100,000 simulations.
mean( results )

— Anthony Damico

Why is A impossible? Consider, for example, a sequence of 48 cards followed by AAAA.

— TooTone

You are right. It is one out of 270725 (in R code, 1/prod( 48:1 / 52:5 ) gives 270725).

— Anthony Damico

1

This answer is incorrect. Consider the count for "2": because this can result only when all the 2's are encountered before any of the 1's, its probability is one in every $\binom{8}{4}=70$. Thus, in your simulation, its expectation is $10^5/\binom{8}{4}\approx 1428.6$ with a standard error of $37.5$. Your output of $1660$ is over six standard errors too high, making it almost certainly erroneous. An accurate value for the mean (based on a different simulation with $10^6$ iterations) is $5.833\pm 0.004$.
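
The arithmetic in this comment can be checked with a few lines of R. This is only a sketch: the variable names are illustrative, and it assumes the count reported under "2" is binomial with success probability $1/\binom{8}{4}$.

```r
n_iter   <- 10^5                                # iterations in the simulation under discussion
p_two    <- 1 / choose(8, 4)                    # chance that all four 2s precede all four aces
expected <- n_iter * p_two                      # expected count under the heading "2"
se       <- sqrt(n_iter * p_two * (1 - p_two))  # binomial standard error of that count
z        <- (1660 - expected) / se              # distance of the reported 1660, in standard errors

print(round(c(expected = expected, se = se, z = z), 2))
```

The value of z comes out a little above six, which is the "over six standard errors" quoted in the comment.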

— whuber

1

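Unfortunately, your heavily documented code is longer and slower than it needs to be. I demonstrated that the output is incorrect. I wish I had the time to debug your code, but doing so is not my task. My argument is this: you will still be working on "2" at the end if and only if all the "2"s precede all the "A"s. Among the $\binom{4+4}{4}=70$ equally likely ways of arranging the four "2"s and four "A"s, exactly one satisfies this criterion. Therefore your value in results under the heading "2" should be close to $10^5/70=1429$, but it is not.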

— whuber

1

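Even moderators cannot remove other people's votes :-). A chi-squared test now suggests your results agree with mine, but it would be nice to know how you tested your simulation, because that would improve confidence in your answer. In fact, according to an edit you made to the first paragraph of your answer, both our results are now wrong: as I have interpreted your question, it is never possible still to be working on an ace when all the cards are exhausted.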

— whuber

7

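For a simulation it is crucial to be correct as well as fast. Both these objectives suggest writing code that targets core capabilities of the programming environment as well as code that is as short and simple as possible, because simplicity lends clarity and clarity promotes correctness. Here is my attempt to achieve both in R: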

#
# Simulate one play with a deck of `n` distinct cards in `k` suits.
#
sim <- function(n=13, k=4) {
  deck <- sample(rep(1:n, k)) # Shuffle the deck
  deck <- c(deck, 1:n)        # Add sentinels to terminate the loop
  k <- 0                      # Count the cards searched for
  for (j in 1:n) {
    k <- k+1                          # Count this card
    deck <- deck[-(1:match(j, deck))] # Deal cards until `j` is found
    if (length(deck) < n) break       # Stop when sentinels are reached
  }
  return(k)                   # Return the number of cards searched
}

Applying this in a reproducible way is done with the replicate function after setting the random number seed, as in

> set.seed(17);  system.time(d <- replicate(10^5, sim(13, 4)))
   user  system elapsed 
   5.46    0.00    5.46

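That is slow, but fast enough to run fairly lengthy (and therefore precise) simulations repeatedly without waiting. There are several ways we can exhibit the result. Let's start with its mean: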

> n <- length(d)
> mean(d)
[1] 5.83488

> sd(d) / sqrt(n)
[1] 0.005978956

The latter is the standard error: we expect the simulated mean to be within two or three SEs of the true value. That places the true expectation somewhere between $5.817$ and $5.853$.
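
As a quick check of that interval, here is a minimal sketch that hard-codes the mean and standard error printed above and takes three SEs on either side. The variable names are illustrative only.

```r
m  <- 5.83488        # simulated mean printed above
se <- 0.005978956    # its standard error printed above

ci <- m + c(-3, 3) * se   # three standard errors on either side of the mean
print(round(ci, 3))
```

Rounded to three decimals, the endpoints are the 5.817 and 5.853 quoted in the text.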

We might also want to see a tabulation of the frequencies (and their standard errors). The following code prettifies the tabulation a little:

u <- table(d)
u.se <- sqrt(u/n * (1-u/n)) / sqrt(n)
cards <- c("A", "2", "3", "4", "5", "6", "7", "8", "9", "T", "J", "Q", "K")
dimnames(u) <- list(sapply(dimnames(u), function(x) cards[as.integer(x)]))
print(rbind(frequency=u/n, SE=u.se), digits=2)

Here is the output:

                2       3      4      5      6      7       8       9       T       J       Q       K
frequency 0.01453 0.07795 0.1637 0.2104 0.1995 0.1509 0.09534 0.04995 0.02249 0.01009 0.00345 0.00173
SE        0.00038 0.00085 0.0012 0.0013 0.0013 0.0011 0.00093 0.00069 0.00047 0.00032 0.00019 0.00013

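How can we know the simulation is even correct? One way is to test it exhaustively for smaller problems. For that reason this code was written to attack a small generalization of the problem, replacing $13$ distinct cards with n and $4$ suits with k. However, for the testing it is important to be able to feed the code a deck in a predetermined order. Let's write a slightly different interface to the same algorithm: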

draw <- function(deck) {
  n <- length(sentinels <- sort(unique(deck)))
  deck <- c(deck, sentinels)
  k <- 0
  for (j in sentinels) {
    k <- k+1
    deck <- deck[-(1:match(j, deck))]
    if (length(deck) < n) break
  }
  return(k)
}

(It is possible to use draw in place of sim everywhere, but the extra work done at the beginning of draw makes it twice as slow as sim.)

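We can use this by applying it to every distinct shuffle of a given deck. Since the purpose here is just a few one-off tests, efficiency in generating those shuffles is unimportant. Here is a quick brute-force way: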

n <- 4 # Distinct cards
k <- 2 # Number of suits
d <- expand.grid(lapply(1:(n*k), function(i) 1:n))
e <- apply(d, 1, function(x) var(tabulate(x))==0)
g <- apply(d, 1, function(x) length(unique(x))==n)
d <- d[e & g,]

Now d is a data frame whose rows contain all the shuffles. Apply draw to each row and count the results:

d$result <- apply(as.matrix(d), 1, draw)
(counts <- table(d$result))

The output (which we will use in a formal test momentarily) is

   2    3    4 
 420  784 1316

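(The value of $420$ is easy to understand, by the way: we would still be working on card $2$ if and only if all the twos precede all the aces. The chance of this happening (with two suits) is $1/\binom{2+2}{2} = 1/6$. Out of the $2520$ distinct shuffles, $2520/6 = 420$ have this property.)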
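
The two counts in that remark can be reproduced directly. This is a small sketch assuming the same test deck of 4 distinct cards in 2 suits; the names n_cards, n_suits, shuffles, and stuck_on_2 are illustrative and are kept distinct from the n, k, and d used by the surrounding code.

```r
n_cards <- 4   # distinct card values in the test deck
n_suits <- 2   # copies of each value

# Distinct shuffles when suits are ignored: 8! / (2!)^4
shuffles <- factorial(n_cards * n_suits) / factorial(n_suits)^n_cards

# Shuffles in which every 2 precedes every ace: one in choose(4, 2) of them
stuck_on_2 <- shuffles / choose(2 * n_suits, n_suits)

print(c(shuffles = shuffles, stuck_on_2 = stuck_on_2))
```

These are the 2520 rows of the enumerated data frame and the 420 of them that end on card 2.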

We can test the output with a chi-squared test. To this end I apply sim $10,000$ times to this case of $n=4$ distinct cards in $k=2$ suits:

> set.seed(17)
> d.sim <- replicate(10^4, sim(n, k))
> print((rbind(table(d.sim) / length(d.sim), counts / dim(d)[1])), digits=3)

         2     3     4
[1,] 0.168 0.312 0.520
[2,] 0.167 0.311 0.522

> chisq.test(table(d.sim), p=counts / dim(d)[1])

    Chi-squared test for given probabilities

data:  table(d.sim) 
X-squared = 0.2129, df = 2, p-value = 0.899

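Because $p$ is so high, we find no significant difference between what sim says and the values computed by exhaustive enumeration. Repeating this exercise for some other (small) values of $n$ and $k$ produces comparable results, giving us ample reason to trust sim when applied to $n=13$ and $k=4$.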

Finally, a two-sample chi-squared test will compare the output of sim to the output reported in another answer:

> y <- c(1660,8414,16973,21495,20021,14549,8957,4546,2087,828,313,109)
> chisq.test(cbind(u, y))

data:  cbind(u, y) 
X-squared = 142.2489, df = 11, p-value < 2.2e-16

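The enormous chi-squared statistic produces a p-value that is essentially zero: without a doubt, sim disagrees with the other answer. There are two possible resolutions of the disagreement: one (or both!) of these answers is incorrect, or they implement different interpretations of the question. For instance, I have interpreted "after the deck runs out" to mean after observing the last card and, if allowable, updating the "number you will be on" before terminating the procedure. Conceivably that last step was not meant to be taken. Perhaps some such subtle difference of interpretation will explain the disagreement, at which point we can modify the question to make it clearer what is being asked.

— whuber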

4

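There is an exact answer (in the form of a matrix product, presented in point 4 below). A reasonably efficient algorithm to compute it exists, deriving from these observations:

1. A random shuffle of $N+k$ cards can be generated by randomly shuffling $N$ cards and then randomly interspersing the remaining $k$ cards within them.
2. By shuffling only the aces, and then (applying the first observation) interspersing the twos, then the threes, and so on, this problem can be viewed as a chain of thirteen steps.
3. We need to keep track of more than the value of the card we are seeking. When doing this, however, we do not need to account for the position of the mark relative to all the cards, but only its position relative to cards of equal or smaller value.

   Imagine placing a mark on the first ace, then marking the first two found after it, and so on. (If at any stage the deck runs out without displaying the card we are currently seeking, we will leave all cards unmarked.) Let the "place" of each mark (when it exists) be the number of cards of equal or lower value that were dealt when the mark was made (including the marked card itself). The places contain all the essential information.
4. The place after the $i^\text{th}$ mark is made is a random number. For a given deck, the sequence of these places forms a stochastic process. It is in fact a Markov process (with variable transition matrix). An exact answer can therefore be calculated from twelve matrix multiplications.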
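
The first two observations can be sketched in R as follows. The helper name intersperse is hypothetical (it is not part of the implementation given later in this post), and picking the positions of the new cards with sample is just one convenient way to intersperse them uniformly.

```r
# Hypothetical helper: place `k` cards of value `new_value` into the deck `old`,
# keeping the relative order of the old cards unchanged.
intersperse <- function(old, new_value, k) {
  total <- length(old) + k
  slots <- sample(total, k)               # positions the new cards will occupy
  deck <- rep(NA_integer_, total)
  deck[slots] <- new_value
  if (length(old) > 0) deck[-slots] <- old  # old cards fill the other positions in order
  deck
}

set.seed(17)
deck <- integer(0)
for (value in 1:13) deck <- intersperse(deck, value, 4)  # aces, then twos, and so on

print(table(deck))   # four cards of each value
```

Each pass through the loop is one step of the thirteen-step chain: every subset of positions for the new cards is equally likely, so the final deck should be a uniformly random arrangement when suits are ignored.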

Using these ideas, this machine obtains a value of $5.8325885529019965$ (computing in double precision floating point) in $1/9$ second. This approximation of the exact value

$\frac{1982600579265894785026945331968939023522542569}{339917784579447928182134345929899510000000000}$

is accurate to all digits shown.

The rest of this post provides details, presents a working implementation (in R), and concludes with some comments about the question and the efficiency of the solution.

Generating random shuffles of a deck

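It is actually clearer conceptually, and no more complicated mathematically, to consider a "deck" (aka multiset) of $N = k_1+k_2+\cdots+k_m$ cards of which there are $k_1$ of the lowest denomination, $k_2$ of the next lowest, and so on. (The question as asked concerns the deck determined by the $13$-vector $(4,4,\ldots,4)$.)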

A "random shuffle" of $N$ cards is one permutation taken uniformly and randomly from the $N! = N\times(N-1)\times\cdots\times 2\times 1$ permutations of the $N$ cards. These shuffles fall into groups of equivalent configurations, because permuting the $k_1$ "aces" among themselves changes nothing, permuting the $k_2$ "twos" among themselves also changes nothing, and so on. Therefore each group of permutations that look identical when the suits of the cards are ignored contains $k_1!\times k_2!\times \cdots \times k_m!$ permutations. These groups, whose number therefore is given by the multinomial coefficient

$\binom{N}{k_1,k_2,\ldots,k_m} = \frac{N!}{k_1!k_2!\cdots k_m!},$

are called "combinations" of the deck.

There is another way to count the combinations. The first $k_1$ cards can form only $k_1!/k_1! = 1$ combination. They leave $k_1+1$ "slots" between and around them in which the next $k_2$ cards can be placed. We could indicate this with a diagram where " $*$ " designates one of the $k_1$ cards and " $\_$ " designates a slot that can hold between $0$ and $k_2$ additional cards:

$\underbrace{\_*\_*\_\cdots\_*\_}_{k_1\text{ stars}}$

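When the $k_2$ additional cards are interspersed, the resulting pattern of stars and new cards is an arrangement of all $k_1+k_2$ cards. The number of distinct ways to do this is $\binom{k_1+k_2}{k_1,k_2} = \frac{(k_1+k_2)!}{k_1!k_2!}$.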

Repeating this procedure with $k_3$ "threes," we find there are $\binom{(k_1+k_2)+k_3}{k_1+k_2,k_3}= \frac{(k_1+k_2+k_3)!}{(k_1+k_2)!k_3!}$ ways to intersperse them among the first $k_1+k_2$ cards. Therefore the total number of distinct ways to arrange the first $k_1+k_2+k_3$ cards in this manner equals

$1\times\frac{(k_1+k_2)!}{k_1!k_2!}\times\frac{(k_1+k_2+k_3)!}{(k_1+k_2)!k_3!} = \frac{(k_1+k_2+k_3)!}{k_1!k_2!k_3!}.$

After finishing the last $k_m$ cards and continuing to multiply these telescoping fractions, we find that the number of distinct combinations obtained equals the total number of combinations as previously counted, $\binom{N}{k_1,k_2,\ldots,k_m}$. Therefore we have overlooked no combinations. That means this sequential process of shuffling the cards correctly captures the probabilities of each combination, assuming that at each stage each possible distinct way of interspersing the new cards among the old is taken with equal probability.
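
The telescoping identity is easy to confirm numerically. The sketch below uses two hypothetical helper names, count_sequential and count_multinomial, and computes the multinomial coefficient on the log scale (via lgamma) so that the 52-card case stays well within double precision.

```r
# Ways to build the deck value by value: at each stage the new cards are
# interspersed among all the cards placed so far.
count_sequential <- function(k) prod(choose(cumsum(k), k))

# Multinomial coefficient N! / (k_1! k_2! ... k_m!), computed on the log scale.
count_multinomial <- function(k) exp(lgamma(sum(k) + 1) - sum(lgamma(k + 1)))

small <- c(2, 3, 4)                    # a small deck: 1 * 10 * 126 arrangements
print(count_sequential(small))
print(count_multinomial(small))

standard <- rep(4, 13)                 # the standard 52-card deck
print(all.equal(count_sequential(standard), count_multinomial(standard)))
```

For the small deck both counts are 1260, and for the standard deck the two agree to within floating point tolerance.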

The place process

Initially, there are $k_1$ aces and obviously the very first is marked. At later stages there are $n = k_1 + k_2 + \cdots + k_{j-1}$ cards, the place (if a marked card exists) equals $p$ (some value from $1$ through $n$ ), and we are about to intersperse $k=k_j$ cards around them. We can visualize this with a diagram like

$\underbrace{\_*\_*\_\cdots\_*\_}_{p-1\text{ stars}}\odot\underbrace{\_*\_\cdots\_*\_}_{n-p\text{ stars}}$

where " $\odot$ " designates the currently marked symbol. Conditional on this value of the place $p$ , we wish to find the probability that the next place will equal $q$ (some value from $1$ through $n+k$ ; by the rules of the game, the next place must come after $p$ , whence $q\ge p+1$ ). If we can find how many ways there are to intersperse the $k$ new cards in the blanks so that the next place equals $q$ , then we can divide by the total number of ways to intersperse these cards (equal to $\binom{n+k}{k}$ , as we have seen) to obtain the transition probability that the place changes from $p$ to $q$ . (There will also be a transition probability for the place to disappear altogether when none of the new cards follow the marked card, but there is no need to compute this explicitly.)

Let's update the diagram to reflect this situation:

$\underbrace{\_*\_*\_\cdots\_*\_}_{p-1\text{ stars}}\odot\underbrace{**\cdots*}_{s\text{ stars}}\ \vert\ \underbrace{\_*\_\cdots\_*\_}_{n-p-s\text{ stars}}$

The vertical bar " $\vert$ " shows where the first new card occurs after the marked card: no new cards may therefore appear between the $\odot$ and the $\vert$ (and therefore no slots are shown in that interval). We do not know how many stars there are in this interval, so I have just called it $s$ (which may be zero). The unknown $s$ will disappear once we find the relationship between it and $q$ .

Suppose, then, we intersperse $j$ new cards around the stars before the $\odot$ and then--independently of that--we intersperse the remaining $k-j-1$ new cards around the stars after the $\vert$ . There are

$\tau_{n,k}(s,p) = \binom{(p-1)+j}{j}\binom{(n-p-s) + (k-j)-1}{k-j-1}$

ways to do this. Notice, though--this is the trickiest part of the analysis--that the place of $\vert$ equals $p+s+j+1$ because

There are $p$ "old" cards at or before the mark.
There are $s$ old cards after the mark but before $\vert$ .
There are $j$ new cards before the mark.
There is the new card represented by $\vert$ itself.

Thus, $\tau_{n,k}(s,p)$ gives us information about the transition from place $p$ to place $q=p+s+j+1$ . When we track this information carefully for all possible values of $s$ , and sum over all these (disjoint) possibilities, we obtain the conditional probability of place $q$ following place $p$ ,

${\Pr}_{n,k}(q|p) = \left(\sum_j \binom{p-1+j}{j}\binom{n+k-q}{k-j-1}\right) / \binom{n+k}{k}$

where the sum starts at $j=\max(0, q-(n+1))$ and ends at $j=\min(k-1, q-(p+1))$ . (The variable length of this sum suggests there is unlikely to be a closed formula for it as a function of $n, k, q,$ and $p$ , except in special cases.)
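
To make the formula concrete, here is a minimal sketch that evaluates ${\Pr}_{n,k}(q|p)$ for the step from aces to twos ($n=4$, $k=4$). The function name trans_prob is hypothetical; unlike the t.matrix function in the implementation below, it includes the division by $\binom{n+k}{k}$.

```r
# Transition probability from place p to place q when k new cards are
# interspersed among n old ones (hypothetical name, normalized version).
trans_prob <- function(q, p, n, k) {
  lo <- max(0, q - (n + 1))
  hi <- min(k - 1, q - (p + 1))
  if (hi < lo) return(0)               # no admissible j: the transition is impossible
  j <- lo:hi
  sum(choose(p - 1 + j, j) * choose(n + k - q, k - 1 - j)) / choose(n + k, k)
}

n <- 4; k <- 4

# Mark on the first ace (p = 1): total probability that some 2 gets marked.
from_first <- sapply(2:(n + k), function(q) trans_prob(q, 1, n, k))
print(sum(from_first))

# Mark on the last ace (p = 4): a 2 must follow the last ace.
from_last <- sapply(5:(n + k), function(q) trans_prob(q, 4, n, k))
print(sum(from_last))
```

The first sum is $69/70$: the missing $1/70$ is the chance that all four twos precede the first ace, so the mark disappears. The second sum is $35/70 = 1/2$, since $\binom{7}{4}=35$ of the $70$ arrangements put every two before the last ace.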

The algorithm

Initially there is probability $1$ that the place will be $1$ and probability $0$ it will have any other possible value in $2, 3, \ldots, k_1$ . This can be represented by a vector $p_1 = (1, 0, \ldots, 0)$ .

After interspersing the next $k_2$ cards, the vector $p_1$ is updated to $p_2$ by multiplying it (on the left) by the transition matrix $(\Pr_{k_1,k_2}(q|p), 1\le p\le k_1, 1\le q\le k_1+k_2)$ . This is repeated until all $k_1+k_2+\cdots+k_m$ cards have been placed. At each stage $j$ , the sum of the entries in the probability vector $p_j$ is the chance that some card has been marked. Whatever remains to make the value equal to $1$ therefore is the chance that no card is left marked after step $j$ . The successive differences in these values therefore give us the probability that we could not find a card of type $j$ to mark: that is the probability distribution of the value of the card we were looking for when the deck runs out at the end of the game.

Implementation

The following R code implements the algorithm. It parallels the preceding discussion. First, calculation of the transition probabilities is performed by t.matrix (without normalization with the division by $\binom{n+k}{k}$ , making it easier to track the calculations when testing the code):

t.matrix <- function(q, p, n, k) {
  # j ranges over the possible numbers of new cards dealt before the marked card
  j <- max(0, q-(n+1)):min(k-1, q-(p+1))
  return (sum(choose(p-1+j,j) * choose(n+k-q, k-1-j)))
}

This is used by transition to update $p_{j-1}$ to $p_j$ . It calculates the transition matrix and performs the multiplication. It also takes care of computing the initial vector $p_1$ if the argument p is an empty vector:

#
# `p` is the place distribution: p[i] is the chance the place is `i`.
#
transition <- function(p, k) {
  n <- length(p)
  if (n==0) {
    q <- c(1, rep(0, k-1))
  } else {
    #
    # Construct the transition matrix.
    #
    t.mat <- matrix(0, nrow=n, ncol=(n+k))
    #dimnames(t.mat) <- list(p=1:n, q=1:(n+k))
    for (i in 1:n) {
      t.mat[i, ] <- c(rep(0, i), sapply((i+1):(n+k), 
                                        function(q) t.matrix(q, i, n, k)))
    }
    #
    # Normalize and apply the transition matrix.
    #
    q <- as.vector(p %*% t.mat / choose(n+k, k))
  }
  names(q) <- 1:(n+k)
  return (q)
}

We can now easily compute the non-mark probabilities at each stage for any deck:

#
# `k` is an array giving the numbers of each card in order;
# e.g., k = rep(4, 13) for a standard deck.
#
# NB: the *complements* of the p-vectors are output.
#
game <- function(k) {
  p <- numeric(0)
  q <- sapply(k, function(i) 1 - sum(p <<- transition(p, i)))
  names(q) <- names(k)
  return (q)
}

Here they are for the standard deck:

k <- rep(4, 13)
names(k) <- c("A", 2:9, "T", "J", "Q", "K")
(g <- game(k))

The output is

         A          2          3          4          5          6          7          8          9          T          J          Q          K 
0.00000000 0.01428571 0.09232323 0.25595013 0.46786622 0.66819134 0.81821790 0.91160622 0.96146102 0.98479430 0.99452614 0.99818922 0.99944610

According to the rules, if a king was marked then we would not look for any further cards: this means the value of $0.9994461$ has to be increased to $1$ . Upon doing so, the differences give the distribution of the "number you will be on when the deck runs out":

> g[13] <- 1; diff(g)
          2           3           4           5           6           7           8           9           T           J           Q           K 
0.014285714 0.078037518 0.163626897 0.211916093 0.200325120 0.150026562 0.093388313 0.049854807 0.023333275 0.009731843 0.003663077 0.001810781

(Compare this to the output I report in a separate answer describing a Monte-Carlo simulation: they appear to be the same, up to expected amounts of random variation.)

The expected value is immediate:

> sum(diff(g) * 2:13)
[1] 5.832589

All told, this required only a dozen lines or so of executable code. I have checked it against hand calculations for small values of $k$ (up to $3$ ). Thus, if any discrepancy becomes apparent between the code and the preceding analysis of the problem, trust the code (because the analysis may have typographical errors).

Remarks

Relationships to other sequences

When there is one of each card, the distribution is a sequence of reciprocals of whole numbers:

> 1/diff(game(rep(1,10)))
[1]      2      3      8     30    144    840   5760  45360 403200

The value at place $i$ is $i! + (i-1)!$ (starting at place $i=1$ ). This is sequence A001048 in the Online Encyclopedia of Integer Sequences. Accordingly, we might hope for a closed formula for the decks with constant $k_i$ (the "suited" decks) that would generalize this sequence, which itself has some profound meanings. (For instance, it counts sizes of the largest conjugacy classes in permutation groups and is also related to trinomial coefficients.) (Unfortunately, the reciprocals in the generalization for $k\gt 1$ are not usually integers.)
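
The formula $i! + (i-1)!$ can be checked against the nine reciprocals printed above. This sketch builds the factorials with cumprod so that the arithmetic is exact; the variable names are illustrative.

```r
fact <- c(1, cumprod(1:9))               # 0!, 1!, ..., 9!
reciprocals <- fact[2:10] + fact[1:9]    # i! + (i-1)! for i = 1, ..., 9
print(reciprocals)
```

The result is 2, 3, 8, 30, 144, 840, 5760, 45360, 403200, matching the output of 1/diff(game(rep(1,10))).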

The game as a stochastic process

Our analysis makes it clear that the initial $i$ coefficients of the vectors $p_j$ , $j\ge i$ , are constant. For example, let's track the output of game as it processes each group of cards:

> sapply(1:13, function(i) game(rep(4,i)))

[[1]]
[1] 0

[[2]]
[1] 0.00000000 0.01428571

[[3]]
[1] 0.00000000 0.01428571 0.09232323

[[4]]
[1] 0.00000000 0.01428571 0.09232323 0.25595013

...

[[13]]
 [1] 0.00000000 0.01428571 0.09232323 0.25595013 0.46786622 0.66819134 0.81821790 0.91160622 0.96146102 0.98479430 0.99452614 0.99818922 0.99944610

For instance, the second value of the final vector (describing the results with a full deck of 52 cards) already appeared after the second group was processed (and equals $1/\binom{8}{4}=1/70$ ). Thus, if you want information only about the marks up through the $j^\text{th}$ card value, you only have to perform the calculation for a deck of $k_1+k_2+\cdots+k_j$ cards.

Because the chance of not marking a card of value $j$ is getting quickly close to $1$ as $j$ increases, after $13$ types of cards in four suits we have almost reached a limiting value for the expectation. Indeed, the limiting value is approximately $5.833355$ (computed for a deck of $4 \times 32$ cards, at which point double precision rounding error prevents going any further).

Timing

Looking at the algorithm applied to the $m$ -vector $(k,k, \ldots, k)$ , we see its timing should be proportional to $k^2$ and--using a crude upper bound--not any worse than proportional to $m^3$ . By timing all calculations for $k=1$ through $7$ and $n=10$ through $30$ (writing $n$ for the number $m$ of distinct card values), and analyzing only those taking relatively long times ( $1/2$ second or longer), I estimate the computation time is approximately $O(k^2 n^{2.9})$ , supporting this upper-bound assessment.

One use of these asymptotics is to project calculation times for larger problems. For instance, seeing that the case $k=4, n=30$ takes about $1.31$ seconds, we would estimate that the (very interesting) case $k=1, n=100$ would take about $1.31(1/4)^2(100/30)^{2.9}\approx 2.7$ seconds. (It actually takes $2.87$ seconds.)
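
The projection is simple scaling arithmetic. As a sketch, assuming the fitted $O(k^2 n^{2.9})$ law and the $1.31$-second reference timing quoted above (the names t_ref and projected are illustrative):

```r
t_ref <- 1.31                                     # seconds for the reference case k = 4, n = 30
projected <- t_ref * (1/4)^2 * (100/30)^2.9       # rescale to k = 1, n = 100
print(round(projected, 1))
```

This gives roughly 2.7 seconds, close to the measured 2.87.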

— whuber

0

Hacked a simple Monte Carlo in Perl and found approximately $5.8329$ .

#!/usr/bin/perl

use strict;

my @deck = (1..13) x 4;

my $N = 100000; # Monte Carlo iterations.

my $mean = 0;

for (my $i = 1; $i <= $N; $i++) {
    my @d = @deck;
    fisher_yates_shuffle(\@d);
    my $last = 0;
    foreach my $c (@d) {
        if ($c == $last + 1) { $last = $c }
    }
    $mean += ($last + 1) / $N;
}

print $mean, "\n";

sub fisher_yates_shuffle {
    my $array = shift;
    my $i = @$array;
    while (--$i) {
        my $j = int rand($i + 1);
        @$array[$i, $j] = @$array[$j, $i];
    }
}

— Zen

Given the sharp discrepancy between this and all the previous answers, including two simulations and a theoretical (exact) one, I suspect you are interpreting the question in a different way. In the absence of any explanation on your part, we just have to take it as being wrong. (I suspect you may be counting one less, in which case your 4.8 should be compared to 5.83258...; but even then, your two significant digits of precision provide no additional insight into this problem.)

— whuber

1

Yep! There was an off-by-one mistake.

— Zen