We simply translate the binary result of a qubit measurement into our guess of whether it's the first state or the second, calculate the probability of success for every possible measurement of the qubit, and then find the maximum of a function of two variables (on the two-sphere).
First, something that we won't really need: the precise description of the state. The full state of the system, which depends both on the superpositions as well as on a classical fair coin, may be encoded in the density matrix
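$$\rho = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + \frac{1}{2}\begin{pmatrix} \cos^2 x & \sin x\cos x \\ \sin x\cos x & \sin^2 x \end{pmatrix}$$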
where the left column and upper row correspond to the basis state "zero" and the remaining ones to "one". It's helpful to rewrite the density matrix in terms of the four-element basis of the $2\times 2$ matrices (the identity and the three Pauli matrices),
$$\rho = \frac{1}{2} + \frac{\sin x\cos x}{2}\,\sigma_x + \left(\frac{\cos^2 x-\sin^2 x}{4}+\frac{1}{4}\right)\sigma_z$$
That may be written in terms of the angle 2x:
$$\rho = \frac{1}{2} + \frac{\sin 2x}{4}\,\sigma_x + \frac{\cos 2x+1}{4}\,\sigma_z$$
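The rewriting in terms of the Pauli matrices can be checked numerically. The following is a minimal sketch in plain Python; the helper names `density_matrix`, `pauli_form` and `max_abs_diff` are illustrative and not part of the original argument. It builds the mixture directly from the two states and compares it, entry by entry, with the form written in terms of the angle 2x.

```python
import math

def density_matrix(x):
    """rho = 1/2 |0><0| + 1/2 |psi><psi| with |psi> = cos x |0> + sin x |1>."""
    c, s = math.cos(x), math.sin(x)
    return [[0.5 + 0.5 * c * c, 0.5 * s * c],
            [0.5 * s * c,       0.5 * s * s]]

def pauli_form(x):
    """rho = 1/2 + (sin 2x / 4) sigma_x + ((cos 2x + 1) / 4) sigma_z."""
    a_x = math.sin(2 * x) / 4
    a_z = (math.cos(2 * x) + 1) / 4
    # identity/2 + a_x * sigma_x + a_z * sigma_z, written out entry by entry
    return [[0.5 + a_z, a_x],
            [a_x,       0.5 - a_z]]

def max_abs_diff(a, b):
    """Largest entrywise difference between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

# The two forms should agree for any angle x
for x in (0.0, 0.3, 1.0, math.pi / 2, 2.5):
    assert max_abs_diff(density_matrix(x), pauli_form(x)) < 1e-12
```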
Now, regardless of the mixed state, this is still a two-level system and all measurements on the two-dimensional Hilbert space are either trivial (measurements of a c-number) or equivalent to the measurement of the spin along an axis, i.e. measurements of
$$V = \vec n\cdot\vec\sigma$$
which is a unit 3D vector multiplied by the vector of Pauli matrices. OK, what happens if we measure $V$? The eigenvalues of $V$ are plus one and minus one. The probability of each may be obtained from the expectation value of $V$, which is
$$\langle V\rangle = \mathrm{Tr}(V\rho)$$
The traces of products only contribute if $1$ meets $1$ (but we assume there is no term proportional to $1$ in $V$) or $\sigma_x$ meets $\sigma_x$ etc., in which cases the trace of the $2\times 2$ unit matrix gives an extra factor of 2. So we have
$$\langle V\rangle = \frac{\sin 2x}{2}\,n_x + \frac{\cos 2x+1}{2}\,n_z$$
We get the eigenvalue $\pm 1$ with the probabilities $(1\pm\langle V\rangle)/2$, respectively. Exactly when $\cos x=0$, the two initial "head and tail" states are orthogonal to one another (basically $|0\rangle$ and $|1\rangle$) and we may fully discriminate them. To make the outcome probabilities for each of the two states equal to 0 or 1, we must simply choose $\vec n=(0,0,\pm 1)$; note that the overall sign of $\vec n$ doesn't matter for the procedure.
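The trace formula and the resulting outcome probabilities can be verified with a short script. This is a sketch under the assumption that the Pauli matrices are stored as nested Python lists; the function names (`rho`, `spin_operator`, `expectation`, `closed_form`, `outcome_probabilities`) are illustrative only. Note that these are the probabilities for the mixed state; the 0 and 1 probabilities mentioned for $\cos x=0$ refer to each of the two states separately.

```python
import math

# Pauli matrices as nested lists (sigma_y needs complex entries)
SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]
SIGMA_Z = [[1, 0], [0, -1]]

def rho(x):
    """Equal mixture of |0> and cos x |0> + sin x |1>."""
    c, s = math.cos(x), math.sin(x)
    return [[0.5 + 0.5 * c * c, 0.5 * s * c],
            [0.5 * s * c,       0.5 * s * s]]

def spin_operator(n):
    """V = n . sigma for a unit vector n = (nx, ny, nz)."""
    nx, ny, nz = n
    return [[nx * SIGMA_X[i][j] + ny * SIGMA_Y[i][j] + nz * SIGMA_Z[i][j]
             for j in range(2)] for i in range(2)]

def expectation(n, x):
    """<V> = Tr(V rho), computed as sum_ij V_ij rho_ji."""
    V, r = spin_operator(n), rho(x)
    total = sum(V[i][j] * r[j][i] for i in range(2) for j in range(2))
    return complex(total).real

def closed_form(n, x):
    """<V> = (sin 2x / 2) nx + ((cos 2x + 1) / 2) nz."""
    nx, _, nz = n
    return math.sin(2 * x) / 2 * nx + (math.cos(2 * x) + 1) / 2 * nz

def outcome_probabilities(n, x):
    """Return (P(V=+1), P(V=-1)) for the mixed state."""
    v = expectation(n, x)
    return (1 + v) / 2, (1 - v) / 2

# Unit vectors, one of them with a nonzero y-component
for n in [(0.6, 0.0, 0.8), (0.48, 0.6, 0.64), (0.0, 0.0, 1.0)]:
    for x in (0.0, 0.4, 1.3, 2.9):
        assert abs(expectation(n, x) - closed_form(n, x)) < 1e-12
        p_plus, p_minus = outcome_probabilities(n, x)
        assert -1e-12 <= p_plus <= 1 + 1e-12
        assert abs(p_plus + p_minus - 1.0) < 1e-12
```

The component $n_y$ drops out because the density matrix is real and symmetric, which is why only $n_x$ and $n_z$ appear in the closed form.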
Now, for $\cos x\neq 0$, the states are non-orthogonal, i.e. "not mutually exclusive" in the quantum sense, and we can't measure directly whether the coin was tails or heads because those possibilities were mixed in the density matrix. In fact, the density matrix contains all probabilities of all measurements, so if we could get the same density matrix by a different mixture of possible states from coin tosses, the states of the qubit would be strictly indistinguishable.
Our probability of success will be below 100% if $\cos x\neq 0$. But the only meaningful way to use the classical bit $V=\pm 1$ from the measurement is to directly translate it to our guess about the initial state. Without loss of generality, our translation may be chosen to be
$$(V=+1)\to |i\rangle = |0\rangle$$
and
$$(V=-1)\to |i\rangle = \cos x\,|0\rangle + \sin x\,|1\rangle.$$
If we wanted the opposite, cross-identification of the heads-tails and the signs of $V$, we could simply achieve it by flipping the overall sign, $\vec n\to -\vec n$.
Let's call the first simple initial state "heads" (the zero) and the second harder one "tails" (the cosine-sine superposition). The probability of success is, given our translation from +1 to heads and −1 to tails,
$$P_{\text{success}} = P(H)\,P(+1|H) + P(T)\,P(-1|T).$$
Because it's a fair coin, the two prior probabilities included above are $P(H)=P(T)=1/2$. The most difficult calculation among the four probabilities is $P(-1|T)$. But we have already made a harder calculation above: it was $(1-\langle V\rangle)/2$. For the pure "tails" state, we just omit the constant term in the coefficient of $n_z$ and multiply $\langle V\rangle$ by two:
$$P(-1|T) = \frac{1}{2} - \frac{\sin 2x\; n_x}{2} - \frac{\cos 2x\; n_z}{2}$$
The result for "heads" is simply obtained by setting $x=0$ because the "heads" state equals the "tails" state with $x=0$ substituted. So
$$P(-1|H) = \frac{1-n_z}{2}$$
and the complementary $1-P$ probability is
$$P(+1|H) = \frac{1+n_z}{2}$$
Substitute those results into our "success probability" to get
$$P_{\text{success}} = \frac{1+n_z+1-(\sin 2x)\,n_x-(\cos 2x)\,n_z}{4}$$
or
$$P_{\text{success}} = \frac{1}{2} - \frac{n_x}{4}\sin 2x + \frac{n_z}{4}\,(1-\cos 2x)$$
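This formula can be cross-checked against a direct application of the Born rule. The sketch below assumes a measurement axis in the xz-plane, parametrized as n = (sin θ, 0, cos θ), and uses the standard eigenvectors of the spin along such an axis; the parameter `theta` and the function names are illustrative and not part of the derivation above.

```python
import math

def success_from_born_rule(x, theta):
    """Guess heads on V=+1 and tails on V=-1 for the axis n = (sin theta, 0, cos theta)."""
    # Eigenvectors of n . sigma for an axis in the xz-plane
    plus = (math.cos(theta / 2), math.sin(theta / 2))    # eigenvalue +1
    minus = (-math.sin(theta / 2), math.cos(theta / 2))  # eigenvalue -1
    heads = (1.0, 0.0)                                   # |0>
    tails = (math.cos(x), math.sin(x))                   # cos x |0> + sin x |1>
    # Born rule: squared overlaps (all amplitudes are real here)
    p_plus_given_heads = (plus[0] * heads[0] + plus[1] * heads[1]) ** 2
    p_minus_given_tails = (minus[0] * tails[0] + minus[1] * tails[1]) ** 2
    # Fair coin: each state is prepared with probability 1/2
    return 0.5 * p_plus_given_heads + 0.5 * p_minus_given_tails

def success_closed_form(x, theta):
    """P = 1/2 - (nx/4) sin 2x + (nz/4) (1 - cos 2x)."""
    nx, nz = math.sin(theta), math.cos(theta)
    return 0.5 - nx / 4 * math.sin(2 * x) + nz / 4 * (1 - math.cos(2 * x))

for x in (0.0, 0.3, 1.2, 2.0):
    for theta in (0.0, 0.5, 1.7, 3.0, 4.4):
        assert abs(success_from_born_rule(x, theta) - success_closed_form(x, theta)) < 1e-12
```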
If we define $(n_x,n_z)=(-\cos\alpha,-\sin\alpha)$, we may also write it as
$$P_{\text{success}} = \frac{1}{2} + \frac{\sin(2x+\alpha)-\sin\alpha}{4} = \frac{1}{2} + \frac{\sin x\,\cos(x+\alpha)}{2}$$
We want to maximize that over $\alpha$. Clearly, the maximum is at $\cos(x+\alpha)=\pm 1$, where the sign agrees with that of $\sin x$, i.e. $\alpha=-x$ or $\alpha=\pi-x$, and the value at this maximum is
$$P_{\text{success}} = \frac{1+|\sin x|}{2}$$
which lies between 50% and 100%.
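The maximization can also be done by brute force as a check. The following sketch scans α on a uniform grid (the grid size `steps` is an arbitrary choice) and compares the best value found with the closed-form maximum; it is only a numerical sanity check, not a replacement for the argument above.

```python
import math

def p_success(x, alpha):
    """Success probability for (nx, nz) = (-cos alpha, -sin alpha)."""
    nx, nz = -math.cos(alpha), -math.sin(alpha)
    return 0.5 - nx / 4 * math.sin(2 * x) + nz / 4 * (1 - math.cos(2 * x))

def best_alpha_on_grid(x, steps=20000):
    """Grid point in [0, 2*pi) that maximizes the success probability."""
    alphas = [2 * math.pi * k / steps for k in range(steps)]
    return max(alphas, key=lambda a: p_success(x, a))

for x in (0.2, 1.0, -0.7, 2.0):
    alpha = best_alpha_on_grid(x)
    # The maximum should match (1 + |sin x|) / 2 up to the grid resolution
    assert abs(p_success(x, alpha) - (1 + abs(math.sin(x))) / 2) < 1e-6
    # At the optimum, cos(x + alpha) should be +1 or -1 with the sign of sin x
    sign = 1.0 if math.sin(x) > 0 else -1.0
    assert abs(math.cos(x + alpha) - sign) < 1e-6
```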
That's a nice measurement which is really quantum mechanical. We use a different measurement than that of $\sigma_z$, i.e. the classical measurement of the bit. Instead, we measure the spin along the axis in the $xz$-plane that is defined by the same nonzero angle as the angle $x$ at the beginning, with some correct signs and shifts by multiples of $\pi/2$. Note that if you simply measured $\sigma_z$, the classical bit, the success rate would be just $(3-\cos 2x)/4$, also between 50% and 100%, but smaller than our result. In particular, for a small $x=0+\epsilon$, our optimal result would be Taylor-expanded as $1/2+|x|/2$ while the non-optimal result using the classical measurement would increase above $1/2$ more slowly, as $1/2+x^2/2$.
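To see the gap between the two strategies concretely, one may tabulate both success rates. This is a small illustrative sketch; the function names `optimal` and `sigma_z_only` and the sample angles are arbitrary choices.

```python
import math

def optimal(x):
    """Success rate of the optimal measurement, (1 + |sin x|) / 2."""
    return (1 + abs(math.sin(x))) / 2

def sigma_z_only(x):
    """Success rate when simply measuring sigma_z, (3 - cos 2x) / 4."""
    return (3 - math.cos(2 * x)) / 4

for x in (0.01, 0.1, 0.5, 1.0, 1.5):
    # The classical measurement never beats the optimal one
    assert sigma_z_only(x) <= optimal(x) + 1e-15
    print(f"x = {x:4.2f}   optimal = {optimal(x):.6f}   sigma_z = {sigma_z_only(x):.6f}")

# For small x the excess over 1/2 is roughly x/2 versus x^2/2
x = 0.01
assert abs((optimal(x) - 0.5) - x / 2) < 1e-6
assert abs((sigma_z_only(x) - 0.5) - x * x / 2) < 1e-8
```

The two rates coincide only when $|\sin x|$ equals 0 or 1, i.e. when the states are identical up to a sign or orthogonal.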
For many hours, a wrong answer (a mistake in the final portions) was posted here, despite the fact that I had previously fixed many wrong factors of two. I posted a slightly edited version of this answer on my weblog where some discussion may take place:
The Reference Frame: A fun simple problem in quantum computing
On that page, I also write the eigenstates of the measured operator in the appendix. The arguments in the angles may be surprising for some folks who think that this problem is obvious in terms of the wave functions or that the wave functions after the measurement have to be simple.