I am going to try to elaborate on what William suggested.
Let $\Omega$ be the sample space of tossing a fair coin twice. Define the random variable $\xi$ to be the number of heads that occur in the experiment. Clearly, $E[\xi]=1$. One way of thinking about what $1$, as an expected value, represents is as the best possible estimate for $\xi$: if we had to guess what value $\xi$ would take, we would guess $1$. This is because $E[(\xi-1)^2]\le E[(\xi-a)^2]$ for any real number $a$.
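To make the "best guess" claim concrete, here is a small sketch that enumerates the four outcomes and compares the mean squared error of a few constant guesses. It assumes a fair coin (each outcome has probability 1/4); the helper names `expectation` and `mse` and the grid `candidates` are mine, chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space of two tosses; assuming a fair coin, each outcome has probability 1/4.
omega = ["".join(t) for t in product("HT", repeat=2)]
prob = {w: Fraction(1, 4) for w in omega}

def xi(w):
    """Number of heads in the outcome w."""
    return w.count("H")

def expectation(f):
    """E[f] computed by summing over the finite sample space."""
    return sum(prob[w] * f(w) for w in omega)

def mse(a):
    """Mean squared error E[(xi - a)^2] of the constant guess a."""
    return expectation(lambda w: (xi(w) - a) ** 2)

mean_xi = expectation(xi)

# Illustrative grid of constant guesses 0, 1/4, ..., 2 (not an exhaustive search).
candidates = [Fraction(k, 4) for k in range(9)]
best_guess = min(candidates, key=mse)

print(mean_xi, best_guess, mse(best_guess))
```

On this grid the minimizer coincides with $E[\xi]$, which is what the inequality above says for every real $a$.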
Denote by $A=\{HT,HH\}$ the event that the first toss is a head. Let $G=\{\emptyset,A,A^c,\Omega\}$ be the $\sigma$-algebra generated by $A$. We think of $G$ as representing what we know after the first toss. After the first toss, either a head occurred or it did not. Hence, after the first toss, we are either in the event $A$ or in the event $A^c$.
If we are in the event $A$, then the best possible estimate for $\xi$ would be $E[\xi\mid A]=1.5$, and if we are in the event $A^c$, then the best possible estimate for $\xi$ would be $E[\xi\mid A^c]=0.5$.
Now define the random variable $\eta(\omega)$ to be $1.5$ if $\omega\in A$ and $0.5$ if $\omega\in A^c$. This random variable $\eta$ is a better approximation than $1=E[\xi]$, since $E[(\xi-\eta)^2]\le E[(\xi-1)^2]$ (here the two sides are $1/4$ and $1/2$).
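The two conditional expectations and the comparison of the two errors can be checked by direct enumeration. This is only a sketch under the assumption of a fair coin; `cond_exp` and `mse` are hypothetical helper names, not anything standard.

```python
from fractions import Fraction

omega = ["HH", "HT", "TH", "TT"]
prob = {w: Fraction(1, 4) for w in omega}  # fair coin assumed
A = {"HH", "HT"}                           # first toss is a head
A_c = set(omega) - A                       # first toss is a tail

def xi(w):
    """Number of heads in the outcome w."""
    return w.count("H")

def cond_exp(f, event):
    """E[f | event] = E[f * 1_event] / P(event), for an event of positive probability."""
    p_event = sum(prob[w] for w in event)
    return sum(prob[w] * f(w) for w in event) / p_event

e_A = cond_exp(xi, A)
e_Ac = cond_exp(xi, A_c)

def eta(w):
    """The estimate of xi that only looks at whether the first toss is a head."""
    return e_A if w in A else e_Ac

def mse(estimate):
    """E[(xi - estimate)^2] for an estimate given as a function on omega."""
    return sum(prob[w] * (xi(w) - estimate(w)) ** 2 for w in omega)

mse_eta = mse(eta)
mse_const = mse(lambda w: Fraction(1))  # the constant guess E[xi] = 1

print(e_A, e_Ac, mse_eta, mse_const)
```

Exact fractions are used instead of floats so that the comparison of the two errors is not blurred by rounding.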
What $\eta$ does is answer the question: what is the best estimate of $\xi$ after the first toss? Since we do not know the outcome of the first toss in advance, $\eta$ depends on whether $A$ occurs. Once the information in $G$ is revealed to us, after the first toss, the value of $\eta$ is determined and provides the best possible estimate for $\xi$.
The problem with using $\xi$ as its own estimate, i.e. $0=E[(\xi-\xi)^2]\le E[(\xi-\eta)^2]$, is as follows: $\xi$ is not well-defined after the first toss. Say the outcome of the experiment is $\omega$, with the first toss being a head. We are in the event $A$, but what is $\xi(\omega)$? We cannot tell from the first toss alone; the value is ambiguous to us, and so $\xi$ is not well-defined. More formally, we say that $\xi$ is not $G$-measurable, i.e. its value is not determined by what we know after the first toss. Thus, $\eta$ is the best possible $G$-measurable estimate of $\xi$, that is, the best estimate available after the first toss.
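On a finite sample space, measurability with respect to the $\sigma$-algebra generated by $A$ just means being constant on $A$ and on $A^c$, so the claim can be checked mechanically. A rough sketch, where `is_measurable` is a name I made up and the test it performs is only valid for a $\sigma$-algebra generated by a finite partition:

```python
omega = ["HH", "HT", "TH", "TT"]
A = {"HH", "HT"}              # first toss is a head
atoms = [A, set(omega) - A]   # the partition generating G = {∅, A, A^c, Ω}

def is_measurable(f, atoms):
    """For a sigma-algebra generated by a finite partition, f is measurable
    exactly when it takes a single value on each atom of the partition."""
    return all(len({f(w) for w in atom}) == 1 for atom in atoms)

def xi(w):
    """Number of heads in both tosses."""
    return w.count("H")

def eta(w):
    """Conditional expectation of xi given the first toss."""
    return 1.5 if w in A else 0.5

xi_is_measurable = is_measurable(xi, atoms)
eta_is_measurable = is_measurable(eta, atoms)

print(xi_is_measurable, eta_is_measurable)
```

Here $\xi$ fails the check because it takes both values $2$ and $1$ on $A$, while $\eta$ passes since it is constant on each atom.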
Perhaps somebody here can come up with a more sophisticated example using the sample space $[0,1]$, with $\xi(\omega)=\omega$ and $G$ some non-trivial $\sigma$-algebra.
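As a first step in that direction, here is one simple instance, sketched under assumptions of my own: $P$ is Lebesgue measure on $[0,1]$, and $G=\{\emptyset,B,B^c,[0,1]\}$ is generated by the single event $B=[0,1/2)$. It is the direct analogue of the coin example rather than a genuinely more sophisticated one.

$$
\begin{aligned}
E[\xi\mid B] &= \frac{1}{P(B)}\int_0^{1/2}\omega\,d\omega = \frac{1/8}{1/2} = \frac14, \\
E[\xi\mid B^c] &= \frac{1}{P(B^c)}\int_{1/2}^{1}\omega\,d\omega = \frac{3/8}{1/2} = \frac34, \\
\eta(\omega) &= \tfrac14\,\mathbf{1}_{B}(\omega) + \tfrac34\,\mathbf{1}_{B^c}(\omega), \\
E[(\xi-\eta)^2] &= \int_0^{1/2}\left(\omega-\tfrac14\right)^2 d\omega + \int_{1/2}^{1}\left(\omega-\tfrac34\right)^2 d\omega = \frac{1}{48} \le \frac{1}{12} = E\!\left[\left(\xi-\tfrac12\right)^2\right].
\end{aligned}
$$

So knowing only which half of $[0,1]$ the point $\omega$ lies in already improves on the constant guess $E[\xi]=1/2$, just as knowing the first toss did above.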