Heuristically, the probability distribution on $\{x_1, x_2, \dots, x_n\}$ with maximum entropy turns out to be the one that corresponds to the least amount of knowledge about $\{x_1, x_2, \dots, x_n\}$, in other words the uniform distribution.
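As a quick numerical illustration (a minimal Python sketch; the three example distributions are arbitrary choices, and the natural logarithm is assumed throughout):

```python
from math import log

def entropy(p):
    """Shannon entropy -sum(p_i * log(p_i)), natural log, skipping zero terms."""
    return -sum(pi * log(pi) for pi in p if pi > 0)

# Three distributions on n = 4 points; the uniform one has the largest entropy.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # log(4) ~ 1.3863
print(entropy([0.4, 0.3, 0.2, 0.1]))      # ~ 1.2799
print(entropy([0.7, 0.1, 0.1, 0.1]))      # ~ 0.9404
```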
Now, for a more formal proof consider the following:
A probability distribution on $\{x_1, \dots, x_n\}$ is a set of nonnegative real numbers $p_1, \dots, p_n$ that add up to $1$. Entropy is a continuous function of the $n$-tuple $(p_1, \dots, p_n)$, and these points lie in a compact subset of $\mathbb{R}^n$, so there is an $n$-tuple where entropy is maximized. We want to show this occurs at $(1/n, \dots, 1/n)$ and nowhere else.
Suppose the $p_j$ are not all equal, say $p_1 < p_2$. (Clearly $n \neq 1$.) We will find a new probability distribution with higher entropy. Since entropy is maximized at some $n$-tuple, it then follows that entropy is uniquely maximized at the $n$-tuple with $p_i = 1/n$ for all $i$.
Since $p_1 < p_2$, for small positive $\varepsilon$ we have $p_1 + \varepsilon < p_2 - \varepsilon$. The entropy of $\{p_1 + \varepsilon, p_2 - \varepsilon, p_3, \dots, p_n\}$ minus the entropy of $\{p_1, p_2, p_3, \dots, p_n\}$ equals
$$-p_1 \log\left(\frac{p_1 + \varepsilon}{p_1}\right) - \varepsilon \log(p_1 + \varepsilon) - p_2 \log\left(\frac{p_2 - \varepsilon}{p_2}\right) + \varepsilon \log(p_2 - \varepsilon).$$
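To see where this expression comes from, note that only the first two terms of the entropy change, and each pair of terms regroups via
$$-(p_1 + \varepsilon)\log(p_1 + \varepsilon) + p_1 \log p_1 = -p_1 \log\left(\frac{p_1 + \varepsilon}{p_1}\right) - \varepsilon \log(p_1 + \varepsilon),$$
and similarly for the $p_2$ terms with $\varepsilon$ replaced by $-\varepsilon$.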
To complete the proof, we want to show this is positive for small enough $\varepsilon$. Rewrite the above expression as
$$-p_1 \log\left(1 + \frac{\varepsilon}{p_1}\right) - \varepsilon\left(\log p_1 + \log\left(1 + \frac{\varepsilon}{p_1}\right)\right) - p_2 \log\left(1 - \frac{\varepsilon}{p_2}\right) + \varepsilon\left(\log p_2 + \log\left(1 - \frac{\varepsilon}{p_2}\right)\right).$$
Recalling that $\log(1+x) = x + O(x^2)$ for small $x$, the above expression is
$$-\varepsilon - \varepsilon \log p_1 + \varepsilon + \varepsilon \log p_2 + O(\varepsilon^2) = \varepsilon \log\left(\frac{p_2}{p_1}\right) + O(\varepsilon^2),$$
which is positive when $\varepsilon$ is small enough, since $p_1 < p_2$.
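A quick numerical check of this expansion (a sketch; the distribution and $\varepsilon$ below are arbitrary choices):

```python
from math import log

def entropy(p):
    return -sum(pi * log(pi) for pi in p if pi > 0)

p = [0.2, 0.5, 0.3]                 # p1 < p2
eps = 1e-3
q = [p[0] + eps, p[1] - eps, p[2]]  # move eps of mass from p2 to p1

print(entropy(q) - entropy(p))      # ~ 9.13e-4, positive
print(eps * log(p[1] / p[0]))       # leading term eps*log(p2/p1) ~ 9.16e-4
```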
A less rigorous proof is the following:
Consider first the following lemma (Gibbs' inequality):
Let $p(x)$ and $q(x)$ be continuous probability density functions on an interval $I$ in the real numbers, with $p \geq 0$ and $q > 0$ on $I$. We have
$$-\int_I p \log p \, dx \leq -\int_I p \log q \, dx$$
if both integrals exist. Moreover, there is equality if and only if $p(x) = q(x)$ for all $x$.
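The lemma is stated for densities on an interval, but its discrete analogue (with sums in place of integrals) holds as well, and that is the version used below. A quick randomized check of the discrete version (a sketch, with arbitrary trial parameters):

```python
import random
from math import log

def normalize(w):
    s = sum(w)
    return [x / s for x in w]

random.seed(0)
for _ in range(1000):
    n = 6
    p = normalize([random.random() for _ in range(n)])  # arbitrary distribution
    q = normalize([random.random() for _ in range(n)])  # arbitrary distribution, q > 0
    lhs = -sum(pi * log(pi) for pi in p if pi > 0)      # -sum p log p
    rhs = -sum(pi * log(qi) for pi, qi in zip(p, q))    # -sum p log q
    assert lhs <= rhs + 1e-12
print("inequality held on all trials")
```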
Now, let $p$ be any probability distribution on $\{x_1, \dots, x_n\}$, with $p_i = p(x_i)$. Letting $q_i = 1/n$ for all $i$,
$$-\sum_{i=1}^{n} p_i \log q_i = \sum_{i=1}^{n} p_i \log n = \log n,$$
which is the entropy of $q$. Therefore the discrete form of our lemma gives $h(p) \leq -\sum_{i=1}^{n} p_i \log q_i = \log n = h(q)$, with equality if and only if $p$ is uniform.
Wikipedia also has a brief discussion of this result.