The use of superscripts you describe is, I believe, not very common in the machine learning literature. I'd have to review Ng's course notes to confirm, but if he uses superscripts that way, he may well be the origin of the proliferation of this notation. Either way, not to be too unkind, but I don't think many of the online course students are publishing machine learning papers, so this notation is not very common in the actual literature. After all, these are introductory courses in machine learning, not PhD-level courses.
What is very common is to use superscripts to denote the iteration of an algorithm. For example, you could write an iteration of Newton's method as
$$\theta^{(t+1)} = \theta^{(t)} - H(\theta^{(t)})^{-1} \nabla \theta^{(t)}$$
where $H(\theta^{(t)})$ is the Hessian and $\nabla \theta^{(t)}$ is the gradient.
(Yes, this is not quite the best way to implement Newton's method, because of the explicit inversion of the Hessian matrix.)
Here, $\theta^{(t)}$ represents the value of $\theta$ in the $t$-th iteration. This is the most common (but certainly not the only) use of superscripts that I am aware of.
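To make the iteration-index reading concrete, here is a minimal sketch of the update in the scalar case, where the Hessian reduces to the second derivative. The function name `newton_iterates` and the objective $f(\theta) = e^{\theta} - 2\theta$ are my own choices for illustration, not anything from a particular course or paper. The list entry `thetas[t]` plays the role of $\theta^{(t)}$.

```python
import math


def newton_iterates(grad, hess, theta0, num_iters=20, tol=1e-12):
    """Return [theta^(0), theta^(1), ...] for scalar Newton's method."""
    thetas = [theta0]  # thetas[t] corresponds to theta^(t)
    for t in range(num_iters):
        theta_t = thetas[t]
        # Scalar analogue of H(theta^(t))^{-1} times the gradient.
        step = grad(theta_t) / hess(theta_t)
        thetas.append(theta_t - step)  # this is theta^(t+1)
        if abs(step) < tol:
            break
    return thetas


# Hypothetical objective f(theta) = exp(theta) - 2 * theta, minimized at log(2).
grad = lambda th: math.exp(th) - 2.0  # f'(theta)
hess = lambda th: math.exp(th)        # f''(theta)

thetas = newton_iterates(grad, hess, theta0=0.0)
for t, theta_t in enumerate(thetas):
    print(f"theta^({t}) = {theta_t:.10f}")
```

In more than one dimension, one would usually solve the linear system $H(\theta^{(t)})\, d = \nabla \theta^{(t)}$ for the step $d$ (for example with `numpy.linalg.solve`) rather than form the inverse explicitly.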
EDIT:
To clarify, the original question appeared to suggest that in ML notation, $x^{(i)}$ is equivalent to the $x_i$ notation used in statistics. In my answer, I state that this is not truly prevalent in the ML literature. This is true. However, as pointed out by @amoeba, there is plenty of superscript notation for data in the ML literature, but in these cases $x^{(i)}$ does not typically mean the $i$-th observation of a single vector $x$.