Pooled variance combines the variances from different samples by taking their weighted average, to get the "overall" variance. The problem with your example is that it is a pathological case, since each of the sub-samples has variance equal to zero. Such a pathological case has very little in common with the data we usually encounter, since there is always some variability, and if there is no variability, we don't care about such variables since they carry no information. Note that this is a very simple method, and there are more complicated ways of estimating variance in hierarchical data structures that are not prone to such problems.
As for your example in the edit, it shows that it is important to clearly state your assumptions before starting the analysis. Let's say that you have $n$ data points in $k$ groups; we would denote them as $x_{1,1}, x_{2,1}, \dots, x_{n-1,k}, x_{n,k}$, where the index $i$ in $x_{i,j}$ stands for cases and the index $j$ stands for groups. There are several possible scenarios. You can assume that all the points come from the same distribution (for simplicity, let's assume a normal distribution),
$$x_{i,j} \sim \mathcal{N}(\mu, \sigma^2) \tag{1}$$
you can assume that each of the sub-samples has its own mean
$$x_{i,j} \sim \mathcal{N}(\mu_j, \sigma^2) \tag{2}$$
or, its own variance
$$x_{i,j} \sim \mathcal{N}(\mu, \sigma^2_j) \tag{3}$$
or that each of them has its own distinct parameters
$$x_{i,j} \sim \mathcal{N}(\mu_j, \sigma^2_j) \tag{4}$$
Depending on your assumptions, a particular method may or may not be adequate for analyzing the data.
In the first case, you wouldn't be interested in estimating the within-group variances, since you would assume that they are all the same. Nonetheless, if you aggregated the global variance from the group variances, you would get approximately the same result as from the pooled variance, since the definition of variance is
$$\mathrm{Var}(X) = \frac{1}{n-1} \sum_i (x_i - \mu)^2$$
and in the pooled estimator you first multiply each group variance by $n_j - 1$, then add the results together, and finally divide by the total degrees of freedom, which is $n_1 + n_2 - 2$ for two groups.
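As a minimal sketch of the weighted average described above, the pooled estimator can be written in a few lines of Python. The function name `pooled_variance` and the two small samples are made up for illustration, and the divisor is assumed to be the usual sum of per-group degrees of freedom, $\sum_j (n_j - 1)$.

```python
import numpy as np

def pooled_variance(groups):
    """Weighted average of per-group sample variances, with weights n_j - 1."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    # Squared deviations from each group's own mean, i.e. (n_j - 1) * s_j^2
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    # Total degrees of freedom: sum over groups of (n_j - 1)
    dof = sum(len(g) - 1 for g in groups)
    return ss_within / dof

# Hypothetical data, chosen only to keep the arithmetic easy to follow
a = [1.0, 2.0, 3.0, 4.0]   # sample variance 5/3
b = [2.0, 4.0, 6.0]        # sample variance 4

pooled = pooled_variance([a, b])

# The same value obtained explicitly as a weighted average of the group variances
weighted = (3 * np.var(a, ddof=1) + 2 * np.var(b, ddof=1)) / (3 + 2)
print(pooled, weighted)
```

Both routes give the same number, which is the point of the "multiply, add, divide" description: the pooled variance is just the group variances re-weighted by their degrees of freedom.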
In the second case, the means differ, but you have a common variance. This case is the closest to your example in the edit. In this scenario, the pooled variance would correctly estimate the common variance $\sigma^2$, whereas if you estimated the variance on the whole dataset, you would obtain incorrect results, since you would not be accounting for the fact that the groups have different means.
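A small simulation can make the second case concrete. The sketch below assumes three groups with different means and a shared standard deviation of 2; the group means, sample sizes, and random seed are arbitrary choices for illustration, not values taken from your example.

```python
import numpy as np

rng = np.random.default_rng(0)

sigma = 2.0                  # common standard deviation (assumed)
means = [0.0, 10.0, 20.0]    # group-specific means (assumed)
groups = [rng.normal(mu, sigma, size=500) for mu in means]

# Pooled variance: within-group squared deviations over total degrees of freedom
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
pooled = ss_within / sum(len(g) - 1 for g in groups)

# Variance of the whole dataset, ignoring the group structure
naive = np.var(np.concatenate(groups), ddof=1)

print(f"true common variance: {sigma ** 2:.2f}")
print(f"pooled estimate:      {pooled:.2f}")
print(f"whole-dataset value:  {naive:.2f}")
```

The pooled estimate should land close to $\sigma^2 = 4$, while the whole-dataset variance is inflated by the spread of the group means.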
In the third case, it doesn't make sense to estimate the "global" variance, since you assume that each of the groups has its own variance. You may still be interested in obtaining an estimate for the whole population, but in that case both (a) calculating the individual variances per group and (b) calculating the global variance from the whole dataset can give you misleading results. If you are dealing with this kind of data, you should think of using a more complicated model that accounts for the hierarchical nature of the data.
The fourth case is the most extreme and quite similar to the previous one. In this scenario, if you wanted to estimate the global mean and variance, you would need a different model and a different set of assumptions. In such a case, you would assume that your data has a hierarchical structure and that, besides the within-group means and variances, there is a higher-level common variance, for example by assuming the following model
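$$
\begin{aligned}
x_{i,j} &\sim \mathcal{N}(\mu_j, \sigma^2_j) \\
\mu_j &\sim \mathcal{N}(\mu_0, \sigma^2_0) \\
\sigma^2_j &\sim \mathcal{IG}(\alpha, \beta)
\end{aligned} \tag{5}
$$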
where each sample has its own mean and variance, $\mu_j$ and $\sigma^2_j$, which are themselves draws from common distributions. In such a case, you would use a hierarchical model that takes into consideration both the lower-level and upper-level variability. To read more about this kind of model, you can check the Bayesian Data Analysis book by Gelman et al. and their eight schools example. This is, however, a much more complicated model than the simple pooled variance estimator.
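To get a feel for the two levels of variability in model (5), it can help to simulate from it. The sketch below only generates data from the model and summarizes it; it does not fit a hierarchical model. All hyperparameter values ($\mu_0$, $\sigma_0$, $\alpha$, $\beta$), the number of groups, and the group size are assumptions picked for illustration, and $\beta$ is treated as the scale parameter of the inverse-gamma distribution.

```python
import numpy as np

rng = np.random.default_rng(1)

k, n = 2000, 50              # number of groups and cases per group (assumed)
mu0, sigma0 = 5.0, 3.0       # upper-level mean and standard deviation (assumed)
alpha, beta = 5.0, 8.0       # inverse-gamma shape and scale (assumed)

# Upper level: each group draws its own mean and variance
mu_j = rng.normal(mu0, sigma0, size=k)
# If G ~ Gamma(shape=alpha, scale=1/beta), then 1/G ~ InverseGamma(alpha, beta)
sigma2_j = 1.0 / rng.gamma(alpha, 1.0 / beta, size=k)

# Lower level: cases within each group, one row per group
x = rng.normal(mu_j[:, None], np.sqrt(sigma2_j)[:, None], size=(k, n))

within = x.var(axis=1, ddof=1).mean()    # average within-group variance
between = x.mean(axis=1).var(ddof=1)     # variance of the group means
total = x.var(ddof=1)                    # variance ignoring the groups

print(f"within-group: {within:.2f}")
print(f"between-group: {between:.2f}")
print(f"total: {total:.2f}")
```

With these settings the expected within-group variance is $\beta / (\alpha - 1) = 2$ and the upper-level variance is $\sigma_0^2 = 9$, so the total variance mixes both sources. This is the kind of structure that neither the pooled estimator nor the whole-dataset variance describes on its own.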