My intuition is this:

(1) decision surfaces are always linearly separable with enough dimensions (see the sketch after this list)

(2) NNs have enough dimensions

(3) NNs' linear boundaries are coarse

(4) coarse boundaries in high dimensions are likely to approximate the low-loss true boundary (i.e., given (1)).
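To make (1) concrete, here is a minimal numpy sketch (my own illustration, not from any particular source): XOR is the classic dataset no line in 2D can separate, but adding one product dimension, phi(x1, x2) = (x1, x2, x1*x2), makes it separable by a hand-picked hyperplane.

    import numpy as np

    # XOR: not linearly separable in 2D
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0])

    # lift to 3D with one extra feature: x1 * x2
    phi = np.column_stack([X, X[:, 0] * X[:, 1]])

    # a hand-picked hyperplane now separates the lifted points:
    # x1 + x2 - 2*x1*x2 - 0.5 is negative for class 0, positive for class 1
    w, b = np.array([1.0, 1.0, -2.0]), -0.5
    pred = (phi @ w + b > 0).astype(int)
    print(pred, (pred == y).all())   # [0 1 1 0] True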

The idea behind (4) is just the linear regression idea: by (1), a straight line is a good approximation, and the noise is roughly Gaussian. With a coarse line, we do not fit to noise, and hence probably have a good approximation.
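A sketch of that point (my illustration; the degrees and noise level are arbitrary choices): fit the same noisy linear data with a degree-1 and a degree-12 polynomial and compare error against the true function on held-out points. The coarse model typically wins because it cannot chase the noise.

    import numpy as np

    rng = np.random.default_rng(0)
    x_train = np.linspace(0, 1, 20)
    x_test = np.linspace(0, 1, 200)
    f = lambda x: 3 * x + 1                                   # true (linear) relationship
    y_train = f(x_train) + rng.normal(0, 0.3, x_train.shape)  # Gaussian noise

    for deg in (1, 12):
        coeffs = np.polyfit(x_train, y_train, deg)
        test_err = np.mean((np.polyval(coeffs, x_test) - f(x_test)) ** 2)
        print(f"degree {deg:2d}: test MSE = {test_err:.4f}")
    # the coarse (degree-1) fit generalizes; the flexible fit chases the noise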

The phrase "neural network" disguises the obviousness of this reasoning: a NN is just high-dimensional piece-wise linear regression.
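One way to see the piece-wise linear claim directly (again a numpy sketch of my own, with an arbitrary 8-unit hidden layer): a 1D ReLU network evaluated on a dense grid exhibits only a handful of distinct slopes, one per linear piece.

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(8, 1)), rng.normal(size=8)   # 1 -> 8 hidden
    W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)   # 8 -> 1 output

    def mlp(x):
        # one hidden ReLU layer: f(x) = W2 @ relu(W1 @ x + b1) + b2
        h = np.maximum(0.0, W1 @ x[None, :] + b1[:, None])
        return (W2 @ h + b2)[0]

    xs = np.linspace(-5, 5, 100_001)
    slopes = np.diff(mlp(xs)) / np.diff(xs)
    print("distinct slopes:", len(np.unique(np.round(slopes, 4))))
    # 8 ReLU kinks cut the line into at most 9 linear pieces, so we see
    # at most ~9 distinct slopes (plus a few blended values where a grid
    # step straddles a kink) -- the network is literally piece-wise linear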

The only thing to be explained is why, in high dimensions, datasets end up nearly piece-wise linear.

That isn't so hard to explain.


