Actually, I tried chasing this up. Xerox did overlapping windows first, and they performed abominably (everything was drawn using the painter's algorithm, back to front, so pixels that ended up covered got painted anyway). They then dropped overlapping windows as a performance optimization.
Then Andy Hertzfeld or Bill Atkinson implemented clipping in such a way that most drawing commands became no-ops when they fell outside a clipping region: if you set a clip rect and then call "fillrect" on an area outside it, nothing gets drawn, right? That allowed highly performant overlapping windows. There's a bit somewhere (folklore, Hackers? can't remember) where they showed this to Xerox folk, who couldn't believe how well it worked.
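To make the "fillrect becomes a no-op" idea concrete, here is a minimal sketch in Python. It is not the Lisa/Mac code; `Rect`, `Canvas`, `set_clip` and `fill_rect` are names made up for illustration, and a single rectangle stands in for whatever clip shape the real system used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    # Hypothetical rectangle type; right and bottom are exclusive.
    left: int
    top: int
    right: int
    bottom: int

    def intersect(self, other):
        """Return the overlapping Rect, or None if there is no overlap."""
        left = max(self.left, other.left)
        top = max(self.top, other.top)
        right = min(self.right, other.right)
        bottom = min(self.bottom, other.bottom)
        if left >= right or top >= bottom:
            return None
        return Rect(left, top, right, bottom)


class Canvas:
    """Toy framebuffer with one rectangular clip (an assumption of this sketch)."""

    def __init__(self, width, height):
        self.bounds = Rect(0, 0, width, height)
        self.pixels = [[0] * width for _ in range(height)]
        self.clip = self.bounds
        self.pixels_written = 0  # counts how much work was actually done

    def set_clip(self, rect):
        # Keep the clip inside the canvas; it may end up empty (None).
        self.clip = rect.intersect(self.bounds)

    def fill_rect(self, rect, color):
        """Fill only the part of rect inside the clip; return pixels touched."""
        visible = rect.intersect(self.clip) if self.clip else None
        if visible is None:
            return 0  # entirely clipped: the call is a no-op
        for y in range(visible.top, visible.bottom):
            row = self.pixels[y]
            for x in range(visible.left, visible.right):
                row[x] = color
        count = (visible.right - visible.left) * (visible.bottom - visible.top)
        self.pixels_written += count
        return count


canvas = Canvas(8, 8)
canvas.set_clip(Rect(0, 0, 4, 4))            # only the top-left quadrant is exposed
hidden = canvas.fill_rect(Rect(5, 5, 8, 8), 1)   # fully outside the clip
partial = canvas.fill_rect(Rect(2, 2, 6, 6), 1)  # trimmed to the clip
```

The point is that the cost of a draw call scales with the visible area rather than the requested area, which is the opposite of repainting every window back to front.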
Actually, Bill Atkinson saw the PARC demo along with Jobs, and he thought Xerox had come up with an efficient implementation for overlapping regions. This inspired him to work on his own implementation of a problem he had previously considered too difficult. Only after Atkinson had finished his implementation did he learn that PARC was using a brute-force technique. (The Alto was a more powerful and expensive computer than the Lisa and Macintosh, so it could get away with a less efficient implementation.)