I'm working on a PhD in cognitive science. Something that I think relates to this is the idea of embodied cognition [0], and in particular off-line embodied cognition, where you use your body's sensorimotor mechanisms while thinking, even when you're not actually interacting with the environment. In this case, it would be your brain activating the same auditory processing areas it uses when sound enters your ear, even though you're generating the sounds inside your head while thinking.
[0] https://en.wikipedia.org/wiki/Embodied_cognition