You mean something like http://www.voxforge.org/ ?

Seriously, there is no massive CC-licensed source of audio data out there. Most of the fancy speech recognition algorithms are already on GitHub. What isn't available is a massive and diverse dataset. I encourage others to reply if they have seen otherwise.