You'd be correct. English is by far the largest language in Common Crawl (aka the "whole open internet" training corpus), at 43%. No other language even reaches double-digit percentages: the next biggest is Russian at 6%, followed by German at 5%.
And lots of people write on the web using English as a second language, which both reduces the presence of their native language and increases the presence of English.