
I'd kinda like to see which swear words appear most often in commit messages. I'm guessing that "shit" and "fuck" are much more common than "cocksucker" and "motherfucker", and if that's not true, I want to know which language has the most cocksuckers and motherfuckers.


Yeah, the pie chart doesn't quite cover it - I'd like to see both swear words per commit per language (if, say, Java has 10% of the swear words but 3% of the commits) and complexity of the swear words - a simple "Fuck" implies far less frustration than a "Motherfucking Cocksucker!"

Could develop quite a nice Programming Language Pain Index…
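A rough sketch of how that normalization could work, in Python: divide each language's share of the swear words by its share of the commits, so a ratio above 1 means the language swears more than its commit volume would predict. The function name and the counts are made up for illustration; they are not from the actual dataset.

```python
def swear_rate_ratio(swears_by_lang, commits_by_lang):
    """Return {language: share of swear words / share of commits}.

    Hypothetical helper: both arguments map a language name to a raw count.
    """
    total_swears = sum(swears_by_lang.values())
    total_commits = sum(commits_by_lang.values())
    ratios = {}
    for lang, commits in commits_by_lang.items():
        # Skip languages with no commits, and avoid dividing by zero totals.
        if commits == 0 or total_swears == 0:
            continue
        swear_share = swears_by_lang.get(lang, 0) / total_swears
        commit_share = commits / total_commits
        ratios[lang] = swear_share / commit_share
    return ratios


# Invented counts mirroring the "10% of the swear words but 3% of the commits" case.
swears = {"Java": 10, "Other": 90}
commits = {"Java": 3, "Other": 97}
ratios = swear_rate_ratio(swears, commits)
print(ratios)
```

With these invented numbers Java's ratio comes out a bit above 3, i.e. roughly three times as much swearing as its commit share alone would suggest.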


From what I remember, there were only one or two "motherfuckers".

I will post up some more data if anyone is interested.


I run a slang dictionary website which lets users assign an offensiveness score to each term. That would be an interesting bit of data to add: not only the raw word count, but how offensive the swearing is for each language.

(For sample data, the 100 most vulgar words on the site are in a table here: http://onlineslangdictionary.com/lists/most-vulgar-words/ )
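One way the offensiveness scores could be folded in, as a minimal Python sketch: take a count-weighted average of the scores over the words found in a language's commit messages. The 0-100 scale, the function name, and the placeholder words are all assumptions for illustration, not the site's actual data or API.

```python
def mean_offensiveness(word_counts, scores):
    """Count-weighted average offensiveness for one language.

    word_counts: {word: occurrences in commit messages}
    scores: {word: offensiveness score}, assumed here to be on a 0-100 scale
    Words without a score are ignored. Returns None if nothing was scored.
    """
    weighted_sum = 0.0
    counted = 0
    for word, count in word_counts.items():
        if word in scores:
            weighted_sum += count * scores[word]
            counted += count
    if counted == 0:
        return None
    return weighted_sum / counted


# Placeholder words and scores, invented for the example.
example_counts = {"darn": 3, "heck": 1, "unscored": 5}
example_scores = {"darn": 20, "heck": 10}
average = mean_offensiveness(example_counts, example_scores)
print(average)
```

Computing this per language alongside the raw word count would show whether a language's commits swear often, swear harshly, or both.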


If it's legal, it would be awesome of you to make this commit message dataset available on Infochimps or something.


I'd say alter the list of swear words to one more tuned to programming. Agreeing with the above, a word breakdown would be nice as well.

Actually, maybe just a GitHub swear browser.



