Mining Twitter follower network of 36 million users

buza · on April 2, 2011

I'm curious about how these classifications were done. I graduated from the MIT Media Lab a few years back, and after looking through the result set titled:

"A small community of users associated with MIT Media lab detected using the algorithm after 7th iteration"

I saw a handful of MIT undergrads listed, many of whom I'm certain had no affiliation with the Media Lab, while I don't recognize nearly all of the others, even though I'm an active Twitter user. I chose two random entries from this list, 'RFheargaile' and 'realduedate', and found both to be of extremely dubious authenticity.

aksbhat · on April 2, 2011

I agree that these are not very high-confidence classifications. Due to the random nature of the algorithm, it is very hard to label a community correctly. Also, most users do not require explicit permission to be followed, which lets dubious/spam accounts show up among their followers. Since the data was collected in June 2009, it is possible that many users have since blocked spam profiles from following them.

I guess I need to clarify this point.

citricsquid · on April 2, 2011

Why is the "com" in the title of this post in capitals, when it links to lowercase .com and the site has no redirect to .COM in place?

aksbhat · on April 2, 2011

I made a mistake in typing the URL.

samratjp · on April 2, 2011

This is really cool stuff! Did you come across any interesting retweeters and spam rings?