*if your data isn't relational to start with (e.g. messages in a message queue, ... | Hacker News


neilc on April 16, 2009 | on: Parallel DBMSs faster than MapReduce

*if your data isn't relational to start with (e.g. messages in a message queue, being written to files in HDFS), you're better off with Map/Reduce*

If you query the data once and then throw it away, then certainly load performance is a critical issue. If you load the data once and then query it many times, load performance is far less important -- trading a longer load time for much better query performance would be a good idea. So it depends on the workload as much as on the format the data happens to start in.
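To put rough numbers on that trade-off, here is a minimal sketch of a break-even calculation. The cost figures and the function names (`total_cost`, `break_even_queries`) are made up for illustration and are not measurements of any real system; "A" stands for a fast-load, slow-query setup and "B" for a slow-load, fast-query one.

```python
import math


def total_cost(load_cost, query_cost, num_queries):
    """Total cost of loading the data once and querying it num_queries times."""
    return load_cost + query_cost * num_queries


def break_even_queries(load_a, query_a, load_b, query_b):
    """Smallest query count at which B (slow load, fast query) costs no more than A.

    Returns None if B's queries are not cheaper than A's, since B then never
    recovers any extra load cost.
    """
    if query_a <= query_b:
        return None
    # Solve load_a + query_a * n >= load_b + query_b * n for n.
    n = math.ceil((load_b - load_a) / (query_a - query_b))
    return max(0, n)


# Hypothetical costs in arbitrary units (e.g. minutes).
n = break_even_queries(load_a=10, query_a=50, load_b=200, query_b=5)
print(n)
```

With these made-up numbers a single query favors A (60 vs. 205 units), while from the fifth query onward B comes out ahead, which is the "depends on the workload" point in miniature.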

strlen on April 16, 2009

I think Hadoop and M/R seem to be best for:

"Take the data, transform it into another form ONCE, then query the transformed data many times using a low-latency scheme".

(E.g. building a web index or, in my case, building per-ad/per-user/per-publisher key/value pairs for ad targeting.)
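A toy, single-process sketch of that pattern, assuming a made-up click-event format. This is plain Python rather than Hadoop's API, and the names `map_event`, `reduce_counts`, and `build_index` are hypothetical. The batch step runs once; afterwards each query is just a dictionary lookup.

```python
from collections import defaultdict


def map_event(event):
    """Map step: emit one (key, 1) pair per dimension of a click event."""
    yield ("user", event["user"]), 1
    yield ("ad", event["ad"]), 1
    yield ("publisher", event["publisher"]), 1


def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def build_index(events):
    """Run the one-time transform: map every event, then reduce by key."""
    pairs = (pair for event in events for pair in map_event(event))
    return reduce_counts(pairs)


# Hypothetical input records; a real job would read these from files.
events = [
    {"user": "u1", "ad": "a1", "publisher": "p1"},
    {"user": "u1", "ad": "a2", "publisher": "p1"},
    {"user": "u2", "ad": "a1", "publisher": "p2"},
]

index = build_index(events)         # expensive batch step, done once
print(index[("user", "u1")])        # cheap lookup, repeated many times
print(index.get(("user", "u3"), 0)) # unknown keys fall back to 0
```

In a real deployment the reduced key/value pairs would presumably be loaded into some low-latency store rather than kept in an in-memory dict; the dict only stands in for that here.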
