
if your data isn't relational to start with (e.g. messages in a message queue, being written to files in HDFS), you're better off with Map/Reduce

If you query the data once and then throw it away, then certainly load performance is a critical issue. If you load the data once and then query it many times, load performance is far less important -- trading a longer load time for much better query performance would be a good idea. So it depends on the workload as much as the format the data happens to start in.
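To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. The cost model (total time = load time + per-query time × number of queries) and all of the timings are assumptions made up for illustration, not measurements of any real system.

```python
import math


def total_cost(load_seconds, query_seconds, num_queries):
    """Total time spent under a simple linear cost model (an assumption)."""
    return load_seconds + query_seconds * num_queries


def break_even_queries(cheap_load, cheap_query, costly_load, costly_query):
    """Smallest query count at which the costly-load plan is strictly cheaper.

    Returns None if the costly-load plan never saves time per query.
    """
    extra_load = costly_load - cheap_load
    saved_per_query = cheap_query - costly_query
    if saved_per_query <= 0:
        return None
    return math.floor(extra_load / saved_per_query) + 1


# Hypothetical timings: scanning raw files needs no load step but takes
# 600 s per query; loading into an indexed store takes 3600 s up front
# and 5 s per query afterwards.
n = break_even_queries(cheap_load=0, cheap_query=600,
                       costly_load=3600, costly_query=5)
print(n)
```

Under these made-up numbers the slow load pays for itself by the seventh query. With a query-once workload it never does, which is the point about workload above.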



I think Hadoop and M/R seem to be best for:

"Take the data, transform it into another form ONCE, then query the transformed data many times using a low-latency scheme".

(E.g. building a web index or, in my case, building per-ad/per-user/per-publisher key/value pairs for ad targeting.)
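A toy, in-memory sketch of that pattern, to make the shape concrete. This is not the Hadoop API: the event fields, the `map_event`/`reduce_values`/`build_table` names, and the plain dict standing in for a low-latency key/value store are all hypothetical.

```python
from collections import defaultdict


def map_event(event):
    """Map step: emit (key, value) pairs for one raw event."""
    yield ("user:" + event["user"], event["category"])
    yield ("publisher:" + event["publisher"], event["category"])


def reduce_values(values):
    """Reduce step: count how often each value occurred for one key."""
    counts = defaultdict(int)
    for value in values:
        counts[value] += 1
    return dict(counts)


def build_table(events):
    """Run map, group by key (the 'shuffle'), then reduce. Done ONCE."""
    grouped = defaultdict(list)
    for event in events:
        for key, value in map_event(event):
            grouped[key].append(value)
    return {key: reduce_values(values) for key, values in grouped.items()}


# Hypothetical raw click events, e.g. as they might sit in log files.
events = [
    {"user": "alice", "publisher": "siteA", "category": "sports"},
    {"user": "alice", "publisher": "siteB", "category": "news"},
    {"user": "alice", "publisher": "siteA", "category": "sports"},
    {"user": "bob", "publisher": "siteA", "category": "news"},
]

# The expensive batch transform runs once...
table = build_table(events)

# ...and every later query is a cheap key lookup.
print(table["user:alice"])
print(table["publisher:siteA"])
```

In a real deployment the batch step would be the M/R job and the resulting table would be loaded into whatever serving store handles the lookups.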



