Hacker News

You are already thinking in the right direction if you understand how Hadoop works. But don't bother for now. No, really, don't!

Just be ready to profile and refactor your code when you do need to scale.

That said, you can improve the odds with practices that are almost cost-free, but will help a lot later on.

- Read Cal Henderson's book.

- The center of your design should be the data store, not a process. You transition the data store from state to state, securely and reliably, in small increments.

- Avoid globals and session state. The more "pure" your function is, the easier it will be to cache or partition.

- Don't make your data store too smart. Calculations and renderings should happen in a separate, asynchronous process.

- The data store should be able to handle lots of concurrent connections. Minimize locking. (Read about optimistic locking).

- Protect your algorithm from the implementation of the data store, with a helper class or module or whatever. But don't (DO NOT) try to build a framework for any conceivable query. Just the ones your algorithm needs.
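The "separate, asynchronous process" point above can be sketched as a job queue: the request handler only enqueues a small job record, and a worker does the expensive rendering later. Everything here (the job tuple, the uppercasing "render", the in-memory queue) is invented for illustration, and a thread stands in for what would really be a separate worker process:

```python
import queue
import threading

jobs = queue.Queue()   # stand-in for a real message queue
results = {}           # stand-in for the data store the worker writes back to

def worker():
    """Runs outside the request path; the web tier never does this work."""
    while True:
        job = jobs.get()
        if job is None:                  # shutdown sentinel
            break
        doc_id, text = job
        results[doc_id] = text.upper()   # stand-in for an expensive render
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

jobs.put((1, "hello"))   # the request handler returns immediately after this
jobs.join()              # (shown only so the example is deterministic)
jobs.put(None)
t.join()
```

The point is that the data store only ever sees small, dumb writes; the computation lives in a process you can scale, restart, or relocate independently.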
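A minimal sketch of the optimistic locking mentioned above, using a version column: instead of holding a lock across the read-modify-write, the UPDATE succeeds only if the row is still at the version we read. The `accounts` table and `deposit` function are invented for illustration:

```python
import sqlite3

# Hypothetical table with a version column for optimistic locking.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100, 0)")

def deposit(conn, account_id, amount):
    """Read-modify-write guarded by a version check instead of a row lock."""
    balance, version = conn.execute(
        "SELECT balance, version FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    # The UPDATE only matches if nobody else bumped the version in between.
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (balance + amount, account_id, version),
    )
    return cur.rowcount == 1  # False means a concurrent writer won; retry.

assert deposit(conn, 1, 50)
```

No lock is held between the SELECT and the UPDATE, so readers and writers never block each other; the rare loser of a race simply retries.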
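The "helper class, not a framework" idea might look like this: a thin wrapper that exposes only the queries the algorithm actually needs, so the storage engine can be swapped later without touching the algorithm. `UserStore`, its methods, and the schema are all hypothetical names for illustration:

```python
import sqlite3

class UserStore:
    """Exposes exactly the two operations the algorithm needs. No generic
    query builder, no ORM ambitions, nothing speculative."""

    def __init__(self, conn):
        self._conn = conn

    def add(self, email):
        cur = self._conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return cur.lastrowid

    def find_by_email(self, email):
        row = self._conn.execute(
            "SELECT id, email FROM users WHERE email = ?", (email,)
        ).fetchone()
        return None if row is None else {"id": row[0], "email": row[1]}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
store = UserStore(conn)
uid = store.add("a@example.com")
```

If you later move to a different store, only this one small module changes.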

Think this is obvious? Just the other day I heard of a project, staffed by so-called experts, that made every one of the mistakes I mentioned above. And in simulations, they cannot even keep up with the load they expect at launch.



> - Avoid globals and session state. The more "pure" your function is, the easier it will be to cache or partition.

I'd like to emphasize this point in particular. Shared state is a big, inefficient, centralized bureaucrat and is the enemy of horizontal scaling. Statelessness and decentralization are best-friends-forever (they each have a denormalized copy of different aspects of the same friendship bracelet!), and you should figure out as many ways as possible to exploit statelessness and minimize unnecessary shared state in your application.
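To make the cache-or-partition benefit concrete, here is a toy pure function memoized with Python's `functools.lru_cache`. Because its output depends only on its arguments, the same property that makes this in-process cache safe is what would let you put results in memcached or farm the calls out to any machine. The function and its output are invented for illustration:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def render_profile_snippet(user_id, name):
    """Pure: same arguments, same result, no globals or session touched."""
    return f"<div id='u{user_id}'>{name}</div>"

# A function that read mutable globals or session state could NOT be
# cached like this, because identical arguments might yield different
# output depending on hidden shared state.
```

Any box in the cluster can answer the call, and any cache layer can hold the result, with no coordination needed.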

If you have the time, learn statelessness-in-the-small as well: play with a strict functional programming language (for your own edification, not necessarily for implementing this particular project!), and you will learn how to become keenly aware of the flow of state in any program, and how to maximize statelessness in any language. This will improve anything you ever program again, and will form a powerful mental model of state that will carry over by analogy to statelessness-in-the-large.


OK, I am using the session to speed up performance: I keep around an object that is central to the user's workflow for just about every request, without going back to the db for it.

Are you suggesting that going back to the db each time is more scalable, or that I am better off using some kind of method level cache?

Honestly - just wondering - any good articles on what you are discussing? I guess I just don't quite 'get' it - not having worked on an application that scaled to the point that session access was the bottleneck.


> Are you suggesting that going back to the db each time is more scalable, or that I am better off using some kind of method level cache?

Yes and yes.

This is a really long and deep topic. There are all sorts of reasons why sessions are not a good idea. But let's stick to scalability.

If you're using your session as a sort of cache for objects, that's probably okay (although, consider using something designed for this, like memcached). The point is, you ought to be able to reconstruct all the objects you need from just the request parameters.
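A toy sketch of that "session as cache, not source of truth" discipline: every request carries enough (here, a `user_id` parameter) to rebuild the object from the authoritative store, and the cache is a pure optimization that can be emptied at any time without breaking a request. A plain dict stands in for both memcached and the database; all names are illustrative:

```python
cache = {}  # stand-in for memcached: disposable, shared-nothing-friendly

def load_user(user_id, db):
    """Rebuild the user object from the request parameter, using the
    cache only as an optimization over the authoritative store."""
    user = cache.get(user_id)
    if user is None:
        user = db[user_id]          # authoritative store
        cache[user_id] = user
    return user

db = {42: {"id": 42, "name": "alice"}}
assert load_user(42, db)["name"] == "alice"   # cache miss: fetched from db
cache.clear()                                  # cache wiped, e.g. server restart
assert load_user(42, db)["name"] == "alice"   # rebuilt from the request param
```

Because no request depends on sticky per-server state, any app server (or a fresh one) can handle any request, which is exactly what horizontal scaling needs.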

This is a pretty good article about it all.

"Session State is Evil" -- http://davidvancouvering.blogspot.com/2007/09/session-state-...


Yep, sessions are nice when your site is in the 1-5k visitors per day range, but don't seem to scale much beyond that. (coming from an RoR background here)



