Hacker News

You are already thinking in the right direction if you understand how Hadoop works. But don't bother for now. No, really, don't!

Just be ready to profile and refactor your code when you do need to scale.

That said, you can improve the odds with practices that are almost cost-free, but will help a lot later on.

- Read Cal Henderson's book.

- The center of your design should be the data store, not a process. You transition the data store from state to state, securely and reliably, in small increments.

- Avoid globals and session state. The more "pure" your function is, the easier it will be to cache or partition.

- Don't make your data store too smart. Calculations and renderings should happen in a separate, asynchronous process.

- The data store should be able to handle lots of concurrent connections. Minimize locking. (Read about optimistic locking).

- Protect your algorithm from the implementation of the data store, with a helper class or module or whatever. But don't (DO NOT) try to build a framework for any conceivable query. Just the ones your algorithm needs.
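The "separate, asynchronous process" point above can be sketched as a job queue: the request handler only enqueues a small job record, and a worker does the expensive rendering later. Everything here (the job tuple, the uppercasing "render", the in-memory queue) is invented for illustration, and a thread stands in for what would really be a separate worker process:

```python
import queue
import threading

jobs = queue.Queue()   # stand-in for a real message queue
results = {}           # stand-in for the data store the worker writes back to

def worker():
    """Runs outside the request path; the web tier never does this work."""
    while True:
        job = jobs.get()
        if job is None:                  # shutdown sentinel
            break
        doc_id, text = job
        results[doc_id] = text.upper()   # stand-in for an expensive render
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

jobs.put((1, "hello"))   # the request handler returns immediately after this
jobs.join()              # (shown only so the example is deterministic)
jobs.put(None)
t.join()
```

The point is that the data store only ever sees small, dumb writes; the computation lives in a process you can scale, restart, or relocate independently.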
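A minimal sketch of the optimistic locking mentioned above, using a version column: instead of holding a lock across the read-modify-write, the UPDATE succeeds only if the row is still at the version we read. The `accounts` table and `deposit` function are invented for illustration:

```python
import sqlite3

# Hypothetical table with a version column for optimistic locking.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100, 0)")

def deposit(conn, account_id, amount):
    """Read-modify-write guarded by a version check instead of a row lock."""
    balance, version = conn.execute(
        "SELECT balance, version FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    # The UPDATE only matches if nobody else bumped the version in between.
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (balance + amount, account_id, version),
    )
    return cur.rowcount == 1  # False means a concurrent writer won; retry.

assert deposit(conn, 1, 50)
```

No lock is held between the SELECT and the UPDATE, so readers and writers never block each other; the rare loser of a race simply retries.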
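The "helper class, not a framework" idea might look like this: a thin wrapper that exposes only the queries the algorithm actually needs, so the storage engine can be swapped later without touching the algorithm. `UserStore`, its methods, and the schema are all hypothetical names for illustration:

```python
import sqlite3

class UserStore:
    """Exposes exactly the two operations the algorithm needs. No generic
    query builder, no ORM ambitions, nothing speculative."""

    def __init__(self, conn):
        self._conn = conn

    def add(self, email):
        cur = self._conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return cur.lastrowid

    def find_by_email(self, email):
        row = self._conn.execute(
            "SELECT id, email FROM users WHERE email = ?", (email,)
        ).fetchone()
        return None if row is None else {"id": row[0], "email": row[1]}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
store = UserStore(conn)
uid = store.add("a@example.com")
```

If you later move to a different store, only this one small module changes.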

Think this is obvious? Just the other day I heard of a project, staffed by so-called experts, that made every one of the mistakes I mentioned above. And in simulations, they cannot even keep up with the load they expect at launch.



> - Avoid globals and session state. The more "pure" your function is, the easier it will be to cache or partition.

I'd like to emphasize this point in particular. Shared state is a big, inefficient, centralized bureaucrat and is the enemy of horizontal scaling. Statelessness and decentralization are best-friends-forever (they each have a denormalized copy of different aspects of the same friendship bracelet!), and you should figure out as many ways as possible to exploit statelessness and minimize unnecessary shared state in your application.
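To make the cache-or-partition benefit concrete, here is a toy pure function memoized with Python's `functools.lru_cache`. Because its output depends only on its arguments, the same property that makes this in-process cache safe is what would let you put results in memcached or farm the calls out to any machine. The function and its output are invented for illustration:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def render_profile_snippet(user_id, name):
    """Pure: same arguments, same result, no globals or session touched."""
    return f"<div id='u{user_id}'>{name}</div>"

# A function that read mutable globals or session state could NOT be
# cached like this, because identical arguments might yield different
# output depending on hidden shared state.
```

Any box in the cluster can answer the call, and any cache layer can hold the result, with no coordination needed.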

If you have the time, learn statelessness-in-the-small as well: play with a strict functional programming language (for your own edification, not necessarily for implementing this particular project!), and you will learn how to become keenly aware of the flow of state in any program, and how to maximize statelessness in any language. This will improve anything you ever program again, and will form a powerful mental model of state that will carry over by analogy to statelessness-in-the-large.


OK, I am using the session to speed up performance: I keep around an object that is central to the user's workflow for just about every request, without going back to the db for it.

Are you suggesting that going back to the db each time is more scalable, or that I am better off using some kind of method level cache?

Honestly - just wondering - any good articles on what you are discussing? I guess I just don't quite 'get' it - not having worked on an application that scaled to the point that session access was the bottleneck.


> Are you suggesting that going back to the db each time is more scalable, or that I am better off using some kind of method level cache?

Yes and yes.

This is a really long and deep topic. There are all sorts of reasons why sessions are not a good idea. But let's stick to scalability.

If you're using your session as a sort of cache for objects, that's probably okay (although, consider using something designed for this, like memcached). The point is, you ought to be able to reconstruct all the objects you need from just the request parameters.
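A toy sketch of that "session as cache, not source of truth" discipline: every request carries enough (here, a `user_id` parameter) to rebuild the object from the authoritative store, and the cache is a pure optimization that can be emptied at any time without breaking a request. A plain dict stands in for both memcached and the database; all names are illustrative:

```python
cache = {}  # stand-in for memcached: disposable, shared-nothing-friendly

def load_user(user_id, db):
    """Rebuild the user object from the request parameter, using the
    cache only as an optimization over the authoritative store."""
    user = cache.get(user_id)
    if user is None:
        user = db[user_id]          # authoritative store
        cache[user_id] = user
    return user

db = {42: {"id": 42, "name": "alice"}}
assert load_user(42, db)["name"] == "alice"   # cache miss: fetched from db
cache.clear()                                  # cache wiped, e.g. server restart
assert load_user(42, db)["name"] == "alice"   # rebuilt from the request param
```

Because no request depends on sticky per-server state, any app server (or a fresh one) can handle any request, which is exactly what horizontal scaling needs.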

This is a pretty good article about it all.

"Session State is Evil" -- http://davidvancouvering.blogspot.com/2007/09/session-state-...


Yep, sessions are nice when your site is in the 1-5k visitors per day range, but don't seem to scale much beyond that. (coming from an RoR background here)



