
RAIDing together multiple EBS volumes feels like a massive hack to me. I can't help but wonder if this compounds the problem at Amazon's end. If EBS performance is a problem, Amazon need to fix it. For example, if some way of tying together multiple EBS volumes is a reasonable way of working around the problem, then why aren't Amazon providing "high performance" EBS volumes which do that under the hood?

If I were faced with EBS performance issues, I would see this as a big red flag, consider EBS unsuitable for the application and avoid it, rather than carrying on with such a workaround.



One other huge downside of raiding EBS volumes is you can't use EBS's snapshotting features as you cannot guarantee a perfect sync (you could use LVM yourself however).

Honestly, since EBS vols are supposedly not tied to a single disk, the raiding should be done on Amazon's end. That it isn't is telling.


You have to snapshot at the system level anyway if you want a consistent snapshot: otherwise the filesystem (or your database) could have been reordering and delaying writes that end up not being part of the "consistent snapshot". This is simply not a RAID-specific issue, nor is it a problem with EBS (as it is generally easy to use LVM, xfs, and/or PostgreSQL to handle that part of the job).


This is something I've never quite understood. Best practice guides say you need to do a "flush all tables" in MySQL and then do a filesystem freeze (possible in XFS) before you can use a snapshot system like the ones built into EBS or LVM. If you don't, you apparently stand a good chance of getting an inconsistent snapshot, even if the snapshotting mechanism itself is (like EBS and LVM) "point in time" consistent.

Why is all this necessary? If the whole stack (i.e. DB + FS + block device) is working as it should, then once a commit returns, the data should be on disk. If it's not, you have no guarantee that data you thought was committed will still be there after a kernel panic or power outage.

In that case, no amount of xfs-freeze or table flushing during a snapshot is going to save you from the fact that your DB is one kernel panic away from losing what the rest of your system believed were committed transactions.
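The durability contract described here can be sketched in a few lines. This is only a minimal illustration in Python, not how MySQL or any particular database implements commits; the function name `durable_append` and the log file are made up for the example. The idea is that a commit is acknowledged only after `fsync` has returned.

```python
import os
import tempfile


def durable_append(path, record):
    """Append a record and return only after the OS reports it flushed to the device."""
    # O_APPEND makes every write land at the current end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, record)
        # Without this call the data may still sit in the page cache,
        # where a kernel panic or power loss would discard it.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical log file in a scratch directory.
    log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
    durable_append(log_path, b"txn 1 committed\n")
    with open(log_path, "rb") as f:
        print(f.read())
```

If every layer honors that call, a point-in-time snapshot of the block device should look like the disk after a sudden power loss, which the database's normal crash recovery is expected to handle.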


In the specific case of a database server that actually has correct fsync semantics that the user has not disabled for some crazy performance reason, you are correct. However, there are many use cases that people want consistent snapshots across, like "apt-get install", that do not use a write barrier for every atomic-feeling operation.

(In fact, with a good database solution, like PostgreSQL, the RAID issue of the parent post is also solved: put your write-ahead or checkpoint logs on a single device, as its linear writes will easily swamp network I/O on an EBS, and use RAID only for backend storage, where you need random I/O.)


This is one reason why Oracle is still the gold standard. When entering hot backup mode, which is what you do during a snapshot, it logs the FULL BLOCKS that are changed. Failures and inconsistencies can be replayed from the archive logs.

Of course this means you can quickly blow out your log archival, so it's meant to be a transitory mode.


PostgreSQL has this exact same feature.


True. However, for some cases where you don't mind losing some data due to a recovery process, EBS snapshots are 'good enough'. Additionally, with a database like CouchDB with a 'crash only' design, it should work for some cases as well.


We use EBS snapshots as a last-resort backup. They're really convenient that way. We have a more robust backup system, but in the unlikely event that something goes wrong at least we have those snapshots, even if they're not perfect.


xfs_freeze

In fact there is a handy package called ec2-consistent-snapshot (https://launchpad.net/ec2-consistent-snapshot) that will manage this for you!
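For anyone wondering what the freeze step looks like in practice, here is a rough sketch of the ordering: freeze, snapshot, always thaw. It is not how ec2-consistent-snapshot is implemented. The helper names and the `take_snapshot` callable are hypothetical, and any database-level flush or lock is assumed to happen before this is called. The only real commands used are `xfs_freeze -f` and `xfs_freeze -u`.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def frozen_filesystem(mount_point, run=subprocess.check_call):
    """Keep an XFS filesystem frozen for the duration of the with-block."""
    # -f flushes dirty data and blocks new writes to the filesystem.
    run(["xfs_freeze", "-f", mount_point])
    try:
        yield
    finally:
        # Always thaw, even if the snapshot step raised.
        run(["xfs_freeze", "-u", mount_point])


def snapshot_with_freeze(mount_point, take_snapshot, run=subprocess.check_call):
    """Call the caller-supplied take_snapshot() while the filesystem is frozen.

    take_snapshot is a placeholder for whatever starts the EBS or LVM snapshot.
    """
    with frozen_filesystem(mount_point, run=run):
        return take_snapshot()
```

The `run` parameter exists only so the ordering can be exercised without root or a real XFS mount; by default it shells out through `subprocess.check_call`.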


Maybe I'm missing something here: why is there even a discussion about RAID at the EBS level? When Amazon says "Amazon EBS volumes are designed to be highly available and reliable" and we still have to talk about RAID, then the issue is on Amazon's end.


I think most people are doing RAID-0 to get more perf out of EBS volumes


It also seems that in 2008 adding mirroring also hurt performance. I'm going to dive into this tonight to see if things have changed at all with these benchmarks.

"His results show a single drive maxing out at just under 65MB/s, RAID 0 hitting the ceiling at 110MB/s, RAID 5 maxxing out about 60MB/s, and RAID 10 “F2″ at under 55MB/s."

Summary source: http://www.nevdull.com/2008/08/24/why-raid-10-doesnt-help-on...

Data source (google cache): http://webcache.googleusercontent.com/search?q=cache:Vscz-VX...
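To put the quoted numbers side by side, here is a quick calculation of each configuration relative to the single drive. The inputs are the approximate ceilings from the quote ("just under", "about" and "under" figures taken at face value), so the ratios are rough.

```python
# Approximate throughput ceilings from the 2008 quote above, in MB/s.
quoted_mb_per_s = {
    "single": 65,
    "RAID 0": 110,
    "RAID 5": 60,
    "RAID 10 F2": 55,
}


def relative_to_single(results):
    """Return each configuration's throughput as a multiple of the single-drive figure."""
    base = results["single"]
    return {name: round(mb / base, 2) for name, mb in results.items()}


if __name__ == "__main__":
    for name, ratio in relative_to_single(quoted_mb_per_s).items():
        print(f"{name}: {ratio}x")
```

By these figures RAID 0 comes out at roughly 1.7x a single volume, while RAID 5 and RAID 10 both land below 1x.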


Yes. Except, anybody who is doing RAID-0 over an EBS volume for perf reasons is ASKING for trouble.

You need to do RAID-10. EBS volumes CAN and DO fail.
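A back-of-the-envelope model shows why. The sketch below assumes each volume fails independently with the same probability `p` over some period; both the independence and the example value of `p` are assumptions for illustration, not published EBS figures, and correlated failures would make both layouts look worse.

```python
def raid0_loss_probability(p, n):
    """Chance of losing a RAID-0 array of n volumes: any single failure is fatal."""
    return 1 - (1 - p) ** n


def raid10_loss_probability(p, n):
    """Chance of losing a RAID-10 array of n volumes arranged as n/2 mirrored pairs.

    Data is lost only if both volumes of at least one pair fail.
    """
    if n < 2 or n % 2:
        raise ValueError("RAID-10 needs an even number of volumes (at least 2)")
    pairs = n // 2
    return 1 - (1 - p ** 2) ** pairs


if __name__ == "__main__":
    p = 0.01  # hypothetical per-volume failure probability
    for n in (2, 4, 8):
        print(n, raid0_loss_probability(p, n), raid10_loss_probability(p, n))
```

Under these assumptions, striping multiplies the risk roughly by the number of volumes, while mirroring each stripe member cuts it by orders of magnitude.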


I wish I had more than one upvote for this: swimming against a trend like that never works out well.



