
I've built out many 42U racks in DC's in my time and there were a couple of rules that we never skipped:

1. Dual power in each server/device - One PSU was powered by one outlet, the other PSU by a different one with a different source, meaning that we can lose a single power supply/circuit and nothing happens.

2. Dual network (at minimum) - For the same reasons as above, since the switches didn't always have dual power in them.

I've only had a DC fail once when the engineer was performing work on the power circuitry for the DC and thought he was taking down one, but was in fact the wrong one and took both power circuits down at the same time.

However, a power cut (in the traditional sense where the supplier has a failure so nothing comes in over the wire) should have literally zero effect!

What am I missing?

I've never worked anywhere with Amazon's budget, so why are they not handling this? Is it more than just the incoming supply being down?



> 1. Dual power in each server/device - One PSU was powered by one outlet, the other PSU by a different one with a different source meaning that we can lose a single power supply/circuit and nothing happens

Nothing happens if you remember that your new capacity limit per DC supply is 50% of the actual limit, and you're 100% confident that either of your supplies can seamlessly handle its load suddenly increasing by 100%.

I've seen more than one failure in a DC where they wired it up as you described, had a whole power side fail, followed by the other side promptly also failing because it couldn't handle the sudden new load placed on it.


EDIT: I misunderstood; you were talking about power feeds. The normal case is to run at ~48% as if it were 100% (because of power spikes, but also because most types of transformers run more efficiently at specific levels of load, around 40-60%).

Normally this is factored into the rack you buy from a hardware provider: they will tell you that you have 10A or 16A on each feed. If you exceed that, it will work, but you are overloading their feed and they might complain about it.
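
A rough back-of-envelope sketch of that derating, with entirely hypothetical numbers, just to make the arithmetic concrete:

    # Hypothetical numbers: size each feed so the survivor can absorb the
    # other feed's entire load on failover.
    feed_rating_amps = 16.0                           # provider's per-feed rating
    derate = 0.48                                     # run each feed at ~48% of rating
    normal_load_per_feed = feed_rating_amps * derate  # ~7.7 A per feed

    # if feed A drops, feed B picks up A's load on top of its own
    load_on_survivor = 2 * normal_load_per_feed       # ~15.4 A, still under 16 A
    print(load_on_survivor <= feed_rating_amps)       # True -> the survivor holds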


The poster was speaking more of the power delivery going to the power supplies, not the servers' power supplies themselves. So say each PSU 1 is wired to circuit A and each PSU 2 is wired to circuit B. Circuit A experiences a failure. All servers instantly switch all their load over to their PSU 2s on circuit B. Suddenly circuit B's load is roughly double what it was just moments ago. If proper capacity planning wasn't done or followed, this can overload circuit B, meaning all the PSU 2s go dark regardless of whether the servers could handle the changeover.
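
A tiny sketch of that failure mode, with made-up numbers (40 servers, a 10 kW circuit), just to show why the overload is immediate:

    # Every server's PSU 1 is on circuit A, PSU 2 on circuit B, load split ~50/50.
    server_loads_kw = [0.4] * 40                    # 40 servers at ~0.4 kW each = 16 kW
    circuit_capacity_kw = 10.0                      # each circuit rated for 10 kW

    normal_per_circuit = sum(server_loads_kw) / 2   # 8 kW on A, 8 kW on B: fine
    after_a_fails = sum(server_loads_kw)            # all 16 kW lands on circuit B
    print(after_a_fails > circuit_capacity_kw)      # True -> B trips, everything goes dark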


Yeah I understand on re-reading: but that's also not how people run datacenters.

Obviously people can operate things however they want, but you won't get a Tier 3 classification with that setup.


OP is talking about the DC power feed, not a single server PSU.


You don't get fed DC power, you get fed AC power.

But, point taken: yes your power feed should be running at <50%. But that just means you treat 50% as 100% just like any resource.

Mostly this is outsourced to the datacenter provider; they'll give you a per-side rating (usually 10A or 16A), which also matches the cooling profile of the cabinet.


I mean, in some datacenters they run DC power to each rack. It's definitely more esoteric than having each device run AC, but some people do it.

However, with their comment DC == Data Center, not Direct Current.


Yeah, I got thrown off by the "per DC supply is 50% of the actual limit"

DC = Datacenter? That made no sense, so my head replaced it with "power supply" instead of "DC supply"; the second sentence does make sense as datacenter though.


> I've only had a DC fail once when the engineer was performing work on the power circuitry for the DC and thought he was taking down one, but was in fact the wrong one and took both power circuits down at the same time.

This is all local scale. Your setup would not survive a datacenter-scale power outage, and at scale, power outages are datacenter-scale.

Data centers lose supply lines. They lose transformers. Sometimes they lose the primary feed and the secondary feed at the same time. Automatic transfer switches cannot be tested periodically, i.e. they are typically tested once. Testing them is not "fire up a generator and see if we can draw from it".

It is cheaper to design a system that must be up which accounts for a data center being totally down and a portion of the system being totally unavailable than to add more datacenter mitigations.


The datacenter we were in had dual-sourced grid power (two separate grid connections on opposite sides of the block, coming from different substations), along with a room of batteries (good for, iirc, 1hr total runtime for the whole datacenter, set up in quad banks, two on each "rail"), _and_ multiple independent massive diesel generators, which they ran and switched power over to every month for at least an hour.

And to top it off each rack had its own smaller UPS at the bottom and top, fed off both rails, and each server was fed from both.

We never had a power issue there; in fact SDGE would ask them to throw to the generators during potential brown-out conditions.

Of course this was a datacenter that was a former General Atomics setup iirc ...


We were in a triple sourced data center. Fed by three different substations. Everything worked like a charm. Until Sandy hit. It did not affect us at all. But it affected the power company. And everything still worked fine, until one of the transfer switches transferred into UPS position and stopped working in that position.


Yes but if you have reliable power from two different sources then the biggest risk (I'd imagine) is the failover circuitry! Something that should be tested tbh.

Also, there are banks of batteries and generators in between the power company cables and the kit: did they not kick-in?

Again, this is all pure speculation: I have absolutely no idea of the exact failure, nor how their infrastructure is held together - this is all just speculation for the hell of it :)


> Yes but if you have reliable power from two different sources then the biggest risk (I'd imagine) is the failover circuitry! Something that should be tested tbh.

That's the ATS. It is not really advisable to test their under-load performance because the failure of an ATS would be catastrophic. An ATS would typically be tested at installation, and after that its parameters would be monitored.

Replacing a functional in-line ATS would be a 9-12 month project.

> Also, there are banks of batteries and generators in between the power company cables and the kit: did they not kick-in?

At high energy you are pretty much always going to use an ATS.


> the failure of an ATS would be catastrophic

Because that would mean no power at all to the DC and no way to get it back? (I am completely ignorant on this topic)


> Because that would mean no power at all to the DC and no way to get it back? (I am completely ignorant on this topic)

While most of the smarts in the ATS are in the electronics, the really nasty failures come from the mechanical part.

At the end of the day a high energy ATS looks just like a switch behind a meter in your house. There's a lip that goes from one position to another, except in a high energy ATS the lip is big and when the transfer occurs it slams from one source to another.

There are only so many of those physical slams that it can withstand to begin with, so you want to minimize that number.

The second failure mode is that after a transfer to the non-main source, the lip can get stuck there, making it impossible to switch back to the main. [Once I have seen the lip melt into the secondary position. While I thought it was weird, the guys from the power company said it is not that uncommon.] This creates a massive problem, as the non-main source is typically not designed for long-term 24x7 operation. So now you are stuck on a secondary feeding system and you can't just transfer back to main without de-energizing the system, i.e. taking the power out of the entire data center.


Frying hardware can affect a much wider scope.

I've had bad power supplies fry out, taking the whole power circuit with them, and thus half (or whatever fraction) of the rack's power. I've also had bad power supplies bring down the whole machine as they shunted everything internal too.

When things go bad, anything can happen. You can provide the best effort, and it'll usually work as expected, but there will always be something that can overcome your best efforts.


The only full datacenter outage I've personally experienced was a power maintenance tech testing the transfer switch between systems where the power was 90 degrees out of phase. Big oof.


Transfer switches at any facility that's worth being colocated in are exercised as periodically as the generators to which they connect. In all of the facilities I have had systems in (>20MW total steady state IT load), that meant once per month at minimum to keep generators happy -and to ensure the transfer functionality works-, and more often if the local grid demands it, e.g. ComEd in Chicago, or Dominion in NoVA asking for load shedding.


"It is cheaper to design a system that must be up which accounts for a data center being totally down and a portion of the system being totally unavailable than to add more datacenter mitigations."

Citation needed - the same issues with testing, data races, and expensive bandwidth come up.


At high energy the lead time for the components is measured not in days but in years.


And so is development time of any distributed software system, and training time required to operate it correctly


> And so is development time of any distributed software system, and training time required to operate it correctly

Software is much easier than hardware. If you are to start a project today in this kind of hardware, you will be operating it in 2029, without changes.


"Software is much easier than hardware. If you are to start a project today in this kind of hardware, you will be operating it in 2029, without changes."

I don't think this makes sense: you are using the three statements "software is cheaper", "software takes less time", and "software is easier" as if they all mean the same thing, and as if proving one proves all of them.

Hardware takes a long time, okay, that does not mean it's expensive. Building a hydroelectric dam takes 20 years, but it provides the cheapest source of electricity that ever existed. Ships can take a decade from order to delivery, they are the cheapest mode of transport.


Why spend the cost on dual X and Y when you can failover to another cluster?

For big DC workloads, it is usually, though not always, better to take the higher failure rate than to add redundancy.


Really? You'd think at Amazon's scale an additional PSU in a 1U custom-built server (I assume they're custom) would be a few tens of $ at most.

Actually, now that I type that, it makes sense. Scaling a few tens of dollars to a bajillion servers on the off chance that you get an inbound power failure (quite rare, I'd reckon) might cost more than what they'd lose if it does actually fail.

So yeah, they're potentially just balancing the risk here and minimising cost on the hardware.

Edit: changed grammar a bit.
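
If you sketch that trade-off with toy numbers (every figure below is invented, purely illustrative), the balancing act looks something like this:

    servers = 1_000_000
    extra_psu_cost = 30.0                          # $ per server for a second PSU
    redundancy_cost = servers * extra_psu_cost     # $30M up front

    feed_failure_prob_per_year = 0.01              # chance a feed loss actually bites
    cost_per_outage = 10_000_000.0                 # revenue / SLA credits per event
    expected_loss_per_year = feed_failure_prob_per_year * cost_per_outage  # $100k/yr

    # at these made-up numbers the extra PSUs never pay for themselves
    print(redundancy_cost, expected_loss_per_year)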


At big-cloud-provider scale like Amazon, Azure, and Google, they probably aren't even running PSUs in each server; they're probably doing DC at the rack these days. No point in having a million little transformers everywhere; it's far easier to centralize those for maintenance and have multiple units feeding the bus bars going to each rack.


The ones I'm seeing designed have been moving the DC out to the cabinets, with A/B 480VAC power feeds on the bus and integrated DC inverters/rectifiers/batteries at the rack level.

More modular and a lot less copper at 10x the voltage. Still a lot of copper.
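
Rough illustration of the copper point (simplified, with a hypothetical 20 kW rack; ignores power factor and conversion losses): current scales as P/V, and conductor size scales roughly with current.

    rack_power_w = 20_000.0                        # hypothetical 20 kW rack
    for volts in (48.0, 480.0):
        amps = rack_power_w / volts
        print(f"{volts:>5.0f} V feed -> {amps:6.1f} A per rack")
    # 48 V needs ~417 A (massive bus bars); 480 V needs ~42 A (ordinary cabling)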


> I've never worked anywhere with Amazon's budget so why are they not handling this?

Perhaps we are going to discover how AWS produces such lofty margins by way of their next RCA publication.


> What am I missing?

My guess is that they cheaped out on redundant PSUs to get you to use multiple availability zones. (More zones = more revenue.)

Even a single PSU shouldn't be an issue if they plugged it in through an ATS, though.


Unless the ATS breaks, which happens.


Yup. I'm still upset (but not angry) about https://status.linode.com/incidents/kqhypy8v5cm8.


For sure; in my context I meant an ATS in a single rack/cabinet. If that went bad, the blast radius would be contained to a single cabinet. But yeah, anything can and will happen. At another place I worked, a site UPS took down an entire server room. It was a pretty nice Eaton system, but there was some event that fried the whole thing. Eaton had to send a specialist to investigate the matter, as those events are pretty rare.


What about a UPS/battery thingy? That's saved me a few times, though it normally just gives enough time for a short outage. Is it uncommon in cloud infra?


Even regular datacenters will often have UPS systems the size of a small car, usually several of them, to power the entire datacenter for a few minutes while the diesel generators start.
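
As a rough sizing sketch (hypothetical numbers), the UPS only needs to carry the load from loss of utility until the generators are up and the transfer has happened:

    it_load_kw = 2_000.0                           # whole-facility critical load
    bridge_minutes = 5.0                           # generator start + transfer margin
    ups_energy_kwh = it_load_kw * (bridge_minutes / 60.0)  # ~167 kWh of battery
    print(ups_energy_kwh)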



