Sunday, April 27, 2014

Seeing the Azure Cloud from the Inside

I had the wonderful opportunity last week to visit one of Microsoft's data centers. Since I am bound by NDA, I won't say where it was, and I won't talk about anything that I can't find discussed in a press release or an article on the web about Microsoft Azure and MSFT's data center footprint.  That still leaves a tremendous amount to talk about, though. And frankly, reading about a place like this is one thing, but actually going and touring it is quite another.  I'll try to give you a feeling for that here.

The company describes the number of global facilities it maintains as "more than 10 and less than 100". You can get a feeling for the global footprint by looking at this Azure Regions site, and also by reading the Azure Wikipedia entry.  Keep in mind that Microsoft hosts over 200 of its own services in its data centers, including Bing, Outlook (formerly Hotmail), the new Office 365, SkyDrive, Xbox, its robust advertising network, and much more. Matthew Sorvaag's website has some info on where the DCs are, although I think it is a bit out of date.

If you are like one of my many elitist, techie, Microsoft-bashing friends, you'll be sad to know that Apple's iCloud runs on Microsoft Azure, and has since 2011.  Recall that earlier that year, AAPL had a very embarrassing outage, and other issues.  Less than a year later, reports from insiders confirmed the service was running at both AMZN and MSFT. Most know that AAPL has its own data centers, too, so which parts of these services run in which facilities is a bit hard to determine, given Apple's great penchant for secrecy and the fact that it would be imprudent (and is likely contractually limited) for either of the Cloud vendors to talk about the relationship.

Microsoft has produced a terrific video, Windows Azure Data Centers: the 'Long Tour', describing several of their data centers extremely well. The video is a couple of years old, but does justice to the scale and approach. In fact, at least one of the facilities in the video has been substantially upgraded since the video was shot (and the description in the movie is already impressive!).

The Microsoft Quincy, WA DC is today even bigger than shown in this photo and described in the movie: it now has three big buildings (the new one is in the lower left, where it looks like an open field in this photo). Photo from NYT, Kyle Bair/Bair Aerial.

The company is employing state-of-the-art technologies and processes in all areas.  The locations, construction, staffing, networking, server deployment and maintenance, and operations are all advanced and world class.  It's worth noting that the DCs described in the video are generation 3 and generation 4.  MSFT now has gen 5 centers, and is looking forward to a day when they occupy generation 6 facilities.  If you aren't familiar with all the lingo around data centers, don't confuse these generations with data center "tiers", which generally describe redundancy.

  • Tier 1 = Non-redundant capacity components (single uplink and servers).
  • Tier 2 = Tier 1 + Redundant capacity components.
  • Tier 3 = Tier 1 + Tier 2 + Dual-powered equipment and multiple uplinks.
  • Tier 4 = Tier 1 + Tier 2 + Tier 3 + all components are fully fault-tolerant including uplinks, storage, chillers, HVAC systems, servers etc. Everything is dual-powered.
These MSFT facilities are all N+2. In other words, for whatever components are deemed mission critical, there are 2 backups in place.
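To make the N+2 idea concrete, here is a minimal Python sketch. The function names and the example of a load needing 10 generators are my own illustrative assumptions, not figures from Microsoft.

```python
def installed_units(required: int, spares: int = 2) -> int:
    """Units to install under N+spares redundancy (N+2 by default)."""
    if required < 1 or spares < 0:
        raise ValueError("required must be >= 1 and spares >= 0")
    return required + spares


def survives(required: int, installed: int, failed: int) -> bool:
    """True if enough units remain to carry the full load after failures."""
    return installed - failed >= required


# Hypothetical example: a load that needs 10 generators (N = 10).
n = 10
total = installed_units(n)  # N+2 means 12 units installed
print(total, survives(n, total, 2), survives(n, total, 3))
```

With N+2, any two units can fail and the load is still carried; a third simultaneous failure is what finally causes a shortfall.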

Generation 4 in full operation now

ITPACs, which are being deployed in the newer MSFT data centers, are purpose-built, modular units that resemble shipping containers. They are power efficient and operationally better. Watch this great ITPAC video, which describes how they efficiently combine compute, power, cooling, and networking in self-contained modules. From an environmental perspective, the ITPACs are a great innovation. On cold days, a portion of the hot exhaust air from the servers is redirected internally through a mixing unit on top of the ITPAC, where it mixes with the incoming air and warms it to a temperature suitable for the servers.  I saw units with HP or Dell equipment installed, as well as storage and networking.
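The mixing trick can be sketched as a simple weighted average. This is only a back-of-the-envelope model that assumes ideal mixing of two air streams; the function name and the temperatures in the example are my own assumptions, not ITPAC specifications.

```python
def hot_air_fraction(outside_c: float, exhaust_c: float, target_c: float) -> float:
    """Fraction of recirculated exhaust air needed to reach target_c.

    Assumes ideal mixing of the two streams:
        target = f * exhaust + (1 - f) * outside
    """
    if exhaust_c <= outside_c:
        raise ValueError("exhaust must be warmer than outside air")
    f = (target_c - outside_c) / (exhaust_c - outside_c)
    # Clamp to a physically possible share: between 0% and 100% exhaust air.
    return min(1.0, max(0.0, f))


# Hypothetical cold day: 0 C outside, 40 C exhaust, 20 C wanted at the servers.
print(hot_air_fraction(0.0, 40.0, 20.0))
```

When the outside air is already warmer than the target, the fraction clamps to zero, which matches the idea that recirculation only helps on cold days.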

One interesting statistic to note is that the many tens of thousands of servers in the facility together lose about 300 hard drives a week. In fact, this is one of the very limited number of tasks the operations personnel at the facility do: replace hard drives.  There are several amazing, almost magical elements to this.  First, no customer or service is impacted by the hard drive failures. Second, the techs perform the drive replacements without any knowledge of what data is stored on the servers. Finally, the hard drives are erased, and then crushed into teeny tiny little pieces.
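For a sense of scale, the weekly figure can be turned into yearly numbers with a little arithmetic. The 300-per-week figure is from the tour; the drive count used for the failure rate is purely a hypothetical placeholder, since I don't know the real number of drives in the facility.

```python
WEEKS_PER_YEAR = 52


def failures_per_year(failures_per_week: float) -> float:
    """Scale a weekly failure count to a yearly one."""
    return failures_per_week * WEEKS_PER_YEAR


def annualized_failure_rate(failures_per_week: float, drives_in_service: int) -> float:
    """Share of the drive fleet replaced per year (0.03 means 3%)."""
    if drives_in_service <= 0:
        raise ValueError("drives_in_service must be positive")
    return failures_per_year(failures_per_week) / drives_in_service


weekly = 300                   # figure quoted on the tour
hypothetical_drives = 500_000  # ASSUMPTION for illustration only, not a real count

print(failures_per_year(weekly))
print(annualized_failure_rate(weekly, hypothetical_drives))
```

Whatever the true drive count is, the point stands: at this scale drive failure is a routine daily event, not an emergency.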

Everyone who is anyone on the Internet has used F5 boxes at one point or another. These boxes are highly regarded. And those of you who know this also know these devices are very expensive. I was only in one hall in this facility and, along with gear from many, many others of the usual suspects in internetworking, I saw tens of millions of dollars in F5 gear ALONE.  Pretty impressive.

Another example is the number of generators in place at this facility, which is just astonishing.  I got to step inside one of the generator housings, and they aren't the cute little CATs shown in the movie above... instead they are monstrous 20-cylinder engines painted gray, with fuel feed pipes as wide as a manhole cover.  While inside, I was praying to God that the generator didn't start because it would have been very, very scary and loud.

I was also in an electrical room with a similarly scary amount of electricity being fed through it.  This room was one of 6... remember that the facility is "larger than 10 football fields" in size. Let's face it, when you've seen row after row of the same server and storage boxes, it sort of becomes mind numbing. So when you go visit the power distribution, it's a little exciting.  For example, the circuit breakers in the Siemens distribution equipment have their own cranes on the top of the rack, because they are so heavy that it takes more than one very strong person to pick them up.

Crazy giant Siemens Circuit Breaker that comes with its own little crane.

One debate I've had with many colleagues in both IT and in Printing over the years is whether it is prudent to operate your own data center, or to outsource.  Cloud aside, when you visit a facility like this, with billions of dollars of investment and run like a top secret military facility, you realize this is something that most companies cannot do.

When you then add in the economics of the Cloud, assuming you can determine that it makes sense for your business, it is hard not to come to the stark realization that you are getting capabilities that can provide an enormous competitive advantage for your company. There are numerous choices, and they need to be examined very thoroughly, but it is pretty clear that these services are game changers. We must embrace them for as much of our IT infrastructure as we possibly can.