With continual improvement in technology, internal processes and industry best practices, data center outages have become less frequent. However, they still happen on occasion. Power goes out, a backup system fails, security is breached, a technician makes a mistake or something completely unexpected causes unplanned downtime.
During 2014, several data centers operated by enterprises, service providers and government agencies experienced outages. Some of the most noteworthy events of this year include:
1. State of Iowa Data Center Fire — February 18, 2014
The state of Iowa experienced an electrical fire in its primary data center. Fortunately, the chief operations officer for information technology quickly assessed the damage and determined the best options for bringing the data center back online. Despite the seriousness of the event, the outage lasted only 16 hours.
At the time the data center lost power, evacuation alarms sounded. As the staff assembled in designated areas and attendance was taken, employees reported fire, smoke and noise.
Approximately two hours after the initial evacuation, the fire department allowed top IT personnel into the data center to investigate the source and assess the damage. The team attributed the fire to a failure in a wall-mounted electrical suppression unit.
After this assessment, the primary focus became restoring power and bypassing the failure point. In addition, the team had to vent the data center to remove the smoke and odor.
2. Three Unrelated New York City Outages for Internap Data Centers — May 16-22, 2014
Cloud hosting service provider Internap experienced three outages at its New York City data centers. Electrical equipment failure, specifically component failures in uninterruptible power supply systems, caused the downtime in each incident. Although the outages happened within one week of each other and in the same geographic location, Internap reported the events were unrelated.
The first outage occurred at the company’s 8th Avenue data center on May 16, 2014. An estimated 20 companies were affected by this outage, including online video streaming platform Livestream.
On May 20, 2014 and again on May 22, 2014, Internap’s Broad Street data center experienced outages. The company estimated fewer than half of its customers were affected by these two outages.
Although the Broad Street facility has redundant UPS systems, not all tenants choose this option because of the additional cost. Customers paying extra for redundancy experienced an automatic switchover to the backup UPS. However, customers not paying for this feature faced several hours of downtime until the problems were fixed.
3. Admin Error Causes Outage at Joyent’s Ashburn Data Center — May 27, 2014
A human error brought down one of Joyent’s data centers located in Ashburn, Virginia. Joyent provides high-performance cloud infrastructure services.
Because one of the company’s administrators made a mistake, all of the servers in the facility had to be rebooted. Joyent provided no details on the nature of the error. However, a company spokesperson reported a “fat-finger” operator error was at fault.
Many in the industry questioned why Joyent’s system was not built to withstand such errors. The company indicated it would be improving its software and operational procedures to prevent future outages. It also said the administrator who made the error would not be disciplined. Instead, the company plans to learn from the incident.
4. Infrastructure Change Leads to Facebook Outage — September 3, 2014
Facebook was brought down by an infrastructure configuration change in September. During the outage, users could not access the popular social network.
According to the company’s statement, it discovered the issue quickly and resolved it. Although the outage lasted only 10 minutes, the fallout was significant. For example, many users took to Twitter to post sarcastic tweets. Facebook experienced two other outages this year, in June and August.
The company’s current data center infrastructure is designed and maintained by in-house engineers. Facilities are located on both U.S. coasts and in Sweden.
Facebook uses a different “web-scale” approach to redundancy than traditional enterprises. This approach relies on software to make IT systems resilient instead of building redundant layers in mechanical and electrical infrastructure.
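To make the idea of software-level resilience concrete, here is a minimal sketch of failover across replicas. It is not Facebook's implementation, which the company has not described here. The names `call_with_failover`, `failing_replica` and `healthy_replica` are hypothetical and exist only for illustration.

```python
from typing import Callable, List, Sequence


class AllReplicasFailed(Exception):
    """Raised when no replica could serve the request."""


def call_with_failover(replicas: Sequence[Callable[[], str]]) -> str:
    """Try each replica in order and return the first successful response.

    Hypothetical helper: a failed replica is skipped in software instead of
    being protected by redundant electrical or mechanical infrastructure.
    """
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:  # any failure moves on to the next replica
            errors.append(exc)
    raise AllReplicasFailed(f"{len(errors)} replica(s) failed")


def failing_replica() -> str:
    # Stand-in for a server that has lost power or connectivity.
    raise ConnectionError("replica unavailable")


def healthy_replica() -> str:
    # Stand-in for a server that is still running.
    return "ok"


response = call_with_failover([failing_replica, healthy_replica])
```

In this sketch the request still succeeds even though the first replica is down. The trade-off is that the application has to be written to tolerate the loss of any single machine.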
As these events illustrate, outages affect all types of data centers and are caused by a variety of issues. Every outage causes real pain for the company and the users of its services. However, service provider data center outages are especially damaging because they host infrastructure for a number of companies. In a cloud scenario, every server may host multiple customer nodes.
Since 2001, CyrusOne has been providing fully redundant power architectures not typically found in in-house data centers. By using advanced distributed redundant power architecture, CyrusOne achieves 2N power levels supported by a 100% uptime Service Level Agreement. This power architecture ensures no interruptions in data center power availability.
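As a rough illustration of why a second independent power path matters, the sketch below estimates the combined availability of parallel paths. It assumes the paths fail independently, which real facilities only approximate. The 99.9% single-path figure is a made-up input, not a CyrusOne or industry measurement, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def parallel_availability(single_path: float, paths: int = 2) -> float:
    """Availability of `paths` independent power paths, any one of which
    can carry the full load (the idea behind a 2N design)."""
    if not 0.0 <= single_path <= 1.0:
        raise ValueError("single_path must be between 0 and 1")
    if paths < 1:
        raise ValueError("paths must be at least 1")
    # Power is lost only if every path is down at the same time.
    return 1.0 - (1.0 - single_path) ** paths


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Assumed, illustrative input: one path that is available 99.9% of the time.
one_path = 0.999
two_paths = parallel_availability(one_path, paths=2)

downtime_one = expected_downtime_hours(one_path)
downtime_two = expected_downtime_hours(two_paths)
```

Under these assumptions, a single path is expected to be down for about 8.76 hours a year, while two independent paths are expected to be down together for well under a minute. Shared failure points, such as a common switchboard, would reduce that benefit.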
With more than two dozen data centers across the globe, CyrusOne helps companies of all sizes, including nine of the global Fortune 20 and 140 of the Fortune 1000, take advantage of the latest data center technology. For more information on how to prevent data center outages, visit http://www.cyrusone.com/.