Rackspace Outage Nov 12th
On November 12th at 13:51 CST, Rackspace experienced an isolated issue in its core network. A small number of its customers were affected, including REW. The outage lasted about 90 minutes. In simple terms, a core network switch died, and when traffic failed over to the secondary switch, it also died. Rackspace is investigating the incident to find ways to improve its network and processes so that this event is not repeated. REW sysadmins were immediately notified of the outage by our monitoring tools and were in constant contact with Rackspace throughout, working to resolve it as quickly as possible.
REW apologizes for this outage. We promise that we are holding Rackspace's feet to the fire to ensure maximum uptime for our customers!
Here is the incident report from Rackspace, if you want the technical details:
On 12 November at 13:51 CST, an issue occurred with an ExNet aggregation router in our DFW data center. As a result, a portion of customers with devices provisioned to the router experienced an interruption in service for approximately 53 minutes due to a failed module on the device. Our engineers replaced the module at 15:21 CST to restore service.
REMEDIATION AND CURRENT STATUS
Engineers were alerted to failures on a switch within the affected aggregation router VSS pair. During remediation efforts being performed on the secondary affected switch, the primary switch became unstable and rebooted into recovery mode. The problem on the secondary switch was caused by a faulty module that was replaced to restore service. We apologize for any inconvenience you experienced and appreciate your patience as we worked to resolve the issue. We will be performing a root cause investigation to determine the cause of the issue as well as actions to ensure a stable and reliable network environment.