After a long rebuild of a 40TB+ RAID array, and with a scrub still in progress, the server is back online so that customers can retrieve their data. We will be retiring this node ASAP and suggest that all customers without backups retrieve their data now.
We have active tickets opened with all impacted customers (<10), and will communicate directly with them to assist with any further recovery.
A single storage node has failed. We are continuing to work to restore the node, but we expect recovery to take some time over the next 48 hours as we work through the RAID hardware failure.
This impact is limited to a handful of customer servers and we will work with them directly to resolve the remaining issues.
Posted 2 days ago. Sep 14, 2019 - 01:37 EDT
We are continuing to work on a fix for this issue.
Posted 2 days ago. Sep 14, 2019 - 01:20 EDT
We are investigating another outage with the same storage hypervisor. We will provide further details shortly.
Posted 2 days ago. Sep 13, 2019 - 21:01 EDT
We are continuing to investigate and fix issues as they come up on the storage hypervisor. Some VMs are still running storage checks as they come online. We expect to provide a full all-clear shortly. As promised earlier, an RFO will be provided within 72 hours.
Posted 4 days ago. Sep 12, 2019 - 20:00 EDT
We are still working through issues on the storage hypervisor from the initial outage. We are continuing to work on this and will provide further updates as we have them.
Posted 4 days ago. Sep 12, 2019 - 18:37 EDT
The storage hypervisor is now back online and our engineers are currently monitoring the issue. More updates are to follow.
Posted 4 days ago. Sep 12, 2019 - 17:36 EDT
The hypervisor has gone offline again. We are bringing it back online now and will provide an update shortly.
Posted 4 days ago. Sep 12, 2019 - 17:19 EDT
The issue with the storage hypervisor has been remediated. We are still monitoring and will provide further updates as necessary. An RFO will be published within 72 hours.
Posted 4 days ago. Sep 12, 2019 - 16:19 EDT
We have identified the issue and are continuing to work on a fix. We expect to have another update in 15 minutes.
Posted 4 days ago. Sep 12, 2019 - 16:04 EDT
We have traced the issue to one of our storage servers and are continuing to work on a resolution. More updates to come before 1600 ET.
Posted 4 days ago. Sep 12, 2019 - 14:41 EDT
We are still actively working on the resolution of this issue. Another update to come before 1500 ET.
Posted 4 days ago. Sep 12, 2019 - 14:08 EDT
We have identified an issue with a storage hypervisor and are working on correcting it now.