Storage Server

Incident Report for MNX.io

Resolved

After a long RAID rebuild of a 40TB+ array and a subsequent, still-ongoing scrub, the server is back online so customers can retrieve their data. We will be retiring this node as soon as possible and recommend that all customers without backups retrieve their data now.

We have opened tickets with all impacted customers (fewer than 10) and will communicate with them directly to assist with any further recovery.

Posted Sep 15, 2019 - 22:18 EDT

Monitoring

A single storage node has failed. We are continuing to work on restoring it, but we expect recovery from the RAID hardware failure to take some time over the next 48 hours.

The impact is limited to a handful of customer servers, and we will work directly with the affected customers to resolve the remaining issues.

Posted Sep 14, 2019 - 01:37 EDT

Update

We are continuing to work on a fix for this issue.

Posted Sep 14, 2019 - 01:20 EDT

Identified

We are investigating another outage with the same storage hypervisor. We will provide further details shortly.

Posted Sep 13, 2019 - 21:01 EDT

Update

We are continuing to investigate and fix issues on the storage hypervisor as they come up. Some VMs are still running storage checks as they come back online. We expect to provide a full all-clear shortly. As stated earlier, an RFO (reason for outage) will be provided within 72 hours.

Posted Sep 12, 2019 - 20:00 EDT

Update

We are still working through issues on the storage hypervisor from the initial outage. We are continuing to work on this and will provide further updates as we have them.

Posted Sep 12, 2019 - 18:37 EDT

Monitoring

The storage hypervisor is now back online, and our engineers are monitoring it. More updates will follow.

Posted Sep 12, 2019 - 17:36 EDT

Identified

The hypervisor has gone offline again. We are bringing it back online now and will provide an update shortly.

Posted Sep 12, 2019 - 17:19 EDT

Monitoring

The issue with the storage hypervisor has been remediated. We are still monitoring and will provide further updates as necessary. An RFO will be published within 72 hours.

Posted Sep 12, 2019 - 16:19 EDT

Update

We have identified the issue and are continuing to work on a fix. We expect to have another update in 15 minutes.

Posted Sep 12, 2019 - 16:04 EDT

Update

We have traced the issue to one of our storage servers and are continuing to work on a resolution. More updates will follow before 1600 ET.

Posted Sep 12, 2019 - 14:41 EDT

Update

We are still actively working to resolve this issue. Another update will follow before 1500 ET.

Posted Sep 12, 2019 - 14:08 EDT

Identified

We have identified an issue with a storage hypervisor and are working to correct it now.

Posted Sep 12, 2019 - 13:32 EDT