Incident kvm-7950x-1.eu-nbg.advinservers.com
Impact: kvm-7950x-1.eu-nbg.advinservers.com
We are currently investigating this incident.
Update 1 2:30AM EST:
We believe that there was a power issue with this server. We have dispatched a technician to check. There is no ETA for restoration at this time.
Update 2 5:08AM EST:
We are still working on this incident.
Update 3 8:35AM EST:
The problem appears to be a hardware fault in the hypervisor. We were able to get it to boot again, but more work is needed to bring it back online, and we are waiting for remote hands technician availability before we can finish a temporary fix. Power maintenance that is coincidentally taking place at around the same time has made technician availability spotty, which is prolonging this incident.
Unfortunately, we are still not certain what the original cause is. We will therefore migrate all virtual machines on this hypervisor to our 9950X hypervisors and decommission this 7950X3D hypervisor once it is brought back online. We deeply apologize for the long wait. This incident will be eligible for SLA.
Update 4 12:30PM EST:
Unfortunately, we have tried a number of solutions but have not been able to resolve the problem. The VPS hypervisor crashes after a few minutes of runtime. At this point, we have requested that the drives be moved to a new server.
Update 5 3:00PM EST:
We are still working on this incident.
Update 6 3:53PM EST:
We have moved the disks to a new host node but there are signs of data corruption. We are still working on this incident.
Update 7 6:26PM EST:
Unfortunately, after we moved the disks to a new host node, I/O errors began to occur and halted disk operations. This made it impossible to copy out the data. We found that this was due to a firmware bug in the Intel P4610 series of drives. We are not sure why it only surfaced after the move to the new hardware.
After a few hours of troubleshooting, we managed to fix the I/O problems for at least 1 of the 2 disks in the RAID1 array hosting VPS data. We are now copying out the virtual machines on this hypervisor. As of now, the data for 3/12 VMs (25%) has been copied out and those VMs have been booted again. We are working through the remaining VMs.
Update 8 7:23PM EST:
6/12 VMs (50%) migrated out
Update 9 8:45PM EST:
We have migrated all VMs out and everything should now be back online. We will follow up with SLA details shortly. If your virtual machine is still offline, please open a ticket immediately so that we can assist you.
Please note that if you had any backup jobs, you may need to enable them again.