Server Down - VPS issues

**Erelei** · May 16, 2017

Hey guys,

I've done some research. After not being able to SSH into the shell or reach any services over HTTP, I found the cause. Unfortunately, our host is having some issues, and has been for a while. They're working on restoring everything, and I'm not sure how heavily affected we are. Please note that we're on the affected VPS node, and backups from the last 24+ hours may not be available, which may mean your character goes back 24 or so hours.

I won't know anything until they update the status.

Services Affected: Riverwind, Nightshade, VPS Cluster Control Panel and VPS Services at Phoenix.

We are currently investigating issues across multiple virtual servers involving service failures and database crashes.

These issues affect the srv04ssphzaz hardware node as well as the Riverwind and Nightshade hosting servers.

2017-05-15 10:50 MST - An emergency reboot of the hardware node is necessary. All services across the node are experiencing issues, and the database problems are believed to be due to disk cache issues.
2017-05-15 11:19 MST - The server has failed to come back up after the reboot and we're currently investigating the cause. At this time there is no ETR.
2017-05-16 01:30 MST - We're currently reviewing options of migration of customer data to new hardware. The server is experiencing hardware issues at this time.
2017-05-16 04:00 MST - New hardware is being procured and set up. A backup of the data on the previous server has begun, to be used for restoration.
2017-05-16 06:50 MST - Data restoration is at approximately 25%. Recovering the system's data from the RAID disks has yielded a large number of 'recovery' files, which have the potential to cause issues. As we restore servers, we are manually comparing customer data for missing or corrupted files. Due to the data corruption, the last snapshot backup from earlier in the day was corrupted or missing data, so we'll compare and restore against the previous daily backup. This may result in previously deleted files being restored. These files can be safely deleted again. This step may be necessary to verify that customers have all the files they need for operation.
2017-05-16 11:45 MST - Data restoration still in progress.
2017-05-16 12:30 MST - Data restoration is almost complete. Unfortunately, after a detailed review of the file system, we're finding widespread file corruption scattered randomly across hundreds of thousands of inodes. Fortunately, most of the corruption is in operating system files, and there is only sparse corruption in customer profiles. After reviewing the data and our options, we've unfortunately decided to restore the last known good backup from May 13. We'll then work through each individual customer profile to determine whether we can restore the live data to it. This process will take significantly longer because of the manual checking, but we believe it will yield a more consistent result.

We apologize for the inconvenience and appreciate everyone's patience as we work to restore services.
Support email has been routed to a backup MX host in the meantime as we work to reply to support requests.
We will restore any background processes that were running prior to the issue.
Service credits will be issued for the outage; a follow-up email will be sent once the outage has been resolved.

**f0xx** · May 16, 2017

Shit happens.
