One of our hosting filer units has stopped responding, likely due to a hardware fault. If your server is not responding, we recommend that you do not attempt to restart it and instead wait until the incident is resolved. Our teams are on site to investigate and we will keep you informed of developments.
Update 11:38 CET (10:38 GMT): The issue involves a control head on our old storage system. A fault on the disk controller has resulted in an interruption of service. We are currently recovering the storage volume.
Update 12:12 CET (11:12 GMT): We have corrected a kernel bug; the fix will enable us to resolve the situation rapidly in the event of a hard fault on the controller. We will be restarting the service shortly.
Update 13:01 CET (12:01 GMT): We have restarted the service on this filer. We are monitoring the controller for now, and will apply a patch during the afternoon. The patch will cause a brief outage of this filer, which is not expected to last more than a minute.
We experienced a partial power failure at Equinix PA2 at 10:07 CEST this morning. The power failure lasted only a few seconds but had a knock-on effect on some equipment, notably older-generation equipment with a single power supply, which rebooted and lost network connectivity with some backend services.
Our teams are working to restore all affected services.
Due to large DDoS attacks against hosts within the Gandi network, network connectivity is suboptimal. We are seeing peak traffic in excess of 30 Gbps as part of these attacks. We are attempting to mitigate their effects, but some of the mitigation measures will result in suboptimal routing between the Gandi network and some of our customers for as long as the attacks persist.
Snapshots now use 50% less quota than before, and because quota usage is halved, the operation is also twice as fast!
This also means that you don't need as much available quota as before in order to take a snapshot. For example, to take a snapshot of a 10GB disk, you previously had to have 10GB of unused quota available in your account. Now, for the same disk, you need only 5GB free.
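As a rough illustration of the arithmetic above, the following sketch computes the free quota needed for a snapshot. The function names are hypothetical, and it assumes the 50% figure applies proportionally to any disk size, which the notice only confirms for the 10GB example.

```python
# Hypothetical helpers illustrating the snapshot quota rule described above.
# Assumption: required quota is always half the disk size (only the 10GB -> 5GB
# case is stated in the notice).

def snapshot_quota_needed_gb(disk_size_gb: float, quota_ratio: float = 0.5) -> float:
    """Return the free quota (in GB) needed to snapshot a disk of the given size."""
    if disk_size_gb < 0:
        raise ValueError("disk size must be non-negative")
    return disk_size_gb * quota_ratio


def can_take_snapshot(disk_size_gb: float, free_quota_gb: float,
                      quota_ratio: float = 0.5) -> bool:
    """Return True if the account has enough free quota for the snapshot."""
    return free_quota_gb >= snapshot_quota_needed_gb(disk_size_gb, quota_ratio)


if __name__ == "__main__":
    print(snapshot_quota_needed_gb(10))   # 5.0
    print(can_take_snapshot(10, 5))       # True
    print(can_take_snapshot(10, 4))       # False
```

Passing `quota_ratio=1.0` reproduces the previous behaviour, where a 10GB disk required 10GB of unused quota.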
The problem encountered yesterday on a hosting filer has been occurring on another unit since 13:45 (GMT). We had planned a maintenance window to apply yesterday's patch across all storage units, but due to this urgent situation we will be applying the patch immediately to the affected filer, and then to the remaining ones as quickly as possible.
Once again we apologise for the inconvenience caused.
14:30 GMT (15:30 CET): The patch is installed, and the filer is rebooting.
14:38 GMT (15:38 CET): The filer has rebooted, and we are inspecting the affected servers.
15:17 GMT (16:17 CET): The service has returned to nominal operation and this emergency maintenance has been completed.
In view of the last two incidents, we are going to proceed with an urgent preventative maintenance operation on the platform's other storage hardware. Please do not reboot your servers during the maintenance: after 15 to 20 minutes of I/O loss, your service will return automatically.
Please accept our apologies for any inconvenience this may cause.
We have detected an anomaly on a hosting filer impacting several customer servers. Our teams are currently working to resolve the issue as soon as possible. We will update this notice as more information becomes available.
14:20 (GMT) / 10:20 (EST): We are still looking for the root of the problem before restarting your servers.
15:45 (GMT) / 11:45 (EST): Unfortunately at this stage we have no additional information available to relay. Our entire team is mobilised to identify the cause of the problem and re-establish service as soon as possible.
17:00 (GMT) / 12:00 (EST): The attempt to transfer to the backup storage controller did not yield a satisfactory result.
18:30 (GMT) / 13:30 (EST): We have identified two or three potential sources of the problem, and our teams are attempting to apply the appropriate kernel patches. The problem is centered around disk-write operations. The bug appears to be known to Sun, but so far no solution is available.
20:30 (GMT) / 15:30 (EST): Still working on the issue. Some disks now function, but not all of them. Unfortunately we do not yet have an ETA to communicate, but we know that it will take several more hours. :(
20:50 (GMT) / 15:50 (EST): A new kernel is currently being compiled, and we will reboot the filer once it is installed. (watch this space...)
23:00 (GMT) / 18:00 (EST): The new kernel has been compiled and is currently being tested on a staging filer. Once tested, we will apply it to the broken storage unit.
00:00 (GMT) / 19:00 (EST): Victory! The filer seems to be back up and running properly. We will restart all servers and monitor them to make sure everything is OK. A detailed report will be sent tomorrow to all clients involved. Thank you for your patience.