One of our hosting filer units has stopped responding, likely due to a hardware fault. If your server is not responding, we recommend that you do not attempt to restart it, and that you wait until the incident is resolved. Our teams are on site to investigate and we will keep you informed of developments.
Update 11:38 CET (10:38 GMT): The issue involves a control head on our old storage system. A fault on the disk controller has resulted in an interruption of service. We are currently recovering the storage volume.
Update 12:12 CET (11:12 GMT): We have corrected a kernel bug. In the event of a hard fault on the controller, this fix will enable us to resolve the situation rapidly. We will be restarting the service shortly.
Update 13:01 CET (12:01 GMT): We have restarted the service on this filer. We are monitoring the controller for now, and will apply a patch during the afternoon. This will cause a brief outage of this filer, which is not expected to last more than a minute.
We experienced a partial power failure at Equinix PA2 at 10:07 CEST this morning. The power failure lasted a few seconds, but had a knock-on effect on some equipment, notably some older-generation equipment with a single power supply, which rebooted and lost network connectivity with some backend services.
Our teams are working to restore all affected services.
Due to large DDoS attacks against hosts within the Gandi network, network connectivity is suboptimal. We are seeing peak traffic in excess of 30 Gbps as part of this attack. We are attempting to mitigate its effects, but some of these actions will result in suboptimal routing between the Gandi network and some of our customers for as long as the attacks persist.
The problem encountered yesterday on a hosting filer has been occurring on another unit since 13:45 (GMT). We had planned a maintenance window to apply yesterday's patch across all storage units, but due to this urgent situation we will be applying the patch immediately to the affected filer, and then to the remaining ones as quickly as possible.
Once again we apologise for the inconvenience caused.
14:30 GMT (7:30 AM CET): The patch is installed, and the filer is rebooting.
14:38 GMT (7:38 AM CET): The filer has rebooted, and we are inspecting the affected servers.
15:17 GMT (8:17 AM CET): The service has returned to nominal operation and this emergency maintenance has been completed.
In view of the last two incidents, we are going to proceed with an urgent preventative maintenance operation on the platform's other storage hardware. Please do not reboot your servers during the maintenance: after 15 to 20 minutes of I/O loss, your service will return automatically.
Please accept our apologies for any inconvenience this may cause.
We have detected an anomaly on a hosting filer impacting several customer servers. Our teams are currently working to resolve the issue as soon as possible. We will update this notice as more information becomes available.
14:20 (GMT) / 10:20 (EST): We are still looking for the root of the problem before restarting your servers.
15:45 (GMT) / 11:45 (EST): Unfortunately at this stage we have no additional information available to relay. Our entire team is mobilised to identify the cause of the problem and re-establish service as soon as possible.
17:00 (GMT) / 12:00 (EST): The attempt to transfer to the backup storage controller did not yield a satisfactory result.
18:30 (GMT) / 13:30 (CET): We have identified two or three potential sources of the problem, and our teams are attempting to apply the appropriate kernel patches. The problem is centered around disk-write operations. The "bug" appears to be known to Sun, but a solution is not yet known.
20:30 (GMT) / 15:30 (EST): Still working on the issue. Some disks now function, but not all of them. Unfortunately we do not yet have an ETA to communicate, but we know that it will take several more hours. :(
20:50 (GMT) / 15:50 (EST): A new kernel is currently being compiled, and we will reboot the filer once it is installed. (Watch this space...)
23:00 (GMT) / 18:00 (EST): The new kernel has been compiled and is currently being tested on a staging filer. Once it has been tested, we will apply it to the broken storage unit.
00:00 (GMT) / 19:00 (EST): Victory! The filer seems to be back up and running properly. We will restart all servers and monitor them to check that everything is OK. A detailed report will be sent tomorrow to all clients involved. Thank you for your patience.
The Gandi Mail service is currently under heavy load due to a multitude of botnets in Asia (principally India, Iran, and Vietnam) and Eastern Europe. As a result, the number of connections to the Gandi Mail service has multiplied tenfold, resulting in disruptions of service. We currently do not have an efficient technical solution to deal with this kind of traffic anomaly. We are investigating possible solutions and will be deploying them in the coming days.
We apologise for the inconvenience caused.
Update 3 October 2011, 16:30 CET: Although the heavy load has subsided for the time being, we are leaving the service status indicator in orange while we monitor the service and attempt to implement a reliable solution over the coming days.
Due to the effects of a number of large spam botnets, customers will have experienced delays in mail delivery. The delays are due primarily to the farm of mail filter servers, which are in place to weed out the spam, viruses and malware attached to emails, becoming heavily loaded dealing with the mails sent by these botnets. Specifically, these botnets appear (at first analysis) to be related to a large number of trojans infecting mobile devices around the world.
Our teams have initially been working to add mail filter servers and incoming spools to keep up with the increased load caused by these botnets. Over the coming hours and days we will be working to implement more robust solutions to combat this kind of anomalous traffic.
We apologise for the inconvenience that these delays may have caused.