Tuesday, June 19th 2012 at 9:30 GMT: our mail platform is currently experiencing a slowdown. Our technical team is on site running diagnostics to determine the cause of this incident. Affected services are email reception, Gandi's webmail, and the customer support ticket processing platform.
Please accept our apologies for the inconvenience. We will keep you informed of developments below as we learn more about the incident.
Update 9:50 GMT: Our teams have fixed the problem, and the service is back to normal.
Update Wednesday June 20 at 14:15 GMT: another machine has experienced the same problem as yesterday. We rebooted it as a preventive measure; however, it is currently running a filesystem check. Service on this machine should return to normal in about 45 minutes.
Wednesday June 20th 2012 15:29 GMT: The unit has returned to normal operating conditions.
We are currently experiencing issues with webmail authentication. Our teams are working to resolve the problem as soon as possible. This issue does not impact reception of your emails, but only accessing them.
We apologise for the inconvenience.
Update 29/02/2012, 13:30 GMT: The issue has been stabilised since the end of the afternoon yesterday. We are nevertheless working on the platform to avoid future recurrences of this issue.
To correct problems we have identified in our storage systems, we need to perform corrective maintenance procedures tonight between 23:30 and 3:30 CET. This will impact some of your server systems, making certain data volumes unavailable for a period of 15 to 20 minutes. We recommend that you do not restart your servers during this period. All services should return to normal immediately after the maintenance window closes. If you will be affected by this maintenance, you should have received an email from us advising you of the issues and timing. Please see this page for the contents of that message. We will update this post to keep you informed of our progress.
Edit 23:30 CET: operation started
Edit 00:32 CET: first reboot done; the equipment is behaving as expected, so we are proceeding further
Edit 01:00 CET: most reboots are now underway, our first upgrades having been successful
Edit 01:30 CET: most of our storage units have been upgraded. If your server has recovered from its I/O stalls, this issue is fixed for you; otherwise, the upgrade will be finished within the next hour
Edit 01:34 CEST: a compute node crashed during this operation, we are starting the affected virtual servers on another machine right now.
Edit 02:30 CET: maintenance is finished; thank you for your patience during this operation.
A piece of storage equipment is failing, likely due to defective components. Our teams are currently working to restore the situation as soon as possible. We recommend that you do not restart your server if you are impacted. We will keep you informed of our progress on this incident in this article.
[02:28 CET] Recovery successful. Components repaired and the storage unit is now nominal.
We have performed a temporary emergency halt of the hosting storage system (filers). We recommend that you do NOT attempt to restart your server. The impacted servers should recover in the next few minutes. We will update you with further information as soon as possible.
[edit 00:00] Services were fully restored as of 21:20 CET. Most users were back to nominal operation before 19:30, but some took longer to recover. Identifiable blocked systems were handled and restarted manually. Please restart your services if they are still unavailable at this time, and contact support if your server is not available and cannot be restarted.
A storage unit is currently experiencing a slowdown. Our teams are currently working on a solution.
Update (09:45 GMT): The situation improved between 07:00 and 08:00 GMT. There were significant slowdowns between 05:00 and 06:50 GMT.
Update (January 25th 09:00 GMT): A storage unit is currently experiencing a slowdown. The incident is similar to yesterday's. Our technical team is working on solving the issue.
Update (January 25th 10:00 GMT): The I/O situation improved. Our technical team is still working to find a complete fix to the issue.
Update (January 25th 10:22 GMT): A storage unit is currently experiencing a slowdown. The incident is similar to the one this morning. Our technical team is working on solving the issue.
Update (January 26th 11:26 GMT): The I/O situation improved. Our technical team is still working to find a complete fix to the issue.
Update (January 27th 19:11 GMT): A storage unit is currently experiencing a slowdown. The incident is similar to the others this week. Our technical team is working on solving the issue.
Update (January 27th 22:00 GMT): The I/O situation is now stabilized. Our technical team is still working to find a complete fix to the issue.
Update (February 2nd 03:30 GMT): Another incident is affecting one of our storage units. We are now rebooting the faulty equipment. We recently identified a few corrective actions that we will soon be able to take in order to solve this kind of issue.
Update (February 2nd 20:19 GMT): Another incident has occurred and a slowdown was noticed; however, the situation is stable right now.
Update (February 6th 02:09 GMT): A slowdown on one of our storage units. Our teams are working on it.
Two storage units are affected by these incidents, which are isolated slowdowns in read/write operations. We suspect that the problem is two-fold: a software problem (operations blocking) and a hardware problem (some disk models are unusually slow).
When these slowdowns occur, the iSCSI implementation that connects your servers to their disks can malfunction. The result is an "I/O wait" that is artificially high (100%), even after the storage is fast again.
We are currently working on these three problems, giving priority to our system's capacity to re-establish service after a slowdown.
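As an illustration only (this is not part of our tooling), the I/O wait figure mentioned above can be observed on a Linux guest by sampling /proc/stat; a minimal sketch, assuming a standard Linux /proc layout:

```python
import time

def cpu_times():
    # First line of /proc/stat looks like:
    #   cpu  user nice system idle iowait irq softirq ...
    with open("/proc/stat") as f:
        fields = f.readline().split()
    return [int(v) for v in fields[1:]]

def iowait_percent(interval=0.5):
    """Approximate share of CPU time spent waiting on I/O
    over `interval` seconds, from two /proc/stat samples."""
    before = cpu_times()
    time.sleep(interval)
    after = cpu_times()
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas) or 1   # guard against division by zero
    return 100.0 * deltas[4] / total  # index 4 is the iowait field

print(f"iowait: {iowait_percent():.1f}%")
```

A guest stuck at 100% here while its disks are otherwise responsive matches the symptom described above.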