On Tuesday, 7 October, we experienced a series of serious incidents affecting some of the storage units in our Parisian datacenter. These incidents caused two interruptions in service for some of our customers, affecting both Simple Hosting instances and IaaS servers.

The combined effect of these interruptions represents the most serious hosting outage we've had in three years.

First and foremost, we want to apologize. We understand how disruptive this was for many of you, and we want to make it right.

In accordance with our Service Level Agreement, we will be issuing compensation to those whose services were unavailable.

Here's what happened:

On Tuesday, October 7, shortly before 8:00 p.m. Paris time (11:00 a.m. PDT), a storage unit in our Parisian datacenter housing a part of the disks of our IaaS servers and Simple Hosting instances became unresponsive.

At 8:00 p.m., after ruling out the most likely causes, we made the decision to switch to the backup equipment.

At 9:00 p.m., after one hour of importing data, the operation was interrupted. A lengthy investigation followed, and we eventually fell back to the original storage unit. Having determined the culprit to be the caching equipment, our team replaced the disk holding the write journal.

At 2:00 a.m., the storage unit whose disk had been replaced was rebooted.

Between 3:00 and 5:30 a.m., the recovery from a 6-hour outage caused a heavy overload, both on the network level and on the storage unit itself. The storage unit became unresponsive, and we were forced to restart the VMs in waves.

At 8:30 a.m., all the VMs and instances were once again functional, with a few exceptions which were handled manually.

We inspected our other storage units that use the same model of disk, and replaced the disk in one of them as a precaution.

At 12:30 p.m., we began investigating some slight misbehavior exhibited by the storage unit whose drive we had replaced as a precaution.

At 3:50 p.m., three virtual disks and a dozen VMs became unresponsive. We investigated and identified the cause, and proceeded to update the storage unit while our engineers worked on the fix.

Unfortunately, this update triggered an unexpected automatic reboot, causing another interruption for the other Simple Hosting instances and IaaS servers on that storage unit.

By 4:15 p.m., all Simple Hosting instances were functional again, but there were problems remounting IaaS disks. By 5:30 p.m., 80% of the disks were accessible again, with the rest following by 5:45 p.m.

This latter incident lasted about two hours (4:00 to 6:00 p.m.). During this time, all hosting operations (creating, starting, or stopping servers) were queued.

Due to the large number of queued operations, it took until 7:30 p.m. for all of them to complete.

These incidents have seriously impacted the quality of our service, and for this we are truly sorry. We have already begun taking steps to minimize the consequences of such incidents in the future, and are working on tools to more accurately predict the risk of such hardware failures.

We are also working on a customer-facing tool for incident tracking which will be announced in the coming days. 

Thank you for using Gandi, and please accept our sincere apologies. If you have any questions, please do not hesitate to contact us.

The Gandi team


Following an incident on a storage unit, we have had to reboot it in order to complete an update needed to fix the problem.
All operations will be paused until the unit is running normally again.

In the meantime, please do NOT launch any operation on your server(s). The situation will return to normal shortly.

20:00 CEST, 11:00 Pacific: Incident officially resolved, all operations back to normal. 


The West Coast always comes first, which is why Los Angeles has had .la for a very long time. But now the East Coast is catching up, with .nyc!

There are, of course, a few obvious businesses that should definitely focus on getting a .nyc. Bagel makers, pizza joints, and the odd deli now have a unique opportunity to remind everybody who has the best stuff in the Big Apple.

.nyc is reserved for individuals, organizations and businesses with an address in one of the five boroughs of New York City (Manhattan, Brooklyn, Queens, the Bronx and Staten Island), and it's entering GoLive today, Wednesday 8 October, 2014.

.nyc will cost $38.34 for a one-year registration at A rates. 

 


An incident has occurred on one of our storage units in the Parisian datacenter. Our technical team is working to resolve the issue as quickly as possible.

Please do not perform any operations on your virtual machines in the meantime. Services should be restored automatically once the issue has been corrected.

We will update this post as new information arises.

Update Tue Oct 7 19:28:19 UTC: Some faulty hardware has been identified; we're in the process of swapping it out.

Update Tue Oct 7 22:33:14 UTC: Our technical team is still trying to fix the issue.

Update Tue Oct  7 23:35:44 UTC: A ZIL disk has failed, and its failover also failed. We're currently performing a manual switchover, and are proceeding very carefully to minimize the risk of data loss.

Most importantly: we understand how disruptive this is for you and we're working as hard as we can to fix it. We will do our best to make it right.

Update Wed Oct 8 00:39:21 UTC: Our technical team is bringing the storage unit back up. The incident is nearly resolved and services are already beginning to come back online.

Update 02:31:10 UTC: We're now seeing high loads on the problematic filer. The investigation continues!

Update 04:05:54 UTC: After working all night, our technical team in Paris has resolved the problem. Services should now be back to normal.

A postmortem and compensation details, as described in our IaaS Hosting Contract (section 2.2), will be provided in the days to come.

Update Thu Oct  9 17:31:34 UTC: A postmortem about this incident is available here.



We will be proceeding with some maintenance on a Gandi Mail storage unit.

This maintenance will take place on Tuesday, October 7th 2014, from 11:30 PM to midnight CET (Paris time).

Several thousand Gandi Mail mailboxes will be inaccessible for several minutes during that time.

No mail will be lost during this time; messages will be held awaiting delivery.

[EDIT] The maintenance has been postponed to the 8th of October 2014, from 11:30 PM to midnight CET (Paris time).



We will reboot a storage unit in the Paris/FR datacenter tonight.

The maintenance window will start on 3 October at midnight and end at 1am CEST (3-4pm PDT, 22:00-23:00 UTC).

Update: the maintenance window has been extended by 30 minutes and is expected to end at 1:30am CEST (4:30pm PDT, 23:30 UTC).

You will not need to reboot your server (IaaS) or instance (PaaS) during this maintenance.

Sorry for the inconvenience.

 

Update: the maintenance ended at 2am CEST. Sorry for the delay.


It's not Holi yet, but we're having our own festival of colors in the month of October.

All month long, you can register domains under any of the following TLDs for half the normal price! \o/

The list is below, accompanied by the corresponding GoLive prices for a one-year registration, all at A rates:

 



