We have lost contact with our infrastructure at Telehouse1 (Jeuneurs) in Paris as of 03:11 CET (02:11 GMT). Our technicians are en route to investigate the problem.
Affected Gandi services are:
1/3 of the DNS systems, whois for domain registry, and mail redirection service.
We will update here as we obtain more details.
UPDATE: 04:01 CET (03:01 GMT) - our core router at Telehouse1 suffered a system crash, resulting in the outage of this site. Our engineer successfully restarted the router, and we have obtained crash dump information from it for analysis. All services are once again operational.
We will be carrying out network maintenance during the night of 15-16 January 2011. This work is part of a multiphase plan to remove the legacy network topology and migrate to a more stable, scalable, and efficient hierarchical model in Paris.
In this phase, the activity will only involve the interconnections between the core and aggregation network elements at our datacenter in St. Denis.
This activity will cause several minor connectivity interruptions for various Gandi services in Paris throughout the maintenance window, each lasting up to five minutes, as the migrations are performed in the various sections of the network. No significant outages are expected.
We have scheduled this maintenance window from 02:00 CET (01:00 GMT) to 08:30 CET (07:30 GMT) on 16 January during the period of lowest impact to customers.
We will schedule follow-on maintenance activity over the coming weeks for the rest of the network migrations, including activities at Telehouse and work on a number of specific services, and we will of course endeavour to keep any disruption to a minimum.
Today (January 4th 2011), one of our routers went offline. This led to the partial and temporary loss of our network, impacting some of our services such as our website, SiteMaker, GandiBlogs, some email accounts, and all operations towards servers. Domain names did not encounter any unavailability, though some network paths to certain servers were unavailable.
The incident is currently being resolved, and services will progressively return to normal.
Please accept our apologies for the inconvenience.
UPDATE: Here is the technical explanation for yesterday's network incident:
Part of the Gandi France network is based on legacy topologies built over the past ten years, including multi-site spans for various VLANs and in some cases a relatively flat architecture. This part of the architecture relies, perhaps unwisely, on spanning-tree protocol to ensure a loop-free layer-2 topology in a bridged or switched network. Whilst we have been performing various engineering works over the past 18 months to simplify the architecture, it takes a considerable amount of time to completely unbuild, without significant outages of the Gandi services, what has been built piece by piece over a period of ten years.
The incident yesterday was caused by a fault in a downstream access switch cluster, which created a layer-2 loop in the architecture, and was exacerbated by the legacy elements of the Gandi France network infrastructure. The loop caused the layer-2 topology of the legacy network to be constantly recalculated, so the spanning-tree protocol failed to converge, consuming 100% of the resources on the affected switches and thus preventing traffic flow. The offending switch cluster was isolated from the network, but we were also required to reload another switch in another datacentre to stop the "snowball" effect caused by the fault.
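To illustrate what a layer-2 loop means in a switched topology, here is a minimal sketch in Python. It is not how the spanning-tree protocol itself works (STP elects a root bridge and blocks ports based on messages exchanged between switches), and it is not a model of the Gandi network. It only walks a list of switch-to-switch links and reports the links that would close a loop. The switch names and the function name `find_loop_links` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Link = Tuple[str, str]


def find_loop_links(links: Iterable[Link]) -> List[Link]:
    """Return the links that close a loop in an undirected switch topology.

    Uses a union-find structure: a link whose two ends are already
    connected through earlier links is redundant and forms a loop.
    """
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        # Register unseen switches as their own group.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    loop_links: List[Link] = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # Both ends are already reachable from each other: a loop.
            loop_links.append((a, b))
        else:
            parent[root_a] = root_b  # merge the two groups
    return loop_links


# Hypothetical topology: the last link gives access-1 a second path
# to the core, which closes a loop.
example_links = [
    ("core-1", "agg-1"),
    ("core-1", "agg-2"),
    ("agg-1", "access-1"),
    ("agg-2", "access-1"),
]
print(find_loop_links(example_links))
```

In a loop-free (tree) topology the function returns an empty list. Any link it does return is one that a protocol such as spanning-tree would need to keep blocked for the network to remain loop-free.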
We have already scheduled significant network engineering activities for this quarter to finally unpick the remainder of the legacy topology and migrate to a fully hierarchical model, limiting the layer-2 domains to locally contained subnets and minimising the reliance upon protocols such as spanning-tree, which was never designed to be used in such large-scale designs in the first place. We will be communicating the dates and times of the maintenance windows over the coming weeks.
We apologise again for any inconvenience caused during this network incident yesterday.
The registry in charge of .PRO domains has decided to lower its rates for 2011. We are therefore going to do the same and lower our prices by $6 (£2.5 or €3) excl. VAT across all our rates for creations, renewals, and transfers of .PRO domains.
We are currently experiencing an abnormally increased load on the incoming mail spools on the GandiMail service. As a result, new mail deliveries may be slower than usual. Our teams are investigating the source of this increased load and we will keep you updated as we have more information.
Update: 14:00: The slow spool performance is related to an increased load on the antispam/antivirus filtering on the mail spools. Our teams are actively working to resolve the issue as soon as possible. Inbound mail is still being delivered, but at a slower rate.
Update: 17:50: Our teams have isolated the issue and have tweaked the processes on the antispam filters to further optimise performance. All mail in the spools has been delivered to the recipient mailboxes and the system is now running nominally.
We apologise for any inconvenience caused by the slower than normal delivery of mail today.