Tuesday, October 5, 2021

Back after mammoth 6-hour outage, Facebook blames changes it made to its routers

After an almost unprecedented six-hour global outage, Facebook restored its services and those of WhatsApp and Instagram on Monday and blamed the fiasco on configuration changes it made to the routers that coordinate network traffic between its data centers.

“This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt,” Facebook vice president of infrastructure Santosh Janardhan said in a post.

The technical chaos reportedly extended to Facebook employees’ own email and work passes, complicating the company’s efforts to fix the problem. New York Times technology reporter Sheera Frenkel told the BBC that “the people trying to figure out what this problem was couldn’t even physically get into the building” at the company’s California campus to work out what had gone wrong.

Facebook made clear that the shutdown, which meant billions of users worldwide could not access its services, was caused internally, rather than by a cyberattack or other outside forces. It did not immediately disclose details of how the outage was fixed, but the Guardian noted “multiple reports” that it sent technical staff to manually reset the servers in California where the problem originated.

The US internet infrastructure and security firm Cloudflare explained that, in an internal change, Facebook had essentially told the internet that the routes to its platforms no longer existed.

“Externally, we saw the BGP [Border Gateway Protocol] and DNS [Domain Name System] problems … but the problem actually began with a configuration change that affected the entire internal backbone. That cascaded into Facebook and [its] other properties [such as Instagram and WhatsApp] disappearing and staff internal to Facebook having difficulty getting service going again.”