Facebook Down – Part 1

Michael R Sheehan | November 1, 2021

On Monday, October 4, 2021, a large minority of people on the planet lost their primary access to the Internet for roughly six hours. Facebook (now Meta) and all of the 78 other companies under its umbrella stopped loading new content, essentially going dark.

This is unbelievably huge.

The Scale And Scope Of This Outage Is Massive.

Consider: Facebook alone has nearly 2.92 billion users! That’s a third of the planet right there. Then consider WhatsApp (2 billion users), Instagram (1.3 billion), and Oculus (an estimated 93.7 million people will use it at least once). Facebook has stated that 3.6 billion people, 45% of the planet’s population, use at least one of its major apps. Then there is the unseen side of Facebook’s empire: companies like Chainspace, LiveRail, and Threadsy, which don’t have customers among the public but instead provide services to businesses, such as facial recognition software or an emotion detection app.

For End Users, It Just Looked Like The Internet Was Down.  

For many people, Facebook is the Internet. When FB went down, many assumed the problem was their own Internet connection. Some tried troubleshooting their equipment and found it to be working normally. Many reset their passwords and configurations, assuming they had been hacked. Others headed to repair stores or called technical support to ask why their device was no longer functioning.

A huge number of people turned to Twitter to communicate. Posts there ranged from Facebook’s own apologies to people mocking Facebook for its very public misstep.

To the technically savvy, it appeared as a DNS error. DNS (Domain Name System) is essentially the address book of the Internet: the reference that translates “www.facebook.com” into the IP address of its location on the Internet (31.13.66.35, as one example). With those lookups failing, mobile phones, laptops, workstations, and applications were unable to find Facebook or any of Facebook’s companies.
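To make the address-book idea concrete, here is a minimal Python sketch of a DNS lookup and of what a failed lookup looks like to an application. The `resolve` helper and the `failing_resolver` stand-in are hypothetical names used only for illustration; the real lookup is done by the standard library's `socket.gethostbyname`.

```python
import socket


def resolve(hostname, resolver=socket.gethostbyname):
    """Return the IPv4 address for hostname, or None if the lookup fails."""
    try:
        return resolver(hostname)
    except socket.gaierror:
        # The name could not be translated into an address.
        return None


def failing_resolver(hostname):
    # Hypothetical stand-in that simulates DNS having no answer for a name,
    # roughly what devices experienced during the outage.
    raise socket.gaierror("name lookup failed for " + hostname)


if __name__ == "__main__":
    # Simulated outage: the application gets no address back at all.
    print(resolve("www.facebook.com", resolver=failing_resolver))
```

Called with the default resolver, `resolve("www.facebook.com")` would ask the system's configured DNS servers and return whatever address they currently hand out. With no address returned, an app has nowhere to send its request, which is why the failure looked like a dead connection.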

The Internet Forgot There Is A Facebook.

But what really happened at Facebook was a Border Gateway Protocol (BGP) error, which we will explore in more detail in our next post. BGP provides the map and directions to locations on the Internet, and it lost the routes to all of the IP addresses belonging to Facebook and its companies.

So even if you had the specific IP address, you could not have reached their content anyway. In effect, the Internet forgot where Facebook was and therefore could not direct you to it or to any other Facebook company.
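The difference between "no address" and "no route" can be sketched with a toy routing table. This is a simplified illustration, not how BGP is actually implemented: the `RoutingTable` class, the `31.13.66.0/24` prefix, and the `facebook-edge` label are all assumptions chosen to match the example address above.

```python
import ipaddress


class RoutingTable:
    """Toy table mapping announced prefixes to a next hop (illustration only)."""

    def __init__(self):
        self.routes = {}  # ip_network -> next-hop label

    def announce(self, prefix, next_hop):
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix):
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the next hop of the most specific matching prefix, or None."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None  # no route: the address is unreachable
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = RoutingTable()
table.announce("31.13.66.0/24", "facebook-edge")  # hypothetical prefix and label
before = table.lookup("31.13.66.35")  # a route exists

table.withdraw("31.13.66.0/24")  # the route disappears from the map
after = table.lookup("31.13.66.35")  # same address, nowhere to send it
```

Here `before` is `"facebook-edge"` and `after` is `None`: the IP address never changed, but once the route was withdrawn there was no longer any path to it.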

Bad Process Gets You Every Time.

A big cause of this failure is that Facebook administers all of its companies’ Internet locations in-house. All of its apps, site locations, and changes are operated by a centralized team that controls Facebook and its companies across the globe, leveraging identical resources for all sites at once. Essentially, all of its eggs are in the same configuration basket.

For their part, FB issued a fairly generic, non-committal response at first:

“Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication. This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt.”

The next day, they followed up with:

“During one of these routine maintenance jobs, a command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all the connections in our backbone network, effectively disconnecting Facebook data centers globally.”

So a giant megacorporation, with more users than most nations have citizens, was brought down by a single faulty command and by the failure of a process that was intended to prevent this very occurrence.
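As a rough idea of what such a safeguard can look like, here is a minimal sketch of a pre-flight check that refuses any maintenance change that would leave too few backbone links running. This is not Facebook's tooling; the function names, link labels, and the `min_links_remaining` threshold are all hypothetical.

```python
def is_safe_change(links_up, links_to_disable, min_links_remaining=1):
    """Return True if enough links would stay up after the change."""
    remaining = set(links_up) - set(links_to_disable)
    return len(remaining) >= min_links_remaining


def run_maintenance(links_up, links_to_disable, min_links_remaining=1):
    """Apply the change only if it passes the safety check.

    Returns the sorted list of links still up afterwards.
    """
    if not is_safe_change(links_up, links_to_disable, min_links_remaining):
        raise RuntimeError("change rejected: it would disconnect the backbone")
    return sorted(set(links_up) - set(links_to_disable))


backbone = ["link-a", "link-b", "link-c"]  # hypothetical link names

# Taking one link down for maintenance is allowed.
still_up = run_maintenance(backbone, ["link-a"])

# Taking every link down at once is blocked before it can run.
try:
    run_maintenance(backbone, backbone)
    blocked = False
except RuntimeError:
    blocked = True
```

The point of the design is that the check runs before the command does, so a mistake is rejected instead of executed. A guard like this only helps if it is itself tested against the worst case, which is where process matters as much as code.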

The fact is, this was a much larger error. It was a configuration error, one that was not only easily avoidable but that Facebook had been warned about by a similar outage just a few months earlier. To make matters worse, because all of its operations teams are in-house and leverage shared resources, once the outage occurred they could not access their own tools to correct the issue! Facebook employees were nearly as helpless as its customers.

Hours later, once the problem had been corrected, bringing all those connections back online while so many users clamored for access at the same time caused additional configuration issues and downtime.

Facebook (Meta) Lost Money

As a company, FB lost about five percent in value and $100 million in revenue during that single day. Heavy stock losses, combined with lost revenue and widespread consumer dissatisfaction, set the megacorporation back on its heels with all the force of the marketplace. One could argue that this error compounded Facebook’s existing public opinion challenges, accelerating the corporate name change from Facebook to Meta.

Why are we talking about this? This problem can and does happen to many businesses across the Internet. It’s the scope and scale of this outage that make it newsworthy. Don’t let this kind of easily preventable problem cost your company access to your customers, or vice versa.

Contact ConaLogix and let us help you find the best way to update, manage, configure, and adjust your data systems, not only to ensure long life and secure access but also to prevent this type of error from costing your business money.

Based in Dedham, MA, ConaLogix was founded in 2018 as a fractional CIO and advisory resource for the pharmaceutical, life science, and biotech industries. We provide C-level Information Technology services on a virtual basis, assisting with architecture, integration, and testing. Using a unique, customized approach, the ConaLogix team collaborates with entrepreneurs and scientific core teams to support their vision, while guiding the most efficient development model that benefits data management and core requirements.
