FCC report on T-Mobile nationwide outage is a case study in network complexity and best practices (or lack thereof)

11 November 2020 by Steve Blum

A newly installed but incompletely configured router and a fiber link failure, both in the southeast U.S., combined with software and hardware bugs to take down T-Mobile’s national phone network in June, according to a report published in October by the Federal Communications Commission. The cascade of problems began when the fiber route went down, and it led to a “registration storm” in Atlanta as “mobile devices repeatedly attempted and failed to register” on the network, first using 4G, 3G and 2G mobile systems, and finally trying to complete calls via WiFi connections.

The storm spread “out of the Atlanta market and across the country”, disrupting phone service within T-Mobile’s network as well as traffic between T-Mobile and other carriers. Millions of calls went nowhere…

Based on confidential call success and 3G and 2G call failure data shared by T-Mobile, together with data on 911 calls and calls originating outside of T-Mobile’s network, [FCC staff] estimates that at least 41% of all calls that attempted to use T-Mobile’s network during the outage did not complete successfully. This estimate does not include any possible call failures arising from T-Mobile subscribers’ VoLTE or Voice over Wi-Fi call attempts, which could not be determined. However, [staff] expects that if this number could be determined, it would result in [staff’s] estimate being much larger.

The impact on 911 calls wasn’t as severe as on general voice traffic, because emergency calls bypass the registration process. The system is designed to let people call 911 whether they have an active account or not. But the impact was still significant. T-Mobile said that 24,000 calls to 911 centers did not go through, and the FCC’s report said that people could not get the help they needed…

Based on the record, the June 15 outage on T-Mobile’s networks prevented some consumers from summoning the help that they needed during emergencies. Not only were some consumers unable to reach PSAPs by dialing 911, but they also were unable to reach roadside-service providers, medical professionals, and family…Fortunately, the Bureau did not receive any comments suggesting that individuals experienced physical harm as a direct result of this outage.

Usefully, the FCC’s report is not an indictment. It is a very readable case study with lessons learned that apply to all mobile carriers and fiber transport providers, along with recommendations for preventing a recurrence and a promise of corrective action ahead.