Service Degradation Summary 2020-05-18 and What YOU Can Do to Avoid Problems!
Summary
Today at around 21:50 UTC, Quad9 IPv4 prefixes started to be re-announced by AS10431 and AS6939, causing service interruptions for users who are customers of those networks or who prefer them for transit. The issue was resolved at 03:24 UTC. There were no IPv6 outages due to this issue, so anyone using Quad9's IPv6 addresses should have seen no change in behavior.
Details
Edge Networks (AS10431) peers with PCH (AS42) in Portland, Oregon, meaning that it directly interconnects with the network that provides Quad9 (AS19281) with transit and co-location services in that location. For reasons not yet known, AS10431 started re-announcing Quad9 prefixes to AS6939 (Hurricane Electric), a transit provider for AS10431. AS6939, in turn, re-announced those prefixes to its customers and/or peers, causing Quad9 reachability outages for some or all of those networks. The leaked AS path was one hop longer than most other paths to Quad9's networks, so any network selecting routes by shortest BGP AS-PATH would have been mostly insulated from the issue. However, customers of Hurricane Electric were the most likely to use this path and experience the resulting service issues, since transit providers typically prefer routes learned from their own customers over routes learned from peers, regardless of AS-PATH length, and pass that preferred route on to their customers.
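Why a longer path can still win is easier to see in a simplified model of the first two steps of the usual BGP decision process: highest LOCAL_PREF first, then shortest AS_PATH. This is a minimal sketch, not any router's actual implementation. The LOCAL_PREF values, the `Route` type, and the AS paths are illustrative assumptions; in particular, the "direct" path is hypothetical and chosen only so that the leaked path is one hop longer.

```python
from dataclasses import dataclass
from typing import List

# Illustrative LOCAL_PREF policy (an assumption, though a common one):
# customer routes > peer routes > provider routes.
LOCAL_PREF = {"customer": 200, "peer": 100, "provider": 50}


@dataclass
class Route:
    prefix: str
    as_path: List[int]  # neighbor AS first, origin AS last
    learned_from: str   # "customer", "peer" or "provider"


def best_route(candidates, use_local_pref=True):
    """Pick a route using only two BGP decision steps:
    highest LOCAL_PREF, then shortest AS_PATH."""
    def rank(route):
        pref = LOCAL_PREF[route.learned_from] if use_local_pref else 0
        # min() picks the smallest key, so negate LOCAL_PREF to favor higher values.
        return (-pref, len(route.as_path))

    return min(candidates, key=rank)


# Hypothetical view from AS6939: the leaked route arrives from customer AS10431,
# while an alternative (hypothetical) route to the same prefix arrives from a peer.
leaked = Route("9.9.9.0/24", [10431, 42, 19281], "customer")
direct = Route("9.9.9.0/24", [42, 19281], "peer")

print(best_route([leaked, direct]).as_path)                        # [10431, 42, 19281]
print(best_route([leaked, direct], use_local_pref=False).as_path)  # [42, 19281]
```

With customer routes preferred, the longer leaked path wins; a network that effectively compares only AS_PATH length picks the shorter one. Real routers apply further tie-breakers (ORIGIN, MED, router ID, and so on) that are omitted here.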
The affected networks carry only a small fraction of our overall traffic, so the outage did not trigger automated alerts. Reports from users began arriving via Twitter and our support email, and our NOC started working toward a resolution.
Resolution
Announcements to AS10431 were turned off at 03:24 UTC, and the incorrect path was withdrawn. Conversations with administrators at AS10431 and AS6939 indicate that appropriate filters are now in place to prevent route leakage upstream. We will re-enable the peering session with AS10431 during a maintenance window later today or tomorrow and monitor it to ensure that Quad9 prefixes are not being leaked.
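One common way to express the kind of filtering mentioned above is the "valley-free" export rule from the Gao-Rexford model, combined with provider-side prefix filters on customer sessions; RFC 7908 classifies route leaks of this general kind. The sketch below is an assumption about what such filters check, not a description of the actual configuration at AS10431 or AS6939. The `Rel` enum, both function names, and the 192.0.2.0/24 allow-list (a documentation prefix) are hypothetical.

```python
import ipaddress
from enum import Enum


class Rel(Enum):
    """Business relationship of a BGP neighbor, from our own point of view."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


def may_export(learned_from: Rel, export_to: Rel) -> bool:
    """Valley-free export rule: routes learned from customers may be sent to
    anyone; routes learned from peers or providers only to customers."""
    return learned_from is Rel.CUSTOMER or export_to is Rel.CUSTOMER


def accept_from_customer(prefix: str, allowed: list) -> bool:
    """Provider-side filter: accept a customer's announcement only if it falls
    inside one of the prefixes registered for that customer."""
    net = ipaddress.ip_network(prefix)
    for block in map(ipaddress.ip_network, allowed):
        # subnet_of() raises TypeError on mixed IPv4/IPv6, so compare versions first.
        if net.version == block.version and net.subnet_of(block):
            return True
    return False


# A peer-learned route (e.g. Quad9 via AS42) sent to a transit provider is a leak.
print(may_export(Rel.PEER, Rel.PROVIDER))      # False
print(may_export(Rel.CUSTOMER, Rel.PROVIDER))  # True

# Hypothetical allow-list for the customer session on the provider side.
customer_blocks = ["192.0.2.0/24"]
print(accept_from_customer("9.9.9.0/24", customer_blocks))    # False
print(accept_from_customer("192.0.2.0/25", customer_blocks))  # True
```

Either check alone would have stopped this kind of leak: the first at the leaking network's export, the second at its transit provider's import.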
As a Quad9 user, how can you protect yourself against issues like this?
Quad9 delivers its services from two IPv4 prefixes, 9.9.9.0/24 and 149.112.112.0/24, and one IPv6 prefix, 2620:fe::0/48.
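To check which prefix each recommended address belongs to, and whether a resolver list actually spans more than one of them, Python's standard `ipaddress` module is enough. The helper names `prefix_for` and `distinct_prefixes` are hypothetical; only the prefixes and addresses come from this post.

```python
import ipaddress

# Quad9 service prefixes, as listed above.
QUAD9_PREFIXES = [
    ipaddress.ip_network("9.9.9.0/24"),
    ipaddress.ip_network("149.112.112.0/24"),
    ipaddress.ip_network("2620:fe::0/48"),
]


def prefix_for(address):
    """Return the Quad9 prefix containing `address`, or None if there is none."""
    addr = ipaddress.ip_address(address)
    for net in QUAD9_PREFIXES:
        if addr in net:  # always False across IPv4/IPv6, so no version check needed
            return net
    return None


def distinct_prefixes(addresses):
    """Set of distinct Quad9 prefixes covered by a resolver list."""
    return {prefix_for(a) for a in addresses} - {None}


for a in ["9.9.9.9", "149.112.112.112", "2620:fe::fe", "2620:fe::9"]:
    print(a, "->", prefix_for(a))

print(len(distinct_prefixes(["9.9.9.9"])))                                    # 1
print(len(distinct_prefixes(["9.9.9.9", "149.112.112.112", "2620:fe::fe"])))  # 3
```

A list that touches only one prefix shares that prefix's fate in a routing fault; a list covering all three does not.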
If you only have IPv4, make sure both 9.9.9.9 and 149.112.112.112 are in your configuration.
We recommend using all of the relevant addresses in your DNS configuration. If you have IPv6, include the IPv6 address in your resolver list (typically 9.9.9.9, then 149.112.112.112, then 2620:fe::fe). If you are using Windows or Mac and have IPv6 available, you should configure both 2620:fe::fe and 2620:fe::9.
It typically doesn’t hurt to add the IPv6 addresses to your settings even if you don’t yet have IPv6 connectivity: most client software will skip addresses it can’t use.
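The benefit of listing several addresses shows up in a toy model of client failover: try each configured address in order, skip an address family the client cannot use, and move on when an address sits in a prefix affected by a routing fault. This is a simplified sketch under stated assumptions; real stub resolvers differ by operating system in ordering, timeouts, and retries, and `first_usable` and the fault scenarios below are hypothetical.

```python
import ipaddress


def first_usable(resolvers, has_ipv6, faulty_prefixes):
    """Return the first configured resolver a client could reach, or None.

    resolvers:       addresses in the order they appear in the configuration
    has_ipv6:        whether the client has IPv6 connectivity
    faulty_prefixes: prefixes assumed unreachable because of a routing fault
    """
    faulty = [ipaddress.ip_network(p) for p in faulty_prefixes]
    for r in resolvers:
        addr = ipaddress.ip_address(r)
        if addr.version == 6 and not has_ipv6:
            continue  # skipped, as most client software does with unusable addresses
        if any(addr in net for net in faulty):
            continue  # a real client would wait for a timeout before moving on
        return r
    return None


config = ["9.9.9.9", "149.112.112.112", "2620:fe::fe"]
both_v4 = ["9.9.9.0/24", "149.112.112.0/24"]  # both IPv4 prefixes routed the same way

print(first_usable(config, True, both_v4))              # 2620:fe::fe
print(first_usable(config, False, both_v4))             # None
print(first_usable(config, False, ["9.9.9.0/24"]))      # 149.112.112.112
print(first_usable(["9.9.9.9"], True, ["9.9.9.0/24"]))  # None
```

The third call models a fault that affects only one IPv4 prefix; the last shows that a single configured address leaves no fallback at all.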
Next Steps
Our IPv4 prefixes are currently (for the most part) routed identically. Over the next several months, we are making changes to route them to separate server infrastructures. Under that arrangement, a partial routing fault such as the one experienced this morning would leave one of our two IPv4 prefixes unaffected, and IPv6 would also have a lower chance of experiencing problems. Configuring all of the Quad9 addresses ahead of time gives you a good chance of avoiding the adverse effects of these types of problems if they occur in the future.