Highly Available Bandwidth Guarantees on Highly Utilized Cloud WANs


Muhammed Uluyol, Ayush Goel, Harsha V. Madhyastha, Ben Zhang, Jonathan Zolla, Chi-Yao Hong, Sankalp Singh, Kirill Mendelev, Dina Papagiannaki, Amin Vahdat. Preprint


Private wide-area networks (WANs) deployed by large cloud providers enable them to offer predictable bandwidth and latency to their tenants, in contrast to the public Internet. To maximize the WAN’s utilization, instead of statically reserving capacity for every tenant, cloud providers dynamically control what traffic is admitted onto the network. However, as traffic demands change, the availability of promised bandwidth then depends on a global controller reacting in time, and such a controller is fundamentally slow at scale and can occasionally even be offline.

To ensure that efficiency does not come at the expense of predictability, we present HEYP, our new architecture for private WANs. First, when traffic demands change, HEYP’s global controller needs to intervene only to improve efficiency, not to ensure that bandwidth guarantees are met. Second, HEYP’s controllers within each data center admit surplus traffic at a lower quality-of-service (QoS) level in a manner that accounts for how both congestion control and applications will react to QoS changes. Our simulations using traces from a large, global private WAN suggest that HEYP would offer 99% availability for 10x as many bandwidth guarantees as state-of-the-art WAN designs without sacrificing efficiency.