Understanding the internet AS topology and its applications

Date
2020-01-01
Authors
Kabala, Jinu
Major Professor
Advisor
Lu Ruan
Department
Computer Science
Abstract

Autonomous Systems (ASes) in the Internet use BGP to perform inter-domain routing. The import and export policies at an AS determine the contents of its routing table, and these policies are shaped by the business relationships between the AS and its neighbors. Since AS relationships are not publicly available, several studies have proposed heuristic algorithms for inferring AS relationships using publicly available BGP data. Content Delivery Network (CDN) servers placed around the world serve the clients that access their content. Since the majority of Internet traffic today is content delivery traffic, it is important to study the efficiency of the routing paths from users to content servers, paths that content providers do not control. Netflix and Akamai are two major CDN providers. The user experience depends upon the performance of CDN servers. Hence, it is important for a CDN to choose the ideal server when a user requests content from its network. Because BGP does not authenticate routes, prefixes are prone to being hijacked by ASes to which the prefixes do not belong. Existing mechanisms address this by detecting a hijack after it has happened and reacting to it. A preventive mechanism is needed to stop hijacks from happening in the first place. Recent work proposed a list of serial hijackers that would enable such a solution. Unfortunately, the available ground truth on serial hijackers is very small.

We study the Internet AS topology using BGP routing data received from neighbors of an AS. We present a machine learning approach to edge type inference in AS graphs. We use our method to train classifiers for three AS graphs derived from different data sources: a BGP graph, a traceroute graph, and an IRR graph. The classifiers label each edge as either provider-to-customer (p2c) or peer-to-peer (p2p). We merge the three individual graphs to obtain a combined graph and propose a method to compute edge types in the combined graph. We analyze the characteristics of the three individual graphs and the combined graph and show that combining the three individual graphs gives a significantly more complete view of both the p2p and p2c ecosystems in the Internet. We also present a method to compute the customer cones of peering networks using PCH data.
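
As a rough illustration of the customer cone concept mentioned above, the following Python sketch computes cones from a list of p2c edges. It assumes the common definition of a customer cone (an AS plus every AS reachable by following provider-to-customer links) and uses a hypothetical toy topology. It is not the PCH-based method developed in this work, and the function name `customer_cones` is illustrative only.

```python
from collections import defaultdict

def customer_cones(p2c_edges):
    """Return {asn: set of ASNs in its customer cone, including itself}.

    p2c_edges is an iterable of (provider, customer) pairs. p2p edges are
    deliberately not part of the input, since they do not extend a cone
    under the definition assumed here.
    """
    customers = defaultdict(set)
    nodes = set()
    for provider, customer in p2c_edges:
        customers[provider].add(customer)
        nodes.update((provider, customer))

    cones = {}
    for asn in nodes:
        # Iterative depth-first walk over provider-to-customer links. The
        # cone set doubles as a visited set, so cycles caused by inference
        # errors cannot loop forever.
        cone, stack = {asn}, [asn]
        while stack:
            current = stack.pop()
            for cust in customers.get(current, ()):
                if cust not in cone:
                    cone.add(cust)
                    stack.append(cust)
        cones[asn] = cone
    return cones

# Hypothetical toy topology: AS1 is a provider of AS2 and AS3,
# and AS2 is a provider of AS4.
edges = [(1, 2), (1, 3), (2, 4)]
cones = customer_cones(edges)
```

In this toy input, the cone of AS1 contains all four ASes, while a stub such as AS4 has a cone containing only itself.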

We conduct a case study of Netflix to understand the efficiency of the AS paths from various access ISPs to Netflix servers deployed at IXPs in different regions of the world. We discover inefficient AS paths in Europe, North America, and South America. Paths in South America are especially inefficient, as many of them leave the continent. We also analyze long paths in each region, explore their causes, and propose ways to avoid them. Using active measurements, we analyze how the performance of accessing content on Akamai servers varies with the time of day across different client and server locations. We measure the latency of paths from residential ISP users using RIPE Atlas probes, and the throughput and packet loss from a non-residential user using httping measurements. Based on our observations, we propose a server selection strategy that picks a low-latency or maximum-throughput server based on the predicted value of latency or throughput; the predicted value matches the actual value with an accuracy of over 98%. The optimum server based on the measured throughput or latency is not always the geographically closest server. We also observe that Akamai's server selection does not always pick the minimum-latency or maximum-throughput server, and we outline ways to improve its strategy.
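
The selection step described above can be sketched in Python as follows. The abstract does not state how latency or throughput is predicted, so this sketch substitutes a simple placeholder: the mean of past measurements taken at the same hour of day. The function names, server names, and measurement values are all hypothetical.

```python
from statistics import mean

def predict(history, hour):
    """Placeholder predictor: mean of past samples taken at the same hour.

    history is a list of (hour_of_day, value) pairs. Returns None when no
    sample exists for that hour.
    """
    samples = [value for h, value in history if h == hour]
    return mean(samples) if samples else None

def select_server(histories, hour, metric="latency"):
    """Pick the server with the best predicted metric at the given hour.

    For "latency" the lowest prediction wins; for any other metric
    (for example "throughput") the highest prediction wins.
    """
    predictions = {}
    for server, history in histories.items():
        value = predict(history, hour)
        if value is not None:
            predictions[server] = value
    if not predictions:
        return None
    choose = min if metric == "latency" else max
    return choose(predictions, key=predictions.get)

# Hypothetical latency samples in milliseconds, keyed by server name.
latency_history = {
    "server_a": [(9, 30.0), (9, 34.0), (21, 80.0)],
    "server_b": [(9, 45.0), (21, 40.0), (21, 44.0)],
}
morning_choice = select_server(latency_history, hour=9)
evening_choice = select_server(latency_history, hour=21)
```

With these made-up samples the preferred server changes between the morning and the evening, which mirrors the time-of-day variation the measurements are meant to capture.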

We make gathering serial hijacker ground truth easier than manually going through the available mailing list by using a document classifier that identifies sources of interest from which serial hijacker information can be derived. The resulting classifier can identify the document sources of such BGP hijacking information with 89% accuracy. We further examine how to create an end-to-end tool to extract serial BGP hijacker ASes by using a BGP hijacker detector that has an accuracy of approximately 87%.
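
To make the idea of a document classifier concrete, here is a minimal Python sketch of a bag-of-words multinomial naive Bayes classifier with add-one smoothing. The abstract does not say which model or features the thesis uses, so naive Bayes is a stand-in chosen for brevity. The training snippets and the labels "relevant" and "other" are invented for illustration and are not data from this work.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()

def train(labeled_docs):
    """Count words per label from an iterable of (text, label) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def classify(model, text):
    """Return the label with the highest log-posterior for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior from label frequency in the training set.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented training snippets, for illustration only.
training_docs = [
    ("prefix hijack announced by unknown origin as", "relevant"),
    ("as keeps announcing hijacked prefixes again", "relevant"),
    ("scheduled maintenance window for peering router", "other"),
    ("looking for transit provider recommendations", "other"),
]
model = train(training_docs)
label_one = classify(model, "another hijacked prefix announced")
label_two = classify(model, "maintenance window for router")
```

A real pipeline would need a much larger labeled corpus and a held-out test set before any accuracy figure could be reported.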

Copyright
Sat Aug 01 00:00:00 UTC 2020