Large Data Transfers on Shared Wide-Area Networks
Hamidreza Anvari
University of Alberta
Hamidreza is a PhD candidate in the Department of Computing Science, University of Alberta. His research interests include software systems, Internet applications, and high-performance networking. His current research concerns traffic management in shared wide-area networks.
Abstract
One part of large-scale data processing is transferring the data across wide-area networks (WANs). Often, the data must be gathered (e.g., from remote sites), processed, possibly transferred (e.g., for further processing), and then possibly disseminated. If the data-transfer stages are bottlenecks, the overall data-processing pipeline is slowed.
Although a variety of tools and protocols have been developed for large data transfers on WANs, most of the related work has been in the context of dedicated or non-shared networks. However, in practice, most networks are likely to be shared. This misalignment may result in unexpected behaviour of the tools over networks where bandwidth is shared between a number of users.
We consider and evaluate the problem of large data transfers on shared networks with large round-trip times (RTTs), as found on many WANs. We investigated the behaviour of two well-known high-performance data transfer tools, GridFTP and UDT, over shared networks. We ran a series of test scenarios to study how the performance of these tools changes in the presence of a variety of synthetic background network traffic (e.g., uniform, square-waveform, bursty). We also studied their effect on, and fairness to, other network traffic. Our results suggest that the performance of these tools is degraded on shared networks, and may degrade significantly in the presence of bursty background traffic.
In conclusion, when transferring data over a network, one should consider the network, its mode of utilization, and the type of background traffic when choosing a data transfer tool.
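As an illustration of the three kinds of synthetic background traffic named in the abstract (uniform, square-waveform, bursty), the following is a minimal sketch of how such rate profiles might be described, one sample per second. It is not the traffic generator used in the study: the function names, the parameters, and the on/off model of burstiness are all assumptions made for this example. The fairness measure shown, Jain's fairness index, is likewise an assumption; the abstract does not say which metric was used.

```python
import random


def uniform_profile(duration_s, rate_mbps):
    """Constant background rate for every second (hypothetical helper)."""
    return [rate_mbps] * duration_s


def square_wave_profile(duration_s, high_mbps, low_mbps, period_s):
    """Alternate between a high and a low rate; first half of each period is high."""
    half = period_s / 2
    return [high_mbps if (t % period_s) < half else low_mbps
            for t in range(duration_s)]


def bursty_profile(duration_s, burst_mbps, burst_probability, seed=0):
    """On/off bursts: each second is a burst with the given probability.

    This simple random on/off model is an assumption, not the study's model.
    """
    rng = random.Random(seed)
    return [burst_mbps if rng.random() < burst_probability else 0.0
            for _ in range(duration_s)]


def jain_fairness(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in the range (0, 1]."""
    squares = sum(x * x for x in throughputs)
    if squares == 0:
        return 1.0  # all flows idle: treated as trivially fair (assumption)
    total = sum(throughputs)
    return (total * total) / (len(throughputs) * squares)


if __name__ == "__main__":
    print(uniform_profile(4, 100.0))
    print(square_wave_profile(8, 100.0, 0.0, 4))
    print(bursty_profile(8, 500.0, 0.25))
    print(jain_fairness([10.0, 10.0]), jain_fairness([10.0, 0.0]))
```

A profile like these could drive any rate-controllable traffic source. The fairness index would then be computed over the measured throughput of the transfer tool and of the competing flows: a value near 1 means the flows share bandwidth evenly, and lower values mean one flow dominates.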
Authors
- Hamidreza Anvari (University of Alberta)
- Paul Lu (University of Alberta)
Topic Areas
Advanced Research Computing (ARC): Research data management: Challenges, opportunities and , Advanced Research Computing (ARC): Other
Session
HPC3.2.3 » Networks II (09:50 - Wednesday, 22nd June, CCIS 1-140)