A Parallel Data Stream Layer for Large Data Transfers in WANs
Abstract
-------------------------
NOTE: This abstract is nearly identical to that of a paper submitted to:
The 2016 USENIX Annual Technical Conference (USENIX ATC '16), Denver, Colorado, U.S.A., June 22--24, 2016,
with co-author P. Lu.
-------------------------
Multi-gigabyte (and larger) file transfers and data synchronization are important use cases on wide-area networks (WANs). However, on networks with a high bandwidth-delay product (BDP), even small amounts of packet loss (e.g., due to congestion) can reduce TCP/IP throughput. Consequently, GridFTP, a file transfer tool that uses parallel TCP connections, is widely used, particularly in computational science.
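The BDP argument can be made concrete with a short calculation. The Python sketch below computes the bandwidth-delay product and applies the well-known Mathis et al. approximation for loss-limited TCP throughput, rate ≈ (MSS/RTT)·(C/√p) with C ≈ 1.22. The RTT, MSS, and loss rate are hypothetical values chosen for illustration and are not measurements from this work. The model also treats parallel streams as independent, which is a simplification.

```python
import math

def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return bandwidth_bps * rtt_s / 8

def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate loss-limited throughput of one Reno-style TCP connection
    (Mathis et al.): (MSS / RTT) * sqrt(3/2) / sqrt(p), in bits per second."""
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5) / math.sqrt(loss_rate)

def aggregate_bps(n_streams: int, mss_bytes: int, rtt_s: float,
                  loss_rate: float, link_bps: float) -> float:
    """Treat each stream as independent; the sum cannot exceed the link rate."""
    per_stream = mathis_throughput_bps(mss_bytes, rtt_s, loss_rate)
    return min(n_streams * per_stream, link_bps)

# Hypothetical path parameters (assumptions, not values from the paper).
LINK_BPS = 1e9   # 1 Gbps bottleneck
RTT_S = 0.060    # assumed 60 ms round-trip time
MSS = 1460       # typical Ethernet MSS, in bytes
LOSS = 1e-5      # assumed packet-loss probability

if __name__ == "__main__":
    print(f"BDP: {bdp_bytes(LINK_BPS, RTT_S) / 1e6:.1f} MB")
    for n in (1, 3, 6):
        rate = aggregate_bps(n, MSS, RTT_S, LOSS, LINK_BPS)
        print(f"{n} stream(s): ~{rate / 1e6:.0f} Mbps")
```

Under these assumed parameters, a single stream is limited to roughly 75 Mbps, far below the 1 Gbps link. Aggregating several streams raises that ceiling, which is the intuition behind GridFTP's parallel connections. Real behaviour still depends on the congestion-control algorithm and on how losses are shared among the streams.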
But what about applications that are not easily framed as passive file transfers? rsync, git, distributed file systems, and other client-server applications interleave request-response messages with bulk data transfer.
We introduce the open-source Parallel Data Streams (PDS) tool (Beta release on GitHub in April 2016; Alpha release available by email), which adopts GridFTP's parallel-connection strategy. Like Secure Shell (SSH), however, PDS can also encapsulate the traffic of tools such as rsync and git.
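The parallel-connection strategy can be illustrated by striping one byte stream across several connections and reassembling it at the receiver. The Python sketch below is a minimal in-memory illustration and does not reproduce PDS's actual wire format, which is not described here. The chunk size, the (sequence number, payload) framing, and the round-robin assignment are hypothetical choices, and plain lists stand in for TCP sockets.

```python
from typing import List, Tuple

Chunk = Tuple[int, bytes]  # (sequence number, payload): hypothetical framing

def stripe(data: bytes, n_streams: int, chunk: int = 4) -> List[List[Chunk]]:
    """Split data into sequence-numbered chunks and deal them round-robin
    across n_streams lists, each standing in for one TCP connection."""
    streams: List[List[Chunk]] = [[] for _ in range(n_streams)]
    for seq, offset in enumerate(range(0, len(data), chunk)):
        streams[seq % n_streams].append((seq, data[offset:offset + chunk]))
    return streams

def reassemble(streams: List[List[Chunk]]) -> bytes:
    """Restore the original byte order by sorting all chunks on sequence number."""
    chunks = [c for stream in streams for c in stream]
    chunks.sort(key=lambda c: c[0])
    return b"".join(payload for _, payload in chunks)

if __name__ == "__main__":
    message = b"rsync request/response traffic plus bulk data"
    parts = stripe(message, n_streams=3)
    assert reassemble(parts) == message
    print([len(p) for p in parts])  # number of chunks carried by each stream
```

The sequence numbers let the receiver rebuild a single ordered byte stream even though each connection delivers its chunks independently. This property is what allows an unmodified application such as rsync to sit on top of several connections.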
Using rsync over PDS with 6 parallel cleartext TCP streams, a 10 GB file can be transferred on a WAN between Alberta and Quebec (over 3,100 km, with a maximum bandwidth of 1 Gbps) at 833 Mbps. This is comparable to GridFTP's throughput with the same number of streams (857 Mbps). PDS can optionally run its streams over SSH: rsync over PDS then achieves a peak of 386 Mbps with 3 parallel SSH streams. In contrast, rsync over a typical single SSH connection achieves 189 Mbps.
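The reported rates can be translated into idealized transfer times for the 10 GB file. This Python sketch assumes that "10 GB" means 10 × 10^9 bytes and ignores protocol overhead and TCP ramp-up. Because 386 Mbps is a peak figure, its computed time is a lower bound rather than a measured result.

```python
FILE_BYTES = 10e9  # 10 GB, assuming decimal gigabytes

# Throughputs reported in the abstract, in Mbps.
REPORTED_MBPS = {
    "rsync over PDS, 6 cleartext TCP streams": 833,
    "GridFTP, 6 streams": 857,
    "rsync over PDS, 3 SSH streams (peak)": 386,
    "rsync over a single SSH connection": 189,
}

def transfer_seconds(size_bytes: float, mbps: float) -> float:
    """Idealized time to move size_bytes at a constant mbps megabits per second."""
    return size_bytes * 8 / (mbps * 1e6)

if __name__ == "__main__":
    for label, mbps in REPORTED_MBPS.items():
        print(f"{label}: {transfer_seconds(FILE_BYTES, mbps):.0f} s")
    print(f"SSH speedup, 3 streams vs 1: {386 / 189:.2f}x")
```

Under these assumptions, the 6-stream cleartext run moves the file in roughly 96 s, compared with about 423 s for a single SSH connection. Three parallel SSH streams roughly double single-SSH throughput (about 2.04×).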
Authors
-
Nooshin Eghbal
(University of Alberta)
-
Paul Lu
(University of Alberta)
Topic Areas
Advanced Research Computing (ARC): Research data management: Challenges, opportunities and , Advanced Research Computing (ARC): Innovations in platform / portal tools & software devel , Advanced Research Computing (ARC): ARC applications in any discipline (i.e. the sciences,
Session
HPC3.2.3 » Networks II (09:50 - Wednesday, 22nd June, CCIS 1-140)