2100 Journal of Software 软件学报 Vol.32, No.7, July 2021
[20] Hargrove PH, Duell JC. Berkeley laboratory checkpoint/restart (BLCR) for Linux clusters. Journal of Physics (Conf. Series), 2006,
46(1):494–499. [doi: 10.1088/1742-6596/46/1/067]
[21] Plank JS, Li K. ICKP: A consistent checkpointer for multicomputers. IEEE Parallel & Distributed Technology: Systems &
Applications, 1994,2(2):62–67. [doi: 10.1109/88.311574]
[22] Plank JS, Li K, Puening MA. Diskless checkpointing. IEEE Trans. on Parallel and Distributed Systems, 1998,9(10):972–986.
[doi: 10.1109/71.730527]
[23] Sankaran S, Squyres JM, Barrett B, Sahay V, Lumsdaine A, Duell J, Hargrove P, Roman E. The LAM/MPI checkpoint/restart
framework: System-initiated checkpointing. The Int’l Journal of High Performance Computing Applications, 2005,19(4):479–493.
[doi: 10.1177/1094342005056139]
[24] Zheng G, Ni X, Kalé LV. A scalable double in-memory checkpoint and restart scheme towards exascale. In: Proc. of the IEEE/IFIP
Int’l Conf. on Dependable Systems and Networks Workshops (DSN 2012). Boston, 2012. 1–6. [doi: 10.1109/DSNW.2012.6264677]
[25] Heidari S, Simmhan Y, Calheiros RN, Buyya R. Scalable graph processing frameworks: A taxonomy and open challenges. ACM
Computing Surveys (CSUR), 2018,51(3):1–53.
[26] McCune RR, Weninger T, Madey G. Thinking like a vertex: A survey of vertex-centric frameworks for large-scale distributed
graph processing. ACM Computing Surveys (CSUR), 2015,48(2):1–39.
[27] Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 2008,51(1):
107–113.
[28] Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, Franklin MJ, Shenker S, Stoica I. Resilient distributed datasets: A
fault-tolerant abstraction for in-memory cluster computing. In: Proc. of the 9th USENIX Symp. on
Networked Systems Design and Implementation (NSDI 12). 2012. 15–28.
[29] Stutz P, Bernstein A, Cohen W. Signal/collect: Graph algorithms for the (semantic) Web. In: Patel-Schneider PF, et al. eds. Proc. of
the Int’l Semantic Web Conf. (ISWC). Berlin: Springer-Verlag, 2010. 764–780.
[30] Bronevetsky G, Marques D, Pingali K, Stodghill P. Automated application-level checkpointing of MPI programs. In: Proc. of the
9th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming. New York: Association for Computing Machinery,
2003. 84–94. [doi: 10.1145/781498.781513]
[31] Beguelin A, Seligman E, Stephan P. Application level fault tolerance in heterogeneous networks of workstations. Journal of
Parallel and Distributed Computing, 1997,43(2):147–155.
[32] Dathathri R, Gill G, Hoang L, Pingali K. Phoenix: A substrate for resilient distributed graph analytics. In: Proc. of the 24th Int’l
Conf. on Architectural Support for Programming Languages and Operating Systems. New York: Association for Computing
Machinery, 2019. 615–630. [doi: 10.1145/3297858.3304056]
[33] Hoang L, Pontecorvi M, Dathathri R, Gill G, You B, Pingali K, Ramachandran V. A round-efficient distributed betweenness
centrality algorithm. In: Proc. of the 24th Symp. on Principles and Practice of Parallel Programming (PPoPP 2019). New York:
Association for Computing Machinery, 2019. 272–286.
[34] Iyer AP, Liu Z, Jin X, Venkataraman S, Braverman V, Stoica I. ASAP: Fast, approximate graph pattern mining at scale. In: Proc. of
the 13th USENIX Conf. on Operating Systems Design and Implementation. Carlsbad: USENIX Association, 2018. 745–761.
[35] Zhang Y, Gao Q, Gao L, Wang C. Maiter: An asynchronous graph processing framework for delta-based accumulative iterative
computation. IEEE Trans. on Parallel and Distributed Systems, 2014,25(8):2091–2100. [doi: 10.1109/TPDS.2013.235]
[36] Wang Z, Gao L, Gu Y, Bao Y, Yu G. A fault-tolerant framework for asynchronous iterative computations in cloud environments. In:
Proc. of the 7th ACM Symp. on Cloud Computing. New York: Association for Computing Machinery, 2016. 71–83.
[37] Wang Z, Gao L, Gu Y, Bao Y, Yu G. A fault-tolerant framework for asynchronous iterative computations in cloud environments.
IEEE Trans. on Parallel and Distributed Systems, 2018,29(8):1678–1692.
[38] Avizienis A, Laprie JC, Randell B, Landwehr C. Basic concepts and taxonomy of dependable and secure computing. IEEE Trans.
on Dependable and Secure Computing, 2004,1(1):11–33.
[39] Poola D, Salehi MA, Ramamohanarao K, Buyya R. Chapter 15—A Taxonomy and Survey of Fault-tolerant Workflow Management
Systems in Cloud and Distributed Computing Environments. Elsevier Inc., 2017. 285–320.