Page 295 - Ruan Jian Xue Bao/Journal of Software, 2021, Issue 8

Pu Yonglin, et al.: Thread reallocation and data migration energy-saving strategy for the Storm platform                2577


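is 300 s. Compared with Storm's default scheduling strategy, the time needed to transmit and process the same volume of data in the cluster topology improved (was shortened) by 28% under ERDM. It can therefore be concluded that, under the same conditions, every 1% improvement in the cluster's data transmission and processing time lowers the cluster's energy consumption by 1%. In summary, compared with Storm's default scheduling strategy, the ERDM strategy proposed in this paper saves more energy.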
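
The one-to-one relation between time and energy stated above is what one would expect if the cluster's average power draw stays roughly constant, so that energy scales linearly with run time. The following minimal sketch only illustrates that arithmetic. The constant-power model and the names `energy_joules` and `energy_saving_ratio` are assumptions made for illustration, not the energy model used in the paper.

```python
def energy_joules(avg_power_watts: float, duration_s: float) -> float:
    """Energy = average power x time (assumes power is constant over the run)."""
    return avg_power_watts * duration_s


def energy_saving_ratio(time_reduction: float) -> float:
    """Fractional energy saving when run time shrinks by `time_reduction`
    and the average power is assumed unchanged (hypothetical model)."""
    baseline = energy_joules(1.0, 1.0)                   # normalized baseline run
    improved = energy_joules(1.0, 1.0 - time_reduction)  # same power, shorter run
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Under this model the energy saving equals the time reduction.
    print(f"{energy_saving_ratio(0.28):.2%}")
```

If the average power changed between the two schedulers, the saving would no longer equal the time reduction, which is why the source restricts the statement to identical conditions.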

5    Conclusions and Future Work

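High energy consumption is one of the main obstacles to the development of big data stream processing platforms. Storm is one of the most representative of these platforms, but its original design did not take energy consumption into account, and high energy consumption has constrained its development ever since. To address this problem, this paper studies the topology structure of a Storm cluster, builds a resource constraint model and an optimal thread reallocation model, and then proposes an energy-saving strategy based on thread reallocation and data migration for the Storm platform. The strategy consists of a resource constraint algorithm and a data migration algorithm. While reducing the communication cost between nodes, it shortens data transmission and processing time and saves energy. Finally, four sets of benchmark experiments verify the effectiveness of the strategy in terms of resource usage, performance, and energy consumption.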
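Future work will focus on the following four aspects: (1) deploying ERDM in more complex commercial applications so that it can be used in a wider range of scenarios; (2) applying a Bloom filter to the cluster to preprocess the data in the topology and remove duplicates from the data set, which reduces the amount of data processed and transmitted per unit time and should thereby lower cluster latency and save energy; (3) at present, the electronic components inside a Storm cluster limit its performance and energy efficiency, so replacing them with more energy-efficient components could improve cluster performance and save energy; (4) at present, the numbers of processes and threads in a topology must be set manually by the user, so we will study a scheduling algorithm that adaptively adjusts the parallelism of the components in a topology, in order to improve resource utilization and save energy.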
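
Item (2) above can be illustrated with a minimal Bloom filter that drops tuples it has probably seen before. This is a sketch under stated assumptions, not part of ERDM: the class `BloomFilter`, the function `deduplicate`, and the bit-array size and hash count are hypothetical choices, and tuples are assumed to be strings.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter (illustrative parameters, not tuned)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive all positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def deduplicate(tuples, bloom=None):
    """Return the tuples not seen before, preserving their order."""
    if bloom is None:
        bloom = BloomFilter()
    unique = []
    for item in tuples:
        if item in bloom:
            continue  # probably a duplicate (false positives are possible)
        bloom.add(item)
        unique.append(item)
    return unique


if __name__ == "__main__":
    print(deduplicate(["a", "b", "a", "c", "b"]))
```

A Bloom filter never misses a true duplicate but can occasionally report a new tuple as already seen, so a real deployment would have to size the filter for an acceptable false-positive rate or tolerate the loss of a small fraction of tuples.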