Ruan Jian Xue Bao/Journal of Software, 2021, No. 8
PU Yong-Lin et al.: Energy-Efficient Thread Reallocation and Data Migration Strategy on the Storm Platform (p. 2577)
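…is 300 s. Compared with the default Storm scheduling strategy, for the same data volume, the data transmission and processing time of the cluster topology running ERDM improved (i.e., was shortened) by 28%.
It can therefore be concluded that, under the same conditions, each 1% improvement in the cluster's data transmission and processing time reduces the cluster's energy consumption by 1%. In summary, compared with the
default Storm scheduling strategy, the proposed ERDM achieves better energy savings.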
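The 1:1 relationship between time saved and energy saved follows directly if the cluster's average power draw $\bar{P}$ is the same under both strategies. That is an assumption made here for illustration, not a measurement reported above. Writing energy as average power times run time $t$:

```latex
\begin{aligned}
E &= \bar{P}\,t, \\
\frac{E_{\mathrm{default}} - E_{\mathrm{ERDM}}}{E_{\mathrm{default}}}
  &= \frac{\bar{P}\,t_{\mathrm{default}} - \bar{P}\,t_{\mathrm{ERDM}}}{\bar{P}\,t_{\mathrm{default}}}
   = \frac{t_{\mathrm{default}} - t_{\mathrm{ERDM}}}{t_{\mathrm{default}}}.
\end{aligned}
```

If $\bar{P}$ itself changes under ERDM (for example, because fewer nodes stay busy), the relative energy saving and the relative time saving are no longer equal, so the equality should be read as holding only under the constant-power assumption.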
5 Conclusion and Future Work
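High energy consumption is one of the main obstacles to the development of big data stream processing platforms. Storm is one of the most representative
big data stream processing platforms, but its original design did not take energy consumption into account, and high energy consumption has constrained its development ever since. To address this problem,
this paper studies the topology structure of Storm clusters, establishes a resource constraint model and an optimal thread reallocation model, and on this basis proposes an energy-efficient thread reallocation
and data migration strategy for the Storm platform (ERDM). The strategy consists of a resource constraint algorithm and a data migration algorithm. It reduces the communication cost between nodes
while shortening data transmission and processing time and saving energy. Finally, experiments with 4 benchmarks verify the effectiveness of the strategy in terms of resource usage, performance, and
energy consumption.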
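The general idea of reducing inter-node communication cost under a resource constraint can be illustrated with a small greedy sketch. It is not the paper's resource constraint algorithm or data migration algorithm, whose details are not given in this section. The traffic map `rates`, the per-node slot limit `capacity`, and the functions `inter_node_traffic`, `affinity`, and `greedy_reallocate` are hypothetical names introduced only for illustration. Each thread (executor) is moved to the node it exchanges the most tuples with, provided that node still has a free slot.

```python
from collections import defaultdict
from typing import Dict, Tuple

Edge = Tuple[str, str]  # (upstream thread, downstream thread); hypothetical model


def inter_node_traffic(rates: Dict[Edge, float], placement: Dict[str, str]) -> float:
    """Total tuple rate on edges whose two threads sit on different nodes."""
    return sum(r for (a, b), r in rates.items() if placement[a] != placement[b])


def affinity(thread: str, rates: Dict[Edge, float], placement: Dict[str, str]) -> Dict[str, float]:
    """Tuple rate between `thread` and the threads currently placed on each node."""
    aff: Dict[str, float] = defaultdict(float)
    for (a, b), r in rates.items():
        if a == thread:
            aff[placement[b]] += r
        elif b == thread:
            aff[placement[a]] += r
    return aff


def greedy_reallocate(rates: Dict[Edge, float], placement: Dict[str, str],
                      capacity: Dict[str, int]) -> Dict[str, str]:
    """Move threads toward their heaviest communication partners while
    respecting each node's slot limit (a simple resource constraint)."""
    placement = dict(placement)  # leave the caller's placement untouched
    load: Dict[str, int] = defaultdict(int)
    for node in placement.values():
        load[node] += 1
    moved = True
    while moved:  # every accepted move strictly lowers inter-node traffic
        moved = False
        for t in sorted(placement):
            cur = placement[t]
            aff = affinity(t, rates, placement)
            free = [n for n in capacity if n != cur and load[n] < capacity[n]]
            if not free:
                continue  # no node has a spare slot for this thread
            best = max(free, key=lambda n: aff[n])
            if aff[best] > aff[cur]:
                load[cur] -= 1
                load[best] += 1
                placement[t] = best
                moved = True
    return placement


if __name__ == "__main__":
    rates = {("spout", "split"): 100.0, ("split", "count"): 80.0, ("count", "report"): 5.0}
    before = {"spout": "n1", "split": "n2", "count": "n1", "report": "n2"}
    after = greedy_reallocate(rates, before, capacity={"n1": 3, "n2": 3})
    print(inter_node_traffic(rates, before), inter_node_traffic(rates, after))  # prints 185.0 5.0
```

Every accepted move strictly lowers the total inter-node tuple rate, so the loop terminates. Like any local greedy search, however, it can stop at a local optimum rather than at the optimal thread reallocation that the paper's model targets.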
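Future work will focus on four aspects. (1) Deploy ERDM in more complex commercial application domains so that it can be
used in a wider range of scenarios. (2) Apply Bloom filters to the cluster to preprocess the data in the cluster topology and remove duplicate records
from the data set. This would reduce the amount of data the cluster processes and transmits per unit time, lowering cluster latency and saving energy. (3) The
electronic components inside current Storm clusters limit improvements in performance and energy efficiency; replacing them with more energy-efficient components could raise cluster performance and
save energy. (4) The numbers of processes and threads in a cluster topology currently have to be set manually by the user. We will study scheduling
algorithms that adaptively adjust the parallelism of the components in a topology, so as to improve resource utilization and save energy.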
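Future-work item (2) can be sketched as a Bloom-filter deduplication step applied before tuples are passed downstream. This is a minimal Python sketch under stated assumptions, not a Storm API. Tuples are assumed to be serialized to strings, and the class `BloomFilter` and the function `dedup_stream` are hypothetical names. A Bloom filter never misses an item it has stored, but it can report false positives, so a small fraction of genuinely new tuples may be dropped. The standard sizing formulas below choose the bit count m and the hash count k for a target false-positive rate p.

```python
import hashlib
import math
from typing import Iterable, Iterator, List


class BloomFilter:
    """Minimal Bloom filter with m bits and k hash positions per item."""

    def __init__(self, expected_items: int, fp_rate: float) -> None:
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        n = max(1, expected_items)
        self.m = max(1, math.ceil(-n * math.log(fp_rate) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> List[int]:
        # Double hashing: derive k indices from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def dedup_stream(tuples: Iterable[str], bf: BloomFilter) -> Iterator[str]:
    """Yield each tuple the first time it is seen and drop later repeats."""
    for t in tuples:
        if t in bf:
            continue  # seen before, or (rarely) a false positive
        bf.add(t)
        yield t


if __name__ == "__main__":
    stream = ["a", "b", "a", "c", "b", "a"]
    print(list(dedup_stream(stream, BloomFilter(expected_items=1000, fp_rate=0.01))))
```

The filter uses a fixed amount of memory regardless of how many tuples pass through, which is why it suits an unbounded stream. The trade-off is that it cannot delete entries, and its false-positive rate grows once more than `expected_items` distinct tuples have been added.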