Page 468 - 《软件学报》2025年第12期
P. 468

韩瑞琛 等: NUMA-conscious 外键连接优化技术                                                  5849


                 法设计的复杂度, 提高连接算法性能.
                    从提高   NUMA  架构下数据库连接操作性能的目标来看, 在               OLAP  查询算法上优先选择具有良好          cache 局部
                 性的向量连接算法, 最大化        cache 缓存效率并减少     NUMA  延迟的影响; 在    OLAP  数据库模式设计上, 通过模式优
                 化降低连接表大小, 使其连接向量尽量位于              cache 优化区间  (最新  Intel 144  核处理器  L3 cache 容量达到  108 MB,
                 可以支持最大     1  亿行表记录的    cache 内向量连接), 通过   NUMA-oblivious 连接算法降低    NUMA  延迟对性能和算
                 法复杂度的影响; 在      CPU  的硬件架构设计上, 数据库连接负载的强局部性特征需要大容量统一                       L3 cache、降低
                 NUMA  访问延迟等硬件特性以应用简单的自适应                NUMA-oblivious 连接算法, 多  NUMA  的  chiplet 架构会增加
                 NUMA  连接算法优化的复杂性, 同时需要数据库在内存存储模型上面向多                       NUMA  架构设计与之匹配的        NUMA
                 分区存储策略, 将集中式内存数据库扩展为轻量               SN  架构内存数据库.
                    随着新一代     CPU  技术的发展, CPU   硬件架构的差异性进一步扩大, 原有连接优化技术的结论在新型架构上
                 可能会产生偏差, 本文通过设计系统性的算法库、测试基准和性能对比方法为                          NUMA  连接优化提供一个可参考
                 的研究框架.


                 致谢 感谢英特尔       (中国) 实验室邓刚、唐曦、钟涛等英特尔高级工程师对本文的硬件设备和技术支持.

                 References:
                  [1]   Balkesen C, Teubner J, Alonso G, Özsu MT. Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware. In: Proc.
                     of the 29th IEEE Int’l Conf. on Data Engineering (ICDE). Brisbane: IEEE, 2013. 362–373. [doi: 10.1109/ICDE.2013.6544839]
                  [2]   Balkesen C, Alonso G, Teubner J, Özsu MT. Multi-core, main-memory joins: Sort vs. hash revisited. Proc. of the VLDB Endowment,
                     2013, 7(1): 85–96. [doi: 10.14778/2732219.2732227]
                  [3]   Zhang YS, Zhang Y, Wang S, Lu JH. Fusion OLAP: Fusing the pros of MOLAP and ROLAP together for in-memory OLAP. IEEE
                     Trans. on Knowledge and Data Engineering, 2019, 31(9): 1722–1735. [doi: 10.1109/TKDE.2018.2867522]
                  [4]   Schuh S, Chen X, Dittrich J. An experimental comparison of thirteen relational equi-joins in main memory. In: Proc. of the 2016 Int’l
                     Conf. on Management of Data. San Francisco: ACM, 2020. 1961–1976. [doi: 10.1145/2882903.2882917]
                  [5]   Zhang YS, Zhou X, Zhang Y, Zhang Y, Su MC, Wang S. Virtual denormalization via array index reference for main memory OLAP.
                     IEEE Trans. on Knowledge and Data Engineering, 2016, 28(4): 1061–1074. [doi: 10.1109/TKDE.2015.2499199]
                  [6]   Blanas S, Li YN, Patel JM. Design and evaluation of main memory hash join algorithms for multi-core CPUs. In: Proc. of the 2011 ACM
                     SIGMOD Int’l Conf. on Management of Data. Athens: ACM, 2011. 37–48. [doi: 10.1145/1989323.1989328]
                  [7]   Bandle M, Giceva J, Neumann T. To partition, or not to partition, that is the join question in a real system. In: Proc. of the 2021 Int’l
                     Conf. on Management of Data. ACM, 2021. 168–180. [doi: 10.1145/3448016.3452831]
                  [8]   Lang H, Leis V, Albutiu MC, Neumann T, Kemper A. Massively parallel NUMA-aware hash joins. In: Proc. of the 1st and 2nd Int’l
                     Workshops on in Memory Data Management and Analysis. Riva del Garda: Springer, 2015. 3–14. [doi: 10.1007/978-3-319-13960-9_1]
                  [9]   Zhang YS, Zhang Y, Zhou X, Lu JH. Main-memory foreign key joins on advanced processors: Design and re-evaluations for OLAP
                     workloads. Distributed and Parallel Databases, 2019, 37(4): 469–506. [doi: 10.1007/s10619-018-7226-4]
                 [10]   Albutiu MC, Kemper A, Neumann T. Massively parallel sort-merge joins in main memory multi-core database systems. Proc. of the
                     VLDB Endowment, 2012, 5(10): 1064–1075. [doi: 10.14778/2336664.2336678]
                 [11]   He J, Lu M, He BS. Revisiting co-processing for hash joins on the coupled CPU-GPU architecture. Proc. of the VLDB Endowment, 2013,
                     6(10): 889–900. [doi: 10.14778/2536206.2536216]
                 [12]   Halstead RJ, Absalyamov I, Najjar WA, Tsotras VJ. FPGA-based multithreading for in-memory hash joins. 2014. https://kipdf.com/fpga-
                     based-multithreading-for-in-memory-hash-joins_5ab7c0ac1723dd329c647c31.html
                 [13]   Kara K, Giceva J, Alonso G. FPGA-based data partitioning. In: Proc. of the 2017 ACM Int’l Conf. on Management of Data. Chicago:
                     ACM, 2017. 433–445. [doi: 10.1145/3035918.3035946]
                 [14]   Rui  R,  Tu  YC.  Fast  equi-join  algorithms  on  GPUs:  Design  and  implementation.  In:  Proc.  of  the  29th  Int’l  Conf.  on  Scientific  and
                     Statistical Database Management. Chicago: ACM, 2017. 17. [doi: 10.1145/3085504.3085521]
                 [15]   Sioulas P, Chrysogelos P, Karpathiotakis M, Appuswamy R, Ailamaki A. Hardware-conscious hash-joins on GPUs. In: Proc. of the 35th
                     IEEE Int’l Conf. on Data Engineering (ICDE). Macao: IEEE, 2019. 698–709. [doi: 10.1109/ICDE.2019.00068]
                 [16]   Shanbhag A, Madden S, Yu XY. A study of the fundamental performance characteristics of GPUs and CPUs for database analytics. In:
   463   464   465   466   467   468   469   470   471   472   473