Page 62 - 《软件学报》2020年第10期

P. 62

软件学报 ISSN 1000-9825, CODEN RUXUEW E-mail: jos@iscas.ac.cn
Journal of Software,2020,31(10):3038−3055 [doi: 10.13328/j.cnki.jos.006069] http://www.jos.org.cn
©中国科学院软件研究所版权所有. Tel: +86-10-62562563

∗
一种适应 GPU 的混合访问缓存索引框架

1
1
1,2
张鸿骏 , 武延军 , 张珩 , 张立波 1
1
(中国科学院软件研究所,北京 100190)
2 (中国科学院大学,北京 100049)
通讯作者: 张鸿骏, E-mail: hongjun@iscas.ac.cn

摘要: 散列表(hash table)作为一类根据关键码值(key value)提供高效数据访问的数据索引结构,其广泛应用于
各类计算机应用中,尤其是在对性能要求极高的系统软件、数据库以及高性能计算领域.在网络、云计算和物联网
服务方面,以散列表为核心结构已经成为缓存系统的重要系统组件.然而,随着大规模数据量的大幅度增加,以多核
CPU 为核心设计散列表结构的系统已经逐渐出现性能瓶颈,亟需进一步改进散列表的高性能和可扩展性.随着通用
图形处理器(graphic processing unit,简称 GPU)的日益普及以及硬件计算能力和并发性能的大幅度提升,各类以并行
计算为核心的系统软件任务在 GPU 上进行了优化设计并得到可观的性能提升.由于存在稀疏性和随机性,采用现有
散列表的并行结构直接在 GPU 上应用势必会带来高频次的内存访问和频繁的总线数据传输,影响了散列表在 GPU
上的性能发挥.重点分析了缓存系统中散列表索引的内存访问、命中率与索引开销,提出并设计了一种适应 GPU 的
混合访问缓存索引框架 CCHT(cache cuckoo hash table),提供了两种适应不同命中率和索引开销要求的缓存策略,允
许写入与查询操作并发执行,最大程度地利用了 GPU 硬件的计算性能与并发特性,减少了内存访问与总线传输.通
过在 GPU 硬件上的实现与实验验证,CCHT 在保证缓存命中率的同时,性能优于其他用于缓存索引的散列表.
关键词: 系统软件;缓存索引;散列表;GPU
中图法分类号: TP306

中文引用格式: 张鸿骏,武延军,张珩,张立波.一种适应 GPU 的混合访问缓存索引框架.软件学报,2020,31(10):3038−3055.
http://www.jos.org.cn/1000-9825/6069.htm
英文引用格式: Zhang HJ, Wu YJ, Zhang H, Zhang LB. Hybrid access cache indexing framework adapted to GPU. Ruan Jian
Xue Bao/Journal of Software, 2020,31(10):3038−3055 (in Chinese). http://www.jos.org.cn/1000-9825/6069.htm

Hybrid Access Cache Indexing Framework Adapted to GPU

1
1,2
1
ZHANG Hong-Jun , WU Yan-Jun , ZHANG Heng , ZHANG Li-Bo 1
1 (Institute of Software, Chinese Academy of Sciences, Beijing 100190, China)
2 (University of Chinese Academy of Sciences, Beijing 100049, China)
Abstract: Hash tables, as a type of data indexing structure that provides efficient data access based on key values, are widely used in
various computer applications, especially in system software, databases, and high performance that require high performance Computing
field. In network, cloud computing and IoT services, hash tables have become the core system components of cache systems. However,
with the large-scale increase in the amount of large-scale data, performance bottlenecks have gradually emerged in systems designed with
a multi-core CPU as the core of the hash table structure. There is an urgent need to further improve the high performance and scalability
of the hash table. With the increasing popularity of general-purpose graphics processing units (GPUs) and the substantial improvement of

∗ 基金项目: 中国科学院战略性先导科技专项预研课题(Y8XD373105); 中国科学院前沿科学重点研究计划(ZDBS-LY-JSC038)
Foundation item: Strategic Priority Research Program of the Chinese Academy of Sciences (Y8XD373105); Key Research Program
of Frontier Sciences, Chinese Academy of Sciences (ZDBS-LY-JSC038)
本文由“系统软件前沿进展”专题特约编辑武延军研究员、陈海波教授、包云岗研究员、李玲研究员推荐.
收稿时间: 2020-02-10; 修改时间: 2020-04-04; 采用时间: 2020-05-09; jos 在线出版时间: 2020-06-10

57 58 59 60 61 62 63 64 65 66 67