Page 437 - 《软件学报》2025年第7期
P. 437
软件学报 ISSN 1000-9825, CODEN RUXUEW E-mail: jos@iscas.ac.cn
2025,36(7):3358−3374 [doi: 10.13328/j.cnki.jos.007221] [CSTR: 32375.14.jos.007221] http://www.jos.org.cn
©中国科学院软件研究所版权所有. Tel: +86-10-62562563
*
代价敏感的指纹可变哈希布谷鸟过滤器
李 猛, 罗文啟, 戴海鹏, 王瀚橙, 顾 荣, 陈贵海
(南京大学 计算机学院, 江苏 南京 210023)
通信作者: 戴海鹏, E-mail: haipengdai@nju.edu.cn; 陈贵海, E-mail: gchen@nju.edu.cn
摘 要: 布谷鸟过滤器是一种空间高效的近似成员资格查询数据结构, 在网络系统中被广泛应用于网络路由、网
络测量和网络缓存等. 然而, 传统的布谷鸟过滤器设计并未充分考虑在网络系统中, 部分或全部查询集合已知的情
况, 以及这部分查询具有代价的情况. 这导致现有的布谷鸟过滤器在该情况下性能无法达到最优. 为此, 设计了指
纹可变哈希布谷鸟过滤器 (VHCF). VHCF 提出了指纹可变哈希技术, 感知已知的查询集合及其代价, 通过为每个
哈希桶搜索最优指纹哈希函数, 从而大幅降低误判代价. 随后, 每个哈希桶的最优指纹哈希函数会被独立地记录进
入每个哈希桶内的哈希索引单元. 此外, 提出了一种单哈希的技术用于降低引入指纹可变哈希技术导致的额外计
算开销, 还对 VHCF 的操作复杂度和误判率进行了理论分析. 最后, 实验和理论结果都一致表明, VHCF 在保证查
询吞吐量相当的情况下, 取得了比现有布谷鸟过滤器及其变种都要低的误判率. 特别的, 在保持指纹长度相同的情
况下, VHCF 只需为每个哈希索引单元分配 1–2 比特, 即可相比标准布谷鸟过滤器降低误判率 12.5%–50%.
关键词: 布谷鸟过滤器; 代价敏感; 集合成员查询; 误判率
中图法分类号: TP393
中文引用格式: 李猛, 罗文啟, 戴海鹏, 王瀚橙, 顾荣, 陈贵海. 代价敏感的指纹可变哈希布谷鸟过滤器. 软件学报, 2025, 36(7):
3358–3374. http://www.jos.org.cn/1000-9825/7221.htm
英文引用格式: Li M, Luo WQ, Dai HP, Wang HC, Gu R, Chen GH. Cost-sensitive Variable Hashing-fingerprint Cuckoo Filter. Ruan
Jian Xue Bao/Journal of Software, 2025, 36(7): 3358–3374 (in Chinese). http://www.jos.org.cn/1000-9825/7221.htm
Cost-sensitive Variable Hashing-fingerprint Cuckoo Filter
LI Meng, LUO Wen-Qi, DAI Hai-Peng, WANG Han-Cheng, GU Rong, CHEN Gui-Hai
(School of Computer Science, Nanjing University, Nanjing 210023, China)
Abstract: A cuckoo filter is a space-efficient approximate membership query data structure, widely used in network systems for
applications such as network routing, network measurement, and network caching. However, the traditional design of cuckoo filters has not
adequately considered the scenario in network systems where some or all queries in the collection are known, and these queries come with
associated costs. This limitation results in the suboptimal performance of existing cuckoo filters in such situations. To address this, the
variable hashing-fingerprint cuckoo filter (VHCF) has been developed. VHCF introduces variable fingerprint hashing technology, taking
into account the known query collection and their associated costs. By searching for the optimal fingerprint hash function for each hash
bucket, the overall cost of false positives is significantly reduced. In addition, this study proposes a single-hash technology to reduce the
additional computational overhead caused by the variable-hash technology. A theoretical analysis of the operational complexity and false
positive rate of VHCF is also provided. Finally, experimental and theoretical results both demonstrate that VHCF achieves a significantly
lower false positive rate than existing cuckoo filters and their variants while ensuring comparable query throughput. Specifically, VHCF
only needs to allocate 1–2 bits for each hash index unit, which can reduce the false positive rate to 12.5%–50% of the original.
Key words: cuckoo filter; cost-sensitivity; membership testing; false positive rate
过滤器作为一种在空间和时间上高效的数据结构, 被广泛应用于数据挖掘 [1−3] 、高性能网络 [4−13] 和数据库系
统 [14−16] 等领域, 用于处理近似集合成员查询问题. 作为经典的布隆过滤器 [17] 的改进方案, 布谷鸟过滤器 (cuckoo
* 收稿时间: 2023-09-14; 修改时间: 2024-05-08; 采用时间: 2024-05-11; jos 在线出版时间: 2024-07-03
CNKI 网络首发时间: 2024-07-05

