Page 437 - 《软件学报》2025年第7期
P. 437

软件学报 ISSN 1000-9825, CODEN RUXUEW                                        E-mail: jos@iscas.ac.cn
                 2025,36(7):3358−3374 [doi: 10.13328/j.cnki.jos.007221] [CSTR: 32375.14.jos.007221]  http://www.jos.org.cn
                 ©中国科学院软件研究所版权所有.                                                          Tel: +86-10-62562563



                                                                 *
                 代价敏感的指纹可变哈希布谷鸟过滤器

                 李    猛,    罗文啟,    戴海鹏,    王瀚橙,    顾    荣,    陈贵海


                 (南京大学 计算机学院, 江苏 南京 210023)
                 通信作者: 戴海鹏, E-mail: haipengdai@nju.edu.cn; 陈贵海, E-mail: gchen@nju.edu.cn

                 摘 要: 布谷鸟过滤器是一种空间高效的近似成员资格查询数据结构, 在网络系统中被广泛应用于网络路由、网
                 络测量和网络缓存等. 然而, 传统的布谷鸟过滤器设计并未充分考虑在网络系统中, 部分或全部查询集合已知的情
                 况, 以及这部分查询具有代价的情况. 这导致现有的布谷鸟过滤器在该情况下性能无法达到最优. 为此, 设计了指
                 纹可变哈希布谷鸟过滤器         (VHCF). VHCF  提出了指纹可变哈希技术, 感知已知的查询集合及其代价, 通过为每个
                 哈希桶搜索最优指纹哈希函数, 从而大幅降低误判代价. 随后, 每个哈希桶的最优指纹哈希函数会被独立地记录进
                 入每个哈希桶内的哈希索引单元. 此外, 提出了一种单哈希的技术用于降低引入指纹可变哈希技术导致的额外计
                 算开销, 还对   VHCF  的操作复杂度和误判率进行了理论分析. 最后, 实验和理论结果都一致表明, VHCF                       在保证查
                 询吞吐量相当的情况下, 取得了比现有布谷鸟过滤器及其变种都要低的误判率. 特别的, 在保持指纹长度相同的情
                 况下, VHCF  只需为每个哈希索引单元分配 1–2         比特, 即可相比标准布谷鸟过滤器降低误判率               12.5%–50%.
                 关键词: 布谷鸟过滤器; 代价敏感; 集合成员查询; 误判率
                 中图法分类号: TP393

                 中文引用格式: 李猛,  罗文啟,  戴海鹏,  王瀚橙,  顾荣,  陈贵海.  代价敏感的指纹可变哈希布谷鸟过滤器.  软件学报,  2025,  36(7):
                 3358–3374. http://www.jos.org.cn/1000-9825/7221.htm
                 英文引用格式: Li M, Luo WQ, Dai HP, Wang HC, Gu R, Chen GH. Cost-sensitive Variable Hashing-fingerprint  Cuckoo  Filter.  Ruan
                 Jian Xue Bao/Journal of Software, 2025, 36(7): 3358–3374 (in Chinese). http://www.jos.org.cn/1000-9825/7221.htm

                 Cost-sensitive Variable Hashing-fingerprint Cuckoo Filter
                 LI Meng, LUO Wen-Qi, DAI Hai-Peng, WANG Han-Cheng, GU Rong, CHEN Gui-Hai
                 (School of Computer Science, Nanjing University, Nanjing 210023, China)
                 Abstract:  A  cuckoo  filter  is  a  space-efficient  approximate  membership  query  data  structure,  widely  used  in  network  systems  for
                 applications such as network routing, network measurement, and network caching. However, the traditional design of cuckoo filters has not
                 adequately considered the scenario in network systems where some or all queries in the collection are known, and these queries come with
                 associated  costs.  This  limitation  results  in  the  suboptimal  performance  of  existing  cuckoo  filters  in  such  situations.  To  address  this,  the
                 variable  hashing-fingerprint  cuckoo  filter  (VHCF)  has  been  developed.  VHCF  introduces  variable  fingerprint  hashing  technology,  taking
                 into  account  the  known  query  collection  and  their  associated  costs.  By  searching  for  the  optimal  fingerprint  hash  function  for  each  hash
                 bucket,  the  overall  cost  of  false  positives  is  significantly  reduced.  In  addition,  this  study  proposes  a  single-hash  technology  to  reduce  the
                 additional  computational  overhead  caused  by  the  variable-hash  technology.  A  theoretical  analysis  of  the  operational  complexity  and  false
                 positive  rate  of  VHCF  is  also  provided.  Finally,  experimental  and  theoretical  results  both  demonstrate  that  VHCF  achieves  a  significantly
                 lower  false  positive  rate  than  existing  cuckoo  filters  and  their  variants  while  ensuring  comparable  query  throughput.  Specifically,  VHCF
                 only needs to allocate 1–2 bits for each hash index unit, which can reduce the false positive rate to 12.5%–50% of the original.
                 Key words:  cuckoo filter; cost-sensitivity; membership testing; false positive rate
                    过滤器作为一种在空间和时间上高效的数据结构, 被广泛应用于数据挖掘                         [1−3] 、高性能网络  [4−13] 和数据库系
                 统  [14−16] 等领域, 用于处理近似集合成员查询问题. 作为经典的布隆过滤器                [17] 的改进方案, 布谷鸟过滤器 (cuckoo


                 *    收稿时间: 2023-09-14; 修改时间: 2024-05-08; 采用时间: 2024-05-11; jos 在线出版时间: 2024-07-03
                  CNKI 网络首发时间: 2024-07-05
   432   433   434   435   436   437   438   439   440   441   442