Page 374 - Journal of Software (软件学报), 2025, Issue 9

Journal of Software (软件学报) ISSN 1000-9825, CODEN RUXUEW                  E-mail: jos@iscas.ac.cn
2025,36(9):4285−4310 [doi: 10.13328/j.cnki.jos.007315] [CSTR: 32375.14.jos.007315]  http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved.  Tel: +86-10-62562563



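Neural-network-based Compression and Query Approach for Distributed Tracing Data*

WANG Shang 1,2, ZHANG Chen-Xi 1,2, PENG Xin 1,2

1 (School of Computer Science, Fudan University, Shanghai 200438)
2 (Shanghai Key Laboratory of Data Science (Fudan University), Shanghai 200438)
Corresponding author: ZHANG Chen-Xi, E-mail: cxzhang20@fudan.edu.cn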

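Abstract: Distributed tracing data is an important kind of observability data and plays a vital role in operations tasks such as performance analysis, fault diagnosis, and system understanding. As systems grow rapidly in scale and complexity, the volume of tracing data keeps increasing and places higher demands on storage. Data compression has therefore become a key way to reduce the storage cost of tracing data. Existing compression methods cannot fully exploit the characteristics of tracing data to compress it efficiently, and they do not support complex queries over compressed data. This paper proposes a neural-network-based method for compressing and querying distributed tracing data. The method uses a new redundancy extraction technique to identify pattern redundancy and structural redundancy in tracing data, and it combines a neural network model with arithmetic coding to achieve efficient compression. The method also supports efficient queries over the compressed data without decompressing all of it. Tracing datasets of several different sizes were collected from 4 open-source microservice systems and used to evaluate the method. The experimental results show that the method achieves high compression ratios (105.5–197.6), on average 4 times those of existing general-purpose compression algorithms. The query efficiency of the method on compressed data was also verified: in the best case it is faster than existing query tools.
Key words: distributed tracing; lossless compression; query; neural network
CLC number: TP311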

Citation format (Chinese): 王尚, 张晨曦, 彭鑫. 基于神经网络的分布式追踪数据压缩和查询方法. 软件学报, 2025, 36(9): 4285–4310. http://www.jos.org.cn/1000-9825/7315.htm
Citation format (English): Wang S, Zhang CX, Peng X. Neural-network-based Compression and Query Approach for Distributed Tracing Data. Ruan Jian Xue Bao/Journal of Software, 2025, 36(9): 4285–4310 (in Chinese). http://www.jos.org.cn/1000-9825/7315.htm

Neural-network-based Compression and Query Approach for Distributed Tracing Data

WANG Shang 1,2, ZHANG Chen-Xi 1,2, PENG Xin 1,2

1 (School of Computer Science, Fudan University, Shanghai 200438, China)
2 (Shanghai Key Laboratory of Data Science (Fudan University), Shanghai 200438, China)
Abstract: As an essential type of observability data, distributed tracing data plays a crucial role in operation and maintenance tasks such as performance analysis, fault diagnosis, and system understanding. With the rapid increase in system scale and complexity, the volume of tracing data keeps growing, placing higher demands on storage. Data compression has therefore become a crucial way to reduce the storage cost of tracing data. Existing compression methods fail to fully exploit the features of tracing data to compress it efficiently, and they do not support complex queries on compressed data. This study introduces a neural-network-based approach for compressing and querying distributed tracing data. It employs a novel redundancy extraction technique to identify pattern and structural redundancies within tracing data, and leverages neural network models and arithmetic coding to achieve efficient data compression. The method also enables efficient querying of compressed data without decompressing all of the data. Tracing datasets of various sizes are collected from four open-source microservice systems and used to evaluate the proposed method. Experimental results show that the method achieves high compression ratios (105.5–197.6), on average four times those of state-of-the-art general-purpose compression algorithms. The querying efficiency of the method on compressed data is also validated: in the best case it is faster than existing query tools.
Key words: distributed tracing; lossless compression; querying; neural network
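
The abstract pairs a predictive model with arithmetic coding. The reason this pairing compresses well can be illustrated with a small sketch: an ideal arithmetic coder spends about -log2(p) bits on a symbol to which the model assigned probability p, so a model that predicts the next symbol well yields a shorter encoding. The sketch below is not the method of this paper. It substitutes a simple adaptive frequency model for the neural network, and the span names, the `trace` sequence, and both function names are hypothetical values chosen only for illustration.

```python
import math
from collections import Counter


def ideal_code_length_bits(symbols, alphabet):
    """Bits an ideal arithmetic coder would need for `symbols`.

    Assumption: an adaptive, Laplace-smoothed frequency model stands in
    for a learned (e.g. neural) next-symbol predictor.
    """
    counts = Counter()
    total_bits = 0.0
    for seen, sym in enumerate(symbols):
        # Probability assigned to `sym` before the model has seen it.
        p = (counts[sym] + 1) / (seen + len(alphabet))
        total_bits += -math.log2(p)
        counts[sym] += 1  # update the model after coding the symbol
    return total_bits


def uniform_code_length_bits(symbols, alphabet):
    """Baseline: every symbol costs log2(|alphabet|) bits."""
    return len(symbols) * math.log2(len(alphabet))


# Hypothetical, highly repetitive sequence of span operation names.
alphabet = ["login", "query_db", "render", "logout"]
trace = ["login", "query_db", "query_db", "render"] * 50

uniform_bits = uniform_code_length_bits(trace, alphabet)
adaptive_bits = ideal_code_length_bits(trace, alphabet)

print(f"uniform model:  {uniform_bits:.1f} bits")
print(f"adaptive model: {adaptive_bits:.1f} bits")
```

The adaptive model needs fewer bits than the uniform baseline because it learns that some span names are frequent and one never occurs. A model that also conditions on preceding symbols, as a neural predictor can, would be expected to shorten the encoding further on sequences with this kind of regular structure.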


*    Received: 2024-01-15; Revised: 2024-06-06; Accepted: 2024-10-30; Published online by JOS: 2025-06-04
     First published online on CNKI: 2025-06-05