Journal of Software (《软件学报》), 2025, Issue 9

Journal of Software (软件学报) ISSN 1000-9825, CODEN RUXUEW E-mail: jos@iscas.ac.cn
2025,36(9):4285−4310 [doi: 10.13328/j.cnki.jos.007315] [CSTR: 32375.14.jos.007315] http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel: +86-10-62562563

Neural-network-based Compression and Query Approach for Distributed Tracing Data (基于神经网络的分布式追踪数据压缩和查询方法)*

WANG Shang (王尚) 1,2, ZHANG Chen-Xi (张晨曦) 1,2, PENG Xin (彭鑫) 1,2

1 (School of Computer Science, Fudan University, Shanghai 200438)
2 (Shanghai Key Laboratory of Data Science (Fudan University), Shanghai 200438)
Corresponding author: ZHANG Chen-Xi, E-mail: cxzhang20@fudan.edu.cn

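Abstract: As an important kind of observability data, distributed tracing data is vital to operations tasks such as performance
analysis, fault diagnosis, and system understanding. With the rapid growth in system scale and complexity, the volume of tracing
data keeps increasing, which places higher demands on storage. Data compression has therefore become a key means of reducing the
storage cost of tracing data. Existing compression methods cannot fully exploit the characteristics of tracing data to compress
it efficiently, and they do not support complex queries over compressed data. This paper proposes a neural-network-based method
for compressing and querying distributed tracing data. The method uses a new redundancy extraction technique to identify pattern
redundancy and structural redundancy in tracing data, and combines a neural network model with arithmetic coding to compress the
data efficiently. It also supports efficient queries over the compressed data without decompressing all of it. Tracing datasets
of several different sizes were collected from 4 open-source microservice systems and used to evaluate the method. The
experimental results show that the method achieves high compression ratios (105.5–197.6), on average 4 times those of existing
general-purpose compression algorithms. The query efficiency of the method on compressed data was also verified; in the best
case it is faster than existing query tools.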
Keywords: distributed tracing; lossless compression; query; neural network
CLC number: TP311

Chinese citation format: 王尚, 张晨曦, 彭鑫. 基于神经网络的分布式追踪数据压缩和查询方法. 软件学报, 2025, 36(9): 4285–4310.
http://www.jos.org.cn/1000-9825/7315.htm
English citation format: Wang S, Zhang CX, Peng X. Neural-network-based Compression and Query Approach for Distributed Tracing Data.
Ruan Jian Xue Bao/Journal of Software, 2025, 36(9): 4285–4310 (in Chinese). http://www.jos.org.cn/1000-9825/7315.htm

Neural-network-based Compression and Query Approach for Distributed Tracing Data
WANG Shang 1,2, ZHANG Chen-Xi 1,2, PENG Xin 1,2
1 (School of Computer Science, Fudan University, Shanghai 200438, China)
2 (Shanghai Key Laboratory of Data Science (Fudan University), Shanghai 200438, China)
Abstract: As an essential type of observability data, distributed tracing data plays a crucial role in operation and maintenance tasks
such as performance analysis, fault diagnosis, and system understanding. As systems rapidly grow in scale and complexity, the volume
of tracing data keeps increasing and raises storage requirements. Data compression is therefore a crucial way to reduce the storage
cost of tracing data. Existing compression methods neither fully exploit the features of tracing data to compress it efficiently nor
support complex queries on compressed data. This study introduces a neural-network-based approach for compressing and querying
distributed tracing data. It employs a novel redundancy extraction technique to identify pattern and structural redundancies within
tracing data, and leverages neural network models and arithmetic coding to achieve efficient data compression. The approach also
enables efficient querying of compressed data without decompressing all the data. Tracing datasets of various sizes are collected
from four open-source microservice systems to evaluate the proposed approach. Experimental results show that it achieves high
compression ratios (105.5–197.6), on average four times those of existing general-purpose compression algorithms. The query
efficiency of the approach on compressed data is also validated: in the best case, it is faster than existing query tools.
Key words: distributed tracing; lossless compression; querying; neural network

* Received: 2024-01-15; Revised: 2024-06-06; Accepted: 2024-10-30; Published online by JOS: 2025-06-04
CNKI online-first publication: 2025-06-05
