Journal of Software (软件学报), ISSN 1000-9825, CODEN RUXUEW. E-mail: jos@iscas.ac.cn
2025,36(5):2130−2150 [doi: 10.13328/j.cnki.jos.007199] [CSTR: 32375.14.jos.007199] http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel: +86-10-62562563
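Unsupervised Social Media Text Summarization Based on a Denoising Graph Auto-encoder *
He Ruifang (贺瑞芳) 1,2, Zhao Tanglong (赵堂龙) 1,2, Liu Huanyu (刘焕宇) 1,2
1 (College of Intelligence and Computing, Tianjin University, Tianjin 300350)
2 (Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin 300350)
Corresponding author: He Ruifang, E-mail: rfhe@tju.edu.cn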
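Abstract: Social media text summarization aims to produce a concise summary for the large volume of short social media
texts (called posts) on a specific topic. Because posts are short and informal, traditional methods face the challenges of
sparse features and insufficient information. Recent work exploits the social relationships among posts to learn better
post representations and remove redundant information, but it overlooks the unreliable, noisy relationships present in
real social media settings, which mislead the model's judgments of post importance and diversity. This paper therefore
proposes DSNSum, an unsupervised model that improves summarization performance by removing noisy relationships from the
social network. First, the presence of noisy relationships in real social relationship networks is verified statistically.
Second, two noise functions are designed according to sociological theories, and a denoising graph auto-encoder (DGAE) is
built to reduce the influence of noisy relationships and to learn post representations that incorporate credible social
relationships. Finally, a sparse reconstruction framework selects posts that preserve coverage, importance, and diversity
to form a summary of a given length. Experimental results on 22 topics in total from two real social media platforms
(Twitter and Sina Weibo) demonstrate the effectiveness of the proposed model and offer new ideas for follow-up research in
related fields.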
Keywords: social media text summarization; graph representation learning; graph neural network; denoising auto-encoder
CLC number: TP18
Chinese citation format: 贺瑞芳, 赵堂龙, 刘焕宇. 基于去噪图自编码器的无监督社交媒体文本摘要. 软件学报, 2025, 36(5): 2130–2150. http://www.jos.org.cn/1000-9825/7199.htm
English citation format: He RF, Zhao TL, Liu HY. Denoising Graph Auto-encoder for Unsupervised Social Media Text Summarization. Ruan Jian Xue Bao/Journal of Software, 2025, 36(5): 2130–2150 (in Chinese). http://www.jos.org.cn/1000-9825/7199.htm
Denoising Graph Auto-encoder for Unsupervised Social Media Text Summarization
HE Rui-Fang 1,2, ZHAO Tang-Long 1,2, LIU Huan-Yu 1,2
1 (College of Intelligence and Computing, Tianjin University, Tianjin 300350, China)
2 (Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin 300350, China)
Abstract: Social media text summarization aims to provide concise summaries for large-scale social media short texts (referred to as
posts) targeting specific topics. Given the brief and informal content of posts, traditional methods face the challenges of sparse
features and insufficient information. Recent research has leveraged social relationships among posts to learn better post representations
and remove redundant information, but it neglects the unreliable noise relationships present in real social media contexts, which mislead
the model's judgments of post importance and diversity. Therefore, this study proposes a novel unsupervised model, DSNSum, which
improves summarization performance by removing noise relationships from the social network. First, the noise relationships in real social
relationship networks are statistically verified. Second, two noise functions are designed based on sociological theories, and a denoising
graph auto-encoder (DGAE) is constructed to mitigate the influence of noise relationships and to learn post representations that incorporate
credible social relationships. Finally, a sparse reconstruction framework is used to select posts that maintain coverage, importance, and
diversity to form a summary of a given length. Experimental results on a total of 22 topics from two real social media platforms (Twitter
and Sina Weibo) demonstrate the efficacy of the proposed model and provide new insights for subsequent research in related fields.
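The final selection step can be illustrated with a small sketch. The abstract does not give the sparse reconstruction objective, so the code below is only a greedy stand-in under stated assumptions: each post is already a fixed-length embedding vector (in the paper these would come from the DGAE), and posts are picked one at a time so that the span of the chosen posts reconstructs as much of the whole collection as possible. The names `select_posts` and `dot` are hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_posts(embeddings: List[Vector], k: int) -> List[int]:
    """Greedily pick up to k post indices whose span best reconstructs all posts.

    Illustrative stand-in only; NOT the optimization used in the paper.
    """
    # residuals[i] is the part of post i not yet explained by the selected posts.
    residuals = [list(v) for v in embeddings]
    selected: List[int] = []
    for _ in range(min(k, len(embeddings))):
        best, best_gain = -1, 1e-12
        for j, r in enumerate(residuals):
            norm_sq = dot(r, r)
            if j in selected or norm_sq <= 1e-12:
                continue  # already chosen, or fully explained (redundant)
            # Total residual energy removed from all posts if post j is added.
            gain = sum(dot(r, other) ** 2 for other in residuals) / norm_sq
            if gain > best_gain:
                best, best_gain = j, gain
        if best < 0:
            break  # every post is already reconstructed
        selected.append(best)
        # Normalise the chosen residual and project it out of every residual.
        scale = dot(residuals[best], residuals[best]) ** 0.5
        direction = [x / scale for x in residuals[best]]
        deflated = []
        for r in residuals:
            coef = dot(r, direction)
            deflated.append([x - coef * d for x, d in zip(r, direction)])
        residuals = deflated
    return selected


# Toy input: posts 0 and 1 are duplicates, post 2 covers a different aspect.
posts = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(select_posts(posts, 2))  # prints [0, 2]
```

In this toy run the duplicate post is skipped because its residual drops to zero once post 0 is chosen, which is how a reconstruction-based criterion can favor coverage and diversity at the same time.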
* Funding: National Natural Science Foundation of China (62376192, 62376188)
Received: 2023-07-05; Revised: 2023-11-22, 2024-02-06; Accepted: 2024-04-02; Published online by JOS: 2024-06-20
CNKI online-first publication: 2024-06-21