
Journal of Software (Ruan Jian Xue Bao) ISSN 1000-9825, CODEN RUXUEW    E-mail: jos@iscas.ac.cn
2025, 36(5): 2130–2150 [doi: 10.13328/j.cnki.jos.007199] [CSTR: 32375.14.jos.007199]    http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved.    Tel: +86-10-62562563



Unsupervised Social Media Text Summarization Based on Denoising Graph Auto-encoder*

HE Rui-Fang¹˒², ZHAO Tang-Long¹˒², LIU Huan-Yu¹˒²


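¹ (College of Intelligence and Computing, Tianjin University, Tianjin 300350)
² (Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin 300350)
Corresponding author: HE Rui-Fang, E-mail: rfhe@tju.edu.cn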

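Abstract: Social media text summarization aims to produce concise summary descriptions for large-scale collections of topic-specific short social media texts (called posts). Because posts are short and informal, traditional methods face the challenges of sparse features and insufficient information. Recent studies exploit the social relations between posts to learn better post representations and remove redundant information, but they ignore the unreliable noisy relations present in real social media scenarios, which mislead the model's judgment of post importance and diversity. Therefore, this paper proposes an unsupervised model, DSNSum, which improves summarization performance by removing noisy relations from the social network. First, the presence of noisy relations in a real social relation network is verified statistically. Second, two noise functions are designed based on sociological theories, and a denoising graph auto-encoder (DGAE) is constructed to reduce the influence of noisy relations and learn post representations that incorporate trustworthy social relations. Finally, a sparse reconstruction framework selects posts that preserve coverage, importance, and diversity to form a summary of a given length. Experimental results on a total of 22 topics from two real social media platforms (Twitter and Sina Weibo) demonstrate the effectiveness of the proposed model and also offer new ideas for subsequent research in related fields.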
Key words: social media text summarization; graph representation learning; graph neural network; denoising auto-encoder
CLC number: TP18
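The abstract says that DGAE corrupts the social graph with noise functions and learns post representations that are robust to unreliable relations. The NumPy sketch below illustrates only the generic denoising-graph-auto-encoder pattern, under stated assumptions: the noise function is a hypothetical random edge drop/add (`corrupt_edges`), not the paper's two sociology-based noise functions, which this page does not specify; the encoder is a single linear GCN layer, the decoder is an inner-product decoder, and training uses plain gradient descent. All function names and hyperparameters are illustrative.

```python
import numpy as np

def normalize_adj(a):
    """Symmetric GCN normalization D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def corrupt_edges(a, drop_p, add_p, rng):
    """HYPOTHETICAL noise function: randomly drop existing edges and add spurious ones.
    It stands in for the paper's sociology-based noise functions, which are not shown here."""
    n = a.shape[0]
    upper = np.triu(np.ones((n, n)), k=1).astype(bool)
    drop = (rng.random((n, n)) < drop_p) & upper & (a > 0)
    add = (rng.random((n, n)) < add_p) & upper & (a == 0)
    noisy = np.triu(a, k=1)
    noisy[drop] = 0.0
    noisy[add] = 1.0
    return noisy + noisy.T  # keep the graph symmetric

def sigmoid(x):
    # Clipped for numerical stability.
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30.0, 30.0)))

def train_dgae(x, a, dim=8, epochs=200, lr=0.5, drop_p=0.2, add_p=0.02, seed=0):
    """Encode posts over a corrupted graph, then reconstruct the observed graph."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w = rng.normal(scale=0.1, size=(d, dim))
    target = a + np.eye(n)  # reconstruction target: observed graph plus self-loops
    losses = []
    for _ in range(epochs):
        noisy = corrupt_edges(a, drop_p, add_p, rng)  # fresh corruption each epoch
        h = normalize_adj(noisy) @ x                  # propagate features over the noisy graph
        z = h @ w                                     # linear GCN encoder
        p = sigmoid(z @ z.T)                          # inner-product decoder
        eps = 1e-9
        loss = -np.mean(target * np.log(p + eps) + (1 - target) * np.log(1 - p + eps))
        losses.append(loss)
        g = (p - target) / (n * n)                    # dLoss / d(z z^T) for mean BCE
        grad_z = (g + g.T) @ z
        w -= lr * (h.T @ grad_z)                      # gradient step on the encoder weights
    z_final = normalize_adj(a) @ x @ w                # embeddings on the observed graph
    return z_final, losses
```

Training on a corrupted graph while reconstructing the observed one is what makes this a *denoising* auto-encoder: the encoder cannot rely on any single edge, so individual unreliable relations have less influence on the learned embeddings.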

Citation (in Chinese): 贺瑞芳, 赵堂龙, 刘焕宇. 基于去噪图自编码器的无监督社交媒体文本摘要. 软件学报, 2025, 36(5): 2130–2150. http://www.jos.org.cn/1000-9825/7199.htm
Citation (in English): He RF, Zhao TL, Liu HY. Denoising Graph Auto-encoder for Unsupervised Social Media Text Summarization. Ruan Jian Xue Bao/Journal of Software, 2025, 36(5): 2130–2150 (in Chinese). http://www.jos.org.cn/1000-9825/7199.htm

                 Denoising Graph Auto-encoder for Unsupervised Social Media Text Summarization
HE Rui-Fang¹˒², ZHAO Tang-Long¹˒², LIU Huan-Yu¹˒²
¹ (College of Intelligence and Computing, Tianjin University, Tianjin 300350, China)
² (Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin 300350, China)
Abstract: Social media text summarization aims to provide concise summaries for large-scale collections of topic-specific short social media texts (referred to as posts). Given the brief and informal content of posts, traditional methods face the challenges of sparse features and insufficient information. Recent studies have leveraged the social relationships among posts to learn better post representations and remove redundant information, but they neglect the unreliable noisy relationships present in real social media contexts, which mislead the model's assessment of post importance and diversity. Therefore, this study proposes a novel unsupervised model, DSNSum, which improves summarization performance by removing noisy relationships from the social network. First, the presence of noisy relationships in real social relationship networks is statistically verified. Second, two noise functions are designed based on sociological theories, and a denoising graph auto-encoder (DGAE) is constructed to mitigate the influence of noisy relationships and learn post representations that incorporate credible social relationships. Finally, a sparse reconstruction framework selects posts that preserve coverage, importance, and diversity to form a summary of a given length. Experimental results on a total of 22 topics from two real social media platforms (Twitter and Sina Weibo) demonstrate the effectiveness of the proposed model and offer new insights for subsequent research in related fields.
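For the final step, the abstract states only that a sparse reconstruction framework selects posts preserving coverage, importance, and diversity under a length limit. One common way to approximate such an objective, assumed here rather than taken from the paper, is greedy forward selection: repeatedly add the post whose embedding most reduces the importance-weighted least-squares error of reconstructing all post embeddings, stopping when the length budget is reached. The function names, the per-post `weights` (importance), and the word-length budget are hypothetical.

```python
import numpy as np

def reconstruction_error(x, selected, weights):
    """Importance-weighted error of reconstructing every post from the selected posts:
    sum_i w_i * ||x_i - c_i B||^2, with B the selected rows and c_i fitted by least squares."""
    if not selected:
        return float(np.sum(weights * np.sum(x * x, axis=1)))
    b = x[selected]                                   # (k, d) basis of selected posts
    coef, *_ = np.linalg.lstsq(b.T, x.T, rcond=None)  # solve b.T @ coef ~= x.T
    residual = x - coef.T @ b
    return float(np.sum(weights * np.sum(residual * residual, axis=1)))

def greedy_sparse_summary(x, lengths, budget, weights=None):
    """HYPOTHETICAL greedy stand-in for a row-sparse reconstruction objective."""
    n = x.shape[0]
    weights = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    selected, used = [], 0
    current = reconstruction_error(x, selected, weights)
    while True:
        best, best_err = None, current
        for j in range(n):
            if j in selected or used + lengths[j] > budget:
                continue  # already chosen, or would exceed the length budget
            err = reconstruction_error(x, selected + [j], weights)
            if err < best_err - 1e-12:  # require a strict improvement (coverage gain)
                best, best_err = j, err
        if best is None:
            break
        selected.append(best)
        used += lengths[best]
        current = best_err
    return selected
```

Because a post that duplicates an already selected one adds no reconstruction gain, the greedy loop tends to skip it. For example, with two identical posts among four and a budget of three equal-length posts, only one of the two duplicates is kept. The weights let more important posts count more toward coverage.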


* Foundation item: National Natural Science Foundation of China (62376192, 62376188)
Received 2023-07-05; Revised 2023-11-22, 2024-02-06; Accepted 2024-04-02; JOS online publication 2024-06-20
CNKI online first publication: 2024-06-21