Page 233 - Journal of Software (Ruan Jian Xue Bao), 2024, Issue 4
Wan Changxuan et al.: A domain topic hierarchy model with shared topic aspects 1811
a topic hierarchy with explicit hierarchical and associative relations.
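This paper introduces a level-wise topic-aspect sharing mechanism that changes how the topic tree is generated in the nCRP construction method. On this basis it proposes the nCRP+ hierarchy construction method and the rHDP hierarchical topic model, which mine the associated subtopics found under different topics. Using domain category information, we define a voting-based method for computing domain membership. Its purpose is to guide the topic assignment of the word set at each level, make the mapping between topics and domains explicit, and build the hierarchical relations among domain topics. The semantic relatedness between words and domain topics guides the topic-word assignment process, so that semantically similar words are assigned to the same topic and the meaning of each domain topic is consolidated. In addition, the domain relevance between a word and the topics on its branch of the topic tree is used to define a hierarchical topic-word contribution degree, which makes explicit how associated subtopics differ across domains in their topic words.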
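As background for the tree-generation step mentioned above, the following is a minimal sketch of how a path is drawn under the standard nCRP prior of Blei et al., not the nCRP+ method or rHDP model proposed in this paper. The function names (`crp_choose`, `sample_ncrp_path`), the concentration parameter `gamma`, and the fixed depth are illustrative assumptions. In the standard nCRP each child belongs to exactly one parent, which is the restriction the topic-aspect sharing mechanism is meant to relax.

```python
import random
from collections import defaultdict


def crp_choose(counts, gamma, rng):
    """Chinese restaurant process step: return the index of an existing child
    with probability proportional to its visit count, or len(counts) for a
    new child with probability proportional to gamma."""
    total = sum(counts) + gamma
    r = rng.random() * total
    acc = 0.0
    for i, c in enumerate(counts):
        acc += c
        if r < acc:
            return i
    return len(counts)  # open a new child (a new subtopic)


def sample_ncrp_path(tree, depth, gamma, rng):
    """Draw one root-to-leaf path of the given depth and update visit counts.

    `tree` maps a node, written as the tuple of child indices leading to it
    from the root, to the list of visit counts of that node's children.
    """
    path = ()
    for _ in range(depth - 1):
        counts = tree[path]
        k = crp_choose(counts, gamma, rng)
        if k == len(counts):
            counts.append(0)  # first visit creates the child
        counts[k] += 1
        path = path + (k,)
    return path


# Hypothetical usage: 20 documents, a 3-level tree, gamma = 1.0.
rng = random.Random(0)
tree = defaultdict(list)
paths = [sample_ncrp_path(tree, depth=3, gamma=1.0, rng=rng) for _ in range(20)]
```

Each document receives one path, and popular branches attract further documents. The sketch covers only the prior over paths. Word-to-level assignment and any domain-knowledge guidance are outside its scope.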
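Combining the voting-based domain membership, the semantic relatedness between words and domain topics, and the hierarchical topic-word contribution degree, we design a formal description of domain knowledge and improve the hierarchical sampling process. The result is rHDP_DK, a general hierarchical topic model that incorporates domain knowledge. It builds the hierarchical relations among domain topics and the sharing relations among associated subtopics, and it extracts domain topic words.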
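Future work will study domain topic hierarchies built from time-varying information, to make it easier to analyze how the subtopics under each domain topic, and their topic words, change across periods.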