Page 240 - Journal of Software (软件学报), 2020, Issue 10

Journal of Software (软件学报) ISSN 1000-9825, CODEN RUXUEW E-mail: jos@iscas.ac.cn
Journal of Software, 2020, 31(10): 3216–3237 [doi: 10.13328/j.cnki.jos.005852] http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel: +86-10-62562563

A Discriminative Feature Dictionary Construction Algorithm for Time Series∗

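ZHANG Wei, WANG Zhi-Hai, YUAN Ji-Dong, HAO Shi-Lei

(School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China)
Corresponding author: YUAN Ji-Dong, E-mail: yuanjd@bjtu.edu.cn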

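Abstract: Time series data are generated widely across many fields of science, technology, and the economy. The fixed-length word extraction algorithm based on symbolic Fourier approximation (SFA) and sliding windows is currently one of the most effective feature generation algorithms for constructing time series feature dictionaries. However, during feature generation it cannot dynamically choose the optimal number of Fourier values to retain for different sliding window lengths, and the dictionary construction process lacks an effective algorithm for selecting discriminative features from the massive set of generated features. To address this, a discriminative feature dictionary construction algorithm is proposed. First, a variable-length word extraction method based on Fourier approximation is proposed, which learns the optimal word length for sliding windows of different lengths. Second, a new evaluation index of feature discriminability is constructed, and the generated features are selected according to its dynamic threshold. Experimental results show that a logistic regression model built on the constructed feature dictionary not only achieves high classification accuracy but can also effectively identify the discriminative features in the prediction process.
Key words: time series classification; feature generation; discriminative feature selection; feature dictionary learning
Chinese Library Classification (CLC) number: TP311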

Chinese citation format: 张伟,王志海,原继东,郝石磊.一种时间序列鉴别性特征字典构建算法.软件学报,2020,31(10):3216–3237. http://www.jos.org.cn/1000-9825/5852.htm
English citation format: Zhang W, Wang ZH, Yuan JD, Hao SL. Time series discriminative feature dictionary construction algorithm. Ruan Jian Xue Bao/Journal of Software, 2020, 31(10): 3216–3237 (in Chinese). http://www.jos.org.cn/1000-9825/5852.htm

Time Series Discriminative Feature Dictionary Construction Algorithm

ZHANG Wei, WANG Zhi-Hai, YUAN Ji-Dong, HAO Shi-Lei
(School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China)
Abstract: Time series data are widely generated in many fields of science, technology, and the economy. The feature generation algorithm based on Symbolic Fourier Approximation (SFA) and a sliding-window transformation is one of the most effective algorithms for constructing time series feature dictionaries, but it has two clear shortcomings. First, it cannot dynamically select the optimal number of Fourier values to retain for different sliding window lengths during the transformation. Second, it lacks an effective algorithm for selecting discriminative features from the massive set of generated features. To this end, a new variable-length feature dictionary construction algorithm is proposed in this study. First, a variable-length word extraction method based on SFA is proposed, which dynamically selects the optimal number of Fourier values for different sliding window lengths. Second, a new indicator for evaluating feature discriminability is designed, and the generated features are selected according to its dynamic threshold. Experimental results show that a logistic regression model based on the proposed time series dictionary achieves high classification accuracy and identifies the discriminative features in the prediction process.
Key words: time series classification; feature generation; discriminant feature selection; feature dictionary learning
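
To make the word-extraction idea in the abstract concrete, the following is a minimal sketch of turning sliding windows into Fourier-based words. It is not the authors' implementation. The function names `sfa_word` and `bag_of_words`, the four-letter alphabet, and the fixed equal-width bins are all assumptions made for illustration; standard SFA learns its bin boundaries per coefficient from training data, and that step is replaced here by the fixed bins. The word length is passed in as a parameter, so a different value can be supplied for each window length, which is the quantity the proposed method learns.

```python
from collections import Counter

import numpy as np


def sfa_word(window, word_length, alphabet="abcd"):
    """Map one window to a word of at most `word_length` letters (simplified)."""
    window = np.asarray(window, dtype=float)
    centered = window - window.mean()
    std = window.std()
    normed = centered / std if std > 0 else centered  # z-normalize

    # Drop the DC term, then interleave real and imaginary parts and keep
    # only the first `word_length` Fourier values.
    coeffs = np.fft.rfft(normed)[1:]
    values = np.column_stack([coeffs.real, coeffs.imag]).ravel()[:word_length]

    # Assumption: fixed equal-width bins instead of bins learned from data.
    # Values beyond the range fall into the outermost letters.
    scale = len(window) / 2
    edges = np.linspace(-scale, scale, len(alphabet) + 1)[1:-1]
    return "".join(alphabet[i] for i in np.digitize(values, edges))


def bag_of_words(series, window_length, word_length):
    """Count the words produced by every sliding window of one length."""
    series = np.asarray(series, dtype=float)
    counts = Counter()
    for start in range(len(series) - window_length + 1):
        word = sfa_word(series[start:start + window_length], word_length)
        counts[word] += 1
    return counts


if __name__ == "__main__":
    series = np.sin(np.linspace(0, 8 * np.pi, 64))
    # Hypothetical pairing of window lengths with word lengths.
    for window_length, word_length in {16: 4, 32: 6}.items():
        print(window_length, bag_of_words(series, window_length, word_length))
```

The word counts from all window lengths would then form the feature vector of a series, from which a selection step keeps only the discriminative words before a classifier such as logistic regression is trained.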

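A time series is a collection of real-valued data ordered by time. Large amounts of time series data arise in many research fields and practical applications, such as malware detection, wind power forecasting, industrial automation, voltage stability assessment, and mobile device tracking [1−3].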

∗ Foundation item: Fundamental Research Funds for the Central Universities (2018JBM014); National Natural Science Foundation of China (61702030, 61672086); Beijing Natural Science Foundation of China (4182052); Beijing Excellent Talents (2017000020124G056)
Received: 2018-10-23; Revised: 2019-01-01; Accepted: 2019-04-22
