Page 455 - Journal of Software (《软件学报》), 2025, Issue 7


3376 Journal of Software (软件学报), 2025, Vol. 36, No. 7

accurately identify harmful videos that are encrypted and transmitted in the network. The existing methods collect traffic data at main
network access points to extract the features of encrypted video traffic and identify the harmful videos by matching the traffic features
based on harmful video databases. However, as encrypted video transmission protocols have evolved, HTTP/2 with its
multiplexing has been widely deployed, and traditional traffic analysis methods built on HTTP/1.1 features fail to
identify encrypted videos delivered over HTTP/2. Moreover, current research mostly focuses on videos with a fixed resolution during playback.
Few studies have considered the impact of resolution switching in video identification. To address the above problems, this study analyzes
the factors that cause offsets in the length of the audio/video data during the HTTP/2 transmission process and proposes a method to
precisely reconstruct corrected fingerprints for encrypted videos by calculating the size of the combined audio and video segments in the
encrypted traffic. The study also proposes an encrypted video identification model based on the hidden Markov model and the Viterbi
algorithm by using the corrected fingerprints of encrypted videos and a large plaintext fingerprint database for videos. The model applies
dynamic programming to handle adaptive video resolution switching. The proposed model achieves identification
accuracies of 98.41% and 97.91% for encrypted videos with fixed and adaptive resolutions, respectively, on fingerprint
databases on the order of 400 000 entries for Facebook and Instagram. The generality and generalization ability of the proposed
method are validated on three video platforms: Triller, Twitter, and Mango TV. Comparisons with similar work in terms of
identification performance, generalization, and time overhead show that the proposed method has higher practical value.
Key words: HTTP/2; DASH; large database; corrected fingerprinting; HMM; encrypted video identification
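
To make the idea of matching under resolution switching concrete, the following is a minimal sketch, not the model proposed in this paper. It assumes each database entry stores plaintext segment sizes per resolution, treats the resolution of each segment as the hidden state, and uses the Viterbi recursion (dynamic programming) to score a sequence of observed segment sizes. The names `viterbi_score` and `identify`, the parameters `p_stay` and `tol`, and the Gaussian-shaped emission score are all illustrative assumptions.

```python
import math


def viterbi_score(observed, fingerprint, p_stay=0.9, tol=0.01):
    """Return (best log-score, resolution path) for one candidate video.

    observed:    segment sizes estimated from encrypted traffic
    fingerprint: dict mapping resolution -> list of plaintext segment sizes
                 (each list must be at least as long as `observed`)
    p_stay, tol: hypothetical parameters, not taken from the paper
    """
    states = list(fingerprint)
    n = len(states)
    log_stay = math.log(p_stay)
    log_switch = math.log((1 - p_stay) / (n - 1)) if n > 1 else float("-inf")

    def emit(state, t):
        # Assumed emission model: Gaussian-shaped penalty on relative error.
        expected = fingerprint[state][t]
        rel_err = abs(observed[t] - expected) / expected
        return -0.5 * (rel_err / tol) ** 2

    def trans(prev, cur):
        return log_stay if prev == cur else log_switch

    # Initialisation: uniform prior over resolutions.
    score = {s: -math.log(n) + emit(s, 0) for s in states}
    back = []
    for t in range(1, len(observed)):
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + trans(p, s))
            new_score[s] = score[best_prev] + trans(best_prev, s) + emit(s, t)
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)

    # Backtrack the most likely resolution sequence.
    last = max(states, key=score.get)
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


def identify(observed, database):
    """Pick the database video whose fingerprint best explains `observed`."""
    results = {vid: viterbi_score(observed, fp) for vid, fp in database.items()}
    best = max(results, key=lambda vid: results[vid][0])
    return best, results[best][1]


# Toy data (made up for illustration): two videos, two resolutions each.
database = {
    "video_a": {"360p": [100, 120, 110, 130], "720p": [300, 330, 310, 350]},
    "video_b": {"360p": [90, 150, 105, 95], "720p": [280, 400, 290, 270]},
}
observed = [101, 121, 309, 351]  # playback switches resolution mid-stream
best_video, best_path = identify(observed, database)
```

In this toy run the observed sizes follow the 360p fingerprint of `video_a` for two segments and its 720p fingerprint afterwards, so a matcher that assumes a fixed resolution would not fit any single row, whereas the Viterbi path can switch rows at a small transition cost.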

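In recent years, with the development of the mobile Internet, video applications have become mainstream applications on the
Internet. Video-sharing platforms in China and abroad, such as YouTube, Facebook, TikTok, and Douyin (抖音), provide users with
convenient video sharing and forwarding functions. According to a report on the Ericsson Mobility website [1], video accounted
for about 73% of all global mobile network traffic in 2023, and this share is expected to reach 80% by the end of 2028, with
video traffic from social media platforms making up the largest portion. Videos spread over the Internet have deeply
penetrated the social lives of Internet users.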
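Because videos on video platforms and social platforms come from diverse sources, if platform review is not timely, the videos
offered by these platforms will also include some objectionable videos in categories such as pornography, violence, and
rumors, which this paper calls harmful videos (公害视频). These harmful videos have a serious negative impact on cyberspace and
society. However, because harmful videos are huge in number, cheap to produce, fast to spread, and transmitted in encrypted
form [2], they are extremely difficult to regulate.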
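To prevent the spread of harmful videos, network regulators can adopt several approaches. For platforms dedicated to providing
harmful videos, once regulators have identified the platforms' IP addresses or domain names, a platform inside the regulator's
network administrative domain can be penalized and brought under control, while a platform outside that domain can have its
propagation within the domain blocked by IP address or domain name. However, the vast majority of video platforms on the
Internet are legitimate, and harmful videos make up only a small fraction of a large volume of normal videos. In this
realistic scenario, if video content can be identified precisely, the spread of harmful videos can be controlled at a fine
granularity while normal video traffic is left alone, which achieves fine-grained network management. To this end, harmful
videos must be identified promptly as they are transmitted.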
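At present, mainstream video platforms worldwide all use encryption to transmit video data. According to a report published on
the W3Techs website [3], the share of websites worldwide that use HTTPS or other encryption protocols by default rose from
81.5% in January 2023 to 85.6% in January 2024. As the share of encrypted traffic on the Internet grows, and the share of
encrypted video traffic in particular grows rapidly, encrypted transmission protects ordinary users but also gives lawbreakers
a cloak of encryption [4,5], multiplying the difficulty that network regulators face in supervising the network environment.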
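Depending on the data source, existing methods for identifying harmful videos fall into two categories. The first category
analyzes video files on the video platform: deep learning models are trained on images from videos [6,7], and the trained
model performs content recognition on frames sampled from unknown videos so that the spread of identified harmful videos can
be blocked. The data source for this category is the platform's video files, which suits content review by platform operators.
However, the hardware these methods require is expensive; many small platforms cannot deploy them because of cost and
technical constraints, and some platforms are unwilling to review content, so harmful videos proliferate online. The second
category analyzes traffic data collected at major network access points: features of encrypted video traffic are extracted
[8,9] and matched against an existing harmful video database to identify harmful videos being transmitted. These methods do
not require platform cooperation, so regulators retain good control over deployment. The difficulty is that, as encrypted
network transport protocols continue to evolve, existing methods cannot analyze data transmitted with newer protocols.
Protocols that use multiplexing, such as HTTP/2, have been widely deployed and have greatly changed the transmission
characteristics of encrypted traffic, so second-category methods must analyze the new traffic features that multiplexing
produces.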
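Many research results already exist on identifying videos from network traffic. Depending on the identification target, they
can be divided into encrypted video platform classification [10,11], 加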
