Page 289 - Journal of Software (《软件学报》), 2020, No. 11
3604 Journal of Software 软件学报 Vol.31, No.11, November 2020
Abstract: Fast block-wise motion estimation algorithms based on the translational model alleviate the problem of high computational
complexity to some extent, but they sacrifice motion-compensation quality, whilst higher-order motion models still suffer from
computational inefficiency and unstable convergence. Through a number of experiments, it is found that about 56.21% of the video
blocks contain zoom motion, which indicates that zoom motion is, after translational motion, one of the most important forms of motion
in video. Therefore, a zoom coefficient is introduced into the conventional block-wise translational model by bilinear
interpolation, and the motion-compensated error is modeled as a quadratic function of the zoom coefficient. Subsequently, an
approach is derived to compute the optimal zoom coefficient for 1D zoom motion through Vieta's theorem, and it is
further extended to equal-proportion 2D zoom motion. Based on the above, a fast block-matching motion estimation
algorithm optimized by the adaptive zoom coefficient is presented. It first uses the diamond search (DS) to compute the
translational motion vector, and then determines an optimal matching block for the block to be predicted with the adaptive zoom
coefficient. Experimental results on 33 standard test video sequences show that the proposed algorithm achieves a motion-compensated
peak signal-to-noise ratio (PSNR) that is 0.11 dB and 0.64 dB higher than those of the full search (FS) and the DS based on the
block-wise translational model, respectively. Its computational complexity is 96.02% lower than that of the FS and slightly higher than that of the DS.
Compared with motion estimation based on the zoom model, the average PSNR of the proposed algorithm is 0.62 dB lower than that
of the 3D full search but 0.008 dB higher than that of the fast 3D diamond search, while its computational complexity amounts to only
0.11% and 3.86% of that of the 3D full search and the 3D diamond search, respectively. Meanwhile, the proposed algorithm achieves
self-synchronization between the encoder and decoder without transmitting zoom vectors, so it does not increase the side-information
overhead. Additionally, the proposed adaptive zoom coefficient computation can be combined with state-of-the-art fast block-wise
motion estimation algorithms other than the diamond search to improve their motion-compensation quality.
Key words: video coding; motion estimation; block matching; zoom model; adaptive zoom coefficient
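As a rough illustration of the zoom-compensated matching described in the abstract, the following Python sketch samples a zoomed reference block by bilinear interpolation and picks the zoom coefficient with the smallest sum of squared errors. It is a minimal sketch under stated assumptions, not the paper's method: the function names (`bilinear`, `zoom_sse`, `best_zoom`), the centre-anchored zoom parameterization, and the candidate list are all hypothetical, and the brute-force scan over candidates stands in for the closed-form solution that the paper derives through Vieta's theorem.

```python
import math


def bilinear(img, x, y):
    """Sample img at real-valued (x, y) by bilinear interpolation, clamping to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0][x0] + fx * img[y0][x1]
    bot = (1 - fx) * img[y1][x0] + fx * img[y1][x1]
    return (1 - fy) * top + fy * bot


def zoom_sse(cur, ref, bx, by, n, mv, s):
    """Sum of squared errors between the n x n block of cur at (bx, by) and the
    block of ref obtained by translating by mv and scaling the pixel offsets
    from the block centre by s (assumed parameterization, equal in x and y)."""
    cx, cy = bx + (n - 1) / 2.0, by + (n - 1) / 2.0
    err = 0.0
    for y in range(by, by + n):
        for x in range(bx, bx + n):
            rx = cx + mv[0] + s * (x - cx)
            ry = cy + mv[1] + s * (y - cy)
            d = cur[y][x] - bilinear(ref, rx, ry)
            err += d * d
    return err


def best_zoom(cur, ref, bx, by, n, mv, candidates):
    """Brute-force scan over candidate zoom coefficients (not the closed-form solution)."""
    return min(candidates, key=lambda s: zoom_sse(cur, ref, bx, by, n, mv, s))


# Toy data: a linear ramp as the reference frame, and a current frame synthesized
# from it with translation (1, 0) and zoom 1.25 about the centre of the block at (6, 6).
ref = [[3.0 * x + 2.0 * y for x in range(16)] for y in range(16)]
bx, by, n, mv = 6, 6, 4, (1, 0)
cx, cy = bx + (n - 1) / 2.0, by + (n - 1) / 2.0
cur = [[bilinear(ref, cx + mv[0] + 1.25 * (x - cx), cy + mv[1] + 1.25 * (y - cy))
        for x in range(16)] for y in range(16)]
best = best_zoom(cur, ref, bx, by, n, mv, [0.75, 1.0, 1.25, 1.5])  # 1.25 for this toy pair
```

In an actual encoder, `mv` would come from the translational search (the DS in the proposed algorithm) rather than being fixed by hand. For the linear ramp used here the error happens to be exactly quadratic in `s`, which makes the minimum easy to see; real frames only approximate that behaviour.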
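Motion estimation is a temporal differential prediction method adopted by video encoders such as AVS, H.265/HEVC, and MPEG, and it
contributes the largest share of the coding gain of the closed-loop feedback coding architecture of "differential prediction + transform" [1,2].
However, after quantitatively analyzing the computational load of each coding stage, references [1,3,4] found that motion estimation
accounts for more than 40% of the computing resources required by the whole encoder. If the video encoder enables advanced modes
such as variable-block-size 1/8-pixel motion estimation and adaptive motion vector prediction, motion estimation can consume as much
as 80% of the encoder's total computing resources. Under these circumstances, to achieve a more reasonable
rate-distortion-complexity (R-D-C) performance, existing video coding standards all adopt block-matching algorithms based on the
translational model to remove the temporal redundancy produced by the translational motion of objects, and seven classes of fast
video motion estimation algorithms have emerged [5].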
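For reference, the translational block-matching baseline that these fast algorithms accelerate can be sketched as an exhaustive search over a square window. This is a minimal illustration and not code from any standard or from the paper: the function names `sad` and `full_search`, the sum of absolute differences as the matching criterion, and the toy frames are all assumptions made for the example.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of cur at (bx, by)
    and the block of ref displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, radius):
    """Test every displacement in [-radius, radius]^2 that keeps the candidate
    block inside ref; return (best_dx, best_dy, best_sad)."""
    h, w = len(ref), len(ref[0])
    best = (0, 0, sad(cur, ref, bx, by, 0, 0, n))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            inside = (0 <= bx + dx and bx + dx + n <= w and
                      0 <= by + dy and by + dy + n <= h)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best[2]:
                best = (dx, dy, cost)
    return best


# Toy frames: the current frame is the reference shifted right by 2 and down by 1,
# so the block at (4, 4) is found in the reference at displacement (-2, -1).
ref = [[(7 * x + 13 * y + x * y) % 251 for x in range(16)] for y in range(16)]
cur = [[ref[(y - 1) % 16][(x - 2) % 16] for x in range(16)] for y in range(16)]
dx, dy, cost = full_search(cur, ref, bx=4, by=4, n=4, radius=3)  # (-2, -1, 0)
```

The search evaluates up to (2 * radius + 1)^2 candidates per block, each costing n^2 pixel operations. The classes of fast algorithms listed next attack different parts of this cost.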
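(1) Motion estimation based on candidate-vector subsampling: according to some principle (e.g., the center-biased principle), a small
number of motion vectors in the search window are selected as the candidate vector set, and the candidate with the minimum
compensation error is then taken as the motion vector, e.g., UMHexagonS [6], TZSearch [7], EPZS [8], and parabolic search [9].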
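(2) Motion estimation based on pixel subsampling: a sampling matrix (e.g., a hierarchical sampling matrix [10,11], a quincunx
sampling matrix [12], or an adaptive sampling matrix [13]) is used to subsample the macroblock to be matched; the compensation error
of each candidate vector is then computed in the search window for the size-reduced macroblock to be predicted, yielding the best
motion vector.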
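(3) Motion estimation based on pixel pre-ordering: when computing the motion-compensation error of the macroblock to be matched, a
prediction strategy is used to identify the pixels in the macroblock that are likely to produce large frame differences, and their
compensation errors are accumulated first. If the accumulated error exceeds the prediction error of the current best vector, the
candidate vector can be ruled out as the best motion vector, e.g., the PDS algorithm [14,15].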
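(4) Motion estimation based on low-complexity matching functions: operations such as XOR, comparison, and absolute value replace
the subtraction, multiplication, and square-root operations of the mean-square-error function, reducing the number of CPU clock
cycles and the hardware cost required to compute the motion-compensation error, e.g., references [16,17].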
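(5) Motion estimation based on low-bit-depth pixels: a bit-depth mapping function converts pixels with a higher bit depth (e.g.,
12 bit and 8 bit) into pixels with a low bit depth, and the low-bit-depth representations of multiple pixels are then packed into
one machine word. When combined with the low-complexity matching functions of class (4), the compensation errors of multiple pixels
can be obtained with a single operation, e.g., 1-bit motion estimation [18] and 2-bit motion estimation [19,20].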
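(6) Motion estimation based on hash tables: a hash function maps the pixel values of the block to be matched to a hash value, and
the best matching block is then looked up in a hash table [21-23], which avoids repeated computation of the matching error and can
reduce the time complexity of motion estimation from quadratic order to linear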