Page 8 - 《软件学报》2020年第10期

P. 8

2984 Journal of Software 软件学报 Vol.31, No.10, October 2020

achieve high accuracy and deal with complex constraints and expressions, which lead to too many false positives. And it will increase the
burden of developers. In this study, a static checking method is proposed based on taint analysis. First, a flow-sensitive, context-sensitive,
and on-demand pointer analysis is proposed to analyze the range of array length. Then, an on-demand taint analysis is performed for all
array indices and array length expressions. Finally, the rules are defined for checking array index out of bounds defects and the checking is
realized based on backward data ﬂow analysis. During the analysis, in order to deal with complex constraints and expressions, it is
proposed to check the satisfiability of the conditions by invoking the constraint solver. If none statement for avoiding array index
out-of-bound is found in the program, an array index out-of-bound warning will be reported. An automatic static analysis tool, Carray
bound have been implemented, and the experimental results show that Carraybound can work effectively and efficiently.
Key words: array index out-of-bounds; static analysis; buffer overflow; constraint solving

随着移动计算、物联网、云计算、人工智能、开源软件、RISC-V 开源指令集等领域的飞速发展,相关的
软/硬件都迎来了新的发展机遇和挑战.为了适应软/硬件的发展,当前也涌现出了很多新的编程语言和编译器.
但是,C 语言作为高效率、面向过程、抽象化的通用程序设计语言,仍然广泛应用于系统软件的开发.系统软件
中如果存在漏洞,就可能会被恶意利用,将会严重影响人们的生产生活,甚至威胁到生命财产安全.软件安全已
成为软件企业不能回避的挑战.
C 语言被广泛应用于底层软件生态系统的开发,这是因为 C 语言程序具有更高的运行效率.其中,数组是 C
语言最重要的数据结构之一.当一个数组在程序中被使用时,访问该数组的下标索引必须在一定范围之内,即不
小于 0 并且小于数组的大小,否则会造成数组下标越界.数组越界的缺陷在很多编译后的系统中都存在,而且通
过实验,我们发现,像 gcc、clang 这样主流的 C 语言编译器,在编译过程中不会对数组下标取值范围的合法性加
以严格检查.数组越界有两种模式,读越界和写越界.读越界会导致读取到随机的值,继而使用该值会导致未定
义行为.相比之下,写越界会造成更加严重的后果,不仅会造成未定义行为,甚至会导致控制流截取,使得攻击者
可执行任意恶意代码 [1,2] .如图 1 所示,根据 CVE 历史统计可以发现,占据前三的漏洞类型为拒绝服务、代码执
[3]
行和溢出 .而数组越界缺陷常常会导致拒绝服务、代码执行和溢出.比如 Adobe 阅读器 2017 之前版本中存在
的远程代码执行和拒绝服务漏洞就是由于外部输入作为数组下标从而可能导致写越界引起的(CVE-2017-
16391、CVE-2017-16410).同时,研究表明 [4−6] ,31%的缓冲区溢出是由数组越界造成的.因此,可见数组越界缺陷
仍然是严重威胁系统软件安全的重要缺陷类型.本文主要针对在给定 C 程序源码时,检测数组下标越界和由循
环导致的数组下标越界问题.为了提高系统软件的安全性,程序必须对由外部输入控制的数组下标进行边界检
查.但是,开发者可能会遗漏边界检查或者没有进行正确的边界检查,使得程序处于可被攻击利用的不利状态.

Fig.1 Statistics of the number of common types of CVE vulnerabilities
图 1 CVE 常见类型漏洞的数目统计

目前已有一些研究工作提出使用静态分析方法和动态测试方法来进行数组越界检查.由于动态方法总是
依赖于测试用例的完整性,从而导致这些方法无法达到足够高的程序覆盖率.静态分析方法通过扫描分析程序
源代码以检查程序中的缺陷.当前的静态分析方法出于效率的考虑,未对程序进行高精度的分析,未处理循环导
致的数组越界,并且多是基于规则的匹配方法,无法处理复杂约束、复杂表达式等情况,导致一些已有的数组越
界静态检测方法从而存在大量的误报和漏报现象.

3 4 5 6 7 8 9 10 11 12 13