Page 223 - 《软件学报》2021年第11期
P. 223

欧阳佳  等:面向频繁项集挖掘的本地差分隐私事务数据收集方法                                                  3549


                                                   α
                                                  −⋅ dist (, )t 1 t′
                                                 e   2
                                                      α
                                                     −⋅ dist (, )t 1 z  −⋅ dist (, )t 1 t′  − ⋅ dist (t 2 , )z
                                                                               α
                                                                α
                                  Pr[ ( ) tΦ  =  ] t′  ∑  e  2  e  2   ∑     e   2
                                                 ∈
                                                                         ∈
                                       1    ≤   z CandI     =         ⋅  z CandI                     (12)
                                                                               α
                                                                α⋅
                                                      2 t t′
                                  Pr[ ( ) =  ] t′  e − α  dist 2 (, )  e −⋅ dist 2 (t 2 , ) t′  ∑  e −⋅ dist 2 (t 1 , ) z
                                    Φ
                                      t
                                       2
                                                                         ∈
                                                                         z CandI
                                                      α
                                                     −⋅ dist (, )
                                                         2 t z
                                                                 *
                                              ∑ z CandI e  2                 **
                                                ∈
                    首先处理*部分,可以得到:
                                                  α
                                                 −⋅ dist (, )t 1 t′
                                                e   2     α⋅  [dist (t 2 , )t′ −  dist (t 1 , )]t′
                                                 −⋅ dist (, )  =  e  2                               (13)
                                                     2 t t′
                                                  α
                                                e   2
                    由于 dist 为距离度量,满足三角不等式的性质,即 dist(t 2 ,t′)−dist(t 1 ,t′)≤dist(t 1 ,t 2 )成立,因此可以得到*部分的
                 结论:
                                                    −⋅ dist (, )t 1 t′
                                                    α
                                                   e   2      α⋅ dist (,t 12 )t
                                                           ≤ e  2                                    (14)
                                                        2 t t′
                                                    −⋅ dist (, )
                                                    α
                                                   e   2
                    接下来处理**部分,先对分子进行处理,得到公式(15).
                                                              α⋅
                                                                          α
                                                                    α
                                             − α  dist (t 2 , )z  −⋅ dist (t 2 ,)z + ⋅ dist (t 1 , )z −⋅ dist (t 1 ,)z
                                      ∑      e  2     ∑     e         2
                                        z CandI∈  −⋅ dist (, )  =  z CandI∈  −⋅ dist (t 1 , ) z      (15)
                                              α
                                                                    α
                                                  1 t z
                                      ∑  z CandI∈  e  2     ∑  z CandI∈  e  2
                    对公式(15)应用三角不等式,dist(t 1 ,z)−dist(t 2 ,z)≤dist(t 1 ,t 2 ),得到公式(16).
                                                                       α
                                                 α
                                                −⋅ dist (, )t 2 z  α  ⋅ dist ( ,)t 1 2 t  − ⋅ dist (t 1 , )z
                                         ∑     e   2      ∑     e     2
                                           z CandI∈  −⋅ dist (, )t 1 z  ≤  z CandI∈  − ⋅ dist (t 1 , )z  (16)
                                                                    α
                                                 α
                                         ∑  z CandI∈  e  2  ∑  z CandI∈  e  2
                    观察公式(16),发现α⋅dist(t 1 ,t 2 )与 z 无关,可以直接从求和运算中提出,得到公式(17)为**部分的结论:
                                           −⋅ dist (, )t 2 z  α  ⋅ dist ( ,t 1 2 )t  − ⋅ dist (t 1 , )z
                                           α
                                                                    α
                                    ∑     e   2     e  2   ⋅ ∑    e   2      α⋅ dist (,t 12 )t
                                      z CandI∈  −⋅ dist (, )  ≤  z CandI∈  − ⋅ dist (t 1 , ) z  ≤  e  2  (17)
                                               1 t z
                                                                α
                                            α
                                    ∑  z CandI∈  e  2   ∑  z CandI∈  e  2
                    综合*部分与**部分的结论,完成定理 1 的证明:
                                          Pr[ ( )tΦ  =  ] t′  α  dist (,t 12 )t  α⋅  ⋅  dist ( ,t 1 2 )t
                                              1      e   2   e ⋅ ≤  2  =  e α⋅ dist (,t 12 )t        (18)
                                            Φ
                                          Pr[ ( ) =  ] t′
                                              t
                                              2
                    定理 1 证明完毕.                                                                        □
                 3.3   支持度计数的分布估计
                    项支持度计数的估计是差分隐私事务数据收集的重要任务,是评价隐私保护数据收集方法的重要指标,项
                 支持度计数定义为包含该项的事务数据的数目,即 P a =#{t i |a∈t i ,i∈[1,n]}.考虑项 a i 以及事务数据 t,如果 a i ∈t,收集
                 到的事务数据 t′中同样包含 a i 的概率定义为真正率 TPR(true positive rate):
                                           −  α  ⋅  (k −  1)  k ⎛  −  α  ⋅  (k inter  )  ⎞
                                                                −
                                                                            −
                                          e  2  ⋅  (C k − 1 ) +  ∑  e ⎜  2  ⋅  (C inter− 1  ⋅  C k inter )⎟
                                                   d      2 ⎝  ⎜      m− 1  d   ⎟
                                     TPR =             inter=                   ⎠                    (19)
                                                             Ω
                    反之,如果 a i ∉t,而收集到的事务数据 t′中包含 a i 的概率定义为错正率 FPR(false positive rate):
                                             −  α  0 ⋅  k − 1 ⎛  −  α  ⋅  (k −  inter )  ⎞
                                                                           −
                                                   1
                                                    +
                                            e  2  ⋅  C k −  d −  1 ∑  ⎜  e ⎜  2  ⋅  (C m inter  ⋅  C d −  k −  1 inter )⎟  ⎟
                                                                          1
                                       FPR =          inter= 1 ⎝              ⎠                      (20)
                                                             Ω
   218   219   220   221   222   223   224   225   226   227   228