arXiv:1111.0623v1 [cs.DS] 2 Nov 2011

Beating Randomized Response on Incoherent Matrices

Moritz Hardt

Aaron Roth

October 30, 2018

Abstract

Computing accurate low rank approximations of large matrices is a fundamental data mining task. In many applications, however, the matrix contains sensitive information about individuals. In such cases we would like to release a low rank approximation that satisfies a strong privacy guarantee such as differential privacy. Unfortunately, to date the best known algorithm for this task that satisfies differential privacy is based on naive input perturbation or randomized response: each entry of the matrix is perturbed independently by a sufficiently large random noise variable, and a low rank approximation is then computed on the resulting matrix.

We give (the first) significant improvements in accuracy over randomized response under the natural and necessary assumption that the matrix has low coherence. Our algorithm is also very efficient and finds a constant rank approximation of an m × n matrix in time O(mn). Note that even generating the noise matrix required for randomized response already requires time O(mn).

IBM Research Almaden. Email: mhardt@us. Department of Computer and Information Sciences, University of Pennsylvania. Email: aaroth@cis.upenn.edu

1 Introduction

Consider a large m × n matrix A in which rows correspond to individuals, columns correspond to movies, and a non-zero entry A(i, j) represents the rating that individual i has given to movie j. Such a data set shares two important characteristics with many other data sets:

1. It can be represented as a matrix with very different dimensions: there are many more people than movies, so n ≪ m.

2. It is composed of sensitive information: the rating that an individual gives to a particular movie (and the very fact that they watched said movie) can be compromising information.

Although we want to reveal little about the existence of individual ratings in this data set, it might be extremely useful to allow data analysts to mine such a matrix for statistical information. Even while protecting the privacy of individual entries, it might still be possible to release another matrix that encodes a great deal of information about the original data set. For example, we might hope to recover the cut structure of the corresponding rating graph, perform principal component analysis (PCA), or apply some other data mining technique.

Indeed, this example is not merely theoretical. Data of exactly this form was released by Netflix as part of their competition to design improved recommender systems. Spectral methods such as PCA were commonly used on this dataset, and privacy concerns were acknowledged: Netflix attempted to "anonymize" the dataset in an ad-hoc way. Following this supposedly anonymized release, Narayanan and Shmatikov [NS08] were able to re-identify many individuals in the dataset by cross-referencing the reviews with publicly available reviews in the Internet Movie Database. As a result of their work, a planned second Netflix challenge was canceled. The story need not have ended this way, however: the formal privacy guarantee known as differential privacy could have prevented the attack of [NS08], and indeed, McSherry and Mironov [MM09] demonstrated that many of the recommender systems proposed in the competition could have been implemented in a differentially private way. [MM09] make use of private low-rank matrix approximations using input perturbation methods. In fact, it is not possible to generically improve on input perturbation methods for all matrices without violating blatant non-privacy [DN03]. Nevertheless, in this paper, we give the first algorithms for low rank matrix approximation with performance guarantees that are significantly better than input perturbation, under certain commonly satisfied conditions which are already assumed in prior work on non-private low-rank matrix approximation.

In this paper, we consider the problem of privately releasing accurate low-rank approximations to datasets that can be represented as matrices. Such matrix approximations are one of the most fundamental building blocks for statistical analysis and data mining, with key applications including latent semantic indexing and principal component analysis. We provide theorems bounding the accuracy of our approximations as compared to the optimal low rank approximations in the Frobenius norm. The classical Eckart-Young theorem asserts that the optimal rank-k approximation of a matrix A (in either the Frobenius or spectral norms) is obtained by computing the singular value decomposition A = UΣV^T and releasing the truncated SVD A_k = UΣ_kV^T, where in Σ_k all but the top k singular values have been zeroed out. Computing the SVD of a matrix takes time O(mn^2). In addition to offering privacy guarantees, our algorithm is also extremely efficient: it requires only elementary matrix operations and simple noisy perturbations, and for constant k takes time only O(mn). This represents a happy confluence of the two goals of privacy and efficiency. Normally, the two are at odds, and differentially private algorithms tend to be (much) less efficient than their non-private counterparts. In this case, however, we will see that some algorithms for fast approximate low-rank matrix approximation are much more amenable to a private implementation than their slower counterparts.
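As a point of reference for the non-private baseline, the Eckart-Young step described above takes only a few lines of NumPy. This is our illustration of the classical (non-private) computation, not the paper's private algorithm:

```python
import numpy as np

def best_rank_k(A, k):
    """Optimal rank-k approximation in Frobenius (and spectral) norm,
    per Eckart-Young: truncate the SVD to the top k singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

Computing the full SVD this way costs O(mn^2) for an m × n matrix with m ≥ n, which is exactly the cost the O(mn)-time algorithm of this paper avoids for constant k.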

Computing low rank matrix approximations privately has been considered at least since [BDMN05], and to date, no algorithm has improved over simple input perturbation, which achieves an error (when compared with the best rank k approximation A_k) in Frobenius norm of O(√(k(n + m))). Although this error is optimal without making any assumptions on the matrix, it can be prohibitive when the best rank k approximation is actually very good: when ‖A − A_k‖_F ≪ √(k(n + m)). That is, exactly in the case when a low rank approximation to the matrix would be most useful. We give an algorithm which improves over input perturbation under the conditions that m ≤ n and that the coherence of the matrix is small: roughly, that no single row of the matrix is too significantly correlated with any of the right singular vectors of the matrix. Equivalently, no left singular vector has large correlation with one of the standard basis vectors. Low coherence is a commonly studied and satisfied condition. For example, Candès and Tao, motivated by the same Netflix Prize dataset re-identified by [NS08], consider the problem of matrix completion under low coherence conditions [CT10]. They show that matrix completion is possible under low coherence assumptions, and that several reasonable random matrix models exhibit a strong incoherence property. Notably, [CT10] were not concerned with privacy at all: they viewed low coherence as a natural assumption, satisfied for datasets resembling the Netflix Prize data, that could be leveraged to obtain stronger utility guarantees. This represents a second happy confluence of the goals of data privacy and utility: low coherence is an assumption that others already make, free of privacy concerns, in order to improve the state of the art in data analysis. We show that the same assumption can simultaneously be leveraged for data privacy.
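For concreteness, the input-perturbation (randomized response) baseline just described can be sketched as follows. The noise scale `sigma` is a stand-in for the O(ε⁻¹√(log(1/δ))) calibration discussed in the reconstruction-attacks paragraph below, so treat this as an illustrative sketch rather than a calibrated private mechanism:

```python
import numpy as np

def input_perturbation_lowrank(A, k, sigma, seed=None):
    """Randomized-response baseline (not this paper's algorithm):
    perturb every entry with independent Gaussian noise, then return
    the optimal rank-k approximation of the noisy matrix. The extra
    Frobenius error over A_k scales like sigma * sqrt(k(n + m))."""
    rng = np.random.default_rng(seed)
    noisy = A + sigma * rng.standard_normal(A.shape)
    U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

Note that even sampling the m × n noise matrix already costs O(mn) time, which is why O(mn) is the natural efficiency target.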
In retrospect, low coherence is also an extremely natural condition in the context of privacy, although one that has not previously been considered in the literature. If a matrix fails to have low coherence, then intuitively the data of individual rows of the matrix is encoded closely in individual singular vectors. If it does have low coherence, no small set of singular vectors can be used to encode any row of the matrix with high accuracy, and intuitively, low rank approximations reveal less local information about particular entries of the matrix.

The problem we solve is the following: given a matrix A and a target rank k, we privately compute and release a rank O(k) matrix B such that ‖A − B‖_F is not much larger than ‖A − A_k‖_F, where A_k is the optimal rank k approximation to A and ‖·‖_F is the Frobenius norm. The quality of the approximation depends on several factors, including n, m, the desired rank k, and the coherence of the matrix. Our approach improves over input perturbation when the matrix coherence is small.

Our algorithm promises (ε, δ)-differential privacy [DMNS06] with respect to changes of any single row of magnitude 1 in the ℓ2-norm. This is only stronger than the standard notion of changing any single entry in the matrix by a unit amount. In the very special case of the matrix representing a (possibly unbalanced) graph, this captures (for example) the addition or removal of a single edge. Therefore in this case our algorithm is promising edge privacy rather than vertex privacy. From a privacy point of view, this is less desirable than vertex privacy, but it is still a strong guarantee which is appropriate in many settings. We note that edge privacy is well studied with respect to graph problems (see, e.g., [NRS07, GLM+10, GRU11]), and we do not know of any algorithms with non-trivial guarantees on graphs that promise vertex privacy, nor any algorithms in the more general case of matrices that promise privacy with respect to entire rows.


1.1 Our results

We start with our first algorithm, which improves over randomized response on matrices of small coherence C. We say that an m × n matrix A has coherence C if no row has Euclidean norm more than C · ‖A‖_F/√m, i.e., more than C times the typical row norm. This parameter varies between 1 and √m, since no row can have Euclidean norm more than ‖A‖_F. Intuitively, the condition says that no single row contributes too significantly to the Frobenius norm of the matrix.
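In code, this coherence parameter is simply the largest row norm measured in units of the typical row norm ‖A‖_F/√m (a direct transcription of the definition above):

```python
import numpy as np

def coherence(A):
    """Coherence C of an m x n matrix: the largest row norm equals
    C * ||A||_F / sqrt(m). C ranges from 1 (all rows contribute
    equally) to sqrt(m) (one row carries the whole Frobenius norm)."""
    m = A.shape[0]
    row_norms = np.linalg.norm(A, axis=1)
    return row_norms.max() * np.sqrt(m) / np.linalg.norm(A)
```

On the all-ones matrix every row is typical, so C = 1; a matrix supported on a single row attains the other extreme, C = √m.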

Theorem 1.1 (Informal version of Theorem 6.2). There is an (ε, δ)-differentially private algorithm which given a matrix A ∈ ℝ^{m×n} of coherence C such that n ≥ m computes a rank 2k matrix B such that with probability 9/10,

‖A − B‖_F ≤ O(‖A − A_k‖_F) + O_{ε,δ}( √(km) + √(kn) · C√k · ‖A‖_F / (nm)^{1/4} ).

Moreover, the algorithm runs in time O(kmn).

Hidden in the O_{ε,δ}-notation is a factor of O(log(k/δ)/ε) that depends on the privacy parameters. Usually, δ ≤ 1/k so that log(k/δ) ≤ 2 log(1/δ). To understand the error bound, note that the first term is proportional to the best possible approximation error ‖A − A_k‖_F of any rank k approximation. In particular, this term is optimal up to constant factors. The second term expresses a more interesting phenomenon. Recall that we assume n ≥ m, so that the √(kn) term would usually dominate the √(km) term, except that the √(kn) term is multiplied by a factor which can be very small if the matrix has low coherence and is not too dense. For example, when k = O(1), C = O(1) and ‖A‖_F = O(√n), the error is roughly O(√m + n^{3/4}/m^{1/4}), which can be as small as O(n^{3/8}) depending on the magnitude of m. However, already in a much wider range of parameters we observe an error of o(√(kn)). In fact, in Section C we illustrate why the Netflix data satisfies the assumptions made here and why they are likely to hold in other recommender systems.

When ‖A‖_F ≥ √n, the previous theorem cannot improve on randomized response by more than a factor of O(m^{1/4}). Our next theorem uses a stronger but standard notion of coherence known as μ0-coherence. We defer a formal definition of μ0-coherence to Section 5, but we remark that this parameter varies between 1 and m. Using this notion we are able to obtain improvements roughly of order O(√m).

Theorem 1.2 (Informal version of Theorem 6.3). There is an (ε, δ)-differentially private algorithm which given a matrix A ∈ ℝ^{m×n} with n ≥ m and of μ0-coherence μ and rank r ≥ 2k computes a rank 2k matrix B such that with probability 9/10,

‖A − B‖_F ≤ O(‖A − A_k‖_F) + O_{ε,δ}( √(km) + √(kn) · √(μkr/m) ).

Moreover, the algorithm runs in time O(kmn).

The hidden factor here is the same as before. Note that when μkr = polylog(n), the theorem can lead to an error bound of Õ(n^{1/4}), depending on the magnitude of m. Note that this is roughly the square root of what randomized response would give. But again, under much milder assumptions on the coherence, the error remains o(√(kn)). Notably, Candès and Tao [CT10] work with a stronger incoherence assumption than what is needed here. Nevertheless, they show that even their stronger assumption is satisfied in a number of reasonable random matrix models. A slight disadvantage of the error bound in Theorem 1.2 is that the actual rank r of the matrix enters the picture. Theorem 1.2 hence cannot improve over Theorem 1.1 when the matrix has very large rank. We do not know if the dependence on r in the above bound is inherent or rather an artifact of our analysis.

Finally, we remark that while our result depends on the μ0-coherence of the input matrix, our algorithm does not require knowledge or estimation of it. The only parameters provided to the algorithm are the target rank and the privacy parameters.

Reconstruction attacks and tightness of our results. As it turns out, existing work on "blatant non-privacy" and reconstruction attacks [DN03] demonstrates that our results are essentially tight under the given assumptions. To draw this connection, let us first observe why input perturbation cannot be improved without any assumption on the matrix. To be more precise, by input perturbation we refer to the method which simply perturbs each entry of the matrix with independent Gaussian noise of magnitude O(ε^{-1}√(log(1/δ))), which is sufficient to achieve (ε, δ)-differential privacy with respect to unit ℓ2 perturbations of the entire matrix. To obtain a rank k approximation to the original matrix, one can then simply compute the exactly optimal rank k approximation to the perturbed matrix using the singular value decomposition, which as one can show introduces error O_{ε,δ}(√(km) + √(kn)) compared to the optimal rank k approximation to the original matrix in the Frobenius norm.

First, let us observe that it is not possible in general to have an algorithm which guarantees error in the Frobenius norm of o(√(kn)) for every matrix A without violating blatant non-privacy¹, as defined by [DN03]. This is because there is a simple reduction which starts with an (ε, δ)-differentially private algorithm for computing rank k approximations to matrices A ∈ ℝ^{m×n} and gives an (ε, δ)-differentially private algorithm which can be used to reconstruct almost every entry in any database D ∈ {0, 1}^{n′} for n′ = k · n. It is known that (ε, δ)-private mechanisms do not admit such reconstruction attacks, and so the result is a lower bound. The reduction follows from the fact that we can always encode a bit-valued database D ∈ {0, 1}^{n′} for n′ = k · n as k rows of an m × n matrix for any m ≥ k, simply by zeroing out all additional m − k rows. Note that the resulting matrix only has rank k, and so the optimal rank k approximation to this matrix has zero error. If we could recover a matrix A′ such that ‖A − A′‖_F = o(√(kn)), it would have to be that for a typical nonzero row A_i of the matrix with i ∈ [k], ‖A_i − A′_i‖_2^2 = o(n), and hence ‖A_i − A′_i‖_1 = o(n). Then, by simply rounding the entries, we could reconstruct the original database D in almost all of its entries, giving blatant non-privacy as defined by [DN03].
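The encoding step of this reduction is mechanical. The sketch below (our illustration, with hypothetical helper names) packs a bit-valued database into the first k rows of an m × n matrix and shows that any sufficiently accurate approximation can be rounded back to the original bits:

```python
import numpy as np

def encode_database(D, k, n, m):
    """Encode a bit vector D of length k*n as the first k rows of an
    m x n matrix; the remaining m - k rows are zero, so rank(A) <= k
    and the optimal rank-k approximation of A has zero error."""
    A = np.zeros((m, n))
    A[:k, :] = np.asarray(D, dtype=float).reshape(k, n)
    return A

def decode_database(B, k, n):
    """Round the first k rows of an approximation B back to bits."""
    return (B[:k, :] > 0.5).astype(int).ravel()
```

An approximation B with ‖A − B‖_F = o(√(kn)) leaves a typical encoded entry within o(1) of its true value, so the rounding step recovers almost all of D.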

What is happening in the above example? Intuitively, the problem is that in the rank k matrix we construct from D, the k nonzero rows of the matrix are encoded accurately by only k right singular vectors. On the other hand, low coherence implies that any k right singular vectors represent a set of only k rows poorly. Hence, there is hope to circumvent the above impediment using a low coherence assumption on the matrix. Indeed, this is precisely what Theorem 1.1 and Theorem 1.2 demonstrate. Nevertheless, reconstruction attacks still lead to lower bounds even under low coherence assumptions. Indeed, using the above ideas, the next proposition shows that Theorem 1.1 is essentially tight up to a factor of O(√k). Since in many applications k = O(1), this discrepancy between our upper bound and the lower bound is often insignificant.

Proposition 1.3. Any algorithm which given an m ? n matrix A of coherence C outputs a rank k

¹An algorithm M is blatantly non-private if for every database D ∈ {0, 1}^n it is possible to reconstruct a 1 − o(1) fraction of the entries of D exactly, given only the output of the mechanism M(D).
