Enhanced Synonym Queries Supported Encrypted Cloud Multi ...

ISSN 2319-8885 Vol.04,Issue.19, June-2015, Pages:3533-3538



Enhanced Synonym Queries Supported Encrypted Cloud Multi-Keyword

Ranked Search System

K. SWATHI PRIYA1, B. SHEREESHA2

1PG Scholar, Dept of CSE, Sreenivasa College of Engineering & Technology, Kurnool, AP, India,

E-mail: Spkswathi1@. 2Assistant Professor, Dept of CSE, Sreenivasa College of Engineering & Technology, Kurnool, AP, India,

E-mail: shereesha123@.

Abstract: We first propose a basic idea for the Multi-keyword Ranked Search over Encrypted cloud data (MRSE) based on secure inner product computation and efficient similarity measure of coordinate matching, i.e., as many matches as possible, in order to capture the relevance of data documents to the search query, then we give two significantly improved MRSE schemes to achieve various stringent privacy requirements in two different threat models. Assignment of anonymous ID to the user to provide more security to the data on cloud server is done. To improve the search experience of the data search service, further extension of the two schemes to support more search semantics is done. As an enhancement we enhance the existing system and in this paper we propose an effective approach to solve the problem of multi-keyword ranked search over encrypted cloud data supporting synonym queries. The main contribution of this paper is summarized in two aspects: multi-keyword ranked search to achieve more accurate search results and synonym-based search to support synonym queries. Meanwhile, existing search approaches over encrypted cloud data support only exact or fuzzy keyword search, but not semantics-based multi-keyword ranked search. Therefore, how to enable an effective searchable system with support of ranked search remains a very challenging problem.

Keywords: Cloud Computing, Searchable Encryption, Privacy-Preserving, Keyword Search, Ranked Search.

I. INTRODUCTION Cloud computing is the long dreamed vision of computing as a utility, where cloud customers remotely store their data into the cloud so as to enjoy the on-demand high-quality applications and services from a shared pool of configurable computing resources. Its great flexibility and economic savings are motivating both individuals and enterprises to outsource their local complex data management system into the cloud. To protect privacy of data and oppose unsolicited accesses in the cloud and beyond it, sensitive data, for instance, e-mails, personal health records, photo albums, tax documents, and so on, may have to be encrypted by data owners before outsourcing to the commercial public cloud; this, however, obsoletes the traditional data utilization service based on plaintext keyword search. The insignificant solution of downloading all the data and decrypting locally is clearly impractical, due to the large amount of bandwidth cost in cloud scale systems. Images also contain useful and important information, so proposed system also provides image tagging in MRSE scheme. Moreover, aside from eliminating the local storage management, storing data into the cloud doesn't serve any purpose unless they can be easily searched and utilized. Hence, exploring privacy preserving and effective search service over encrypted cloud data is of great importance.

Considering potentially huge number of on-demand data users and large amount of outsourced data documents in the cloud, this problem is particularly challenging as it is extremely difficult to meet also the requirements of performance, system usability, and scalability. Document ranking is provided for fast search, but the priorities of all the data documents is kept same so that the cloud service provider and third party remains unaware of the important documents, thus, maintaining privacy of data. Ranked search can also elegantly eliminate unnecessary network traffic by sending back only the most relevant data, which is highly desirable in the "pay-as-you-use" cloud paradigm. For privacy protection, such ranking operation, however, should not leak any keyword related information. Besides, to improve search result accuracy as well as to enhance the user searching experience, it is also necessary for such ranking system to support multiple keyword searches, as single keyword search often yields far too coarse results. As a common practice indicated by today's web search engines (ex. Google search), data users may tend to provide a set of keywords instead of only one as the indicator of their search interest to retrieve the most relevant data. Along with the privacy of data and efficient searching schemes, real privacy is obtained only if the user's identity remains hidden from the Cloud Service Provider (CSP) as well as the third party user on the cloud server.

Copyright @ 2015 IJSETR. All rights reserved.

K. SWATHI PRIYA, B. SHEREESHA

In this paper, for the first time, we define and solve the problem of multi-keyword ranked search over encrypted cloud data (MRSE) while preserving strict system wise privacy in the cloud computing paradigm as shown in Fig.1. Among various multi-keyword semantics, we choose the efficient similarity measure of "coordinate matching," i.e., as many matches as possible, to capture the relevance of data documents to the search query. Specifically, we use "inner product similarity" the number of query keywords appearing in a document, to quantitatively evaluate such similarity measure of that document to the search query. During the index construction, each document is associated with a binary vector as a sub index where each bit represents whether corresponding keyword is contained in the document. The search query is also described as a binary vector where each bit means whether corresponding keyword appears in this search request, so the similarity could be exactly measured by the inner product of the query vector with the data vector. However, directly outsourcing the data vector or the query vector will violate the index privacy or the search privacy.

Fig.1. Architecture of the search over encrypted cloud data.

To meet the challenge of supporting such multi-keyword semantic without privacy breaches, we propose a basic idea for the MRSE using secure inner product computation, which is adapted from a secure k-nearest neighbor (kNN) technique, and then give two significantly improved MRSE schemes in a step-by-step manner to achieve various stringent privacy requirements in two threat models with increased attack capabilities. Our contributions are summarized as follows:

For the first time, we explore the problem of multikeyword ranked search over encrypted cloud data, and establish a set of strict privacy requirements for such a secure cloud data utilization system.

We propose two MRSE schemes based on the similarity measure of "coordinate matching" while meeting different privacy requirements in two different threat models.

We investigate some further enhancements of our ranked search mechanism to support more search semantics and dynamic data operations.

Thorough analysis investigating privacy and efficiency guarantees of the proposed schemes is given, and

experiments on the real-world data set further show the proposed schemes indeed introduce low overhead on computation and communication.

Compared with the preliminary version [2] of this paper, this journal version proposes two new mechanisms to support more search semantics. This version also studies the support of data/index dynamics in the mechanism design. Moreover, we improve the experimental works by adding the analysis and evaluation of two new schemes. In addition to these improvements, we add more analysis on secure inner product and the privacy part.

II. LITERATURE SURVEY A. Secured Multi-keyword Ranked Search over Encrypted Cloud Data

In cloud computing data possessor are goaded to farm out their complex data management systems from local sites to the commercial public cloud for greater flexibility and economic savings. To ensure safety of stored data, it is must to encrypt the data before storing. It is necessary to invoke search with the encrypted data also. The specialty of cloud data storage should allow copious keywords in a solitary query and result the data documents in the relevance order. In, main aim is to find the solution of multi-keyword ranked search over encrypted cloud data (MRSE) while preserving strict system-wise privacy in the cloud computing paradigm. A variety of multi- keyword semantics are available, an efficient similarity measure of "coordinate matching" (as many matches as possible), to capture the data documents' relevancy to the search query is used. Specifically "inner product similarity", i.e., the number of query keywords appearing in a document, to quantitatively evaluate such similarity measure of that document to the search query is used in MRSE algorithm. The main limitation of this paper was, the user's identity (ID) is not kept hidden. Due to this, whoever puts the data on Cloud Service Provider was known. This may be risky in some situations where confidentiality of data needs to be maintained. Hence, this drawback is overcome in the proposed system.

B. Privacy Preserving Keyword Searches on Remote Encrypted Data

Consider the problem: a user U wants to store his files in an encrypted form on a remote file server S. Later the user U wants to efficiently retrieve some of the encrypted files containing specific keywords, keeping the keywords themselves secret and not to endanger the security of the remotely stored files. For example, a user may want to store old e-mail messages encrypted on a server managed by Yahoo or another large vendor, and later retrieve certain messages while travelling with a mobile device. In, solutions for this problem under well-defined security requirements are offered. The schemes are efficient as no public-key cryptosystem is involved. Indeed, the approach is independent of the encryption method chosen for the remote files. They are incremental too. In that, user U can submit new files which are secure against previous queries but still searchable against future queries. From this, the main theme taken is of storing data remotely on other server and retrieving that data from anywhere via mobile, laptop etc.

International Journal of Scientific Engineering and Technology Research Volume.04, IssueNo.19, June-2015, Pages: 3533-3538

Enhanced Synonym Queries Supported Encrypted Cloud Multi-Keyword Ranked Search System

C. Cryptographic Cloud Storage When the benefits of using a public cloud infrastructure

are clear, it introduces significant security and privacy risks. In fact, it seems that the biggest obstacle to the adoption of cloud storage (and cloud computing in general) is concern over the confidentiality and integrity of data. In, an overview of the benefits of a cryptographic storage service, for example, reducing the legal exposure of both customers and cloud providers, and achieving regulatory compliance is provided. Besides this, cloud services that could be built on top of a cryptographic storage service such as secure backups, archival, health record systems, secure data exchange and e-discovery is stated briefly.

D. Efficient and Secure Multi-Keyword Search on Encrypted Cloud Data

On one hand, users who do not necessarily have prior knowledge of the encrypted cloud data, have to post process every retrieved file in order to find ones most matching their interest; On the other hand, invariably retrieving all files containing the queried keyword further incurs unnecessary network traffic, which is absolutely undesirable in today's pay-as-you-use cloud paradigm. This paper has defined and solved the problem of effective yet secure ranked keyword search over encrypted cloud data. Ranked search greatly enhances system usability by returning the matching files in a ranked order regarding to certain relevance criteria (e.g., keyword frequency) thus making one step closer towards practical deployment of privacy-preserving data hosting services in Cloud Computing. For the first time, the paper has defined and solved the challenging problem of privacypreserving multi-keyword ranked search over encrypted cloud data (MRSE), and establishes a set of strict privacy requirements for such a secure cloud data utilization system to become a reality. The proposed ranking method proves to be efficient to return highly relevant documents corresponding to submitted search terms. The idea of proposed ranking method is used in our proposed system in order to enhance the security of data on Cloud Service Provider.

E. Providing Privacy Preserving in Cloud Computing Privacy is an important issue for cloud computing, both in

terms of legal compliance and user trust and needs to be considered at every phase of design. The paper tells the importance of protecting individual's privacy in cloud computing and provides some privacy preserving technologies used in cloud computing services. Paper tells that it is very important to take privacy into account while designing cloud services, if these involve the collection, processing or sharing of personal data. From this paper, main theme taken is of preserving privacy of data. This paper only describes privacy of data but doesn't allow indexed search as well as doesn't hide user's identity. Thus, these two drawbacks are overcome in our proposed system.

F. Privacy Preserving Data Sharing With Anonymous ID Assignment

In this paper, an algorithm for anonymous sharing of private data among N parties is developed. This technique is used iteratively to assign these nodes ID numbers ranging from 1 to N. This assignment is anonymous in that the

identities received are unknown to the other members of the group. In, existing and new algorithms for assigning anonymous IDs are examined with respect to trade-offs between communication and computational requirements. These new algorithms are built on top of a secure sum data mining operation using Newton's identities and Sturm's theorem. The main idea taken from this paper is of assigning anonymous ID to the user on the cloud.

G.Enabling Efficient Fuzzy Keyword Search over Encrypted Data in Cloud Computing

In this paper, main idea is to formalize and solve the problem of effective fuzzy keyword search over encrypted cloud data while maintaining keyword privacy. This basic idea is taken but it is for multi-keyword raked search (MRSE scheme) in our proposed system. In, design of secure cloud storage service which addresses the reliability issue with near optimal overall performance is proposed.

III. EXISTING AND PROPOSED SYSTEMS A. Existing System

The large number of data users and documents in cloud, it is crucial for the search service to allow multi-keyword query and provide result similarity ranking to meet the effective data retrieval need. The searchable encryption focuses on single keyword search or Boolean keyword search, and rarely differentiates the search results.

B. Proposed System We define and solve the challenging problem of privacy-

preserving multi-keyword ranked search over encrypted cloud data (MRSE), and establish a set of strict privacy requirements for such a secure cloud data utilization system to become a reality. Among various multi-keyword semantics, we choose the efficient principle of "coordinate matching".

1. Proposed Scheme In this section, we give a detailed description of our

scheme. We firstly propose to employ "Latent Semantic Analysis" to implement the latent semantic multi-keyword ranked search.

Our Scheme: Data owner wants to outsource m data files D {d1, d2, dm} that he prepares to outsource to the cloud server in encrypted form while still keeping the capability to search through them. To do so, data owner firstly builds a secure searchable index I from a set of n distinct keywords W =extracted from the file collection D. According to the above definition about LSA, the data owner builds a term-document matrix A . MatrixA can be decomposed into the product of three other matrices. And then, we reduce the dimensions of the original matrix A to get a new matrix A which is calculated the best "reduced-dimension" approximation to the original term-document matrix . With t keywords of interest in W =as input, one binary vector Q is generated. In this section, we show a thorough experimental evaluation of the proposed technique on a real dataset: the MED dataset .The whole experiment is implemented by C++ language on a computer with Core 2.83GHz Processor, on Windows 7 system. For the proposed scheme, we will reduce to separate

International Journal of Scientific Engineering and Technology Research Volume.04, IssueNo.19, June-2015, Pages: 3533-3538

K. SWATHI PRIYA, B. SHEREESHA

dimensions. The performance of our method is compared with the original MRSE scheme.

Efficiency: The proposed scheme is depicted in details in previous section, except the Key Gen algorithm. In our scheme, we adopt Gauss-Jordan to compute the inverse matrix. The time of generating key is decided by the scale of the matrix. Besides, the proposed scheme that processed by SVD algorithm will consume time. Other algorithms, such as index construction, trapdoor generation, query, which is put forward by us, are consistent with the original MRSE in timeconsuming.

Measure: In this paper, we still use the measure of traditional information retrieval. Before the introduction of the F-measure's concept, we will firstly give the brief of the

Fig.2. With different choice of standard deviation for the random variable , there exists tradeoff between (a) Precision, and (b) Rank Privacy.

precision and recall. Precision is the fraction of retrieved instances that are relevant, while recall is the fraction of relevant instances that are retrieved. Both precision and recall are therefore based on an understanding and measure of relevance. F-measure that combines precision and recall is the harmonic mean of precision and recall. Here, we adopt Fmeasure to weigh the result of our experiments.

To evaluate this privacy guarantee, we first define the rank

perturbation as

where ri is the rank

number of document Fi in the retrieved top-k documents and r'i is its rank number in the real ranked documents. The overall rank privacy measure at point k is then defined as the

average of all the for every document i in the retrieved top-

IV. PERFORMANCE ANALYSIS In this section, we demonstrate a thorough experimental evaluation of the proposed technique on a real-world data set: the Enron Email Data Set. We randomly select different number of e-mails to build data set. The whole experiment system is implemented by C language on a Linux Server with Intel Xeon Processor 2.93 GHz. The public utility routines by Numerical Recipes are employed to compute the inverse of matrix. The performance of our technique is evaluated

k documents, denoted as = /k. Fig. 2b shows the rank

privacy at different points with two standard deviations = 1 and = 0.5, respectively. From these two figures, we can see that small leads to higher precision of search result but lower rank privacy guarantee, while large results in higher rank privacy guarantee but lower precision. In other words, our scheme provides a balance parameter for data users to satisfy their different requirements on precision and rank privacy.

regarding the efficiency of four proposed MRSE schemes, as well as the tradeoff between search precision and privacy.

B. Efficiency Index Construction: To build a searchable sub index Ii for

A. Precision and Privacy As presented in dummy keywords are inserted into each

data vector and some of them are selected in every query. Therefore, similarity scores of documents will be not exactly accurate. In other words, when the cloud server returns top-k documents based on similarity scores of data vectors to query vector, some of real top-k relevant documents for the query may be excluded. This is because either their original similarity scores are decreased or the similarity scores of some documents out of the real top-k are increased, both of which are due to the impact of dummy keywords inserted into data vectors. To evaluate the purity of the k documents retrieved by user, we define a measure as precision Pk = k'/k where k' is number of real top-k documents that are returned by the cloud server. Fig. 2a shows that the precision in MRSE scheme is evidently affected by the standard deviation of the random variable . From the consideration of effectiveness, standard deviation is expected to be smaller so as to obtain high precision indicating the good purity of

each document Fi in the data set F, the first step is to map the keyword set extracted from the document Fi to a data vector Di, followed by encrypting every data vector. The time cost of mapping or encrypting depends directly on the dimensionality of data vector which is determined by the size of the dictionary, i.e., the number of indexed keywords. And the time cost of building the whole index is also related to the number of sub index which is equal to the number of documents in the data set. Fig. 4a shows that, given the same dictionary where |W| = 4,000, the time cost of building the whole index is nearly linear with the size of data set since the time cost of building each sub index is fixed. Fig. 4b shows that the number of keywords indexed in the dictionary determines the time cost of building a sub index. As presented in the major computation to generate a sub index in MRSE_I includes the splitting process and two multiplications of a (n +2) ? (n+2) matrix and a (n+2)dimension vector where n = |W|, both of which have direct relationship with the size of dictionary.

retrieved documents. However, user's rank privacy may have been partially leaked to the cloud server as a consequence of small . As described in, the access pattern is defined as the sequence of ranked search results. Although search results cannot be protected (excluding costly PIR technique), we can still hide the rank order of retrieved documents as much as possible.

The dimensionality of matrices in MRSE_II is (n+ U +1)

?(n+U+1) so that its index construction time with complexity O(m(n + U) 2) is bigger than that in MRSE_I with complexity O (mn2) as shown in Figs. 3a and 3b. Both the MRSE_I_TF

and the MRSE_II_TF, presented in, respectively, introduce

more computation during the index construction since we

need to collect the term frequency information for each

International Journal of Scientific Engineering and Technology Research

Volume.04, IssueNo.19, June-2015, Pages: 3533-3538

Enhanced Synonym Queries Supported Encrypted Cloud Multi-Keyword Ranked Search System

keyword in every document and then perform the normalization calculation. But, as shown in both figures, such additional computation in the TF _ IDF weighting rule is insignificant considering much more computation are caused by the splitting process and matrix multiplication. Although the time of building index is not a negligible overhead for the data owner, this is a one-time operation before data outsourcing. Besides, Table 1 lists the storage overhead of each sub index in two MRSE schemes within different sizes of dictionary. The size of sub index is absolutely linear with the dimensionality of data vector which is determined by the number of keywords in the dictionary. The sizes of sub index are very close in the two MRSE schemes because of trivial differences in the dimensionality of data vector.

Fig.3. Time cost of building index. (a) For the different size of data set with the same dictionary, n = 4,000. (b) For the same data set with different size of dictionary, m = 1,000.

Trapdoor Generation: Fig. 4a shows that the time to generate a trapdoor is greatly affected by the number of keywords in the dictionary. Like index construction, every trapdoor generation incurs two multiplications of a matrix and a split query vector, where the dimensionality of matrix or query vector is different

sub index generation, the difference of costs to generate trapdoors is mainly caused by the different dimensionality of vector and matrices in the two MRSE schemes. More importantly, it shows that the number of query keywords has little influence on the overhead of trapdoor generation, which is a significant advantage over related works on multikeyword searchable encryption.

TABLE I: Size of Sub index/Trapdoor

Query: Query execution in the cloud server consists of computing and ranking similarity scores for all documents in the data set. The computation of similarity scores for the whole data collection is O (mn) in MRSE_I and MRSE_I_TF, and the computation increases to O (m) (n+U)) in MRSE_II and MRSE_II_TF. Fig. 5 shows the query time is dominated by the number of documents in the data set while the number of keywords in the query has very slight impact on it like the cost of trapdoor generation above. The two schemes in the known ciphertext model as MRSE_I and MRSE_I_TF have very similar query speed since they have the same dimensionality which is the major factor deciding the computation cost in the query. The query speed difference between MRSE_I and MRSE_I_TF or between MRSE_II and MRSE_II_TF is also caused by the dimensionality of data vector and query vector. With respect to the communication cost in Query, the size of the trapdoor is the same as that of the sub index listed in the Table 1, which keeps constant given the same dictionary, no matter how many keywords are contained in a query. While the computation and communication cost in the query procedure is linear with the number of query keywords in other multiple-keyword search schemes, our proposed schemes introduce nearly constant overhead while increasing the number of query keywords. Therefore, our schemes cannot be compromised by timing-based side channel attacks that try to differentiate certain queries based on their query time.

Fig.4. Time cost of generating trapdoor. (a) For the same

query keywords within different sizes of dictionary, t =

10. (b) For different numbers of query keywords within

the same dictionary, n = 4,000.

Fig.5. Time cost of query. (a) For the same query

in two proposed schemes and becomes larger with the

increasing size of dictionary. Fig. 5b demonstrates the

trapdoor generation cost in the MRSE_II scheme with complexity O ((n + U) 2) is about 10 percent larger than that in the MRSE_I scheme with complexity O (n2). The

MRSE_I_TF and MRSE_II_TF have similar difference

where the additional logarithm computation accounts for very

small proportion of the whole trapdoor generation. Like the

keywords in different sizes of data set, t = 10. (b) For different numbers of query keywords in the same data set, m = 1,000.

V. CONCLUSION The previous work mainly focused on providing privacy to the data on cloud in which using multi-keyword ranked search was provided over encrypted cloud data using efficient similarity measure of co-ordinate matching. The

International Journal of Scientific Engineering and Technology Research

Volume.04, IssueNo.19, June-2015, Pages: 3533-3538

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download