The rest of the paper is organized as follows



The Role of Cryptography in the Database as a Service Model

Hemavathy Alaganandam

Contents

The Role of Cryptography in the Database as a Service Model 1

Contents 2

1. INTRODUCTION 3

2. DAS Scenario 3

3. Data Privacy 1st challenge 3

3.1 Software level encryption 3

3.2 Hardware level encryption 3

3.3 Encryption Penalty 3

4. Data Privacy 2nd Challenge 3

4.1 Relation Encryption and Storage Model 3

4.2 Mapping Conditions (MapCond) 3

4.3 Implementing Relational Operators over Encrypted Relations 3

4.4 Problems with the Strategy 3

5. CONCLUSIONS 3

6. REFERENCES 3

INTRODUCTION

"Database as a Service" model provides users power to create, store, modify, and retrieve data from anywhere in the world, as long as they have access to the Internet. It introduces several challenges, an important issue being data privacy. There are two main privacy issues. First, the owner of the data needs to be assured that the data stored on the service-provider site is protected against data thefts from outsiders. Second, data needs to be protected even from the service providers, if the providers themselves cannot be trusted.

In this paper, I focus on the research made towards the first and second challenge. I specifically focused on techniques to execute SQL queries over encrypted data. The strategy in the papers I read was to process as much of the query as possible at the service providers' site, without having to decrypt the data. Decryption and the remainder of the query processing are performed at the client site. The basic idea was similar in most papers with each paper trying to overcome the drawback in the other solutions.

The rest of the paper is organized as follows. Section 2 presents the Database as a Service Scenario. Section 3 & Section 4 discusses the data privacy challenges and solutions. I conclude the paper in Section 5.

DAS Scenario

The DAS scenario involves mainly four entities (see Figure1):

• Data owner: an organization that produces data to be made available for controlled external release;

• User: human entity that presents requests (queries) to the system;

• Client : front-end that transforms the user queries into queries on the encrypted data stored on the server;

• Server: an organization that receives the encrypted data from a data owner and makes them available for distribution to clients.

Clients and data owners are assumed to trust the server to faithfully maintain outsourced data. Specifically, the server is relied upon for the availability of outsourced databases. However, the server is assumed not to be trusted with the confidentiality of the actual database content. The server should be prevented from making unauthorized access to the data stored in the database. To this purpose, the data owner encrypts her data and gives the encrypted database to the server. The end users, instead, are trusted to access the database, according to the data owner’s policy.

[pic]

Figure 1: The service-provider architecture.

Data Privacy 1st challenge

If database as a service is to be successful, and customer data is to reside on the site of the database service provider, then the service provider needs to find a way to preserve the privacy of the user data. There needs to be security measure in place so that even if the data is stolen, the thief cannot make sense of it. Encryption is the perfect technique to solve this problem. There are two dimensions to encryption support in databases. One is the granularity of data to be encrypted or decrypted. The field, the row and the page, typically 4KB, are the alternatives. The field may appear to be the best choice, because it would minimize the number of bytes encrypted. However, practical methods of embedding encryption within relational databases entail a significant start up cost for an encryption operation. Row or the page level encryption amortizes this cost over larger data. The second dimension is software versus hardware level implementation of encryption algorithms.

1 Software level encryption

First of all symmetric ciphers do much better than asymmetric. However for example if we use Blowfish which is a 64-bit block cipher, which means that data is encrypted and decrypted

in 64-bit chunks. This has implication on short data. Even 8-bit data, when encrypted by the algorithm will result in 64 bits.

In the paper [4] Blowfish implementation was registered into the database as a user defined function (UDF). Once it was registered, it could be used to encrypt the data in one or more fields - whenever data was inserted into the chosen fields, the values are encrypted before being stored. On read access, the stored data is decrypted before being operated upon. For example, if we were to encrypt the column discount of a table called lineitem using the user defined function called ”encrypt”, and decrypt it by the user defined function ”decrypt” one would use the following SQL command to insert data into the table lineitem:

insert into lineitem (discount)

values (encrypt(10,key))

The statement to select the encrypted field is given next:

select decrypt(discount,key)

from lineitem

where custid = 300

In this approach the creator of the encrypted data supplies the key, and the database provides the encryption function. Only those users who are given the key can decrypt the data using the decryption algorithm. Since the key is owned by the creator, and not stored at the site of the database service provider, unauthorized person who may get hold of disk files can not get hold of the key. In fact, even employees of the database service provider do not have access to the encryption key. The full security provided by the encryption algorithm is inherited by the data in the database.

2 Hardware level encryption

Specialized encryption hardware, the IBM S/390 Cryptographic Coprocessor, is available under IBM OS/390 environment with Integrated Cryptographic Service Facility (ICSF) libraries. IBM DB2f or OS/390 provides a facility called ”editproc” (or edit routine), which can be associated

with a database table. An edit routine is invoked for a whole row of the database table, whenever the row is accessed by the DBMS. An encryption/decryption edit routine can be registered for the tables. When a read/write request arrives for a row in one of these tables, the edit routine invokes encryption/decryption algorithm, which is implemented in hardware, for whole row. In[4], they used the algorithm option for encryption hardware.

3 Encryption Penalty

The response time for a query on encrypted data will increase due to both the cost of decryption as well as routine and/or hardware invocations in DB2. This increase is referred to as the encryption penalty. The software field level encryption was found to be particularly CPU intensive. As the number of rows increases, query execution time grows very sharply in software level encryption. On the other hand, hardware level encryption showed almost perfectly linear increase.

Data Privacy 2nd Challenge

The second challenge is that of "total" data privacy, which is more complex since it includes protection from the database provider. The requirement is that encrypted data may not be decrypted at the provider site. A straightforward approach is to transmit the requisite encrypted tables from the server (at the provider site) to the client, decrypt the tables, and execute the query at the client. But this approach mitigates almost every advantage of the service-provider model,

since now primary data processing has to occur on client machines. For this reason the encrypted database is augmented with additional information (index) allows certain amount of query processing to occur at the server without jeopardizing data privacy. The client also maintains metadata for translating user queries to the appropriate representation on the

server, and performs post-processing on server query results. Based on the auxiliary information stored,in [3] Hacigumus et al develop techniques to split an original query over unencrypted relations into (1) a corresponding query over encrypted relations to run on the server, and (2) a client query for post-processing results of the Server query.

1 Relation Encryption and Storage Model

For each relation R(Ai, A 2 , . . . , An), an encrypted relation:Rs (etuple, A1s, A2s, …, Ans)

is stored on the server. The attribute etuple stores an encrypted string that corresponds to a tuple in relation R. Each attribute Ais corresponds to the index for the attribute Ai that will be used for query processing at the server. For example, consider a relation emp below that stores information about employees.

|eid |Ename |Salary |Addr |Did |

|23 |Tom |70K |Maple |40 |

|860 |Mary |60K |Main |80 |

|320 |John |50K |River |50 |

|875 |Jerry |55K |Hopewell |110 |

The emp table is mapped to a corresponding table at the server:

emp s ( etuple, eid s, ename s, salary s, addr s, did s)

It is only necessary to create an index for attributes involve in search and join predicates. In the above example, if it is known that there would be no query that involves attribute addr in either a selection or a join, then the index on this attribute need not be created.

Partition Functions

The domain of values (Di) of attribute R.Ai are mapped into partitions { p1 , . . . ,pi}, such that

(1) these partitions taken together cover the whole domain; and

(2) any two partitions do not overlap.

As an example, consider the attribute eid of the emp table above. Suppose the values of domain of this attribute lie in the range [0, 1000]. Assume that the whole range is divided into 5 partitions:

partition(emp.eid) ={[0, 200], (200,400], (400,600], (600,800], (800, 1000]}

Different attributes may be partitioned using different partition functions. It should be clear that the partition of attribute Ai corresponds to a splitting of its domain into a set of buckets. Any histogram-construction technique, such as MaxDiff, equi-width, or equi-depth, could be used

to create partitioning of attributes.

Identification Functions

The identification function assigns an identifier identR.Ai (pj) to each partition pj of attribute Ai.

For instance, as shown below, an identifier is assigned to each range of emp id’s

|2 |7 |5 |1 |4 |

|[0,200] |(200,400] |(400,600] |(600,800] |(800, 1000] |

Partition and identification functions of emp ID

The ident function value should be unique, so a collision free hash function is a good choice. For example in the case where a partition corresponds to a numeric range, the hash function may use the start and / or end values of a range.

Mapping Functions

The mapping function MapR.Ai takes care of mapping a value v in the domain of attribute Ai to the identifier of the partition to which v belongs: MapR.Ai(V) = identR.Ai (pj), where pj

is the partition that contains v. In the example above, the following table shows some values of the mapping function for attribute emp.eid.

For instance, Mapemp.eld(23) = 2, Mapemp.eid(860) = 4

There are two types of mapping functions:

1. Order preserving: A mapping function MapR.Ai is called order preserving if for any two values vi and vj in the domain of Ai, if vi < vj, then MapR.Ai(Vi) < MapR.Ai (Vj).

2. Random: A mapping function is called random if it is not order preserving.

A random mapping function provides superior privacy compared to its corresponding order preserving mapping. The choice, whether a mapping function is order preserving or not, affects query translation. Query translation is simplified using an order-preserving mapping function.

Storing Encrypted Data

For each tuple t = ( a1 , a2 , . . . ,ai) in R, the relation R s stores a tuple:

(encrypt( {al, a2, . . . , an}), MapR.A1 (a1),MapR.A2 ( a 2 ) , . . . , MapR.Ai (ai))

where encrypt is the function used to encrypt a tuple of the relation. For instance, the following is the encrypted relation emp s stored on the server:

|Etuple |eids |Enames |Salarys |addrs |Dids |

|1100110011110010… |2 |19 |81 |18 |2 |

|1000000000011101… |4 |31 |59 |41 |4 |

|1111101000010001… |7 |7 |7 |22 |2 |

|1010101010111110… |4 |71 |49 |22 |4 |

Corresponding Employee table in the server

The first column etuple contains the string corresponding to the encrypted tuples in emp. For instance, the first tuple is encrypted to "1100110011110010..." that is equal to

encrypt(23, Tom, 7OK, Maple, 40). Any block cipher technique such as AES, RSA , Blowfish , DES etc., can be used to encrypt the tuples. The second column corresponds to the index on the employee ids. For example, value for attribute eid in the first tuple is 23, and its corresponding partition is [0, 200]. Since this partition is identified to 2, we store the value "2" as

the identifier of the eid for this tuple.

In general, the notation "E" ("Encrypt") maps a relation R to its encrypted representation. That is, given relation R( A1, A2, . . . , A,~), relation E( R) is RS (etuple, A1 s,A2 s , . . . , An s). In the above example, E(emp) is the table emp s .

Decryption Functions

Given the operator E that maps a relation to its encrypted representation, the inverse operator D maps the encrypted representation to its corresponding unencrypted representation. That is, D(Rs) = R. In the example above, D(emp s) = emp. That is, D (temp s) will decrypt all of the encrypted columns in temp s and drop the auxiliary columns corresponding to the indices.

2 Mapping Conditions (MapCond)

For each relation, the server side stores the encrypted tuples, along with the attribute indices determined by their mapping functions. Meanwhile, the client stores the metadata about the specific indices, such as the information about the partitioning of attributes, the mapping functions, etc.The client utilizes this information to translate a given query Q to its server-side representation Qs, which is then executed by the server.

Let us consider different query conditions :

Condition ( Attribute op Values (op can be =,,=)

1) Attribute = Value: Here the mapcond function would just map the value to a partition identifier.

eg. MapCond(eid = 860) = eid s = 4 since eid = 860 is mapped to 4 by the mapping function of this attribute.

2)Attribute < or >Value: Depending upon whether or not the mapping function MapAi of the attribute is order-preserving or random, different translations are possible

• Order preserving: In this case, the translation is straight forward:

Mapcond(Ai < v) => Ai s v) => Ai s in Map>Ai(v).

Mapcond(eid < 280) => eid s in {2, 7} since all employee ids less than 280 have two partitions

[0,200] and (200,400], whose identifiers are {2, 7}.

Condition ( Attribute op Attribute (op can be =,,=)

1) Attribute1 = Attribute2: Such a condition might arise in a join or selection . We consider all possible pairs of partitions of Ai and Aj that overlap.

|Partions |Ident(empdid) |Partition |Ident(mgrdid) |

|[0,100] |2 |[0,200] |9 |

|(100,200] |4 |(200,400] |8 |

|(200,300] |3 | | |

|(300,400] |1 | | |

For instance, the table above shows the partition and identification functions of two attributes emp.did and mgr.did. Then condition emp.did = mgr.did is translated to the following condition C1:

C1 : (emp s.did s = 2 AND mgr s.did s = 9)

V (emp s.did s = 4 AND mgr s.did s = 9)

V (empS.did s = 3 AND mgrS.did S = 8)

V (empS.did s = 1 AND mgrS.did s = 8).

2) Attribute1 < Attribute2 can be dealt in the same way

3 Implementing Relational Operators over Encrypted Relations

This section describes how individual relational operators (such as selections, joins and grouping operators) can be implemented in the proposed database architecture. I‘ve shown a few examples below. The strategy is to partition the computation of the operators across the client and the server. Specifically, we will attempt to compute a superset of answers generated by the operator using the attribute indices stored at the server. These answers will then be filtered at the client after decryption to generate the true results. Work done at the client is tried to minimize as much as possible.

The Selection Operator: Consider a selection operation O'c(R) on a relation R, where C is a condition specified on one or more of the attributes A1, A2,.. •, An of R. A straightforward implementation of such an operator in our environment is to transmit the relation R s from the server to the client. Then the client decrypts the result using the D operator, and implements the selection. This strategy, however, pushes the entire work of implementing the selection to the client. In addition, the entire encrypted relation needs to be transmitted from the server to the client. An alternative mechanism is to partially compute the selection operator at the server using the indices associated with the attributes in C, and push the results to the client. The client decrypts the results and filters out tuples that do not satisfy C.

Eg. Selection(eid ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download