Project: Distributed Databases
Access Control: Security vs. Performance
Yosi Barad, Ilia Oshmiansky, Ainat Chervin

Abstract
Over the last few years, distributed storage systems have found widespread adoption, the Cassandra database being one of them. Cassandra is a highly scalable, high-performance distributed storage system for managing large amounts of structured data. In terms of security, however, Cassandra supports access control lists (ACLs) only at the column family level. In our work we extended Cassandra so that it supports ACLs at the cell level, which in Cassandra is the column level. Furthermore, we benchmarked the performance of Cassandra with our ACL implementation against both the original Cassandra database and the Accumulo table store, which natively supports cell-level ACLs.

In contrast to other traditional storage systems, the Cassandra database sacrifices its consistency in favor of availability and partition tolerance, so that it only guarantees eventual consistency among its nodes and clusters. According to the CAP theorem, a distributed system can satisfy any two of the three guarantees (Consistency, Availability and Partition tolerance) at the same time, but it is impossible to provide all three. We therefore wanted to measure the inconsistency window created by eventual consistency, which might undermine the ACL mechanism in our Cassandra implementation. This paper describes these tests and our findings.

Introduction
Many large organizations today, such as Google, Facebook and Amazon, manage and store massive amounts of data and handle large-scale applications that require high performance, availability and scalability. As more and more applications like these are launched in environments with immense workloads, such as web services, their scalability requirements change very quickly and grow very large.

Relational databases, the customary choice for storage and data management, scale well, but usually only on a single server node. Once the capacity of that single node is reached, the load must be scaled out and distributed across multiple server nodes, and this task gets very complicated with these kinds of databases.

Distributed database systems came to resolve exactly this need. They use an architecture that distributes storage and processing across multiple servers, and are therefore well suited to these growing data management demands. However, scalability comes at a price: having multiple servers handle the data may lower consistency and expose security holes. It is this trade-off of security and consistency versus performance that our project researches.

This project focuses on two popular table stores, Cassandra and Accumulo. While the access control of Cassandra works at the level of a column family, Accumulo offers finer-grained security and allows defining cell-level access control. The main goals of this project are to add support for cell-level ACLs (Access Control Lists) to Cassandra, compare the resulting system to Accumulo, evaluate the performance and measure the security holes.
The project will attempt to improve security by increasing consistency, while measuring the performance penalty. Our testing tool and scenarios were based on and influenced by the paper "YCSB++: Benchmarking and Performance Debugging Advanced Features in Scalable Table Stores". We aspired to extend the tests done there and therefore used similar testing configurations.

This document is divided into two parts:
Methodological part – details how we implemented our solution for ACL support and how we analyzed that solution and the related consistency issues. It is separated into three sections:
Code Implementation – describes our implementation of ACL support in Cassandra: Cassandra terminology and structure, code analysis, and the different versions that were created.
ACL Benchmarking – describes the practical details of how we tested our implementation and analyzes the results.
Consistency Tests – describes the consistency tests that were performed and their analysis.
Plan vs Actual – describes the plan we first devised and how well we followed it.

Methodological part
This part is divided into three main sections:
Description of our implementation
Description of the ACL benchmarking tests and results
Description of the consistency tests and results

Code implementation
Background
In Cassandra terminology, a "cell" is actually called a "column": it is the lowest/smallest increment of data and is represented as a tuple (triplet) containing a name, a value and a timestamp. Next there is the "column family", a container for rows, which can be thought of as a table in a relational system. Each row in a column family can be referenced by its key. Finally, a "keyspace" is a container for column families; it is roughly the same as a schema or database, that is, a logical collection of tables. Figure 3 illustrates this structure.

Cassandra Code Analysis
We found that the Cassandra code has a very modular structure and that the security logic is separate from the rest of the code. Basically, there are two security-related interfaces:
IAuthenticator, which is responsible for authenticating the user that logs in (the login process). This interface contains the "authenticate" function.
IAuthority, which is responsible for authorizing the access of an authenticated user to a specific resource. This interface contains the "authorize" function.
Thanks to this structure it is possible to write your own code implementing these functions.

In the initial code there were two classes that implemented the security interfaces: "AllowAllAuthentication.java", which implements the IAuthenticator interface, and "AllowAllAuthorization.java", which implements the IAuthority interface. Since these classes don't impose any restrictions (as their names imply), their implementation allows anyone to connect and execute any operation. A minimal sketch of this plug-in structure appears below.
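To make the plug-in structure concrete, here is a minimal, hypothetical Java sketch of the two interfaces and an allow-everything implementation in the spirit of the classes above. The signatures are simplified for illustration and do not reproduce the exact Cassandra 1.x API.

    import java.util.Map;

    // Simplified stand-ins for Cassandra's security plug-in types (illustrative only).
    class AuthenticatedUser {
        final String name;
        AuthenticatedUser(String name) { this.name = name; }
    }

    enum Permission { READ, WRITE }

    interface IAuthenticator {
        // Called during login with the credentials supplied by the client.
        AuthenticatedUser authenticate(Map<String, String> credentials) throws SecurityException;
    }

    interface IAuthority {
        // Called before an operation touches the named resource.
        void authorize(AuthenticatedUser user, String resource, Permission perm) throws SecurityException;
    }

    // No restrictions, in the spirit of AllowAllAuthentication/AllowAllAuthorization:
    class AllowAllAuthenticator implements IAuthenticator {
        public AuthenticatedUser authenticate(Map<String, String> credentials) {
            return new AuthenticatedUser("anonymous"); // anyone may connect
        }
    }

    class AllowAllAuthority implements IAuthority {
        public void authorize(AuthenticatedUser user, String resource, Permission perm) {
            // never throws: every operation is permitted
        }
    }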
In addition to these, we found a more advanced implementation that contains two more classes: "SimpleAuthentication.java" and "SimpleAuthorization.java". These classes provide very basic user/password authentication and access control at the column family level. They use two additional configuration files:
password.properties – contains the list of users and their passwords. The passwords can be either clear text or MD5 sums of the password.
access.properties – contains entries in the format KEYSPACE[.COLUMNFAMILY].PERMISSION=USERS, where:
KEYSPACE is the keyspace name.
COLUMNFAMILY is the column family name.
PERMISSION is one of <ro> or <rw>, for read-only or read-write respectively.
USERS is a comma-delimited list of users from password.properties.
For example, the entry "ks1.cf1.<rw>=yosi" means that 'yosi' is allowed to read and write to the column family 'cf1' in keyspace 'ks1'.

Our implementations
We devised our implementations based on our research into Cassandra's structure and the analysis of the existing code. Our implementations involve only server-side extensions, so we can support as many clients as we want. Since no changes were made on the client side, the regular client shell may be used while working with our Cassandra implementation.

Within the scope of our project we implemented three different solutions for cell-level ACLs, which evolved one after the other:
A simple authentication-based solution where the ACL is saved to a configuration file (will not be described due to its inefficiency).
Saving the ACL as part of the values within the Cassandra database.
Creating a new column in which the ACL is stored in the Cassandra database.

All of the solutions support the client's ability to decide the list of users that will be exposed to their resources – columns or super columns – and to determine each user's set of permissions. This is done using the following syntax, which we support in all our implementations. When creating a new column you can specify the users you want to grant permissions for this new column. The format is: <column value>[:<user1,user2,...> <permission>][: ...]. For example:
Set Cities[utf8('1234')][utf8('City')] = utf8('Haifa:scott rw:Andrew ro');
This command creates a new column named "City" with value "Haifa" in column family "Cities" under row key "1234", giving Scott read/write permissions and Andrew read-only permissions. Given a string of value and permissions in this syntax, we can easily parse it and determine whether a user is authorized to execute a given command; a sketch of such a parser follows.
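As an illustration, here is a minimal, hypothetical Java sketch of such a parser, assuming the "value:users permission" segments shown above; the class and method names are ours, not Cassandra's.

    import java.util.HashMap;
    import java.util.Map;

    // Parses a stored value of the form "<value>[:<user1,user2,...> <perm>]...",
    // e.g. "038151098:eran rw:nir ro".
    final class AclValue {
        final String value;                                      // the original value, ACL stripped
        final Map<String, String> permByUser = new HashMap<>();  // user -> "rw" or "ro"

        AclValue(String raw) {
            String[] parts = raw.split(":");
            value = parts[0];
            for (int i = 1; i < parts.length; i++) {
                String entry = parts[i].trim();                  // e.g. "eran,nir rw"
                int sp = entry.lastIndexOf(' ');
                if (sp < 0) continue;                            // malformed segment, ignored in this sketch
                String perm = entry.substring(sp + 1);           // "rw" or "ro"
                for (String user : entry.substring(0, sp).split(","))
                    permByUser.put(user.trim(), perm);
            }
        }

        boolean canRead(String user)  { return permByUser.containsKey(user); }
        boolean canWrite(String user) { return "rw".equals(permByUser.get(user)); }
    }

For example, parsing '038151098:eran rw:nir ro' yields the value '038151098', write access for eran and read-only access for nir.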
Though these solutions share a common goal, they all achieve it in a different manner. The following is a more detailed description of the two relevant versions we came up with:

Version 1.1 – Saving the ACL as part of the values within the Cassandra database
This version included the following modifications:
We removed the "SimpleAuthority"-related functions, namely HasColumnFamilyAccess and HasKeyspaceAccess, so that we do not depend on reading/writing values from/to the access.properties file.
We changed the implementation of the "authorize" function in the SimpleAuthority class to read the ACL from the value within the Cassandra DB rather than from the access.properties file.
We modified the insert operation: when a user tries to insert a new column we check whether this column already exists. If it exists we check the ACL given to it and permit the operation only if the user has write permission in that ACL. If the column doesn't exist we permit the creation of a new one and allow the user to set the ACL of this new column.
We added new functionality for parsing the value returned from Cassandra, so that after verification confirms the user has read permission we return only the original value, without the ACL.
We added a new "get" function to Cassandra, used internally to fetch a column and separate the ACL from its actual value. For example, if the column family "Students" had the key "yosi" and within it the column 'id' with the value '038151098:eran rw:nir ro', and the user ran the command "get Students['yosi']['id']", we would return only the value '038151098', and only if the user is nir or eran. If someone wanted to change the value of this field with "set Students['yosi']['id'] = '432432'", we would first have to fetch the value and parse the ACL, and only then allow the change, should the user be in the ACL with write permission.
We changed the way a column removal takes place. When a user tries to remove a column from Cassandra, we first retrieve the column value with its ACL from the database and perform access control verification; the operation is allowed only if the user has write permission on that cell.
We edited the batch mutate and get slice operations in the Cassandra server. These functions perform bulk inserts and retrievals respectively, to and from the database: they may accumulate many columns and super columns across a variety of row keys and insert them into the database at once. We modified them to verify access control for every column or super column a user wants to insert or retrieve, allowing only the operations (write and read respectively) the user is permitted to perform.

The following scenario illustrates the above implementation. Consider a two-node Cassandra cluster where a column is inserted into node 1, and afterwards another user, Dani, tries to retrieve this column from node 2. Node 2 will query node 1 for the column, which triggers a value parse. The ACL on that value is checked, and since Dani is not included in the ACL, the operation is denied.
Version 1.2 – Creating a new column for the ACL to be stored in the Cassandra database
This version included the following modifications:
We modified the insert operation: for each column insertion we saved another column which maintains the ACL of the given value. These two columns are identified by their column names. For example, if one tries to insert the column
Set student['yosi']['email'] = 'yosi@:yosi rw:odelia ro';
we insert two columns into the database: the 'email' column holding the value itself, and a companion ACL column holding ':yosi rw:odelia ro'. An exception to this behavior occurs if the column we would like to insert already exists in the database. In that case we first check the user's permission to write to that cell by consulting the ACL column (which must also exist, as it was created together with the column that initiated it). If the user has write permission, the insert operation is allowed and both the column and its ACL column are updated according to the user's request. Otherwise the operation is denied and a message is displayed to the user.
We modified the get operation: when a user tries to retrieve a column from the database we first consult the ACL column that corresponds to it. If the user has permission to write or read on that ACL, the operation is approved and the original column, containing only the value, is returned to the user. Otherwise the operation is denied and a message is displayed to the user.
We modified the remove operation: similarly, when a remove operation is processed, the ACL column is consulted, and only if the user has write permission on that ACL is the operation approved, in which case both the column and its ACL column are removed from the database. Otherwise the operation is denied and a message is displayed to the user.
We modified the batch mutate and get slice operations in the Cassandra server. These functions were modified as before, only in this version the write and read permissions are verified by consulting the ACL column for each of the columns in the mutation. A sketch of the ACL-column check follows.
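Here is a minimal, hypothetical sketch of this check, reusing the AclValue parser from the sketch above. The text only says the two columns "are identified by their column names", so the "_acl" suffix and the fetch helper are our illustrative choices, not the actual implementation.

    // Version 1.2 write check: consult the companion ACL column instead of
    // parsing the data column's value.
    abstract class AclColumnCheck {
        abstract String fetch(String rowKey, String columnName); // stand-in for an internal column read

        static String aclColumnOf(String columnName) { return columnName + "_acl"; }

        boolean mayWrite(String user, String rowKey, String columnName) {
            String acl = fetch(rowKey, aclColumnOf(columnName)); // e.g. ":yosi rw:odelia ro"
            if (acl == null) return true;   // no ACL column: the cell is new, creation is allowed
            // Prepend a dummy value so the AclValue parser can be reused as-is.
            return new AclValue("-" + acl).canWrite(user);
        }
    }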
The following section describes the tests that were run on version 1.1.

ACL Benchmarking
As mentioned in the Code Implementation section, we implemented code that supports cell-level access control lists in Cassandra. We tested implementation version 1.1, where the ACLs are saved as part of the values in the database. The ACL benchmarking tests were done to compare the performance of Cassandra with the new implementation both to Accumulo and to Cassandra without the implementation. First we describe the installation and configuration needed to run the tests; next, the test scenarios and how to perform them; and finally, the analysis of the results. The benchmark tests of the ACLs consist of the following:
A comparison of a three-node Cassandra cluster with and without ACLs, in different YCSB client and thread configurations.
A comparison of Cassandra versus Accumulo using a one-node cluster, with and without ACLs.

Installation and Configuration
Installations
To run the ACL benchmarking tests we installed the following components:
Cassandra database – both with and without the added implementation. The original Cassandra was installed according to the documentation found on the Apache Cassandra website. To be able to add ACL support to the code, we downloaded the source code following that documentation (instructions for installing Cassandra with the ACL implementation already added can be found on our website).
Accumulo database – this includes installing the Hadoop file system and ZooKeeper, a server that provides distributed coordination (installed according to the instructions found on their websites).
YCSB++ – the benchmarking tool used to test both databases; installation and further instructions were done according to its site.

Cassandra configurations
In order to compare Cassandra's performance with the ACL implementation to that of Accumulo, we configured a cluster containing one node.

Configuring a one-node cluster
To configure a Cassandra node without our implementation, we simply followed the instructions found on the Cassandra site. In order to use the Cassandra version containing ACL support, we made the following configurations:
Set the environment variables so that CASSANDRA_HOME is the path to the folder containing the Cassandra code and JAVA_HOME is the path to the Java folder.
Edited the passwd.properties file by adding a username and password, in the form <username>=<password>, with which we log in to Cassandra in the later tests.
Finally, edited the log4j-server.properties file, changing the log4j.appender.R.File line to point to the system log file to be created in the Cassandra folder.

For comparing Cassandra with and without the ACL implementation, we configured a cluster containing three nodes.

Configuring a three-node cluster
To get three nodes to work together in a cluster we needed to modify the "cassandra.yaml" file on all three workstations comprising the cluster, in the following steps.

First, we defined the "seed" of the cluster. In the Cassandra database, the node defined as the seed provides new nodes with information about the ring, such as which other nodes are included in it, their locations, their token ranges and so on. After a node joins the ring, it shares ring information through the gossip protocol and makes no further contact with the seed node.

Second, we set the 'listen' and 'RPC' addresses of each node to its own IP address. Since the nodes communicate via the gossip protocol, these specify the interfaces on which each node listens for client traffic via Thrift and for inter-cluster traffic.

Third, we set the token value for each node. Each Cassandra server (node) is assigned a unique token that determines which keys it is the first replica for. In order to maintain load balance in the cluster it is recommended to set the nodes with the following token values:
Node1 – initial token: 0
Node2 – initial token: 56713727820156410577229101238628035242
Node3 – initial token: 113427455640312821154458202477256070485
The above configuration stores 33% of the keys on each node and thus balances the load across the cluster. These token values can be computed directly, as the sketch below shows.
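For reference, these recommended values are simply token_i = i * 2^127 / N for node i of N, assuming the default RandomPartitioner, whose token space runs from 0 to 2^127. A short sketch that reproduces them:

    import java.math.BigInteger;

    // Prints balanced initial tokens for an N-node ring under RandomPartitioner.
    public class InitialTokens {
        public static void main(String[] args) {
            BigInteger ringSize = BigInteger.ONE.shiftLeft(127); // 2^127
            int nodes = 3;
            for (int i = 0; i < nodes; i++)
                System.out.println("Node" + (i + 1) + " initial token: " +
                        ringSize.multiply(BigInteger.valueOf(i))
                                .divide(BigInteger.valueOf(nodes)));
        }
    }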
Once everything was configured and the nodes were running, we used the Cassandra nodetool ring utility to verify that the cluster was connected properly. Finally, for both configurations, we created the keyspace "usertable" and column family "data", which are mandatory for the benchmark tests carried out by the YCSB tool. For further details on installing and configuring Cassandra, see the "Cassandra Acl – Getting Started" guide found on our website.

Accumulo configurations
Accumulo was configured only as a one-node cluster, in order to compare its performance to that of Cassandra with ACL support. The database was configured according to the instructions found on the Accumulo website.

YCSB++ configurations
In order to use YCSB++ against any server, you must have a binding for it, meaning a Java class that adapts YCSB's client interface to your server's API. This is called the DB interface layer; it executes the read, insert, update, delete and scan calls generated by the YCSB client as calls against the database's API. When using the YCSB client you specify the class name of the layer on the command line, and the client dynamically loads the appropriate interface class. YCSB++ already contains such a binding for Cassandra, built into the installation; the Accumulo binding is distributed separately, and you need to rebuild the project using Ant in order to use YCSB++ with it.

Scenario Details
As mentioned above, we ran two main benchmarks:
A comparison of a three-node Cassandra cluster with and without ACLs.
A comparison of Cassandra versus Accumulo using a one-node cluster, with and without ACLs.

Set-up:
We ran tests on two different set-ups, in accordance with the two main benchmarks defined. One set-up was Cassandra and Accumulo with one node in the cluster; the other was Cassandra with three nodes in the cluster ("Cassandra" here meaning both the database with ACL support and the original database, for comparison).

YCSB++:
The YCSB++ client was configured to run in two ways:
Run 1 client with 100 threads.
Run 6 clients with 16 threads each.

Workloads:
We used the core workloads supplied with the YCSB installation, along with two additional ones we configured ourselves:
Workload A: Update heavy – a 50/50 mix of reads and writes.
Workload B: Read mostly – a 95/5 read/write mix.
Workload C: Read only – 100% reads.
Workload D: Read latest – new records are inserted, and the most recently inserted records are the most popular.
Workload F: Read-modify-write – the client reads a record, modifies it, and writes back the changes.
Insert workload – writes 100 thousand single-cell rows into an empty table.
Read workload – reads 100 thousand rows.

Test steps:
Using the one-node set-up, we ran each workload on each database (Cassandra and Accumulo) with 1, 4, 5, 6, 7, 8, 9, 10 and 11 ACL entries, three times each. To get performance results with 0 ACL entries, we ran the original Cassandra and Accumulo without security, with the same configurations. Next, we repeated the test of increasing ACLs on the three-node set-up, for Cassandra only. The final results were calculated as the average of the three runs.

Description of running the tests with YCSB++ with increasing ACLs:
To use YCSB++ for running ACL benchmarks on Cassandra, you run the client with four main specifications. The first is the database binding you wish to use, the Java class that operates as the interface between YCSB and the database. The second is the workload class used to execute the workloads, in this case "CoreWorkload". The third is the specific workload you want to run, defined by a parameters file that the client reads, containing properties such as the number of operations to execute, the number of threads and so on. We altered the Cassandra client so that it also reads an additional property, "cassandra.acl", from this file: the ACL string containing the different entries. The fourth specification is the username and password with which to connect to the database. An illustrative parameters file is sketched below.
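For illustration, a parameters file for the 50/50 workload might look as follows. The values are hypothetical examples; cassandra.acl is the extra property our modified client reads, while the remaining properties are standard YCSB workload settings.

    # Hypothetical YCSB workload parameters file (illustrative values)
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    recordcount=100000
    operationcount=100000
    readproportion=0.5
    updateproportion=0.5
    # Extra property read by our modified Cassandra client: the ACL string
    # appended to every inserted value (here, three users with read/write).
    cassandra.acl=:ainat,yosi,ilia rw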
Regarding Accumulo: in order to run it with security attributes under YCSB++, you use the Accumulo client class "AccumuloClientSecurity" and change the workload class to "CoreWorkloadSecurity". This allows you to provide the additional ACL-related properties, such as accumulo.username, accumulo.password, security.credential, security.cell.enable, security.cell.entries, etc. This means that, unlike with Cassandra, here you don't need to specify the username and password in the run command, as they are already contained in the workload parameters file.

With all the installations and configurations ready, we ran the benchmarks using a simple script; the results we got and their analysis are described in the next section.

Results
All the graphs shown below consist of columns ranging from 0 to 11. The 0 column represents the results of the test running on the "old" Cassandra, meaning the initial code we began working on, which does not support cell-level ACLs (official Cassandra Git repository files). The other columns represent the results running on our implementation, where each number is the number of ACL entries added to each value. For example, column 3 means we added something like ":ainat,yosi,ilia rw" to the value. For each column we show the result of the test in operations/sec.

Cassandra, 1 client, 3 nodes:
In this scenario we used one YCSB client PC and a configuration of three Cassandra nodes in a load-balanced configuration (each node responsible for a third of the values). The results show no significant difference in performance between the original Cassandra (the 0-ACL column) and our implementation (the 1-11 ACL columns). Looking more deeply into the reason for this, we found that while performance looked the same throughput-wise, the CPU usage with the original Cassandra was generally significantly lower (roughly half). Attempting to quantify these changes did not prove fruitful, since the computers we used for these tests were not exclusively ours and the results varied significantly from run to run (as did the actual throughput results, though less so). To reach a point where the difference in performance would also be evident in the results, we had to increase the load on the server, and so we moved to the next testing phase.

Cassandra, 6 clients, 3 nodes:
Here we used the same Cassandra configuration (load-balanced, three nodes), but this time six YCSB++ clients with sixteen threads each, running the workload simultaneously. We ran exactly the same tests as with one client and summed up the results from the clients. These results show a significant decrease in performance when comparing the no-ACL version to our ACL-enabled version. Looking at the CPU indicators we noticed a significant increase in CPU usage with the no-ACL version (original Cassandra); in both cases CPU usage peaked and seemed to be the limiting factor in the performance of these systems. Having found the peak-performance stats of these systems, we were able to quantify the actual loss in performance from the introduction of the fine-grained ACLs.

Workload A, the 50/50 read/write benchmark, shows the lowest decrease. Comparing the results with no ACLs (45.6k ops/sec) to the results with only one ACL entry (27.4k ops/sec) gives a 40% decrease in performance. This is the smallest decrease we measured, and it sits well with our expectations: in our implementation of the column-level ACL, the insert/update operation (in Cassandra these two are the same operation) first fetches the old column, parses its ACL part and then decides whether the user is allowed to change the column. When the column is new, the fetch returns null and we simply insert the column, skipping the CPU-heavy string-parsing phase; the check is sketched below.
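The fast path can be seen directly in a small sketch of this check, reusing the AclValue parser from the Code Implementation section (existingRawValue stands for whatever the internal get returned, null for a fresh column):

    // Why fresh inserts stay cheap: no existing column, no ACL parse at all.
    boolean mayInsertOrUpdate(String user, String existingRawValue) {
        if (existingRawValue == null) return true;             // new column: skip parsing entirely
        return new AclValue(existingRawValue).canWrite(user);  // update: parse + write check
    }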
This is why the "insert" operation's throughput is hardly affected by our implementation. The read operation, on the other hand, is affected, and this is where most of the 40% loss comes from.

Workload C, the read-only benchmark, shows a 61% decrease in performance.
Workload F, the update benchmark, showed about the same decrease as the read benchmark: 61.5%.

Comparing the results of our implementation with different numbers of ACL entries, we get the following:

Test                1 ACL    11 ACLs   Total decrease
50/50 read/write    27.4k    20.5k     25%
Read only           21.3k    15.1k     29%
Update              15.2k    11.1k     27%

Accumulo, 1 client, 1 node:
Accumulo has a native, built-in cell-level ACL, so both the 0-ACL test and the 1-11 ACL tests were done on the same system (as opposed to Cassandra). The tests were run on a single Accumulo node with a single YCSB++ client running 100 threads. The ultimate goal of these tests was to establish a baseline against which we could measure the performance of our ACL implementation in Cassandra. These tests gave us extremely low throughputs compared to Cassandra (even with a one-node configuration), so there was no point in comparing throughput head-to-head, but we could still use the data to determine the relative percentage loss in performance with the introduction of cell-level ACLs. We gathered all the data from the Cassandra and Accumulo tests into the following comparison tables (throughputs in k ops/sec):

Test                no ACL Cassandra   1 ACL Cassandra   Decrease   no ACL Accumulo   1 ACL Accumulo   Decrease
50/50 read/write    45.6               27.4              40%        6.4               5.6              13%
Read only           54                 21.3              61%        1.6               0.8              50%
Update              39                 15.2              61%        0.8               0.73             9%

In the table above we can see that the largest gap in the decrease in performance is in the update operation, showing a 61% decrease in Cassandra compared to only 9% in Accumulo. Next is the 50/50 read/write, with 40% compared to 13%. And lastly, read, with 61% compared to 50% in Accumulo. The read operation gave us the worst results in Accumulo, which in turn gave us the best relative performance for Cassandra with our implementation of ACLs.

Test                1 ACL Cassandra   11 ACL Cassandra   Decrease   1 ACL Accumulo   11 ACL Accumulo   Decrease
50/50 read/write    27.4              20.5               25%        5.6              1.2               79%
Read only           21.3              15.1               29%        0.8              0.51              36%
Update              15.2              11.2               26%        0.73             0.51              30%

The table above compares the decrease in performance when increasing the number of ACL entries from 1 to 11. In this comparison, Cassandra with our implementation is generally doing better than Accumulo at supporting multiple ACL entries. That said, it is still clear that Accumulo's ACL performance is far superior to our system's, since the overall decrease in performance from no ACLs to 11 ACL entries in Cassandra is still much greater in all but the 50/50 read/write benchmark, which, as noted, was the benchmark least influenced by our implementation.

To conclude, the chart accompanying this analysis shows the percentage decrease in performance of the two systems in two scenarios:
The system with no ACLs compared to the same system with only one ACL entry.
The system with one ACL entry compared to the same system with eleven ACL entries.
Note that the chart shows the decrease in performance, so the smaller the values, the better.

Consistency Tests
Weak Consistency in Cassandra
In the first part of this section we will briefly explain what consistency means for databases, the different types of consistency, and how it relates to our project.
Then we will show different examples of the way consistency may play out in real-life situations and how it relates to security. Afterwards we will discuss the different aspects, settings and properties of consistency (and replication in general) in Cassandra, and lastly we will present our work: the testing environment, specs, results and conclusions of our tests, and an evaluation of the effect of our implementation on consistency.

Part 1: Introduction
Consistency describes how and whether a system is left in a consistent state after an operation. In distributed data systems like Cassandra, this usually means that once a writer has written, all readers will see that write. The "how and whether" of this are described by a "consistency model". To understand the term, consider the following example:
Row X is replicated on nodes M and N.
Client A writes row X to node N.
After a period of time t, client B reads row X from node M.
The consistency model has to determine whether client B sees the write from client A or not. In practice there are various factors that determine whether he will see the change, from the way the database is set up to the parameters of the insertion and of the read. In general, though, we can distinguish between two types of consistency models that determine the answer to this question:
The "strong" consistency model: consistency is a must. Every replica of the data is exactly the same at all times, and no two clients will ever get different data from different nodes.
The "weak" (or eventual) consistency model: the data is allowed not to be 100% consistent across all replicas, in which case client B from the example above might not see the write.

The CAP theorem concerns the trade-offs in distributed computer systems (Cassandra is such a system). It states that there are three fundamental guarantees in these systems – Consistency, Availability and Partition tolerance – and that only two can be satisfied at one time. In our case, partition tolerance is a given, and we can play with the two remaining parameters by choosing an appropriate consistency model. With strong consistency we choose consistency over availability: while a value is being updated on all the nodes, the resource is not available. With weak consistency it is the other way around.

In our project we had to implement an extra layer of security, and we chose to implement it by saving the Access Control List (ACL) the same way we save values, so the same rules that apply to values also apply to the ACL. This means that if you choose to use our ACL implementation and wish to have a system with good availability, the ACLs will also be subject to the rules of eventual consistency, and there might be inconsistencies between the ACLs of replicas of the same data on different nodes. To understand better what this means, we will show several examples.

Part 2: Examples of eventual consistency and how it relates to security
Example one: assume you want to change your Facebook profile picture. As soon as you apply the change you would like to see the effect take place, but you don't mind if your friend in the US sees the update 10 seconds later.
In this scenario we would want "strong" consistency for the local change (if, for example, we can contact three different nodes, we would like all three to have the latest copy of the profile picture) and "weak" consistency for the rest of the network. This example also demonstrates the trade-off between availability and consistency: if you set "strong" consistency for this change, it would take the user much more time to perform it, and while the change is propagated through the network no one would be able to see the new profile picture; the user would either think something went wrong or would have to wait. So we pay with availability for this added consistency.

Example two: a company uses a distributed database to store its sensitive data. When they fire an employee they wish to remove his access to the databases. Assuming the database is configured with weak consistency, once they revoke the employee's access it will take a while until this change is propagated through the network to the other replicas of the data. During this period of time the employee can still log in to a different node which has not yet been updated and copy the sensitive information of the organization.

Part 3: The consistency and replication settings of Cassandra
Cassandra is designed as a distributed system, for deployment of large numbers of nodes across multiple data centers. Cassandra is robust and flexible enough that you can configure the cluster for optimal geographical distribution, redundancy, failover and disaster recovery.

Cassandra stores copies of each row, called replicas, based on the row key. You set the number of replicas when you create a keyspace, using the replica placement strategy. In addition to setting the number of replicas, this strategy sets the distribution of the replicas across the nodes in the cluster, depending on the cluster's topology.

The following is a technical explanation of replication and consistency settings in Cassandra. The total number of replicas across the cluster is referred to as the replication factor. A replication factor of one means that there is only one copy of each row, on one node. A replication factor of two means two copies of each row, where each copy is on a different node. All replicas are equally important; there is no primary or master replica. In Cassandra there are two default available replication strategies:

SimpleStrategy: the default replica placement strategy. SimpleStrategy places the first replica on a node determined by the partitioner (which we talked about in an earlier chapter). Additional replicas are placed on the next nodes clockwise in the ring, without considering rack or data center location.

NetworkTopologyStrategy: if your cluster is deployed across multiple data centers, this strategy specifies how many replicas you want in each data center. The NetworkTopologyStrategy determines replica placement independently within each data center as follows: the first replica is placed according to the partitioner (same as with SimpleStrategy); additional replicas are placed by walking the ring clockwise until a node in a different rack is found. If no such node exists, additional replicas are placed on different nodes in the same rack.

"Rack" and "data center" are properties that are given to nodes by the snitch, a component that maps IPs to racks and data centers.
The snitch defines how the nodes are grouped together within the overall network topology. There are many different snitches, such as:
SimpleSnitch, which does not recognize data center or rack information.
RackInferringSnitch, which infers (assumes) the topology of the network from the octets of each node's IP address.
PropertyFileSnitch, which determines the location of nodes by rack and data center using a user-defined description of the network located in the property file cassandra-topology.properties.
The snitch is configured beforehand in the cassandra.yaml file, which stores miscellaneous properties of the Cassandra server. The replication strategy is configured upon creation of a keyspace, as one of its parameters.

Examples:
1. To define a simple cluster of several nodes with three replicas of each row, we just run the nodes, connect with the CLI and run:
`create keyspace usertable with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = {replication_factor:3};`
2. Say we have three data centers named DC1, DC2 and DC3 (some of them with several racks), and we would like at least one replica in each of these data centers, and perhaps more than one in the major DC with several racks. To accomplish this we use NetworkTopologyStrategy and define the proper snitch. In our case the IPs are not indicative of the layout, so we use the PropertyFileSnitch: we open cassandra.yaml and set the endpoint_snitch property to PropertyFileSnitch. Next, we open cassandra-topology.properties and configure it according to our network setup; in our case it would look something like the sketch below.
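A hypothetical cassandra-topology.properties for this example; the IPs are placeholders of our own, and the format is <node IP>=<data center>:<rack>:

    # Hypothetical layout: one rack in DC1 and DC3, two racks in DC2
    10.0.1.1=DC1:RAC1
    10.0.2.1=DC2:RAC1
    10.0.2.2=DC2:RAC2
    10.0.3.1=DC3:RAC1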
Finally, we would run the nodes, connect to them with our CLI and run:
`create keyspace usertable with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = {DC1:1,DC2:2,DC3:1};`
This means that each row is replicated exactly once in DC1, twice in DC2 (one replica on RAC1 and another on RAC2), and once in DC3.

Consistency settings:
In Cassandra, consistency refers to how up-to-date and synchronized a row of data is on all of its replicas. Cassandra extends the concept of eventual consistency by offering tunable consistency: for any given read or write operation, the client application decides how consistent the requested data should be. Consistency levels in Cassandra can be set on any read or write query. This allows application developers to tune consistency on a per-query basis, depending on their requirements for response time versus data accuracy. Cassandra offers a number of consistency levels for both reads and writes.

Write consistency:
When you do a write in Cassandra, the consistency level specifies on how many replicas the write must succeed before returning an acknowledgement to the client application. The following consistency levels are available, with ANY being the lowest consistency (but highest availability) and ALL being the highest consistency (but lowest availability). QUORUM is a good middle ground, ensuring strong consistency while still tolerating some level of failure. A quorum is calculated as (rounded down to a whole number): (replication_factor / 2) + 1. For example, with a replication factor of 3, a quorum is 2 (can tolerate 1 replica down); with a replication factor of 6, a quorum is 4 (can tolerate 2 replicas down).

Cassandra 1.0 supports the following write consistency settings:
ANY – ensure that the write has been written to at least 1 node, including HintedHandoff recipients (we will not get into what this means).
ONE – ensure that the write has been written to at least 1 replica's commit log and memory table before responding to the client.
TWO – ensure that the write has been written to at least 2 replicas before responding to the client.
THREE – ensure that the write has been written to at least 3 replicas before responding to the client.
QUORUM – ensure that the write has been written to N/2 + 1 replicas before responding to the client.
LOCAL_QUORUM – ensure that the write has been written to <ReplicationFactor>/2 + 1 nodes within the local data center (requires NetworkTopologyStrategy).
EACH_QUORUM – ensure that the write has been written to <ReplicationFactor>/2 + 1 nodes in each data center (requires NetworkTopologyStrategy).
ALL – ensure that the write is written to all N replicas before responding to the client.

Read consistency:
When you do a read in Cassandra, the consistency level specifies how many replicas must respond before a result is returned to the client application. Cassandra checks the specified number of replicas for the most recent data to satisfy the read request (based on the timestamp). The following consistency levels are available, with ONE being the lowest consistency (but highest availability) and ALL being the highest consistency (but lowest availability).

Cassandra 1.0 supports the following read consistency settings:
ONE – returns the record returned by the first replica to respond. A consistency check is always done in a background thread to fix any consistency issues when ConsistencyLevel.ONE is used, so subsequent calls will have correct data even if the initial read got an older value (this is called Read Repair*).
TWO – queries 2 replicas and returns the record with the most recent timestamp; the remaining replicas are checked in the background.
THREE – queries 3 replicas and returns the record with the most recent timestamp.
QUORUM – queries all replicas and returns the record with the most recent timestamp once at least a majority of replicas (N/2 + 1) have reported; again, the remaining replicas are checked in the background.
LOCAL_QUORUM – returns the record with the most recent timestamp once a majority of replicas within the local data center have replied.
EACH_QUORUM – returns the record with the most recent timestamp once a majority of replicas within each data center have replied.
ALL – queries all replicas and returns the record with the most recent timestamp once all replicas have replied; any unresponsive replica will fail the operation.

*Read Repair: for reads, there are two types of read requests that a coordinator can send to a replica: a direct read request and a background read repair request. The number of replicas contacted by a direct read request is determined by the consistency level specified by the client. Background read repair requests are sent to any additional replicas that did not receive a direct request. To ensure that frequently read data remains consistent, the coordinator compares the data from all the remaining replicas that own the row in the background, and if they are inconsistent, issues writes to the out-of-date replicas to update the row to reflect the most recently written values.
Read repair can be configured per column family (using read_repair_chance) and is enabled by default.

Part 4: Our testing environment
In our tests we aimed for a dynamically configurable environment in which we could set the number of replicas and control which nodes receive a replica, so we chose NetworkTopologyStrategy with the PropertyFileSnitch. We created a cluster consisting of 6 nodes and configured cassandra-topology.properties as follows:
132.67.104.156=DC1:RAC1
132.67.104.154=DC2:RAC1
132.67.104.152=DC3:RAC1
132.67.104.153=DC4:RAC1
132.67.104.151=DC5:RAC1
172.17.136.200=DC6:RAC1
Each node is a "data center" of its own. Now, if we wanted a setup with 3 active nodes, all we had to do was create the appropriate keyspace:
`create keyspace usertable with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = {DC1:1,DC2:1,DC3:1};`
Expanding this to 5 nodes would just mean creating another keyspace and using it instead of usertable:
`create keyspace usertable1 with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = {DC1:1,DC2:1,DC3:1,DC4:1,DC5:1};`

Our testing platform was a slightly modified version of YCSB++. YCSB++ includes, by default, a read-after-write delay test, which works as follows. There are two YCSB++ clients with different roles: one (called the producer) inserts values into node A, and the other (called the consumer) reads them from node B. The consumer and producer synchronize the keys and insertion times via a third-party store (ZooKeeper). The exact algorithm is:
The producer keeps inserting keys with 10 values each (with our unique ACL pattern suffixes). After finishing each insert (getting an OK from the Cassandra node), it writes the name of the key into ZooKeeper and continues to the next key.
Meanwhile, the consumer attempts to read (pop) keys from the shared ZooKeeper store. As soon as it sees a new key, it attempts to read it from Cassandra.
If it manages to read the key on the first attempt, it ignores this result, reports a time lag of 0 and waits for the next key.
If it fails to read the key from Cassandra on the first attempt, it keeps trying until it succeeds. The output is the delta between the key's timestamp in ZooKeeper and the current time, which is exactly the time between the end of the insertion and the time of the first successful read. A condensed sketch of the consumer loop follows.
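A condensed, hypothetical Java sketch of the consumer side; popKeyFromZookeeper, insertionTimeOf and existsInCassandra are stand-ins of ours for the YCSB++/ZooKeeper plumbing described above.

    // Consumer loop of the read-after-write delay test (sketch).
    abstract class ConsumerSketch {
        abstract String popKeyFromZookeeper();      // next key published by the producer, or null
        abstract long insertionTimeOf(String key);  // insertion timestamp recorded in ZooKeeper
        abstract boolean existsInCassandra(String key);

        void run() {
            String key;
            while ((key = popKeyFromZookeeper()) != null) {
                if (existsInCassandra(key)) {       // readable on the first attempt:
                    report(key, 0);                 // result ignored, time lag reported as 0
                    continue;
                }
                while (!existsInCassandra(key)) { } // retry until the write propagates
                report(key, System.currentTimeMillis() - insertionTimeOf(key));
            }
        }

        void report(String key, long lagMillis) {
            System.out.println(key + " lag=" + lagMillis + "ms");
        }
    }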
The results of this test are a list of "time lags", the delays between write and read. To generate more meaningful results from this list we created a Python script which takes the data and outputs statistics such as averages, medians, percentiles and different ranges (the script is available on our site).

Running this test produced the following results. The graph shows the average of multiple tests (since the lab we used is highly unstable, we had to run each test several times to iron out any instability). The y axis is latency in milliseconds; the x axis represents the number of replicas. With 2 replicas we got an average delay of 118 ms, and the 95th percentile (call it worst-case latency) was 542 ms. These results show a general increase in latency the more replicas we use, which is to be expected, since more replicas mean more network traffic: with 6 replicas, the main node we insert into has to send all the values to 5 different nodes. Since we run the test at maximum throughput, any extra operation means reduced performance. So in addition to the increased latency, we also noticed a general decrease in throughput with this increase in replicas: with 2 nodes we had an insert throughput of around 11k ops/sec, but with 6 it dropped down to about 7k.

Our original plan was to isolate any limiting factors and run the tests under varying network conditions, but due to the limitations of our lab we had to run Cassandra on our personal laptop and connect it to the cluster via WiFi to perform these tests. We also found trial software which allowed us to add latency toward a specific IP; the mentioned delays therefore apply to the other Cassandra nodes only, and not to the YCSB++ clients. From these tests we got the following results: with WiFi we got an average delay of 14 seconds(!) even without added latency. When increasing the latency to 100 ms, it skyrocketed to an average of 35 seconds (and a worst case of over 2 minutes). This goes to show how fragile these systems actually are: no matter how great your system is, eventual consistency will still be mostly a factor of your networking specs, and the system is only as strong as its weakest link. Again, these tests were performed on a slightly different platform (the laptop rather than the lab PCs), so it is difficult to state the exact toll that limited bandwidth, high latency and packet loss take on the performance of Cassandra, but this still gives a pretty decent idea. Note that these results reflect the situation in a real-world network far better than the results in the lab configuration, where the latency between machines is under 1 ms and the bandwidth is over 100 Mb.

Next we moved to testing our read-after-update latency. Since our implementation of ACLs uses the actual values of the cells to store the ACLs, read-after-update latency is equivalent to "update-ACL" latency, and this is where the security risks come into play. To perform these tests we had to modify YCSB++, which does not ship with such a test, so we improvised: we added to YCSB a flag stating which keys it should insert, and another flag that allows us to append a suffix to the values inserted into Cassandra. With these changes we performed the test in two steps:
Run a regular read-after-insert latency test (only the producer is needed this time) with a specific set of keys (say, 1-100k), adding the ACL suffix ":ilia,yosi rw". Let it finish inserting the keys, then move to step 2.
Run the consumer with the user "ainat", which does not yet have permission to read the values. Run the producer on the same set of keys (meaning it will update the keys with a new value), this time with the ACL suffix ":yosi,ainat rw", giving ainat access to these keys.
Running these tests we got the following results.
The update graph mirrors the insert graph: we can again notice a clear deterioration in performance with each extra replica. Compared to the read-after-insert results, however, these look better: the latency dropped significantly (about tenfold). Looking closer at the difference between the two tests, we found something interesting: the throughput of the update test was always around 1.8k ops/sec, while the throughput of the insert test was around 10k. So we decided to check whether that had something to do with it. Playing with the thread count in the read-after-insert test, we managed to get it down to about 2k ops/sec; the measured delays were then almost all close to 0, and we got even better results than with the update. Running the tests at 4k ops/sec gave worse results, and the higher we went, the worse the results (unfortunately we do not have the outputs of those runs). The conclusion is, again, the same as with the WiFi tests: the factor most crucial to eventual consistency is network specs. Fewer operations mean less traffic; less traffic means faster propagation of data among nodes, and this results in reduced latency.

We ran the same tests with our WiFi configuration; those results are also better than the insert. The throughput in those tests was around 1.6k ops/sec (compared to 6k in the insert), and the results reflect it.

For our final test we wanted to measure the number of values that Cassandra returned on the first read attempt (call them latency = 0 values). We added this measurement to YCSB++. As the results show, the vast majority of the values are updated "right away" (or too fast for us to measure). The averages shown in all the previous graphs consider only the values that did not return on the first attempt; that is why the update results showed greater instability – the sampling base was only 5% of the values. These results give us more insight into the performance and display data that was not initially considered by the YCSB++ consistency test. There is a reason it was not considered, though: the test has its limitations, and saying we were able to read the value on the first attempt only puts an upper bound on the possible delays of those values. We also ran these tests with WiFi and got terrible results. The corresponding graphs compare the lab setup, where all the machines are on the same LAN segment, to the WiFi setup, where one node is connected to the cluster via WiFi. With a delay of 100 ms, close to none of the values were consistent right away, which is to be expected, since the read has no delay while the write has a delay of 100 ms.

Part 5: Improving the consistency
Improving the consistency can be achieved in various ways. As part of our conclusions we will discuss several ways that we have tested and found to work.

Increase the read/write consistency level from ONE to higher:
By increasing the consistency level of an operation we can fine-tune the desired consistency of every read/write. In our fired-employee example we can solve the problem by increasing the consistency level of the "remove permissions" operation (in our case, an insert operation) to ALL. This way the data will not be available to the employee as soon as we perform the change. The trade-off is that while the data is being updated through the network there is a period of time when no one can access it, so this solution does not scale very well. With the Facebook profile picture example, we can set the user's read consistency level for his own profile picture to LOCAL_QUORUM, which would significantly reduce the chance of receiving the wrong picture. We ran the read-after-write and read-after-update tests, each once with ConsistencyLevel.ALL on the read and once on the write; in both cases the result was the same: a time lag of 0. The sketch below shows how the level is chosen per operation.
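With the Thrift API the consistency level is an argument of each call. A sketch, assuming an already-connected Cassandra.Client and Cassandra 1.0-era Thrift signatures (recalled from memory, so treat the exact signature as an assumption):

    import java.nio.ByteBuffer;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.Column;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ConsistencyLevel;

    class RevokeExample {
        // Rewrites a cell (here: the value carrying the ACL) at level ALL, so the
        // call returns only after every replica has acknowledged, closing the
        // inconsistency window at the price of availability.
        static void removePermissions(Cassandra.Client client, ByteBuffer rowKey,
                                      Column updatedColumn) throws Exception {
            client.insert(rowKey, new ColumnParent("data"), updatedColumn,
                          ConsistencyLevel.ALL);
        }
    }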
Reduce the load on the server by better load balancing:
During our testing we noticed that latency has everything to do with throughput. Running the test several times with different thread counts (which translates to different throughputs) confirmed this theory. So monitoring the load on nodes and balancing it can really make a difference to consistency.

Use a third-party synchronization mechanism:
Third-party software that works at a level above Cassandra, creating a mid-layer between the user and the DB, can also do the trick. Software that manages the reads can guarantee that the data is consistent by reading the values from multiple nodes across the cluster and comparing timestamps. We tested this method by having our consumer client read the data from multiple sources; the results we got were significantly better.

Monitor and improve the quality of your network:
As our tests clearly demonstrated, nothing matters more to data consistency than a good, stable network. Tracking and monitoring your network is a must if consistency is of the essence.

In conclusion, distributed systems are highly complex and have many points of failure. When planning such a system you will need to consider a variety of factors and variables, ranging from the performance of your servers to the stability of the network and the bandwidth and latency between nodes. We hope the results above contribute to making the appropriate decisions when it comes to consistency.

Plan vs Actual
The following goes through the plan we first devised and details how well we followed it at each stage.

System set-up: this stage included the installation and configuration of Cassandra, Accumulo and YCSB++. Our milestone-one goal was to complete the installations and run initial performance tests. In practice, we managed to install Cassandra, Accumulo and YCSB++, and we ran the initial tests on Cassandra using YCSB, so all in all this stage was completed successfully according to the plan.

Implementation of cell-level ACLs and performance comparison: these stages included coming up with ideas for implementations and testing them against the original Cassandra and against Accumulo. Our milestone-two goal was to finish the implementation of the cell-level ACL and to complete the performance evaluation on a more advanced configuration of Cassandra and Accumulo. We wanted a final version of the implementation, based on the test results, by the time we got to milestone three. In practice, by the milestone-two presentation we had two implementations ready: the first was the simple authentication-based solution, and the second simply saved the ACL as part of the values within the Cassandra database.
We also tested both of the implementations. Later on we added a third solution for cell-level ACLs in Cassandra, which saves two columns: the column plus a corresponding ACL column. Unfortunately, for lack of time we only got to perform very basic tests on it (a throughput benchmark with insert and read); these showed an approximate 50% decrease in performance compared to v1.1.

Analysis of the security holes and improving the security through stronger consistency: this stage included measuring the security holes that may exist due to the inconsistency of the ACL configuration, and attempting to improve the security of ACLs by providing a solution with stronger consistency guarantees. Our third and final milestone goal for this stage was to analyze the security holes, finish implementing the security improvement and measure the performance penalty. In addition to these goals, we also had to finish the benchmarking tests we had not completed in the previous milestone, which included testing Accumulo for the first time. We managed to test Accumulo with YCSB++ on a one-node configuration, but when we tried to extend Accumulo to a three-node cluster we could not manage to do so, mainly due to synchronization problems between the Accumulo nodes. Since our time for exploring this issue was limited, we decided to settle for the one-node tests and focus on the consistency analysis, which was more important for this stage.

We used YCSB++ to test and measure the inconsistency window that might occur between the nodes with respect to the access control lists saved in the database. Ultimately, we were able to complete these inconsistency tests as planned. Regarding our wish to improve the security of both databases through stronger consistency, we came up with several solutions that use existing Cassandra mechanisms. In Cassandra you can configure the consistency level of reads and writes; since the trade-off between consistency and latency is tunable, we can achieve stronger consistency in Cassandra with increased latency as the penalty. Therefore, in order to minimize the inconsistency window between the ACLs on the different nodes, we can set the write and/or read consistency level to ALL in our Cassandra implementation, depending on the circumstances, as explained in the Consistency Tests section. In conclusion, the majority of this stage was completed successfully.

More information, tutorials and examples may be found on our site.