


Priority Assignment Heuristic to Cohorts executing in Parallel

Udai Shanker, M. Misra, A. K. Sarje

Department of Electronics and Computer Engineering,

Indian Institute of Technology, Roorkee, Uttaranchal,

India

Abstract: Transaction scheduling plays an important role in deciding the performance of a distributed real-time database system (DRTDBS). It has been demonstrated that priority-based scheduling enhances the performance of a DRTDBS. Generally, the priority of a transaction is determined on the basis of its deadline. Previous work on priority assignment focuses either on centralized databases or on distributed databases where the subtasks (cohorts) of a transaction execute sequentially. Several priority assignment heuristics have been proposed for both centralized and distributed databases. However, heuristics designed for a DRTDBS in which the subtasks of a transaction execute sequentially may not suit the case where the subtasks execute in parallel. In this paper, we consider parallel execution of cohorts and propose a heuristic to determine their priorities. The proposed heuristic improves system performance by favoring cohorts that demand fewer locks, which minimizes the number of blocked/aborted cohorts. In our scheme, a cohort's temporary priority is also recomputed whenever data contention occurs, in order to favor near-completion cohorts; this avoids a large amount of wasted work. We compare the proposed scheme with the EDF priority assignment policy. Simulation results show that the proposed scheme, combined with the intermediate (temporary) priority assignment policy, gives better performance than the EDF-based scheme.

Key-words: Priority, deadline, priority assignment heuristics, parallel execution, temporary priority

1 Introduction

A considerable amount of research [1,2,3,4,6] has been done in recent years on transaction scheduling and related areas. However, much of it is not well suited to distributed cohorts executing in parallel. Incorporating transaction scheduling with concurrency control methods has a marked impact on the effective level of concurrency, especially when data contention is high.

Concurrency controllers block or restart transactions when conflicts are detected. A traditional, non-priority-based concurrency control algorithm penalizes the transaction that requests the lock last: the cohort remains blocked until the conflicting lock is released, and the real-time priority of the cohort is not considered when processing the lock request. Recent studies show that system performance can be improved by using priority-based scheduling. One of the most popular scheduling algorithms is earliest deadline first (EDF) [1], in which the cohort with the closest deadline is assigned the highest priority. The problem with this approach is that a near-completion cohort is penalized just like any other, resulting in a surplus of wasted work. This, in turn, reduces the effective level of concurrency as well as the overall success percentage.
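For illustration, the EDF policy discussed above can be sketched as follows. The cohort record and its fields are illustrative assumptions, not part of any particular system's interface; the point is that EDF looks only at deadlines, never at completed work.

```python
from dataclasses import dataclass

@dataclass
class Cohort:
    name: str
    deadline: float  # absolute deadline inherited from the parent transaction

def edf_pick(ready_queue):
    """Earliest Deadline First: the cohort with the closest deadline
    gets the highest priority, regardless of how much work it has done."""
    return min(ready_queue, key=lambda c: c.deadline)

queue = [Cohort("T1", 120.0), Cohort("T2", 80.0), Cohort("T3", 95.0)]
print(edf_pick(queue).name)  # T2: closest deadline wins
```

Note that a cohort one lock away from finishing loses to any later arrival with a closer deadline, which is exactly the wasted-work problem described above.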

Most of the existing real-time transaction scheduling algorithms have been designed and evaluated for centralized real-time databases [2,3]. The problem of scheduling transactions with the objective of minimizing the number of tardy transactions was first addressed in [2]. In that paper, the authors studied the performance of three priority assignment techniques, i.e. first come first serve (FCFS), earliest deadline first (EDF), and least slack first (LSF). The performance of several concurrency control protocols, such as serial execution, high priority, the high-priority conflict resolution policy, and conditional restart, was evaluated through simulation.

In a real-time database system, an application may assign a value [3] to a transaction to reflect the return it expects to receive if the transaction commits before its deadline. When transactions are assigned different values, the goal of the system shifts to maximizing the sum of the values of those transactions that commit by their deadlines. J. Haritsa et al. addressed the problem of establishing a priority ordering among transactions characterized by both values and deadlines that results in maximizing the realized value.

All of the work discussed above focuses on centralized real-time databases. In a distributed real-time database system, a transaction is generally divided into several subtasks (cohorts). These cohorts execute on different sites. The system performance is heavily dependent on the local scheduling of the cohorts at the different sites. It has been found that system performance, in terms of meeting task deadlines, can be improved by assigning appropriate priorities [4] to the subtasks of a task. B. Kao et al. suggested four different heuristics, i.e. ultimate deadline (UD), effective deadline (ED), equal slack (EQS) and equal flexibility (EQF), for assigning deadlines to subtasks. These heuristics consider only real-time constraints and may not be suitable for distributed real-time database systems. The article [6] examines the performance of these four heuristics and suggests three other alternatives that take into consideration the impact of data conflicts. These alternative priority assignment strategies are Number of Locks held (NL), Static EQS (SEQS) and the Mixed Method (MM). NL assigns a priority to a cohort on the basis of the number of locks being held by its parent transaction, while the other two heuristics are improved versions of the heuristics discussed in [4]. However, both of these studies consider sequential execution of tasks/cohorts. These heuristics, except ultimate deadline (UD), are not suitable for cohorts executing in parallel.
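As a rough sketch of how three of these deadline-assignment heuristics differ, the following assumes the standard formulations from [4] (subtasks indexed from 0, known execution times, sequential execution); the function names and the worked numbers are illustrative, not taken from [4] or [6].

```python
def ud(D, exec_times, i):
    """Ultimate Deadline: every subtask simply inherits the transaction deadline D."""
    return D

def ed(D, exec_times, i):
    """Effective Deadline: D minus the execution time of all later subtasks."""
    return D - sum(exec_times[i + 1:])

def eqs(D, exec_times, i, now):
    """Equal Slack: remaining slack divided evenly among the remaining subtasks."""
    n = len(exec_times)
    slack = D - now - sum(exec_times[i:])
    return now + exec_times[i] + slack / (n - i)

# Three subtasks of 10 time units each, deadline 60, first subtask dispatched at t = 0
et = [10, 10, 10]
print(ud(60, et, 0), ed(60, et, 0), eqs(60, et, 0, 0))  # 60 40 20.0
```

Only UD is independent of the subtasks' position in a sequential chain, which is why it remains applicable when cohorts run in parallel.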

To reduce the degree of data contention, we propose a new priority assignment heuristic based on the number of data items required by a cohort at a particular site, rather than having the cohort inherit the priority of its parent. We observe that if we schedule active cohorts so as to favor the one that has finished most of its work (holding locks) over those that have just begun or are waiting for locks, we can increase the effective level of concurrency and the success ratio. This necessitates re-evaluating a cohort's priority at the time of contention in order to schedule cohorts more effectively. One way to achieve this goal is to employ a new intermediate (temporary) priority assignment policy. In this paper, we study the effects of this scheduling policy incorporated with 2SC [7]. However, the gain shown by the simulation results is not limited to 2SC; the same performance gain can be achieved with 2PC or any other commit protocol.

Next, in section 2, we summarize our transaction model and basic assumptions. Section 3 briefly describes the 2SC distributed real-time commit protocol. In section 4, we discuss the deadline estimation method, the new heuristic, and the intermediate priority assignment policy for cohorts. Section 5 presents the performance evaluation model, parameters, and simulation results. Section 6 concludes the paper.

2 Distributed Real Time Database System Model

In our distributed database system model, the global database is partitioned into a collection of local databases stored at different sites. A communication network interconnects the sites, and all sites communicate via message exchange over this network.


Fig. 1 Distributed Real-time Database System Model

Each site consists of a transaction generator, a transaction manager, a concurrency controller, a CPU, a ready queue, a local database, a communication interface, a sink and a wait queue. The transaction generator creates transactions with a given inter-arrival time and is independent of the generators at other sites. The transaction manager spawns cohorts at remote sites on behalf of the coordinator. Before a cohort performs any operation on a data object, it has to go through the concurrency control component to obtain a lock on that object. If the request is denied, the cohort is placed into the wait queue for that particular data item; the waiting cohort is awakened when the requested lock is released. Once all requested locks are granted, the cohort accesses memory and performs its computation on the data items. Finally, the cohort commits or aborts and releases all the locks it has been holding. The sink component of the model gathers statistics for committed or terminated transactions. In addition, a network manager models the behavior of the communications network.

We assume that the transactions are firm real time transactions. Each transaction in this model exists in the form of a coordinator process that executes at the originating site of the transaction and a collection of cohorts executing at remote sites, where the required data items reside. If there is any local data in the access list of the transaction, one cohort is executed locally. Before accessing a data item, the cohort needs to obtain locks on the data item. We also assume that:

• The processing of a transaction requires the use of CPU and data items located at local site or remote site.

• Arrivals of transactions at a site are independent of the arrivals at other sites and follow a Poisson distribution.

• A distributed real-time transaction is said to commit if the coordinator has reached the commit decision before the expiry of the deadline at its site. This definition applies irrespective of whether the cohorts have also received and recorded the commit decision by the deadline.

• The database is in main memory at all sites.

Each cohort makes a series of read and update accesses.
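The Poisson-arrival assumption above is equivalent to exponentially distributed inter-arrival times, one independent stream per site. A minimal sketch of such a generator (the function name, parameters, and seeds are illustrative, not the paper's simulator):

```python
import random

def arrival_times(rate, n, seed=None):
    """Generate n Poisson arrival instants for one site: inter-arrival
    times are i.i.d. exponential with mean 1/rate, independent per site."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate)  # exponential gap -> Poisson process
        times.append(t)
    return times

site_a = arrival_times(rate=2.0, n=5, seed=1)
site_b = arrival_times(rate=2.0, n=5, seed=2)  # independent stream for another site
```

Using a separately seeded generator per site mirrors the assumption that arrivals at one site are independent of arrivals at the others.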

3 2SC Commit Protocol

In this section, we briefly describe the 2SC [7] protocol used in our simulation to study the effect of the optimization proposed in this paper. 2SC allows two transactions to share data by permitting a transaction (the borrower) to borrow data from a transaction in its commit phase (the lender) [7]. Sharing data items in conflicting modes creates dependencies among the conflicting transactions and constrains their commit order. A transaction table at each site maintains the following information for each locally active transaction or cohort T:

Abort dependency set (ADS(T)): the set of transactions that are abort-dependent on transaction T. ADS(T) = {Ti | Ti AD T}.

Commit dependency set (CDS(T)): the set of transactions that are commit-dependent on transaction T. CDS(T) = {Ti | Ti CD T}.

When data conflicts occur, there are three possible cases of conflict.

Read–Write Conflict. If T2 requests a write-lock while T1 is holding a read lock on the same item, a commit dependency is defined from T2 to T1. After the transaction id of T2, id2, is added to CDS(T1), T2 acquires the write-lock.

Write–Write Conflict. If both locks are write-locks, a commit dependency is defined from T2 to T1. After the transaction id of T2, id2, is added to CDS(T1), T2 acquires the write-lock.

Write–Read Conflict. If T2 requests a read lock while T1 is holding a write-lock, an abort dependency is defined from T2 to T1. If HF(T1) ≥ MinHF, the transaction id of T2, id2, is added to ADS(T1) and T2 acquires the read lock; otherwise, if HF(T1) < MinHF, T2 is blocked.
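The three conflict rules can be summarized in code. This is a simplified sketch, not the protocol's actual data structures: transactions are plain dictionaries with `ADS`/`CDS` sets, and HF/MinHF are passed in as numbers.

```python
def resolve_conflict(holder, requester, held_mode, req_mode, hf, min_hf):
    """Apply the 2SC dependency rules for a lock conflict between the
    lock holder T1 and the requester T2. Returns 'grant' or 'block'."""
    if held_mode == "read" and req_mode == "write":    # read-write conflict
        holder["CDS"].add(requester["id"])             # commit dependency T2 -> T1
        return "grant"
    if held_mode == "write" and req_mode == "write":   # write-write conflict
        holder["CDS"].add(requester["id"])             # commit dependency T2 -> T1
        return "grant"
    if held_mode == "write" and req_mode == "read":    # write-read conflict
        if hf >= min_hf:
            holder["ADS"].add(requester["id"])         # abort dependency T2 -> T1
            return "grant"
        return "block"                                 # HF(T1) < MinHF: block T2
    return "grant"                                     # read-read: no conflict
```

A usage example: after a read-write conflict between T1 (holder) and T2, `resolve_conflict(t1, t2, "read", "write", hf, min_hf)` returns `"grant"` and T2's id appears in `t1["CDS"]`.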

When T2 has accessed the locked data, three situations may arise.

1. T1 Receives Decision Before T2 Has Completed Its Local Data Processing:

If the global decision is to commit, T1 commits. All transactions in ADS(T1) and CDS(T1) will execute as usual, and the sets ADS(T1) and CDS(T1) will be deleted.

If the global decision is to abort, T1 aborts. The transactions in the dependency set of T1 will execute as follows:

(i) all transactions in ADS (T1) will be aborted;

(ii) all transactions in CDS (T1) will execute as usual;

(iii) the set of ADS (T1) and CDS (T1) will be deleted.

2. T2 Completes Data Processing Before T1 Receives Global Decision:

T2 is not allowed to send a WORKDONE message to its coordinator, so the coordinator cannot initiate commit processing. Instead, it has to wait until either T1 receives its global decision or its own deadline expires, whichever occurs earlier. In the former case, the system executes as in the first scenario. In the latter case, T2 is killed and removed from the dependency set of T1.

3. T2 aborts before T1 receives decision:

In this situation, T2’s updates are undone and T2 will be removed from the dependency set of T1.
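The handling of T1's global decision (case 1 above) can be sketched as follows; the dictionary layout and the `abort_fn` callback are illustrative assumptions.

```python
def apply_global_decision(t1, decision, abort_fn):
    """Propagate T1's global decision to its dependents.
    On commit, all dependents continue as usual; on abort, the
    abort-dependent cohorts are aborted while the commit-dependent
    ones still execute as usual. Either way, both sets are deleted."""
    if decision == "abort":
        for tid in t1["ADS"]:
            abort_fn(tid)      # abort dependents of an aborted lender
    t1["ADS"].clear()          # dependency sets are deleted in both cases
    t1["CDS"].clear()
```

For example, if T1 aborts with ADS(T1) = {T2} and CDS(T1) = {T3}, only T2 is aborted; T3 merely loses its recorded dependency.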

4 New Priority Assignment Heuristic & Intermediate Priority Assignment Policy

The new priority assignment heuristic is based on the number of locks required by the cohort. The initial priority of cohort (Ti) of transaction (T) is computed as below.

P = 1/N,  where N ≥ 1    — (I)

N is the number of locks required by the cohort. The higher the value of P, the higher the priority of the cohort. As this may favor short cohorts, we also calculate an intermediate (temporary) priority whenever data contention occurs (described later).
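Heuristic (I) is trivial to compute; a minimal sketch (the function name is ours):

```python
def initial_priority(num_locks):
    """Heuristic (I): P = 1/N for N >= 1. A cohort that needs fewer
    locks gets a higher priority, reducing blocked/aborted cohorts."""
    if num_locks < 1:
        raise ValueError("a cohort must require at least one lock")
    return 1.0 / num_locks

# A cohort needing 2 locks outranks one needing 5
print(initial_priority(2), ">", initial_priority(5))
```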

The lifetime of a cohort is divided into two phases, i.e. the execution phase and the commit phase. During the execution phase, a cohort locks the data items, performs the necessary computation and, finally, sends the Workdone message to its coordinator. If there are dependencies, the sending of the Workdone message is deferred until the dependencies are removed [7]. In our model, the arrival of a higher-priority cohort may abort a lock-holding lower-priority cohort, provided the latter has not yet sent its Yes vote. The decision to abort the lock-holding cohort is made on the basis of an intermediate priority assigned to the arriving cohort and the lock-holding cohort. This is a temporary priority assignment that does not affect the initial priorities of the two cohorts; it is based on the remaining execution time needed by the lock-holding lower-priority cohort (TL) and the slack time available to the newly arrived higher-priority cohort (Ta) [2]. It also solves the starvation problem of long cohorts, which arises due to their high probability of access conflicts. The slack time is the amount of time a cohort can afford to wait and still complete before its deadline. The remaining execution time of the lock-holding cohort (TL) is given as:

Tremain = Ri − Telapse

where Ri is the minimum transaction response time (discussed in section 5.1),

Tremain is the remaining execution time needed by TL, and

Telapse is the elapsed execution time of TL.

If Tremain(TL) is less than the slack time of Ta, then TL is given the higher priority; otherwise, Ta deserves the higher priority.
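This contention-time rule reduces to a single comparison; a minimal sketch (the function name is ours):

```python
def lock_holder_wins(t_remain_tl, slack_ta):
    """Temporary priority at contention time: the lock-holding cohort TL
    keeps its higher priority only if its remaining execution time fits
    within the arriving cohort Ta's slack, so Ta can still meet its deadline."""
    return t_remain_tl < slack_ta

# TL needs 5 more time units; Ta can afford to wait 8 -> let TL finish
assert lock_holder_wins(5.0, 8.0)
# TL needs 9 units but Ta can only wait 8 -> Ta gets the higher priority
assert not lock_holder_wins(9.0, 8.0)
```

This is what spares near-completion cohorts: the closer TL is to finishing, the smaller Tremain(TL) and the more likely it wins the temporary comparison.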

Complete pseudo code for the scheduling algorithm

TCH is the cohort holding the CPU.

Ta is the cohort requesting the CPU.

Algorithm for CPU scheduling

on arrival of a new cohort Ta {
    assign priority to Ta based on (I);
    if (CPU is not free) {
        if (priority(TCH) > priority(Ta))
            insert Ta in ready queue;
        else {
            abort TCH;
            allocate CPU to Ta;
            call lock_acquire_function();
        }
    }
    else {
        allocate CPU to Ta;
        call lock_acquire_function();
    }
}
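A runnable sketch of the CPU-scheduling decision above, assuming priorities are plain numbers and the ready queue is a list (the function and parameter names are illustrative):

```python
def schedule_cpu(ta_priority, tch_priority, ready_queue, cpu_free):
    """Decide the fate of an arriving cohort Ta, mirroring the
    CPU-scheduling pseudocode: run if the CPU is free, queue if the
    current holder TCH outranks Ta, otherwise preempt (abort) TCH."""
    if cpu_free:
        return "run"                      # allocate CPU, then acquire locks
    if tch_priority > ta_priority:
        ready_queue.append(ta_priority)   # Ta waits in the ready queue
        return "queued"
    return "preempt"                      # abort TCH, allocate CPU to Ta
```

For example, with the heuristic of section 4, an arriving cohort needing 2 locks (P = 0.5) preempts a holder needing 5 locks (P = 0.2) when the CPU is busy.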

Algorithm to resolve data contention

lock_acquire_function()
{
    for each data item:
        if (no lock conflict)
            allocate data item to Ta;
        else {
            check the priority of the conflicting cohort TL;
            if (initial priority(Ta) < priority(TL))
                insert Ta in wait queue;
            else {
                if (Tremain(TL) < slack time(Ta))
                    insert Ta in wait queue;    /* TL is near completion; let it finish */
                else {
                    abort TL;
                    allocate data item to Ta;
                }
            }
        }
}