


Project Number: CS-EAR-0404

LINEAR ROADS: A REAL-TIME TRAFFIC MONITOR

A Major Qualifying Project Report:

submitted to the Faculty

of the

WORCESTER POLYTECHNIC INSTITUTE

in partial fulfillment of the requirements for the

Degree of Bachelor of Science

By

Griffin Bryant

Timothy Chen

David Gordon

Date: April 27, 2005

Approved:

Professor Elke A. Rundensteiner, Major Advisor

Table of Contents

Abstract

Chapter 1: Introduction

Chapter 2: Background

Chapter 2.1: CAPE

Chapter 2.2: Linear Roads Traffic Stream Benchmark

Chapter 3: Data Generator

Chapter 4: Query Plans

Chapter 5: Operators

WindowGroupBy

WindowDistinct

StreamFunctionProjectOperator

Chapter 6: Graphical User Interface

Chapter 6.1: CAPE Interfaces

Chapter 6.2: Application Architecture

Chapter 6.3: Implementation Details of the Main Process Flow

Chapter 7: Conclusions

Chapter 7.1: Accomplishments

Chapter 7.2: Future Work

Chapter 7.3: What We Learned

Chapter 8: Appendices

Appendix A - moe.out

Appendix B - trajectory.out

Appendix C - vehicle.out

Appendix D - incident.dat

Appendix E - cardatapoints.out

Appendix F - out.dat

Appendix G - script.pl

Appendix H - List of Required Operators

Appendix I - GeneralQueries.java

Appendix J - MyTableCellRenderer.java

Appendix K - ResultPlotter.java

Appendix L - Average Speed Per Segment

Appendix M - Max Speed Per Segment

Appendix N - Volume Per Segment

Appendix O - Accident Report

Appendix P - Current Active Cars

Chapter 9: References

Abstract

The goal of this project is to create an application that will demonstrate the usefulness of the CAPE system in a real-life situation. An online traffic monitoring application was developed based on traffic data and queries from the Linear Roads Benchmark. A graphical user interface for the real-time traffic monitor facilitates easy entry of queries and offers intuitive monitoring displays. Several enhancements were made to CAPE to meet the needs of this application, in particular the addition of new query operators.

Chapter 1: Introduction

New technologies are emerging every day, pushing the bounds of what was previously possible and redefining the world around us. This is especially true in the world of computing, where much work goes into the design and development of new systems, tools, and packages. Often, these technologies are not created with a singular use in mind; rather, they can interact with many other systems and can thus provide a wide range of functions. Also, the services a technology can provide are regularly developed in isolation, which can make it difficult to prove that a system or tool set has real-world use. For these reasons, it can be a good idea to develop an application that showcases the functionality of a system. Though the application may not be complete enough to be used in the field, it can serve to demonstrate the worth and viability of a technology. By applying a technology to a real-world problem, the potential and utility of a system can be seen. Doing so can be vital when trying to obtain funding, to sell a product to a consumer, or to distinguish a technology from other similar technologies.

Worcester Polytechnic Institute’s (WPI) Database Systems Research Group (DSRG) has one such emerging technology in development. The DSRG’s Constraint-exploiting Adaptive Processing Engine (CAPE)[1] is a continuous query system which allows for querying over live streaming data. Querying, or asking questions of, data is not new and is done all the time in today’s relational database systems. The data in these databases, however, is static; it has already been collected and its size is finite.[2] In contrast, CAPE can query live data which is being input constantly and in real-time.

The possible uses of CAPE are numerous. It could be used to provide data results for such technologies as vital sign monitors in hospitals, stock market applications, traffic pattern monitors, or any other application that would benefit from extracting answers from continuous data feeds. One of the largest demands for continuous data applications lies in sensor networks. Such networks include global positioning and other systems where information from sensors is constantly being updated and altered.

Unfortunately, CAPE had not previously been used in any such application, whether real or purely demonstrative. At the same time, CAPE was being demoed at conferences such as the 2004 session of the Very Large Databases (VLDB) International Conference. There it was demonstrated by highlighting its strategies such as constraint-aware operators, adaptive scheduling, and query plan migration.[3] It was also run with a few small toy application examples. A larger application did not exist to convincingly illustrate the real-world usefulness of CAPE.

Therefore, the goal of this project was to create a real-world application that could interface with CAPE. The application chosen was a highway traffic monitor. A traffic monitor is a good example of a use of a continuous query system. It is easy to understand the live nature of the input; cars are always moving on highways. Also, it is easy to see the many benefits of such an application. Emergency rescue personnel could use such a monitor to be alerted that an accident has occurred and to quickly respond to the scene. Police could use the application to identify speeders or to locate criminals. Highway planners could watch traffic flow patterns over time for better construction of new highways or modification of old highways. The monitor could even be used by citizens to be warned of traffic jams or accidents.

Though the application will not actually be used by any of these groups, it does support the input of live traffic data, the querying of such data, and the output of results via appropriate visualizations. It interfaces with CAPE, using the CAPE system to handle and execute the specified queries on the live input and to output result streams. It therefore provides a concrete, understandable demonstration of the real-world usefulness of CAPE. This is an important first step in promoting CAPE so that it may one day be used by an authentic traffic monitor or by any other practical application.

Before the application was developed, a source of live traffic data had to be found. Several sources of data were looked at, including real traffic data streams, but the source of data that was ultimately adopted came from the Linear Roads Benchmark[4] project. This benchmark and what it provided for the project is explained in Section 2.2, but the reason it was chosen was that it supplied both a generator for input traffic data streams and queries which could be run over the streams. One limitation of the Linear Roads Benchmark data and queries is that it assumes, as the name implies, a linear set of highways. All expressways are perfectly straight and are aligned in parallel with each other. This is in contrast to any real highway system, where roads twist and bend to fit the contours of the land and to navigate around cities and towns. Real roads also intersect with each other as some run east to west and others run north to south, connecting with each other for convenient travel. However, this limitation did not affect the goal of the project, as the application is only to be used for demonstration purposes.

An additional challenge of this project was that a Graphical User Interface (GUI) had to be designed appropriately for the chosen application. In particular, displays would be needed to reflect current traffic patterns on the highways. CAPE, initially, was missing some functionality needed to make this possible. This then became a secondary product of the project; functionality was added to CAPE so that it could support the chosen queries and produce desired results for the application. So, the workload of the project fell into five categories: finding live data and readying it for input into CAPE; finding appropriate queries and translating them into a format recognizable by CAPE; extending CAPE by adding new operators that could provide the desired results for the GUI; developing the GUI application itself; and connecting CAPE to the GUI to form the final working application.

The end result is an application which can be used to showcase the usefulness and real-world possibilities of CAPE. As an added benefit, CAPE was extended and functionality was added in the process of developing the application. In the future, this application can be further extended and, in doing so, can continue to help CAPE develop and to show the progress of that development.

Chapter 2: Background

Here the CAPE system is explored in more detail. CAPE is a large piece of software with many unique features such as query plan optimization, but only the aspects of CAPE that most pertain to this project are discussed in depth. Also, more information is given on the Linear Roads Benchmark, including its intended purpose, its role in this project, and the data and queries which it provides.

Chapter 2.1: CAPE

CAPE is a continuous query system. Such systems are often thought of as database systems, but there are several key differences. A database usually implies storage of huge volumes of data on some persistent medium. Data may be added over time, even frequently, but still all data is stored in the database until a user deletes it. Querying is then done over the stored data, with the output being extracted data from the database.[5] A continuous query system, on the other hand, does not store the data. Often, it does not make sense in the context of the data to do so. In the example of a traffic monitor, cars are constantly moving on highways. For some applications, where the cars are at the current time is what is important, not where they were five days ago. So, data to a continuous query system takes the form of live data streams. These streams must be queried in real-time. Results must thus also be in real-time. This general functionality can be seen in Figure 2.1. Streaming data is input into the continuous query system. Users submit queries to the system; these queries are then run by the system over the input. The system then outputs streams of result data which each correspond to a query that was run. These functions must occur simultaneously; the input is continuous and the output must be in real-time as well.

Figure 2.1: An overview of the function of a stream query engine. Taken from VLDB presentation slides, created by Rundensteiner et al.

The challenge, therefore, that continuous query systems must face is how to provide real-time results when so much processing must be done within the system. CAPE speeds up query processing using several innovative methods. The focus of this project is not on the performance of CAPE, so only a brief description of these methods follows. CAPE exploits any constraints known about the data to reduce resources used and to speed up response time[6]. Multiple scheduling algorithms are available in CAPE, and the algorithm best suited to current conditions is chosen dynamically[7]. Also, CAPE employs query plan optimization; query plans are restructured to allow for the quickest possible output[8]. And, finally, CAPE is capable of distributing work over a cluster of machines to improve performance[9].

The system architecture of CAPE can be seen in Figure 2.2. No changes to the architecture were required by this project, so in-depth research of the system architecture was not done. One key component that had to be understood was the Query Plan Generator. A query plan specifies, using some format such as XML, what operators, in what order, and with what parameters are to be run on the input data stream. An operator performs a singular function on the data. For example, a join operator combines two streams on a field common to both streams. The query plan file is parsed by the query plan generator. After distribution and optimization are performed, the execution engine runs the data through the operators as specified in the optimized query plan. Output is then sent to the user. For this project, new operators had to be added to the execution engine. Hence, parsing methods for the new operators had to be added to the query plan generator as well. Further discussion of the work done with query plans and operators can be found in Chapters 4 and 5.

Figure 2.2: CAPE system architecture. Taken from VLDB presentation slides, created by Rundensteiner et al.

Chapter 2.2: Linear Roads Traffic Stream Benchmark

The Linear Roads Benchmark specification has been proposed and used by researchers at Brandeis University, Brown University, Stanford University, and MIT. These institutions use data management systems other than CAPE, such as Aurora and STREAM.[10] But the benchmark, as proposed, can be applied universally to any continuous query system. Linear Roads is designed to “measure how well a system can meet real-time query response requirements in processing high-volume streaming and historical data.”[11] In this project, however, benchmarking is not the concern. Still, what Linear Roads provides is a source of streaming traffic data and queries that can be run over the data. For these reasons, Linear Roads became an integral part of this project even though its intended purpose, benchmarking, was not our goal.

The Linear Roads Benchmark envisions its highways as existing in a Linear City. In a Linear City, the layout of the highways is quite simple. Highways are always completely straight, running horizontally and in parallel with each other. Each individual highway has multiple lanes in each direction: east to west or west to east. Figure 2.3 depicts a layout of ten highways.

Figure 2.3: A Linear City with ten highways. From Linear Road: A Stream Data Management Benchmark by Arvind Arasu et al.

Each highway contains 100 one-mile-long segments. These segments are used in calculating tolls for each car and in detecting accidents, as well as in representing general traffic patterns along each highway. There are on-ramps at the beginning of each segment and off-ramps at the end of each segment. The lane numbers for each highway segment are significant, because they specify whether the car is entering, exiting, or merely traveling through the given segment. In this application, lane 0 is the on/off-ramp for the westbound segments, lane 7 is the on/off-ramp for the eastbound segments, lanes 1-3 are the traveling lanes for the westbound segments, and lanes 4-6 are the traveling lanes for the eastbound segments.[12] A picture representing a highway segment is given in Figure 2.4.

Figure 2.4: A Linear City highway segment. From Linear Road: A Stream Data Management Benchmark by Arvind Arasu et al.

One big idea of the Linear Roads Benchmark, though not directly used in this project, is variable tolling. There are no toll booths in Linear City; rather, drivers are tolled based on the conditions of the highways. The factors considered are the time of day of travel, the amount of traffic congestion on a road, and proximity to an accident.[13] The rules for calculating tolls are as follows: for each segment that a car enters, it is charged a certain amount depending on the average speed and total number of cars in that segment. If there is an accident in that segment, the cars in that segment are not tolled while they are in it.

Variable tolling systems can employ either smart highways and dumb cars or dumb highways and smart cars. In the former, sensors are placed at regular intervals along the highways to collect data as cars pass them. In the latter, every car must have a sensor in order to use the highways; data is collected directly from these sensors. The benchmark assumes a smart car system. In Linear City, there are 1,000,000 registered vehicles.[14] Being a registered vehicle means that the vehicle is equipped with a sensor that provides constant information to the Linear Roads Highway Monitor. However, only vehicles in the stream of active cars are actually on the highway. The others are assumed to be somewhere else, but are still equipped with sensors. Each vehicle, when on a highway, is assumed to be reporting its location and speed every 30 seconds.

The Linear Roads Benchmark comes with a data generator to generate traffic data for Linear City. It creates four input streams which can be fed to a continuous query system. The main input stream, and the only stream used for this project, is the car location stream. A row of data in this stream is a report from a single car. While on a highway, a car submits reports every 30 seconds. When it reports, it gives its location and speed. So a row of data contains the following fields: the current time; the unique identification number of the car; the current speed of travel; the current highway number; the current lane number; the direction of travel; the current segment number; and the current x coordinate position on the highway.

The other three streams are queries by drivers which would be answered by the continuous query system. The queries allowed are: the current account balance of the driver; the total toll expenditure for the day for a driver; and the expected time of travel for a driver based on road conditions and accidents. These streams were not used, and tolling queries were not implemented, because they rely both on live data and on historical data. They rely on the live data of the current conditions of the roads and on stored data such as account balances and tolls charged earlier in the day. Because CAPE does not have a special facility for persistent storage, it cannot work with stored data or historical queries. Therefore, tolling and driver queries were not handled.

The Linear Roads Benchmark specification also includes a handful of queries which can be run over the live data streams. The queries were analyzed for viability with the CAPE system. Any that relied on previous results or historical data had to be discounted. The queries ultimately chosen were: current active cars in Linear City; average speed of all cars per segment; volume of cars per segment; accident detection; and speeding cars detection. Current active cars is accomplished by displaying distinct car IDs that have been found in the past 30 seconds. Since a car must report its location every 30 seconds, it can be assumed that a car is no longer active if it has not been heard from in the past 30 seconds. For average speed and volume per segment, cars are grouped by their segment and then either all speeds are averaged or the cars are counted. Accident detection is done by watching for cars that have reported the same location four times in a row. When this is found, the location of the car involved in the accident is displayed. For the speeding report, cars are grouped by their segment and a max speed is calculated for each segment. If this max speed is above a limit, then a car is assumed to be speeding in that segment.

Below are the queries used in our application, written out in the Continuous Query Language (CQL). CQL was developed in conjunction with the STREAM project.[15] As CAPE did not support CQL at the time this project was done, these queries then had to be translated into XML query plans by hand. It should be noted that a project has since been completed that allows for the automated translation of CQL queries to XML query plans.[16] For the CQL below, CarSegStr is the main input stream used. It provides the following fields: time, car_id, speed, exp_way, lane, dir, seg, and x-pos.

Queries in CQL:

Current Active Cars

SELECT DISTINCT car_id

FROM CarSegStr [RANGE 30 SECONDS];

Average Speed per Segment

SELECT exp_way, dir, seg, AVG(speed) as speed

FROM CarSegStr [RANGE 5 MINUTES]

GROUP BY exp_way, dir, seg;

Volume per Segment

SELECT exp_way, dir, seg, COUNT(*) as volume

FROM CarSegStr [NOW]

GROUP BY exp_way, dir, seg;

Accident Detection

SELECT car_id, AVG(x-pos) AS acc_loc

FROM CarLocStr [PARTITION BY car_id ROWS 4]

GROUP BY car_id

HAVING COUNT DISTINCT (x-pos) == 1;

Speeding Cars Detection

SELECT exp_way, dir, seg, MAX(speed) as maxspeed

FROM CarSegStr [NOW]

GROUP BY exp_way, dir, seg;

The work involved in getting the Linear Road Benchmark data generator to work with the CAPE system follows in Chapter 3. In Chapter 4, the translation of the above CQL to the XML query plans is described. Also, the new operators that had to be added to make the above queries possible in the CAPE system are described in detail in Chapter 5. In the discussion of the GUI development in Chapter 6, it can be seen how the simulated data and queries all came together to model a highway traffic monitor for Linear City.

Chapter 3: Data Generator

The Linear Road Benchmark package includes a data generator which was used to create the data for the application. The data generator, called MITSIMLab, takes in parameters such as the number of highways, the number of lanes, the length of segments, and the duration of the simulation, and outputs multiple files representing the simulated highways and cars. It is noteworthy that, since CAPE does not include a relational database, historical data such as previous tolls is ignored.

System Requirements

There are some basic requirements for the machine that will run the MITSIMLab simulation:

Operating System: Linux only

RAM: 512MB

Persistent storage space: 3 GB per expressway per hour

Perl: 5.0 or later, DBI module

PostgreSQL: 7.0 or later

It was decided that the stream2 machine from the DSRG would be sufficient for a small test set during development. While stream2 does not have enough storage space for a very large dataset, it was able to provide a satisfactory set for the purposes of the project.

Configuring MITSIMLab

MITSIMLab has many configurable options that affect the output data. An initial test set consisted of only half a highway and ran for 3 minutes. This was a very small set, and the main goal was to guarantee that the generated output could be converted into input for CAPE. Other configurations were also tried to ensure that the output from MITSIMLab was usable.

Configurations used:

Number of expressways | Number of lanes per expressway | Simulation length | Purpose

0.5 (one direction only) | 1 | 1 minute | Test data conversion from MITSIMLab output to CAPE stream generator input

1 | 4 | 3 minutes | Minimal requirements for Linear Road benchmark queries

3 | 4 | 5 minutes | Test capability of multiple highway support; also lengthened test set for more complete results for some queries

MITSIMLab Output

MITSIMLab outputs a few files upon completing its simulation. A sample of each output file is located in the appendix.

• moe.out (Appendix A)

This file contains information about the general highway system such as number of vehicles.

• trajectory.out (Appendix B)

This file contains information about the locations of cars for each second.

• vehicle.out (Appendix C)

This file contains information for cars that have successfully reached their destination, including each car's origin, destination, departure and arrival times, mileage, and speed.

• incident.dat (Appendix D)

This file contains information about the accidents and their severity.

• cardatapoints.outN (Appendix E)

This file contains information about the cars on the Nth highway, including car ID, speed, lane number, highway number, x position, segment number, incidents, and query requests. Each car is guaranteed to have one row per 30 seconds if it is still on the highway.

• cardatapoints.out (Appendix E)

This file contains the same information as cardatapoints.outN, except with every highway in the system.

Converting MITSIMLab Output to CAPE Input

The output from MITSIMLab could not be fed directly into CAPE’s stream generator because of some inconsistencies, such as the presence of unrelated tuples, and because of differences in format. CAPE’s stream generator requires the data to be sorted by time. This was true when the data included only one highway; however, when multiple highways were involved, their data was concatenated instead of interleaved. When run with this data, the CAPE stream generator would output tuples only from the first highway, and would not output any others until the first highway had no more data. Another issue is that cardatapoints.out also includes data such as accident reports and query requests. Such data is not needed as part of the stream, because our application detects accidents itself and driver query requests are not handled. There were also conflicting car IDs between the highways.

In the end, a Perl script was written to address these issues (Appendix G). It opens all of the cardatapoints.outN files and merges them into a single new file, sorting all tuples by their timestamps. This allows CAPE's stream generator to stream the data according to timestamp rather than highway number. An additional check was also implemented to make sure that each tuple was in fact a car reporting its location rather than an incident report or query request. Finally, the delimiter was changed from a comma to " | " to match other stream input file formats CAPE supports. However, this change was not strictly necessary, since the delimiter is configurable in the streams configuration file. The CAPE stream generator then uses the resulting file to stream data across a network to our stream application.

MITSIMLab was a highly configurable data generator that provided the group with the needed data for different situations and requirements. While a script was still needed to convert its output into a format readable by CAPE, it fulfilled all of our data needs. A sample of the output data is located in Appendix F, and the script is included in Appendix G.

Chapter 4: Query Plans

Queries for relational databases are generally specified in Structured Query Language (SQL). SQL, however, is not capable of querying streaming data, so other languages or means are necessary. The STREAM project, which helped develop the Linear Road Benchmark, also developed the Continuous Query Language (CQL), a derivative of SQL that allows for queries over data streams. All of the queries in the benchmark are written in this query language. As STREAM and CAPE were not developed together, CQL is not shared between them. Instead, CAPE uses query plans specified at the algebraic level, similar to Aurora[17], to run queries over data streams.

These plans are algebra trees, with each node being a relational algebra expression. For the XML query plans, the tree structures are complete with parent and child nodes, as well as a root. The raw data streams are supplied to the leaf nodes first, and each node represents an operator in CAPE such as a join or a project. Several data streams can be used as inputs, but there can only be one output stream. The root node is the final operator over which the stream will be run before it is output. Each node, depending on the operator that will be used, also requires variables to be supplied. These may be the size of a time window, the attributes in the data that should be projected or joined on, etc. The plans are specified using XML. These XML files are then parsed by CAPE, so the format is fully definable.

Each query the application needs to run requires a separate query plan to be written. In the application, when the user selects a query and runs it with the options they have chosen, the stream generator is started up along with the query processor. The query processor is given the corresponding, pre-generated configuration files as input. The configuration file supplied depends upon the query chosen by the user, as each configuration file must point to the name of the query plan file. Once the query processor has access to the query plan, it parses the XML and instantiates an execution plan. That plan then executes the operators over the data stream as specified in the plan. From there, the generated output stream is read into the application, where it is processed for displaying results as necessary.

Work had to be done to convert the CQL queries from Section 2.2 into XML query plans. For current active cars, a project operator was used to project car IDs and their locations. A distinct with a window of 30 seconds was then used to ensure that only active cars were reported and that they were only reported once. For average speed per segment, a project was used to project car ID, speed, and location. A groupby was then used to cluster the cars by segment. An average of the projected speeds per segment cluster was taken over the past five minutes. For volume, the plan was similar, but a count of car IDs was done rather than an average of speeds. Also, the groupby window was 30 seconds so that cars would not be counted twice. This takes advantage of the fact that cars only report themselves every 30 seconds. To do accident detection, cars were grouped by ID and segment. The window was two minutes, so that four location reports for each car would be considered. A count was then done of car IDs. As the clustering was done on both ID and segment, a count of four would reveal an individual car reporting the same location four times in a row, which is the signal that an accident has occurred. For speeding detection, the plan was similar to the volume per segment query. Cars were grouped by segment and a maximum value was calculated over the speeds. The window was 30 seconds. These query plans can be seen in Appendices L through P.
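To give a concrete sense of the shape of these plans, the fragment below sketches the current active cars query as a chain of two operators: a project that keeps only the car ID, feeding a windowed distinct. The element and attribute names are illustrative only and are not taken from CAPE's actual XML schema; column positions are assumed to be zero-based, and the window and increment values assume milliseconds.

<plan name="CurrentActiveCars">
  <operator id="1" class="StreamProject">
    <parameter name="project_pos" value="1"/>      <!-- keep only the car_id column (position assumed) -->
    <input stream="CarSegStr"/>
  </operator>
  <operator id="2" class="WindowDistinct">
    <parameter name="window_size" value="30000"/>  <!-- 30-second window -->
    <parameter name="increment" value="10000"/>    <!-- emit the distinct list every 10 seconds -->
    <input operator="1"/>
  </operator>
  <output operator="2"/>
</plan>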

Chapter 5: Operators

CAPE already has a large number of operators. These operators most often resemble the operators used in a relational database system. Examples of such operators are: SELECT, JOIN, SUM, COUNT, PROJECT, etc. There are also some operators that are specific to CAPE, for instance operators that use a time window to select data from the queues, average a column of a tuple over time, and so on. However, there are a few operators used in the Linear Road benchmark that are not supported by CAPE. In particular, CAPE was missing certain arithmetic capabilities as well as windowed versions of DISTINCT and GROUPBY. The following sections describe why and how these operators were developed and added to CAPE.

Operators Required:

In order to execute the query plans, we had to determine what operators already existed in CAPE and what operators would need to be developed for CAPE. During the earlier phases of the project, a list of operators required by the Linear Road Benchmark was compiled. These included some arithmetic functions, count, partition by, and distinct, as well as some other operators. From that list, the missing operators were isolated for later development. To view the full list, see Appendix H.

WindowGroupby:

CAPE already has a group-by operator implemented; however, it was necessary to add a windowed version. A group-by operator by definition requires all tuples to be received before doing any calculations on the grouped buckets. In the case of streaming data, however, there is no definite end to the data, since data is constantly being received. The group-by operation would thus never be executed, because the operator would constantly be waiting for the “last” tuple and would never know when it will arrive. The window is necessary for any query to do a group-by on the fly. With a time window, the operator then has a known start and end to work with.

Figure 5.1: Group-by with a time window

The idea of a window for group-by is illustrated in Figure 5.1. The squares represent the tuples that are located within the time window w. They are grouped into their respective buckets, their color in this case, and the group-by can then evaluate the result from these buckets. The example shows a window group-by operator with a count grouping on the color. Since the group-by operator already has the function evaluation code implemented, extending the code to include a time window was fairly straightforward. The main new addition for the window group-by operator was the removal of expired tuples.

The Window Group-by operator is part of the WindowStreamOperator package. To use it, one must specify the group_pos, function, function_pos, window_size, and increment parameters. Here is a description of what each of them represents:

group_pos: Defines which position(s) of the tuples this operator will group by.

function: Defines which stream function will be applied to the tuples. These can be found in the streamoperators package.

function_pos: Defines which position the function will be performed on, i.e. doing a stream function MAX on the first column of the tuple.

window_size: Defines how large the time window of the operator is.

increment: Defines how often to evaluate the results to the output queue.

Here is an example query plan for a Window Group-by:
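The exact XML accepted by CAPE's query plan parser is not reproduced here; the fragment below is an illustrative sketch with made-up element names, assuming zero-based column positions and millisecond time values.

<operator class="WindowGroupBy">
  <parameter name="group_pos" value="1"/>                  <!-- group on the second column -->
  <parameter name="function" value="StreamFunctionCount"/>
  <parameter name="function_pos" value="0"/>               <!-- apply the count to the first column -->
  <parameter name="window_size" value="300000"/>           <!-- 5-minute window -->
  <parameter name="increment" value="1000"/>               <!-- emit results every second -->
</operator>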

In this example, the window group-by operator will group tuples together based on the second column and then perform the StreamFunctionCount on the first column, as specified by the function_pos attribute. It will then report each group with its count every second, over all tuples in the past 5 minutes.

WindowDistinct:

Similarly, a time window was also added for CAPE’s distinct operator. While it is possible to come up with a list of distinct tuples without receiving the last tuple, some queries require operations such as “report all cars on the highway in the past 30 seconds”. Implementing the time window allows the operator to know how far back to look for tuples for the distinct. Creating a time window for the distinct operator was very similar to the window for group-by. A new check is executed every time a tuple is received to ensure that all tuples in the distinct list are not only unique, but also within the specified time window.

It is notable that the distinct operation compares the entire tuple and does not compare only certain fields. For example, if the input stream generated two tuples with the same ID but different locations, both tuples would be included in the distinct list. To select distinct car IDs, one must first run the stream through a project operator that projects only car IDs before feeding it into the distinct operator.

The Window Distinct operator is part of the WindowStreamOperator package. To use it, one must specify the window_size and increment parameters. Here is a description of what each of them represents:

window_size: Defines how large the time window of the operator is.

increment: Defines how often to evaluate the results to the output queue.

Here is an example query plan for a Window Distinct:
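As before, the element names below are illustrative rather than CAPE's actual schema; the parameter values assume milliseconds and match the behavior described in the next paragraph.

<operator class="WindowDistinct">
  <parameter name="window_size" value="30000"/>   <!-- keep tuples from the past 30 seconds -->
  <parameter name="increment" value="10000"/>     <!-- report the distinct list every 10 seconds -->
</operator>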

In this example, the distinct operator will report, every 10 seconds, all distinct tuples seen on the input stream in the past 30 seconds and output them to the output stream.

StreamProjectFunction:

The project operator already existed in CAPE, but was missing some functionality. A project is simply an echo of a subset of the fields in the input tuple. Tuples may have many fields, but for a specific result not all fields might be required to be returned. So, a project is used to send just the fields that are wanted through to the next operator or to the output. For example, if a tuple contains fields for time, car ID, speed, and location, but a simple report of car IDs and their speeds is wanted, without worrying about location, then a project could echo just the car ID and speed fields.

However, the project operator was missing the option of applying functions to the projected data. For example, in a stream of data that includes attributes for length and width measurements expressed in feet, one may wish to project these measurements in different metric units, such as meters. This would require division, or perhaps multiplication, by a constant. This functionality was added to CAPE as well as the ability to perform arithmetic between attributes (length times width to project area would be possible in the given example).

This new operator became known as StreamProjectFunctionOperator in CAPE. It not only required an extension of the StreamProjectOperator code, but also the expansion of the functionality for parsing the now extended expressions. A system was in place already for returning true or false given an equality operator between attribute data, so this code was now extended to also parse arithmetic expressions. This allowed for addition, subtraction, multiplication, and division either between attribute data and a constant or between two attribute data values. The extension of the code also allowed for arithmetic functions between results of previously computed expressions, thereby allowing for complex expressions.

To make this possible, a new tag was created that would be recognized by the XML parser. This was the expression tag. Within an open and close expression tag, lines can be included to indicate some arithmetic that should be performed. Each line either defines an operand or indicates an operation that should be performed. Operands can be defined as columns, constants, or results. A column is a field from a tuple, a constant is a number, and a result is the output from an operation specified previously. The operations allowed are addition, subtraction, multiplication, and division. An operation can only occur between two operands. To form complex expressions, subsequent lines must perform arithmetic between a result and a field, constant, or other result. The project can then echo results and/or unmodified columns from the data stream.

Figure 5.2 depicts an example query plan for a Stream Project Function.
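Since the original figure is not reproduced here, the following sketch shows roughly what such a fragment might look like; the element and attribute names are illustrative, and the column positions and constant follow the description given below the caption.

<expression>
  <operand id="1" type="column" position="4"/>            <!-- segment field -->
  <operand id="2" type="constant" value="5200"/>
  <operation id="3" op="divide" left="1" right="2"/>       <!-- segment / 5200 -->
  <operand id="4" type="column" position="5"/>             <!-- x-pos field -->
  <operation id="5" op="divide" left="4" right="2"/>       <!-- x-pos / 5200 -->
  <operation id="6" op="multiply" left="3" right="5"/>     <!-- multiply the two quotients -->
</expression>
<project>
  <column position="1"/>                                   <!-- car ID (position assumed) -->
  <result id="6"/>
</project>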

Figure 5.2: XML query plan fragment for a StreamProjectFunction operator.

In this example, two fields, segment and x-pos, are each divided by the constant 5200. It is set up in another file that segment is at position four and x-pos is at position five. The two quotients are then multiplied together and projected along with the car ID. The first line specifies the first field, segment, to be used, while the second line declares the constant 5200. The fourth line specifies the second field to be used, x-pos. Lines three and five divide the first and second fields, respectively, by the constant. Line six multiplies the two results together. The result of line six is projected along with the car ID.

Chapter 6: Graphical User Interface

Currently, CAPE offers several graphical interfaces. These are the CAPE GUI, the CAPE Initial GUI, and the Query Result Window. The first part of this section deals with describing the capabilities and limitations of each of these interfaces. The second part will describe the needs the Linear Roads GUI must meet. Finally, the last section will present the development and final implementation of the Linear Roads GUI.

Section 1: CAPE Interfaces

The first of the three major interfaces is the CAPE GUI itself. Its purpose is to display the queues being queried by the Queue Manager and show tuples moving through the system. It does this by displaying a query graph, allowing the user to look at certain data while it passes through the queues.

Second, there is a CAPE Initial GUI. This interface is designed to allow a user to specify the query plan and stream source to be used when running a query. Such an ability to specify different query plans is crucial because the Linear Roads system will need to run queries chosen from a list of preset queries.

Lastly, and connected to the previous interface, is the Query Result Window.[18] Although this is not so much an interface per se, it is a good example of how to access and display query results. The difference between this and the CAPE GUI is that this window is at the Application Level of the CAPE system, where anyone writing their own application using CAPE would write their software. It is independent of the API of the Query Engine, Stream Generator, and other system components.

It works by listening to a port to which the results from a query run in the Query Processor are sent. When it gets tuples from the stream, it grabs them and outputs them in the window. They are also written to an Excel file.

Linear Roads Application

Now that the proper additions have been made to the CAPE system, an application can be created that will utilize its streaming capabilities. Our application is a real-time traffic monitoring application that runs on data generated by the data generator from the Linear Roads Benchmark. The first thing to look at in planning the structure of the application is what functionality the software must have, as well as any limitations CAPE might have with respect to the particular queries included in the Linear Roads Benchmark. The second decision to be made involves the way CAPE will integrate with the application. Since the application will be a test of what CAPE can provide for real-time applications in the future, this integration should ideally be one with some level of abstraction, one which future applications can mimic. Once these decisions are made, the complete GUI software model can be developed, and the coding can begin.

Section 2: Architecture Requirements

As mentioned earlier, the way CAPE will integrate with this application is very important. There are two ways the traffic monitor can interact with CAPE. The first option is for the application to interface directly with the Query Processor part of CAPE, as had been done for the query plan monitoring GUI. This would allow the application to know exactly how the tuples are being processed, and would give the application more access for changing and updating queries.

The other option is to develop the application completely separately from the CAPE system, and position it at the afore-mentioned Application Level of the CAPE system. What this actually means is that there is a layer of abstraction such that the only interactions the two systems have are that the application can start up the CAPE system with certain parameters (configuration files) and that it listens to a specified port for query results.

Figure 6.1: Application Integration with CAPE Options

The second option is the more practical one for a number of reasons. Most importantly, any good piece of software should not be hard-coded against the system it uses, whether that system is a simple SQL database or some other application interface. Also, CAPE is a fairly complex system. In the future, applications that use it will not necessarily know exactly how CAPE works or how it handles query plans, input data, etc. Therefore, since this application’s goal is to prove the ability of CAPE to integrate and work with many types of applications, this initial software should set the groundwork for future integration standards.

Functionality Requirements

There are a number of requirements that a real-time traffic monitoring system needs to fulfill in order to be deemed successful. These requirements depend mostly on what an end user of such a system in a real world situation would actually benefit from. There are two main categories of requirements: query specification and result display requirements. Each of these categories involves several aspects that must be considered, such as Human Computer Interaction characteristics, CAPE interaction specifications, and overall function.

The basic flow of this application is shown in Figure 6.2:

Figure 6.2: Highway Monitor Sequence Diagram

As the figure illustrates, the Linear Roads Application is responsible for starting the result window. However, before it does this, the user chooses what query they would like to run on the traffic data. Depending on what the user is expected to know about queries, CAPE, and traffic data, they could input these queries in a number of ways. One option is for the user to type in a query they think might be valid. This raises many concerns. First and foremost, there is no way to translate either from SQL or any other high-level query language into the query plans which CAPE requires. The user most likely will not know exactly which operators are available for them to use. There is a good chance they could make a mistake and enter a query that will not produce valid results. This option affords too much chance for user error, and it also requires a system for translating high-level queries into valid CAPE query plans.

Another option is for the user to create a query plan in the form of an XML file and select that configuration file to run. However, this option requires the user to have a good deal of previous knowledge of CAPE and XML, as well as about the format in which queries must be written, what operators are available, and what specifications are necessary for the application to receive accurate result data. Since the goal is an application that might be used in a real scenario, this option is not feasible.

The last option is to provide the user with a list of available queries which are tested and proven to provide useful application-level services. This option eliminates the possibility of user error and ensures that users are provided with the results they are looking for. Each query can be directly associated with the appropriate query plan and matching configuration file that CAPE needs to run the query. They can also be linked to descriptions of the query results, so that users have an idea of what the results will actually show them.

The second area of requirements for the Highway Monitor deals with how it will display the streaming results provided by the CAPE query engine. Using the query specification method above, it is a given that the results are accurate. The issue is deciding what a user would expect to see when viewing live highway data. In addition, displays that can represent results in a useful and unique way are also desirable. Multiple types of displays were considered for this application, but only a few were chosen. In the first phase of development, actual maps were considered. The issue was that, since the data being used was simulated traffic data for linear highways with perfectly regular lengths and exit intervals, such a map was impossible to find. In addition to this problem, representing cars on the road in a meaningful way was somewhat ambiguous. For instance, should every car be represented? Since each car only reports every thirty seconds, the display might appear cluttered and unintuitive. Most users would expect to view cars as they are continuously traveling, not jumping around the screen.

A solution to this issue was to create simulated highway maps with lanes and segments corresponding to the stream generator schema. For instance, each segment can be color-coded according to the query results for cars in that particular segment. As a quick example, imagine a user wants to find areas that have heavy traffic flow. They can simply run a query asking for the volume of traffic in each segment or each highway. Then, as results stream in, the user can easily pick out the colored areas that correspond to the highest traffic volume ranges. The actual implementation of this display will be described later in the chapter.

The CAPE system contains a package for graphing data called PtPlot.[19] This package particularly lends itself to CAPE because it has the capability to graph data dynamically, updating the graph whenever it receives more points to plot. However, since only the query plan monitor application had previously utilized CAPE’s dynamic result stream, this feature had never been fully tested. It turns out that this graphing package was an especially good choice for several reasons. In addition to having the ability to plot live data, PtPlot allows the graph to be resized and labeled dynamically, depending on what query results need to be displayed. Other more subtle features will be described in Section 3 of this chapter.

The last display a user might find useful is a data table which dynamically updates the results of the query they have specified. Although the overall visual appearance of the table is not as impressive, the actual data could be useful depending on what the user was looking for. If they wanted to look for particular car speeds or other raw numerical data, this simple display certainly meets that need.

Handling Location Reports

Each query in this application uses the same format to produce results. Result tuples always arrive with two columns, the first containing the combined location report and car ID, and the second containing the result of the query. The location report is the information giving the particular highway, lane, direction, and segment corresponding to each car. This is because the GroupBy operator, which every query plan in this application uses, only provides two columns in each tuple. Thus, the location report and car ID must be combined (i.e. concatenated), and then parsed by the result window part of the application. The following is a description of how the location report is parsed.

The first column contains a numerical value which encodes the car ID, highway number (0-2), lane number (0-4), direction number (0 or 1), and segment number (0-99). To parse this, the following steps are executed:

// Dividing by 100,000 yields the car ID; the remaining five digits encode the
// highway (ten-thousands digit), lane (thousands digit), direction (hundreds digit),
// and segment (last two digits).
double temp = /* value in column 1 */;
int carid = (int) (temp / 100000);
int hwy, lane, dir, seg;

if (temp >= 10000) { hwy = (int) ((temp - (carid * 100000)) / 10000); }
else { hwy = 0; }

if (temp >= 1000) { lane = (int) ((temp - (carid * 100000) - (hwy * 10000)) / 1000); }
else { lane = 0; }

if (temp >= 100) { dir = (int) ((temp - (carid * 100000) - (hwy * 10000) - (lane * 1000)) / 100); }
else { dir = 0; }

seg = (int) (temp - (carid * 100000) - (hwy * 10000) - (lane * 1000) - (dir * 100));

For example, if the first column contains the value 34603154.0, the number divided by 100,000 yields a car id of 346, a highway number of 0, a lane of 3, a direction of 1, and a segment number of 54. This data is then used by the display to assign the value of the second column in the tuple to the correct segment on the graph, highway simulation, or data table.

GUI Software Architecture

The Highway Monitor application contains two primary elements: the query selection window and the result display window. Each one of these windows is a completely separate and standalone window. This is for several reasons. Originally they were both included in the same window, just in separate panels. When the result panel updates, it becomes frozen for a second or so while it refreshes the graphical display. If the user wants to select some other query to run in that time span, the query selection panel is also refreshing. This makes it difficult for the user to select a new query or display. Also, although CAPE does not allow multiple queries to be run simultaneously, the dual-window application will support such functionality, should it ever be added to CAPE. This is possible because the result window is run in a separate thread from the main application window, and simply listens to the output stream from the query engine. The thread that the second window runs in is controlled by the application in the first window.

One of the class structures developed for this application is the GeneralQueries class. This class allows the application to create a query object. This object is very useful because it assigns a specific query name and description to a query object. The most useful part, however, is that it allows this specific query information to be linked to the XML configuration files for that specific query plan. When the query is selected from the list, the proper files can be loaded and run with the Stream Generator and Query Processor. This makes entering new queries into the application quite easy. All that is required is to create a new GeneralQueries object, set its fields, and add the query name to a list. The rest is handled by the program itself.

There were some limitations of the CAPE system that affected the way this application could operate. First, only one query can be run at a time, and once a query is running, it cannot be stopped. This means that a user can only view the results of one query at a time. This application works around this limitation by starting the CAPE system when the user tells a query to run. However, should concurrent queries be supported by CAPE in the future, this application is designed in such a way that, as long as the results from all the queries end up coming out of the same result stream, the result display will automatically display the results. If they come out in different result streams, then each display window can have its own separate input stream, and all will display results simultaneously.

Section 3: Implementation Details of the Main Process Flow

Here is an example of what actually occurs when the application is run. When the user opens the application, the main function of the program creates a new LinearRoadsGUI object and initializes it. This causes the GUI to appear on the screen and populate all of the selection lists with their proper elements. The GUI is displayed in Figure 6.3. When the user clicks on a query from the first list, global variables in the LinearRoadsGUI class are assigned values depending on what query object has been selected. These variables determine which query plan files to start the Stream Generator and CAPE Standard Run processes with. They also provide the proper query description to be filled into the box below the list of queries.

Initial Linear Roads GUI

Figure 6.3: First GUI Window

When the Run Query button is pressed, the application starts up three processes, passing them the appropriate configuration files provided by the query objects. These processes open in separate system windows, which should be minimized to the task bar. After these processes are started, the system creates a socket which is bound to the port that the query results will be routed to. This socket, the query name, and the display type are all passed to the new result window. This result window is actually a thread of a Thread-extending class called ResultWindow, started by the main Linear Roads Application. The result window appears to the right of the initial window, with the appropriate display type label. The corresponding query name and any other pertinent information are displayed in this window as well. Depending on the display type selected, the user will see either a graph in the form of an updating bar chart, a simulated highway display, or a result table.

Dynamic Graphical Display

The graph is a PtPlot[20] graph, with the variable bars set to true, or “on”. This creates a better visual comparison between highway segments than simple points. Points appear randomly as the data streams in, so seeing a bar height is more intuitive and visually stimulating than watching dots appear. The graph is created using two classes. The first is PlotResults. This class creates the grid and axis labels. Once the size of the graph is set, it creates a new ResultPlotter, which is the actual live plotting application. This ResultPlotter class is simply an extension of the PlotLive class, with a special overridden method addPoints(), which is the method that continuously updates the graph as new streaming results arrive. Some of the addPoints() method is included here:

public void addPoints() {
    int tupleCount = 0;
    lastTime = System.currentTimeMillis();
    boolean enable = true;
    boolean haveSchema = false;
    int hold = 0;
    double temp;
    String identifier;
    String queryName;
    XATTuple t;
    try {
        while (enable) {
            if (System.currentTimeMillis() - waitTime > 3000) {
                if (System.currentTimeMillis() - lastTime > 15000) {
                    lastTime = System.currentTimeMillis();
                    System.out.println("Received a total of: " + tupleCount + " tuples.");
                }
                // Receive the new packets
                Object o = in.readObject();
                System.out.println("incoming: " + in.toString());
                NetworkPacket np = (NetworkPacket) o;
                Packet p = null;
                for (int j = 0; j < np.size(); j++) {
                    p = np.getPacket(j);
                    if (p.getControlMessage().equals("schema") && !haveSchema) {
                        System.out.println("Received The Schema " + p.getObject().toString());
                        haveSchema = true;
                    }
                    else if (p.getControlMessage().equals("schema")) {
                        // Schema already received; ignore duplicate schema packets
                    }
                    else if (p.getControlMessage().compareTo("newquery") == 0) {
                        identifier = p.getIdentifier();
                        queryName = (String) p.getObject();
                    }
                    // Take the tuple and show the result
                    else {
                        t = (XATTuple) p.getObject();
                        tupleCount++;
                        temp = t.getCell(0).convertToDouble().doubleValue();
                        if (temp >= 0) {
                            // Decode the combined location report (see above)
                            carid = (int) (temp / 100000);
                            if (temp >= 10000) { hwy = (int) ((temp - (carid * 100000)) / 10000); }
                            else { hwy = 0; }
                            if (temp >= 1000) { lane = (int) ((temp - (carid * 100000) - (hwy * 10000)) / 1000); }
                            else { lane = 0; }
                            if (temp >= 100) { dir = (int) ((temp - (carid * 100000) - (hwy * 10000) - (lane * 1000)) / 100); }
                            else { dir = 0; }
                            segment = (int) (temp - (carid * 100000) - (hwy * 10000) - (lane * 1000) - (dir * 100));
                            tableSeg = segment / 2;
                            // Grow the y-axis if a result exceeds the current grid height
                            if (gridHeight < t.getCell(1).convertToDouble().doubleValue()) {
                                gridHeight = t.getCell(1).convertToDouble().doubleValue();
                                setYRange(0, gridHeight + 1);
                            }
                            // Depending on which highway the segment is in, display it as a
                            // different color bar on the graph.
                            if (hwy == 0) {
                                addPoint(0, new Double(segment).doubleValue(), t.getCell(1).convertToDouble().doubleValue(), false);
                            }
                            else if (hwy == 1) {
                                addPoint(1, new Double(segment).doubleValue(), t.getCell(1).convertToDouble().doubleValue(), false);
                            }
                            else {
                                addPoint(2, new Double(segment).doubleValue(), t.getCell(1).convertToDouble().doubleValue(), false);
                            }
                            this.updateUI();
                        }

… the rest of the code is included in Appendix K. An example of results from the query Volume Per Segment is displayed in the graph window in Figure 6.4:

Result Window – Graph Display

Figure 6.4: Volume Per Segment Graph Results

Here, each bar represents a different segment on a highway. The color of each bar distinguishes which highway that segment is in. The height of the bar in this example represents the actual number of cars in that highway segment. Also, notice that both the axes and graph title correspond with the current query.

Simulated Highway Display

A second display for this application is more specific to the Linear Roads highway system. It displays a simulated highway for each highway in the system, each one with its own lanes and segments. Each simulated highway display is actually a set of JTables. Depending on how many lanes and highways are specified in the data generator, tables can easily be added, removed, or updated (changing the number of rows, columns, etc.) to reflect these changes. For each tuple the HighwayDisplay receives, for example results from the Average Speed Per Segment query, it updates the color of the corresponding segment. Since volume and speed produce different ranges of values for each segment, each category of query has its own class that determines what result values should be represented by different colors. For instance, since volume is usually low (between 1 and 15 cars per segment), there are more colors toward the lower values, and fewer to represent higher volumes. A segment with a volume of 2 is a different color than one with 5, and one with 10 is a different color still. A segment with a volume of 31 is the same color as one with 40. However, since most segments seldom have more than 20 cars, this scale provides a reasonably accurate reflection of the true volumes of the segments. On the other hand, since car speeds vary greatly, the distribution of value ranges for colors in this category is much more evenly spread out. This also results in a meaningful representation of the actual speeds of cars in each segment of the highways.

As demonstrated by Figure 6.5, the three highways, complete with lanes and directions, are represented by three JTables. The colored cells actually update every five seconds, but can update more or less frequently, as specified in the query plan. To change this, simply change the increment variable, as displayed here:
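The original snippet is not reproduced here; a sketch of the relevant query plan fragment, with illustrative element names and millisecond values, would look something like this:

<parameter name="window_size" value="300000"/>   <!-- window size used by the query (illustrative) -->
<parameter name="increment" value="5000"/>       <!-- refresh results every 5 seconds; raise or lower to change the update rate -->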

out.dat");

@f0=;

@f1=;

@f2=;

$i0=0;

$i1=0;

$i2=0;

$s0=@f0;

$s1=@f1;

$s2=@f2;
