Solar Thermal Plant Design and Operational Suite of Tools



Data Resolution and Integration Component

Project Report

Version

Faculty Advisor

Dr. Kwok-Bun Yue

Mentor

Mr. Todd Guillory

[

Team Members

Deo Abhisheik

Gowda Sanjeev

Mehendale Amruta

Paniwala Aiyaz

Acknowledgement

We would like to thank our instructor Dr. Kwok Bun Yue and mentor Mr. Todd Guillory for their support and guidance throughout the semester. We would also like to thank the people who helped us in this project and whom we might not have mentioned here.

Abstract

Engineers of Chevron Information Technology corporation need to work with data derived from multiple data sources, which may be SQL databases, Access, Excel or Comma Separated Value (CSV) plain files. This project aims to build a tool for them to visually view rows from various data sources and select appropriate rows to be included in the resultant datasets in different formats, such as Excel, CSV, etc. The tool is called the Data Integration and Resolution (DI&R) Component Application. DI&R Component Application is a new, self-contained product, which deals with the scenario where the user is required to work with data distributed across different data sources. The application enables the user to view, compare, integrate data and build a trusted resultant dataset. DI&R Component Application can handle more than two data sources simultaneously. The data sources selected for comparison have the same relational schema and the tables do not have any child parent relationships.

Table of Contents

1 Introduction 1

2 Building DI&R Component Application 1

2.1 Requirements Phase 3

2.2 Design Phase 7

2.3 Implementation Phase 8

2.4 Verification Phase 12

3 Product Features 13

3.1 Select the data source 13

3.2 Data source of type flat file 14

3.3 Data source of type Access 15

3.4 Data source of type Excel 16

3.5 Select data rows 18

3.6 Clear Selection 18

3.7 Export Data 20

4 Technologies Selected 24

5 Implementation Issues 24

5.1 Exporting the data to Access files 25

5.2 Exporting data to Excel files: 25

5.3 Use of delimiters in flat files 26

6 Conclusion 26

7 References 27

8 Appendix A: Team Information and Roles 28

9 Appendix B: Task Distribution 31

10 Appendix C: Project Timeline 34

11 Appendix D: Acronyms and Abbreviations 36

Project Report

Introduction

The application is developed for Chevron Information Technology Corporation, Global Consulting & Applications Solutions/Data Access/Exploration. The department deals with a huge data repository which is of different structures distributed across different data stores. DI&R Component Application helps the user to integrate data from different sources and perform visual comparisons of the data and manually build a trusted dataset or decide which data elements they would like to keep for a new, resolved version. The application deals with the scenario in which the customer may have to access data of different structures distributed across different data stores.

Building DI&R Component Application

The main focus of the project is to develop an application capable of extracting data from different data sources and storing it to a desired destination data file. Most of this data is gathered by the geophysicists working at Chevron plants. The problem is that, while gathering this data geophysicists store it in various kinds of data sources like Excel sheets, Flat Files and MS Access Database. Most of the data gathered is same, but the useful data gets distributed across different data sources. Many times this information is shared between, compared and analyzed by them to get resultant useful and consistent data. Due to the use of different data sources, a custom tool was required to assist the analyst to extract the meaningful data from these sources, integrate it and store the resolved version for further usage. This application will improve the productivity and solve the problem of manually merging data from all the different data sources at Chevron. The requirements of the project are subject to change and might be modified depending on the use of the application. While developing this application the team had to ensure that it was broken down into separate components. This would allow future enhancements to be developed by replacing the components as and when required.

The team has followed the waterfall model for the product development. The waterfall model is a sequential software development model (a process for the creation of software) in which development is seen as flowing steadily downwards (like a waterfall) through the phases of requirements analysis, design, implementation, testing (validation), integration, and maintenance.

[pic]

Fig 1 Waterfall Model

1 Requirements Phase

This was the initial phase of the project, where in we gathered the requirements for the project, what exactly needed to be done, what were the resources to be used, what was the output the project had to produce. In order to gather the requirements we conducted meetings with our mentor, his colleague in chevron and our instructor. The requirement helped us identify the basic application features like opening the different data sources, selection of the data from these opened sources and the export of the final result. This phase had to be repeated again in the later stages of this project as the client made many more considerations after the first draft of the implementation. However, the basic requirements still remain valid in view of these new considerations.

The major requirements as identified during the initial mentor meetings were:

1. To develop a tool capable of displaying data from data sources like MS Access 2003, MS Excel 2003 and flat files containing delimiters.

2. The tool should display the data such that the user can differentiate the data from different data sources and select the required data from each of them.

3. It should allow the user to then merge the selected data and export the merged data to another destination data file.

4. The basic merge operation involves the user simply choosing themselves which rows from the data sources are required and then storing them to a resolved version

5. The user can also merge data using a join operation between the different data sources by selecting the candidate key with which the join may be performed. This candidate key may or may not be the primary key of an existing data source.

6. The tool should provide visual cues for matching data with the help of which user can know to what extent data from one data source matches that from another.

During the corresponding discussions and meetings the complex issues identified by the team in meeting these requirements were:

1. Some of the data sources like Flat files and excel sheets do not have any way of specifying the primary keys. Initially it was determined that the first row will be considered as a primary key for such data sources. In the subsequent discussions it was decided that if at all a primary key is required it will be solely specified by the user.

2. The data sources may have different structures and formats with different names for the same columns, with same columns in different positions or different number of columns. This affects the merging of data as there will be a different outcome in each case. Initially it was decided to consider that all the data sources will be of same structure and format and no such conflicts exist.

3. There may be different possible candidate keys for different data sources depending on their structure. This would have to be considered while performing join operations.

4. The exportable versions may contain duplicate records or records with same primary key. The exported version should ideally contain records that are unique and free from conflicts that may result in errors. It would be left up to the user for now to decide if they want to delete duplicate records.

To solve each of these problems it was decided that an initial prototype will be developed that would be able to meet the requirements that are not affected directly by the complex issues identified so far. Depending on the feedback from the end user about the working of the prototype the complex issues will be dealt with one by one and resolved.

The major requirements that were then required to be fulfilled by the prototype are:

1. The application should be capable of displaying data from data sources like MS Access 2003, MS Excel 2003 and flat files containing delimiters.

2. It should display the data such that the user can differentiate the data from different data sources and select the required data from each of them.

3. The user should be able to select the rows from the data sources that are opened and subsequently merge them into a resolved version. Here the merging takes place by manually selecting the rows from each of the data sources and without consideration for the existing candidate keys of each of the data sources.

4. The final resolved version can be exported to a destination file which will be an MS Excel sheet.

[pic]

Fig 2 Use Case Diagram

2 Design Phase

The next stage in the development of this project was the design based on the requirements we had gathered initially. The first step of this phase was constructing a use case diagram that met the basic scenario solved by the application. Once the use case diagram was confirmed to meet the requirements identified so far we could move to the next stage involving the construction of the architecture of the application.

In order to keep the project flexible for changes in requirements we followed the M-V-C architecture where M stands for Model, V stands for View and C stands for Controller. Furthermore each of the entities Model, View and Controllers were split into components which gave more flexibility and helped in integrating the work done by every individual of the team. As a result we came up with an architecture diagram which consists of View built with Windows based user forms, a Controller with the logic to control client GUI and the database operations and a Model which consisted of the database connectors for the different data sources.

Fig 3 Architecture Diagram following MVC Pattern

3 Implementation Phase

After successfully completing the design phase, with a clearly designed architecture of the application, we distributed the work and started to code the implementation. Several classes belonging to the different components of the architecture were identified in a class diagram. The classes were distributed for implementation and the work done by the team was integrated into one whole project. The project is divided into two modules. First part of the class diagram consists of classes which form the database connectivity. Second part of the class diagram describes the user interface.

[pic]

Fig 4 Class Diagram showing the Business Logic

The description of the classes in the above diagram is as follows:

• BaseConnector.cs: This is the abstract class which contains abstract and non abstract methods which help in connecting to and retrieving data from the various data sources

• ExcelConnector.cs: This class extends the BaseConnector class for connecting to, retrieving the data and storing data specific to an MS Excel 2003 spreadsheet.

• AccessConnector.cs: This class also extends the BaseConnector class for connecting to, retrieving the data and storing data specific to MS Access 2003 database.

• FlatFile.cs: This class also extends the BaseConnector class for connecting to, retrieving the data and storing data specific to MS Access 2003 database.

• DataBaseLogic.cs: This class takes care of the business logic part of the project. It takes input from the UI and based on data source selected will invoke the right class to deal with that type of data source.

[pic]

Fig 5 Class Diagram for the User Interface

The description of the classes which respond and process the user input is as follows:

• frmMainForm.cs: This is the MDI parent form which provides the user all the options from selecting the data source to exporting the resolved data to a destination file.

• frmDataTables.cs.

This form has a method which lists the tables and returns the text on all the forms which will help in differentiating the names of the tables.

• frmSelectDataSource.cs: This windows form gives the user the option to specify the type of data source from MS Access 2003, MS Excel 2003 or a Flat File. The form also lets the user select the path to the source data; selecting the table in case of MS Access, sheet in case of MS Excel 2003 and the delimiter character in case of a flat file. The user can then click on the “Connect” button to view the data present in the data source selected.

• frmSelectDataTables.cs: This form gives the list of data sources which are currently being viewed by the user and lets the user check/uncheck the data sources from which he can preview the rows selected.

• frmDataTables.cs: This form displays the data contained in the data source selected by the user in frmSelectDataSource.cs form.

• frmExportDataPreview.cs: This form displays the resolved data as selected by the user the frmSelectDataTables.cs form. Here the user can also select multiple rows and delete rows if he/she does not want them to be exported in the final resolved table. Finally the user can click on “Export Data” button, specify the destination file name and export the resolved data.

4 Verification Phase

The last phase of the project was to test and validate the code to see if there were any unhandled errors or exceptions due to improper user input. The team successfully tested the code using manual testing to achieve it. Some potential problems were identified and enlisted as part of this phase. They are as follows:

1. If the data source files are password protected the application may not be able to open the files. In such a case the application will give an error.

2. Flat files may contain special characters and the application may have a problem if the content contains the same character which is used for delimitation. In this case the user will have to change the delimiter for the file.

Product Features

This section discusses the various features provided by DI&R starting from data source selection to export.

1 Select the data source

Our prototype system allows the user to specify the location of the data source, which can be a delimiter-separated flat file, an Excel spreadsheet or an Access database. The user is then prompted to provide the parameters depending on the type of data source selected.

[pic]

Fig 6 Select the type of the data source

3 Data source of type flat file

For a flat file the user is prompted to specify the location of the data source file and the delimiter used to separate the fields within a record. The user can also specify whether the first row in the file has the column headers. Each record is assumed to be stored in one line by itself. The user can specify the delimiter and whether the first row in the flat file contains the column headers.

[pic]

Fig 7 Select a flat file as a data source

4 Data source of type Access

For Access database the user is prompted to specify the location of the data source file. All the tables in the selected database file are then displayed and the user can select one data table at a time as the data source and can view the data present in that table. The primary key of the selected table is displayed to the user.

[pic]

Fig 8 Select Access Data Source

5 Data source of type Excel

For Excel file the user is prompted to specify the location of the data source file. All the worksheets in the Excel file are displayed and the user can select one of the worksheets as the data source and can view the data present in that worksheet. User will have to specify whether the first row in the file is the column header. If multiple sheets are present in single Excel file, the list is displayed for user to select any one sheet. User can specify if the first row contains column headers

[pic]

Fig 9 Select Excel Data Source

6 Select data rows

The user can select rows from the various data sources. The user can toggle on the selection of rows with a mouse click.

[pic]

Fig 10 Select data rows from open data sources

7 Clear Selection

The user can select or unselect rows by clicking on the rows.

Selected rows in a data grid can be unselected by clicking the “Clear Selections” button present in the respective grid form. A global button is provided to clear the selected rows in all the open data sources.

8 Export Data

The user selects Data Operations>> Export Data Preview from the menu.

[pic]

Fig 11 Select Data Operations >> Export Data Preview

A list of the open data sources is displayed. The user can check/uncheck the data sources he wishes to preview before exporting.

[pic]

Fig 12 Select data sources for export preview

The data rows selected from the above checked tables will be displayed for export preview.

[pic]

Fig 13 Final preview of export data feature

Finally the user can export the finalized selection to a desired Excel file. The final preview can be exported to an Excel file at any desired location. The user can also delete unwanted rows. On deletion of a row the preview is refreshed.

[pic]

Fig 14 Select destination to export data

[pic]

Fig 15 Export data successful

Technologies Selected

Following the requirements of the mentor, we developed our prototype in .NET 2.0 and C# using Visual Studio 2005 under Windows XP SP2.

Implementation Issues

The implementation issues encountered by the team were mostly related to the limitations of the data sources and the ambiguity resulting from these limitations. As a result we have identified two problems we had while implementing the export feature and opening a flat file.

1 Exporting the data to Access files

During the testing of the product it was discovered that queries used for creating tables in an access database file are limited by length. When a data set with limited number of columns with short names is exported, the query gets executed and the resultant dataset is stored as a table in the Access database file. However, if a dataset with many columns and/or having columns with long names is being exported, the database throws an exception for a query with a longer length. In this case, the export feature of the application will fail resulting in an error.

Solution:

The problem being an inherent part of Access database, the team discussed with the Mentor and reached the conclusion that it is not feasible to include exporting of data to Access files in the application. The use of data sources with many columns, by the client, makes it even more practical to keep this feature pending till an alternative is provided in Access for using larger queries.

2 Exporting data to Excel files:

While exporting the data from the resolved dataset to an Excel by creating a table and inserting the values row by row the application was generating an exception that the file is being used by another process.

Solution:

This problem was solved by using the WriteXml () method of the System.Data.DataSet class which writes the contents of the dataset as an XML file. This XML file can be saved with the Excel Sheet file extension.

3 Use of delimiters in flat files

The delimiters in flat files may get repeated as special characters or there may be an error in the input from user for the delimiter to be used while opening the flat file data. In either case the application will be unable to process the data from flat file properly.

Solution:

Since this problem is related to formatting of the flat file itself and the users choice for the delimiter, the application will process the flat file with the consideration that the user has provided the correct delimiter for the properly formatted file. If any problems are encountered, the application will stop the processing of the flat file and report an error to user. The user is prompted to provide a file that contains the exact specified delimiter, removed if present in the data.

Conclusion

In this project, the application developed accesses data from different data sources and exports it into an excel file. During this operation the user can merge two or more tables and create a new set of data to export. A well defined, Model-View-Controller architecture is followed in the development of the application. Also, each entity like Model, View and Controller is split into components. This helped the team to cope up with the changes in requirements and to integrate the work done by other team mates. Good coding standards are followed while writing the code along with proper documentation, which helped other project mates to understand the scope, type and reference of the variables declared and also understand the flow of the application.

The application has a well defined user interface which can be used by a user who is not a professional. The buttons and menus used are self explanatory and simple colors are used, to make the application legible to all users.

SRS and meeting minutes were updated and documented when ever needed, it helped the team to keep track of the work and changes required in the application.

The team has successfully completed the requirements for the project. The further enhancement of the application is documented and discussed in SRS document.

References

[1] Wikipedia



[2] Documentation



[3] MSDN forums



Appendix A: Team Information and Roles

Amruta Mehendale - Team Leader, Programmer, Technical Writer

❖ Team Leader

Being the team leader her role was to identify the project tasks and distribute the tasks evenly among the team, organize and monitor team meetings and communicate with the instructor and mentor regarding the progress of the project.

❖ Programmer

In the role of a programmer the tasks involved were to setup the environment required to develop, run and test the application, coordinate with other programmers in the team and develop the modules assigned. Testing and debugging was also done in coordination with other team members. She was mainly involved in the development of the user interface and worked on the forms like frmMainForm.cs, frmSelectDataSource.cs, frmSelectDataTables.cs, frmDataTables.cs

❖ Technical Writer

As a technical writer she has worked on documenting the requirements, design and implementation of the application. Communication with all members of the team as well as the team mentor was done and the information provided by them was used to detail the tasks performed.

Abhisheik Deo - Architectural Designer, Programmer, DBA

❖ Architectural Designer

He was involved in the architectural design of the project and used the set of requirements gathered by the team and developed models and diagrams for designing the architecture of the application. He designed the structure of classes for the business logic and user interface. He coordinated with the team leader to ensure that the implementation of the application confirms to its architecture.

❖ Programmer

On the programming side, he was involved in developing the business logic. He mainly worked in the classes of BaseConnector.cs, ExcelConnector.cs and AccessConnector.cs

❖ DBA

We have used an already existing public member’s database from Indian Students Association (ISA). This database was required to be cleaned up and formatted as per the test cases designed to execute the project. The role of the DBA is limited in our project as there was no requirement to design and maintain a database.

Sanjeev Gowda - Architectural Designer, Programmer, Technical Writer

❖ Architectural Designer

He was also involved in the architecture design of the project. He used the set of requirements gathered by the team and developed the high level architecture diagram and use case diagram.

❖ Programmer

On the programming side, he was involved in developing the business logic. He mainly worked in the classes of FlatFile.cs and DataBaseLogic.cs

❖ Technical Writer:

As a technical writer he has worked on documenting the requirements of the project, Waterfall model and its phases.

Aiyaz Paniwala - Web Master, Programmer, Technical Writer.

❖ Webmaster

He was the webmaster for the project. He created and maintained the team website. The website can be viewed at

His tasks also involved uploading all the project documents on the website and maintaining the versions of these documents.

❖ Programmer

In the role of a programmer he was involved in the development of the user interface and worked on the following forms like coordination with the team leader: frmMainForm.cs, frmSelectDataSource.cs, frmSelectDataTables.cs, frmDataTables.cs.

❖ Technical Writer:

As a technical writer he has worked on documenting the requirements of the project and the first version of the project report.

Appendix B: Task Distribution

The project was divided into solid tasks depending upon the phase of development. The tasks were then assigned to the team members depending upon the skill sets possessed by each member.

|Task |Description |Individual Contribution in % |

| | |Amruta |Abhishek |Aiyaz |Sanjeev |

|Team setup |Forming the team and gathering initial |40 |25 |15 |20 |

| |information about the project. | | | | |

|Project website |Create and update the project website. |25 |5 |65 |5 |

|Project requirements |Understand the project requirements through |25 |25 |25 |25 |

|and roles |discussions with the mentor and defined roles. | | | | |

|Environment setup |Setup the environment for .NET framework. |5 |65 |5 |25 |

|Software Requirements |Build the Software Requirements Specifications |50 |10 |10 |30 |

|Specification |document based on the client requirements. | | | | |

|High level architecture|Create a high level architecture diagram based |0 |0 |0 |100 |

|diagram |on the requirements specified by the mentor. | | | | |

|Sample database |Collect different types of data source samples |0 |30 |70 |0 |

| |for analysis | | | | |

|Project plan |Based on the specifications of the client and |100 |0 |0 |0 |

| |the available resources prepare a project plan | | | | |

| |to define the tasks. | | | | |

|Prototype |Develop a prototype with possible user |25 |25 |25 |25 |

| |interface features. | | | | |

|Selecting data from MS |The user can view the data sources in separate |50 |0 |50 |0 |

|Access 2003, MS Excel |windows. He can even re-arrange the windows to | | | | |

|2003 and Flat File. |have a better view. | | | | |

|Select multiple rows |The user can perform operations like selecting |0 |100 |0 |0 |

|from a data sources. |multiple rows without holding the ‘Ctrl’ key; | | | | |

| |clicking a row will toggle the selection. | | | | |

|Clear selection |The user can clear selections by clicking |0 |0 |0 |100 |

| |“Clear Selections” on each panel. The user can | | | | |

| |clear the selected rows in all the open data | | | | |

| |sources by clicking Edit > Clear All | | | | |

| |Selections. | | | | |

|Export data to an Excel|The user specifies the Excel file to which the |0 |0 |100 |0 |

|sheet |data has to be exported | | | | |

|Preview data from |A dialog box will be displayed showing a list |0 |50 |0 |50 |

|selected tables. |of all open data sources. The user can then | | | | |

| |check/uncheck the data sources he wishes to | | | | |

| |preview. | | | | |

|Show the preview of the|The user can view the selection made by him on |0 |45 |55 |0 |

|selected rows. |all the data sources checked in the above step | | | | |

|Delete unwanted rows |This feature lets the user remove the rows from|0 |50 |0 |50 |

|from the preview |the preview. The user can select the row and | | | | |

| |then click on the ‘Delete’ button | | | | |

Table 1 Task Distribution

Appendix C: Project Timeline

One of the most common tasks in project development is scheduling. There are some common problems which are required to be taken care of. One such problem is to deal with events which depend on one another in different ways and to take care of the tasks dependent on one another. Resource scheduling was done which includes scheduling people to work on tasks, and resources required by the various tasks. The team had to deal with uncertainties in the estimates of the duration of each task and had to rearrange tasks to meet various deadlines. We as a team were able to overcome the above issues and scheduled and prioritized the task accordingly. The timeline of our project is given in table 1

|Tasks |Assigned To |Week |

| | |# 1 |#2 |#3 |#4 |

|Project Website |Aiyaz |Week #1 – Week | |

| | |#3 | |

|High level Architecture |Sanjeev | |Week #2 – | |

|Diagram | | |Week #5 | |

|Software Requirements |Amruta, Sanjeev | |Week #2 – | |

|Specification | | |Week #5 | |

|User Interface |Abhisheik | | | |

|Maintaining the project |Aiyaz |Week #1 – Week #15 |

|website | | |

|Connector Logic |All | |Week #5 – Week #9 | |

|Data resolution |Abhisheik, Amruta | |Week #4 – Week #10 | |

|Data Selection and Export |All | |Week #9 – Week #13 | |

|Documentation |All | |Week #2 – Week | |Week #13 – Week #15 |

| | | |#6 | | |

Table 2 Project Timeline

Appendix D: Acronyms and Abbreviations

DS: Data Source

DI&R: Data Integration & Resolution.

DB: Database

IDE: Integrated Development Environment

GUI: Graphic User Interface

-----------------------

Model

Controller

View

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download