Data Documentation and Retrieval in a UniVerse Environment ...

[Pages:32]Data Documentation and Retrieval in a UniVerse? Environment

ABSTRACT

Mergers, acquisitions, and good business practices constantly contribute to the growth of small companies into larger conglomerates. This growth means that software and systems that were sufficient to run a company one day may be insufficient the next. Database systems that were poorly designed or documented during a company's incubation period may become obstacles to additional growth. In this paper, we discuss methods for improving access to poorly documented databases in a UniVerse? environment. We detail approaches involving ODBC access to the UniVerse environment and the usage of UniVerse programs to improve access to the same data using examples from a grocery distributor's current database system.

INTRODUCTION

Even as some companies lose market share and stock value, others remain strong

and even grow. This growth means that software and systems that were sufficient to run

a company one day may be insufficient the next. Database systems that were poorly

designed or documented during a company's incubation period may become obstacles to

additional growth. This paper will discuss automated approaches to documenting the

tables and columns of a database system in order to improve access and overcome these

obstacles. This paper will also cover methods used to access data that is poorly

documented.

Motivation

During the growth cycle of most companies exists the inevitable need to

implement better software that fulfills new requirements. At the turn of the century, the

requirement driving many companies was year 2000 compliancy. The need to modify

programs to handle four digit rather than two digit years was an essential part of making

it to the new century as a viable company. Given that a major expenditure was to be

made on upgrading software, some companies, such as the one discussed in this paper,

decided to acquire software that would do more than simply make the company year 2000 compliant.

The software required to run the day-to-day operations of a grocery distributor with products numbering close to 50,000 and ranging from single lipsticks to pallets of soda, and thousands of customers ranging over much of the Midwest, is not at all the type of "install and run" software you might use for handling office documents. Software for running a distribution center is only useful after extensive setup and customization. If any customization or development is to take place after the implementation of this type of software, then the company developers must thoroughly understand the system and the underlying software.

In an ideal situation, a software development company creating a piece of software for sale to one of these companies would document it and understand it before selling it. In the same ideal situation, a software development company that creates and sells a suite of software using a database with hundreds of tables would understand and document the complex interactions of these programs and tables. Each column of every table would be documented and, hopefully with little effort, the developer would be able to tell which programs required each column. Not all programmers are software engineers, however, and not all companies that create software do it using appropriate methods. The type of documentation described above takes time, and time is money, especially in environments where hourly rates charged to customers are in the hundreds of dollars and projects range in the hundreds of thousands of dollars. Given a limited budget, a client company might choose additional functionality over complete documentation and therein lies the root of the problem. When it comes to software

systems of any magnitude, incomplete documentation is equivalent to no documentation. Any additional changes to the software or database system must be researched heavily, and the only testing that will suffice is full integration with the "live" suite.

Once a client company has made a decision for functionality over documentation, it is up to the client company's programmers and analysts to take up the challenge of documenting the suite of tables and programs and to integrate them with other business tools. Depending on the size of the database involved, it may be necessary to develop tools for automating documentation of and access to the data stored within it.

Outline of Paper

This paper deals specifically with the documentation and access efforts of a grocery distributor. In order to describe the obstacles they faced and the methods they used to overcome them, it is first necessary to describe the database product they use. Once the details of their situation are clear, we will review the problems they faced in accessing the information in their database from outside the programs they purchased. We will examine why standard methods existing for accessing their data worked poorly and study methods they used in their stead. The final part of this paper discusses the results of the distributor's efforts and their plans for further development. Although this paper presents a specific documentation scenario, the insights and approaches presented are applicable to other environments.

Throughout this paper we will examine: 1) The methods for data documentation and access to information stored in a UniVerse? database. 2) The shortcomings of using UniVerse ODBC connectivity as a method for accessing a multi-valued database such as UniVerse.

3) Methods that improve ODBC client access to UniVerse by using documentation to compensate for naming and ODBC access challenges.

4) The use of a semi-automated documentation and access system called Unity on the UniVerse database.

5) The development of UniVerse source extraction tools that a. Improve ODBC access to UniVerse. b. Dramatically increase the efficiency in constructing reports and data access programs on the host system.

The contribution of this work is an evaluation of the shortcomings of standard

access technologies such as ODBC, and a description of methods for annotating and

documenting data that improve programmer efficiency when manipulating and extracting

data from complex systems.

BACKGROUND

Business environment

The software environment used by the grocery distributor, which we will call GD

for short, is a database system under UniVerse comprising over 500 data tables with a

total size approximating 80 gigabytes. Each of these data tables has anywhere from 0 to

over 200 data columns. The number of virtual data columns in each table is not limited

in any significant way. The particular business software being used, like most others,

evolves on a day-to-day basis as a result of shifting priorities and business goals. Much

of the development takes place on the UniVerse system and is done by company

programmers, so any improvements to data documentation should be available to users

and developers using UniVerse as well as those using Open DataBase Connectivity

protocol (ODBC) to connect to the database.

UniVerse Databases

UniVerse is a relational database environment developed by IBM [1] with built-in ODBC connectivity and its own programming language called UniBASIC. Each database under UniVerse is composed of a number of tables. UniVerse tables are not defined using schemas; therefore they are not constrained to hold a given number of columns at the time they are created. UniVerse separates the data portion of the table from the access portion of the table. In order to link the two aspects of the table, each table has a dedicated table dictionary that defines the columns available for the access layer. In comparison to a relational database system, the dictionary can be seen as a "view" into the table. While relational views can only be defined using SQL statements, dictionary items provide methods for performing joins and other database algebra without using SQL. An entry in the table dictionary contains the information required to retrieve and display the data from the column it pertains to in the table. This information can include formatting instructions, conditional statements, and a label for the data, among other things. Each dictionary item describes a column of the table. Each column available for reporting is a virtual column because of the separation between the access layer and the data layer. These columns can be calculated based on data internal to the table, calculated independently of the data in the table, or based entirely on data from another table. The table dictionary does not always contain an entry for each column in the table but may contain more than one entry for each of the columns it does reference.

There are several different types of dictionary entries. Table 1 provides a sampling of dictionary items for the dictionary of the customer table used by the grocery distributor.

Dictionary col col 2 Record ID 1

col 3

CM.NAME30

S

CSTM.NAME.ADDR I

3

NAME

TRIM(&

lt;1,2>:'

':<1,1

>)

col 4 col 5

S

PNET CONTACT 25L

CUST.NAME EU.CUS.NAME FNAME FULLNAME

A0 S3 D8 S2

NAME

S

CUSTOMER

NAME

S

NAME

XREF

6L

BILLTO NAME

S

col 6

S

M

col 7

M7

col 8

col col 9 10

T 20

A;IF 100 # "" THEN 100 ELSE N(NAME) R 20

T 30

A;01:"*":02 TCM;C;3;3 L 20

Table 1: A sample listing of dictionary records from the customer file.

The columns labeled "col 1" through "col 10" represent attributes of the UniVerse

dictionary items. Attributes in UniVerse correspond to columns of a data table. The type

of the dictionary entry is determined by the first attribute of the entry (col 1). This type

determines the structure of the rest of the entry. Appendix 1 gives a series of structures

for the most common dictionary entry types in UniVerse. Column 2 for "S" type

dictionaries determines which data column is referenced by the dictionary entry. Notice

that CM.NAME30 and EU.CUS.NAME reference the same column. The other columns

of the dictionary entry determine the difference between the resulting data when the

dictionaries are used.

The native programming language of UniVerse is UniBASIC, a compiled version

of BASIC with extensions used to work with the databases directly. Programmers do not

need to know the details of the hashing algorithm used to access the data portion of the

table, but they do need to know in what column number the particular piece of

information they are working with is stored. Generally, dictionary items are not used

within UniBASIC programs. Along with UniBASIC, UniVerse also utilizes several

procedural languages for automating data flow and user interfaces. UniVerse also provides a powerful query language for generating reports. This query language is called RETRIEVE [2]. Programmers can design and execute RETRIEVE reports within UniBASIC by painstakingly concatenating strings that define a RETRIEVE statement using dictionary items. There is no built-in utility in UniVerse to assist in this process. Along with RETRIEVE, an SQL variant is also available within UniVerse for executing local queries. This SQL variant is used by the ODBC server to fulfill client requests. All of these tools are primarily used through telnet clients, which means there is no graphical interface for them. Even so, these tools make the UniVerse environment a powerful tool for building enterprise software.

UniVerse presents several different methods of accessing the information in a table. A simplified record from the customer file of the grocery distributor might look like this:

01~005267~JOSE JIMENEZ~IA~~O~01~JJ|52241|ST*IA

The delimiting characters in a UniVerse record are nonprintable ASCII characters so they have been changed. The end of attribute marker has been replaced with a "~" and the end of value marker has been replaced with a "|". If the given record had the key of "01*005267" and the name of the table was "CM," then the data could be displayed by entering a RETRIEVE command that uses two of the example dictionary entries listed previously. A simplified form of the statement is

LIST [] []

An example command would be:

LIST CM 01*005267 CM.NAME.30 EU.CUS.NAME

Executing this command would result in the following output:

CM........ Name.......................... CUSTOMER NAME......................

01*005267 JOSE JIMENEZ

JOSE JIMENEZ

Note that while the information printed in each of the two name columns is the same, the

length of the data presented is different and so is the column header. The details of that

transformation are in the dictionary entries. Accessing the same information via

UniVerse BASIC would be done with the following code:

OPEN "CM" TO CM ELSE STOP READ CMR FROM CM, "01*005267" ELSE STOP PRINT "CM........ ": PRINT "Name.......................... ": PRINT "CUSTOMER NAME......................" PRINT "01*005267" 'R#10':" ": PRINT CMR 'T#20': PRINT CMR 'T#30' STOP

UniVerse BASIC uses the convention of using angle brackets, < and >, to designate

accessing information within a variable that has multiple attributes and values. CMR references the third attribute in the variable CMR, which is a record read from the CM file using record id "01*005267." The code then prints out a header and the data similar to the output of the RETRIEVE statement. Using SQL, the same could be accomplished by the statement:

SELECT CM.NAME30, EU.CUS.NAME FROM CM WHERE KEY = '01*005267';

Unlike relational databases, UniVerse has native support for accessing multi-valued information. In order to do this using RETRIEVE, it is only necessary to add the name of a dictionary entry that references a multi-valued attribute such as FNAME. Listing that value in the same file would return:

CM........ Name..

01*005267 JJ 52241 ST*IA

1 records listed.

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download