Unit 5: Introduction to XML




Structure

5.1 XML definition and goals

Introduction

Objectives

HTML vs. XML

Importance of XML in Business

Applications of XML

Advantages and Disadvantages of XML

5.2 Simple XML document

Well-formed XML document

5.3 XHTML and X Secure

Definitions and differences

5.4 Using XML for Analysis

Data Warehousing

Data Marts and Operational Data Stores

XMLA (XML for analysis).

5.5 Summary

5.6 Glossary

5.7 Terminal Questions

5.8 Answers

5.1 XML definition and goals

Introduction:

In the previous units you learnt about HTML. With HTML you can display the contents of a file in a visually appealing way in a web browser. However, if you want to send data across the web, for example to automatically update a database at a remote site on the Internet, HTML alone is not enough. This is because there is no way to know the meaning, or semantics, of the text contained between HTML tags.

For example:

100

The above does not convey what the content 100 means. Is it a quantity, a product code, a rate, or something else?

Instead, suppose you mark it up as:

123456

100

In the above example it is very clear what the numbers 123456 and 100 denote. You may then write a program at the remote computer to parse and programmatically ‘understand’ these tags and update the database.
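
The idea can be sketched in Python with the standard library module xml.etree.ElementTree. The tag names order, productcode and quantity are assumptions made for this illustration only; they are not taken from the original example.

```python
import xml.etree.ElementTree as ET

# Hypothetical order record; the tag names are illustrative assumptions.
ORDER_XML = """
<order>
  <productcode>123456</productcode>
  <quantity>100</quantity>
</order>
"""

def read_order(xml_text):
    """Parse the record and return (product code, quantity)."""
    root = ET.fromstring(xml_text)
    code = root.findtext("productcode")
    quantity = int(root.findtext("quantity"))  # element text is always a string
    return code, quantity

code, quantity = read_order(ORDER_XML)
print(code, quantity)
```

Because each value is wrapped in a named tag, the program does not have to guess which number is the product code and which is the quantity.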

You can ‘transport and share’ data using XML. You can also invent your own tags to mark up your data in XML. This is what makes XML so useful in e-commerce applications.

XML stands for eXtensible Mark-up Language, a data format used primarily for sharing data. It looks similar to HTML but has a much stricter syntax. XML is designed to transport, store, share and ‘use’ data, with the focus on what the data is about.

HTML is about how data looks, whereas XML is about transporting and sharing information.

Objectives:

After studying this unit you will be able to:

• Describe the basics of XML

• Explain the differences between HTML and XML

• Describe how to create XML documents

• Discuss the different applications of XML

• Explain the role of XML in data warehousing and data analysis

History and goals of XML:

XML's design goals emphasize simplicity, generality, and usability over the Internet. Standard Generalized Mark-up Language (SGML), the international standard for marking up data, has been in use since the 1980s. It is the precursor to HTML and XML.

Work on XML began in an XML Working Group of the World Wide Web Consortium (W3C) in 1996. In 1998, the W3C approved Version 1.0 of the XML specification and a new language was born.

The important design goals of XML are:

1. XML shall be straightforward and easily usable over the Internet.

2. It shall support a wide variety of applications.

3. It shall be compatible with SGML.

4. It shall be easy to write programs which process XML documents.

5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.

6. XML documents should be human-legible and reasonably clear.

7. XML documents shall be easy to create.

For more details, you may refer

HTML vs. XML

XML and HTML are designed with different goals. HTML was designed to display data, with the focus on how data looks. XML, on the other hand, is designed to transport and store data, with the focus on what the data is.

Some people think that XML is an advanced version of HTML and that it has come to replace HTML. That is not the case. Both will continue to exist, as they are used for different purposes.

XML defines rules to mark up a document in a way that allows you to express semantic meaning in the mark-up. XML does not restrict you to a fixed set of tags (elements) as HTML and XHTML do. XHTML stands for Extensible Hypertext Mark-up Language; as you are already aware, it is XML compliant.

Like HTML, XML is derived from SGML. Unlike HTML, XML is extensible: you can create your own elements and attributes, with which you create structural mark-up. This allows you to choose which aspects of the content to describe.

XML files are meant to hold data, and the data in an XML file is well described. If you look at an XML file you can tell what it holds. For example, if you find a number in an XML file you can easily find out what that number identifies: the number of products, the price of a product, and so on. In HTML this is not the case.

XML looks similar to HTML, but has a much tighter syntax than HTML.

HTML is used to display the data in a formatted way. You can apply styles and use different layouts to display the data in an HTML file. The data that is displayed in an HTML file could come from an XML file also.

HTML is about displaying information, while XML is about carrying information.

To summarize:

1. HTML is a presentation language, whereas XML is used to transfer data between applications and databases.

2. HTML is not case-sensitive, whereas XML is case-sensitive.

3. In HTML you can use only the defined standard tags, whereas in XML you can define your own tags.

4. In HTML it is not required to close all tags, whereas in XML every element must be closed, either with a matching closing tag or by writing it as an empty-element tag.

5. XML describes the data for sharing, whereas HTML only defines the data for display.

6. HTML files have either the .html or the .htm extension. XML files have the .xml extension.
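
Points 2 and 4 of the summary can be checked with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the tag name price is a made-up example.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if the XML parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# The tag name "price" is hypothetical.
print(parses("<price>100</price>"))  # start and end tags match
print(parses("<price>100</Price>"))  # case differs, so the tags do not match
print(parses("<price>100"))          # closing tag is missing
```

A browser showing HTML would tolerate the second and third forms; an XML parser reports an error for both.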

XHTML 1.0 redefines the tags of HTML 4.01 in accordance with XML syntax. XHTML documents use the tags defined by the W3C XHTML DTDs, which makes correctly written XHTML files XML-compliant documents. XHTML files are also commonly saved with the .html extension.

Let us now look at an XML file. A simple XML document can be created using any text editor, such as Notepad. It has to be stored with the file name extension .xml, for example myfirstfile.xml.

Table 5.1 gives an example XML file named letter.xml.

|Peri Sastry |

|Alliance |

|Dear Sir, XML is a language understood both by people and computers. |

|Thank you very much |

Table 5.1 Simple XML document1 file name letter.xml

Almost all browsers can read XML files, check that they are well formed, and display them. A browser does not process the data in an XML file, though; for that you need a program written in a language such as Java. You can always open any .xml file in a web browser such as Internet Explorer. The file letter.xml looks as shown below when opened in the Internet Explorer browser.

[pic]

Figure 5.1 Browser view of simple XML document1 file name letter.xml
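
A program, unlike a browser view, can create an XML document and pick out its individual fields. The following is a minimal sketch in Python. The element names letter, sender, organisation, body and closing are assumptions made for this illustration; only the text values come from letter.xml.

```python
import xml.etree.ElementTree as ET

def build_letter():
    """Build a small letter document; all tag names are assumed."""
    letter = ET.Element("letter")
    ET.SubElement(letter, "sender").text = "Peri Sastry"
    ET.SubElement(letter, "organisation").text = "Alliance"
    ET.SubElement(letter, "body").text = (
        "Dear Sir, XML is a language understood both by people and computers."
    )
    ET.SubElement(letter, "closing").text = "Thank you very much"
    return letter

letter = build_letter()
xml_text = ET.tostring(letter, encoding="unicode")
print(xml_text)

# Read the document back and list each field with its tag name.
for child in ET.fromstring(xml_text):
    print(child.tag, "->", child.text)
```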

Importance of XML in Business:

There are already hundreds of serious applications of XML on the Internet. Businesses can make their websites useful in different ways and make it easier for their clients to interact with them online. XML makes applications on the web faster and easier to transact with. For example, we can now submit resumes online and ‘server-side programs’ can update their databases from them easily. E-commerce transactions can be executed on the web using HTML in collaboration with XML and Java or ASP. XML enables businesses to communicate with each other and to share data on orders, supplies and so on through standard XML and XML-based mark-up languages.

ebXML is one of the XML-based mark-up languages. It was started in 1999 as a joint initiative between the United Nations Centre for Trade Facilitation and Electronic Business (UN/CEFACT) and the Organization for the Advancement of Structured Information Standards (OASIS).

Electronic business can be built using XML and ebXML (pronounced ee-bee-ex-em-el). ebXML is a family of XML-based standards sponsored by OASIS and UN/CEFACT whose mission is to provide an open, XML-based infrastructure that enables the global use of electronic business information in an interoperable, secure, and consistent manner by all ‘trading partners’.

The ebXML architecture is a unique set of concepts; part theoretical and part implemented in the existing ebXML standards work. The ebXML work stemmed from earlier work on OO EDI (Object Oriented EDI), UML / UMM, XML mark-up technologies and the X12 EDI ‘Future Vision’ work sponsored by ANSI X12 EDI.

You can observe the importance of XML in business from the existence of XML/EDIFACT. XML/EDIFACT is an Electronic Data Interchange (EDI) format used in business-to-business transactions. It allows EDIFACT message types, such as purchase orders, to be used by XML systems.

EDIFACT is a formal language for the machine-readable description of electronic business documents. It uses a syntax close to that of delimiter-separated files. This syntax was invented in the 1980s to keep files as small as possible. With the Internet boom, XML became the most widely supported file syntax. An invoice, however, is still an invoice, containing information about buyer, seller, product and amount due. EDIFACT works perfectly from the content viewpoint, but many software systems struggle to handle its syntax. Combining EDIFACT vocabulary and grammar with XML syntax therefore makes XML/EDIFACT very useful for the automatic processing of Internet-based e-commerce transactions.
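
The general idea of keeping the content of a delimiter-separated record while switching to XML syntax can be sketched as follows. This is not real EDIFACT: the record layout, the "+" separator used here, the field names and the sample values are all made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Assumed field order for a made-up, simplified invoice record.
FIELDS = ["buyer", "seller", "product", "amount"]

def record_to_xml(record, separator="+"):
    """Wrap each delimiter-separated value in a named XML element."""
    values = record.split(separator)
    if len(values) != len(FIELDS):
        raise ValueError("unexpected number of fields")
    invoice = ET.Element("invoice")
    for name, value in zip(FIELDS, values):
        ET.SubElement(invoice, name).text = value
    return ET.tostring(invoice, encoding="unicode")

print(record_to_xml("Buyer Ltd+Seller Ltd+Widget+2500"))
```

The compact record relies on the position of each value, whereas the XML form names every value, at the cost of a longer message.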

Thus, XML has become highly valuable and important to Business.

Applications of XML:


XML is a general-purpose specification for creating custom mark-up languages. The term extensible indicates that a mark-up language designer has significant freedom in the choice of mark-up elements. XML's goals emphasize representing documents with simplicity, generality, and usability over the Internet. XML has been used as the basis for a large number (at least hundreds) of custom-designed languages. Some of these, for example RSS, Atom, and XHTML, have become widely used on the Internet. XML dialects (often packaged in archive files) are becoming the default file format for office-productivity software packages, including Microsoft Office and Apple's iWork.

BizTalk, eCo and

• BizTalk framework

o loose grouping of many XML technologies by Microsoft

o describes how to publish schemas in XML and to integrate programs using XML messages

o includes a vocabulary for wrapping XML documents in an "envelope" which manages message routing and security

o no pre-defined document types such as purchase order

• eCo framework

o developed by CommerceNet, a business consortium

o allows e-commerce systems to describe themselves, their services and their interoperability requirements

o will take account of and complement other specifications

• registry

o aids interoperability by publishing a range of specifications, schemas and vocabularies

o attempts to prevent the duplication or overlapping of work that already exists

cXML

• cXML (Commerce XML) developed by Ariba and 40 software companies

• a proposed standard for catalogs and purchase orders

• defines request/response messages for purchasers and sellers

• simplified, small fragment of the cXML DTD

o

o

o

o

o

ebXML

• initiative undertaken by:

o UN/CEFACT: United Nations body for Trade Facilitation and Electronic Business

o OASIS: Organization for the Advancement of Structured Information Standards

• vision is a global electronic marketplace, where enterprises of any size can:

o find each other electronically

o conduct business by exchanging XML-based messages

o use off-the-shelf business applications

Open Financial Exchange (OFX)

• a technical specification created by Intuit, CheckFree and Microsoft

• allows financial institutions to communicate account transactions between themselves and their clients

• originated as an SGML application

• supported by accounting packages such as Microsoft Money and Intuit's Quicken

• supports 4 kinds of services

o banking

o bill presentation

o bill payment

o investment

Wireless Mark-up Language (WML)

WML is based on XML. It is a mark up language intended for devices that implement the Wireless Application Protocol (WAP) specification, for example: mobile phones.

Some e-commerce initiatives

• Trading Partner Agreement Mark-up Language (tpaML): defines the contractual terms and conditions for trading partners

• RosettaNet: electronic business interfaces for supply chain management

• Extensible Business Reporting Language (XBRL): for the preparation and exchange of business reports and data

• Business Process Modeling Language (BPML)

• Business Rules Mark-up Language (BRML): part of IBM investigation into rule-based business processes for e-commerce

• Visa XML Invoice Specification

Many domain-specific industry organisations have created XML-based languages for formatting, transporting and sharing data throughout the Internet. Many are driven by standards bodies such as the W3C. Some are ready to use and others are still being developed and standardised. Some are listed below:

Example XHTML document:

Hello world!

XHTML has three DTDs: Strict, Transitional and Frameset.

CML

Chemical Mark-up Language. Example CML document snippet:

C O H H H H

-0.748 0.558 -1.293 -1.263 -0.699 0.716

WML

Wireless Mark-up Language for WAP services:

Hello World

ThML

Theological Markup Language:

Having a Humble Opinion of Self

EVERY man naturally desires knowledge

Aristotle, Metaphysics, i. 1.

;

but what good is knowledge without fear of God? Indeed a humble

rustic who serves God is better than a proud intellectual who

neglects his soul to study the course of the stars.

Augustine, Confessions V. 4.

There is a long list of many other XML applications.

You can get more information from:

Advantages and Disadvantages of XML

Advantages:

1. XML is an easy to learn, structured mark up language for HTML users

2. XML document content is generally ‘text’. So it is universally usable for data sharing

3. XML is a true subset of SGML designed for use on the Internet

4. It is easy to develop programs which can process XML documents

5. XML is a standard mark-up language maintained by the W3C

6. It is both a human-readable and machine-readable language

7. It supports Unicode, allowing almost any information in any written human language to be communicated

8. It can represent the most general computer science data structures: records, lists and trees

9. It has a self-documenting format that describes structure and field names as well as specific values

10. The strict syntax and parsing requirements make the necessary parsing algorithms extremely simple, efficient, and consistent

11. XML is heavily used as a format for document storage and processing, both online and offline

12. It allows validation using schema languages such as XSD and Schematron, which makes effective unit-testing, firewalls, acceptance testing, contractual specification and software construction easier

13. The hierarchical structure is suitable for many types of documents

14. It is platform-independent, thus relatively immune to changes in technology

15. Forward and backward compatibility are relatively easy to maintain despite changes in DTD or Schema

16. XML is commonly known as self-documenting.

Disadvantages of XML:

1. XML syntax is redundant or large, relative to binary representations of similar data

2. The redundancy may affect application efficiency through higher storage, transmission and processing costs

3. XML syntax is too verbose relative to other alternative 'text-based' data transmission formats

4. No intrinsic data type support: XML provides no specific notion of "integer", "string", "boolean", "date", and so on

5. The hierarchical model for representation is limited in comparison to the relational model or an object oriented graph

6. Expressing overlapping (non-hierarchical) node relationships requires extra effort

7. XML namespaces are problematic to use and namespace support can be difficult to correctly implement in an XML parser

8. XML requires a processing application to be useful for ‘data processing’.
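
Two of these points, verbosity and the lack of built-in data types, can be seen in a small Python sketch. The record and its field names are made up, and JSON is used only as one example of an alternative text-based format.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record written in two text formats.
record = {"productcode": "123456", "quantity": 100}

item = ET.Element("item")
for name, value in record.items():
    ET.SubElement(item, name).text = str(value)
as_xml = ET.tostring(item, encoding="unicode")
as_json = json.dumps(record)

print(len(as_xml), as_xml)
print(len(as_json), as_json)

# XML hands back text only; the program must convert types itself.
parsed = ET.fromstring(as_xml)
quantity_text = parsed.findtext("quantity")
quantity = int(quantity_text)
print(type(quantity_text).__name__, type(quantity).__name__)
```

In this sketch the XML form is longer because every field name appears twice, once in the start tag and once in the end tag, and the quantity comes back as text that has to be converted to a number.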

Self-Assessment Questions

1. XML is designed to transport and share data over internet. (True / False).

2. XHTML is ________ compliant HTML.

3. XHTML is for displaying data. (True / False).

5.2 Simple XML document:

A simple XML document can have a Document Type Definition (DTD) that defines its structure as a hierarchy of ‘elements’, starting from the ‘root’ and going down through ‘children’, ‘siblings’ and other lower levels of nodes. The content of the lowest-level elements is character data, declared in the DTD as #PCDATA (parsed character data).

The DTD defines the document (root) element and all other elements. An inline DTD is a DTD placed in the same file as the XML document, immediately before the document content.

Figure 5.2 shows the structure of the DTD, table 5.2 gives the XML file Bltr1.xml, and figure 5.3 shows the browser view of Bltr1.xml.

[pic]

Figure 5.2 Structure of Bltr1.xml Document Type Definition (DTD)

XML code of Bltr1.xml is in the table below.

|]> |

| Dr Sastry |

|8th Block Koramangala |

|Bangalore |

|Karnataka State |

|560034 |

| 555-4321 |

| U.B.S.Traders |

|201, MG Road |

|Bangalore |

|Karnataka |

|560001 |

|555-1234 |

|Dear Sir / Madam |

|It is our privilege to inform you about our new database managed with XML. This new system allows you to reduce the load on your inventory server by having the client machine perform the work of sorting and filtering the data. |

| In an XML |

|generally the content of an element is both machine and human understandable. So plain text can update databases. |

|Sincerely |

|Dr Sastry |

Table 5.2 Bltr1.xml with Document Type Definition (DTD)

Well-formed XML document:

A well-formed XML document must satisfy the following rules:

• It has only one root element

• All elements have start and end tags

• Elements / tags are nested in order and not jumbled up

• The XML declaration, if present, is the first line

• White space in an XML file is preserved, unlike in HTML

• XML tags are case sensitive

• If a DTD is given, the document follows the structure defined by the DTD (the document is then also valid)
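
The rules on a single root element and on proper nesting can be tried out with a parser. The sketch below uses Python's standard xml.etree.ElementTree module, which checks well-formedness but does not validate a document against a DTD. The tag names are made up.

```python
import xml.etree.ElementTree as ET

def well_formed_error(xml_text):
    """Return None if xml_text is well formed, else the parser's message."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        return str(err)

# Hypothetical documents illustrating the rules.
samples = {
    "one root, proper nesting": "<letter><to><name>A</name></to></letter>",
    "two root elements": "<letter>A</letter><letter>B</letter>",
    "jumbled nesting": "<letter><to><name>A</to></name></letter>",
}
for label, text in samples.items():
    print(label, "->", well_formed_error(text) or "well formed")
```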

The browser view of business letter xml file Bltr1.xml and the XML validation output are shown in the figure below.

[pic]

Figure 5.3 Browser view of Bltr1.xml

The DTD can be stored separately and referred to in the XML file. Generally the DTD must be accessible in the same folder as the XML file for validation. Otherwise it has to be accessible from a web site, as in XHTML documents. An example XHTML document that refers to an external DTD on a web site is given below in table 5.3.

The keyword PUBLIC in a DOCTYPE reference means that the DTD is on a public resource such as a web site.

The keyword SYSTEM in a DOCTYPE reference means that the DTD is on a system resource, such as the local directory where the XML file resides.

| Transitional DTD XHTML Example |

|This is a transitional XHTML example |

Table 5.3 an xhtml document refers to XML DTD for tag interpretation

This XHTML file is saved as xhtmlExampleWithDTD.html. Note that XHTML files are also stored with the .html extension.

The XHTML tags are defined by the DTD referred to above. XHTML documents use the tags defined by this DTD, which makes correctly written XHTML files XML-compliant documents.

The browser view of the above xhtml file xhtmlExampleWithDTD.html is given below.

This is a transitional XHTML example

The following are simplified rules for creating an inline DTD. The syntax is exactly as given in table 5.2.

• An inline DTD starts with <!DOCTYPE, followed by the name of the root element and an opening square bracket, and ends with ]>

• A child element which has no further child elements is a leaf node and has its data format declared

• Elements can have data declared as #PCDATA, which means that the data values are parsed character data

• Other kinds of data, such as pictures, can also be referred to

• Empty elements have no content and can be place holders for future inclusion of images etc.

• Element names are case sensitive

• Element names can be post-fixed with an occurrence indicator: "+" denotes one or more occurrences, "?" denotes zero or one occurrence, and "*" denotes zero or more occurrences
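
The meaning of the occurrence indicators can be expressed as a small Python check. This is only a sketch of the counting rule, not a DTD validator, and the content model shown (an address with street, city and phone children) is a hypothetical example.

```python
def occurrence_ok(count, indicator=""):
    """Check a child-element count against a DTD occurrence indicator."""
    if indicator == "+":
        return count >= 1      # one or more
    if indicator == "?":
        return count <= 1      # zero or one
    if indicator == "*":
        return count >= 0      # zero or more
    if indicator == "":
        return count == 1      # no postfix: exactly once
    raise ValueError(f"unknown indicator: {indicator!r}")

# Hypothetical content model: address (street+, city, phone?)
model = {"street": "+", "city": "", "phone": "?"}
counts = {"street": 2, "city": 1, "phone": 0}
print(all(occurrence_ok(counts[name], model[name]) for name in model))
```

A real validating parser also checks the order of the child elements, which this sketch leaves out.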

Any text editor can be used to create XML documents. Dreamweaver is not a dedicated XML editor, but it still provides for the creation of XML documents.

In Dreamweaver, use the menu Open → New → Basic Document → XML to create an XML document.

Dreamweaver also provides for creating XML documents from the editable regions of templates. You can also import and export XML documents.

You can save MS Excel worksheets as XML files by choosing XML as the ‘file type’ in Save As.

Many Relational Database Management Systems (RDBMS) provide for the conversion of table data to XML documents and vice versa.

Self-Assessment Questions

4. XML files have ____________ extension.

5. XML documents can be validated by browsers. (True / False).

6. XML tags are defined in ______________________________.

5.3 XHTML and X Secure

Definitions and differences

Extensible HTML (XHTML) is a mark-up language for web pages from the W3C.

XHTML combines HTML and XML into a single format. Like XML, XHTML can be extended with additional tags, and like XML it must be coded more rigorously than HTML. Web browser software was originally written to tolerate many variations in HTML coding: in ordinary HTML documents you may leave some tags unclosed and may even nest tags wrongly. All of this is tolerated in HTML, but not in XHTML.

With XHTML, you must conform to the rules: tags must be closed and the nesting of tags must be proper. XHTML tags are written in lower case. The ‘Transitional’ DTD is more lenient than the ‘Strict’ DTD and accepts mark-up that is almost like old HTML. This feature is provided for backward compatibility of browser support.

There are three DTDs specified by W3C for XHTML. They are:

Strict: DTD/xhtml1-strict.dtd

Transitional: DTD/xhtml1-transitional.dtd

Frameset: DTD/xhtml1-frameset.dtd

You can see more details of these DTD at

Already you have seen one XHTML document in table 5.3.

Example of another XHTML document is given below in table 5.4.

|Hello world! |

|xhtml tags must be closed and |

|nesting of tags must be proper |

Table 5.4 Example 2 of xhtml document, file name xhtmlExample2.html

The browser displays the above xhtml file xhtmlExample2.html as below:

[pic]
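
Because XHTML follows XML rules, an XHTML page can be read by an ordinary XML parser, while loosely written HTML cannot. The sketch below assumes two small made-up pages and uses Python's standard xml.etree.ElementTree module. The namespace http://www.w3.org/1999/xhtml is the one defined by the W3C for XHTML.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A small XHTML page: every tag is closed and properly nested.
GOOD = f"""<html xmlns="{XHTML_NS}">
  <head><title>Hello</title></head>
  <body><p>Hello world!<br /></p></body>
</html>"""

# The same content in the loose style that old HTML tolerated.
SLOPPY = "<html><body><p>Hello world!<br></body></html>"

def is_xml(text):
    """Return True if text is accepted by the XML parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_xml(GOOD), is_xml(SLOPPY))

# Elements are looked up by namespace-qualified name.
title = ET.fromstring(GOOD).find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title")
print(title.text)
```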

X Secure:

X Secure is an XML security specification based on XML Signatures. XML signatures are applied to digital content (data objects) via an indirection. Data objects are digested, i.e. a message digest is created; the resulting value is placed in an element (with other information), and that element is then digested and cryptographically signed. XML digital signatures are represented by the Signature element. The table below gives its structure:

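<Signature ID?>

<SignedInfo>

<CanonicalizationMethod/>

<SignatureMethod/>

(<Reference URI?>

(<Transforms>)?

<DigestMethod>

<DigestValue>

</Reference>)+

</SignedInfo>

<SignatureValue>

(<KeyInfo>)?

(<Object ID?>)*

</Signature>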

Table 5.5 XML secure Signature element

Note: "?" denotes zero or one occurrence; "+" denotes one or more occurrences; and "*" denotes zero or more occurrences.
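
The digest step described above can be sketched in Python. This is a simplified illustration only: the element names Reference and DigestValue follow the W3C XML Signature recommendation, but the choice of SHA-256, the sample data and the URI value are assumptions, and canonicalization, the DigestMethod element, namespaces and the actual cryptographic signing are all left out.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET

def digest_value(data: bytes) -> str:
    """Return the base64-encoded SHA-256 digest of data."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")

def make_reference(uri: str, data: bytes) -> ET.Element:
    """Build a simplified Reference element holding the digest of data."""
    reference = ET.Element("Reference", {"URI": uri})
    ET.SubElement(reference, "DigestValue").text = digest_value(data)
    return reference

# Made-up data object and a tampered copy of it.
original = b"<quantity>100</quantity>"
tampered = b"<quantity>900</quantity>"

ref = make_reference("#order", original)
print(ET.tostring(ref, encoding="unicode"))
print(digest_value(original) == digest_value(tampered))
```

Any change to the data object changes its digest, so a receiver who recomputes the digest can detect tampering once the element holding the digest has itself been signed.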

You can learn more about the XML Key Management Specification (XKMS), Public Key Infrastructure (PKI) and digital signatures from textbooks on data encryption.

Self-Assessment Questions

7. XHTML files have .xhtml extension (True / False).

8. XHTML documents follow tags defined in the _______ standardised by W3C.

9. Security in XML is achieved by XML digital signature scheme.(True / False)

5.4 Using XML for Analysis

Metadata representation (metadata means data about data) is one of the strengths of XML. As you might have observed, a DTD is a means of defining metadata. When you build data warehouses (large, organized archives of past data from databases) you need to ‘extract’ data from multiple data banks, ‘transform’ it into the data warehouse formats and ‘load’ it into the data warehouse database. This vital step in building a data warehouse is known as ETL (Extraction, Transformation and Loading). It can make use of XML formats, custom tags and XML file processing programs for the automatic creation of versions of data warehouse databases.
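
The three ETL steps can be sketched in Python using only the standard library. Everything specific here is an assumption made for the illustration: the source XML layout, the tag names, the sample values and the table name sales_fact. An in-memory SQLite database stands in for the data warehouse database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical export from an operational database.
SOURCE_XML = """
<sales>
  <sale><product>pen</product><year>2009</year><amount> 120.50 </amount></sale>
  <sale><product>ink</product><year>2010</year><amount>80</amount></sale>
</sales>
"""

def extract(xml_text):
    """Extraction: read the raw field text from each <sale> element."""
    for sale in ET.fromstring(xml_text).iter("sale"):
        yield {child.tag: child.text for child in sale}

def transform(row):
    """Transformation: tidy the text and convert it to warehouse types."""
    return (row["product"].strip().upper(), int(row["year"]), float(row["amount"]))

def load(rows, connection):
    """Loading: insert the transformed rows into the warehouse table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact (product TEXT, year INTEGER, amount REAL)"
    )
    connection.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    connection.commit()

warehouse = sqlite3.connect(":memory:")
load([transform(row) for row in extract(SOURCE_XML)], warehouse)
print(warehouse.execute("SELECT * FROM sales_fact ORDER BY year").fetchall())
```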

Metadata and XML are also used for OLAP. OLAP is an acronym for On Line Analytical Processing. OLAP performs multidimensional analysis of business data and provides the capability for complex calculations, trend analysis, and sophisticated data modelling.
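
A multidimensional analysis can be pictured as totalling a measure over chosen dimensions. The following Python sketch rolls up made-up sales facts by year, and by year and product. The fact rows and dimension names are hypothetical, and a real OLAP tool offers far more than this simple total.

```python
from collections import defaultdict

# Made-up fact rows: (year, product, region, amount).
FACTS = [
    (2009, "pen", "south", 100.0),
    (2009, "ink", "south", 40.0),
    (2010, "pen", "north", 70.0),
    (2010, "pen", "south", 30.0),
]
# Position of each dimension inside a fact row.
DIMENSIONS = {"year": 0, "product": 1, "region": 2}

def roll_up(facts, *dims):
    """Total the amount for each combination of the chosen dimensions."""
    totals = defaultdict(float)
    for fact in facts:
        key = tuple(fact[DIMENSIONS[d]] for d in dims)
        totals[key] += fact[3]
    return dict(totals)

print(roll_up(FACTS, "year"))
print(roll_up(FACTS, "year", "product"))
```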

Thus, XML is becoming the preferred tool for enabling data analysis.

Data Warehousing:

A data warehouse is a collection of integrated, department- or domain-oriented databases.

A data warehouse:

1. Supports the Decision Support System (DSS) function of management

2. Analyzes large volumes of historical data

3. Helps to discover trends in sales, customer preferences, etc. over time

4. Helps to arrive at strategic decisions to increase productivity and profitability of organizations

5. Provides an integrated and total view of the enterprise

6. Makes the enterprise's current and historical data available for decision making

7. Makes decision support transactions possible without affecting operations

8. Renders the organization’s information consistent

9. Presents a flexible and interactive source of strategic information

Definition: A data warehouse is a subject oriented, integrated, non-volatile, time variant collection of data designed to support management DSS requirements. Data warehousing is a process that involves delivery of information for improving the decision-making and analytical business processes.

Simply stated, a data warehouse is a large, organized archive of past data from business databases, kept to support management decision making and insight.

The characteristics of data in the data warehouse are:

• Subject-oriented: the data is organised around subjects (departments) such as sales and purchases.

• Non-volatile: the data is not changed after it has been loaded into the data warehouse (refer to ETL above).

• Time variant: the data records both historical and current values.

• Integrated: the data is linked across various ‘subjects’ and dimensions such as Year, Sale and Product.

The three components of a data warehouse are:

• Data warehouse Database,

• Query tools and

• Database interfaces for ETL, OLAP and Metadata.

The database stores the data retrieved from the existing operational databases, such as those for forecasting, production, sales and accounting. Query tools allow end users to query data from the data warehouse database. Database interfaces enable you to update the data warehouse database on a daily, weekly or monthly basis.

Metadata refers to the information about the data stored inside the data warehouse. Metadata provides the overview of the data inside a data warehouse to end users. The components of the metadata in a data warehouse are:

• Data warehouse table structures

• Data warehouse table attributes

• The system of records used by the data warehouse

• Mapping from the system of record to the data warehouse

• Data model used in the data warehouse

• Descriptions of data inside the data warehouse

• Relationship of one unit of data to another in the data warehouse

• Procedures for access of data inside the data warehouse

XML is used in many ways in the above tasks.

Figure 5.4 below gives a pictorial representation of a data warehouse.

[pic]

Figure 5.4 Data warehouse Architecture diagram

Data Marts and Operational Data Stores:

Data marts:

Data marts are independent, department-wise data warehouses, for example a sales data mart.

We can first build data marts and then integrate them into an organization-wide data warehouse. This approach is known as the bottom-up design of a data warehouse. However, the data marts must be designed within the ambit of an organization-wide vision.

A data mart (DM) is a specialized version of a data warehouse (DW). Like data warehouses, data marts contain a historical snapshot of data that helps businesses to strategize based on analysis of past trends and insights. The key difference is that the data mart serves a specific, predefined need of a certain department or group of departments. A data mart configuration emphasizes easy access to relevant information.

Multiple data marts can exist inside a single corporation, each one relevant to one or more business units for which it is designed. Data marts may or may not be dependent on, or related to, other data marts in a single corporation. If the data marts are designed using an overall design of facts and dimensions, they can later be integrated easily into a data warehouse for the entire organization.

Data marts enable each department to use, manipulate and develop its data in the way it deems fit, without altering information inside other data marts or the data warehouse.

Reasons for creating a Data Mart are:

• Easy access to frequently needed local historical data

• Ownership and collective use by a group of local users as opposed to global users.

• Better end-user response time

• Ease of creation

• Faster turnaround time

• Lower cost of implementation as compared to a full Data warehouse

• Easier identification of potential users than in a full data warehouse

Operational Data Stores (ODS):

These are the current databases of an organization. They are used for data collection, management, and day-to-day operational data processing functions. Data warehouse databases (DWH-DB), which are historical and summarised databases, are built from these ODS.

Table 5.6 below summarises and compares ODS and DWH-DB.

|Feature |Operational Data Stores(ODS) |Data warehouse data bases (DWH-DB) |

|Data Content |Current Values |Archived, derived, summarized |

|Data Structure |Optimized for transactions |Optimized for complex queries |

|Access Frequency |Very High |Low to Medium |

|Access Type |Read, Update |Read |

|Usage |Predictable, repetitive |Ad Hoc, Random, heuristic |

|Response time |Sub seconds |Several seconds / minutes |

|Users |Large Numbers, Operational staff |Relatively small numbers, Top Management |

Table 5.6 Comparison of ODS and DWH-DB

XMLA (XML for analysis):

XMLA is a proprietary product of Microsoft Corporation. We first have to acknowledge their copyright with the following figure.

|[pic] |

Figure 5.5 XMLA mandatory Copyright acknowledgements

XML for analysis (XMLA):

XML for Analysis is a Simple Object Access Protocol (SOAP)-based XML API, designed specifically to standardize the data access interaction between a client application and a data provider working over the web. API stands for Application Programming Interface: a set of routines, protocols and tools for building software applications. Examples are the Java API for XML Processing and Microsoft XMLA.

The specification is built upon the open Internet standards of HTTP, XML, and SOAP, and is not bound to any specific language or technology. The specification references OLE DB so that application developers already familiar with OLE DB can see how XML for Analysis can be mapped and implemented. OLE DB (Object Linking and Embedding, Database, sometimes written as OLEDB or OLE-DB) is an API designed by Microsoft for accessing data from a variety of database resources, such as Oracle and MySQL, in a uniform manner.
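
To give a feel for what a SOAP-based XML API looks like, the sketch below builds the XML of a request message in Python without sending it anywhere. The Discover method, its RequestType, Restrictions and Properties children, the request type DISCOVER_DATASOURCES and the two namespace URIs are written from the author's understanding of the XMLA and SOAP 1.1 specifications and should be checked against them before use; the helper name build_discover is made up.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"

def build_discover(request_type):
    """Build a SOAP envelope carrying an XMLA Discover call (not sent)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    discover = ET.SubElement(body, f"{{{XMLA_NS}}}Discover")
    ET.SubElement(discover, f"{{{XMLA_NS}}}RequestType").text = request_type
    # Left empty here; a real request may list restrictions and properties.
    ET.SubElement(discover, f"{{{XMLA_NS}}}Restrictions")
    ET.SubElement(discover, f"{{{XMLA_NS}}}Properties")
    return envelope

# Use a readable prefix for the SOAP namespace when serialising.
ET.register_namespace("soap", SOAP_NS)
request = build_discover("DISCOVER_DATASOURCES")
print(ET.tostring(request, encoding="unicode"))
```

A client would post such a message over HTTP to the data provider, which replies with another SOAP message containing the result as XML.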

The figure below succinctly presents the XMLA design architecture.

[pic]

Figure 5.6 Microsoft’s XMLA architecture

XMLA works efficiently with standard analytical data sources, such as OLAP and data mining providers.

For more details on programming with XMLA, refer

Self-Assessment Questions

10. Data warehouse contains historical and summarized data. (True / False).

11. A Data Mart can be the starting point for building an organisation-wide __________________________.

12. XMLA is used for web services and sharing remote data base access. (True / False).

5.5 Summary

In this unit we have learnt the basics of XML. XML is one of the most widely used data-sharing technologies on the internet. Its defining feature is that you can invent new tags whose meaning programs can then interpret. XML parsers, APIs and tools make it usable in web applications such as e-commerce. Almost every programming language, database management system, and data transport or data processing tool can integrate with XML.

You have observed how easy it is to create XML documents. You had a glimpse of the various XML-based applications. XML-based XHTML is very useful to web site creators. All current browsers are XML enabled.

You also had a brief introduction to topics such as security, data analysis and data warehousing, and the use of XML in these domains. XML is both a human-readable and a computer-‘understandable’ language. This is due to its custom tags and the semantic binding capabilities provided through DTD design.

5.6 Glossary

API: API stands for Application Programming Interface. An API is a set of routines, protocols, and tools for building software applications. Examples are the Java API for XML Processing and Microsoft's XMLA.

Attribute: A mark-up construct consisting of a name/value pair that exists within a start-tag or empty-element tag. For example, an img element written as <img src="..." alt="..." /> has two attributes, src and alt. Another example would be an element with the content "Connect A to B." whose start-tag carries number="3", where the name of the attribute is "number" and the value is "3".

CDATA: CDATA stands for character data, one type of content value of elements in XML. All legal characters are valid. Refer to for more details.

DTD: DTD stands for Document Type Definition. It defines the structure of a document as a hierarchy of ‘elements’, starting from the ‘root’ and running down through ‘children’, ‘siblings’ and other lower-level nodes. It assigns ‘meaning’ to XML documents.

Element: A logical component of a document which either begins with a start-tag and ends with a matching end-tag, or consists only of an empty-element tag. The characters between the start- and end-tags, if any, are the element's content, and may contain mark-up, including other elements, which are called child elements. An example is an element whose start-tag and end-tag enclose the content "Hello, world."; another is an element consisting of a single empty-element tag.

ETL: ETL stands for Extraction, Transformation and Loading, the important work of extracting and cleaning data before loading it into a data warehouse database.

OLAP: OLAP is an acronym for Online Analytical Processing. OLAP performs multidimensional analysis of business data and provides the capability for complex calculations, trend analysis, and sophisticated data modelling. It is a technology for data-warehouse-based data mining, analysis and presentation.

Tag: A mark-up construct that begins with "<" and ends with ">". Tags come in three flavours: start-tags, of the form <name>, end-tags, of the form </name>, and empty-element tags, of the form <name />.

XHTML: XHTML stands for Extensible Hypertext Mark-up Language. It is XML compliant. XHTML 1.0 is a reformulation of HTML 4.01 as an XML application.

XML: XML is an acronym for eXtensible Mark-up Language. It looks similar to HTML, but has a much tighter syntax than HTML. XML is designed to transport, store, share and ‘use’ data on the internet, with focus on what the data is about.

XMLA: XMLA stands for XML for Analysis. It is a Simple Object Access Protocol (SOAP)-based XML API, originally specified by Microsoft.
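
The glossary terms Element, Attribute and Tag can be seen together in a short Python sketch. The document text below is hypothetical: the names order, id, code, qty and note are invented for illustration, while the values 123456 and 100 echo the product code and quantity used earlier in this unit. The parser is Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are illustrative only.
doc = '<order id="7"><code>123456</code><qty>100</qty><note/></order>'

root = ET.fromstring(doc)

print(root.tag)      # element name, taken from the start-tag
print(root.attrib)   # attributes as name/value pairs

for child in root:
    # child.text is None for the empty-element tag <note/>
    print(child.tag, child.text)
```

Here order is the root element, id is an attribute of its start-tag, code and qty are child elements with character data as content, and note is written as an empty-element tag.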

5.7 Terminal Questions

1. Discuss the importance of XML in E-commerce.

2. List the rules to create a well-formed XML document.

3. Compare and contrast XML and XHTML.

4. What are the features of a Data Warehouse?

5. Define the external DTD named emp.dtd for the XML document named emp.xml shown below

| |

| |

| |

| |

|001 |

|Yathi |

| |

|23,4th main, 5th cross,7th block,Koramangala,Bangalore-560095 |

| |

|25539913 |

| |

| |

| |

|002 |

|Peris |

| |

|#123,15th cross,4th block,Koramangala,Bangalore-560034 |

| |

|25536789 |

| |

| |

| |

5.8 Answers

Self-Assessment Questions

1. True

2. XML

3. True

4. .xml

5. True

6. Document Type Definition (DTD)

7. False

8. DTD

9. True

10. True

11. Data Warehouse

12. True

Terminal Questions:

1. XML enables the transport of e-commerce data, such as product code, quantity bought and value, over the internet. Specific tags defined in a DTD bind semantic meaning to the character data of elements. The Java and Microsoft APIs make remote database updating easy. (Also refer to Importance of XML in Business in Section 5.1.)

2. Tags must follow the DTD specification where one is declared. All opened tags must be closed. Nesting of elements must be correct, etc. Refer to Well-formed XML document in Section 5.2.

3. XML is for data transport and sharing, and for internet-based data processing. XHTML is XML-based HTML, and displays information in browsers, etc. Refer to Definitions and differences in Section 5.3 and HTML vs. XML in Section 5.1.

4. Refer to the 9 points listed under Data Warehousing in Section 5.4.

5. The emp.dtd file to which emp.xml conforms is shown below:

| |

| |

| |

| |

| |

| |

| |

| |
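
The kind of DTD asked for in question 5, together with the well-formedness rules from answer 2, can be tried out with a short Python sketch. Everything about the mark-up here is an assumption made for illustration: the element names (employees, employee, empid, name, address, phone) are hypothetical, and only the data values are taken from question 5. Note also that Python's standard xml.etree.ElementTree parser checks well-formedness only; it does not validate a document against a DTD, so the DTD appears as plain text and is compared with the document by a rough name check.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DTD text; the element names are assumptions for illustration.
EMP_DTD = """\
<!ELEMENT employees (employee+)>
<!ELEMENT employee (empid, name, address, phone)>
<!ELEMENT empid (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
"""

# One employee record using the values from question 5.
WELL_FORMED = """\
<employees>
  <employee>
    <empid>001</empid>
    <name>Yathi</name>
    <address>23,4th main, 5th cross,7th block,Koramangala,Bangalore-560095</address>
    <phone>25539913</phone>
  </employee>
</employees>
"""

# Here <name> is never closed, so the nesting is broken.
NOT_WELL_FORMED = "<employees><employee><name>Yathi</employee></employees>"


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(WELL_FORMED))
print(is_well_formed(NOT_WELL_FORMED))

# Rough sanity check, not DTD validation: every element name used in the
# document should be declared in the DTD text.
declared = set(re.findall(r"<!ELEMENT\s+(\S+)", EMP_DTD))
used = {element.tag for element in ET.fromstring(WELL_FORMED).iter()}
print(used <= declared)
```

Full validation against an external DTD needs a validating parser, which is outside the Python standard library.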

