Distributed Object Computing using XML-SOAP



Kevin White

James Kebinger

December 9, 2000

1. Introduction

Many technologies exist to connect the components of distributed systems, but as distributed computing becomes more and more pervasive, a key problem arises in connecting dissimilar systems. The most popular high-level communication layers are Java Remote Method Invocation (RMI) and Microsoft's Distributed Component Object Model (DCOM). These technologies, though widely supported, share an Achilles’ heel: each is limited to its own domain, Java and Windows respectively. Technologies are needed that allow interoperability among all platforms.

The only means of communication common to these and other platforms has been raw TCP/IP sockets, which are adequate, and even fast, for passing simple messages between systems. The problem arises when sending complex objects over the wire. How can an object, or a set of objects with potentially tens or hundreds of fields, be serialized so that both systems, which may run different operating systems and be implemented in different programming languages, can understand it?

The last couple of years have seen a new document interchange standard emerge in XML (eXtensible Markup Language). XML basically uses HTML-like tags to represent hierarchical data in a flat-ASCII file. The use of simple ASCII or Unicode encoding makes XML accessible on all platforms, from supercomputers to embedded CPUs. XML parsers are available for many platforms, and where not available, the ASCII can be read and written without the aid of an XML parser if need be.

Now an answer appears: if computers of all shapes and sizes can exchange XML documents, why not embed the messages between distributed components in XML documents? Objects, being naturally hierarchical, can be serialized to a standard XML format. This is the idea behind XML-SOAP (Simple Object Access Protocol).

In this paper, we look at distributed computing technologies including DCOM, CORBA, and RMI. We show how SOAP works and how it differs from the others. We build a simple proof-of-concept client-server application in Java using SOAP, and then compare the performance of SOAP and RMI.

2. Background and Overview

2.1 Introduction to XML

The following section is a brief overview of XML.

2.1.1 What is XML?

XML is a text-based markup language that is becoming a standard way to store data. In XML you use markup tags to identify the pieces of data embedded in the document. These tags tell you what the data means, rather than how to display it.

In the same way that you define the field names for a data structure, you are free to use any XML tags that make sense for a given application. Naturally, though, for multiple applications to use the same XML data, they have to agree on the tag names they intend to use.

Besides elements, an XML document can also contain the following items:

• Attributes: Name-value pairs that occur inside start-tags after the element name.

• Entity references: Named shorthands, defined once, that stand for text used in many places in a document (or for characters that cannot be typed directly).

• Processing instructions: Used to provide information specific to applications.

• Comments: Notes for human readers that are not part of the document's data.

• CDATA: A section of character data that will not be interpreted by the XML parser.

The following is an XML example of a pizza item:

The Texan

barbeque brisket

dill pickles

onions

mozzarella cheese

tomato sauce

Put the lone in lone star state!
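Because the tags name the data, a program can pull values out by tag name. The following Java sketch uses the DOM parser bundled with the JDK (javax.xml.parsers). The element names pizza, name, topping, and description are hypothetical, chosen only to illustrate how an item like the one above might be marked up.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class PizzaReader {
    // Hypothetical markup for the pizza item; the element names are illustrative only.
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<pizza>"
        + "<name>The Texan</name>"
        + "<topping>barbeque brisket</topping>"
        + "<topping>dill pickles</topping>"
        + "<topping>onions</topping>"
        + "<topping>mozzarella cheese</topping>"
        + "<topping>tomato sauce</topping>"
        + "<description>Put the lone in lone star state!</description>"
        + "</pizza>";

    // Parses an XML string into an in-memory DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Returns the text of the first element with the given tag name.
    static String firstText(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    // Collects the text of every element with the given tag name, in document order.
    static List<String> allText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] ignored) {
        Document doc = parse(SAMPLE);
        System.out.println(firstText(doc, "name") + ": " + allText(doc, "topping"));
    }
}
```

Running main prints the pizza's name followed by its five toppings; no display information is involved, only the data and the names that identify it.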

2.1.2 Why is XML important?

Here are a number of reasons why using XML as a data format is important.

Plain Text

When data is stored in XML, it is stored as plain text. This allows you to view or edit the XML data with any text editor, which makes debugging easy and makes XML a convenient way to store small amounts of data. XML can also serve as a front end to a database that stores large amounts of data. XML therefore scales well and can be used in many kinds of data storage situations.

Data Identification

XML tells you what kind of data you have, not how to display it. Because an XML file is divided into tagged parts, any program can easily parse the data using its markup tags. This allows multiple applications to use the same XML data and display it in countless ways.

Display styles

Again, XML is only a way to store data. A separate file can be created to display this data. This allows the creation of generic files to show XML data to be used on any XML file that follows the same XML data format. An example of one way to show XML files is to create an XSL transform. This allows all XML files with a reference to this XSL file to display its data the same way.
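As a sketch of that idea, the following Java code applies an XSL transform with the javax.xml.transform (JAXP) API that ships with the JDK. The stylesheet and the pizza markup it expects are hypothetical; any document with the same structure would be rendered the same way.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XslDemo {
    // Hypothetical stylesheet: the name becomes a heading, each topping a list item.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\" indent=\"no\"/>"
        + "<xsl:template match=\"/pizza\">"
        + "<div><h1><xsl:value-of select=\"name\"/></h1><ul>"
        + "<xsl:for-each select=\"topping\"><li><xsl:value-of select=\".\"/></li></xsl:for-each>"
        + "</ul></div>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Compiles the stylesheet, applies it to the document, and returns the result.
    static String transform(String xml, String stylesheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] ignored) {
        String doc = "<pizza><name>The Texan</name>"
            + "<topping>onions</topping><topping>dill pickles</topping></pizza>";
        System.out.println(transform(doc, STYLESHEET));
    }
}
```

The data file never changes; swapping in a different stylesheet produces a different presentation of the same XML.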

Hierarchical

Finally, XML documents benefit from their hierarchical structure. Hierarchical document structures are, in general, faster to access because you can drill down to the part you need, like stepping through a table of contents. They are also easier to rearrange, because each piece is delimited. In a document, for example, you could move a heading to a new location and drag everything under it along with the heading, instead of having to page down to make a selection, cut, and then paste the selection into a new location.

2.2 Present Distributed Object Models

This section gives an overview of the most popular current distributed object models: Java RMI, DCOM, and CORBA. The purpose is not to describe every detail of these models but to highlight their main points, so that they can be compared with SOAP.

2.2.1 Java RMI

The design goal for the RMI architecture was to create a Java distributed object model that integrates naturally into the Java programming language and the local object model. RMI architects succeeded in creating a system that extends the safety and robustness of the Java architecture to the distributed computing world.

The RMI implementation is essentially built from three abstraction layers. The first is the Stub and Skeleton layer, which lies just beneath the view of the developer. This layer intercepts method calls made by the client to the interface reference variable and redirects these calls to a remote RMI service.

The next layer is the Remote Reference Layer. This layer understands how to interpret and manage references made from clients to the remote service objects.

The final layer is the transport layer and is based on TCP/IP connections between machines in a network. It provides basic connectivity, as well as some firewall penetration strategies.

A working RMI system is composed of several parts.

• Interface definitions for the remote services

• Implementations of the remote services

• Stub and Skeleton files

• A server to host the remote services

• An RMI Naming service that allows clients to find the remote services

• A class file provider (an HTTP or FTP server)

• A client program that needs the remote services

Assuming that the RMI system is already designed, you take the following steps to build a system:

1) Write and compile Java code for interfaces

2) Write and compile Java code for implementation classes

3) Generate Stub and Skeleton class files from the implementation classes

4) Write Java code for a remote service host program

5) Write Java code for the RMI client program

6) Install and run the RMI system
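The steps above can be sketched in a single Java program. Everything specific here is hypothetical: the BankService interface, its fixed balance, the registry name, and the choice to run server, registry, and client in one JVM. On Java 5 and later, UnicastRemoteObject.exportObject generates the stub at run time, so step 3 (running rmic) is not needed on modern JDKs.

```java
import java.net.ServerSocket;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Step 1: the remote interface (hypothetical). Every method must declare RemoteException.
interface BankService extends Remote {
    double getBalance(String accountId) throws RemoteException;
}

// Step 2: the implementation of the remote service (hypothetical fixed balance).
class BankServiceImpl implements BankService {
    public double getBalance(String accountId) {
        return "1001".equals(accountId) ? 250.0 : 0.0;
    }
}

class RmiDemo {
    // Steps 3 to 6 in one JVM: export the service, register it with the naming
    // service, then look it up and call it as a client would.
    static double callBalance(String accountId) {
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        BankServiceImpl impl = new BankServiceImpl();
        Registry registry = null;
        try {
            int port;
            try (ServerSocket probe = new ServerSocket(0)) {  // pick a free port
                port = probe.getLocalPort();
            }
            registry = LocateRegistry.createRegistry(port);

            // The stub is generated at run time here, replacing the rmic step.
            BankService stub = (BankService) UnicastRemoteObject.exportObject(impl, 0);
            registry.rebind("BankService", stub);

            // Client side: find the service by name, then invoke it remotely.
            Registry naming = LocateRegistry.getRegistry("127.0.0.1", port);
            BankService remote = (BankService) naming.lookup("BankService");
            return remote.getBalance(accountId);
        } catch (Exception e) {
            throw new IllegalStateException("RMI call failed", e);
        } finally {
            // Unexport everything so the RMI runtime no longer keeps the JVM alive.
            try { UnicastRemoteObject.unexportObject(impl, true); } catch (Exception ignored) { }
            if (registry != null) {
                try { UnicastRemoteObject.unexportObject(registry, true); } catch (Exception ignored) { }
            }
        }
    }

    public static void main(String[] ignored) {
        System.out.println(callBalance("1001"));
    }
}
```

In a real deployment the server and client would be separate programs on separate machines, and only the interface would be shared between them.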

2.2.2 DCOM

DCOM is an acronym that stands for Distributed Component Object Model. It is Microsoft’s solution for distributed computing. It allows a client application to remotely start a DCOM server object on another machine and invoke its methods, so functionally it is similar to CORBA and RMI. Unlike RMI, however, which is tied to Java, DCOM is language independent and, at least in principle, platform independent.

Because the vast majority of personal computers run Windows, Windows is DCOM's primary target. DCOM provides the ability to use and reuse components dynamically, without recompiling, on any platform, from any language, at any time. This is perhaps the single most important aspect of DCOM.

One widely criticized aspect of the DCOM model, however, is that there is no absolute way of addressing an object instance; everything is done through object interfaces. As a result, it can be difficult to manage a large set of worker object instances, or to disconnect temporarily and reconnect later.

Another problem DCOM faces is that there is currently no good way to keep track of possibly thousands of objects spread over thousands of computers on the network. The user has to supply the network address of the host machine for the server object, or that address must be hard-coded in the client application itself.

2.2.3 CORBA

CORBA, the Common Object Request Broker Architecture, is a standard for distributed computing. It provides a language-neutral and location-transparent way for applications to interact. In other words, two CORBA-enabled programs can talk to each other regardless of the language they were written in, or even whether they are running on the same machine.

CORBA specifies how objects are delivered over a network regardless of the client and server operating systems or programming languages. A CORBA Object Request Broker (ORB) connects a client application with an object that it needs. On a CORBA distributed object platform, the client does not need to know where the object resides; it only needs to know the name of the object and how to interact with the object's interface.

When creating CORBA applications, two main classes, a stub and a skeleton, are created along with several helper classes. The stub resides on the client side, and acts as a proxy for the server. It takes care of shipping the request out. The skeleton resides on the server side, receives the requests and returns values to the client.

The ORB is the glue that connects the stubs and skeletons. The ORB is often described as an object bus, allowing all the objects plugged in to it to communicate. The ORB is the low level infrastructure that hooks everything together.

2.3 SOAP Object Model

The following sections will describe the SOAP Object Model.

2.3.1 Basis for SOAP

There are many XML protocols listed on the W3C XML protocol comparison web page, but we wish to focus on those in which, in the words of the W3C, the “protocol explicitly supports sending requests to a remote system to execute a designated function, method, or procedure defined by an application using the protocol rather than functions defined in or by the protocol itself.” Many of the other protocols are designed for information exchange rather than explicit RPC, although their information-passing capabilities can be exploited for RPC-like message passing. For example, OMG, the group behind CORBA, is pushing an information exchange protocol called XMI, primarily to exchange information between different UML tools. For now, the leading contenders seem to be XML-RPC and SOAP. The United Nations is working on a protocol called ebXML, but little information is available at present. Given the relative paucity of XML protocols that explicitly support RPC, most of our comparisons will have to be with existing RPC environments.

XML-RPC was one of the earliest XML protocols, first published in April 1998. It is in relatively widespread use compared to SOAP and XMI, but should soon be superseded by SOAP. XML-RPC was originally called SOAP, but it has since stagnated while the specification was refined into SOAP 1.1, which is, roughly speaking, an (incompatible) superset of the original XML-RPC. As such, we won’t spend much time on it.

SOAP stands for Simple Object Access Protocol. SOAP is basically a standard for serializing inter-object communications using platform-independent XML. SOAP doesn't care what operating system, programming language, or object model is used on either the server side or the client side. For transport it typically uses HTTP, which is simple to implement and used universally. Using HTTP gives SOAP an advantage over previous middleware solutions because it does not require changes to network routers and proxy servers. SOAP bindings exist to serialize method calls automatically, but for simple applications the format is simple enough to be written out directly. The SOAP specification is published by the W3C (see Web References).
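To illustrate that last point, the following Java sketch writes an RPC-style SOAP 1.1 request by hand, with no SOAP library. The envelope and encoding namespace URIs are the ones defined by the SOAP 1.1 specification; the class and method names are hypothetical, the host name in main is a placeholder, and only string parameters are handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class SoapRequestWriter {
    static final String ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/";
    static final String ENC_NS = "http://schemas.xmlsoap.org/soap/encoding/";

    // Escapes characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    // Builds an RPC-style envelope: one Body element named after the method,
    // with one child element per parameter, in insertion order.
    static String envelope(String methodNs, String method, Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        sb.append("<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"").append(ENV_NS)
          .append("\" SOAP-ENV:encodingStyle=\"").append(ENC_NS).append("\">")
          .append("<SOAP-ENV:Body>")
          .append("<m:").append(method).append(" xmlns:m=\"").append(escape(methodNs)).append("\">");
        for (Map.Entry<String, String> p : params.entrySet()) {
            sb.append('<').append(p.getKey()).append('>')
              .append(escape(p.getValue()))
              .append("</").append(p.getKey()).append('>');
        }
        sb.append("</m:").append(method).append('>')
          .append("</SOAP-ENV:Body></SOAP-ENV:Envelope>");
        return sb.toString();
    }

    // Prepends the HTTP headers; Content-Length counts UTF-8 bytes, not characters.
    static String httpRequest(String path, String host, String soapAction, String body) {
        int length = body.getBytes(StandardCharsets.UTF_8).length;
        return "POST " + path + " HTTP/1.1\r\n"
            + "Host: " + host + "\r\n"
            + "Content-Type: text/xml; charset=\"utf-8\"\r\n"
            + "Content-Length: " + length + "\r\n"
            + "SOAPAction: \"" + soapAction + "\"\r\n"
            + "\r\n"
            + body;
    }

    public static void main(String[] ignored) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("symbol", "DIS");
        String body = envelope("Some-URI", "GetLastTradePrice", params);
        // "localhost" is a placeholder host name.
        System.out.print(httpRequest("/StockQuote", "localhost", "Some-URI", body));
    }
}
```

The whole request is ordinary text, which is why any platform with sockets and string handling can produce it.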

2.3.2 The SOAP protocol

As mentioned already, SOAP is a cross-platform way to make remote method calls and to serialize and de-serialize objects using XML. SOAP is not limited to HTTP; it can work over any protocol that can carry ASCII text, including SMTP email, FTP, and instant messaging. In addition, SOAP is not limited to request/response exchanges. One-way transmissions are also possible, such as a remote sensor periodically uploading a report to a central repository. In short, SOAP has been designed with maximum flexibility in mind.

An inherent advantage of SOAP's ability to use HTTP is that HTTP is universally deployed. As Don Box said, “One of the issues was just that the fabric of the Internet rejects new protocols. It takes a long time, especially nowadays, to get a protocol working end to end given the state of the way most corporations work.” In other words, SOAP may succeed where DCOM and CORBA failed, because they used new protocols on ports that tend to be closed on corporate firewalls, whereas SOAP uses the existing web infrastructure. The immediate disadvantage is security: the HTTP port is typically left open in network infrastructures on the assumption that no harm can be done through it, and those open ports can now be used to make malicious SOAP calls.

Security considerations aside, HTTP is almost trivial to implement and XML parsers are ubiquitous, so there are few barriers to SOAP deployment and growth. A SOAP server processes a request in the following steps:

1. Get a request on the listen port.

2. Parse the request for the method id to call.

3. Consult a configuration file for what class/function to call to handle the request.

4. De-serialize the parameters for the method call.

5. Call the function with the given de-serialized parameters.

6. Serialize the return value from the function and send it back to the requestor.

As the list above shows, SOAP is not rocket science: it is simple to understand, implement, and deploy.
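Steps 2 through 6 of the list above can be sketched in Java as follows. This is a simplified, hypothetical dispatcher, not the Apache SOAP implementation: an in-memory map stands in for the configuration file of step 3, every parameter is treated as a string, and faults and encoding types are ignored.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class SoapDispatcher {
    static final String ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    // Stands in for the configuration file of step 3: method name -> handler.
    private final Map<String, Function<List<String>, Object>> handlers = new HashMap<>();

    void register(String methodName, Function<List<String>, Object> handler) {
        handlers.put(methodName, handler);
    }

    // Step 1 (reading the request from the listen port) is left to the HTTP layer;
    // this method performs steps 2 to 6 on the request body.
    String dispatch(String requestXml) {
        Element call = firstElementChild(body(parse(requestXml)));

        // Step 2: the element inside Body names the method to call.
        String methodName = call.getLocalName();

        // Step 3: consult the registry for the handler of that method.
        Function<List<String>, Object> handler = handlers.get(methodName);
        if (handler == null) {
            throw new IllegalArgumentException("no handler registered for " + methodName);
        }

        // Step 4: de-serialize the parameters (here: plain strings, in order).
        List<String> params = new ArrayList<>();
        for (Node n = call.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) params.add(n.getTextContent());
        }

        // Step 5: call the handler with the de-serialized parameters.
        Object result = handler.apply(params);

        // Step 6: serialize the return value into a response envelope.
        return "<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"" + ENV_NS + "\"><SOAP-ENV:Body>"
            + "<" + methodName + "Response><return>" + escape(String.valueOf(result))
            + "</return></" + methodName + "Response></SOAP-ENV:Body></SOAP-ENV:Envelope>";
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);  // needed to find Body by its namespace
            return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("malformed request", e);
        }
    }

    private static Element body(Document doc) {
        Node body = doc.getElementsByTagNameNS(ENV_NS, "Body").item(0);
        if (body == null) throw new IllegalArgumentException("no SOAP Body");
        return (Element) body;
    }

    private static Element firstElementChild(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) return (Element) n;
        }
        throw new IllegalArgumentException("empty element: " + parent.getNodeName());
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

For example, registering a handler for GetLastTradePrice and dispatching a request for the symbol DIS yields a response envelope whose return element holds the handler's result.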

SOAP looks to be an industry sweetheart. Almost all the major companies have announced varying levels of support. Microsoft is betting its .NET initiative on SOAP. IBM provided the reference Java implementation SOAP4J to the Apache group in open-source form. IBM and Microsoft together announced an initiative for a standardized directory of SOAP services.

2.3.3 Basic SOAP Sample

The easiest way to see how SOAP works is to see an example of a SOAP service in action. What follows is the perhaps overused stock quote service example. A server on the network provides a method that returns the quote for a given stock symbol.

Here is a SOAP request:

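POST /StockQuote HTTP/1.1
Host:
Content-Type: text/xml; charset="utf-8"
Content-Length: nnnn
SOAPAction: "Some-URI"

<SOAP-ENV:Envelope
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
   <SOAP-ENV:Body>
       <m:GetLastTradePrice xmlns:m="Some-URI">
           <symbol>DIS</symbol>
       </m:GetLastTradePrice>
   </SOAP-ENV:Body>
</SOAP-ENV:Envelope>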

And the matching response is:

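HTTP/1.1 200 OK
Content-Type: text/xml; charset="utf-8"
Content-Length: nnnn

<SOAP-ENV:Envelope
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
   <SOAP-ENV:Body>
       <m:GetLastTradePriceResponse xmlns:m="Some-URI">
           <Price>34.5</Price>
       </m:GetLastTradePriceResponse>
   </SOAP-ENV:Body>
</SOAP-ENV:Envelope>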

As you can see, both the request and the response are valid XML documents preceded by HTTP headers. In an example like this, the only correlation between request and response is the HTTP connection that carries them. Other means must be used for longer-lasting sessions, just as cookies are used for session-based web browsing.
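On the client side, extracting the result from a response like the one above takes only a few lines with the JDK's namespace-aware DOM parser. This sketch assumes, as in the example, that the result is the first element inside the response wrapper; the class name and the namespace URI used in main are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class SoapResponseReader {
    static final String ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    // Returns the text of the first element inside the Body's first child,
    // for example the Price inside GetLastTradePriceResponse.
    static String firstResult(String envelopeXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(envelopeXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("malformed response", e);
        }
        Node body = doc.getElementsByTagNameNS(ENV_NS, "Body").item(0);
        if (body == null) throw new IllegalArgumentException("no SOAP Body");
        Element wrapper = firstElementChild(body);
        // A SOAP 1.1 error arrives as a SOAP-ENV:Fault element instead of a result.
        if (ENV_NS.equals(wrapper.getNamespaceURI()) && "Fault".equals(wrapper.getLocalName())) {
            throw new IllegalStateException("SOAP fault: " + wrapper.getTextContent().trim());
        }
        return firstElementChild(wrapper).getTextContent().trim();
    }

    // Skips whitespace text nodes and returns the first child element.
    private static Element firstElementChild(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) return (Element) n;
        }
        throw new IllegalArgumentException("empty element: " + parent.getNodeName());
    }

    public static void main(String[] ignored) {
        String response = "<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"" + ENV_NS + "\">"
            + "<SOAP-ENV:Body><m:GetLastTradePriceResponse xmlns:m=\"urn:example:quotes\">"
            + "<Price>34.5</Price></m:GetLastTradePriceResponse></SOAP-ENV:Body></SOAP-ENV:Envelope>";
        double price = Double.parseDouble(firstResult(response));
        System.out.println(price);
    }
}
```

Nothing in the parsed response identifies which request it answers; the client relies on having read it from the same connection it sent the request on.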

2.3.4 SOAP is “Bare-bones”

SOAP does not provide the complete functionality of DCOM or RMI. A SOAP solution has to implement many features that had previously been handled by the middleware, such as transactions, sessions, and remote references.

SOAP is essentially stateless, as is the HTTP it is built upon. This means all state has to be kept explicitly by the application and, if needed, sent explicitly in requests. There is no built-in concept of a SOAP connection, in which authentication could take place once and all later requests could follow. There is also no way to get a reference to a remote object and call methods on that reference.
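One common way to carry state explicitly is for a login request to return an opaque session token that the client must send with every later request. The SessionStore class below is a hypothetical sketch of the server-side half of that idea, held in memory with no expiry; it is not part of SOAP or of Apache SOAP.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical server-side session table: all state lives in the application,
// and the client must echo the token in every request.
class SessionStore {
    private final SecureRandom random = new SecureRandom();
    private final Map<String, String> accountByToken = new ConcurrentHashMap<>();

    // Called by a "login" request after the password has been checked.
    String openSession(String accountId) {
        byte[] bytes = new byte[16];
        random.nextBytes(bytes);
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        accountByToken.put(token, accountId);
        return token;
    }

    // Every later request carries the token explicitly; unknown tokens are rejected.
    String requireAccount(String token) {
        String account = token == null ? null : accountByToken.get(token);
        if (account == null) {
            throw new SecurityException("unknown or closed session");
        }
        return account;
    }

    // Called by a "logout" request.
    void closeSession(String token) {
        accountByToken.remove(token);
    }
}
```

Each SOAP method would then take the token as an ordinary parameter, since the protocol itself offers no place to keep it.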

Some of the missing features are a consequence of SOAP's language-independence goals. How could a scripting language use a remote object reference? The statelessness of SOAP comes from riding on HTTP, but it also simplifies server design and speeds server execution.

2.3.5 The Apache SOAP Implementation

For the project, we used the Apache Java SOAP libraries, some of which were originally created at IBM. Apache SOAP uses the Apache Xerces XML parser to handle the SOAP packets. In the Apache implementation, the SOAP listener that takes HTTP-SOAP requests is implemented as a Java Servlet in the Apache Jakarta Server.

Apache SOAP is configured through a web page where applications can be installed and un-installed. Parameters that must be given to install a service include:

• The SOAP service name.

• The Java class that implements this service.

• A list of methods implemented by this service.

• A list of type mappings for this service.

The type mappings are especially important because they are the point of SOAP: serializing and de-serializing objects. A type mapping relates the SOAP name of a data type to the name of a Java class, and it specifies which class serializes the object from Java to XML and de-serializes it back. For ease of use, a JavaBean serializer is included that uses Java reflection to serialize arbitrary classes. It may not work, or may not be efficient, for a given application, so the framework lets developers plug in their own serializers. The built-in serializer worked fine for this project’s needs.
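The idea behind a reflection-based bean serializer can be sketched with the JDK's java.beans.Introspector, writing one child element per readable JavaBean property. This is a simplified illustration, not Apache SOAP's own bean serializer: it ignores xsi:type attributes, nested beans, arrays, and de-serialization. Applied to a java.util.Date, it finds nine readable properties (date, day, hours, minutes, month, seconds, time, timezoneOffset, and year).

```java
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.util.Date;
import java.util.Map;
import java.util.TreeMap;

class SimpleBeanSerializer {
    // Reads every readable JavaBean property (getX()/isX()), sorted by name.
    // Object.class is the stop class, so getClass() is not treated as a property.
    static Map<String, Object> properties(Object bean) {
        try {
            BeanInfo info = Introspector.getBeanInfo(bean.getClass(), Object.class);
            Map<String, Object> values = new TreeMap<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                Method getter = pd.getReadMethod();
                if (getter != null) {
                    values.put(pd.getName(), getter.invoke(bean));
                }
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot introspect " + bean.getClass(), e);
        }
    }

    // Writes the bean as <elementName><property>value</property>...</elementName>.
    static String toXml(String elementName, Object bean) {
        StringBuilder sb = new StringBuilder("<").append(elementName).append('>');
        for (Map.Entry<String, Object> e : properties(bean).entrySet()) {
            sb.append('<').append(e.getKey()).append('>')
              .append(escape(String.valueOf(e.getValue())))
              .append("</").append(e.getKey()).append('>');
        }
        return sb.append("</").append(elementName).append('>').toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] ignored) {
        System.out.println(toXml("return", new Date()));
    }
}
```

The generality comes at a cost: every getter becomes an element, whether or not the receiver needs it.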

One tool that would be useful, at least when the client and server are written in the same language, is an automatic stub and skeleton generator along the lines of rmic, the tool that creates stubs and skeletons for Java RMI. In this project we had to hand-code the stubs and skeletons, which meant a good deal of duplicated code and effort.
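In Java, a generic client-side stub need not be hand-coded: java.lang.reflect.Proxy can turn every call on an interface into a SOAP request at run time. In this hypothetical sketch, the AccountService interface, the arg0, arg1, ... parameter element names, and the transport function (which would perform the HTTP POST and extract the result) are all assumptions, and every interface method is assumed to return a String.

```java
import java.lang.reflect.Proxy;
import java.util.function.Function;

// Hypothetical remote interface mirroring two of the banking requests.
interface AccountService {
    String getBalance(String accountId);
    String deposit(String accountId, String amount);
}

class SoapStubFactory {
    // Creates a stub that writes each method call as an RPC-style envelope and
    // passes it to the transport; the transport's reply becomes the return value.
    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> iface, String serviceUrn, Function<String, String> transport) {
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] { iface },
            (proxy, method, callArgs) -> {
                // Handle Object methods locally instead of sending them over the wire.
                if (method.getDeclaringClass() == Object.class) {
                    switch (method.getName()) {
                        case "equals": return proxy == callArgs[0];
                        case "hashCode": return System.identityHashCode(proxy);
                        default: return "SoapStub[" + iface.getName() + "]";
                    }
                }
                StringBuilder sb = new StringBuilder()
                    .append("<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\">")
                    .append("<SOAP-ENV:Body><m:").append(method.getName())
                    .append(" xmlns:m=\"").append(serviceUrn).append("\">");
                Object[] actual = callArgs == null ? new Object[0] : callArgs;
                // Parameter names are not available through reflection by default,
                // so positional names arg0, arg1, ... are used.
                for (int i = 0; i < actual.length; i++) {
                    sb.append("<arg").append(i).append('>')
                      .append(escape(String.valueOf(actual[i])))
                      .append("</arg").append(i).append('>');
                }
                sb.append("</m:").append(method.getName())
                  .append("></SOAP-ENV:Body></SOAP-ENV:Envelope>");
                return transport.apply(sb.toString());
            });
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

With a transport that posts the envelope to the SOAP server, a call such as service.getBalance("1001") then needs no hand-written stub code; the server-side skeleton would still have to be written or generated separately.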

3. Exploration of SOAP

3.1 The SOAP Banking Application

3.1.1 System Overview

The project implements a remote banking application. A user can log in using a GUI client and use basic banking services such as balance inquiries, transfers, withdrawals, and deposits, as well as view a complete transaction history. The application is a proof of concept: it is insecure and slow, and it does not provide atomic transaction support.

3.1.2 The Requests

Our system handles only the bare minimum functionality of an online banking application. The use cases for the system are shown in the figure below:

[Figure: use cases of the banking system]

A brief description of the requests follows:

|Request |Data sent by client |Data returned to client |
|Create new account |Account name, password, contact info |Success/failure message |
|Login to existing account |Username, password |Success/failure message |
|Close account |Account id |Success/failure message |
|Get balances |Account id |Balance info |
|Deposit |Account id, deposit amount |New balance |
|Withdrawal |Account id, withdrawal amount |New balance |
|Transfer |To account, from account, amount |New balances |
|Transaction History |Account id |History |
|Logout |Account id |None |

3.1.3 The Classes

Our implementation consists of 16 classes. We will explain each of them below:

|Class Name |Purpose |
|Account |Hold/transfer data about an account. |
|AccountPanel |GUI class to display account information. |
|Client |The stub that the GUI uses to communicate with the server via SOAP. |
|Constants |Convenience class to keep track of some constants. |
|createUserFrame |GUI class used for creation of a new user account. |
|Customer |Hold/transfer data about a customer. |
|DataBaseAccess |The class that connects the server to the backend database via JDBC. |
|HistoryModel |GUI class to hold JTable information. |
|Launch |GUI class to start up the client. |
|LoginGUI |GUI class, prompts user to log in. |
|MainWindow |GUI class, main client frame. |
|Server |Passes SOAP requests to the database backend. |
|Transaction |Hold/transfer data about a transaction. |
|TransferPanel |GUI class for executing account transfers. |
|WithdrawPanel |GUI class for executing account deposits and withdrawals. |

Class diagrams for three of the non-GUI classes are shown below.

[Figure: class diagrams of non-GUI classes]

3.1.4 Component-wise Breakdown

The system has two components: the server and the client. The figure below shows the basic system layout.

[Figure: basic system layout]

As can be seen above, the Server and Client classes play essentially the roles of the skeleton and stub classes in RMI. DataBaseAccess, Server, and Client all expose the same methods: Server exposes DataBaseAccess to the SOAP environment, and Client is a stub for Server.

3.1.5 The Server

The server component consists of two classes, DataBaseAccess and Server. The server uses an instance of DataBaseAccess to connect to the backend Microsoft Access database via JDBC and the JDBC-ODBC Bridge. The Server class has no “main” method because the SOAP dispatcher servlet loads it.

3.1.6 The Client

The client is implemented using the Java Swing API. It consists of 12 GUI-related classes, a startup class, and the Client class, which serves as the GUI's connection to the server and, through it, the database. The Client class does all the SOAP work, which takes no more than 10 to 20 lines of code to set up and execute a SOAP call. A screen shot of the GUI is shown in the figure below.

[Figure: screen shot of the client GUI]

3.1.7 SOAP Banking Application Conclusions

The implementation was successful; all functionality works as needed for a prototype. Once we were up to speed with setting up the SOAP server runtime environment, SOAP was simple to develop in and use. The SOAP libraries provided by Apache are simple to use and keep the developer from ever having to read or generate XML within the application.

As mentioned earlier, one thing that would have made SOAP development even easier is an automatic stub generator. Perhaps once there is a standardized way to describe SOAP services, a tool like this will be created.

3.2 SOAP Benchmarking

3.2.1 Why Benchmark?

In the course of our implementation, we noticed that our application was rather slow. This led us to wonder how much of the slowness was due to MS Access and how much to SOAP. We designed a benchmark to measure SOAP performance as a function of the complexity of the object being serialized, ran the same object-complexity comparison with RMI, and finally compared the overall speed of RMI and SOAP directly.

3.2.2 Object Complexity in SOAP

One of the most time-consuming steps in SOAP is building an XML tree from the object being serialized. The more fields an object has, the longer this should take. In addition, there is more data to push over the wire, so transmission time grows as well.

In these tests, we built a simple time server. One implementation returns the current date as a java.util.Date object; the other converts the date to its java.lang.String representation before serialization. Here is how the objects serialize to XML using the default Bean serializer, which may or may not be efficient for a given object:

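SOAP Serialization of String

Thu Nov 23 17:24:46 EST 2000

SOAP Serialization of Date (one element per JavaBean property of java.util.Date)

time: 975013920065
minutes: 12
seconds: 0
date: 23
day: 4
hours: 16
year: 100
timezoneOffset: 300
month: 10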

One can plainly see that the Date object has nine fields, compared with just one for the String. The total length of the returned SOAP envelope, including all HTTP headers, is 456 bytes for the String return and 785 bytes for the Date return. SOAP packets are large. An example that really shows the bloat of moving from a binary data representation to text is that sending the 8-byte double 3.141592653589793E+000 in XML requires 40 bytes of data, a fivefold expansion. Also keep in mind that this assumes UTF-8; if SOAP used a 16-bit Unicode encoding such as UTF-16, the size would double again.
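The size arithmetic can be checked with a few lines of Java. The sketch compares the 8-byte binary double with the quoted text form in UTF-8 and UTF-16. The value element name is hypothetical, so its 37-byte total is not expected to match the 40 bytes measured in our envelope, which depends on the actual markup.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class EncodingSize {
    // Size of a double in its native binary form (IEEE 754).
    static int binaryBytes(double value) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(value).array().length;
    }

    static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // UTF-16BE avoids the 2-byte byte-order mark that the plain UTF_16 charset adds.
    static int utf16Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] ignored) {
        String text = "3.141592653589793E+000";          // textual form quoted above
        String element = "<value>" + text + "</value>";  // hypothetical element name
        System.out.println("binary double:   " + binaryBytes(Math.PI));  // 8
        System.out.println("text, UTF-8:     " + utf8Bytes(text));       // 22
        System.out.println("element, UTF-8:  " + utf8Bytes(element));    // 37
        System.out.println("element, UTF-16: " + utf16Bytes(element));   // 74
    }
}
```

The digits alone already take 22 bytes, nearly three times the binary form, before any markup is added.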

3.2.4 The Tests

Each benchmark was run for three trials of 250 SOAP or RMI calls each. In each trial, the environment setup was performed only once, so its cost was amortized over the remaining 249 calls. The results of the three trials were averaged.
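A harness of the kind described above can be sketched in Java. Including setup inside the timed region is an assumption (our reading of "amortized"), and the Supplier/Runnable structure is hypothetical: in a real run, setup would create the SOAP call or look up the RMI stub, and each Runnable invocation would perform one round trip.

```java
import java.util.Date;
import java.util.function.Supplier;

class RoundTripBenchmark {
    // One trial: run setup once, then time `calls` invocations of the call it returns.
    // Setup is inside the timed region, so its cost is amortized over the calls.
    static double trialMillisPerCall(Supplier<Runnable> setup, int calls) {
        long start = System.nanoTime();
        Runnable call = setup.get();
        for (int i = 0; i < calls; i++) {
            call.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1e6 / calls;
    }

    // Averages the per-call time of several independent trials.
    static double averageMillisPerCall(Supplier<Runnable> setup, int trials, int calls) {
        double sum = 0;
        for (int t = 0; t < trials; t++) {
            sum += trialMillisPerCall(setup, calls);
        }
        return sum / trials;
    }

    public static void main(String[] ignored) {
        // Placeholder workload standing in for a SOAP or RMI round trip.
        Supplier<Runnable> setup = () -> () -> new Date().toString();
        System.out.printf("%.4f ms per call%n", averageMillisPerCall(setup, 3, 250));
    }
}
```

Moving setup.get() inside the loop gives the per-call-setup variant described in the benchmark conclusions.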

The tests were run on a Pentium II 266 MHz laptop with 288 MB of RAM. The tests used Java version 1.2.2 (Classic VM (build JDK-1.2.2_006, native threads, symcjit)) and Apache SOAP version 2.0. In all tests, the client and server ran on the same machine.

3.2.5 SOAP Serialization

The normalized results of the SOAP serialization benchmark are shown in the figure below.

[Figure: normalized SOAP serialization benchmark results]

The results show that the request returning a Date object takes 12% more time than the request returning a String object.

3.2.6 RMI Serialization

The normalized results of the RMI serialization benchmark are shown in the figure below.

[Figure: normalized RMI serialization benchmark results]

The results here show that the request returning a Date object takes just over 5% more time than the request returning a String object.

3.2.7 SOAP vs. RMI: Overall speed test

Here we compare the speed of SOAP to RMI with both Date objects and Strings.

The times are as follows:

|Test |Time per request |
|SOAP-Date |143.46 |
|SOAP-String |128.05 |
|RMI-Date |12.60 |
|RMI-String |11.96 |

Times in the graph are per-request round-trip times.

[Figure: per-request round-trip times for SOAP and RMI, Date and String returns]

As the graph clearly shows, SOAP is substantially slower than RMI. A call in SOAP takes about 11 times as long as the equivalent call in RMI.
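The percentages in the two serialization benchmarks and the factor of about 11 follow directly from the per-request times in the table above, where t denotes the per-request round-trip time of each test:

```latex
\begin{aligned}
\frac{t_{\mathrm{SOAP,Date}}}{t_{\mathrm{SOAP,String}}} &= \frac{143.46}{128.05} \approx 1.12, &
\frac{t_{\mathrm{RMI,Date}}}{t_{\mathrm{RMI,String}}} &= \frac{12.60}{11.96} \approx 1.05, \\
\frac{t_{\mathrm{SOAP,Date}}}{t_{\mathrm{RMI,Date}}} &= \frac{143.46}{12.60} \approx 11.4, &
\frac{t_{\mathrm{SOAP,String}}}{t_{\mathrm{RMI,String}}} &= \frac{128.05}{11.96} \approx 10.7.
\end{aligned}
```

The first row gives the 12% and roughly 5% overheads of returning a Date; the second row gives the SOAP-to-RMI ratios behind the factor of about 11.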

3.2.8 Benchmark Conclusions

The initial serialization tests suggest that RMI scales better than SOAP for larger objects: returning a Date instead of a String cost SOAP 12% more time, but RMI only about 5%. The overall results show that SOAP is significantly slower than RMI.

We also ran the tests a second time, performing the environment and connection setup once per call instead of only once per trial. We did this because RMI has a rather long initial setup time, and we wanted to see how SOAP’s setup time compared. These tests revealed that the per-request time for RMI doubled, while SOAP’s per-request time increased by less than one percent. Even with this surprising result, SOAP is still about 5 times slower than RMI in this case.

More information on this subject can be found in “Requirements for and Evaluation of RMI Protocols for Scientific Computing” (Govindaraju et al., 2000). Their results for the performance difference between SOAP and RMI are about the same as ours.

4. Conclusions/Future work

The project was successful. We created a prototype that shows off some of the capabilities of SOAP for distributed object computing. All of our original goals were met, and we went beyond them by adding the SOAP benchmark tests.

SOAP is not a finished standard: at this time it is only a W3C technical note. It is still evolving day by day, so parts of this project may not work in the future, and SOAP may look very different by the time it is standardized. At the very least, SOAP will become a de facto standard, because many companies are already deploying it on a wide scale. Whether it can eclipse its predecessors, DCOM and CORBA, remains to be seen.

Down the road, one area to explore is using SOAP to support pervasive computing, that is, mobile phones and connected PDAs. Another near-term project, already mentioned, is an automatic RMI-style stub generator for SOAP in Java, so that stub generation is automated and the SOAP API itself is invisible to the application code.

Overall, SOAP has been a pleasure to work with. Although slow, it is a breeze to get working. SOAP should make some serious waves in the computing world as web services and the Application Service Provider (ASP) model take hold.

5. Bibliography

1. Govindaraju, Madhusudhan, Aleksander Slominski, Venkatesh Choppella, Randall Bramley, and Dennis Gannon (2000). Requirements for and Evaluation of RMI Protocols for Scientific Computing. Department of Computer Science, Indiana University. Presented at SuperComputing 2000. 0-7803-9802-5/2000/$10.00 ©2000 IEEE.

2. Spencer, Paul (1999). XML Design and Implementation. Wrox Press Inc.

3. Harold, Elliotte Rusty (1999). XML Bible. IDG Books Worldwide.

Web References

• Official W3C SOAP specification:

• Apache Java SOAP implementation:

• SOAP description:

• Microsoft SOAP and .NET info:

• SOAP Overview:

• XML-RPC protocols:

• CORBA Overview:
