MULTIMEDIA DATABASES - Oracle



MultiMedia Databases

Multi Terabyte Performance

Jim Steiner, Senior Director, Oracle Corporation

Joseph Mauro, Principal Product Manager, Oracle Corporation

Introduction

The use of multimedia (images, audio, and video) has exploded over the last few years as internet, intranet, and other web-based applications have become the norm. Media has also emerged in many mainstream business applications, for example in the financial industry. This multimedia explosion is driven by several factors: the communicative power of the medium, the commoditization of media capture devices, the standardization of data formats, and the emergence of media-capable web page authoring tools and browsers.

Multimedia certainly adds value, but it also brings challenges. Multimedia objects are large, unstructured, and complex, very different from traditional business data. Mainstream applications now have to manage multi-terabyte media stores, so both storage and bandwidth costs can be prohibitive. Objects such as video can overwhelm the storage capacity of small to medium-sized systems, and substantial bandwidth is required to deliver these large objects to a client in real time. Multimedia also comes in a wide array of formats that are continually evolving, and keeping up with new standard formats is a challenge in its own right. Finally, applications need to manage multimedia data together with “business” data: for example, a picture of a car is associated with information on its model and price.

Oracle interMedia meets these challenges by enabling Oracle to manage and deliver image, audio, and video data in an integrated fashion with other enterprise data. interMedia provides the means to add audio, image, and video columns or objects to existing database tables, insert and retrieve multimedia data, perform image processing on popular image formats, and convert (transcode) between image formats. It adds native data type services, metadata management facilities, and operators, and it supports a number of flexible storage options for accessing media data, including internet URLs, operating system files, and specialized servers for streaming media. With Oracle10g, interMedia’s new features underscore Oracle’s continued strategic commitment to multimedia data management.

Applications that make extensive use of multimedia face the same challenges as most business applications: performance, scalability, and high availability at the lowest possible cost. Multimedia applications often have greater storage, distribution, security, and peak-demand requirements. Oracle 10g Enterprise Grid Computing benefits multimedia applications through dynamic provisioning of resources.

This paper examines several Oracle interMedia customer applications, including a medical information firm’s online web publishing service, a state’s road inventory system, a central bank’s customer records handling system, and a prestigious museum’s inventory system. All have incorporated multimedia successfully into their mainstream business applications.

What is Oracle interMedia?

The foundation for Oracle interMedia is the Oracle extensibility framework, a set of unique services that enable application developers to model complex logic and extend the core database services, including query optimization, indexing, the type system, and SQL, to meet the specific needs of an application. Oracle uses these services to provide a consistent architecture for the rich data types supported by interMedia. interMedia uses object types, similar to Java or C++ classes, to describe multimedia data. These object types are called ORDImage, ORDAudio, ORDVideo, and ORDDoc, and each has attributes and methods associated with it. With Oracle, the data type services available in these object types are also available through a relational interface and through the ISO SQL/MM multimedia standard interface, so application developers can choose to store media data in plain BLOB columns and still use the full range of interMedia functionality through PL/SQL and Java API calls.

Oracle interMedia supports multimedia storage, retrieval, and management of:

• Binary large objects (BLOBs) stored locally in Oracle and containing audio, image, or video data

• File-based large objects, or BFILEs, stored locally in operating system-specific file systems and containing audio, image, or video data

• URLs that point to audio, image, or video data stored on any HTTP server.

• Streaming audio or video data retrieved and delivered via specialized media streaming servers, such as those from RealNetworks and Microsoft.

• Any user-defined sources on other specialty servers.

interMedia objects are tightly integrated with SQL and the Oracle database engine, and are easily accessible through various thick and thin client interfaces, including Java.

In addition, Oracle offers a content-based retrieval feature for searching images stored in the database using image-matching technology. Given an image, content-based retrieval searches a database table for other images that resemble it in specific visual attributes such as color, texture, shape, and location. Applications where this is useful range from business trademarks, copyrights, and logos to artistic works in art galleries and museums.
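To make the idea of matching on weighted visual attributes concrete, here is a minimal Python sketch that ranks images by a weighted average of per-attribute distances. The feature vectors, the weights, the Euclidean distance, and the threshold are all illustrative assumptions; this is not interMedia's signature format or its actual scoring algorithm.

```python
import math
from dataclasses import dataclass

# Visual attributes named in the text; the numeric feature encoding is hypothetical.
ATTRIBUTES = ("color", "texture", "shape", "location")


@dataclass
class Signature:
    """Hypothetical image signature: one numeric feature vector per attribute."""
    features: dict


def attribute_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_score(query, candidate, weights):
    """Weighted average of per-attribute distances; 0.0 means identical signatures."""
    total_weight = sum(weights[attr] for attr in ATTRIBUTES)
    return sum(
        weights[attr] * attribute_distance(query.features[attr], candidate.features[attr])
        for attr in ATTRIBUTES
    ) / total_weight


def search(query, catalog, weights, threshold):
    """Return (image_id, score) pairs scoring within threshold, best match first."""
    scored = [(image_id, weighted_score(query, sig, weights)) for image_id, sig in catalog.items()]
    return sorted((hit for hit in scored if hit[1] <= threshold), key=lambda hit: hit[1])


if __name__ == "__main__":
    logo = Signature({"color": (0.9, 0.1), "texture": (0.4,), "shape": (0.7,), "location": (0.5, 0.5)})
    photo = Signature({"color": (0.1, 0.8), "texture": (0.9,), "shape": (0.2,), "location": (0.1, 0.9)})
    weights = {"color": 0.5, "texture": 0.2, "shape": 0.2, "location": 0.1}
    print(search(logo, {"logo": logo, "photo": photo}, weights, threshold=0.3))
```

Raising the weight of one attribute (for example color, for logo matching) makes differences in that attribute dominate the ranking; the threshold then decides how loose a match is still returned.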

Various application development and web authoring tools are also tightly integrated with Oracle interMedia, making it much easier to develop and implement media-rich applications. Java developers can use Oracle JDeveloper, since the Business Components for Java (BC4J) framework now automatically recognizes the interMedia object types and integrates with interMedia at multiple levels. This approach provides great flexibility along with all the productivity benefits of BC4J. Web developers can use Oracle Portal for internet and intranet delivery. Since interMedia objects, including source location information, are stored in Oracle tables, they can be included in the data available to Oracle Portal components. Additionally, third-party web authoring tools such as Macromedia UltraDev, the leading content creation environment for web site design, are now integrated for the dynamic display of rich media objects. This integration can keep a web site current while reducing site maintenance.

As part of Oracle's strategic commitment to multimedia, Oracle10g interMedia extends its support for standards and adds several new features. In the standards area, interMedia now supports the SQL/MM Still Image standard, which makes it possible for imaging applications to be portable across various vendors’ databases. interMedia is now integrated with the latest version of the Java Advanced Imaging package, which adds image processing operations and object methods including arbitrary rotation, flip and mirror, gamma correction, contrast enhancement, quantization methods, page selection, and alpha channel support. Oracle10g interMedia also supports new media formats, including MPEG-2, MPEG-4, and Microsoft ASF. Oracle10g also includes a database plugin for both RealNetworks Helix Server and the Microsoft Windows Media Format Server, enabling delivery of the most popular streaming media formats. Additionally, interMedia now supports wireless devices in the middle tier, so media can be adapted to the wireless network bandwidth and client device characteristics.

Oracle interMedia At Work

Now let’s look at some examples of Oracle interMedia at work. These customer applications demonstrate an array of media management tasks, including:

• Media-rich, mission critical production application support;

• Global distributed media access - anytime/anywhere;

• Amortization of media objects across multiple applications;

• Media rich web based workflow in support of business processes;

• Storage and delivery of diverse media types from media repositories larger than 1 TB.

Note that these tasks are not specific to a particular application but are common to many types of applications. They also demonstrate how the Oracle server provides for the management of multimedia content and helps in overcoming the challenges listed previously.

BioMed Central

BioMed Central is a publisher of original, peer-reviewed scientific research, with 70 online journals. Many world-class research institutions use BioMed Central’s publications, including Dana-Farber Cancer Institute, Harvard University, the National Institutes of Health, and the World Health Organization.

The fundamental objective of BioMed Central is to change the model of scientific publishing. The key challenge in meeting this goal is developing easy-to-use web tools that let scientists themselves perform publication tasks that traditionally carry significant administrative costs.

During the past two years, BioMed Central has built a fully web-based system that covers the publication cycle of manuscript submission, peer review, editorial acceptance/rejection and final online delivery of the completed material. To do this, BioMed Central addressed many technical challenges including:

• Handling a wide variety of formats that carry document and media data;

• Mechanisms that allow multiple roles such as author, editor, and publisher to perform their functions in a distributed fashion;

• Security, so that malicious access and modification cannot occur;

• Management of a considerable amount of data with multiple versions;

• Workflow that moves the data through the publication process;

• Web access by all of the roles in the system including consumers of the published materials.

When manuscript documents, figures, and supplementary files are submitted, the original files and the corresponding converted PDFs are stored in the database using Oracle interMedia. When new versions of files are uploaded, or new versions of the manuscript as a whole are submitted to the editors, the change is recorded relationally in the Oracle database, while the associated files are stored directly in the database using interMedia. Oracle 9i handles the various media formats and renders them into GIF and JPEG for use on the web.

Next, editors and invited reviewers receive emails containing web links that let them download and view PDFs and original files as needed. Web-based tools expedite the peer review process: peer reviewers simply “Agree” or “Decline” online to examine a submitted manuscript, and manuscript PDFs are automatically sent to the reviewers who have accepted. When they finish, peer reviewers submit reports via a structured online form. The final accepted manuscript is stored in the database as searchable XML, with the figures, figure thumbnails, PDFs, and other files all delivered from the Oracle 9i database via interMedia.

The database server is currently a dual CPU Sun E420, with 2 gigabytes of RAM running Solaris 7.5. The system software is on internal mirrored drives, and the data files are stored on approximately 300 gigabytes of external RAID devices.

The web servers are 1U Dell PowerApp and PowerEdge machines, each with single or dual Intel processors and 512 megabytes of RAM, running Windows 2000/IIS/ASP. Currently, half a dozen web servers run a large number of different web sites using the common back-end Oracle database. Intelligent load sharing between pairs of front-end web servers provides high availability.

The data architecture is key to the BioMed Central system and consists of a number of Oracle-resident relational tables. The table hierarchy defines the journals managed by BioMed Central, as well as information on all types of system users and the roles they play in the publishing process. The system also maintains information on the composition and versions of each manuscript, including web-compatible article files delivered via Oracle interMedia services, and it tracks the workflow status of each manuscript through the phases of submission, peer review, acceptance, and rejection. Finally, an access log is kept for institutional reporting on system utilization.
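The workflow status described above can be thought of as a small state machine. The Python sketch below is one way such transitions might be enforced in application code; the state names, the transition table, and the rule that a resubmission increments the version number are illustrative assumptions, not BioMed Central's actual schema or logic.

```python
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    IN_PEER_REVIEW = "in peer review"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


# Assumed transition rules, for illustration only: a submission goes to peer review
# (or is rejected outright); review ends in acceptance or rejection, or sends the
# manuscript back for a revised submission. Accepted and rejected are final.
ALLOWED = {
    Status.SUBMITTED: {Status.IN_PEER_REVIEW, Status.REJECTED},
    Status.IN_PEER_REVIEW: {Status.ACCEPTED, Status.REJECTED, Status.SUBMITTED},
    Status.ACCEPTED: set(),
    Status.REJECTED: set(),
}


class Manuscript:
    """Tracks the workflow status and version number of one (hypothetical) manuscript."""

    def __init__(self, manuscript_id):
        self.manuscript_id = manuscript_id
        self.version = 1
        self.status = Status.SUBMITTED
        self.history = [(self.version, self.status)]

    def move_to(self, new_status):
        """Apply a transition, rejecting any that the table above does not allow."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot move from {self.status.value} to {new_status.value}")
        if new_status is Status.SUBMITTED:
            self.version += 1  # a resubmission records a new manuscript version
        self.status = new_status
        self.history.append((self.version, new_status))


if __name__ == "__main__":
    ms = Manuscript("MS-001")
    for step in (Status.IN_PEER_REVIEW, Status.SUBMITTED, Status.IN_PEER_REVIEW, Status.ACCEPTED):
        ms.move_to(step)
    print(ms.version, ms.status.value, len(ms.history))
```

In a database-backed system the same rules would more likely live next to the relational status table, so that every application writing to it sees one definition of a legal transition.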

New Mexico Department of Transportation

The State of New Mexico Road Feature Inventory (RFI) application maintains information on the entire roadway system throughout the state, including pictures taken along the highways every 50 feet. The application is designed to help the state make better road and asset maintenance decisions and to comply with the federal mandate that requires state Departments of Transportation to provide detailed inventories of assets in order to receive federal funds. This application is a perfect example of a high-performance, multi-terabyte multimedia database.

Patrol personnel are equipped with a GPS-enabled PDA and a digital camera. They can insert, update, and delete assets on their PDA and synchronize with the database when they return to the office; because so much of New Mexico is rural, wireless access to the application is not an option. An image is associated with each new asset added to the database.

The RFI customers are the six districts around the state that are responsible for the everyday maintenance of the State of New Mexico’s roadways, and the RFI application allows them to prioritize their maintenance. The districts use the Highway Maintenance Management System (HMMS) to enter Daily Work Reports and to keep information on stockpiles and similar items. The RFI application will tie into HMMS so that any maintenance done on the road is reflected in updates to the RFI database, keeping the inventory current.


The database server runs Oracle 9.0.1.4 on Windows 2000 on a Compaq ProLiant with 8 GB of RAM, 500 GB of local disk storage, and 3 TB of IBM Shark storage. The application server runs Oracle 9iAS 9.0.1.2.2a on Windows 2000 on a Compaq ProLiant with 8 GB of RAM and 500 GB of local disk storage.

The RFI application makes use of an array of development languages and tools, including Java, XML, HTML, JavaScript, PL/SQL, Oracle Portal, Forms and Reports, Discoverer, and JDeveloper. It also makes use of key database technologies, including partitioning, RMAN, materialized views, web cache, and Virtual Private Database.

The RFI application data is made up of traditional assets and media assets. There are approximately one million traditional (non-media) assets, which account for approximately 100 GB of storage, and approximately five million media assets, typically JPEG images, which account for approximately 5 TB of storage.
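A quick back-of-the-envelope check of these figures, using decimal units and treating the "approximately" values as exact, shows that an average media asset is about ten times the size of an average traditional asset, and that media accounts for roughly 98% of the storage:

```python
GB = 10**9  # decimal units
TB = 10**12

# Figures from the RFI description, treating "approximately" as exact.
traditional = {"count": 1_000_000, "bytes": 100 * GB}
media = {"count": 5_000_000, "bytes": 5 * TB}


def average_bytes(group):
    """Average size of one asset in the group, in bytes."""
    return group["bytes"] / group["count"]


# Fraction of total storage taken by media assets.
media_share = media["bytes"] / (media["bytes"] + traditional["bytes"])

if __name__ == "__main__":
    print(f"average traditional asset: {average_bytes(traditional) / 1000:.0f} KB")
    print(f"average media asset:       {average_bytes(media) / 1000:.0f} KB")
    print(f"media share of storage:    {media_share:.1%}")
```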

Despite the enormous size of the database, a single DBA designed, built, deployed, and maintains the database. This is possible because for Oracle, data is data, whether it is a four byte integer or 5 TB of digital images.

There are two main parts to the application: reporting, and virtual drive.

Reporting is done using Portal 3.0.9. Currently, Oracle Reports Builder and Discoverer are used to generate these reports, and JSPs are being written in JDeveloper to generate reports with embedded interMedia images.

The virtual drive is probably the most widely used part of the application. A user chooses a route, a direction, and start and end mile markers, and takes a “virtual drive” of the route. Each route has an image taken every 50 feet, and each image is tagged with its mile point along the route. The user then presses a play button that, in effect, drives them down the chosen route, displaying the images one after another.
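The selection step behind the virtual drive can be sketched as follows, assuming (as the text describes) that each stored image carries its mile point along the route. The in-memory list stands in for what would be a database query filtered by route and direction, and all names are hypothetical. At one image every 50 feet, a mile of road holds 5,280 / 50 = 105.6 image positions, so about 106 images per mile.

```python
FEET_PER_MILE = 5280
SPACING_FEET = 50  # one image every 50 feet, per the RFI description


def virtual_drive(images, start_mile, end_mile):
    """Return image ids between two mile markers, in driving order.

    images: iterable of (mile_point, image_id) pairs for one route and direction.
    Driving from a higher marker to a lower one plays the frames in reverse.
    """
    low, high = sorted((start_mile, end_mile))
    frames = [img for img in images if low <= img[0] <= high]
    frames.sort(key=lambda img: img[0], reverse=end_mile < start_mile)
    return [image_id for _, image_id in frames]


if __name__ == "__main__":
    # Hypothetical route: an image every 50 feet along the first two miles.
    route = [(i * SPACING_FEET / FEET_PER_MILE, f"img{i:04d}") for i in range(212)]
    frames = virtual_drive(route, 0.0, 1.0)
    print(len(frames), frames[0], frames[-1])
```

The "play" button then only has to fetch and display the returned images in order, one after another.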

The virtual drive is intended for use by the whole highway department. For instance, the maintenance bureau can use this data to pinpoint an area of highway that needs maintenance or lacks appropriate signage, and the legal department can use it to verify whether a route was safely signed or whether a guardrail was where it needed to be.

The United States Central Banking System

The United States Central Bank has branches in several locations around the country, and each branch in turn services various member (commercial) banks; together they constitute the system as a whole. The Central Banking System acts as a lender of money to these commercial banks and as a clearinghouse for checks. In its clearinghouse capacity, the Central Bank often has to handle problem checks, including checks that cannot be cleared for payment and checks that have been damaged. The process for handling these checks involves:

• A Central Bank branch receives faxes of problem checks and cover letters containing metadata from member commercial banks.

• The Central Bank performs optical character recognition (OCR) on the cover letters at a fax-receiving server. The fax image of the check and the resulting text-based metadata are entered into the database for efficient indexing. The bitmap image of the faxed check can be used to reproduce a high-fidelity image when needed.

• Investigators then use a web-based application to search the database tables containing the checks and associated cover letters, and view them in a web browser as required.

This approach has major benefits. The capture process for problem checks is decoupled from the resolution process for optimal performance. Additionally, the database keeps a permanent record of the checks and problem cover sheets for legal purposes and does so in a secure fashion.

Check images are stored as TIFF files, typically ranging from 17 to 37 KB each. Check images and cover sheets are kept online for 13 months before being archived. As fax cover letters and check images are added, the database is expected to grow to upwards of 1 TB.

Resolving problem checks between banks is mission critical because substantial financial resources, or float, may be involved, and this adds up: the system takes in approximately 26,000 checks per day! The ability to scale up the front end with more problem-check and cover-letter capture, and to scale up the back end with more problem-resolution staff, is vital to keeping the float held up in unresolved checks to a minimum.

This is a perfect example of how an Oracle 10g application can save money.
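To give a sense of scale for the 26,000 checks per day, the sketch below estimates the online footprint of the check images alone over the 13-month retention window. It assumes decimal units and an average month length, and it excludes the cover letters and any index or database overhead, so it is a rough lower bound on the projected total rather than a prediction of it.

```python
CHECKS_PER_DAY = 26_000
MIN_KB, MAX_KB = 17, 37          # per check image, from the text
RETENTION_DAYS = 13 * 365 / 12   # 13 months, using an average month length


def online_check_storage_gb(kb_per_check):
    """Decimal GB of check images held online across the retention window (images only)."""
    return CHECKS_PER_DAY * kb_per_check * RETENTION_DAYS / 1_000_000


if __name__ == "__main__":
    low, high = online_check_storage_gb(MIN_KB), online_check_storage_gb(MAX_KB)
    print(f"check images online: {low:.0f}-{high:.0f} GB")
```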

Banks and financial institutions host a number of mission-critical media applications. A second example comes from the Brazilian Federal Savings Institution (Caixa Economica Federal do Brasil), the largest savings bank in Brazil, with a nationwide presence. Its customer currency slip application captures and stores images of currency adjustment slips showing weekly deposits, withdrawals, and currency adjustments, and bank customers can query their account records online through a web application. The bank runs a Sun 10000 UltraSPARC server with 34 × 400 MHz CPUs and 14 GB of RAM. On this equipment it uploaded 140 million bitonal deposit receipt images of 25 KB each and converted them from TIFF to GIF, resulting in approximately 4 TB of data. Document images are uploaded at approximately 3,000 per minute. Using Oracle 10g interMedia for automatic image file transcoding saves the bank time and money. This is yet another example of a mainstream business application that fits the mold of a multimedia, multi-terabyte application.
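A quick check of these figures in decimal units: the bitonal source images alone come to 140 million × 25 KB = 3.5 TB, the same order as the approximately 4 TB reported, and at a sustained 3,000 images per minute a full load would take about a month of continuous running. The sketch assumes the upload rate stays constant for the whole load, which the source does not state.

```python
IMAGES = 140_000_000
KB_PER_IMAGE = 25
UPLOADS_PER_MINUTE = 3_000

# Decimal units: 1 TB = 10**9 KB.
source_tb = IMAGES * KB_PER_IMAGE / 10**9
# Wall-clock days for a full load at the reported sustained upload rate.
load_days = IMAGES / UPLOADS_PER_MINUTE / (60 * 24)

if __name__ == "__main__":
    print(f"bitonal source images: {source_tb:.1f} TB")
    print(f"continuous load time:  {load_days:.1f} days")
```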

A third example is UBS PaineWebber, whose 1 TB check image database lets customers see images (both sides) of the checks they have written that have cleared.

Oracle interMedia brings several advantages to these mission critical solutions:

• Fast upload of images to the database;

• Image format conversion capacity;

• Performance and scalability for reduced costs and financial risk;

• Secure management of sensitive financial information;

• Web-based media access for ease of use;

• Inherent support for business-to-business operations.

The Palazzo Braschi Museum

The Palazzo Braschi Museum is a prestigious public museum in Rome that houses over 40,000 works of art. The museum has several applications that use photographic images of the artwork. These include:

• Presentation of the art works to the public via an internet site;

• Query access for restorers and students to information related to the art works;

• Cataloging support to add new images, ancillary data, and descriptions (historical and technical).

Storage and bandwidth challenges certainly prevail in this example. Forty thousand works of art and their associated images can add up to an archive of significant size, and size becomes a more serious problem if a solution replicates media data for each application. Replication also risks duplicate images falling out of synchronization, adding yet another manageability challenge; related to this is keeping the images synchronized with historical and technical metadata that can change over time. Internet bandwidth limitations pose additional requirements: the system must deliver the original digital image, or a facsimile such as a thumbnail, to the client in a timely way. Finally, security must be addressed, since access to the media and metadata must be controlled by class of user.

An Italian systems integration firm, with the help of Oracle Consulting, developed the solution that the museum now runs. The base platform is an Intel system running Windows NT. The applications are based entirely on Oracle technology: the database, Application Server, Oracle Forms and Reports Servers, and Oracle Portal.

The system stores all the images and metadata related to the art works in the Oracle database so that they can be amortized across all of the applications, saving both storage and money. The database also provides appropriate security for the different classes of users and solves the synchronization issues between images and metadata. The textual descriptions of the art works are indexed by Oracle Text so that Internet users, restorers, museum employees, and museum visitors can search for art works of interest. The client was developed with Oracle Portal, saving both time and effort because Oracle Portal is integrated with Oracle interMedia: interMedia objects can be incorporated easily and transparently into portlets using the standard Oracle Portal interfaces and components.

The images presented a different problem. For each art piece, one high-resolution picture is captured, scanned, and processed to derive variants for different purposes: an icon size of 72x72 pixels, a caption size of 150x150 pixels, an intranet size of 1200x1200 pixels, and an Internet size of 300x300 pixels. These images are stored in the Oracle database as interMedia objects in multiple columns of a single table. Generating these derivative images with desktop tools would be costly and slow; instead, interMedia generates them as follows:

• The pictures are scanned at 1200 dpi and TIFF files are produced.

• A PL/SQL procedure is then used to load the TIFF files into the database.

• Another PL/SQL procedure processes the TIFF image and through interMedia’s automatic thumbnail generation capabilities, produces the four different sizes described above. The TIFF file is also transformed to JPG format.

• When completed, the original TIFF image is dropped from the database to free space.
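Because each derivative is defined by a square bounding box, its size follows from scaling the source down until it fits inside the box while keeping its aspect ratio. The sketch below shows only that size arithmetic; the derivative names, the choice not to enlarge images smaller than the box, and the rounding are assumptions. In the museum's system the actual resizing and TIFF-to-JPEG conversion are done by interMedia's image processing, not by application code like this.

```python
# Bounding boxes for the four derivatives described in the text (pixels).
DERIVATIVES = {"icon": 72, "caption": 150, "internet": 300, "intranet": 1200}


def fit_within(width, height, box):
    """Scale (width, height) to fit a box-by-box square, keeping the aspect ratio.

    Assumption: images already smaller than the box are not enlarged.
    """
    scale = min(box / width, box / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def derivative_sizes(width, height):
    """Target dimensions for every derivative of one scanned picture."""
    return {name: fit_within(width, height, box) for name, box in DERIVATIVES.items()}


if __name__ == "__main__":
    # Hypothetical scan: an 8 x 6 inch original at 1200 dpi.
    for name, size in derivative_sizes(9600, 7200).items():
        print(name, size)
```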

The TIFF images range from 15 to 20 MB, and the processing time budget for each is 30 seconds. The BLOB columns in the database have been tuned for optimal performance: for example, inserting one TIFF image into the database takes 4-5 seconds. In the remaining 25 seconds, four "process" calls resize the original image, four more "process" calls convert the format from TIFF to JPG, and the resulting four BLOB columns are loaded into the target database table.
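The 30-second budget can be broken down as follows. For illustration only, the sketch assumes the time left after the insert is split evenly across the twelve remaining steps and that art works are processed one at a time; the source does not report per-call timings. At 30 seconds per work, one processing stream handles 120 art works per hour, so a year's intake of 60,000 works (the museum's planning figure) needs about 500 hours of processing.

```python
BUDGET_S = 30                   # processing time allocated per art work
STEPS_AFTER_INSERT = 4 + 4 + 4  # resize calls + TIFF-to-JPG calls + BLOB loads


def average_step_seconds(insert_seconds):
    """Mean time per post-insert step if the rest of the budget is split evenly (assumption)."""
    return (BUDGET_S - insert_seconds) / STEPS_AFTER_INSERT


def processing_hours(artworks):
    """Serial processing time for a batch of art works at the full 30-second budget."""
    return artworks * BUDGET_S / 3600


if __name__ == "__main__":
    for insert_s in (4, 5):
        print(f"insert {insert_s}s -> {average_step_seconds(insert_s):.2f}s per remaining step")
    print(f"{3600 // BUDGET_S} art works per hour; 60,000 works need {processing_hours(60_000):.0f} h")
```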

The Museum’s collection will triple over the next two years, to 120,000 works of art. The image set for each art piece is approximately 600 KB (the sum of the four formats). At 60,000 art works per year, that is 60,000 × 600 KB = 36,000,000 KB (about 36 GB) of new images per year. The Museum credits Oracle interMedia with the following benefits:

• Consistent storage of image objects in the database;

• Easy creation of derivative images;

• Shared access to image objects across multiple applications;

• Synchronization of the associated metadata and images;

• Secure access to image objects by all users;

• Integrated, simplified image and metadata management.
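The storage arithmetic behind the museum's growth plan is easy to check in a few lines, assuming decimal units (1 GB = 1,000,000 KB) and that every art work carries the full four-image set of about 600 KB. The sketch computes the footprint of a year's intake and of the full 120,000-work collection.

```python
KB_PER_ARTWORK = 600  # approximate sum of the four derivative images


def image_storage_gb(artworks):
    """Decimal GB needed for the derivative image sets of the given number of art works."""
    return artworks * KB_PER_ARTWORK / 1_000_000  # 1 GB = 1,000,000 KB


if __name__ == "__main__":
    print(f"one year's intake (60,000 works): {image_storage_gb(60_000):.0f} GB")
    print(f"full collection (120,000 works):  {image_storage_gb(120_000):.0f} GB")
```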

Their successful management of images in the Oracle database for use by several applications has led the Museum to consider the addition of streaming audio and video.

Conclusion

Customers have found that Oracle is well suited to managing multimedia, multi-terabyte databases. Oracle performs well for these databases: a 1 TB image repository renders images in a web browser in less than 0.4 seconds, and Oracle loads media content at device speeds. Customers have also found that Oracle scales: they already run databases of 5 TB and databases holding over 140 million images. Bulk loading and associated processing also scale: parallel processing has reached 300,000 images per hour loaded while scaling (thumbnailing) and transcoding the images. Oracle is also easier to manage, using tools such as RMAN for backup. And Oracle is more secure, because multimedia data inherits all of the built-in security of the Oracle database (authentication, auditing, encryption, and access control); even banks use it.
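To put the bulk-load figure in perspective: 300,000 images per hour is about 83 images per second, or 12 ms of aggregate wall-clock time per image across all parallel workers. The sketch below does that conversion and, purely as an illustration, applies the rate to a 140-million-image collection like the Brazilian bank's; the source does not say that system was loaded at this rate.

```python
IMAGES_PER_HOUR = 300_000  # parallel bulk-load rate reported in the text

images_per_second = IMAGES_PER_HOUR / 3600
# Aggregate wall-clock time per image across all parallel workers, in milliseconds.
ms_per_image = 3_600_000 / IMAGES_PER_HOUR


def load_hours(image_count):
    """Hours to bulk-load image_count images at the reported rate (illustrative)."""
    return image_count / IMAGES_PER_HOUR


if __name__ == "__main__":
    print(f"{images_per_second:.1f} images/s, {ms_per_image:.0f} ms per image")
    print(f"140 million images: {load_hours(140_000_000):.0f} h")
```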

Digital media has become a central component of today’s online applications and information services in a broad array of settings, ranging from e-commerce and B2B to traditional mission-critical business applications. The increase in the volume and value of multimedia data has highlighted the importance of managing media objects. Complexity is a factor as well, because the management and use of media presents unique problems in each application area.

Oracle with interMedia and the Oracle Application Server provide integrated services for the management, access, and multi channel (internet and wireless) distribution of media content. This includes both static and streaming media delivery and management in most popular formats. With Oracle, media data is managed along with other application specific data making both overall management and application development easier. Together, these data management and distribution services are proving invaluable in diverse established enterprise application areas as well as the emerging wireless space.

Figure: BioMed Central system overview (diagram labels: Distributed Scientific Community; Author submission; Peer Reviewer; Editors; Readership; Web; HTML; Oracle with interMedia: documents, graphs, images, multiple versions, workflow).