INTRODUCTION




In today's fast-paced world, the sooner people get the information they are after, the more satisfied they are. The style in which information is presented matters less than its substance and ready availability. RSS, or Really Simple Syndication, provides a neat and elegant method of distributing data collected from various sources to end users. In short, RSS is an easy way to distribute a list of headlines, update notices and other content to many people. It can also be used to organize content so that it is simpler to read.

The name RSS and its expansion are something of a misnomer: in reality, RSS is an umbrella term for several formats, most of them structurally similar but used for different purposes. Seven formats of RSS are currently in use, of which the prominent ones are RSS 1.0 and RSS 2.0. Each is a lightweight XML format designed for sharing content in a simple manner.

RSS is widely popular because many people are interested in websites whose content changes frequently but on unpredictable schedules. Such websites include news sites, community sites, product information pages and medical websites. RSS feeds let readers keep abreast of changes to such websites, which saves a lot of time and removes the tedium of repeatedly checking each one.

This problem was initially solved by email notification. The drawback was that emails sent in bulk were often mistaken for spam and lost. The sheer volume of disorganized email could also overwhelm the user. With RSS, notifications of changes to multiple websites are handled easily, and the results are kept separate from email.

Presently, the internet community uses RSS to share the content of numerous websites, including the day's latest headlines from around the world. A more recent use of the RSS XML format is podcasting, a method of sharing multimedia content.

Examples of RSS

|Item 1: | |
|Title: |Sidewalk contract awarded |
|Description: |The city awarded the sidewalk contract to Smith Associates. This hotly contested deal is worth $1.2 million. |
|Link: | |
|Item 2: | |
|Title: |Governor to visit |
|Description: |The governor is scheduled to visit the city on July 1st. This is the first visit since the election two years ago. The mayor is planning a big reception. |
|Link: | |
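As a rough illustration, the two items in the table above could be written out as an RSS document along the following lines. This is a minimal sketch in Python using only the standard library; the link URLs and the channel title, link and description are hypothetical placeholders, since the table leaves the links blank and names no channel.

```python
import xml.etree.ElementTree as ET

# The two example items from the table. The link values are hypothetical
# placeholders, because the table leaves them blank.
ITEMS = [
    {
        "title": "Sidewalk contract awarded",
        "description": ("The city awarded the sidewalk contract to Smith "
                        "Associates. This hotly contested deal is worth "
                        "$1.2 million."),
        "link": "http://example.com/news/sidewalk",
    },
    {
        "title": "Governor to visit",
        "description": ("The governor is scheduled to visit the city on "
                        "July 1st. This is the first visit since the "
                        "election two years ago. The mayor is planning a "
                        "big reception."),
        "link": "http://example.com/news/governor",
    },
]


def build_feed(items):
    """Return an RSS document (as a string) holding the given items."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # Channel-level values are invented for this example.
    ET.SubElement(channel, "title").text = "City News"
    ET.SubElement(channel, "link").text = "http://example.com/"
    ET.SubElement(channel, "description").text = "Example local headlines"
    for entry in items:
        item = ET.SubElement(channel, "item")
        for field in ("title", "description", "link"):
            ET.SubElement(item, field).text = entry[field]
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_feed(ITEMS)
print(feed_xml)
```

Each table row becomes one child element of an item, and all items sit inside a single channel.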

History

Most innovations that transform a field arise out of a need, and RSS is no exception. It came about because Netscape wanted a mechanism to keep its existing customers and potential clients informed about changes to its website, notifying them the instant a change was made.

RSS as it is known today is the successor of several earlier syndication formats that were designed to work only with a single server. The original RSS format, RDF Site Summary, later rechristened RSS 0.9, was developed by Netscape for use on its web portal. As requirements grew, the format was developed further, leading to RSS 1.0 (also called RDF Site Summary) and to the latest version, RSS 2.0 (Really Simple Syndication).

The table below lists the various versions, their owners and strengths, and a recommendation for each.

|Version |Owner |Pros |Recommendations |
|0.9 |Netscape | |Not used presently |
|0.91 |UserLand |Drop dead simple |Used for basic syndication. Migration to higher forms easy |
|0.92, 0.93, 0.94 |UserLand |Allows richer metadata than 0.91 |Rendered obsolete by RSS 2.0 |
|1.0 |RSS-DEV Working Group |RDF-based, extensibility via modules, not controlled by a single vendor |Used for RDF based applications or advanced RDF specific modules |
|2.0 |UserLand |Extensibility via modules, easy migration path from 0.9x branch |Use for general-purpose, metadata-rich syndication |

Fig. Different stages in the evolution of RSS

Need for RSS

The primary reason for the popularity of RSS is that it is the simplest way to solve a problem that extends far beyond syndication: sharing data between sites. It is a much better way to share data than the common alternatives, which include fetching and parsing HTML, using proprietary APIs, exchanging database dumps, and cobranding.

Grabbing and parsing the HTML from a provider's Web site is the most common way to share data. The problem with this cut-and-paste method is that an application must be developed and maintained for each data source. These applications will most likely have to change each time the provider changes the HTML presentation. This can quickly become cumbersome and cost prohibitive when gathering information from multiple sources.

APIs are an improvement over grabbing and parsing HTML, but they have drawbacks of their own. First, APIs are language dependent and therefore require competency in particular languages, which may or may not be available. Second, they are not extensible. Finally, each API is implemented differently, based on the needs of its programmers.

Web sites also exchange data via database dumps. In this method, the data must be converted on both ends, and the problem of dealing with multiple data formats is not eliminated. This option would work only if all content providers used the same data model for delivering information, an improbable scenario.

Cobranding is a method in which the information provider hosts a custom version of the application for each customer. This works out nicely for subscribers that don't have any programming resources. The problem is that the data is either presented in a generic format that doesn't fit the customer's interface, or the content provider must maintain a cobranding template for each customer. While this can be a good solution, the functionality is limited to what the Web application can provide, and it requires a large amount of planning and development on the provider's part. Nevertheless, this technique has worked out nicely for companies that allow users to sign up and sell books from their own Web sites.

In the RSS model, each site publishes a file describing the contents of its feed. Other sites can subscribe to that channel and grab its contents. The RSS file is then converted to HTML and displayed directly on a subscriber site, or it might be edited first to select only those items that are appropriate for the site's audience. The advantage of RSS is that once an application to subscribe to one site has been created, the same thing can be extended for subscriptions to a lot of web sites.
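The point that one subscriber application serves many feeds can be sketched as follows. This is a minimal illustration in Python's standard library, not the method used by any particular site; the two feeds and their URLs are invented, and the keyword filter stands in for the editorial step of selecting items appropriate for the site's audience.

```python
import xml.etree.ElementTree as ET


def read_items(feed_xml, keyword=None):
    """Return (title, link) pairs from an RSS 0.91/2.0-style feed.

    If keyword is given, keep only items whose title mentions it.
    """
    root = ET.fromstring(feed_xml)
    found = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if keyword is None or keyword.lower() in title.lower():
            found.append((title, link))
    return found


# Two invented feeds from different publishers; the same code reads both.
FEED_A = """<rss version="0.91"><channel>
  <title>Site A</title>
  <item><title>New Perl release</title><link>http://example.com/a/1</link></item>
  <item><title>Server maintenance</title><link>http://example.com/a/2</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel>
  <title>Site B</title>
  <item><title>Perl and XML tips</title><link>http://example.org/b/1</link></item>
</channel></rss>"""

for feed in (FEED_A, FEED_B):
    print(read_items(feed, keyword="perl"))
```

Adding a third publisher means adding one more feed to the loop, with no new parsing code.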

Working of RSS

RSS works by having a website maintain a list of notifications in a standard format. This list of notifications is called the RSS feed. To keep track of content changes on the website, a user checks this list. For greater efficiency, special programs called RSS aggregators automatically access the RSS feeds of the websites to which a person has subscribed and organize the information collected. The information is in the form of an XML file. RSS feeds are also known as RSS channels, and RSS aggregators are also known as RSS readers.

Basically, the RSS aggregator is a web browser for RSS content. Aggregators periodically check the RSS feeds for new items, making it possible to keep track of changes to multiple websites without tediously rereading web pages again and again. RSS feeds are useful because they present additions and changes in a neat and elegant manner. If a headline and description pique the user's curiosity, he or she can click on the link to get the full story.

The figure below shows a typical RSS aggregator. The left hand side contains the list of RSS feeds that are monitored or subscribed to. Additional information, such as the number of unread items, is indicated next to each feed in parentheses. The right hand side contains the details of the most recent items in the selected RSS feed.

There are many RSS aggregators available. Some are accessed through a browser, some are integrated into email programs, and some run as a standalone application on your personal computer.
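The core of what an aggregator does on each periodic check, namely noticing which items are new, might look roughly like this. It is a simplified sketch in Python: the class name FeedTracker is made up for illustration, items are identified by their link (an assumption, since real aggregators use various strategies), and the feed text is passed in directly rather than fetched over the network.

```python
import xml.etree.ElementTree as ET


class FeedTracker:
    """Remembers which item links have already been seen for one feed."""

    def __init__(self):
        self.seen = set()

    def poll(self, feed_xml):
        """Return the titles of items that were not present on earlier polls."""
        new_titles = []
        for item in ET.fromstring(feed_xml).iter("item"):
            link = item.findtext("link", default="")
            if link not in self.seen:
                self.seen.add(link)
                new_titles.append(item.findtext("title", default=""))
        return new_titles


# Invented snapshots of the same feed at two points in time.
MONDAY = """<rss version="2.0"><channel>
  <item><title>A</title><link>http://example.com/a</link></item>
</channel></rss>"""

TUESDAY = """<rss version="2.0"><channel>
  <item><title>B</title><link>http://example.com/b</link></item>
  <item><title>A</title><link>http://example.com/a</link></item>
</channel></rss>"""

tracker = FeedTracker()
print(tracker.poll(MONDAY))   # everything is new on the first poll
print(tracker.poll(TUESDAY))  # only the item added since then
```

The length of the list returned by poll corresponds to the unread count an aggregator shows next to each feed.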

[pic]

Fig. A typical RSS aggregator

RSS Syntax

RSS is an XML grammar for sharing data. That means that an RSS file contains placeholders for data, which are identified by a starting and ending tag. The first task required to RSS-enable a Web site is to create such a file on your Web server. This RSS file contains the title and description of items that you want to promote on your site. As you'll see, an RSS file is usually generated by a simple program, but it can also be created by hand.

The first line of an RSS file contains an XML declaration:

Though the XML declaration isn't required, it is recommended for backwards compatibility.

The next item in an RSS file is the DTD that identifies the file as an RSS document. This is necessary to determine whether the file is valid when tested against the rules of the RSS DTD:

The rss element is the root or top-level element of an RSS file. The rss element must specify the version attribute (0.91 for the format described in this section). It may also contain an encoding attribute (the default is UTF-8):

The root element is the top-level element that contains the rest of an XML document.

An rss element may contain one and only one channel element. This element will contain the individual items. Each channel must contain the following elements:

• title - the name of the channel

• description - a short description of the channel

• link - an HTML link to the channel Web site

• language - the language of the channel, given as a language code. The code for U.S. English is en-us

• one or more item elements
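A quick check for these required elements can be sketched in Python as shown below. The sample document is invented for illustration; its XML declaration, DOCTYPE line and rss element follow the commonly published RSS 0.91 form, which is an assumption here rather than something quoted from this text.

```python
import xml.etree.ElementTree as ET

# Required channel subelements, as listed above.
REQUIRED = ("title", "description", "link", "language")


def missing_channel_elements(feed_xml):
    """Return the names of required pieces that the feed lacks."""
    root = ET.fromstring(feed_xml)
    if root.tag != "rss" or "version" not in root.attrib:
        return ["rss version"]
    channel = root.find("channel")
    if channel is None:
        return ["channel"]
    missing = [name for name in REQUIRED if channel.find(name) is None]
    if channel.find("item") is None:
        missing.append("item")
    return missing


# Invented sample; the DOCTYPE identifiers are assumed, not taken from the text.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN"
  "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
  <channel>
    <title>Example Channel</title>
    <description>An invented channel for illustration</description>
    <link>http://example.com/</link>
    <language>en-us</language>
    <item>
      <title>First headline</title>
      <link>http://example.com/1</link>
    </item>
  </channel>
</rss>"""

print(missing_channel_elements(SAMPLE))
```

This only checks that the elements are present. It does not validate the file against the DTD.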

A channel may also contain the following optional elements:

• rating - the PICS rating for the channel Web site. PICS ratings are assigned by an independent agency.

• copyright - content copyright

• pubDate - date the channel was published

• lastBuildDate - date the RSS was last updated

• docs - additional information about the channel

• managingEditor - channel's managing editor

• webMaster - channel Webmaster

• image - channel image

• textinput - allows a user to send an HTML form text input string to a URL

• skipHours - the hours that an aggregator should not collect the RSS file

• skipDays - the weekdays that an aggregator should not collect the RSS file
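To illustrate how an aggregator might honor skipHours and skipDays, here is a small Python sketch. It assumes the hours are listed as hour subelements numbered 0 to 23 and the days as day subelements holding English weekday names; the numbering convention differs between RSS versions, so treat that as an assumption. The feed itself is invented.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Weekday names in the order used by datetime.weekday() (Monday is 0).
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def may_fetch(feed_xml, when):
    """Return True if the feed does not ask to be skipped at time `when`."""
    channel = ET.fromstring(feed_xml).find("channel")
    skip_hours = {int(h.text) for h in channel.findall("skipHours/hour")}
    skip_days = {d.text.strip() for d in channel.findall("skipDays/day")}
    return (when.hour not in skip_hours
            and DAY_NAMES[when.weekday()] not in skip_days)


# Invented feed that asks not to be collected at night or on weekends.
FEED = """<rss version="2.0"><channel>
  <title>Example</title>
  <skipHours><hour>0</hour><hour>1</hour><hour>2</hour></skipHours>
  <skipDays><day>Saturday</day><day>Sunday</day></skipDays>
</channel></rss>"""

print(may_fetch(FEED, datetime(2005, 7, 4, 10, 0)))  # a Monday morning
```

An aggregator would call a check like this before each scheduled download and simply wait for the next cycle when it returns False.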

A channel may contain an image or logo. The image element must contain the image title, commonly used as the ALT attribute when converted to an HTML image element, and the URL of the image itself.

The image element may also include the following optional elements:

• link - a URL that the image should be linked to

• width - the image width

• height - the image height

• description - an area for additional text

The textinput element lets users input data in an HTML text field:

• title - label of the submit button

• description - text input description

• name - text input name

• link - URL to which to send the input

For example, the Freshmeat channel in Listing Two (available online) contains a textinput element that lets users search the application database.

Each channel can contain up to 15 items. Actually, you can include more, but if you do, Netscape Netcenter won't accept the file. Each item contains a title, link, and description. The item elements are the real meat of the RSS file. They provide the headlines and summaries of the content you want to share with other sites.

The RSS specification includes all HTML entities for convenience; however, you can't include any HTML elements. For the RSS file to remain valid, use only those elements that have been defined in the specification. Additionally, the file must follow a few basic XML rules to be well formed. An XML parser can't properly parse an XML file unless it obeys the following well-formedness rules:

1) Each starting tag must have an ending tag.

2) Special characters such as &, ", < and > must be encoded as entities.

3) XML elements must be well balanced; that is, the end tag should be at the same level in the tree as the start tag.
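The three rules can be demonstrated with a short Python sketch that uses the standard library's parser as the judge of well-formedness. The headline text is invented, and escape from xml.sax.saxutils is one convenient way to do the encoding, not the only one.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text):
    """Return True if an XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# An invented headline containing characters that must be encoded.
raw_title = 'Smith & Jones win "best <b>bid</b>" award'

# Rule 2 violated: the bare ampersand makes the document unparseable.
bad = "<item><title>" + raw_title + "</title></item>"

# Rule 2 obeyed: &, < and > are encoded (plus the quote, on request).
good = ("<item><title>"
        + escape(raw_title, {'"': "&quot;"})
        + "</title></item>")

# Rule 3 violated: the end tags are in the wrong order.
unbalanced = "<item><title>Governor to visit</item></title>"

for label, text in (("bad", bad), ("good", good), ("unbalanced", unbalanced)):
    print(label, is_well_formed(text))
```

After parsing the encoded version, the title text comes back exactly as the original headline, so nothing is lost by encoding.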

Creating an RSS File

The XML::RSS Perl module makes it easy to maintain and parse RSS files. XML::RSS requires the XML::Parser module maintained by Clark Cooper. Both are available through CPAN. In addition, freely available RSS tools for gathering, editing, and displaying RSS files can also be used. Installation instructions are included with the XML::RSS distribution.

To use the module in a Perl program, you must first load the module into memory and create a new instance of the class:

use XML::RSS;

my $rss = new XML::RSS;

Optionally, you can pass the RSS version and the character encoding into the new method when creating a new instance:

my $rss = new XML::RSS (version=> '0.91', encoding=>'ISO_8859-1');

XML::RSS simplifies several common tasks related to maintaining an RSS file. First, the module abstracts the XML syntax into a number of class methods. For each RSS element, there is a related method. Each element method operates in a similar fashion. For example, to set values for the channel element, we would call the channel method and pass it an associative array, which contains the names and values of each channel subelement (see Example 1).

You can also use these methods to modify values of the RSS. For example, to change the URL of the RSS image, you might use the following:

$rss->image(url => 'images/fm.mini.jpg');

The add_item method is used to add a new RSS item to an RSS file. It usually appends the item to the list, but items can also be placed in any location.

The parsefile method helps in retrieving the values of an RSS file. The parsefile method takes the RSS filename as its only parameter and transforms it into a multidimensional hash:

$rss->parsefile("fm.rss");

To access the value of a subelement, simply pass the name of the subelement into the method. For example, to retrieve the value of the textinput description:

my $ti_desc = $rss->textinput("description");

The element method will return the value of the subelement. Once the RSS file has been created or modified, it can be saved using the save method:

$rss->save("fm.rss");

Before syndicating content, a process to keep the RSS file up-to-date has to be set up. Optimally, when a new item is posted to a Web site, it will also show up in the RSS file. This file can be maintained by hand, but the process can also be automated. Listing Two is a Perl script that uses the XML::RSS module to create a channel for Freshmeat and save it to fm.rdf. The output of the script is contained in Listing Three.

The XML::RSS module also makes it easy to update an RSS file. Listing Four is a short script that inserts a new item in our Freshmeat RSS file. First, we load the module into memory with the use statement. Then we create a new instance of the class with the new method, setting the RSS version to 0.91. Next we parse an RSS file with the parsefile method, insert a new item with the add_item method, and then save the RSS to a file with the save method.
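For readers without Perl at hand, the same parse, add and save cycle can be approximated in Python's standard library. This is a sketch of the idea, not a port of Listing Four: the function name add_item merely echoes the XML::RSS method, the new item is placed first (newest first is an assumption; the text says XML::RSS usually appends), and the feed is handled as a string rather than a file.

```python
import xml.etree.ElementTree as ET

MAX_ITEMS = 15  # the per-channel limit mentioned earlier in the text


def add_item(feed_xml, title, link, description):
    """Return the feed with a new item placed ahead of the existing ones."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")

    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, "description").text = description

    old_items = channel.findall("item")
    if old_items:
        # Insert just before the first existing item.
        position = list(channel).index(old_items[0])
    else:
        # No items yet: place the new one after the channel metadata.
        position = len(channel)
    channel.insert(position, item)

    # Drop items beyond the limit; the ones at the end are assumed oldest.
    for stale in channel.findall("item")[MAX_ITEMS:]:
        channel.remove(stale)
    return ET.tostring(root, encoding="unicode")


# Invented starting feed with a single item.
FEED = """<rss version="0.91"><channel>
  <title>Example</title>
  <item><title>Old news</title><link>http://example.com/old</link></item>
</channel></rss>"""

updated = add_item(FEED, "Fresh news", "http://example.com/fresh",
                   "A hypothetical new headline")
print(updated)
```

Writing the returned string to the feed's file on the Web server would complete the update.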

Converting an RSS File to HTML

Once the RSS file has been received, it has to be displayed. The easiest method of displaying an RSS file on a Web site is to convert it to HTML and use an SSI to bring the content into a template. Listing Five does just that. It's a command-line script that takes a filename or URL as a parameter, iterates through the XML::RSS internal structure, and prints the HTML equivalent. If the command-line parameter is an HTTP URL, the RSS file is fetched from the remote Web server via the LWP::Simple module.

In Listing Five, the items are iterated inside a foreach loop, printing the corresponding title and link. The last part of the subroutine prints the HTML form using the textinput subelements. The result is an HTML form field that lets a user search for applications on Freshmeat. Listing Six is the output of the script when using Listing One as the input file.

Now that we have the channel in an HTML format, we can include it on our Web site. The majority of the script is contained in the print_html subroutine, which handles the RSS-to-HTML conversion. Most of the subroutine is actually HTML code.

The first few lines of the subroutine print a table header that contains the channel title, link, and image. As mentioned previously, the XML::RSS module builds a multidimensional hash that represents the RSS file. The hash can be accessed directly instead of using the class methods. For example, the channel title and link are contained in

$rss->{'channel'}->{'title'}

and

$rss->{'channel'}->{'link'}

respectively. The image URL and link are contained in

$rss->{'image'}->{'url'}

and

$rss->{'image'}->{'link'}

Finally,

$rss->{'items'}

is a reference to the array of RSS items.
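The RSS-to-HTML conversion described in this section can be outlined in Python as follows. This is an illustrative sketch rather than a translation of Listing Five: the function name rss_to_html and the sample feed are made up, the output is a simple heading and list instead of a table, and the image and textinput handling is left out for brevity.

```python
import xml.etree.ElementTree as ET
from html import escape


def rss_to_html(feed_xml):
    """Render the channel title and item headlines as an HTML fragment."""
    channel = ET.fromstring(feed_xml).find("channel")
    title = escape(channel.findtext("title", default=""))
    link = escape(channel.findtext("link", default=""))
    lines = ['<h2><a href="%s">%s</a></h2>' % (link, title), "<ul>"]
    for item in channel.findall("item"):
        item_title = escape(item.findtext("title", default=""))
        item_link = escape(item.findtext("link", default=""))
        lines.append('<li><a href="%s">%s</a></li>' % (item_link, item_title))
    lines.append("</ul>")
    return "\n".join(lines)


# Invented feed used only to exercise the function.
FEED = """<rss version="0.91"><channel>
  <title>Example Channel</title>
  <link>http://example.com/</link>
  <item><title>Tom &amp; Jerry</title><link>http://example.com/a</link></item>
  <item><title>Second headline</title><link>http://example.com/b</link></item>
</channel></rss>"""

print(rss_to_html(FEED))
```

The fragment can then be pulled into a page template, for example through a server-side include as the text suggests. Text taken from the feed is escaped again before it is written into the HTML.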

Sample RSS Syntax

...

RSS Resources



Defined in XML, the Rich Site Summary (RSS) format has quietly become a dominant format for distributing headlines on the Web. Our list of links gives you the tools, tips and tutorials you need to get started using RSS.

0323

...

Syndication

Once an RSS file exists, any other site can grab it regularly. RSS standardizes a format for the delivery of content. This makes it easier for a content provider to distribute content broadly, and for an affiliate to receive and process content from multiple sources. However, in most cases, the actual content is not really distributed, only the headlines are, which means that users will come back to your affiliate site if they're interested in the story. For example, many content providers use ad banners as a primary source of revenue. This model depends on a large volume of users reading their content on a regular basis. The RSS format is a marriage made in heaven for extending readership. This explains why most early adopters have been news providers.

Syndication works as follows. First, generate the required RSS files, dropping each headline into a title element. Second, make the RSS file available on the web server and register it with as many aggregators as possible. Once the RSS file is published, its content can also be sent to mailing lists, cell phones, and so on.

Consider a site being part of a web development affiliate network in which each site focuses on a particular specialty or area of interest. In order to maximize readership, cross promotion of headlines is important. This is in the best interests of all the participating affiliates. If each site makes its RSS feeds available, then they can easily integrate the providers’ headlines. When a user reads the headline, and clicks on the links following it, both sites get an increase in page views.

Aggregation

The practice of gathering multiple RSS channels into one central location is called aggregation. While most aggregator Web sites share a common goal (gathering content), they serve different purposes. For example, some offer feeds as channels to Netcenter users, whereas others offer news feeds primarily for use on other Web sites. Another implementation of aggregation is Dave Winer's aggregator, which offers a similar service but also offers aggregate feeds, which send new content to partners via XML-RPC function calls. The benefit of using aggregators is that they make many feeds available from one place. Furthermore, an aggregator may offer tools or solutions that allow partners to customize feeds and minimize the integration effort. In addition, an aggregator site might provide tools and services that make it easier for content providers to syndicate their information.

Weblogs

One of the more interesting trends the Web has seen in the past months is the advent of the Weblog. A Weblog is a portal to the life of an individual or group. The ideas posted on a Weblog often include personal, political, technical, or editorial comments that are significant to the author. The Weblog was probably popularized by a site that posts interesting technology tidbits for computer geeks. An earlier example is Dave Winer's Weblog, a site at which readers get a personal insight into his mind. Dave often combines his opinions of technical innovations with politics, philosophy, and history, which makes for an interesting daily read.

RSS is a good foundation for creating a Weblog. One example is a site containing Perl/XML resources and news. A simple CGI script that uses the XML::RSS Perl module is used to add new headlines. The script updates both the front-page HTML file and the RSS headlines, which are then picked up by several aggregators. This dual-purpose method relieves the Weblog editor of updating multiple files. Instead, the editor can focus on his or her job and let an application on the Web server do the work behind the scenes.

Common Methods of producing the RSS file

The special XML-format file that makes up an RSS feed is usually created in one of a variety of ways.

Most large news websites and most weblogs are maintained using special "content management" programs. Authors add their stories and postings to the website by interacting with those programs and then use the program's "publish" facility to create the HTML files that make up the website. Those programs often also can update the RSS feed XML file at the same time, adding an item referring to the new story or post, and removing less recent items. Blog creation tools like Blogger, LiveJournal, Movable Type, and Radio automatically create feeds.

Websites that are produced in a more custom manner, such as with Macromedia Dreamweaver or a simple text editor, usually do not automatically create RSS feeds. Authors of such websites either maintain the XML files by hand, just as they do the website itself, or use a tool such as Software Garden, Inc.'s ListGarden program to maintain it. There are also services that periodically read requested websites themselves and try to automatically determine changes (this is most reliable for websites with a somewhat regular news-like format), or that let you create RSS feed XML files that are hosted by that service provider.

The diagram below shows how the websites, the RSS feed XML files and a personal computer are connected.

[pic]

The diagram above shows a web browser being used to read web site 1 over the internet and then web site 2. It also shows the RSS feed XML files for both websites being monitored simultaneously by an RSS aggregator.

Comparison of different versions and incompatibilities

As noted above, there are several different versions of RSS, falling into two major branches. The RDF, or RSS 1.* branch includes the following versions:

• RSS 0.90 was the original Netscape RSS version. This RSS was called RDF Site Summary, but was based on an early working draft of the RDF standard, and was not compatible with the final RDF Recommendation.

• RSS 1.0 and 1.1 are an open format by the "RSS-DEV Working Group", again standing for RDF Site Summary. RSS 1.0 is an RDF format like RSS 0.90, but not fully compatible with it, since 1.0 is based on the final RDF 1.0 Recommendation.

The RSS 2.* branch (initially UserLand, now Harvard) includes the following versions:

• RSS 0.91 is the simplified RSS version released by Netscape, and also the version number of the simplified version championed by Dave Winer of UserLand Software. The Netscape version was now called Rich Site Summary; it was no longer an RDF format, but it was relatively easy to use. It remains the most common RSS variant.

• RSS 0.92 through 0.94 are expansions of the RSS 0.91 format, which are mostly compatible with each other and with Winer's version of RSS 0.91, but are not compatible with RSS 0.90. In all Userland RSS 0.9x specifications, RSS was no longer an acronym.

• RSS 2.0.1 has the internal version number 2.0. RSS 2.0.1 was proclaimed to be "frozen", but still updated shortly after release without changing the version number. RSS now stood for Really Simple Syndication. The major change in this version is an explicit extension mechanism using XML Namespaces.

For the most part, later versions in each branch are backwards-compatible with earlier versions (aside from non-conformant RDF syntax in 0.90), and both branches include properly documented extension mechanisms using XML Namespaces, either directly (in the 2.* branch) or through RDF (in the 1.* branch). Most syndication software supports both branches.

The extension mechanisms make it possible for each branch to track innovations in the other. For example, the RSS 2.* branch was the first to support enclosures, making it the current leading choice for podcasting, and as of mid-2005 is the format supported for that use by iTunes and other podcatching software; however, an enclosure extension is now available for the RSS 1.* branch, mod_enclosure [4]. Likewise, the RSS 2.* core specification does not support providing full-text in addition to a synopsis, but the RSS 1.* markup can be (and often is) used as an extension.

The most serious compatibility problem is internal to the 2.* branch. UserLand's RSS reader -- generally considered the reference implementation -- did not originally filter out escaped HTML markup from feeds. As a result, publishers began placing HTML markup into the titles and descriptions of items in their RSS feeds. This behaviour has become widely expected of readers, to the point of becoming a de facto standard, though there is still some inconsistency in how software handles this markup, particularly in titles. The RSS 2.0 specification was later updated to include examples of entity-encoded HTML; however, all prior plain text usages remain valid.
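What entity-encoded HTML in a description means in practice can be shown with a small Python sketch. The item below is invented; the helper plain_text is a hypothetical name for one way a reader might reduce such a description to text, using the standard library's HTMLParser.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects the text between tags and discards the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def plain_text(description):
    """Strip HTML markup from an already XML-decoded description."""
    parser = _TextOnly()
    parser.feed(description)
    parser.close()
    return "".join(parser.parts)


# Invented item whose description carries entity-encoded HTML.
ITEM = ("<item><description>&lt;b&gt;Big&lt;/b&gt; news "
        "&amp;amp; views</description></item>")

# The XML parser undoes one layer of encoding, exposing the HTML markup.
decoded = ET.fromstring(ITEM).findtext("description")
print(decoded)
print(plain_text(decoded))
```

A reader that treats descriptions as plain text would display the decoded markup literally, while one that treats them as HTML would render it. That difference is the inconsistency the paragraph above describes.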

Application of RSS- Podcasting

One of the most important applications of RSS that is gaining widespread popularity is podcasting or the sharing of multimedia content. Podcasting is distinct from other types of online media delivery because of its subscription model, which uses the RSS 2.0 XML (or RDF XML) format to deliver an enclosed file. Podcasting enables independent producers to create self-published, syndicated "radio shows," and gives broadcast radio programs a new distribution method. Listeners may subscribe to feeds using "podcatching" software (a type of aggregator), which periodically checks for and downloads new content automatically. Some podcatching software is also able to synchronise (copy) podcasts to portable music players. Any digital audio player or computer with audio-playing software can play podcasts. The same technique can deliver video files, and some aggregators can now play video as well as audio.
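The enclosed file that podcatching software looks for is carried in an enclosure element on each item. A minimal Python sketch of finding those enclosures is given below; the feed, its URLs and the function name find_enclosures are invented for illustration, and the actual download step is omitted.

```python
import xml.etree.ElementTree as ET


def find_enclosures(feed_xml):
    """Return one dict per item that carries an enclosure."""
    results = []
    for item in ET.fromstring(feed_xml).iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None:
            results.append({
                "title": item.findtext("title", default=""),
                "url": enclosure.get("url"),
                "type": enclosure.get("type"),
                "length": int(enclosure.get("length", "0")),
            })
    return results


# Invented podcast feed: one episode with audio, one text-only post.
FEED = """<rss version="2.0"><channel>
  <title>Example Show</title>
  <item>
    <title>Episode 1</title>
    <enclosure url="http://example.com/ep1.mp3"
               length="1234567" type="audio/mpeg"/>
  </item>
  <item><title>Show notes only</title></item>
</channel></rss>"""

for episode in find_enclosures(FEED):
    print(episode["title"], episode["url"], episode["length"])
```

A podcatcher would combine this with the new-item tracking that any aggregator performs, then download each new enclosure URL and optionally copy the file to a portable player.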

Other applications

In addition to notifying you about news headlines and changes to websites, RSS can be used for many other purposes. There does not even have to be a web page associated with the items listed -- sometimes all the information you need may be in the titles and descriptions themselves.

Some commonly mentioned uses are:

• Notification of the arrival of new products in a store

• Listing and notifying you of newsletter issues, including email newsletters

• Weather and other alerts of changing conditions

• Notification of additions of new items to a database, or new members to a group

One RSS aggregator is all that you need to read all of the RSS feeds, be they headlines, alerts, changes, or other notifications. RSS is shaping up to be a very popular and useful means for communicating.

The Alternative - Atom

In reaction to perceived deficiencies in both RSS branches (and because RSS 2.0 is frozen with the intention that future work be done under a different name), a third group started a new syndication specification, Atom, in June 2003, and their work was later adopted by the Internet Engineering Task Force (IETF).

The relative benefits of Atom and the two RSS branches are currently a subject of heated debate within the web syndication community. Supporters claim that Atom improves on both RSS branches by relying more heavily on standard XML features, by supporting autodiscovery, and by specifying a payload container that can handle many different kinds of content unambiguously. Opponents claim that Atom unnecessarily introduces a third branch of syndication specifications, further confusing the marketplace.

The Future

RSS can be used easily as a generic format for exchanging content on the Web. More Web sites are using XML and RSS as they discover that the technologies help promote traffic to a site. RSS is a good starting point for many Webmasters who aren't ready to immerse themselves in XML yet.

It's important to note that while RSS is capable of syndicating content headlines, there are other XML formats like XMLNews and ICE that are better suited for handling larger syndication systems.

References



• . – the free online encyclopedia






