Alphabetizing titles leaving out leading 'A', 'An' and 'The': using substring, XML to XML transformations with SAXON, and using position() for columns

Suppose you have an XML file of picture or book elements, each with a title, and you are using XSL to transform the file into formatted XHTML. Sorting on a child node such as title is easy enough (see below). However, conventional alphabetizing ignores leading articles such as 'A', 'An' and 'The'. The solution shown here is a presort transformation that takes the original XML file and produces a new XML file. In the new file, each picture node whose title begins with an article is replaced by a picture node with two children: a ptitle node holding the article and a title node holding the rest of the original title. Title nodes without a leading article are left as is. The new XML file is then transformed using a sort based on the value of title.

The first transformation is done using SAXON, an XSLT processor. In this example, the XML file is named picturearchive2.xml and the XSL file is picturepresort.xsl. The result of the Saxon transformation is an output file named fixed.xml. This file is then modified by hand by adding the processing instruction that refers to the final XSL formatting stylesheet, which is named picturesort.xsl.

The original xml file: picturearchive2.xml

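<photos>
  <picture>
    <title>A Proud Trio</title>
    <filename>three.jpg</filename>
    <person>Jeanine</person>
    <person>Esther</person>
    <person>Aviva</person>
  </picture>
  <picture>
    <title>At the picnic</title>
    <filename>emcoreydog.jpg</filename>
    <person>Esther</person>
    <person>Corey</person>
  </picture>
  <picture>
    <title>The Adoring Mom</title>
    <filename>adoringmom.jpg</filename>
    <person>Daniel</person>
    <person>Jeanine</person>
    <person>Aviva</person>
  </picture>
  <picture>
    <title>Marching In</title>
    <filename>graduation0002.jpg</filename>
    <person>Daniel</person>
    <person>Leanne</person>
  </picture>
  <picture>
    <title>In the stands</title>
    <filename>reception0001.jpg</filename>
    <person>Amy</person>
    <person>Jeanine</person>
    <person>Illa</person>
    <person>Dhmant</person>
  </picture>
</photos>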

Two titles need modification: A Proud Trio and The Adoring Mom.

The XSL file used to find the titles that need to be split into ptitle and title nodes is picturepresort.xsl.

The critical steps are performed in the xsl:choose construction. Note:

• individual tests are made using xsl:when and the substring function.

• the xsl:otherwise branch handles the elements that do not need modification.

• the current node is the picture node, so substring must refer to the child named title.

• the numbering for string positions begins with 1 (origin-1, not origin-0).

• substring takes 2 or 3 parameters: the first is the string, the second is the start position, and the third, if present, is the length. If the third parameter is missing, substring returns the rest of the string. Both forms are used.

• the blank after 'A', 'An' or 'The' is included in the test and kept in the contents of ptitle.

• the coding recreates the photos, picture, filename and person nodes, along with any ptitle nodes and the modified or unmodified title nodes. The same output instruction is used several times to do this.
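The branching described in the list above can be sketched outside XSLT. The following Python is an illustrative stand-in, not the picturepresort.xsl stylesheet: `xpath_substring` and `split_article` are hypothetical helper names, and matching only the capitalized forms 'A ', 'An ' and 'The ' is an assumption. The sketch mimics the 1-based, two- and three-parameter forms of substring and returns the text that would go into the ptitle and title nodes.

```python
def xpath_substring(s, start, length=None):
    """Mimic XPath substring() for positive integer arguments (1-based start)."""
    begin = start - 1  # XPath counts from 1, Python from 0
    if length is None:
        return s[begin:]            # two-parameter form: rest of the string
    return s[begin:begin + length]  # three-parameter form: fixed length


def split_article(title):
    """Return (ptitle, title); ptitle is None when there is no leading article."""
    # Each test includes the trailing blank, so a title such as
    # "At the picnic" or "Theory" is not split by mistake.
    for article in ("A ", "An ", "The "):
        n = len(article)
        if xpath_substring(title, 1, n) == article:
            return article, xpath_substring(title, n + 1)
    # The "otherwise" case: the title is left unchanged.
    return None, title


for t in ["A Proud Trio", "At the picnic", "The Adoring Mom"]:
    print(split_article(t))
```

Keeping the blank inside the article text means the two pieces can later be joined back together without adding any separator.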

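The two files are copied into the folder holding the Saxon program. The command

saxon -o fixed.xml -a picturearchive2.xml

is used. The -o option names the output file, and -a tells Saxon to use the stylesheet named in the xml-stylesheet processing instruction of the source file. The command produces a new file, fixed.xml (copied and pasted from TextPad, using word wrap).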

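<photos>
<picture><ptitle>A </ptitle><title>Proud Trio</title><filename>three.jpg</filename><person>Jeanine</person><person>Esther</person><person>Aviva</person></picture>
<picture><title>At the picnic</title><filename>emcoreydog.jpg</filename><person>Esther</person><person>Corey</person></picture>
<picture><ptitle>The </ptitle><title>Adoring Mom</title><filename>adoringmom.jpg</filename><person>Daniel</person><person>Jeanine</person><person>Aviva</person></picture>
<picture><title>Marching In</title><filename>graduation0002.jpg</filename><person>Daniel</person><person>Leanne</person></picture>
<picture><title>In the stands</title><filename>reception0001.jpg</filename><person>Amy</person><person>Jeanine</person><person>Illa</person><person>Dhmant</person></picture>
</photos>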

This file is modified by hand by adding one line: the xml-stylesheet processing instruction that refers to picturesort.xsl. (This step could be avoided by using HTML/JavaScript to invoke the XML and XSL files.)

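<?xml-stylesheet type="text/xsl" href="picturesort.xsl"?>
<photos>
<picture><ptitle>A </ptitle><title>Proud Trio</title><filename>three.jpg</filename><person>Jeanine</person><person>Esther</person><person>Aviva</person></picture>
<picture><title>At the picnic</title><filename>emcoreydog.jpg</filename><person>Esther</person><person>Corey</person></picture>
<picture><ptitle>The </ptitle><title>Adoring Mom</title><filename>adoringmom.jpg</filename><person>Daniel</person><person>Jeanine</person><person>Aviva</person></picture>
<picture><title>Marching In</title><filename>graduation0002.jpg</filename><person>Daniel</person><person>Leanne</person></picture>
<picture><title>In the stands</title><filename>reception0001.jpg</filename><person>Amy</person><person>Jeanine</person><person>Illa</person><person>Dhmant</person></picture>
</photos>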
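The hand edit described above could also be scripted. The following is a minimal Python sketch and not part of the workflow described here: `add_stylesheet_pi` is a hypothetical helper, and the `type="text/xsl"` value is an assumption that may need adjusting for your viewer.

```python
def add_stylesheet_pi(xml_text, href):
    """Insert an xml-stylesheet processing instruction into XML text."""
    # type="text/xsl" is an assumption made for this sketch.
    pi = '<?xml-stylesheet type="text/xsl" href="%s"?>' % href
    if xml_text.startswith("<?xml "):
        # Keep the XML declaration first; put the instruction right after it.
        end = xml_text.index("?>") + 2
        return xml_text[:end] + "\n" + pi + xml_text[end:]
    # No XML declaration: the instruction simply goes at the top.
    return pi + "\n" + xml_text


sample = '<?xml version="1.0"?><photos></photos>'
print(add_stylesheet_pi(sample, "picturesort.xsl"))
```

The helper works on plain text rather than a parsed tree, so the rest of the file is passed through byte for byte.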

The (final) transformation/formatting is done using the picturesort.xsl file:

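[picturesort.xsl listing not recovered; the only surviving text is the page title "Picture Archive" and the value 200.]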

The critical step here is the xsl:sort on title, placed inside the instruction that processes the picture nodes. A subtle point is that the line that outputs ptitle has no effect when a picture element does NOT have a ptitle child.
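The effect of sorting on title while still displaying the article can be illustrated as follows. This is a Python stand-in for the behaviour described above, not the picturesort.xsl stylesheet. The tuple layout is an assumption made for the sketch, and Python's default string ordering may differ from an XSLT processor's on mixed-case titles (it makes no difference for this data).

```python
# Records as they stand after the presort step: (ptitle, title).
# An empty string stands in for a picture with no ptitle child.
pictures = [
    ("A ", "Proud Trio"),
    ("", "At the picnic"),
    ("The ", "Adoring Mom"),
    ("", "Marching In"),
    ("", "In the stands"),
]

# Sort on title alone, so the article plays no part in the ordering.
ordered = sorted(pictures, key=lambda p: p[1])

# Rebuild the displayed title; an empty ptitle contributes nothing,
# which mirrors the "no effect" case for pictures without ptitle.
display_titles = [ptitle + title for ptitle, title in ordered]

for line in display_titles:
    print(line)
```

With this data, "The Adoring Mom" is listed first and "A Proud Trio" last, because they are ordered under "Adoring" and "Proud".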

One additional step is to display the pictures in 2 columns, with the [combined] titles underneath. The picturesortcols.xsl file is used, making use of the position() function and the mod operator. This stylesheet does not use tables. The fixed.xml file must be changed to reference this stylesheet.

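[picturesortcols.xsl listing not recovered; the only surviving text is the page title "Picture Archive" and the value 200.]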

Conditional tests on position() mod 2 are used to check whether the position number of the picture node is odd or even. Odd positions trigger a break before the picture and even positions trigger one afterwards. Because everything in this example is centered, if the total number of pictures is odd, the last picture and title combination will be centered on its own row.
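The odd/even test can be sketched in Python as a way of grouping items into rows of two. This illustrates the position and mod idea only and is not the picturesortcols.xsl stylesheet; `two_column_rows` is a hypothetical helper name.

```python
def two_column_rows(items):
    """Group items into rows of two using 1-based positions, like position()."""
    rows = []
    for position, item in enumerate(items, start=1):
        if position % 2 == 1:
            rows.append([item])    # odd position: start a new row
        else:
            rows[-1].append(item)  # even position: complete the current row
    return rows


titles = ["The Adoring Mom", "At the picnic", "In the stands",
          "Marching In", "A Proud Trio"]
for row in two_column_rows(titles):
    print(" | ".join(row))
```

With five titles the last row holds a single item, which corresponds to the lone centered picture described above.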

There is a new standard/technology called XML pipeline that may be relevant to chaining together transformations. This process could be made fully automatic using the JavaScript techniques discussed. What I have shown here is the workings of the XSL.
