


Earth Science Markup Language

Schema Documentation for v3.0

Design Team:

Rahul Ramachandran

Andrew McDowell

Xiang Li

Sunil Movva

Matt He

Updated – 03/11/03

Introduction

The Earth Science Markup Language (ESML) project aims to create an XML-based markup language that can describe the structure, semantics, and content of any Earth science dataset in any data format. ESML consists of two parts: 1) ESML files, which use a set of XML elements and attributes to describe Earth science datasets, and 2) the ESML schema, which defines the elements and attributes used in ESML files and is also used to validate them. To support ESML applications, an ESML library has been designed and implemented that reads data with the help of ESML files. ESML files that conform to the ESML schema are valid ESML files and are supported by the ESML library.

ESML provides syntactic (or structural) metadata, which describe the data in terms of bits and bytes. The ESML parser uses these metadata to give structure to the bit stream that makes up the data file. For example, the syntactic metadata may tell the parser that the next 32 bits of data are to be interpreted as a big-endian 32-bit two’s complement integer value.
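As a rough illustration of what that interpretation step involves, the sketch below decodes four bytes as a big-endian 32-bit two’s complement integer using Python's standard struct module. It is not the ESML library's code; the byte values are made up for the example.

```python
import struct

# Four raw bytes as they might appear in a binary data file (made-up values).
raw = bytes([0xFF, 0xFF, 0xFF, 0xFE])

# ">i" means big-endian, 32-bit signed (two's complement) integer.
(value_be,) = struct.unpack(">i", raw)

# The same bytes read as little-endian ("<i") yield a different number,
# which is why the byte order has to be part of the metadata.
(value_le,) = struct.unpack("<i", raw)

print(value_be)  # -2
print(value_le)  # -16777217
```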

Syntactic Metadata

The highest-level syntactic metadata structure covers the entire syntactic metadata description. It contains structure segments that further describe the details of the data structure. All the syntactic metadata in an ESML file are contained between an opening and a closing ESML element tag.

Binary Format

Binary metadata describe a file that is in binary format (i.e., not in ASCII characters). A structure can be used to group data. By default BinaryStructure is one. Structures can be nested to any level. A field declaration defines the lowest (atomic) level of data.

BinaryStruc Def

This is the object that can contain different types and allows any combination of the following types.

BinDatum Type

This declaration defines one of the atomic elements that make up the data structure. The definition must contain a type attribute. The name and occurs attributes are optional. The size attribute is optional and defaults to a value based on the type attribute. The order attribute defaults to "Big Endian".

BinHeaderDef

This declaration is similar to the definition for Field: it extends BinDatumDef but also contains an attribute for symbols that can be present in the headers.

Bin Array Def

Arrays provide grouping for lower-level elements. Multidimensional arrays can be declared by nesting these declarations.
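The nesting idea can be sketched as follows: an outer array whose element is itself an array produces a two-dimensional result. This is only an illustration of the concept, not the ESML library's reader; the helper names (read_int16_be, read_array, read_row), the count parameter, and the row-by-row layout of the bytes are all assumptions made for the example.

```python
import struct

def read_int16_be(buf, offset):
    """Read one big-endian 16-bit integer; return (value, new offset)."""
    (value,) = struct.unpack_from(">h", buf, offset)
    return value, offset + 2

def read_array(buf, offset, count, read_element):
    """Read `count` elements in sequence using `read_element`."""
    items = []
    for _ in range(count):
        item, offset = read_element(buf, offset)
        items.append(item)
    return items, offset

def read_row(buf, offset):
    """Inner array: three 16-bit integers."""
    return read_array(buf, offset, 3, read_int16_be)

# Six integers stored back to back (made-up data).
data = struct.pack(">6h", 1, 2, 3, 4, 5, 6)

# Outer array of two rows, each row being an inner array of three values.
grid, end = read_array(data, 0, 2, read_row)
print(grid)  # [[1, 2, 3], [4, 5, 6]]
```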

BinIfCondition Def

The If condition tag may be used for conditional structuring of files that contain variably formatted records, where the format depends on a value contained in a field at the beginning of the record. If the condition evaluates to true, the enclosed declaration list is used for the next part of the file; otherwise the declaration list is ignored. The expression must contain only the names of previously encountered bit or integer fields, integer constants, and operators.

(This tag is currently not supported by the ESML Library)
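The intended semantics can still be pictured with a small sketch. Below, each record starts with a one-byte field, and an extra field is read only when that field equals 1. The record layout and the field names (kind, extra, value) are hypothetical and chosen only to show the control flow; they do not come from the ESML schema.

```python
import struct

def read_record(buf, offset):
    """Read one variably formatted record; return (record, new offset)."""
    record = {}

    # Field at the beginning of the record that drives the condition.
    (record["kind"],) = struct.unpack_from(">B", buf, offset)
    offset += 1

    # Condition true: the enclosed declarations describe the next bytes.
    # Condition false: they are ignored and no bytes are consumed for them.
    if record["kind"] == 1:
        (record["extra"],) = struct.unpack_from(">h", buf, offset)
        offset += 2

    # Declarations after the conditional block apply to every record.
    (record["value"],) = struct.unpack_from(">h", buf, offset)
    offset += 2
    return record, offset

# Two records: the first carries the optional field, the second does not.
data = struct.pack(">Bhh", 1, 7, 100) + struct.pack(">Bh", 0, 200)

first, pos = read_record(data, 0)
second, pos = read_record(data, pos)
print(first)   # {'kind': 1, 'extra': 7, 'value': 100}
print(second)  # {'kind': 0, 'value': 200}
```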

ASCII Format

ASCII metadata describe text data that are encoded in ASCII characters (0x20 through 0x7E, plus assorted whitespace characters such as tab and newline).

AsciiStruct Def

This declaration allows any combination of the following types:

AsciiArray Def

Arrays provide grouping for lower-level elements. Multidimensional arrays can be declared by nesting these declarations.

DelimiterOptions

Options that can be used with the delimiter attribute in an array.

AsciiDatum Def

This declaration defines the atomic element that makes up the data structure. An atomic data definition must contain a format attribute. The name and occurs parameters are optional.

Ascii Format Param Def

Used in the format attribute of AsciiDatum.

formatparam ::= format="fixedtext formatval fixedtext"

fixedtext ::= any sequence of characters not including % | %% | empty

formatval ::= ctype | dtype | ftype | otype | stype | xtype

ctype ::= % width c

dtype ::= % width long d

ftype ::= % width places long f

otype ::= % width long o

stype ::= % width s | %[ charset ]

xtype ::= % width long x

width ::= number | empty

places ::= . number | empty

long ::= l | empty

charset ::= rangelist | ^ rangelist

rangelist ::= rangelist range | range

range ::= character - character | character

Every atomic ASCII data element declaration (field) must have a format specified. Spaces are shown in the formatval description above for clarity only; the components of a formatval must not be separated by spaces.

formatval is a subset of the formats available in C’s scanf function, which is a very powerful parsing engine.

Each formatval may begin and/or end with “fixed text”. When parsing data, the fixed text must match the next character(s) in the input stream exactly. The single exception to this rule is the space (blank) character: When used in “fixed text” it represents any sequence of leading (or trailing) whitespace characters such as blank, tab, and newline.

The percent character begins a format specification which terminates in one of the type selector characters. (To represent a percent character in “fixed text” it is necessary to double the character (%%).) The type selector characters determine the type and size of the data, as described below:

The c data type represents a single character embedded in a field of width characters. If there is more than one character in the field, the first is used and the rest are discarded. If width is omitted, a field width of one is used.

The d data type represents integer data in base ten. Allowable characters are +, -, and 0-9. If width is omitted, characters are consumed until a character not in the allowed character set is encountered. The long indicator (lower-case “ell”) may be used to indicate that the value exceeds 32 bits.

The f data type represents floating-point data. Allowable characters are +, -, ., e, E, and 0-9. If width is omitted, characters are consumed until a character not in the allowed character set is encountered. If places is specified and the actual data does not contain a decimal point, the decimal point is assumed to be present to the left of the rightmost places characters in the mantissa. The long indicator (lower-case “ell”) may be used to indicate that the value is double-precision.
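The implied decimal point rule is easy to misread, so here is a minimal sketch of one way to apply it. The function parse_f is hypothetical and is not part of the ESML library. It assumes the field text has already been cut out of the input, and it zero-pads mantissas that are shorter than places, a case the description above does not cover.

```python
def parse_f(text, places=None):
    """Convert an f-type field to float, applying the implied decimal point."""
    text = text.strip()
    # Split the mantissa from an optional exponent ("e" or "E").
    mantissa, sep, exponent = text.replace("E", "e").partition("e")

    if places and "." not in mantissa:
        sign = ""
        if mantissa.startswith(("+", "-")):
            sign, mantissa = mantissa[0], mantissa[1:]
        # Assumption: pad short mantissas with zeros so a digit always
        # remains to the left of the inserted decimal point.
        digits = mantissa.rjust(places + 1, "0")
        mantissa = sign + digits[:-places] + "." + digits[-places:]

    return float(mantissa + sep + exponent)

print(parse_f("12345", places=3))  # 12.345 (decimal point implied)
print(parse_f("12.5", places=3))   # 12.5 (explicit decimal point wins)
```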

The o data type represents integer data in base eight. Allowable characters are +, -, and 0-7. If width is omitted, characters are consumed until a character not in the allowed character set is encountered. The long indicator (lower-case “ell”) may be used to indicate that the value exceeds 32 bits.

The s data type represents string data. Two forms are provided: %s consumes all characters until width characters or a whitespace character is encountered, whichever comes first. %[] consumes all characters specified by the character range list within the brackets. If the character range list begins with a caret (^), the sense of the list is inverted. The unprintable characters newline and tab may be represented in the range list through the use of conventional C escape sequences (\n and \t, respectively).

The x data type represents integer data in base 16. Allowable characters are +, -, 0-9, a-f, and A-F. If width is omitted, characters are consumed until a character not in the allowed character set is encountered. The long indicator (lower-case “ell”) may be used to indicate that the value exceeds 32 bits.

Examples

format="Latitude = %d"

format="%27s"

format=" %9.3f degrees"

format="%.3f"

format="%9lf"

format="%x "

format="%13o"

format="%[a-zA-Z0-9_]"

This has been implemented as an XML pattern!
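To experiment with the example formats above, a format string can be translated into a regular expression. The sketch below is an approximation written for illustration only and is not how the ESML library parses data. The names TOKEN, ALLOWED, literal, and format_to_regex are hypothetical; places is recognized but not applied, and the character range list is passed to the regular expression engine unchanged, which assumes it contains no backslashes or brackets beyond \n and \t.

```python
import re

# One conversion: %%, %[charset], or % width places long type.
TOKEN = re.compile(r"%%|%\[(\^?[^\]]+)\]|%(\d*)(?:\.(\d+))?l?([cdfosx])")

# Characters each type selector may consume, per the descriptions above.
ALLOWED = {
    "c": r".",
    "d": r"[+\-0-9]",
    "f": r"[+\-.eE0-9]",
    "o": r"[+\-0-7]",
    "s": r"\S",
    "x": r"[+\-0-9a-fA-F]",
}

def literal(text):
    """Fixed text: a space stands for any run of whitespace, the rest is exact."""
    return "".join(r"\s*" if ch == " " else re.escape(ch) for ch in text)

def format_to_regex(fmt):
    """Build a compiled regex with one capture group per conversion."""
    out, pos = [], 0
    for m in TOKEN.finditer(fmt):
        out.append(literal(fmt[pos:m.start()]))
        pos = m.end()
        charset, width, _places, kind = m.groups()
        if m.group(0) == "%%":
            out.append("%")                      # doubled percent is a literal %
        elif charset is not None:
            out.append("([" + charset + "]+)")   # %[...] form of the s type
        else:
            if width:
                count = "{1," + width + "}"      # at most `width` characters
            elif kind == "c":
                count = ""                       # c defaults to a width of one
            else:
                count = "+"                      # run until a disallowed character
            out.append("(" + ALLOWED[kind] + count + ")")
    out.append(literal(fmt[pos:]))
    return re.compile("".join(out), re.DOTALL)

print(format_to_regex("Latitude = %d").match("Latitude = -42").group(1))
print(format_to_regex("%[a-zA-Z0-9_]").match("cloud_top 5").group(1))
```

The captured text is still a string; converting it to a number (including the base for the o and x types) would be a separate step.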

AscHeaderDef

This declaration is similar to the definition for fields but also contains an attribute for symbols that can be present in the headers.

HDF-EOS Format

Definitions for HDF-EOS data format

HdfEosDef

It is semantically equivalent to Swath, Grid or Point Objects within an HDF-EOS file

HdfEosFieldDef

It is semantically equivalent to Fields in an HDF-EOS file

GRIB Format

Definitions for GRIB data format

GribStructDef

Structure declaration to hold various fields

GribFieldDef

Defines individual fields in the GRIB file

Example ESML files

Simple Binary File

Simple ASCII File

Simple HDF-EOS File

Interleaved ASCII File
