


JORD

[pic] & [pic]

JORD PROTOTYPE TRIPLE-STORE & ENDPOINT

Implementation Report & Documentation

JORD (Joint Operational Reference Data) Project

enhancing the

PCA Reference Data Service (RDS) Operation

in partnership with FIATECH

|Rev |Date |Description |By |Check |
|V1 |23rd Nov 2011 |Initial contractor draft for technical review. |MRS |ISG |
|V1.1 |2nd Jan 2012 |First version circulated to JORD members. |MRS |ISG / RDC |
|V2 |18th Jan 2012 |JORD deliverable for project & external use. |ISG | |
|V2.1 |1st Feb 2012 |Index & numbering corrected. Memory leak malfunction resolved. |ISG | |

Executive Summary

ISO 15926, the standard for lifecycle integration and interoperability, is based on highly generic information modeling principles and has a high dependency on shared reference data. To maximize the flexibility and availability of reference data in ISO 15926 compliant forms across distributed business users, ISO 15926 adopts “Triples” as the most generic representation of all semantic content, where each element is represented by a URI (web address) resolvable by browsers and queryable through an “EndPoint”.

This document reports on the implementation of the Triple Store and Endpoint created as the JORD prototype for reference data publishing, to be supported by the enhanced PCA RDS being created in partnership with FIATECH.

The work was performed by DNV under their existing services contract with PCA, and the content of this report and documentation is taken in its entirety from DNV Report No. PP022471-13O34N0-2. The report constitutes the final delivery for JORD Phase 1 End-Point & Triple Store – Prototype. It documents the various system design choices made and includes a user guide for the two end-user applications.

The resultant triple-store & endpoint with user instructions are found here:



The PCA RDS support services are described here:



Acknowledgements

JORD Charter Member organizations contributed funding and direction to the project:

Full sponsors: BP, EPIM, RosEnergoAtom, Black & Veatch, CCC, Hatch and VNIIAES.

Supplementary subscribers: Woodside and Bechtel.

Table of Contents

1 INTRODUCTION
1.1 Abbreviations and Terms
2 REPORT
2.1 Overall System Layout
2.1.1 Note on 32 vs 64 bit
2.2 Hardware
2.3 HTTP Server (Apache)
2.3.1 Redirect to the wiki pages
2.3.2 Mounting servlets
2.3.3 Configuration
2.3.4 Technical Summary
2.4 Cygwin
2.5 Servlet Container (Apache Tomcat)
2.5.1 Technical Summary
2.6 Triple Store
2.6.1 RDF Graphs, and Graphs in the triple store
2.6.2 Loading the triple store
2.6.3 Updating the JORD triple store
2.6.4 Technical Summary
2.7 Joseki (SPARQL Endpoint)
2.8 The SPARQL HTML frontend
2.8.1 Queries
2.8.2 Technical Summary
2.9 Pubby (Linked Data Server)
2.9.1 The reason we need this software
2.9.2 How Pubby works
2.9.3 Type sensitive Pubby
2.10 Generating the RDS-WIP to PCA Map
3 CONCLUSIONS
3.1 Joseki memory leak
3.2 Next steps
4 REFERENCES
4.1 Software download locations
4.1.1 Pubby
4.1.1.1 Pubby configuration file
4.2 Apache HTTP server
4.3 Apache Tomcat
4.4 Cygwin
4.5 Joseki
4.6 TDB

APPENDIX 1 END POINT USER GUIDE

APPENDIX 2 LINKED DATA USER GUIDE

1 Introduction

This is a technical overview of the Phase 1 Prototype endpoint. It describes what software the prototype incorporates and how the different pieces of the system work together. The intended audience is project managers, technical staff and, to some degree, end users. As a result the text may be too verbose for some readers. To make the text easier to read, each component description includes a technical summary, intended to give a concise description of the nuts and bolts for the technical audience.

A number of third-party applications have been used in this project. These are all open source programs with well-written documentation, including installation and usage instructions. Duplicating those instructions in this document was deemed unnecessary. The configurations of the applications, however, are relevant and are included where applicable.

In the section concerning Pubby, the Linked Data servlet, some general notes concerning Linked Data have also been included.

For sections regarding software exposed to the end-user there is a usage description.

1.1 Abbreviations and Terms

• PCA – The POSC Caesar Association

• PCA RDL – The PCA Reference Data Library

• DNV – Det Norske Veritas, the company responsible for the project execution.

• The Phase 1 Prototype, the prototype – The entire solution described in this document, including the endpoint and triple stores.

• The prototype server – The server that holds the Phase 1 Prototype.

• LD – Linked Data

• URI – Uniform Resource Identifier

• URL – Uniform Resource Locator

• Sogndal server - The server that holds the PCA Wiki.

• Resolve, resolvable – If a URI can be entered in a web browser and returns a resource such as an HTML page, the URI is said to resolve or to be resolvable.

2 Report

2.1 Overall System Layout

[pic]

1 Overall system layout.

The various boxes outside the PCA & DNV and iRING Endpoint boxes will be described in further detail in this document.

2.1.1 Note on 32 vs 64 bit

It is generally accepted that 64 bit is better than 32 bit, although the exact advantages are sometimes hard to grasp. Instead of considering the pros and cons for each software choice, the project has chosen 64 bit over 32 bit unless there is a compelling reason to do otherwise.

2.2 Hardware

The server is a virtual server running 64 bit Windows Server 2008 Enterprise. It has the equivalent of three 2.27 GHz processors and 6 GB of RAM. A Windows platform was chosen because DNV uses Microsoft as its main software provider. However, all the software used is open source, binaries exist for many platforms, and it is also available on e.g. GNU/Linux. Apache was chosen over Internet Information Services (IIS, the Microsoft web server) partly to accommodate portability to other platforms.

Aside from the triple store and the particular Java programs, the prototype rests on a software environment built from the various open source components described below.

2.3 HTTP Server (Apache)

The Apache HTTP server is an open source HTTP server with wide support and use. It is sponsored by many large corporations, among them Google, Microsoft and IBM, and is said to be used by over half of all web servers on the internet.

Although the development server is a 64 bit server, the Apache installation is 32 bit. 32 bit was chosen because the mod_jk Apache module required for communication with Tomcat was not available as a 64 bit build. This HTTP server will not be used for any heavy lifting, and I believe there are no practical advantages of 64 bit over 32 bit here. Tomcat, on the other hand, will do heavy lifting, which is why the 64 bit version is installed.

It should also be noted that the version of Apache used is 2.0, not 2.2, which is the latest release. The reason is the same as for 32 bit vs 64 bit: we were unable to find certain modules pre-compiled for 2.2. While compiling them ourselves is certainly possible, we could not justify the overhead in development time this would cause.

The web server is used to serve all the content of the prototype, directly or indirectly. It is therefore in control of all access restrictions, and also of the redirection of requests to the correct Java servlets, such as the Joseki endpoint and the linked data pages.

It should be noted that the servlet container could have handled most, if not all, of the capabilities of the web server, as it contains its own built-in HTTP server. However, the infrastructure for delegating certain tasks to a dedicated web server was already in place on the development server, so these capabilities of Tomcat were not used.

2.3.1 Redirect to the wiki pages

The domain name is connected to the prototype server IP, while the domain name is connected to the Sogndal server, containing the PCA Wiki pages. Previously there was a so-called WWW redirect from to , ensuring that users would get to the wiki pages whenever they navigated to the . This redirection was previously handled by the domain registrar, but is now handled by the development server under Apache.

The redirect was achieved using a RedirectMatch directive, as can be seen from the configuration snippet below.

2.3.2 Mounting servlets

The servlet container is connected to the web server through an Apache module called mod_jk. This module forwards requests from Apache to Tomcat.

The JkMount lines in the configuration files tell the server to mount certain folders under  to the Jord Tomcat server.

2.3.3 Configuration

What follows is a snippet from the Apache httpd.conf configuration file. Note that relevant modules must be enabled in order for the directives to have effect. For instance: the mod_rewrite module must be enabled for rewriting to occur.

ServerName

DocumentRoot "D:/htdocs-posccaesar"

JkMount /endpoint* jord

JkMount /rdl* jord

RewriteEngine On

RedirectMatch ^/(wiki.*)$ $1 [R]

Options FollowSymLinks

AllowOverride None

RewriteEngine On

RedirectMatch ^/$ [R]

2.3.4 Technical Summary

For the prototype, the Apache server simply forwards requests to the Tomcat server and redirects requests intended for the PCA wiki. Aside from setting up this functionality, no configuration is necessary.

All servlets must be set up individually, which means that e.g. access restrictions can be applied to individual servlets. It also means that a restart of the Apache server is needed when new servlets are introduced. These servlets are mounted using the JkMount directive, as shown in the configuration snippet above.

Redirection to the PCA wiki pages is done with the RedirectMatch directive, also shown above.

2.4 Cygwin

The TDB software ships with a number of Bash scripts that require a UNIX environment. In addition, several helper scripts, also written in Bash, are used on the server to handle automated downloading of files and updating of the triple stores.

Cygwin provides a UNIX environment for Windows machines, and is installed on the test server. The installation is straightforward. In addition to Bash, two other GNU/Linux programs are used by the prototype.

• wget – A program for downloading content from the internet. In our case it’s the PCA RDL OWL file.

• unzip – A program to unzip zipped files. We use it to unzip the packaged PCA RDL OWL file.

These two programs are available from the Cygwin installation program, and the Cygwin versions were the ones used in the prototype.

The choice of installing Cygwin is really one of convenience. The prototype could have been designed without it, but it made a number of things slightly easier, not least because of prior experience with these programs and existing scripts written for other, similar projects.
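As an illustration, the fetch-and-unpack step performed with these two programs looks roughly like the following sketch, run from a Cygwin Bash prompt. The download URL and file name are placeholders, not the actual published locations:

# Fetch and unpack the zipped PCA RDL OWL file (placeholder URL and file name).
wget -N 'http://example.org/PCA-RDL.owl.zip'
unzip -o PCA-RDL.owl.zip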

2.5 Servlet Container (Apache Tomcat)

Apache Tomcat is the Apache Foundation's take on the Java Servlet and JavaServer Pages technologies. It is widely used, though not as widely as its web server counterpart.

While the Apache web server installed is 32 bit, the Tomcat server is 64 bit, as that makes TDB perform better. In particular, TDB is then able to push the burden of caching files between RAM and disk onto the operating system instead of using the JVM memory. Furthermore, the general consensus among the users and developers of TDB is that 64 bit is preferable to 32 bit.

2.5.1 Technical Summary

Apache Tomcat can be downloaded free of charge from Apache's web pages. Links and other resources are in the references section.

No particular setup is needed for Tomcat, although it might be advisable to increase the memory it is allowed to use. On Windows, the installation package comes with a Tomcat configuration tool that lets you easily set the required parameters (Initial Memory Pool and Maximum Memory Pool). To open it, go to the /bin directory of the Tomcat installation and run the following in a console:

tomcat7w.exe //ES//TomcatServiceName

where “TomcatServiceName” is the name of your Tomcat Windows service.
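On a non-Windows installation, or when Tomcat is started from its scripts rather than as a Windows service, the same memory parameters can be set in a setenv script, which Tomcat's startup scripts read if it exists. A minimal sketch; the heap sizes are purely illustrative and should be tuned to the server's RAM:

# $CATALINA_HOME/bin/setenv.sh -- picked up by catalina.sh at startup if present.
# Heap sizes below are examples only.
export CATALINA_OPTS="-Xms512m -Xmx4096m"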

2.6 Triple Store

The TDB triple store is a persistent storage layer for Jena, which in turn is a suite of libraries for building Semantic Web applications in Java. TDB differs from a conventional SQL server in that it does not require a running server process. When a new TDB triple store is created, all information is stored in a folder on disk. The triple store persists in that folder regardless of any running processes, and another program using the TDB library can later open the files and perform queries or update the triple store.

This is the typical scenario when a triple store is updated: one process performs the batch load into the triple store, and a query front-end opens the triple store for queries after the update procedure is done. We have not tested whether two Java processes can use the same TDB triple store at the same time. What we do know is that it is impossible to delete or replace a triple store while a process holds a lock on it. This leads to downtime for the triple store when the database is updated. We return to this downtime issue in 2.6.3.
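As an illustration of the embedded nature of TDB, the command line tools that ship with it can query the on-disk store directly, with no Joseki or Tomcat process running. A sketch; the store location follows the Joseki configuration shown later, and the query is an arbitrary example:

# Query the TDB store straight from its folder on disk -- no server process involved.
tdbquery --loc=D:/TDBdata/JORD "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"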

2.6.1 RDF Graphs, and Graphs in the triple store

Like most triple stores, TDB is able to handle multiple graphs. A graph is a set of RDF triples, much in the same way an RDF file is a set of RDF triples. In the prototype, each graph originates from a distinct file, though this is not required by TDB.

Each graph has a URI, used when you want to restrict a query to certain graphs. As with an RDF file, there is no connection between the namespaces of the resources in a graph and the name of the graph. The graph names are metadata in their purest form: required for the inner workings of the triple store, but without any intrinsic meaning. That being said, queries that target specific graphs in the endpoint need to have the graph name hard-coded, so a sensible naming scheme should be agreed on by JORD in future projects. In the prototype, if no graph is specified, the union of all graphs is queried. This is suitable for most queries, so most users will not have to care what the graph names are.

The “irm” bit of the graph names refers to IRM – Information Risk Management. IRM is the department in DNV responsible for the prototype project, and is also the department paying for the prototype server.

The graphs in the prototype triple store are:



holds the PCA RDL triples



holds the map between RDS WIP classes and PCA RDL classes.



holds the ISO 15926 Part 8 representation of the ISO 15926 Part 2 data model.
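To restrict a query to one of these graphs, the graph URI is hard-coded in a GRAPH clause. A sketch, with a placeholder graph name standing in for one of the actual graph URIs listed above:

# List a few triples from one specific named graph (placeholder graph URI).
tdbquery --loc=D:/TDBdata/JORD \
    "SELECT ?s ?p ?o WHERE { GRAPH <http://example.org/graph/rdl> { ?s ?p ?o } } LIMIT 10"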

2.6.2 Loading the triple store

There are several ways to populate a triple store.

1. Through a SPARQL Update query.

2. Through the Jena API in a Java program

3. With the scripts that come with TDB.

Because each update is likely to involve both removals from and additions to the triple store, it is easier to simply destroy the old triple store and replace it with a new one. This behaviour will naturally change if/when the triple store becomes the master; at that point updates will be done directly to the triple store.

The alternative is to compare the triples in the master OWL file with the existing triples in the triple store, and then add missing triples and delete redundant ones. DNV has not tried this before, as we have not found a compelling reason to do so. It would also require a non-negligible amount of research into new RDF technology and a complete rewrite of existing scripts. Lastly, it is not a given that the method would increase the availability of the triple store. The bottom line is that, while this approach is possible, it would require a lot of work for little benefit.

Instead, loading the triple store is done with the command line utility "tdbloader", which runs in a Cygwin environment. According to the TDB documentation, this is the fastest way to populate a TDB triple store. The command for loading e.g. the RDL.owl file into a named graph could be as follows:

tdbloader --loc=C:\TDB\JORD --graph="" \



Note that there is a mix of different path schemes in play. The documentation for TDB is not very good, and I have had some problems getting tdbloader to work in the past, particularly concerning which path scheme to use. This method works, so I have not investigated whether other path schemes also work.

An update run for the PCA RDL takes approximately 10 minutes. In the prototype, a temporary triple store is loaded while the live version is still online. The server is then taken down for approximately 10 seconds while the temporary triple store is swapped with the live one. All of this is handled by a script.

2.6.3 Updating the JORD triple store

Swiftload is a short script that handles various aspects of massaging and importing OWL files into a triple store, making the transaction as painless as possible. While the operations it helps with are not very difficult, they are often repeated, so it made sense to use a common method for all projects.

The script that updates the JORD triple store makes use of the Swiftload script. The process is as follows:

1. Download the PCA-RDL.owl.zip file from , and unzip it

2. Change the base namespace from to , using regular expression substitutions.

3. Load the three files into a temporary triple store.

4. Halt the Tomcat process in order to release the read-lock on the triple store.

5. Copy the temporary triple store to the actual triple store.

6. Restart the Tomcat process

In addition, the script logs the whole process.

The script requires no input from a user and is run each night as a scheduled task.
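For illustration, a stripped-down sketch of this update cycle is given below. It is not the actual Swiftload or JORD script: paths, URLs, graph names and the Windows service name are placeholders, and the real scripts also perform the logging and the exact namespace substitution described above.

#!/bin/bash
# Sketch of the nightly JORD update cycle (all paths, URLs and names are placeholders).
set -e

STAGING=/cygdrive/d/jord/staging                   # working folder (assumed)
TMP_STORE_WIN='D:\TDB\JORD-tmp'                    # temporary store, Windows-style path for tdbloader
TMP_STORE=/cygdrive/d/TDB/JORD-tmp                 # the same folder, Cygwin-style path for cp/rm
LIVE_STORE=/cygdrive/d/TDBdata/JORD                # live store read by Joseki
RDL_ZIP_URL='http://example.org/PCA-RDL.owl.zip'   # placeholder download location

# 1. Download the PCA RDL export and unzip it.
cd "$STAGING"
wget -N "$RDL_ZIP_URL"
unzip -o PCA-RDL.owl.zip

# 2. Change the base namespace (the real regular expression lives in the actual script).
sed -i 's|http://old.example.org/|http://new.example.org/|g' PCA-RDL.owl

# 3. Load the three files into a temporary triple store, one named graph per file.
rm -rf "$TMP_STORE" && mkdir -p "$TMP_STORE"
tdbloader --loc="$TMP_STORE_WIN" --graph='http://example.org/graph/rdl'        PCA-RDL.owl
tdbloader --loc="$TMP_STORE_WIN" --graph='http://example.org/graph/rdswip-map' rdswip-map.owl
tdbloader --loc="$TMP_STORE_WIN" --graph='http://example.org/graph/part8'      part8-datamodel.owl

# 4-6. Stop Tomcat to release its lock, swap in the new store, and restart Tomcat.
net stop "Tomcat7"            # the Windows service name is an assumption
rm -rf "$LIVE_STORE"
cp -r "$TMP_STORE" "$LIVE_STORE"
net start "Tomcat7"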

2.6.4 Technical Summary

TDB requires no installation as such: if you make sure that all the binaries are on the classpath of the executing application, you are set to go.

How to load the triple store with triples has been elaborated on above; the important technical part is the command shown in 2.6.2. The only things ever loaded into the prototype triple store are OWL files. As long as you provide a location (a folder on disk) and an OWL file to the tdbloader script, you will create a triple store. The graph argument is optional; omitting it will load the file into the default graph.

2.7 Joseki (SPARQL Endpoint)

The endpoint used is an unmodified Joseki servlet, as can be found on . Joseki is able to connect to the TDB Triple Store and perform queries using an enhanced version of SPARQL through the ARQ library. Both Joseki and ARQ are components of the aforementioned Jena library.

Configuration and Connection to the triple store

The configuration files for Joseki are thoroughly documented and provide several examples. Instead of reproducing the whole configuration file, only the segments pertaining to the triple store are included.

rdf:type joseki:Service ;

rdfs:label "SPARQL" ;

joseki:serviceRef "sparql" ; # web.xml must route this name to Joseki

# dataset part

joseki:dataset ;

# Service part.

# This processor will not allow either the protocol,

# nor the query, to specify the dataset.

joseki:processor joseki:ProcessorSPARQL;

.

[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" . ## Initialize TDB.

tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .

tdb:GraphTDB rdfs:subClassOf ja:Model .

# Triple store

a tdb:DatasetTDB ; tdb:location "D:\\TDBdata\\JORD" ;

#makes the default graph a union of all graphs on the endpoint

tdb:unionDefaultGraph true ;

.

2.8 The SPARQL HTML frontend

In the Joseki package there is also a web page frontend to compose SPARQL queries. A query to a SPARQL endpoint is an http GET request with a parameter “query” containing the query. For example:

*+%7B%3Fsubject+%3Fpredicate+%3Fobject+.%7D+LIMIT+10

The only thing the HTML frontend needs to do is take a query formatted as text and submit it to the endpoint. The endpoint servlet then takes care of the rest.
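This means any HTTP client can issue queries. As an illustration, the same kind of request can be made with curl, which URL-encodes the query for you; the endpoint address below is a placeholder:

# Send a SPARQL query to the endpoint as an HTTP GET request (placeholder endpoint URL).
curl --get 'http://example.org/endpoint/sparql' \
     --data-urlencode 'query=SELECT * WHERE { ?subject ?predicate ?object . } LIMIT 10'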

While the Joseki frontend is adequate, it is not very forgiving for the novice user. In order to remedy this, we rewrote the frontend and added the possibility to store and retrieve pre-written queries on the server. This was achieved using JavaScript. For simplicity and reusability, this piece of code was moved to its own file, so it can be used by other projects as well. However, it was originally conceived as a feature for a single page, and contains many adaptations to fit its use case exactly. Any subsequent use in other projects may work, but will likely require individual adaptations.

The JavaScript program, christened QueryHelper, reads a JSON formatted file with a single variable containing all the queries as a hash map. The format of the JSON file is pretty straightforward, but may be hard on the eyes if you are new to JSON. An example of a query definition is included below:

'classByDesignation' : {
    name: 'Class by class designation',
    query: 'select ?class ?classDesignation { \n \
?class RDL:hasDesignation ?classDesignation . \n \
FILTER (fn:contains(?classDesignation, fn:upper-case($designation))) \n \
} \n \
ORDER BY ?classDesignation',
    inputFields:
    [
        {
            name: 'String contained in designation',
            varName: 'designation',
            isString: 'true',
            description: 'A string contained in the designation of the class you are after'
        }
    ]
}

A query has three parameters:

1. name: The query name

2. query: The query itself. Note that \n \ has to be placed at the end of each line, in order to add a newline to the query and to tell the JavaScript parser that the string continues on the next line, respectively.

3. inputFields (Optional): The last parameter is an array of input fields. These are variables the user can substitute in the query. An input field has four parameters

1. name: The name of the input field shown to the user

2. varName: The name of the variable to be substituted. In the query this variable must be prefixed with a dollar sign ($). The script will replace any and all occurrences of the variable with the value the user provides.

3. isString: Tells the parser whether the variable is a string or a URL. Strings will be enclosed in apostrophes, URLs in angle brackets (<>).

4. description: A description of what the variable is intended for.

Upon page load, the QueryHelper script draws a dropdown box of all the queries in the query file. When a query is selected, it creates input fields for all the variables of the selected query, and adds names and descriptions to these boxes. If the query has no input fields, the query field is filled in immediately. If there are variables, the query field is filled in as the values are typed into their respective input boxes.

2.8.1 Queries

The queries on the endpoint as of this writing are the following:

• Query to find a class by its PCA ID

• Class and subclass by class designation

Finds all classes that contain the given designation, and all their subclasses.

• Class by class designation

• A test query.

Finds the first 10 triples on the endpoint

• Find classes with a specific "R" ID

2.8.2 Technical Summary

The front-end page is a simple HTML page with a fairly simple JavaScript component. Aside from ensuring that the right endpoint receives the query, and adding new pre-written queries, there is not much to configure.

2.9 Pubby (Linked Data Server)

Pubby is an open source web servlet which takes care of a key element in the Linked Data method of publishing data: dereferencing URIs. It is written by Richard Cyganiak, who is actively involved in the W3C Linked Data initiative.

2.9.1 The reason we need this software

We want our URIs to be dereferenceable, because we want people who stumble over a class identifier, such as (PUMP), to be able to find out what it is. Moreover, we want to claim the right to dictate what the resource is supposed to mean. JORD, through PCA, owns the domain name , and therefore PCA has the sole right to define resources on that domain.

But the resource in question is a class, an abstraction of PUMP. Because it is abstract, nothing we produce can “be” PUMP. Instead we choose to present a description of PUMP. What amounts to a good description is a matter of preference: the average human user wants a nicely readable web page, while a computer program wants data in a parseable format. Pubby uses content negotiation to discern what the requester wants, and redirects the request to the proper servlet. The two sub-servlets used in Pubby are /data/ and /page/, serving raw RDF and human-readable HTML, respectively.

The end user does not have to be concerned with this. By default a web browser will ask for text or HTML, so a human-readable page will be the default for all non-technical users. Should the user require the raw RDF, there are links at the bottom of the pages to various RDF formats for the current resource.
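From the client side, the content negotiation looks roughly like this; the resource URI is a placeholder, and the -L flag makes curl follow Pubby's redirect to the /page/ or /data/ sub-servlet:

# Browser-style request: Pubby redirects to the human-readable /page/ servlet.
curl -L -H 'Accept: text/html' 'http://example.org/rdl/SomeClass'

# RDF request: Pubby redirects to the /data/ servlet and returns raw RDF/XML.
curl -L -H 'Accept: application/rdf+xml' 'http://example.org/rdl/SomeClass'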

2.9.2 How Pubby works

Every HTTP request contains headers; these tell the web server which browser you are using, what your IP address is, and a number of other things, among them what type of content the client expects. A normal web browser, e.g. Internet Explorer, expects HTML or plain text. An LD browser would expect RDF.

Upon receiving a request, Pubby does three things:

1. Find out whether it should return RDF or HTML response, and redirect to the correct Pubby servlet accordingly

2. Retrieve data for the relevant resource

3. If an HTML response is required, display these data in a human readable way.

Pubby relies on a SPARQL endpoint as an external resource. This endpoint may have any address, as long as it is reachable from the server Pubby is installed on. As it happens, the SPARQL server and the Pubby server are one and the same in the prototype, but this is not required.

[pic]

2 Diagram showing how requests are retrieved and forwarded to the endpoint.

The vanilla version of Pubby performs a so-called DESCRIBE query on the endpoint. This query is poorly defined in the W3C standard, but the common interpretation is to return an RDF graph containing all triples where the resource in question is the subject. This is equivalent to the following CONSTRUCT query, where $subject stands for the resource being described:

CONSTRUCT { $subject ?p ?q } WHERE { $subject ?p ?q }

DNV thought this was too narrow a query in many instances, as it leaves out important information such as subclasses and resources linked to the subject, among other things. We have therefore modified the Pubby code to allow for type-sensitive queries.

2.9.3 Type sensitive Pubby

By “type-sensitive” we mean that Pubby is able to refine its query based on what type of resource it is giving information about. For instance, this makes it possible to get all members of a class if and only if the resource is a class, and get geolocation information if a resource contains such information.

For the JORD case we are interested in only one type, the type of the PCA RDL item; that is, all resources with an ISO 15926 Part 2 type. We still require more than the basic DESCRIBE query, however, as much of the information in the triple store is in reified statements[1]. Where a subclass relation in OWL would read:

:subclass owl:subClassOf :superclass

in the reified ISO 15926 RDF format it reads:

:cls p2:hasSuperclass :superclass .

:cls p2:hasSubclass :subclass .

Note that neither the subclass nor the superclass is a subject in these triples, so a normal DESCRIBE query would not catch this information from any angle.

The modified DESCRIBE query we are asking instead is the following, where $subject is the current resource.

DESCRIBE $subject ?hasObject ?R ?cls2 ?cls

?cls3 ?cls1 ?subclass ?classifier ?classified ?superclass

WHERE {

$subject ?objProp ?hasObject .

OPTIONAL { # R numbers

?R owl:sameAs $subject .

}

OPTIONAL { # classifiers

?cls p2:hasClassifier ?classifier .

?cls p2:hasClassified $subject .

}

OPTIONAL { # members

?cls1 p2:hasClassifier $subject .

?cls1 p2:hasClassified ?classified .

}

OPTIONAL { # superclasses

?cls2 p2:hasSubclass $subject .

?cls2 p2:hasSuperclass ?superclass .

}

OPTIONAL { # subclasses

?cls3 p2:hasSuperclass $subject .

?cls3 p2:hasSubclass ?subclass .

}

}

The resulting RDF graph is significantly larger. You can take a look at the PUMP description in N3 format here:

2.10 Generating the RDS-WIP to PCA Map

A script to create the RDS-WIP map was written during the project. This was technically not part of the delivery, and the script will probably need a proper examination before it is used on a regular basis. It was written in Bash but makes use of a number of other programs.

The script works by repeatedly asking for batches of answers to the following query, posed to the iRING endpoint at

PREFIX rds:

SELECT ?rNumber ?rdsId

WHERE

{ ?rNumber rds:hasIdPCA ?rdsId }

ORDER BY ?rNumber

OFFSET ${offset}

LIMIT ${query_limit}

This query returns a table with two columns in XML format: the IDS-ADI ID and the PCA ID. The query limit decides how many answers we want for each query, and the offset is incremented for each query. We query incrementally so as not to overload the RDS-WIP server.

In each iteration the returned XML is parsed so that each row of the table is turned into a statement of the form:

owl:sameAs .

The line is then appended to the final file, and this is repeated for all rows, and for all query sets.

We end up with an OWL file in Turtle (TTL) format, consisting of a header with the required PREFIX statements and one line for each R number that has a PCA ID. The file is then placed in the same folder as the PCA library file and is imported alongside it.
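A condensed sketch of the loop is shown below. The endpoint URL, prefix URI, page size and output file are placeholders, and the XML-to-Turtle conversion in the real script is considerably more careful than the rough text processing used here:

#!/bin/bash
# Sketch of the paging loop that builds the RDS-WIP to PCA map (placeholder URL and names).
set -e

ENDPOINT='http://example.org/rdswip/sparql'   # placeholder for the iRING endpoint address
LIMIT=1000                                    # answers requested per query
OFFSET=0
OUT=rdswip-map.ttl

# Header with the prefix needed by the owl:sameAs lines.
printf '@prefix owl: <http://www.w3.org/2002/07/owl#> .\n\n' > "$OUT"

while : ; do
    QUERY="PREFIX rds: <http://example.org/rds#>
           SELECT ?rNumber ?rdsId
           WHERE { ?rNumber rds:hasIdPCA ?rdsId }
           ORDER BY ?rNumber OFFSET ${OFFSET} LIMIT ${LIMIT}"

    # Fetch one page of results in SPARQL XML format.
    RESULT=$(curl -sG "$ENDPOINT" --data-urlencode "query=${QUERY}")

    # Stop when a page contains no result rows.
    echo "$RESULT" | grep -q '<result>' || break

    # Rough XML-to-Turtle conversion: pair each rNumber with its PCA id and emit owl:sameAs.
    echo "$RESULT" \
      | grep -o '<uri>[^<]*</uri>\|<literal>[^<]*</literal>' \
      | sed 's/<[^>]*>//g' \
      | paste - - \
      | awk '{ printf "<%s> owl:sameAs <http://example.org/rdl/%s> .\n", $1, $2 }' >> "$OUT"

    OFFSET=$((OFFSET + LIMIT))
done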

3 Conclusions

The solution created in the prototype project satisfies the agreement between DNV and PCA. It is now possible to use SPARQL to query for PCA RDL items, and it is possible for RDS WIP users to use R# to query the same data.


3.1 Joseki memory leak

As the usage of the endpoint has increased, some weaknesses previously unknown to us have been exposed in the Joseki software. Apparently a memory leak causes the program to malfunction after an unknown number of queries. DNV is investigating how this can be remedied. The likely solution is to replace the offending software with different software with the same specifications. [Note, 18 Jan 2012: this problem appears to have been fixed by correctly setting the memory allocation; general use, evaluation and stress testing since then has not been able to reproduce the memory leak malfunction.]

3.2 Next steps

The next step according to the JORD Phase F document is to evaluate the prototype and gather feedback from the users. To ensure that the next, production-quality endpoint from JORD will be as complete as possible, the feedback from the users should be used to create a set of use cases. These should give requirements for topics such as:

• Metadata and metadata management

• Uptime requirements

• Expected query complexity

• Expected concurrent users / users per day

• Procedures for RDL item creation

• Procedures for RDL item migration to (and from) the JORD repository

• Possibly: review the OWL ontology used as a basis for RDF serialization of the PCA RDL.

• Possibly: have the PCA RDL in multiple RDF serializations.

The endpoint now established is a prototype. This gives JORD the advantage of having a system it can modify and improve easily, without the large overhead associated with modifying production systems. That being said, these improvements should probably be small tweaks rather than extensive modifications until the use cases mentioned above are established.

By the same reasoning, the prototype endpoint should probably not be moved to a different server any time soon.

4 References


4.1 Software download locations

4.1.1 Pubby

Download and installation information.



4.1.1.1 Pubby configuration file

There is a lengthy discussion of what the different parts of the configuration file mean on the page linked above.

The Pubby configuration file used in the project:

a conf:Configuration;

conf:projectName "JORD";

conf:projectHomepage ;

conf:webBase ;

conf:usePrefixesFrom ;

conf:defaultLanguage "en";

conf:dataset [

conf:sparqlEndpoint ;

conf:datasetBase ;

];

.

Remember to also include relevant namespace declarations.

4.2 Apache HTTP server

Project home page: The prototype uses version 2.0.

4.3 Apache Tomcat

Project homepage: . The prototype uses version 7.0.21

4.4 Cygwin

Cygwin can be downloaded from here: .

It is installed by downloading the “setup.exe” file and running it. During this installation you can opt to include specific GNU/Linux programs, such as wget and unzip.

4.5 Joseki

Joseki can be downloaded from:

4.6 TDB

TDB can be downloaded from

Appendix 1 End Point User Guide

The endpoint is located at , and has an HTML front-end at . The casual user will likely run into this page first, and much of what follows is taken from that page.

At the endpoint front end you may select a query by hovering over the query list and left-clicking on a query. Alternatively you can write your own. The pre-written queries also include a list of relevant namespaces, so the "test query without arguments" is a good starting point even if you want to write your own query from scratch.

When the query is ready, press the “Get Results” button.

You may get the result in several different formats. If you choose XML (the default), an XSLT transformation will transform the result into readable HTML. The result is still XML, so a "view source" on the result page will give you the results in raw XML. The XSLT transformation also turns URIs into HTML links. In most cases you can click on these links and be taken to a page with a description of the selected resource.

Appendix 2 Linked Data User Guide

When you click a resource with a URI starting with , you will be taken to a Linked Data (LD) page. This is a description of the resource behind the URI you just clicked. While it is rendered in human-readable HTML, what lies behind it is a small excerpt from the triple store describing the resource in RDF terms. The designation of the RDL item is in the header of the page, and can also be found in the “Properties” group.

To view this small RDF graph, you can either click the triple icon at the top right of the page, or click "as N3" or "as RDF/XML" at the bottom of the page for N3 or XML views.

The information presented on the LD pages is grouped as follows:

• Equivalent RDS-WIP class ID

This is just what the label says, the corresponding R number for this class in the RDS-WIP repository. The R number is clickable, and will take you to the RDS-WIP endpoint.

• Types

This shows the entity type of the RDL item you are viewing. These are always ISO 15926-2 entity types

• Properties

This group lists all the RDF properties of the URI. RDF properties in this context are metadata for the RDL item. Among other things, it includes the designation and the definition of the item you are viewing.

• Superclasses

Here you can find all classes that are in a SPECIALIZATION relationship with the current RDL item on the “has subclass” side.

• Subclasses

Here you can find all classes that are in a SPECIALIZATION relationship with the current RDL item on the “has superclass” side.

• Classifiers

Here you can find all classes that classify this RDL item. The classes listed are in a CLASSIFICATION relationship with the current RDL item on the “has classified” side.

• Members

Here you can find all members of this class. The members listed are in a CLASSIFICATION relationship with the current RDL item on the “has classifier” side.

All links to resources on the endpoint are clickable and will take you to a description of the resource that you clicked. It is worth noting, however, that not all links refer to resources on the JORD endpoint. In these cases the URIs will resolve most of the time, but this cannot be guaranteed, as they are handled by servers belonging to other projects or companies.

-----------------------

[1] The whole reason why the ISO 15926 RDF format uses reified statements is too complicated to include in this document. The short version is that in OWL the lowest level of relational operator is “property”. A property is similar to an ISO 15926 class of relationship. The type “relationship” simply does not exist in RDF; to mimic it, we need to use regular individuals as “relationship” individuals. The two ends of a relationship then become RDF properties.
