


Enterprise Metadata Workshop

Meeting Minutes

May 12 - 13, 2009

Participants:

Fox sets the stage:

Fox Responded to Questions:

Session 1: Getting Going: Setting the Stage for Success through Familiarization

NCDC: Nancy, Chris & Jeff

NCDDC: Rost & Stacy

NGDC: Ted Habermann

NODC: Ken Casey, Don Collins

Closing Session Remarks

Session 2: Moving Ahead through Commonalities: Defining the Enterprise

NCDC:

NCDDC:

Fox, General Observations:

Identifier/Controlled Vocabulary issues – how to tackle this.

Commonalities vs. complementary aspects to our operations.

Enterprise Elements: Identification of common metadata needs

Review of Yesterday/Preliminary Thoughts

Session 3: Communicating for success: Crafting an Architecture of Shared Components

Open Discussion about Tool Assessment and Catalogs.

MERMAid (MM)/GeoNetwork (GN) Comparison w/in Enterprise Function Categories

Review and Implementation Plan: Where to go from here?

Session 4: Setting out to Succeed: Elements of an implementation plan

NEAR TERM Brainstorm:

Intermediate Term Brainstorm – 6 to 9 months or early FY10:

Framework of an Implementation Plan

Responses:

Day 1: May 12, 2009

Participants:

NCDC:

- Jeff Arnfield, Data Processing, Soon to be Product Branch.

- Nancy Ritchey, Archive Branch Chief

- Phil Jones, Archive Branch

- Chris Fenimore, Metadata Specialist

NODC:

- Ken Casey, Technical Director

- Don Collins, Metadata, Cataloging Specialist.

- Rost Parsons (NCDDC), Chief Scientist

- Stacy Ladnier (NCDDC), Senior Software Engineer

NGDC:

- Ted Habermann, Metadata, GIS specialist

- Anna Milan, Metadata Specialist, CLASS support

- Rich Fozzard, Supports NMMR

- Kelly Stroker, Data Manager

Facilitator: Chris Fox, Director of NGDC

Note taker: Dan Kowal, Data Administrator, NGDC.

Note: In the notes, people are identified by first name, except for Chris Fox, who is identified by last name.

8:00 – 8:30 Visitors arrive at NGDC

8:30 – 8:45 Welcome, Introductions, Logistics Chris Fox

8:45 – 9:00 Review of Agenda, Goals and Objectives for the Workshop Chris Fox

Fox sets the stage:

This meeting was precipitated by the Climate Service transition, the move to ISO, and US GEO activities (such as putting up different layers from different agencies to address societal issues). Data center directors have met to see what they can do to improve the data centers. They have asked the question: “How can we be more integrated?” Metadata is the first topic on the list to examine. They are looking to this group to give them a way forward, making good use of resources. One objective is to provide things to the public in a consistent manner.

Fox reviewed the 3 goals.

1) Development of a common/interoperable metadata catalog for the data centers (and extensible to the rest of NOAA)

2) Identification and development of a limited set of tools to manage metadata within the data centers

3) Development of a plan to migrate from the FGDC standard to ISO standard.

Ultimate goal: Come out with a road map for how to get there.

Fox Responded to Questions:

What is the expected time-line for this “Data Center Metadata Enterprise” project?

- No idea. For all of NOAA, very long. For Data Centers (and CLASS) shorter, but nothing defined yet.

What are some of the significant milestones that NESDIS and or the Data Center Directors expect?

- Don’t know. Have to know where we’re going first.

Expectations for available funding/resources for project management AND implementation?

- Directors willing to put resources out there for meeting.

- CDR (Climate Data Records) development funding might be a pot of money to tap.

- Take it out of hide possibly…

- If there’s another team member you need, let them know.

Future face-to-face meetings to keep on track?

- The directors will take direction from this group as to what’s needed.

- Let directors know what makes sense, but try to meet virtually when possible.

Session 1: Getting Going: Setting the Stage for Success through Familiarization

9:00 – 10:00 Data Center Metadata Activities

“The overviews should cover the state of metadata management at each center, the tools used, and any plans for transitioning from FGDC to ISO standards. We can add some other overview topics if the group would like.”

Note: The following notes are not intended to reproduce the entire Power Point Presentations, but only capture key points.

NCDC: Nancy, Chris & Jeff

Presentation (Boulder-Workshop-NCDC-Metadata_08May2009_PJedits.pptx): NCDC Metadata Tools, Techniques, and Tribulations

Introduction:

NCDC Expectations

o Determine where an Enterprise Solution can assist us.

o Determine the gaps.

- Collection Metadata Overview (FGDC, ESRI Profile included – allows for OGC services with hooks into the GPT)

- Resources that access NMMR’s metadata:

o Dataset documentation

o CDO – If there is good information, you can get something from it; not available now.

o OLS

NMMR and Other Issues

o Resources that link back to the NMMR.

▪ jOAI – Open Archives Initiative.

▪ GOSIC – has info on observing systems, but it’s not an NMMR record.

▪ GPT – Geoportal toolkit (ESRI).

• GOAL: refresh a centralized repository that can be harvested to GPT.

▪ Tools that users can use to query the metadata.

o 470 metadata records. 75 – 80 % of all collections.

o A lot of the older datasets do not have records.

o Lots of disparate systems.

o Need to get a handle on our inventory and compare it with the metadata records that exist.

Tool Overview:

JOAI

o Developed by DLESE

o Over 7100 DIF records

o DIF used by Paleoclimate Group. (A harvesting sketch follows this tool overview.)

GOSIC

o Windows server w/ MS ACCESS.

o GOSIC only maintains a link to GCMD

o Access to 300 global datasets.

o Implemented these tools to serve specific communities – originally to serve Univ. of Delaware.

o Working with Scott Ritz at GCMD, a hard process as it’s all manual.

GeoPortal Toolkit (GPT).

o Oracle Spatial.

o 30 FGDC-RSE records (also in NMMR)

o Supports 19115 and 19119.

o ESRI product that houses OGC-compliant metadata records. Records are manually imported into the GPT. You can harvest from the NMMR, but security issues preclude this activity. Supports NIDIS and NCS, the guts of the Climate Portal.

o History: FGDC asked ESRI to develop GOS – the outcome was a productized toolkit.
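Since jOAI exposes its 7100+ DIF records over OAI-PMH, any harvester can page through them with ListRecords requests and resumption tokens. The sketch below illustrates that protocol flow in Python; the base URL is hypothetical, and the `dif` metadataPrefix is assumed from the DIF records noted above.

```python
# Hedged sketch of an OAI-PMH harvest from a jOAI-style provider.
# BASE_URL is a placeholder, not NCDC's actual endpoint.
import requests
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "https://example.noaa.gov/oai/provider"  # hypothetical endpoint

def harvest(metadata_prefix="dif"):
    """Yield <record> elements, following resumption tokens across pages."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(requests.get(BASE_URL, params=params).content)
        for record in root.iter(OAI_NS + "record"):
            yield record
        token = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # no more pages
        # Per OAI-PMH, a resumptionToken request carries only the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

for rec in harvest():
    header = rec.find(OAI_NS + "header")
    print(header.findtext(OAI_NS + "identifier"))
```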

GPT Follow-up Discussion:

- Tool serves specific customers.

- Tightly integrated with metadata – uses ArcCatalog to manage.

- Version 9.3. (will increase holdings from 30 to 100)

- CPC will host layers of information.

- The only archival data will be NCDC's. The tool will only manage metadata from NWS and other entities.

- Manual process: develop the metadata individually and then export into different applications. [General issue for all tools.]

- Goal is to have one metadata record in one repository.

- Using this tool for the portal aspect?

- ISO: There’s no conversion: import ISO in, get ISO out.

- Fox: being able to discover services is an important goal

o 19119 is service metadata and GPT has the means to do this.

High-level view of the Metadata Flow.

- Ideally received metadata from data provider.

o For NCS, Phil/Chris developed a document that gets filled out and translated to XML; right now it’s just for data sets at NCDC.

o GPT, NMMR publishes to GCMD and GOS.

o Trying to decide if they want NCS in the NMMR. Create a Node for that.

o Archive Documents refer to Submission Agreements; administrative metadata.

o Users access the 3-4 toolkits.

Station History Metadata:

Presentation: (Boulder-Workshop-NCDC-Metadata_JDA.ppt)

- Metadata Intro.

o Metadata usually meant Station History.

o They don’t often get ancillary metadata like station histories.

o Time element for collecting info; not an appealing activity, so hard to gain interest.

o Standards issues.

o Identifier issues.

- Venn diagram of where metadata exists:

o Observing systems

o Station histories

o Inventories

o Granules metadata.

o Collection: datasets and products.

o Standards.

▪ Standards do not address all areas; variations in how standards are implemented.

- ISO. Jeff discussed some recent meetings about how to use ISO for station histories.

o Fox informed group about IEEE standards to ISO TC211, Geographic information/Geomatics. A special committee is looking at the evolution of ISO. If there’s something that this group needs to convey to Fox about ISO, please pass it on.

- Station Situation

o Importance of surface networks doing cal/val for satellites.

o Change history important.

o Can determine fitness of a station for observing particular phenomena based on its descriptions.

o Station history retention issues.

o Agreement on what a station is.

o Station IDs not managed consistently. Variety of IDs used.

- MMS description. (Multi-network Metadata System.)

o Allow users to enter info

o Do QC

o Standardized ingest into a central archive.

o Provide a single point of access.

o If everyone uses a different abstraction, it gets hard to manage.

o Sources and Features.

o Carved things up into subject areas.

o NWS story. Oklahoma Mesonet. Who owns the entity?

o Architecture and function overview.

o 45,000 stations at present.

o Screen capture overview:

▪ Mapping

▪ Drill down features.

- ISIS.

o Nobody knows what the requirements are.

o No abstraction at all.

o Define what you want to track.

o High level workflow. – CRN example

▪ Colloquial terminology.

▪ Bring in additional networks.

o Schema Overview.

o NDBC/DART Stations – NGDC tried to do it, but NDBC decided to do their own thing.

Questions:

Fox: How would ISIS be represented as a record in the NMMR?

Jeff: Summary of day, a logic that makes sense to the user. 20k records in the NMMR if done.

MMS is not in FGDC as it’s treated as granular metadata.

NCDDC: Rost & Stacy

Presentation (Metadata_Task_Team_final.ppt): Overview of Metadata Management

Overview

o More of focus towards ecosystems

o Geospatially based.

Perspectives

o External Metadata Services

▪ Ensure quality.

▪ Feed to other repositories, like NODC.

o Long-term Stewardship

o Slightly different mission and focus than the other data centers.

Requirements

o Deal with metadata.

o Administrative Orders.

o Ecosystem Goal Team directives.

o Directives from the 5 State Alliance for Gulf of Mexico

o External User Community Reqs.

Metadata Activities

o Trainings – 42 sessions. (FGDC, Tools – MerMaid and ESRI)

▪ Fisheries, Sanctuaries, NOS, SE Regional Team, States, Universities

▪ If they are trained, they will provide quality metadata.

o Coordinate with data providers. (NOS, FL Wild Community.)

▪ Data Provider might have metadata, non-standard metadata, or nothing at all; NCDDC then transitions it into validated metadata records.

o Automated Processing for:

▪ West Coast Obs. – Sanctuaries.

▪ Okeanos Explorer. (OE)

o Management Provisions for:

▪ Gulf Geospatial Assessment – Marine Ecosystems.

▪ Providing tools to users

MERMAid

o 500 user accts.

▪ NOAA: 150 users.

▪ Private individuals, labs, universities

▪ What is the scope of metadata management: Mandate is geospatial coastal focus – 99%

• CSC relationship: very good; pass metadata to them. Kim Owens and NOS Explorer.

o Workflow w/in MERMAid:

▪ Account Management.

▪ Ingest

▪ Validation:

• Automated metadata through XSLT (not w/in MERMAid)

▪ Database is more of a working database

▪ Approval can be from an external entity to NCDDC.

o Publishing (only a few people can do this)

▪ File Repository

▪ Distribution:

▪ MARC record to NOAA Library

▪ Data Center Repositories

▪ IMS Server

▪ Fisheries

▪ PHINS

▪ NMFS

▪ Record Updates. Can remove records and republish.

▪ Reviewed different output types.

▪ There is information in a database that never goes out in a metadata record.

▪ A record to GOS is different than what goes to a Client’s repository.

▪ Biological metadata can be a hurdle.

▪ Affiliations. Can go back to different publishing points.

▪ They manage users but not content.

o SOA (Service Oriented Architecture) Overview

▪ Top boxes are client apps.

• C-SIDE (Aggregation for hazards community)

• Keep web content up to date

• Volunteers can access for samples

▪ Each box relies on a common set of services.

▪ Enterprise Service Bus – info Broker (ESB).

▪ Security Layer in between.

▪ Client apps can be packaged up.

o Process:

▪ OER CIMS – Cruise management systems. OE example; tags locations into the metadata records, instruments.

• Bathy data collection.

• ESB takes this and pre-populates the CIMS.

• User can log into it and add additional info.

• CIMS passes it to ESB.

• Chain of Events: When complete, publish to XML and send it to ESB -> XSLT Service -> FGDC compliant record.

• User can publish it.

• Then published to clearinghouses.

• Who is the manager of this data once it goes out the door? They give PIs accounts to manage it.

• Metadata record:

o Cruise Collection level

o Multimedia. (not sure what this relates to.)

• Metadata comes separately from the data.

• Double publishing is occurring in GOS: MERMAid and NGDC publishing the same record - who is listed as the distributor?

▪ West Coast Obs. System

• Chain of Events: Client sends info to ESB -> XSLT -> FGDC -> ASCII data to NetCDF -> GOS, GCMETADATA, NBII, PHINS, NOS DE; NODC can grab it from an FTP Server.

o Metadata Functions.

▪ List of enterprise services

▪ List of Resources

• Schemas

• Schematrons

• XSL Library

• XPath 2.0

▪ Manage Vocabulary Function and Services Used.

▪ NCDDC Metadata Destinations.

▪ Publish Function Example.

• Step through the services that draw upon schemas and schematrons: records are presented in EML format, then go through a workflow process that calls the publishing service; the contact or vocabulary management service is pulled for a layer ID to map to a shapefile stored at one of their IMS sites; XSLT does the conversion to FGDC to send to GOS; the IMS affiliation goes to their IMS services.

▪ Schematrons come into play when the schemas cannot be used to validate a record. Allows for further validation. Conditional examples. Offers rules. Schemas are content checking, but schematrons offer a rule set that checks semantics.
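As a rough illustration of the transform-then-validate chain described above (XSLT for the EML-to-FGDC conversion, the XSD for structure, Schematron for the rule-based semantic checks an XSD cannot express), here is a minimal lxml sketch. All file names are placeholders, not actual NCDDC artifacts.

```python
from lxml import etree, isoschematron

# Incoming record in EML (placeholder file name).
eml_record = etree.parse("incoming_record.xml")

# 1. XSLT converts the record to the target standard
#    (EML -> FGDC in the chain of events above).
to_fgdc = etree.XSLT(etree.parse("eml_to_fgdc.xsl"))
fgdc_record = to_fgdc(eml_record)

# 2. Schema validation checks structure/content against the FGDC XSD.
schema = etree.XMLSchema(etree.parse("fgdc-std-001-1998.xsd"))
if not schema.validate(fgdc_record):
    print(schema.error_log)

# 3. Schematron adds conditional, rule-based checks,
#    e.g. "if element X is present, element Y is required".
rules = isoschematron.Schematron(etree.parse("fgdc_rules.sch"))
if not rules.validate(fgdc_record):
    print("schematron rules failed")
```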

Transition to ISO/NAP (North American Profile)

o See NAP removing elements

o NAP provides additional code lists: extend the code list.

o NCDDC remains in a standby mode until the NAP is officially adopted.

o See it as a painful process to transition.

o Aims to provide a tool that supports both FGDC and NAP and a conversion process between the two.

o Support a variety of different profiles. Not force them to create FGDC or NAP metadata. For example, users put information in their own standard and then output it into an official standard.

MERMAid 2.0 for FY09: NODC external milestone.

o Want to leverage stuff that's built somewhere else. Take the following development approach: adopt, buy, then build as a last resort.

o Handed out CDs that have documentation to the group.

o Support for NAP/ISO

o User Interface Redesign.

o How big is their development team: 8 – the whole data center.

o Initial MERMAid development based on Zope.

o MERMAid 2.0 – based on W3C standards and not so much home grown.

o Where in the development cycle and status?

▪ 1 month: user interface available.

▪ Underneath development (services) ongoing.

▪ ESB – a lot is in place.

▪ Code base drastically reduced. Call services when you need them.

▪ Validation Service is most requested.

o Services developed in Python, Java, PHP for others to add or augment.

Comments/Discussion:

- Fox: Likes the idea of Data Providers delivering metadata.

- OE Data delivery logistics discussed.

10:15 – 11:30 Data Center Metadata Activities (cont.)

NGDC: Ted Habermann

Presentation (MetadataFoundation.ppt): Metadata Foundation Spectrum

Introductory Comments by Ted:

- Good high quality documentation really supercedes it all.

- The presentation layer is the presentation layer.

- Want a solid foundation that supports everything.

Foundation.

- See slide presentation for complete details.

- Left end of Spectrum (RDBMS): Components, Objects / Xforms (ESB in MERMAid) Micro Interfaces, SemanticWiki, Workflows. NGDC examples:

o Tracking Database

o Rich Inventory

- Right end of Spectrum (File Systems): Composites/Records

▪ Publishing Records. Pull the blob from the db, validate and publish.

▪ Harvest to GOS, GCMD.

▪ Links to metadata on website.

▪ Really a cache of the most recent publish records.

- Points in between

o Database with Built-in XML capabilities. (NMMR) – was implemented by Blue Angel Techs and migrated to Oracle XDB; Completely programmable state engine.

o XML Blobs (w/ some fields)

o SNAAP (Simple NOAA Archive Access Portal) using open source eXist DB - NCDDC is looking at this as well.

- Database Implementation

o NGDC does a lot of database development in Oracle/MySQL.

o If you have 100k records as flat files, they're hard to manage, but in a database they're not.

o Satellite Product End to End Data System (SPEEDS): harvest information from database -> export to NMMR -> XSLT to publish for different presentations. NGDC has extraction tools to do this.

Future:

- GeoNetwork

o XML Blobs (with some fields).

o Where does MERMAid fit?

▪ Object oriented database, but 2.0 will have the XML-related DB.

o Trying to migrate away from file systems and moving towards the component side of the spectrum. eXist sits more toward the XML-database-with-some-fields point. However, the SNAAP piece is more towards the file system.

Partners:

- “What’s used” in the enterprise is one part of the equation, but the other one is partnerships.

- Record Sets in the NMMR – controllable collections of records – reflect the partnerships with entities both inside and outside of NGDC.

o NOS hydro surveys is the largest collection.

o NESDIS Products – go to CLASS

o STP – Manage records in NMMR, but publish in their system.

o CoRIS example discussed outside the NGDC NMMR.

o Ted defined the stats. Mostly collections of things. The surveys are different. The 20k published total is a mixture of both.

o Unpublished, used for internal management. The MGG geology example given. One collection rec. is in NGDC Record Set.

- Expertise:

o Working with data providers; expose them to FGDC RSE to document data sets. Multibeam, Seismic examples – ISO implementation. GOES-R profile with the GOES Program Office; NOS CO-OPs (tide gauge stations, 1 minute data) and Hydrographic Surveys. SAWG – Submission Agreement Working Group. Input from CLASS, NOSA, IOOS…

- Technology

- Training. How to do training for ISO Standard will be important.

Questions/Discussion:

- CoRIS NMMR Instance at NODC

o contains ~5000 records.

o Mostly the same technology of NGDC instance, but not the same level of support for IT. NODC is talking with NCDDC to migrate to MERMAid for discovery and access functions.

- Maybe CLASS should hold the main record – all metadata records reside here in one directory.

o It’s duplicated – not sure what this means, but entrenched in the File system part of the spectrum.

- Controlled Vocabularies. How do you name platforms; the standards support multiple vocabularies.

NODC: Ken Casey, Don Collins

Presentation (NODC_DataCenterEnterpriseMetadataSystem_v1.0.ppt): NODC Archive Management System Overview and Perspectives on a Data Center Enterprise Metadata System

NODC Process

- Met regularly, cross-center team of 15 people past 6 weeks, discussed tools workflows, metadata systems to formulate this presentation.

OAIS Review

- Archive Accepts the responsibilities.

NODC Activities

- Take sufficient control of the data.

- Discovery and access.

o Reaction Discussion: Why provide access to somebody else’s data – not archived at the data center? NCDDC example given to fulfill that. Approach the portal idea very carefully in terms of giving the impression of where the data is archived.

o CoRIS example. Some data is in the archive; some is not.

o CoRIS and NCDDC Relate?

▪ Feeders to the Archive description in slide.

- Would like to tie into other systems that have data that’s not necessarily stored at the archive.

NODC Archive Management System.

- Covered the main mission of the data center.

- ATDB is a core piece.

- CoRIS uses the NMMR, but waiting to see what happens from this meeting.

- Overview – see slide for better understanding:

o Producer -> acquire and ingest

o Approval -> Manage -> archival storage processes and publish ->

▪ OpenDap

▪ Further product generation systems: they have their own interfaces – could feed into other products that go into the archive.

o Consumer.

- Acquisition:

o Either they collect data or get requests. Have Submission Information Forms.

o Use NOAA Approval Procedure – > Y/N.

o Transfer Data logistics.

- Ingest Process.

o Flowchart presented.

o Canonical form of how data is stored. Sometimes data is translated into a new format – a different representation.

o What if you can’t read the data? Go back to the data provider if possible. Cursory level of investigation of whether or not they can proceed.

o Need to still work more on the QA process.

o Publishing doesn’t necessarily mean publishing to GOS; it can be through their own process.

o Their system doesn’t rely on a specific standard.

o Data Officers. Don is in charge of them. They do the approval. It’s a distributed input mechanism of contribution – any number of people work on these records and have the officers approve them.

o Automated approval is different. All of their systems follow the same workflow.

o Start with a SIP and end with an AIP. Although drawn like that, there could be multiple AIPs from one SIP or vice versa.

o Everything that comes into NODC goes through this system.

o Started 10 years ago.

o Have some scientists who can provide some additional QC – like the Ocean Atlas. Trying to get them to input more metadata into the system.

o Agnostic to data type.

o Try to engage experts:

▪ Fish Count example – establish a strong relationship with the producer as a consultant to help the Data Center steward the data. Jason-2 is another example where NODC works closely with STAR as the subject matter expert.

- Metadata Collected.

o The collection-level metadata captured via the ATDB is mostly thin. There is a tradeoff here.

o From the ATDB, they can publish a minimally compliant metadata record. Only started to look into this, but GOS is not interested in the 1000s of granular records. Could be published with the AIP and the WAF.
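A minimally compliant record of the kind described could be assembled directly from tracking-database fields. The sketch below shows the general shape using real FGDC CSDGM element names; the `accession` field names are invented for illustration and are not actual ATDB columns, and a fully compliant record needs more sections than shown here.

```python
from lxml import etree

# Hypothetical accession fields, stand-ins for ATDB content.
accession = {
    "title": "Example Accession 0012345",
    "originator": "NOAA/NODC",
    "pubdate": "20090513",
    "abstract": "Data set description captured at ingest.",
}

def minimal_fgdc(acc):
    """Build the skeleton idinfo section of an FGDC CSDGM record."""
    root = etree.Element("metadata")
    idinfo = etree.SubElement(root, "idinfo")
    citeinfo = etree.SubElement(
        etree.SubElement(idinfo, "citation"), "citeinfo")
    etree.SubElement(citeinfo, "origin").text = acc["originator"]
    etree.SubElement(citeinfo, "pubdate").text = acc["pubdate"]
    etree.SubElement(citeinfo, "title").text = acc["title"]
    descript = etree.SubElement(idinfo, "descript")
    etree.SubElement(descript, "abstract").text = acc["abstract"]
    return root

print(etree.tostring(minimal_fgdc(accession), pretty_print=True).decode())
```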

Summary Thoughts:

- FGDC Record – more robust development needed.

- ATDB mostly covers the OAIS Data Management function – mostly administrative info; should link more to descriptive elements in FGDC.

- Like the “Lots in Common” scenario as the approach to take with the other data centers.

- Not looking for an exclusive system; include more possible ways to edit metadata.

- Enterprise Function Slide – Ken did quick overview of categories:

o Metadata Manipulation functions

o Management/Admin Functions

o System-wide or Cross-cutting Functions/Requirements.

o Future Review Spreadsheet Categories – Boldface type indicates ones that should be part of the Enterprise System

- Want a common NOAA Data Catalog. Would like all of these functions to support this catalog.

Closing Session Remarks

Fox: Generalized about the commonalities observed in each presentation, but also found that many metadata management strategies unique to an individual Data Center were complementary to the others.

Session 2: Moving Ahead through Commonalities: Defining the Enterprise

Recap Morning Discussion, Set Stage for Requirements

Note: Observations from Fox with inline comments from others

NCDC:

- You have multiple copies of records; would consolidate into one catalog.

- Links needed to the common catalog.

- Jeff: Does ISO solve problems for you? Haven’t looked closely at it. Wondering how long it will take. Low-level detail is represented in SensorML, but how to blend it with ISO is the question. It’s more the content we are concerned about, not the method of transfer. ISIS is parameters and events that affect the attributes. It’s really a database of time periods of constant behavior.

- Ted: FGDC has only one extent for time. ISO has multiple. ISIS looks like a powerful tool to work with ISO.

- As new requirements come up, you can add to ISIS. More flexible than MMS. The Rich Inventory (granules) has the same model as ISIS. Very compatible with ISO.

- We’re going to go to ISO. When you have a need, modify it. Look at a Station History and turn it into an ISO metadata record.

NCDDC:

o Tools aside, there seem to be overlaps with other tools like the NMMR, but also differences, e.g., interactions with data providers. MERMAid relies on having a repository to hand off to. Management/generation tool – capable of doing large transformations.

o Looks like NODC/CoRIS is evaluating a move to it as BAT is going away. Plus, looking at it as a management tool that goes beyond the ATDB capability. The goal is to create richer and more descriptive metadata.

o Where MERMAid has been, where is it headed. Manages content associated with XML schema; doesn’t matter what the underlying content standard is. As long as you have the schema, you’re good. NMMR and MERMAid may have more overlap than you think. GeoNetwork is another example.

o MERMAid looks useful (possibly for NCDC), but NMMR has provided functionality.

o Ted: It’s a conceptual framework as SOS – not sure if this was supposed to be SOA or Software as a Service. MERMAid is about to do a large refactoring development effort. All four Data Centers have big refactoring things that have to be done in the near future. It’s not the question of what you like, but how to integrate the development efforts. Example, going to build a XML Schema or XSLT? Figure out how to share it. The real crux of the question is components and who is going to build what? It’s all standards agnostic. We (NGDC) like ISO because it’s broad. But there are other standards, like EML that provides more.

o Tracking System at NGDC is one example that builds upon a standard.

o How important is it that we work on one tool or not? Depends on the situation. NCDDC is working with a diverse clientele and use cases. It does allow compatibility (XML schemas)

o Need to define the common elements for the enterprise. Concerned about launching into the tools discussion too soon.

Fox, General Observations:

- Encouraging: enthusiasm and expertise in the group.

- Discouraging – dichotomy between center operations. There are significant differences in approaches.

- NODC Process is so different. Flowchart is well defined. From the NGDC/NCDC perspective, if it would work, we would staff it differently.

- Don: The reason NODC does it differently is that they were criticized about not giving back exactly what they received. The data officers evolved from that. This incited a larger discussion about preserving original data even if it’s crap.

Identifier/Controlled Vocabulary issues – how to tackle this.

- MERMAid has surveys (cruiseIDs), SIMS – whatever the data manager decides to put in. The vocabulary management service can’t just be a static list of words; need to define the vocabulary.

- There is a standard list for ship names.

- Stations are called different things: Keep the crap (Jeff), crosswalk to show the best. NCDC aims for a standard abbreviation. Jeff cites the authority. Requires interactions between people at the data centers.

- There are at least 3 or 4 projects involved with keywords.

o MMI (tools like VINES are ontology matchup tools); can crosswalk two ontologies; NGDC is using it with some seismic and multibeam data.

o European

o GCMD – has 7 different keyword lists

o CF

- How do we utilize all of these keyword lists?

- We are not going to be the source of the keyword lists.

- Minimum price of entry should use the same terms if using a common catalog.

- There’s a lot of grunt work. There’s no magic tool that will entirely meet this.

- RSE example from NCDC. This parameter is this, here’s where I find it.

- One task that should be in our plan is controlling the keywords.

- NCDDC makes use of a gazetteer.

Commonalities vs. complementary aspects to our operations.

- Seen very few features that anyone is saying they don’t like. Seems like we appreciate what each center has to offer.

- GeoNetwork and MERMAid are development models

o Open source vs a development project using some set of standards and a smaller group of developers. Other differences are deltas. Is it important what tool you use? NCDDC has priorities. Take what’s free, if not, buy it, or build it. The cost is in the same order. That’s the main difference with the two models.

o Stacy: We just don’t do Metadata. It’s just a web-based web editor that draws upon services.

o GeoNetwork draws upon lots of open source projects – 12 different ones.

o Take Apache for example, an open source project. No one is writing code here to support this. GeoNetwork is more of a portal, does some editing, but doesn’t meet all of our needs. There’s a huge hole that needs to be filled. It’s mostly servicing small collections. There’s a hole with U.S. leadership and geospatial data. Leaders are in Australia and Europe. GeoNetwork would like some US leadership. It would be an active development effort, contributing effort.

- We need organizational commitment to provide tools to create metadata. What are the resource issues?

Enterprise Elements: Identification of common metadata needs

Review the Enterprise Functions Slide

- Goal:

o Make sure that all parties are comfortable with the representation of functions.

o Review a concise description of each function.

- Support Queries:

o Needs to be fleshed out more. Use Case differences for data managers and outside consumers. “Search” might be better.

o Don: 3rd type of search is for management: How many things were accomplished – backend search; a metric to be reported to management. Tracking systems are used for this.

- Components and Vocabularies:

o Break into two boxes.

o Component means: (probably should be a metadata manipulation function or may need to be in both).

- Link to Archival Storage:

o NODC example: ATDB Triggers an archive storage task.

o A link to where the data sits; spawning different processing is an interesting thing. Linking metadata records to AIPs in Archival Storage.

o ESB (MERMAid) – has listeners and triggers a process.

Review Each Function Description

Note:

- See Requirement Descriptions in DataCenterEnterpriseMetadataSystem_v1.0.ppt for a complete description.

- Try to generalize the functions to meet everyone’s definition/understanding.

- The group accepted all Functional Requirement descriptions. Some needed modification, and the notes below capture some of the issues discussed along the way.

- Import – no discussion.

- Convert

o In a lossless way? Convert in as “lossless” a way as possible.

o It’s a functionality that’s necessary. Rich disagreed – Can’t import/export encompass this?

o Doing it as Components – simplest and most basic level – is easiest to deal with.

o If somebody has something new, but standard doesn’t accommodate it, what do we do? Take a similar approach to catalog information.

o Jeff: we agree on a Best Practice on how to handle this situation. It’s a process, or really a Governance function.

o Training relevance. And need to support this to make the Enterprise solution work.

o Entities and Attributes issue with ISO. Anna: Don’t worry. ISO asks if this information is in a Feature Catalog and makes a reference to it.

o FGDC -> ISO. Will need metadata systems that allow you to add some stuff on. How do you get back to FGDC when asked, or to other formats, once in ISO? NGDC is trying to deal with this.

o Identifying elements in ISO. EML – people will take part of the FGDC standard and add them in.

- Export

o Non-ISO 19139 XML. ESRI created an earlier version.

o Preconfigured vs. custom views.

o Where does a REST Interface go? Each one of these components has an API associated with it – should get mentioned explicitly. Could be put on every slide.

- Validate

o Integrated across systems.

o Rost: would like more of an inline validation as opposed to waiting at the end like a spell checking. Validate all along.

o For different organizations, want to capture custom validation.

o Fuzzy validation. Warning level validation.

o Heavily nested sets: biological records and entities/attributes: go down many levels. Validation processes are not able to handle these situations. Schema validation could do this; but some validation may fail going down to the second level.

o Validate with or without components. Like validation of citation records. “Incomplete Records and those without components.”

- Publish

o Must support ArcIMS Metadata Service. Use ArcIMS: like supporting the Portal Toolkit. Import it into a system.

o If external system can do it – not sure what this means.

o Change detection: difficult to do in Components. Could be complicated.

o Added support for DOIs (Digital Object Identifiers) for publishing datasets.

- Edit.

o Edits in Multiple Languages?

- Manage Components and Vocabularies.

o Separate Out: Manage controlled vocabularies.

o Manage Components (and CRUD – create, read, update, delete).

- Support Queries.

o Editor (Internal) and Discovery (External - clearinghouse) Search use cases.

o Enables search by fields/spatial/temporal, data center – narrowing search.

o Fuzzy or Precise.

o Support a federated search based on a percentage – support ranking.

o Support queries in a RESTful manner – saving the search terms. Some people may want to re-execute a search.

o Sensitivity of Metadata.

o Support CSW (Catalog Service for the Web). (A query sketch follows this function list.)

o Support SRU (Search/Retrieve via URL).

- Manage DM Data

o What is system instrumentation? IT Term

o Pulled from OAIS.

o Where does my metadata get sent? NCDDC requirement.

- Generate Reports.

o Supports client and server-side configurable outputs.

- Control Access

o Do it by a “Tree” level – for a group of records.

o Filtered views on publication

- Manage Workflow

o Includes validation – doesn’t move down the workflow until it’s been validated.

o Linked to accession/datasets/AIPS.

- Link to Archival Storage

o Every Data Center has its own native environment; need a unique ID.

o Issue: Multiple data types within a collection, but the granules can be associated with other collections.

o Support many to many.

o Need to maintain the relationship between the data and metadata.

o Issue: Archive storage is not at a NOAA data center.

- Handle Versions.

o Accounting records. Have a version, and it goes through the publishing process with comments added by different folks – history being retained. Was born from an IT security requirement. MMS does this.

o Impact on components?

o Linked to import (is particular to NODC) – this is a tool for doing CRUD.

o Record on publish.

o Handling data transitions (Issues).

▪ Handling metadata that’s not being maintained at a NOAA data center?

▪ Creating metadata (IOCM) for a data collection effort that hasn’t occurred yet. “Progress Code in ISO” can handle this.

- Minimize Duplicates.

o May require human interaction/ validation.

- Support Human GUI and Machine API Interfaces.

o Added: supports RESTful interfaces.

- Support Standards.

o Change NAP to any ISO* profile.

o Easily add additional standards (may require xsd development)

- Support Collections and Granules.

o Ties into the links to Archival Storage.
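For the CSW support noted under Support Queries above, a catalog exposing CSW 2.0.2 can be queried with simple key-value requests. A hedged sketch follows; the endpoint URL and record identifier are placeholders, not an actual NOAA catalog.

```python
import requests

CSW_URL = "https://example.noaa.gov/csw"  # hypothetical endpoint

# CSW 2.0.2 GetRecordById: fetch one record by identifier.
params = {
    "service": "CSW",
    "version": "2.0.2",
    "request": "GetRecordById",
    "id": "urn:example:metadata:0012345",  # placeholder identifier
    "elementsetname": "full",
    # Ask for ISO 19139 output where the server supports it.
    "outputschema": "http://www.isotc211.org/2005/gmd",
}
response = requests.get(CSW_URL, params=params)
print(response.status_code)
print(response.text[:500])
```

GetRecords (the search operation) works the same way over the same endpoint but takes a filter constraint; GetRecordById is the simplest round trip for checking that a published record is reachable.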

Discussion:

- Does this capture requirements from other data centers? Yes, it is a good first step.

- For the majority of collection records in the NMMR, the migration can happen. (to ISO?)

- Are we clear on our understanding of how to meet the user requirements – a centralized repository? To come up with a common catalog and to have an enterprise metadata system…are they two separate pieces?

o The system manages its own metadata records. The catalog is a target to publish to.

o Should the repository associated with this catalog be a superset or a subset? It’s pretty undefined.

o This idea of the common catalog is the file system part of the spectrum. Repositories that produce components are much different than the NMMR.

o CLASS can do file systems pretty well. Will databases that get populated for metadata development be handled by CLASS? Is it a set of capabilities that we’re going to support?

- Common Catalog.

o What does a common catalog really mean?

o Will rely on multiple databases.

o If there’s a good capability that’s well implemented, doesn’t matter where it resides, so long as it provides services to the intended users. We have architectures that can do this now.

o Bottom line is convergence. What can we do right away?

o How do we minimize the resources – the most efficient means to get to that space?

o What is reasonable to generate metadata for? State of the Climate Report example: wouldn’t this be a candidate, how many others like it?

o Metadata that goes with DIPs needs to be in there.

- NMMR Components Discussion. See MovingForward.ppt

o NMMR Snapshot.

▪ 22k composite records that contain 203k component records.

▪ Contact DB

▪ Online resources

▪ Citations

▪ Contacts

▪ Platform information

▪ Algorithm Info

▪ Processing Info.

▪ Would like to change to Component Manager Repository.

▪ Similar construct in MERMAid

o EML

▪ Each EML module is designed to describe one logical part of the total metadata – this is a great description of what components are.

▪ Module = components.

▪ Created in the late 1990s before ISO.

▪ Ted showed components.

• Each module has its own schema

• Dependency Diagram. – shows relationships between modules.

• The eml-dataset module, like other modules, may be "referenced" via a reference tag. This allows a dataset to be described once and then reused in other locations within the EML document via its ID. (See the component-reference sketch at the end of this discussion.)

o ISO.

▪ Number of IDs included in Standard – MD_Identifiers. More like RDBMS – shown in the slide.

▪ And References.

▪ PORTS data set – extents. The URL has an ID. The Extent has a UUID for the reference or namespace. This data set has different extents or subsets within the PORTS dataset.

▪ Dependency diagram in EML is replaced with UML: MD_Metadata With Scope. It’s the high level container for all of ISO Metadata. Ted showed all of the components associated with them.

▪ MD_Identification is connected to MD_Aggregation - Classes in UML. MD_Format and BrowseGraphic are components.

▪ DataSet Series is the Parent above MD_Metadata.

▪ Ted showed the mapping of components in ISO with EML.

• DQ Lineage (19115-2)

• NCDDC and NGDC could sit down and make a crosswalk.

• Entity/Attribute don’t presently crosswalk very well to ISO; could use EML to do this.

• We need a best practice for mapping project module in ISO.

• When they get EML, translate to FGDC in MERMAid.

• EML –> ISO would be more straightforward to crosswalk.

• There is a XML schema for EML.

o MicroInterfaces.

▪ Editing with micro-interfaces has been there from the beginning in the NMMR: editing little pieces of the metadata record. The target audience never used it. Need to support both expert and non-expert users.

▪ Tasks. Chains or workflows of micro-interfaces.

o Metadata development and Workflow Slide

▪ Scientific Questions.

• What they need to do. Metadata Needs and ISO Features. Driving use cases.

• A description of how you do that in ISO.

▪ Metadata Content (Independent of standard).

▪ Standard Implementation/Guidance

• ISO 191*

• OGC Capabilities

• Fgdc w/ extensions

• NASA DIF

• Dublin Core

▪ Presentation

• FAQ XSLT on top of FGDC. Provide access to sections of the metadata.

• Views, stylesheets.

• Contact Info.

• This is querying the ISO.

• Add information with a link that goes to an interface to edit the record and add info – could use XForms, which MERMAid has lots of experience with. Semantic Wiki works in the same way.

o Concluding Thoughts on Development

▪ Ted proposes that component/Xform development could be one place to begin collaboration.

▪ XLink – in a new version, they (MERMAid?) will support components with use of XLink.

▪ One of the strongest tools is MapForce. Jacque Mize from NCDDC will hopefully do a presentation on this to IOOS.

▪ Simple components should be in tables, not necessarily in XML blobs; straightforward, easy to index and search.

▪ What are the components to work with – Jacque has done a significant amount of work on this. Implementing in an RDBMS – how do we link it with GeoNetwork, which deals with XML blobs?

• Stacy: doesn’t necessarily agree with the approach.

• Represent components as native XML – integrate small pieces of XML to represent the full record. This really simplifies the process and code maintenance.

• These technical discussions have a large shade of gray. Will need to work on this, experiment and see where information should be stored and managed within the spectrum. Don’t have the answer right now; may butt heads many times figuring this out when we get down to the technical level – need some give and take.

• RDBMS could help solve some of the thorniest issues dealing with components. Blobs of XML – native XML database like eXist.

• We’re in a tiny window that we can work together.
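To make the component/reference discussion concrete: in ISO 19139, a component is declared once with an id and reused elsewhere via an xlink:href pointer rather than repeated. A small illustrative sketch of walking those references with lxml; the file name is a placeholder, while the namespaces are the standard ISO 19139 and XLink URIs.

```python
from lxml import etree

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
    "xlink": "http://www.w3.org/1999/xlink",
}

doc = etree.parse("iso_record.xml")  # placeholder file

# Components declared inline carry an id attribute...
declared = {el.get("id"): el for el in doc.xpath("//*[@id]")}

# ...and are reused elsewhere by reference instead of repetition.
for ref in doc.xpath("//*[@xlink:href]", namespaces=NS):
    target = ref.get("{http://www.w3.org/1999/xlink}href").lstrip("#")
    status = "resolved locally" if target in declared else "external"
    print(etree.QName(ref).localname, "->", target, f"({status})")

# Example component type: list each contact (CI_ResponsibleParty) once.
for party in doc.xpath("//gmd:CI_ResponsibleParty", namespaces=NS):
    name = party.findtext("gmd:organisationName/gco:CharacterString",
                          namespaces=NS)
    print("contact:", name)
```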

Day 2: May 13, 2009

Review of Yesterday/Preliminary Thoughts

Ted: How does the standard affect the kinds of things you want to do?

- DMIT (data management integrated team)

- GEO-IDE Guidelines and best practices wiki.

o Hopefully will evolve as a place for best practices in NOAA

o Metadata Section – IOOS (Jeff, Ted, Anna, Kelly, Jacque)

o Outstanding issues and use cases link:

▪ File Formats and Structures

▪ Disjointed Datasets – PORTS example.

o Does ISO help address when specific requirements need to be met – an objective of this wiki.

o Stay aligned with some of the work done by IOOS; GOES-R is another example

o Encourages folks to review the site, get an account and contribute.

Fox:

- Looked at all of the modules/requirements.

- Everyone seems to be in agreement that good metadata is important and be able to change as standards change.

- There is a spectrum of how to store information

- Need to make sure that we’re doing practical stuff, but not get too focused on some great stuff that might take a while to achieve.

- Still confused on tools.

Session 3: Communicating for success: Crafting an Architecture of Shared Components

Current Capabilities that Meet Requirements: What components exist that can be shared?

Open Discussion about Tool Assessment and Catalogs.

Note: This was probably the trickiest part of the discussion to capture. Since I didn’t tape, I did my best to capture dialogue, albeit cryptically, that will hopefully expose some of the thinking that went into formulating a preliminary action plan.

Ken: Anna will be able to compile the spreadsheet (FGDC Spreadsheet?). Snapshot of what we have today. Don’t limit ourselves with what current resources we have today:

Rich: What can we do quickly, but also what can we do to prepare for the future?

Ken: Look at what’s available right now and get us on the right path, and offer an opportunity to launch into other arenas not being met.

Anna: confusion over the term “components”.

Ted: 90% covered in existing tools.

Stacy: But how do these systems work together in a shared way.

Rost: What is the path to make these interoperable?

Fox: Have a common interoperable data catalog. Doesn’t like GOS. One possible solution: why can’t we publish to a common catalog in NOAA, like GOS?

Ted: Yes there are problems with GOS, but the underlying assumption is that we can do better.

Fox: GOS is trying to be all things to all people. But if we want to customize it, can’t do it.

Ted: Yes you can. Say we’re going to do a GOS for NOAA, how much $ will go towards it.

Fox: Should we go with GOS?

Ted: Believing that NOAA can build something better?

- Contributing Code issue: Geonetwork vs. GOS.

- Climate Data Portal

Rost: How do you expose climate records that can be available as a common resource? But how much do we take out of hide, showcase and get additional funding?

Kelly: What happens to stuff NCDDC doesn’t manage (archive)? GOS has an IOCM Portal – perhaps another piece just for NOAA.

Rich: NMMR is more of a repository, not a clearinghouse.

Fox: Wants to build a catalog.

Jeff: Having one catalog is tough.

- What is the difference between a catalog and a clearinghouse? The latter spans multiple agencies; the former is a listing.

Fox: wants to get all of our FGDC metadata in one place.

Ted: Is it outward looking and searchable?

Rich: Have two models: pleading for changes with ESRI (GOS) or GeoNetwork/MERMAid where there’s more influence over code development.

Fox: We have repositories. But include all of NOAA with it.

Ted: We have a bunch of files, presented to GOS, or through a WAF – copied to one disk somewhere; then it could be searched by Google, FirstGov.

Fox: MERMAid, you have a catalog?

Rost: MERMAid is not a searchable app, only for metadata generation. Have a semantic search capability for the regional ecosystem data assembly part, but it operates on a subset of metadata. Internally, for Florida, built a web interface that points to a set of records in an RDBMS – hosted at Florida.

Rich: We have tools via the NMMR. But BAT (Blue Angel Technologies) is done. The current implementation of the NMMR can’t go anywhere.

Ken: We discussed elements of a system. How do we go about building it? What do we need to build a system that produces better metadata? Need a mechanism for what we can do now, and a development path for how to build the system. Which of the functions are already addressed, and what’s in need of attention?

Fox: Worries that building a system will take a long time. Final statement: different organizations need to work in their own environment, but the collections need to be stored in one place. We’re not going to get more funds. The only thing that is going to sell is for people in NOAA to build a means to access their data.

Ken:

- We all need to row together.

- We need to come together on a starting point.

- Architectural concepts are there.

- Tools for a system to generate good metadata.

Fox: Creation of metadata is local; serves their data centers; but collections need to be published in a common way.

Jeff: Come up with a shared thing, does that mean that there’s no funding to support it – usable and extensible, what can we do to show capabilities, how will it be deemed important, a commitment to the process.

Fox: For purposes of this workshop, assume that resources will be there.

Ken: Re-defining the editing interface of MERMAid?

Rost:

- People don’t necessarily use the editing interfaces for generating metadata.

- Don’t put a software footprint on our users.

- The editing interface is pretty lightweight.

- Server-side, virus checker, validator is more complex to demonstrate.

- Once you get the record, what do you do with it? If it comes in an unstructured way, how do you deal with it?

- Developing some of these common functions, schemas, validators (schematrons) – work we can do now; connecting services together (server-side metadata tools)

- What is the low hanging fruit? What are the common denominators – more on the output end. Prioritize, start with the backend, and then have the technical people get together in a follow-up mtg. to work out the details and how to implement.

Fox: How many of the boxes [functional requirements] do our tools address?

Rost: What are “nice to haves,” and what are actually happening?

Fox: Are there any of the functions that are not being handled?

Ted: NMMR – versioning. Tracking changes. Not addressed at all in GeoNetwork or NMMR. This is extremely difficult.

Rost: More of a security requirement (versioning). Adopting DOIs has a cost associated with it; UUIDs do not.

MERMAid (MM)/GeoNetwork (GN) Comparison w/in Enterprise Function Categories

Metadata Manipulation Functions:

- Yes for both

Management/Admin Functions:

Versioning – “No.” [I think that’s for both?]

Components: [I don’t have a note here. But, think it is “Yes” for MM, “NO” for GN.]

System-Wide or Cross-Cutting Functions/Requirements:

Definition of granules; satellite granules, “NO” for both.

Ken: What is the right framework?

Rich:

- NMMR is a good source of requirements.

- MM is the same in terms of what a metadata system needs.

Stacy: More services that are exposed RESTfully.

Rost: Components are developed independently.

Rich:

- Services development is a mixed bag.

- Build things independently

- When you build services, you sacrifice performance. A mix of code bases can be problematic.

Stacy:

- If you build it all yourself, it’s better for dealing with performance. A lot of performance testing has been done.

- Even though we develop most services in-house, in the end, developers come together on how the services talk to one another.

Rich:

- If tightly controlled, put in a Skunkworks, yes it works.

- Collaborating remotely, we would have to do it in a way Open Source Projects are built. Here’s the signature of my method – how to call the services.

Don:

- Do you know in GN, what services you don’t need to build because MM has them so that you don’t need to re-invent; is that the simplest way to assess? And vice versa, what services have you built that MM doesn’t have?

NCDC: hasn’t tried GN.

Rost:

- Functions: write out and support queries – perhaps that’s the place to start. This is what we want, and then have someone with more capabilities decide the “how.”

- Some of the Management/Admin Functions can be worked on and decide what to focus on short-term and long-term.

Rich: Ideally, you hire a project manager to ensure things stay on track.

Ted:

- We have two significant efforts that will be taking place in the near future (6 mos.): MM 2.0 and GN.

- Does it move us closer or further away from where we want to be if we work independently?

Anna: Moves us away.

Stacy: Still open to input/requirements. Communication is important to ensure that we don’t go astray. She is the Project Manager for MM. The work is in their charter, and project plans drive their build requirements.

Rost: Things they’re building are multipurpose. They are down resources, too.

Ken: Can the MM services be woven into GN?

Ted: There is an effort afoot in GN to build an editing interface like the one in MM. Can we join this effort in GN? How can we use the same interfaces to talk to the underlying substrate?

Ken: We need an Open Source framework for how we communicate and interact; plus we get the benefit of a wider community support.

Ted: There is a proposal afoot to do component development and an editing interface.

Ken: MM has an editing interface and link.

Ted: Could have a few conferences with GN community.

Fox: What does it do to their (MM) development effort?

Stacy: Difference of opinion about the use of GN. Could tap into it and feed. Could also take some things from it as well.

Rost: Premature asking the “how” question.

Fox: Need a Technical Follow-up mtg. (Stacy, Rich, David Sallas (ESB guy), John Relph and Eric Ogata; Jeff suggested that NCDC would have to figure out who would attend; Ted and Ken).

Ted: GeoNetwork, could meet with Simon Pigot – Australia. Discuss Sub Template implementation of Components.

Fox: How do we get MM to interoperate with GN development.

Rost: Throw requirements into the room with the technical people and see what happens – develop a design document from the outcome.

Ken: They need more than just the requirements we fleshed out. It’s not the same thing as a building a widget.

Ted: Starting from square one is not the way to go. We have two mature tools that implement these capabilities. NMMR is going to be abandoned, evaluate and join an international project; understand that we will lose control, but believe the gains are more than the losses. Will the MM group join this or not?

Fox: Not sure the decision is made. We should take advantage of what MM has done.

Ken: MM is a natural fit.

Stacy: The issue of being at the mercy of an Open Source community – you can’t direct them to take the path you want them to take. In her 8 years of experience doing this, she’s seen it.

Rost: We [at NCDDC] need to develop a business case to get approved. Analyze the effort and get it adopted before they begin. Thinks the management of components is an interesting effort to engage in. That’s where we have the commonalities.

Fox: One of the gaps in MM…it doesn’t allow people to access their system; they rely on external catalogs for this purpose.

Ted: The migration to eXist is the same approach we would be doing with Stacy. Like Oracle for its spatial capabilities. We have at least 2 person-years evaluating GN. Just having a tech. mtg. won’t solve it.

Stacy: She’s participated in Plone and Python projects; some Django experience, too. She has evaluated GN several times. The implementation of XForms uses the same technology that she’s familiar with. She’ll take another look at it. We can do some joint collaborations before then to still work towards the goal.

Ken: What parts of NODC will contribute to this effort is still outstanding. May decide to be loosely affiliated. Whether MM gets on board or not shouldn’t stop us from moving forward; don’t get too hung up on it. MM development has some realities, contracts/milestones in place that they have to meet.

Review and Implementation Plan: Where to go from here?

Rost: Priorities; issues for the data centers.

Ken: What’s being addressed with current capabilities.

Anna: Organized at a higher level? View things in terms of Editors/Catalogs.

Rich: Likes the term “authoring” instead of “editors.” Authoring implies more collaboration. Dissemination, perhaps we can rely on GOS, or dump on a disk at CLASS – this is the easiest piece to figure out.

Fox: What’s not being discussed is “How do people use the standard, what goes where?” Lamont example: where do contacts like the Chief Scientist go, instrument details, community requirements; had to work out a consistent workflow: this is training and best practices.

Anna: That kind of conversation will avoid having to do any type of crosswalking down the stretch.

Fox: Are we following all of the same standards – putting the same info in the same place? This is a data manager issue.

Ted: That’s why training/partnership is important.

Fox: training is interesting, more like negotiating.

Anna: Need a collaborative environment to accomplish this.

Ken: Governance Plan is needed.

Jeff: It would be good if you at least have a common place to share technology.

Fox likes this idea as an issue to crack.

Kelly: All data centers are doing things differently, could be hard to implement.

Fox: Are we doing things the same?

Rich: With ISO development, it would be even harder.

Don: There are resource issues to keep the metadata practices up in order to meet the minimal requirements.

Anna: Examples speak loud.

Rich:

- Tracking systems, duplication, etc…

- NCDC: digital accession catalog, very rudimentary.

Phil: May want to identify commonalities between the two tracking systems for link to archival storage.

Don: OAIS linkage from SIP to AIP…pointer to individual containers. NODC can’t identify groupings or collections of the containers. GHRSST example. Need to work out internally at NODC how to manage this. How do the other data centers do this?

Fox: Wanted to use the tracking systems to see what the backlog is for multibeam.

Ted: The tracking system uses FGDC Lineage section with some ISO extensions; no connection with the metadata records.

Don: This would be useful for their records. 5 – 10% of NODC collection records have lineage info - the non-automated streams.

NCDC: Doesn’t exactly have a data manager based structure.

Jeff: For in situ data, he is not considered a data manager. Probably the Climate Analysis Branch. The realignment has mixed things up. They use a distributed functional matrix that’s based on historical practices, and not a lot of metadata was created.

NGDC: Has data managers linked to every data set. Their names are on the metadata.

Chris: Contact info is more generic on NCDC metadata.

Don: Vocabularies. Platform table management. ISES is one instance of a way to converge on something. Reconcile vocabularies and nomenclature.

Ken: Transitioning to ISO country codes, CF conventions – standard variable names. NODC’s thinking: if no international standard, then national. But information was not shared with other data centers when they went with the ISES ship standard.

NGDC uses the GCMD platforms. If NODC is working more closely with other entities on platform identifiers and folds that into one thing, it could be problematic.

Ken: When there’s not a necessarily agreed upon way of doing things, the system should support doing things in a standard way.

Rost: What controlled vocabulary is being used? This could be tracked in a wiki. The most common set used at NGDC is GCMD. NODC does not use it, nor is it offered in the ATDB; NODC could do it, but doesn't have the resources.

Fox: Will we do it? Can we at least agree to go the same way?

- UNOLS is looking at using the ISES codes. NOAA is probably on that list. Is it a properly managed list? NGDC has this problem within the MGG division, trying to reconcile ship IDs for the archive. But we should do it in a way that's compatible with NODC.

- What’s in the vocabulary service in MM. Stacy wants to separate it out – a system w/in itself; they want to tap into either manually add or tap into existing vocabularies like GCMD.

- Add requirements to the system, such as adding a controlled vocabulary list like ship names.

- The communication issue is addressed by adding this as a requirement.

- Unique IDs on a ship-by-ship basis, assigned on request. WMO also has identifiers for ships; Pub 47 has a list. If a ship is renamed, does it retain the ID? Lots of issues still to be addressed, including a governance model for when changes are proposed to the ISES list.

Action Item or Issue List thus far:

- Common Practices across data centers.

- Vocabulary Management. In developing this system, controlled vocabulary issues will come up; first action – what do we use now? Use the Best Practices Wiki to list them; it's agnostic. It's a challenge, but is it outside the system? The NERC vocab server has a ton of vocab lists; can you upload to it? Plus, there's a lot in MMI. Adopt, adapt, and as a last resort, add. If crosswalks are available, can that be done across the theme keywords (see the sketch after this list)? There are already examples at MMI. Vocabularies are valuable for discovery. GCMD has a strong hierarchical search; that's where GOS falls down.

- Rost: List the controlled lists by data center. The CMECS controlled list for ecosystems is an example from NCDDC.

- Request what each data center is using for keywords. What they have, what they don’t have and need.

- Jeff and Don discussed that there are also some "back" controlled vocab lists out there. (Note taker: not sure what this means.)

- Distinguishing between a platform and a platform type – important. Can be very complex.
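
To make the crosswalk idea concrete, here is a minimal sketch in Python. The local-to-GCMD mapping entries below are hypothetical illustrations, not actual entries from any data center's vocabulary.

```python
# Minimal sketch of a theme-keyword crosswalk from a local vocabulary to
# GCMD science keywords. The mapping entries are hypothetical examples.

LOCAL_TO_GCMD = {
    "sea surface temperature":
        "EARTH SCIENCE > OCEANS > OCEAN TEMPERATURE > SEA SURFACE TEMPERATURE",
    "bathymetry":
        "EARTH SCIENCE > OCEANS > BATHYMETRY/SEAFLOOR TOPOGRAPHY",
}

def crosswalk(local_keywords):
    """Split local keywords into GCMD equivalents and unmapped leftovers."""
    mapped, unmapped = [], []
    for kw in local_keywords:
        gcmd = LOCAL_TO_GCMD.get(kw.strip().lower())
        if gcmd:
            mapped.append(gcmd)
        else:
            unmapped.append(kw)  # candidates to adopt, adapt, or add
    return mapped, unmapped

mapped, unmapped = crosswalk(["Sea Surface Temperature", "coral reef habitat"])
print(mapped)    # GCMD equivalents found
print(unmapped)  # terms with no crosswalk yet
```

The unmapped list makes the "adopt, adapt, add" decision explicit: anything a crosswalk cannot resolve becomes a candidate for one of the three.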

Resolve the different ways the data centers keep track of Identifiers.

- Linking unique IDs to the archive.

- Digital Object Identifier

o Someone in Germany gives you an ID. Is it worth investing in?

o Seeing it more in data set publishing.

o Each seafloor sample gets a DOI. But this is different?

- We use the reverse URL and then a unique internal identifier.

- UUID – a programmatic way, within software, to generate a unique number; GN uses this. It could be used within our internal systems (see the sketch after this list).

- Where do we need unique identifiers? Decide where and how to go about it, and list it as a requirement of the system. For the DOIs needed for data set publishing and data set credit, that's a different story – add this to the requirements as a subclass.

- Shane and ISOIDS – object IDs that are very long.

- Metadata can have any number of unique identifiers.
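
A minimal sketch of the two identifier styles discussed above; the "gov.noaa.nodc" naming authority and accession number are hypothetical examples, and the UUID half mirrors the kind of opaque identifier GN generates internally.

```python
# Minimal sketch of two identifier styles: a human-readable reverse-URL
# identifier and an opaque, machine-generated UUID (the approach GN uses).
# The naming authority and accession number below are hypothetical.
import uuid

def make_identifiers(naming_authority, local_id):
    return {
        "reverse_url_id": naming_authority + ":" + local_id,  # namespaced, readable
        "uuid": str(uuid.uuid4()),                            # globally unique, opaque
    }

print(make_identifiers("gov.noaa.nodc", "0001234"))
```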

Session 4: Setting out to Succeed: Elements of an implementation plan

NEAR TERM Brainstorm:

Ted's 3 things: what can we do with existing resources?

- Using XSLTs on top of the NODC WAF. Consolidate into the Big WAF.

- Taking the ATDB and crosswalking to ISO.

- Four ISO transition issues at NODC:

o GHRSST Granules

o Barebones ATDB Records (a short-term thing that's straightforward):

▪ Translate to ISO easily

▪ Translate to FGDC

o 15% of the more robust ATDB Records

▪ Same category as CoRIS. Basic FGDC. No RSE, but supports NBII and some Shoreline- and CoRIS-specific extensions.

o Look at CoRIS and see about exporting to the NGDC NMMR. At some point the CoRIS content will have to migrate to ISO; this is more of a research task. The BAT software should export a template – see about importing it into the NGDC BAT. Could look at the database and see if that approach works.

▪ ISO is the next standard that CoRIS needs to support.

▪ Once the ISO view of the NMMR is constructed, we can make ISO records and import them into GN. The transition from FGDC to ISO is a nasty thing. Bruce Wescott has been trying, but it's not complete because of the entity-and-attribute issue.

▪ Inventing a database structure that’s based on an ISO construct.

▪ Another option is to import the records from the NMMR into GN in a form that complies with the FGDC schema.

ISO Activities.

- ISO 19115 and NAP; very little difference.

- NCDDC has been evaluating it now, but no translation yet. Biological records are up in the air.

- NGDC has some very full records, and those will be a nasty translation. There are no plans to make a complete translation of a full FGDC record to ISO – just discovery level, a minimal number of elements.

- Keep it simple:

o Develop an ISO view of the ATDB that then feeds the "IWAF" – files in an ISO format in a WAF; have the FAQ view of the WAFs.

Big WAF.

- Look at the collection-level records of NODC and combine them with the rest. GHRSST could be culled down to ~30.

- NCDDC – already has a WAF.

o All of NOS

o What's physically stewarded, including other things we're responsible for managing.

▪ XSLT views and a FirstGov search (a minimal sketch of the XSLT step follows this list).

▪ Ken proposed to do it at NODC.

▪ We’ll need an rsync account and work out transmission frequency.

▪ You can filter what the indexers are searching. (FAQ, HTMLs)

▪ Need an IP Address from John Ralph

o In the future, can add some other services to interact with the catalog.

o Achieve by the end of the fiscal year.
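
As a minimal sketch of the XSLT step, assuming a hypothetical stylesheet name and directory layout (lxml in Python stands in here for whatever XSLT engine is actually used):

```python
# Minimal sketch: render HTML views of the FGDC XML records in a WAF by
# applying an XSLT stylesheet to each file. The stylesheet and directory
# names are hypothetical placeholders.
from pathlib import Path
from lxml import etree

transform = etree.XSLT(etree.parse("fgdc_faq_view.xsl"))  # hypothetical stylesheet

for xml_path in Path("waf").glob("*.xml"):
    record = etree.parse(str(xml_path))
    html_view = transform(record)
    # Write record.html next to record.xml so crawlers can index it.
    xml_path.with_suffix(".html").write_bytes(
        etree.tostring(html_view, pretty_print=True))
```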

Technical Follow-on Meeting

- Need a predefined agenda.

- Objective:

o Decide a “Go” or “No go” as to what functional services can be shared and build a framework that links them together and becomes our metadata system.

o Aim for later on in the summer.

o What kinds of contributions can NCDDC make to the open-source GN community? Present a business case at NCDDC; do a risk assessment. Some of the XForms work being done is common technology – share information, and get on the GN community mailing list to see what's going on.

- Don't know how long it will take to really build an enterprise system.

- Should discuss the way we do the preliminary and critical design reviews.

- Additional things to consider for the meeting:

o Common pieces/approaches in both GN/MM solutions.

▪ Need some pre-meetings to assemble this.

o Team composition – try to limit to 2/data center.

o Outcomes: not that we have all of the solutions, but that we're confident enough to lay out a few options and a roadmap – not a design document.

o There are organizational questions, too.

o What’s feasible technically?

o What technologies can be shared?

- Pre-meeting to dos:

o Identifying the components.

▪ Mini-schemas; don't know how that works with XForms.

▪ Nice to know if there are things to extend in the ISO Schema.

o XSLT translations of EML to ISO and ISO to FGDC (a chaining sketch follows this list).
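
A minimal sketch of chaining two such crosswalks with lxml; the stylesheet filenames are hypothetical, and a real crosswalk would need careful element-by-element review.

```python
# Minimal sketch of chained XSLT crosswalks: EML -> ISO, then ISO -> FGDC.
# Stylesheet filenames are hypothetical placeholders.
from lxml import etree

eml_to_iso = etree.XSLT(etree.parse("eml_to_iso19115.xsl"))    # hypothetical
iso_to_fgdc = etree.XSLT(etree.parse("iso19115_to_fgdc.xsl"))  # hypothetical

eml_record = etree.parse("record_eml.xml")
iso_record = eml_to_iso(eml_record)    # first hop: EML to ISO
fgdc_record = iso_to_fgdc(iso_record)  # second hop: ISO to FGDC

print(etree.tostring(fgdc_record, pretty_print=True).decode())
```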

Intermediate Term Brainstorm – 6 to 9 months or early FY10:

Station History Collaboration. Look at doing a pilot – a short-term success.

- Need some people with more experience with ISO.

- ISIS metadata using ISO*.

- To accomplish this we need to:

o End goal: either becomes an enterprise solution or not.

o Find out what metadata they're tracking. Output the information into ISIS. What is the abstraction? Elements that change at certain times.

o It’s metadata that’s not well handled.

o NODC – monthly buoy data – could it apply?

o Related to Time-series metadata. Worth exploring.

o Supporting collections and granules function.

▪ There’s more to the metadata picture than the collection.

▪ Apply an existing tool and see if it works.

▪ RI (Rich Inventory), ISO, and ISIS overlap. Neither RI nor ISIS does an ISO implementation.

▪ ISIS collects information in a hierarchical structure.

o Explore the capabilities of ISIS to integrate ISO/RI into ISIS as an enterprise solution.

o Dan/Jeff will ride herd.

Have an FAQ view on top of ISO.

Enterprise Controlled Vocabulary Strategy.

Framework of an Implementation Plan

Priority actions, next steps, way forward

Anna gave a demo of the FirstGov search.

- Action Item: Rich will change the style sheet to list the data set title in the Title tag instead of the data set identifier.

Ken’s Presentation of Near/Mid-term Action Items: Workshop Briefing to the Directors

Presentation: DataCenterEnterpriseMetadataSystem_v1.0.ppt was presented to Fox.

Title:

- The NOAA Data Center Metadata Enterprise: System Functions, Requirements and Next Steps.

Commonality Message.

Enterprise Functions with a link to the Appendix with a breakdown of Functions.

Notes.

- The three data centers already manage a lot of collection level metadata in a common system (NMMR).

- While there is much overlap in terms of metadata functional needs, the manner in which the Data Centers currently operate differs both within and across the centers. This is the current state (a risk to adoption).

- Regardless of software, the issues of governance, best practices, and consistent approaches to using the standards must be addressed.

Next Steps (A person’s name is in parentheses if they are overseeing a task area):

Short Term (end of FY09)

- Create a cross Data Center FGDC catalog

o Web Accessible Folder, WAF, hosted at NODC

o Rsync and XSLTs with FAQ/HTML/TXT views (a minimal mirroring sketch follows this list).

o FirstGov/Google Search. (Ken)

- ISO view of the NODC metadata in the ATDB (Don)

- Inventory of vocabularies used across the Data Centers. (Anna)

- Hold face to face technical follow-on meeting this summer:

o Two to four folks from each data center

o Examine, from a technical standpoint, the feasibility of a joint long-term vision as described in the suite of enterprise metadata functions and requirements.

o Output would be options on the path forward toward the long-term vision.

- Hold follow-on meeting of the Enterprise Metadata Team to lay out the development and implementation plans. (Ken)

- Identify all metadata Components (contact list, Distributor, Source, Services…) (Stacy)
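
A minimal sketch of the mirroring step, suitable for a cron job; the host names, accounts, and paths are hypothetical placeholders, not the actual endpoints to be arranged.

```python
# Minimal sketch: mirror each center's WAF into the cross-center catalog
# with rsync. Endpoints and paths below are hypothetical placeholders.
import subprocess

SOURCES = [
    "waf@ngdc.example.noaa.gov:/waf/fgdc/",
    "waf@ncdc.example.noaa.gov:/waf/fgdc/",
    "waf@ncddc.example.noaa.gov:/waf/fgdc/",
]

for src in SOURCES:
    # -a preserves attributes, -v is verbose, --delete removes records
    # that disappeared at the source so the catalog stays in sync.
    subprocess.run(["rsync", "-av", "--delete", src, "/data/bigwaf/"],
                   check=True)
```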

Mid-Term (Q2 FY10)

- Adopt common vocabularies where possible; identify the vocabularies that are ripe for cross-center adoption. The SPWx community needs to be considered in this discussion. (Anna)

- ISO Data Center Catalog (Ted)

o WAF, but in ISO

- Explore the overlapping areas of ISO, RI, and ISIS for their ability to handle "time series metadata" in our enterprise system. (Jeff)

Long Term:

o Follow Development Plan that leads to the envisioned enterprise metadata system

▪ Resources

▪ Risks

o Follow Implementation Plan that leads to adoption of the system within the data centers

o Common Practices

o Governance

o Training

Responses:

Fox:

- Common practices for using the standards are not in there. This requires face time, which is expensive. Ken says it should come out as part of the implementation plan: training, governance, etc.

- Management needs to know who is doing what – this is followed up with the people assigned above.

- Climate Portal needs will be a big drain on NCDC – one year of PAC funding.

Ted: Service Metadata is not mentioned.

- GetCapabilities response

- 19119 (a KML service and the metadata associated with it).

Fox: We should encourage metadata creators to include services in their metadata.

Ted: We have already begun this; it's done in the FGDC records in the NMMR. There are Components (data center) and Services registries in GEOSS; we have 30–40 records that they have harvested.

Fox: We should try to incorporate them into the metadata in the short term; we already have a best practice for FGDC. Data managers would need to do the work.

Ken will send the presentation to Fox who will reformat it and share with the other data center directors.
