Proposed Template for Key Functions



4A Clinical and Translational Ontology Services

4A.1 Overview

How can we master the ever-increasing flood of biomedical information in a world in which each biological and medical discipline and each hospital system, clinic, government agency and research group uses its own guidelines for terminology and categorization? The discipline which seeks answers to this question is called ‘ontology’. Ontologists in the BTC provide a service to Buffalo clinical scientists, helping them to develop robust category systems for the collection, description and management of data in ways which will be of benefit to their research, to the dissemination of the results of their work, and to the satisfaction of NIH mandates governing data reuse. They do so through the application of methods derived not only from computer, management and information science and from logic, philosophy and linguistics, but also, and most importantly, from the scientists who are collecting data and formulating and testing hypotheses with their aid. Ontologists in Buffalo are seeking to establish consistent structured representations of given domains of reality that are accessible to both human beings and computers, working specifically to support bioscience and translational research and clinical care in a maximally effective way.

UB is a leading center of research and training in biomedical ontology. It is, with Stanford and the Mayo Clinic, one of the three constituent US institutions of the NIH Roadmap U54 National Center for Biomedical Ontology (NCBO), recently renewed by the NHGRI, and a partner of the NCATS-funded CTSAconnect initiative, which is using ontologies to promote integration and discovery of research activities, resources, and clinical expertise across the CTSA Consortium.

The BTC’s Clinical and Translational Ontology Services Key Function will contribute to the realization of multiple strategic goals of the National CTSA Consortium through the creation and testing of ontology resources designed to improve human health by transforming the environment in which clinical data is collected, shared, analyzed, translated, and used to provide quality health care.

4A.2 Specific Aims

Aim 1: To develop ontology services for clinical scientists (1) within the framework of three interrelated pilot experiments designed to test our methods, (2) on an ad hoc basis through consulting with specific clinical communities, both inside and outside the BTC.

The three pilot experiments will be in the areas of

i. Electronic Health Record interoperability

ii. neurological disease, especially Alzheimer's, MS, and stroke

iii. cancer pathology, including cancer surgical pathology and cancer flow cytometry

Aim 2: To create an ontology-based Clinical and Translational Data Exchange which will leverage the work performed under Aim 1 to establish an evolving repository of integrated, de-identified Web-based patient data and biological knowledge available for use by members of the National CTSA Consortium and other external institutions and individuals. 

Aim 3: To create, test, and implement an innovative Resource Discovery, Tracking, and Evaluation System building on the ontology-based resource discovery technology of eagle-i, VIVO and CTSAconnect, and contribute this system to the CTSA consortium.

Aim 4: To create a cross-disciplinary on-line Advanced Graduate Certificate Program in Biomedical Ontology directed towards trainee scientists both in Buffalo and in the National CTSA Consortium.

4A.3 Background

The work described below grows out of a multi-year collaboration between UB ontologists and the Gene Ontology (GO) Consortium (1). Of all the attempts to use ontology-based technology to advance the goals of clinical and translational science, the Gene Ontology is by far the most successful. This success turns, we believe, on the fact that the GO is created and maintained by biologists and is in consequence widely used in the description of many different sorts of biological data. Where terms from the GO are used to describe given bodies of data, these data become linked, in a growing network, to other bodies of data tagged or annotated with the same ontology terms (2). This brings immediate benefits to search, data mining, aggregation, and integration; increasingly it is also allowing intelligent querying and algorithmic reasoning (3) and new kinds of distributed research within the framework of the Semantic Web (4, 5). Its utility to clinical research draws on the multiple ways in which GO annotation information can be applied statistically to elucidate how gene products functionally interact to drive both normal biological phenomena and associated pathologies (6).

Through their work with the developers of the GO, UB researchers can claim to have brought about a transformation in logical sophistication of bio-ontology research and to have created and disseminated techniques for evidence-based ontology development, application, and evaluation which have been adopted by public and private research groups and organizations throughout the world.

Our collaboration with the GO led to the creation of two ontology resources, the Basic Formal Ontology and the Relation Ontology (7), (8), which, along with the associated methodology for ontology-driven data analysis, now provide a common integrating architecture for more than 100 biomedical ontology projects throughout the world, including major initiatives such as the harmonization of SNOMED CT and ICD 10/11 (11-13), openEHR, the Open Biomedical Ontologies Foundry, and the Neuroscience Information Framework (NIF) Standard. We lead the dissemination efforts of the NCBO. Where, on the hitherto dominant approach, multiple ontologies for each domain were created in ad hoc ways by multiple groups of researchers, the increasing acceptance of this common architecture enables scientists to create ontologies in a coordinated fashion, in such a way that the data annotated in their terms is automatically linked together (9). Our collaboration with the GO led also to the creation of the OBO (Open Biomedical Ontologies) Foundry (10), which is seeking to create an evolving set of semantically interoperable ontologies, with the GO at its core, to support information-driven research across the entire domain of the life sciences.

UB ontologists have played an important role also in scientific research pertaining to health IT, for example through our influence on terminology initiatives such as SNOMED CT (11-13) and the National Cancer Institute Thesaurus (14). We have made contributions to diagnostic decision support (15), to the scientific study of the Electronic Health Record and of health information exchange and standardization (16-22), and to research in medical natural language understanding (23, 24) (see Hogan letter of support). Buffalo ontologists are involved in a range of funded ontology projects, currently including the Cell Ontology (funded as part of the Gene Ontology Consortium by NHGRI) (25, 26), the Protein Ontology (funded by NIGMS) (27), the Infectious Disease Ontology (funded by NIAID) (28), the Plant Ontology (funded by the NSF) and the Biometrics Ontology (funded by the Department of Defense).

4A.4 Key Personnel and Staffing

Werner Ceusters, MD, has degrees in medicine, neuropsychiatry, informatics, and knowledge engineering. Currently he is Professor of Psychiatry and Director of UB’s Ontology Research Group. He joined UB in 2006 after having served as Chief Technology Officer and VP for Research of L&C Inc., a company providing ontology-based Natural Language Understanding services to companies such as Eclipsys, First DataBank, Merck and WebMD that was acquired in 2010 by Nuance (owner of the Dragon Naturally Speaking software and world leader in natural language understanding for health information technology). Ceusters is PI of an NLM grant applying Referent Tracking to the evaluation of the SNOMED CT clinical terminology.

• Responsibilities. (30% effort). As Co-Director of the Key Function, Dr. Ceusters will manage the day-to-day operations with Dr. Smith. Dr. Ceusters will chair the Executive Committee for this Key Function, lead the Electronic Health Record Semantic Interoperability Project (Aim 1, Pilot 1), collaborate with BTC investigators on multiple ontology development and implementation efforts, and participate in ontology education in Aim 4.

Barry Smith, PhD, Distinguished Professor of Philosophy and Professor in the Departments of Neurology and Computer Science and Engineering at UB, is a prominent contributor to both theoretical and applied research in the field of biomedical ontology, who has received over $10M in funding for his ontology work. He is Director of the National Center for Ontological Research, a principal scientist of the National Center for Biomedical Ontology, and a Coordinating Editor of the OBO Foundry initiative. He is also a scientific advisor/consultant to ontology projects in the Cleveland Clinic, Johns Hopkins University School of Medicine, Mount Sinai School of Medicine, University of California at San Francisco, the German Federal Ministry of Health, the US Army, and the US Joint Forces Command. He is, with Melissa Haendel of the Oregon Health Sciences University, leader of the Expertise Ontology Working Group of the NCRR-funded eagle-i and VIVO resource discovery initiatives. (See Haendel letter of support and Aim 3 below.)

• Responsibilities. (30% effort). As Co-Director of the Key Function, Dr. Smith will manage the day-to-day operations with Dr. Ceusters. He will serve as Co-Director of the Resource Discovery, Tracking, and Evaluation System (RDTE) (Aim 3) and Director of the Advanced Certificate Program in Biomedical Ontology (Aim 4). He will collaborate with BTC investigators on biomedical ontology development and implementation efforts.

Alexander Diehl, PhD, has been recruited to the UB Department of Neurology after serving as Research Scientist for the Mouse Genome Informatics resource of the Jackson Lab. He is an active developer of the Gene and Cell Ontologies, currently directing a project to revise and restructure the Cell Ontology with a major focus on hematopoietic and nervous system cell types. He is also a member of the Infectious Disease Ontology and Vaccine Ontology development teams, and of the International Neuroinformatics Coordinating Facility Program on Ontologies of Neural Structures Representation and Deployment Task Force. Formerly he worked for Decode Genetics in Iceland on target validation in MS, Stroke, and Peripheral Artery Disease.

• Responsibilities. (40% effort) Dr. Diehl will serve as Director of Ontology Services, providing guidance and support to BTC clinical scientists using ontologies in their work. He leads the Neurological Disease Data Project (Aim 1, Pilot 2) and will teach in the Advanced Certificate Program in Biomedical Ontology (Aim 4).

Albert Goldfain, PhD. From 2002 to 2008 Goldfain worked as a software engineer for the medical device manufacturer Welch Allyn Inc., where he is first inventor on a patent specifying a language for medical device information exchange. Since 2008 he has worked for Blue Highway, Inc., a subsidiary of Welch Allyn devoted to innovative projects in informatics and medical device technology. He also serves under Smith’s direction as postdoctoral researcher on an NIAID-funded grant on the development of an Infectious Disease Ontology to support research on Staph. aureus biology and treatment.

• Responsibilities. (10% effort) Dr. Goldfain will lead the Vital Signs Ontology initiative as part of Aim 1, Pilot 1. His salary costs will be met by Blue Highway, Inc. (See letter).

Alan Ruttenberg, MSc, was employed from 1998 as a computational biologist by Millennium Pharmaceuticals, working as project lead on the award-winning PARIS system, which addresses a key problem in drug discovery – that of identifying the cellular pathways that are active or perturbed in microarray experiments, where thousands of genes may be differentially expressed (29-31). Since 2007 he has directed the Neurocommons Project, which supports computationally aided discovery in the domain of neurological diseases through collaborations with Yale’s Senselab (32), the Cure Huntington’s Disease Initiative Foundation, the Alzheimer Research Forum (Alzforum) (33) and the NIH Roadmap Neuroscience Information Framework (55). Ruttenberg serves on the Oversight Committee of the Digital Brain Atlasing Task Force of the International Neuroinformatics Coordinating Facility and is a lead in its Program on Ontologies of Neural Structures (34). He is also a Chair of the World Wide Web Consortium Web Ontology Language Working Group (35).

• Responsibilities. (30% effort) Ruttenberg will serve as Director of the Clinical and Translational Data Exchange (Aim 2) and lead the Surgical Pathology Ontology Project (Aim 1, Pilot 3). He will also collaborate with BTC investigators on ontology development and in ontology education under Aim 4.

Carmelo Gaudioso, MD, MBA, PhD in Medical Informatics, is currently Assistant Professor of Oncology, Director of Medical Informatics, and Director of the Clinical Data Network at Roswell Park Cancer Institute (RPCI), and has over 25 years of experience in the health care field in both clinical and management positions. In these roles he works in close collaboration with the VP of Information Technology, the Director of Clinical Informatics, and the Director of Bioinformatics to provide seamless leadership and infrastructure for the development of the informatics program and data management required to support translational research and clinical informatics efforts at RPCI. He is currently working with Dr. Steve Edge, Chair of the Breast Cancer Program at Roswell Park Cancer Institute, and Dr. Judy Smith, Medical Director at Roswell Park Cancer Institute, to design and develop care plans that meet meaningful use requirements. He chairs the Informatics Governance Committee at Roswell Park Cancer Institute and serves on the Clinical Informatics, IT Governance, and Continuous Medical Education Committees. His research is focused on multidisciplinary knowledge management and decision support systems, including ontology-driven systems, patient-centered healthcare, and the use of patient-physician portals to support patient-physician partnership across the continuum of care.

• Responsibilities. (20% effort) He will work with Alan Ruttenberg on the Surgical Pathology Ontology Project (Aim 1, Pilot 3) and with Dr. Ceusters on the Electronic Health Record Semantic Interoperability Project (Aim 1, Pilot 1).

Staff: clinical scientists and informaticians involved in the realization of the three pilots under Aim 1.

Gunther Kohn, CIO of UB’s School of Dental Medicine, will manage extensions to the Picasso system and applications of the system to test our EHR interoperability strategy in Aim 1, Pilot 1 (20?% effort).

Kinga Szigeti, MD, Assistant Professor, Department of Neurology, will devote 40?% of effort to management and research on Alzheimer's disease (Aim 1, Pilot 2).

Jose Luis Tapia, DDS, MS, is Assistant Professor, Oral Diagnostic Sciences. He will create and apply the ontology content for the Breast Cancer Pilot (Aim 1, Pilot 3) (40?% effort).

Peter Winkelstein, MD, CIO, UBMD; Director IHI.

For all pilot work and all ad hoc ontology service contributions, the corresponding FTE will be listed on associated grant budgets.

All staff members will receive training in applying ontology methods to enhance database interoperability. All will work in teams involving both the ontologists and the clinicians involved in the three pilots.

4A.4.1 Sustainability

A key feature with respect to the sustainability of this Key Function is the listing of staff and faculty for effort on grant budgets. The collaborating ontology researcher would be expected to be funded as part of any grant application requiring substantial support. We plan to invest a proportion of the money leveraged from successfully funded CTSA-based grants, returned in the form of both direct costs and income fund reimbursement dollars (the NYS system for reimbursing departments for successfully funded grants), in hiring additional faculty, staff and/or research assistants.

4A.5 Governance

Day-to-day operations of this Key Function will be managed by Drs. Ceusters and Smith, who will be assisted by an Operations Committee with Diehl, Gaudioso, Ruttenberg, Smith, Soergel, and Michael Glick (Dean, School of Dental Medicine) as members. The Ontology Key Function will work closely with the Biomedical Informatics Key Function, and Smith, Ceusters, Ruttenberg, and Gaudioso will serve on the Biomedical Informatics Executive Committee (see Figure 4-1 above) to address the planning and needs assessment required for the BTC to achieve an interoperable informatics environment. Ceusters, Glick, and Gaudioso will serve on the Biomedical Informatics Governance Committee, where they will serve as liaison between the community of holders of patient data in the BTC and the community of users of this data for information-driven research. Both the Operations Committee and the Biomedical Data Governance Committee will work closely with the Ethics Key Function.

4A.6 Action plan for each Aim

4A.6.1 Aim 1. To develop ontology services for clinical scientists (1) within the framework of three interrelated pilot experiments designed to test our methods, (2) on an ad hoc basis through consulting with specific clinical communities, both inside and outside the BTC.

Under this Aim we will develop clinical and translational ontology resources and test their clinical utility in three pilot experiments in data integration and analysis relating to Electronic Health Record (EHR) interoperability, neurological disease, and breast cancer.

In each of the 3 pilots ontology modules will be developed as needed, in accordance with the methodology developed and applied in Buffalo in multiple NIH-funded ontology initiatives (7), (8), (25). The modules will be tested and incrementally enhanced in light of their ability to realize the goals of each pilot and also on the basis of feedback received from our external collaborators in the wider GO and OBO Foundry communities, in the National Center for Biomedical Ontology, and in the Neurocommons and other Semantic Web initiatives.

The three pilots themselves are interrelated, so that ontological resources will be shared between them. In particular, the results of our work on the Neurology and Breast Cancer pilots (Pilots 2 and 3) will be exploited in the Electronic Health Record interoperability pilot (Pilot 1).

Each pilot will involve ontologists, clinicians and medical informaticians in the BTC, three of them (Diehl, Ruttenberg, Szigeti) hired to UB specifically for this purpose. Each pilot is designed to bring measurable benefits to both (1) patient care, monitoring and regulatory compliance, and (2) clinical and translational science, by increasing possibilities for retrieval, combination, analysis, reuse and dissemination of biomedical research data.

We will be working with multiple departments and research groups within the BTC and also drawing on ontology content that has been created and tested in collaboration with external healthcare institutions. For this level of interoperation to be possible it is important that all ontologies are built to the same logical and computational standards and that all are able to foster communicability of data by ensuring preservation of meaning when data are transferred from one system (or hospital, or research lab) to another. To this end the ontologies we develop will be constructed to ensure maximal comparability of the data annotated in their terms through the employment of best practices in ontology construction deriving from the OBO Foundry and from our prior work, including the use of logical definitions of relevant terms and relations and of advanced public-domain Semantic Web software (36-39) and World Wide Web Consortium standards (40-42). The fact that all of our work is based in this way on widely used open source ontologies will mean that the resources developed on the basis of the 3 pilots will generalize to resources useful also outside Buffalo.

At the center of all three pilots is the Ontology for General Medical Science, which has been developed by UB ontologists in collaboration with Richard Scheuermann and his team in the University of Texas Southwestern North and Central Texas Clinical and Translational Science Initiative (43, 44) as an upper-level ontology comprising terms such as ‘disease’, ‘disease course’, ‘diagnosis’, ‘complication’, ‘pathological process’, together with associated logical definitions. It will be extended by a Vital Signs Ontology, to be developed in collaboration with Blue Highway, Inc., which will serve interoperation of EHR data with standardized data deriving from medical devices. (See Blue Highway letter of support.)

4A.6.1.1 Pilot 1: Electronic Health Record Semantic Interoperability Project (Director: W. Ceusters)

Pilot 1, under the direction of Werner Ceusters, is a project to test ontology-based strategies for interoperation of EHR systems that we believe will have implications globally in the coming years. The goal is to demonstrate experimentally the feasibility of using public domain ontologies to create semantic interoperability between diverse EHR systems. We will focus initially on small portions of Allscripts, Eclipsys and Picasso, concentrating on general demographic data and on data of the types used by our clinician partners in Pilots 2 and 3. Allscripts is used by UBMD, a consortium of more than 450 physicians who are also professors at UB School of Medicine and Biomedical Sciences. Eclipsys (Allscripts) is used by the Roswell Park Cancer Institute; Picasso is used by the UB School of Dental Medicine. The goal is to devise and implement a strategy that will enable exchange of information between three separate systems about those patients who move between the corresponding institutions, thereby supporting continuity of care, and potentially expanding patient data resources available to clinical scientists in the BTC.

Given the complexity of all three systems, and the broad scope and proprietary nature of the Allscripts and Eclipsys systems, we can achieve this goal only incrementally, initially working on highly focused portions of the systems in question guided by the needs of our clinical collaborators. The consequences of even partial success, however, would be of potentially great significance, since the problem of creating interoperability between EHR systems is one that will become of increasing importance to the nation’s health.

The feasibility of our approach is greatly enhanced by the fact that we will have full access to Picasso, the Electronic Oral Health Record developed by the UB School of Dental Medicine (SDM), which is the only North American dental school with this level of control and funded commitment to developing a full-blown EHR that is semantically interoperable with multiple other data resources. Picasso began operation in 1996 and comprises records of more than 130,000 patients, 700,000 patient encounters, 1,150,000 procedures, and 200,000 DICOM images. The entire system, including both hardware and software and an award-winning design to put computers at the point of care, is locally designed and managed. Picasso has been developed in such a way as to allow progressive incorporation of additional health record and biosample data, including data deriving from integration of point-of-care medical screening in dental settings and from longitudinal health and disease studies using oral samples. Picasso will also support the work of the Periodontal Disease Clinical Research Center (see 7.3.7c below), which is focused on epidemiological studies and clinical trials exploring relationships of periodontal disease to cardiovascular and other systemic diseases, such as diabetes, lung diseases and osteoporosis. Its value to our work, however, turns most importantly on the fact that, of the patients recorded using this system, some 85% will visit other Buffalo hospitals at some time in their lives, some 10-15% will visit UBMD clinicians, and some 15-20% will visit Roswell Park Cancer Institute. Furthermore, UBMD is the preferred provider for non-cancer related healthcare for patients receiving their care at Roswell Park. Picasso thereby provides a unique laboratory for our experiments in better sharing of clinical and translational data, which will enable us to avoid the cumbersome retrofitting of the underlying data structures that is required by most commercial EHRs.

Approach. We will begin by analyzing the common portions of the 3 systems using the Ontology for General Medical Science, including identifying which standard terminologies are currently embedded in the selected portions of the 3 systems, paying attention in particular to what parts of the semantics are explicitly conveyed by means of the architecture, and what parts are hidden in the user interface or in the pragmatic ways users work with these applications. We will then extend the Ontology for General Medical Science to create ontologies for specific neurological diseases and for breast cancer as described below, drawing on existing OBO ontologies such as the Foundation Model of Anatomy (FMA) (45) and the Human Phenotype Ontology (HPO) (46) as necessary. The latter will also provide the needed links to the basic biology annotation datasets (10). We will then apply the Referent Tracking (RT) methodology developed at UB (16, 20) to create an experimental data resource of codes from the 3 systems used to represent phenomena on the side of the patient identified by terms in the Ontology for General Medical Science and the disease-specific ontologies that extend it. The possibility of creating such a resource turns on the fact that many patients move between hospitals or clinics using the different systems within the BTC. This unique data resource will then be used as a semantic mediator between the 3 systems in which algorithms can be applied to understand the relationships between data entries pertaining to the same phenomena within the 3 systems. This in turn will allow structured cross-system summaries of these data to be prepared in a highly flexible manner, for example in satisfaction of multiple regulatory requirements. It will allow conflicts between the data of potential significance for patient care to be recognized automatically, and it can be used also to identify, across all institutions using the 3 systems, patients who might serve as candidate subjects for clinical trials.
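The mediator idea can be illustrated with a minimal sketch. It is not the Referent Tracking implementation itself: the record layout, the local codes, the referent identifiers (`IUI-…`), and the term labels are all hypothetical and chosen only to show how entries from different systems that refer to the same phenomenon on the side of the patient might be grouped and checked for conflicts.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    system: str         # source EHR system
    local_code: str     # code as stored in that system (made up for this sketch)
    referent_id: str    # hypothetical identifier of the entity on the side of the patient
    ontology_term: str  # ontology term to which the local code is mapped (hypothetical label)
    value: str          # what the source system records about the referent


def group_by_referent(entries):
    """Collect entries from all systems that are about the same referent and term."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[(entry.referent_id, entry.ontology_term)].append(entry)
    return dict(grouped)


def find_conflicts(entries):
    """Return referents for which the systems record differing values."""
    conflicts = []
    for key, group in group_by_referent(entries).items():
        values = {entry.value for entry in group}
        if len(values) > 1:
            conflicts.append((key, sorted(values)))
    return conflicts


# Illustrative data only; none of these codes or values come from a real system.
entries = [
    Entry("Allscripts", "A-101", "IUI-1", "diagnosis", "type 2 diabetes"),
    Entry("Picasso", "P-77", "IUI-1", "diagnosis", "type 2 diabetes"),
    Entry("Eclipsys", "E-9", "IUI-2", "allergy status", "penicillin allergy"),
    Entry("Picasso", "P-12", "IUI-2", "allergy status", "no known allergies"),
]

for (referent, term), values in find_conflicts(entries):
    print(f"Conflict for {referent} ({term}): {values}")
```

The same grouping step could feed a cross-system summary, since each group already lists which systems hold data about a given referent.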

Implementation. In order to be minimally disruptive of the processes and applications currently supported by these data sets but with the goal of achieving a maximal level of semantic interoperability with internal and external datasets, we will use the data warehouse system developed by the Bioinformatics Key Function, building specifically on the following three components:

(1) a data modeling and integration component for flexible and effective management of data,

(2) a high-performance computing component for fast analysis of data,

(3) a security component for protection of privacy.

Adding the ontology and referent tracking resources outlined above will bring a number of benefits. First, it will allow us to transform existing EHRs in the ways described in (18, 47, 48) so that they can serve as peer-to-peer applications. Second, because all clinical ontologies we use are part of the OBO Foundry and thus semantically interoperable with basic-biology ontologies such as the GO, the clinical datasets described in their terms will become automatically aligned with the huge quantities of genomic and proteomic data annotated using Foundry terms. Third, it will allow us to employ the resources made available through the i2b2 framework used also by other CTSA institutions (49), and thereby take advantage of the peripheral software components designed in the i2b2 hive. Fourth, the i2b2 software suite is designed to work with ontologies, but its ontology resources are still restricted to conventional terminology-based resources; we will contribute to i2b2 the realism-based resources created on the basis of OBO Foundry principles. We believe that our work can bring significant benefits in this respect to the wider CTSA consortium.

Collaboration. Our adoption of the OBO Foundry framework means that we can benefit from a number of complementary initiatives, including work by the lab of our collaborator Richard Scheuermann at the University of Texas Southwestern Medical Center CTSA to create a framework for clinical research databases also based on the Ontology for General Medical Science (50). (See Scheuermann letter of support.) We can benefit also from the work of our collaborator Hogan of the University of Arkansas CTSA, which is demonstrating benefits of the Referent Tracking methodology to medical information systems. By using RT-identifiers to expand a set of EHR data extending over 5000 hospitalizations, and including patient demographic data and diagnosis data coded with ICD-9-CM, our Arkansas colleagues were able to show that querying, for example for patients with a particular type of disease or disorder, is greatly simplified as compared to what is achievable with traditional resources. (See Hogan letter of support.)
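One way to picture why such querying becomes simpler is the following sketch. It is an illustration under stated assumptions, not a description of the Arkansas system: the tiny is_a hierarchy, the patient and disorder identifiers, and the function names are all hypothetical. The point is that once each disorder instance has its own identifier and is typed by an ontology term, a query for "patients with a disease of type X" needs only a walk up the is_a hierarchy, with no hand-maintained list of codes.

```python
# Hypothetical is_a hierarchy: child term -> parent term.
IS_A = {
    "type 1 diabetes mellitus": "diabetes mellitus",
    "type 2 diabetes mellitus": "diabetes mellitus",
    "diabetes mellitus": "metabolic disease",
    "ischemic stroke": "stroke",
}

# Hypothetical assertions: (patient identifier, disorder instance identifier, disorder type).
RECORDS = [
    ("patient-1", "disorder-1", "type 2 diabetes mellitus"),
    ("patient-2", "disorder-2", "type 1 diabetes mellitus"),
    ("patient-3", "disorder-3", "ischemic stroke"),
]


def is_subtype_of(term, ancestor):
    """Walk up the is_a hierarchy from term; True if ancestor is reached."""
    while term is not None:
        if term == ancestor:
            return True
        term = IS_A.get(term)
    return False


def patients_with_disease_of_type(disease_type):
    """Patients having at least one disorder instance of the given type or a subtype."""
    return sorted({
        patient
        for patient, _disorder, term in RECORDS
        if is_subtype_of(term, disease_type)
    })


print(patients_with_disease_of_type("diabetes mellitus"))
```

Asking for the broader type "metabolic disease" would return the same two patients without any change to the stored data.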

Participants: Gunther Kohn (CIO, UB School of Dental Medicine, 20% effort); Jennifer Cox (Database Manager, 25% effort). In addition, contributions to the realization of the goals of this pilot project in the form of data collection, management and use will be made by the following participants as part of the normal course of their work in the clinical context: Carl Morrison, DVM, PhD (Director, Pathology Resource Network, Roswell Park Cancer Institute), Carmelo Gaudioso, MD, PhD (Director of Medical Informatics and Director of the Clinical Data Network at Roswell Park Cancer Institute), and Peter Winkelstein, MD (CIO, UBMD). Additional contributions will be made by Ceusters (Director of this Pilot), Diehl, Ruttenberg, and Smith.

4A.6.1.2 Pilot 2: The Neurological Disease Ontology and Data Interoperability Project (Director: A. Diehl)

Background.

Significance. The results of our work will be of value to multiple ontology-based projects in the areas of neurology and neuroscience, including the NIH’s Neuroscience Information Framework (NIF), with which we collaborate closely (see Martone letter of support). At the center of our work is the Neurological Disease Ontology (ND), a CBBI-developed ontology for the comprehensive representation of neurological diseases and their patients. We are currently developing ND Ontology modules for Alzheimer’s Disease, Stroke, and Multiple Sclerosis.

Creation of New State-of-the-Art Patient Data Resources in Alzheimer and Stroke. We will create innovative patient databases in Alzheimer and Stroke designed to support both everyday clinical care and multiple information-driven strategies for clinical and translational research. Each database will involve:

• maximally liberal default conditions on patient consent forms concerning use of data

• systematic effort to keep track of all the data relating to the patient and to patient care

• creation of multiple innovative ontology resources for in-depth annotation of clinical phenotypes

• conformity to NIH guidelines, for example those concerning Common Data Elements

• integration with UB’s Data Warehouse using i2b2 software

• systematic effort to preserve samples and digitized images for at least a representative subset of patients

• experimental effort to create de-identified patient data for publication on the Web.

The Alzheimer Patient Database will be used to identify genetic risk factors for Alzheimer disease using copy number variation as a genetic marker map. We will establish a longitudinal cohort of subjects affected by Alzheimer disease, depositing all clinical, neuropsychological, laboratory biomarker, imaging and genetic data in the patient database. Two specially created ontologies, the Alzheimer Ontology, under development as a module of the ND Ontology, and the Neuropsych Testing Ontology, will then be used to support mining of these data to identify endophenotypes for subsequent genetic association analyses. We will use categorical endophenotypes and quantitative variables (age at onset, neuropsychological measures, biomarker data) as outcomes and the copy number state as the predictor in the analytic process. Replicated findings will be followed up by subsequent clinical studies to assess their predictive value for the associated clinical feature (disease status, age at onset, cognitive deficit).

The Stroke Patient Database will be used to identify outcomes based on variances in stroke presentation and variable stroke treatment. Outcomes will be Rankin Scale and Barthel Index scores, death, or subsequent stroke/TIA. Variables in stroke presentation will be CT Stroke Study data (CT, CT Perfusion, CT angiogram of head and neck), MRI data that identify territory and possible etiology of stroke, as well as laboratory data such as echocardiogram (2D or 3D), carotid ultrasound, lipid profile, Hemoglobin A1C, sedimentation rate and other laboratory markers for stroke in the young. The cohort will be longitudinal and complete, currently 1750 strokes/year. TIAs will also be included in the cohort. One use of the Stroke Ontology module of the ND ontology will be to match clinical data from stroke patients to phenotypes seen in knockout mouse models of stroke and arteriosclerosis. Through such phenotypic matching, patterns of human stroke phenotypes may be tied to dysregulation of genes experimentally linked to stroke and related diseases in animal models. The biological processes and pathways in which such genes are involved can then be identified via Gene Ontology annotations and other resources. Knowing such pathways may then point to therapeutic approaches in the longer term. (51-53)

The Multiple Sclerosis Ontology Module of the ND ontology will be used for annotation of patient data in the New York State Multiple Sclerosis Consortium (NYSMSC) Registry, beginning with records that are part of the local Baird MS Registry maintained at the University at Buffalo. The NYSMSC includes thirteen active academic multiple sclerosis centers and neurology practices across New York State and represents the largest clinical-based cohort of MS patients in the United States, with over 9,000 registered MS patients and over 16,500 follow-up visits over a period of 12 years, providing a unique systematically collected MS epidemiological database. Annotation of these patient data will enable us to perform registry-based queries about diagnoses, treatments, and outcomes in MS, and thus to leverage this database to inform clinical research in MS.

The Alzheimer, Stroke, and Multiple Sclerosis ontology modules will be used for the annotation of both experimental and clinical data in a way that will also contribute to the development of the ontologies themselves, following the method described in (3). They will also be used to annotate relevant translational research data in a way that will enable better inference across these related ontologies, enabling for instance the linking of genes annotated to particular GO biological processes to neurological diseases where clinical data suggest those processes may be dysregulated. The ontologies will be built in such a way as to be semantically interoperable with the OBO Foundry candidate ontologies, and also with the Ontology of Mental Health (54) and relevant Neuroscience Information Framework (NIF) Standard ontologies (55). Foundry compliance will allow us to incorporate results of translational research annotated, for example, through the immunological biology terms created in the GO by Diehl and his colleagues (56). The ontologies and data in both resources and all associated annotations will form part of the BTC Data Center with associated security protections (see Informatics Key Function). Subsets of the annotated data will be de-identified for inclusion in the BTC Data Exchange and publication on the web (see Aim 2 below).
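The kind of cross-ontology inference mentioned above, linking genes to diseases through shared biological-process annotations, can be illustrated with a minimal sketch. All identifiers below (GENE_A, PROC:0001, DISEASE_X and so on) are placeholders invented for illustration; they are not real GO terms, genes, or findings, and the function name is hypothetical rather than part of any tool described in this proposal.

```python
from collections import defaultdict

# Hypothetical annotations: gene -> biological-process terms.
# Placeholder IDs only; these are NOT real GO identifiers.
gene_to_processes = {
    "GENE_A": {"PROC:0001", "PROC:0002"},
    "GENE_B": {"PROC:0002"},
    "GENE_C": {"PROC:0003"},
}

# Hypothetical clinical annotations: disease -> processes that the
# annotated clinical data suggest may be dysregulated.
disease_to_processes = {
    "DISEASE_X": {"PROC:0002"},
    "DISEASE_Y": {"PROC:0003", "PROC:0004"},
}


def link_genes_to_diseases(gene_ann, disease_ann):
    """Return, per disease, the genes sharing at least one process term."""
    # Invert the gene annotations so that each process points to its genes.
    process_to_genes = defaultdict(set)
    for gene, processes in gene_ann.items():
        for process in processes:
            process_to_genes[process].add(gene)

    links = {}
    for disease, processes in disease_ann.items():
        genes = set()
        for process in processes:
            genes |= process_to_genes.get(process, set())
        links[disease] = sorted(genes)
    return links


links = link_genes_to_diseases(gene_to_processes, disease_to_processes)
print(links)
```

The join works only because both sides of the annotation use the same term identifiers, which is the practical benefit of building the modules to be interoperable with shared ontologies.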

Background. This pilot project will be realized under the leadership of Alexander Diehl by the UB Center for Brain and Behavior Informatics (CBBI). CBBI was established in 2009 as part of the strategic strength ‘Health and Wellness Across the Lifespan’ of the university-wide UB 2020 initiative. Its goal is to promote cutting-edge research at the interface of biomedical informatics and information-intensive neuroscience through the recruitment of new junior faculty who will work together in three overlapping areas: biomedical ontology research; neurology/neuroscience; and cognitive, social and behavioral functioning. Three new hires (Diehl, Ruttenberg, and Szigeti) were recruited in 2010 within the framework of this initiative. CBBI scientists engage in clinical research collaborations with faculty in multiple BTC institutions. They will contribute to training in biomedical informatics and work to promote broader adoption of biomedical informatics tools and resources throughout the BTC medical, behavioral, and health science community, with brain disorders and associated behaviors as primary research laboratories.

Participants: Jennifer Cox, Database Manager (25% effort), Kinga Szigeti, MD (Neurology; Director, Alzheimers Center) (15% effort). Contributions to the realization of the goals of this pilot project in the form of data collection and management will be made by the following participants as part of the normal course of their work in the clinical context: Robert N. Sawyer, MD (Chair, Neurology; Director, Stroke Center). In addition, contributions will be made by Ceusters, Diehl (Director of this pilot), Ruttenberg, and Smith.

4A.6.1.3 Pilot 3. The UB-Roswell Park Breast Cancer Pathology Data Interoperability Project

Background. Surgical Pathology: Traditionally, the findings of the surgical pathologist resulting from analysis of tissue specimens are presented in a surgical pathology report. This report becomes part of the patient record and is the initial source that clinicians use in the process of devising a treatment plan and establishing a prognosis. The report specifies location of the lesion, surgical procedure, macroscopic description of the specimen, microscopic diagnosis and supporting findings, prognostic features, tumor margins, and results of special studies, for example in immunohistochemistry. Thousands of surgical pathology reports are produced every day, and they represent an enormous resource of data for clinicians and researchers, one whose value is increasing with the appearance of new diagnostic and prognostic methods. Unfortunately, much of the information contained in such reports is encoded as free text, and this prevents integration with other kinds of biological and medical information. To address these problems, pathologists are increasingly required to use synoptic templates based on checklist formats when reporting on malignancies, with the most accepted formats being maintained by the College of American Pathologists (CAP). However, while the CAP has standardized the collection of data, its checklists are often insufficiently flexible to be adapted to meet institution-specific needs. Many institutions therefore create their own synoptic templates, resulting once again in a problem of data interoperability.

Flow Cytometry: TO BE FILLED IN

The UB-Roswell Park Breast Cancer Data Interoperability Project is directed by Alan Ruttenberg. It will begin with an investigation of how the information in surgical pathology reports can be rendered in a computable form in order to permit its integration with other types of information. We will use the Surgical Pathology Ontology (SPO) created by Jose Tapia to address the need for a controlled vocabulary of terms and relations used in such reports, following the principles of the OBO Foundry. The ontology will grow incrementally to cover specimens, pathologic staging, patient treatment and the macroscopic description of the cell morphology, cellular pattern and stroma associated with given types of breast cancer tumors. SPO is an application ontology which imports existing terms and definitions from other OBO ontologies as needed, specifically from the GO, the FMA, the Mammalian Phenotype (MP) and Phenotypic Quality (PATO) ontologies, and also from the National Cancer Institute Thesaurus and SNOMED CT.

Goal. The goal of this pilot project is to use the SPO ontology to align three databases related to breast cancer with the aim of making them semantically interoperable through the SPO annotations and thereby available for combined query and algorithmic reasoning:

(1) a database of the synoptic reports currently being created by Roswell pathologists for each breast cancer malignancy;

(2) the New York State Tumor Registry;

(3) Roswell’s (Allscripts) Electronic Medical Record.

Approach. We will begin by identifying a subset of records from each of the three sources, together with a sample set of queries, some of which we know can be answered from these records and some of which cannot. We will document both the effort needed to address these queries using the existing databases and software, and also the results achieved. We will then re-represent the contents of the three databases through annotation with SPO, using a combination of manual and automatic reasoning approaches, further developing the ontology as needed. In the next stage, we will address identical queries against the results, using Semantic Web technologies and effectively pulling data from the three resources without disturbing workflow on the databases themselves. Again, we will measure the amount of effort needed and the results gained, documenting the work performed in such a way that the strategy can be easily applied in successive iterations with changes in form-filling templates.

The advantage of this approach, which has been used by Ruttenberg and his Neurocommons collaborators in multiple projects (4), is that it avoids the complexities associated with traditional database federation and avoids rebuilding of existing databases. If, as we predict, the approach brings measurable reductions in the effort needed along with enhanced query results, then we will extend the method to other forms of malignancies.
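The idea of posing one query against annotated records from several sources can be sketched as follows. This is a plain-Python stand-in for the Semantic Web technologies named above, not the project's implementation: the term hierarchy is a toy fragment written for illustration (it is not taken from SPO), and the source names, record identifiers, and function names are all hypothetical.

```python
# Toy is_a hierarchy (child -> parent). Illustrative only; not taken from SPO.
IS_A = {
    "invasive ductal carcinoma": "breast carcinoma",
    "invasive lobular carcinoma": "breast carcinoma",
    "breast carcinoma": "breast neoplasm",
}


def ancestors_and_self(term):
    """Return the term followed by all of its ancestors in the toy hierarchy."""
    chain = [term]
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


# Hypothetical annotated records exported from three sources. The source
# databases themselves are left untouched; only annotations are queried.
RECORDS = [
    {"source": "synoptic_reports", "record_id": "S1", "term": "invasive ductal carcinoma"},
    {"source": "tumor_registry", "record_id": "T7", "term": "invasive lobular carcinoma"},
    {"source": "emr", "record_id": "E3", "term": "breast neoplasm"},
    {"source": "emr", "record_id": "E4", "term": "fibroadenoma"},
]


def query(term):
    """Find records annotated with the term or with any of its subtypes."""
    return [
        (record["source"], record["record_id"])
        for record in RECORDS
        if term in ancestors_and_self(record["term"])
    ]


hits = query("breast carcinoma")
print(hits)
```

A query for the general term retrieves records from different sources that were annotated with more specific terms, which is the kind of result that is difficult to obtain when each source uses its own local codes.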

Participants: Jennifer Cox, Database Manager (25% effort), Jose L. Tapia, DDS, MS (Oral Diagnostic Sciences) (40% effort). Contributions to the realization of the goals of this pilot project in the form of data collection, management, analysis and exploitation will be made by the following participants as part of the normal course of their work in the clinical context: David Gold, PhD (Biostatistics, Roswell Park), Louis Goldberg, DDS, PhD (Oral Diagnostic Sciences, SDM), Carl Morrison, DVM, PhD (Director, Pathology Resource Network, Roswell Park), Carmelo Gaudioso (Director of Medical Informatics and Director of the Clinical Data Network, Roswell Park); in addition contributions will be made by Ceusters, Ruttenberg (Director of this pilot), and Smith.

4A.6.2 Aim 2. To create an ontology-based Clinical and Translational Data Exchange which will leverage the work performed under Aim 1 to establish an evolving repository of integrated, de-identified Web-based patient data and biological knowledge available for use by members of the National CTSA Consortium and other external institutions and individuals. 

We will create a repository of ontology-enhanced open-source patient data available for use by members of the national CTSA Consortium and other external institutions. The central idea is to make available pseudonymized patient-specific data present in EHRs and to link this to information stored (but sometimes hidden) in on-line accessible text materials and knowledge bases. The goal is to identify and as far as possible to resolve the multiple challenges to publication of de-identified patient data in order to allow such information to be used by researchers independently of where they are located and for the better management of patients independently of where they are treated.

Use of web-based data to advance distributed research. Here we follow the scenario created by Ruttenberg in the context of the Neurocommons, which seeks to make all scientific research materials – research articles, knowledge bases, research data, physical materials – as available and as usable as they can be, by fostering practices that render information in a form that promotes uniform access by computational agents – sometimes called ‘interoperability’. We want knowledge sources to combine easily and meaningfully, enabling semantically precise queries that span multiple information sources. The benefits of such an approach include scalability – with a future promising massive amounts of patient data deriving from clinical images, robot-arrayed gene chips, machines that can sort materials cell-by-cell, gene sequencers and massively high throughput chemical screens, each potentially able to inform a decision or experimental design potentially leading to advances in the understanding of human health. They include the capacity to address the problems created by distributed knowledge and expertise: the nature of modern life science is specialization. One scientist is an expert on the genetics of Huntington’s Disease, another on the impact of protein folding on Alzheimer’s Disease. Both work on the brain, on many of the same genes and proteins. But they attend different conferences and rarely study the refereed literature outside their own disease. Possible synchronicities between such researchers are at a minimum because their knowledge cannot interoperate. For this set of problems, the Semantic Web, the Web Ontology Language (OWL) and the ontologies of the OBO Foundry are a natural fit. The Semantic Web is intended to scale through decentralization and an emphasis on information reuse. It is a means to capture and network the relationships implicit in high volume data sets, or the outputs of sophisticated analytic software. 
The Buffalo Clinical and Translational Data Exchange will address experimentally the problems associated with applying this technology to patient data.

Use of web-based data to advance patient care. One scenario is as follows: a physician entering data about a specific patient in one health facility receives immediate feedback on this particular case based upon information present elsewhere in the network. In another scenario, the mere fact of the data being entered in that health facility would immediately trigger a process in another hospital or clinic suggesting possible further follow-up for a difficult case being treated there. Meanwhile, relevant data are also automatically conveyed to national or international disorder registries and to a researcher preparing a paper on similar cases of some rare disease. At the same time the relevant bioinformatics institutes are warned about a potential new precursor gene, and similar scenarios can be triggered, for example, if an on-line accessible journal or cancer database is updated. Yet another scenario is that a physician has a problematic patient with a particular rare phenotype. Last week, he looked and there was nothing relevant in MEDLINE. But thanks to the combination of tools resulting from our research, a message is automatically sent to him that yesterday new results came in on just that kind of case.

Technical details of data exchange. All data, source code and software developed will be made available freely to all users on the BTC Portal (see under Informatics Key Function). What we refer to as the Buffalo Clinical and Translational Data Exchange will consist of evolving repositories of de-identified open-source data available for unconstrained use by any interested party. The tools will be designed, implemented, and wrapped as a stand-alone package that can be downloaded and efficiently used at client facilities. Users of the web portal will have access to a virtual help desk where questions about software and algorithm interpretation can be answered. We shall share analytical strategies, methods, data sets and tools.

Legal and Ethical Issues. We will work to understand and resolve ethical, legal, technical, institutional, and research barriers to data sharing, and develop pilot programs, in collaboration with the Clinical Research Ethics and Informatics Key Functions, that promote intra-institution, cross-CTSA, and global data sharing with appropriate protection of privacy. We will explore a number of other issues relating to patient data, including data ownership, informed consent for different uses of data, how new assay types affect the identifiability of patients, what the limits of de-identification are, and strategies for protecting patient data that go beyond de-identification.

Data collection initiatives. We will work with selected collaborators to de-identify and enhance data repositories made available within the BTC Data Center, beginning with the Alzheimers and Stroke data described under Aim 1, and with neurological image data provided by the Buffalo Neuroimage Analysis Center (BNAC), which will create an experimental neurological image database for these purposes (see Zivadinov letter of support). One hurdle to the successful release of clinical data on the web turns on the fact that such data are standardly described using local metadata, which leaves the data themselves unintelligible to outside users. To address this problem we will work with BNAC personnel, including Guy Poloni, Director of BNAC’s Sequence Development Unit, to create a controlled vocabulary for describing neurological images within the framework of the OBO Ontology for Biomedical Investigations (OBI).

Participants. Szigeti (5% effort), Guy Poloni (20% effort), Jennifer Cox (25% effort), Ruttenberg (Director). In addition, contributions from Sawyer (Chair of Neurology) and Zivadinov (BNAC Director) and their colleagues will be made in the normal course of their clinical duties.

4A.6.3 Aim 3. To create, test, and implement an innovative Resource Discovery, Tracking, and Evaluation System building on the ontology-based resource discovery technology of eagle-i, VIVO and CTSAconnect, and contribute this system to the CTSA consortium.

The work involved in realizing this Aim is shared with two other Key Functions: 4. Biomedical Informatics and 11. Tracking and Evaluation. This work is described in detail here because of the foundational role of ontologies in the system. The work involves collaboration also with a number of external researchers, including scientists associated with the eagle-i and VIVO initiatives. The realization of this Aim in the BTC will involve the development and application of state-of-the-art procedures for measurement, data capture and management, and data analysis to produce meaningful data that are directly applicable to improving the efficiency and overall function of the BTC in achieving its mission and goals. This work will also advance the strategic goals of the National CTSA Consortium by creating a framework for evaluation and tracking of multiple CTSA activities, including community engagement, informatics and training, which will be made available for use by all CTSA institutions.

Background. The NCRR-funded eagle-i and VIVO consortia, each made up of 9 member institutions, are conducting an experiment to build a prototype national biomedical research resource discovery network. Currently investigators in clinical and translational science often expend effort and funding re-creating resources that have already been created by others. The primary goal of the eagle-i and VIVO consortia is to help biomedical scientists search for and find resources such as animal models, reagents, cell lines, core facilities, or training opportunities, resources normally invisible to those outside the laboratories or institutions where they were developed. Both eagle-i and VIVO work closely with the National CTSA Consortium. CTSA institutions that are partners in the eagle-i and VIVO initiatives include Harvard, Oregon, University of Florida, Weill Cornell Medical College, Indiana University, Scripps Research Institute and Washington University. Where eagle-i focuses primarily on inventorying things (such as equipment, samples, and data), VIVO is focused primarily on people, their expertise, and the groups to which they belong. However, both consortia use consensus-based ontologies for annotation of resources, ontologies derived wherever possible from the OBO Foundry, and they have agreed to pool their resources for ontology development and maintenance. In this they work with UB ontologists Smith, Diehl and Ruttenberg in a number of initiatives, including the Ontology for Biomedical Investigations (OBI), which serves as a model for much of the eagle-i ontology-based inventory software (see Haendel letter of support). Smith works with eagle-i and VIVO ontologists in developing the Expertise Ontology, which is designed to serve as the basis for the next extension of the work of the two consortia, which is to serve resource discovery, for example where a hospital is searching for an expert in some rare pathogen who can also speak French.

Approach. The goal is to move the resource discovery work of eagle-i and VIVO into the field of evaluation, thereby adding essential new ingredients. To this end we will build the Resource Discovery Tracking and Evaluation (RDTE) System, a novel ontology-enhanced open-source tool that supports not only resource discovery but also the tracking and evaluation of our clinical and translational research activities. The resource discovery function with its ontology-based search will enable investigators in clinical and translational science not only to find existing resources, but also to increase the use of and benefits derived from existing resources for a better return on investment. The tracking and evaluation function will make use of existing and specially collected data combined with a social network graph-based approach to derive metrics for assessing a CTSA's performance along several dimensions. The key to this approach is that data pertaining to such performance, for example concerning publications or use of equipment or training courses attended, will be collected at regular (at least annual) intervals in a form which allows the application of evaluation metrics to assess, for example, improvements in the level of specific types of expertise among project participants over time. We thereby take advantage of the fact that resource discovery overlaps considerably with tracking and evaluation in its data requirements, so that much of the needed data will already be collected through routine processing of grant applications, budget information, and annual faculty reporting, to give just some examples. The RDTE system will provide integrated access to all types of required data (vertical integration) from all Buffalo Translational Consortium partners (horizontal integration). It will be configured in such a way as to operate even independently of eagle-i and VIVO, should the latter cease to operate at the end of their current funding.
The system will be designed in such a way that, through the use of common ontologies taken over from eagle-i and VIVO, we can both combine data to provide integrated resource discovery, tracking and evaluation, for example encompassing multiple CTSAs, and also separate data to provide these same services, for example for specific key functions.

In Buffalo, the RDTE will provide access to many different sorts of information, summarized in Table 4A-1, which will be accessible through the BTC Portal.

|Table 4A-1. Examples of data accessible through the RDTE |

|Data on |Examples |

|Person |Contact information, biosketch, research areas and methods, projects / role |

|Research proposal/project |Funding, investigators, status, research areas and methods, publications |

|Core facility |Funding, types of research supported, users, publications |

|Instrumentation |Access conditions, types of research supported, instances of research supported |

|Sample |Origin, date of collection, freeze/thaw history |

|Data set |Collected from which research project, contained in which data warehouse |

|Volunteer registry for clinical studies |Healthy volunteers with prior participation in BTC studies are accessible for future study participation |

|Training and mentoring |Objectives, content, trainee participants, mentors, evaluation |

Both internal and external collaborations which occur as a result of the CTSA will be recorded, as will participation in relevant workshops. Data will be stored relating to the progress of trainees in CTSA training and mentoring programs, placement of trainees upon completion of the program, research done and equipment used in the BTC, grants awarded, discoveries resulting from correlating data in the BTC Data Center including both improvements in methods of patient care and improvements in actual patient care in participating hospitals as a result of the CTSA.

RDTE will interface with many data systems in the BTC and in consortium partners, including: faculty, staff and trainee activity reporting systems, bibliographic databases such as PubMed, Web 2.0 social networking databases, wikis and blogs established for trainees and for BTC participants, databases of sponsored projects and IRB data systems used by the CTRC Clinical Research Center and the Roswell Park Clinical Research Service eResearch Technology (eRT) system. VIVO instances are being established in UB and (for biomedical researchers) in the SUNY system as a whole, and these instances will feed data to the RDTE System. Initially, however, much of the data in the system will be imported manually, while the needed automated interfaces to these systems are implemented incrementally.

Table 4A-2 gives a few examples of the types of data that will be included. Because RDTE is based on eagle-i and VIVO, much of its content will be automatically annotated using appropriate OBO Foundry ontologies. This is crucial for retrieval, since the use of well-disseminated and well-structured ontologies will not only provide guidance in the formulation of queries but also provide for reasoning-based search. The support for reasoning provided by the ontologies will also be used in deriving evaluation measures.
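As an illustration of what reasoning-based search can mean in this setting, the following is a minimal sketch in which a query for a general class also retrieves resources annotated with its subclasses. The is-a hierarchy, the class names, and the resource names are all hypothetical; they are not drawn from any OBO Foundry ontology or from the eagle-i or VIVO data models.

```python
from collections import defaultdict

# Hypothetical is-a hierarchy (child -> parent); not taken from a real ontology.
IS_A = {
    "MRI scanner": "Imaging instrument",
    "Confocal microscope": "Imaging instrument",
    "Imaging instrument": "Instrumentation",
    "Flow cytometer": "Instrumentation",
}

# Hypothetical inventory: each resource is annotated with one ontology class.
ANNOTATIONS = {
    "Scanner A": "MRI scanner",
    "Microscope B": "Confocal microscope",
    "Cytometer C": "Flow cytometer",
}


def descendants(term, is_a):
    """Return the term together with all of its direct and indirect subclasses."""
    children = defaultdict(set)
    for child, parent in is_a.items():
        children[parent].add(child)
    found, stack = {term}, [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def search(query_term, annotations, is_a):
    """Retrieve resources annotated with the query class or any subclass of it."""
    wanted = descendants(query_term, is_a)
    return sorted(res for res, term in annotations.items() if term in wanted)


imaging = search("Imaging instrument", ANNOTATIONS, IS_A)
everything = search("Instrumentation", ANNOTATIONS, IS_A)
```

A plain keyword match on "Imaging instrument" would find neither of the two imaging resources here, since neither carries that label directly; the hierarchy is what connects the query to them.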

Evaluation. The goal is to create an innovative software-based approach to the evaluation of clinical science research. The idea is to use the resource inventory approach in a dynamic fashion to track how the work of a CTSA evolves through time. We will track not only what resources exist at any time, but also how resources are used and how they are combined to make other resources, for example when two teams working in neighboring fields combine their resources to perform new kinds of cross-disciplinary experiments, as when data on mouse models are combined with results of experimental work on human subjects. Both internal and external collaborations will be tracked, as will the interactions of individuals, using data that must in any case be collected to serve the needs of the Tracking and Evaluation Key Function, in part through the use of common social networking tools to support data entry. The RDTE System will allow all dimensions of the CTSA falling within the scope of the Evaluation Key Function to be tracked using eagle-i and VIVO software within a referent tracking server. Our approach would build, for example, on the Harvard Profiles software provided as part of the eagle-i infrastructure, which offers a number of advanced query, data visualization, and social network analysis tools.

4A.6.3.1 Mechanism of Tracking. Once established and populated with inventory data, RDTE can be used for evaluation in a variety of ways. The key is that referent tracking technology is implemented, so that all entities referred to via data entries in the RDTE are appropriately tracked. This will mean that multiple entries with different time-stamps referring to one and the same entity (whether it be a person, a piece of equipment, a committee, or a document) will be identifiable as such. In this way the software will in effect create a growing graph-theoretic representation of an immense body of interactions among the participants in the BTC, with links in the graph annotated through the purpose-built Expertise Ontology, which comprises terms representing those kinds of interactions (such as joint authorship of publications or grants, committee membership, student-teacher interactions and so forth) that are relevant to the measurement of expertise. Given the centrality of the training of clinical scientists to the CTSA initiative, it is to be anticipated that expertise will serve as one central dimension in establishing metrics for CTSA evaluation. In caricature, a CTSA institution is working well to the degree to which it is increasing the total amount of expertise among its participants from one year to the next.
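The tracking mechanism can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the identifier scheme, the class names `ReferentRegistry` and `InteractionGraph`, the source-system keys, and the sample entries are all hypothetical, and real coreference resolution across source systems is assumed away by giving every entry a reliable key.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ReferentRegistry:
    """Assigns one stable identifier per real-world entity (hypothetical scheme)."""
    _ids: dict = field(default_factory=dict)
    _counter: count = field(default_factory=lambda: count(1))

    def resolve(self, kind, key):
        # The same (kind, key) pair always yields the same identifier, so
        # entries made at different times point to one and the same referent.
        if (kind, key) not in self._ids:
            self._ids[(kind, key)] = f"{kind}-{next(self._counter)}"
        return self._ids[(kind, key)]


@dataclass
class InteractionGraph:
    """Directed graph whose edges carry typed, time-stamped interactions."""
    edges: dict = field(default_factory=dict)

    def add(self, source, target, interaction, timestamp):
        self.edges.setdefault((source, target), []).append((interaction, timestamp))


registry = ReferentRegistry()
graph = InteractionGraph()

# Hypothetical data entries: (timestamp, kind, key, kind, key, interaction type).
entries = [
    ("2011-03-01", "person", "jdoe", "person", "asmith", "jointAuthorship"),
    ("2012-06-15", "person", "jdoe", "person", "asmith", "committeeMembership"),
    ("2012-09-30", "person", "blee", "person", "asmith", "isMentoredBy"),
]
for ts, kind_a, key_a, kind_b, key_b, interaction in entries:
    a = registry.resolve(kind_a, key_a)
    b = registry.resolve(kind_b, key_b)
    graph.add(a, b, interaction, ts)
```

Because the two entries involving the same pair of persons resolve to the same identifiers, they accumulate on a single edge rather than creating duplicate nodes, which is the property the interaction graph depends on.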

4A.6.3.2 Tracking a Subset. To evaluate the progress of a given CTSA along the dimension of expertise against the Referent Tracking database of interactions among its participants, we must determine how much weight is to be given to different sorts of interactions in the interaction graph. To do this we will create a gold standard of expertise evaluations for a given cohort by using selected experts to rank a small subset of the involved persons or groups manually for their expertise. We will then use social networking graph analysis software, such as that developed for the Harvard Profiles system by one of the eagle-i partners, and allow learning algorithms to determine in a series of trials how much weight to place on each kind of interaction in order to replicate this gold standard ranking.
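One simple way to picture the weight-learning step is a pairwise perceptron: whenever the current weights score a lower-ranked person at or above a higher-ranked one, the weights are nudged toward the higher-ranked person's interaction counts. The sketch below is an assumption-laden illustration, not the algorithm used by the Profiles software or one the project has committed to; the interaction types, the counts, and the expert ranking are invented for the example.

```python
from itertools import combinations

# Hypothetical interaction types and per-person interaction counts.
INTERACTION_TYPES = ["jointAuthorship", "committeeMembership", "mentoring"]
COUNTS = {
    "A": [5, 1, 2],
    "B": [2, 3, 0],
    "C": [1, 0, 1],
    "D": [0, 4, 0],
}
# Hypothetical gold standard from expert reviewers, most expert first.
GOLD_RANKING = ["A", "B", "C", "D"]


def score(weights, counts):
    """Weighted sum of a person's interaction counts."""
    return sum(w * c for w, c in zip(weights, counts))


def learn_weights(counts_by_person, gold_ranking, epochs=100, rate=1.0):
    """Pairwise perceptron: adjust weights until the gold ordering is reproduced."""
    weights = [0.0] * len(INTERACTION_TYPES)
    pairs = list(combinations(gold_ranking, 2))  # (higher-ranked, lower-ranked)
    for _ in range(epochs):
        violations = 0
        for higher, lower in pairs:
            hi, lo = counts_by_person[higher], counts_by_person[lower]
            if score(weights, hi) <= score(weights, lo):
                # Move the weights toward the higher-ranked person's profile.
                weights = [w + rate * (h - l) for w, h, l in zip(weights, hi, lo)]
                violations += 1
        if violations == 0:
            break  # every gold-standard pair is now ordered correctly
    return weights


weights = learn_weights(COUNTS, GOLD_RANKING)
predicted = sorted(COUNTS, key=lambda p: score(weights, COUNTS[p]), reverse=True)
```

Once weights of this kind reproduce the expert ranking on the small subset, the same scoring function can be applied to the participants who were not ranked manually.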

4A.6.3.3 Secondary applications. While the focus is on the creation of an ontology-based system for resource inventory, evaluation and tracking, we envisage a number of secondary applications for our software. In the area of informatics we see RDTE as enhancing possibilities for data sharing and for advancing consortium-wide collaborations through more effective use of data and information. In the community engagement area we see the software as also serving a query function, giving community partners enhanced opportunities to identify potential collaborators, for example through linkage to social-networking tools such as Facebook.

Participants. The RDTE effort is directed by Smith and Soergel. Other staffing information is provided under the Tracking and Evaluation Key Function below.

4A.6.4 Aim 4. To create a cross-disciplinary Advanced Graduate Certificate Program in Biomedical Ontology directed towards clinical and translational trainee scientists both in Buffalo and in the National CTSA Consortium.

The Ontology Summit organized by the National Institute of Standards and Technology in Gaithersburg, MD in March 2010 identified the lack of researchers with ontology expertise as a major national and international training and education need. Aim 4 will address this need, drawing on the multi-year experience of Buffalo ontologists in organizing training programs and teaching both web- and classroom-based courses and tutorials in biomedical ontology, most recently under the auspices of the National Center for Biomedical Ontology, where Dr Smith leads the Dissemination Core. Soergel, too, has for many years offered tutorials on knowledge organization systems in digital libraries, and further complementary resources are described under Aim 3 of the Informatics Key Function above.

Drawing on this expertise, and on our existing infrastructure of Web-based course content, we will establish in Buffalo the Advanced Graduate Certificate Program in Biomedical Ontology, covering all aspects of ontology and its biomedical applications. The program will be classroom-based, but all resources from the program will be made available as open educational resources (OER). The program will provide an intensive introduction to:

1. the development, maintenance, and quality control of ontologies; ontology-based literature and data curation and data mining; and Semantic Web and associated software resources,

2. specific ontologies and their uses in clinical and translational research, focusing on the Cell Ontology, the Gene Ontology, the Foundational Model of Anatomy, the Infectious Disease Ontology, the Ontology for Biomedical Investigations, and the Protein Ontology,

3. development and use of ontologies in the fields of medical and health informatics, focusing on the issues surrounding the use of information technologies and information management systems in healthcare and public health settings, as well as the systems required to help patients understand and manage health information to maintain health or cope with injury or disease.

In the design, structure and content of the course we will draw on UB’s existing Advanced Graduate Certificate Program in Medical and Health Informatics, and on the expertise of its Director, Dr Gary Byrd. The program will require 24 hours of graduate coursework, which can be completed in two semesters of full-time enrollment. Classroom-based modules will be integrated into the new clinical scientist training program of the BTC; web-based modules will be made available to the National CTSA Consortium.

4A.7 Integration and Interaction with Other Key Functions

|Key Function |Integration and interaction |

|3. Novel Clinical and Translational Methodologies Pilot Projects and Pilot and Collaborative Translational and Clinical Studies Program |We will assist researchers in using ontological methods to discover cross-disciplinary commonalities and correspondences for the purpose of identifying novel hypotheses of potential research interest, where possible identifying existing data resources that can be used for initial virtual trials. |

|4. Biomedical informatics |Close integration at all levels. The EHR Interoperability, Neurology and Breast Cancer Pathology pilots (Aim 1), as well as the Data Exchange (Aim 2), will form part of the Data Warehouse; the RDTE System (Aim 3) will be jointly developed, with the Evaluation and Tracking Key Function. All resources will be accessed through the BTC Portal. |

|5. Research design, epidemiology and biostatistics |We will develop a common ontology-informed data dictionary for use in all trials that receive BTC support. |

|5A. Clinical research ethics |We will cooperate closely on addressing ethical hurdles to patient data sharing through the Data Exchange. |

|7. Clinical research center |We will provide clinical scientists with the ontological tools and support they need to create and integrate their research data. |

|8. Community Engagement Key Function |We will assist in creating ontology-informed directories of academic and community researchers and practitioners, and of common metrics for evaluation of community research initiatives. |

|9. Translational technologies and resources |We will seek to enhance the value of LIMS and other resources by promoting the use of consensus-based controlled vocabularies to ensure cumulativity, comparability and retrievability of collected data. We will work with the Buffalo Neuroimage Analysis Center in publishing neuroimaging data for experimental purposes to the Data Exchange, developing ontology resources for the description of these data in such a way as to allow them to be reused by external researchers. |

|9A. Technology transfer and commercialization |All products developed and commercialized will be tracked by the Biomedical Informatics and Ontology Key Functions using referent tracking software. |

|10. Research education, training and career development |We will contribute a new course in Biomedical Ontologies and Applications, and create a new Advanced Graduate Certificate Program in Biomedical Ontology. |

|11. Tracking and evaluation |Joint development of the Resource Discovery Tracking and Evaluation (RDTE) System |

4A.8 Evaluation

|Aim |Outcome measures |Milestones |

|Aim 1 |For each of the ontologies to be developed: formal approval of the suitability and completeness of the associated vocabularies by clinicians and researchers; formal approval of the logical definitions by external ontologist reviewers; acceptance of the ontologies as OBO Foundry ontologies; acceptance of the ontologies for use in the i2b2 framework |Resources to be developed, with date of availability of initial version: Vital Signs Ontology, Month 12; Electronic Health Record Database, Month 30; Stroke Ontology, Month 18; Stroke Patient Registry, Month 36; Alzheimer's Ontology, Month 18; Alzheimer's Patient Registry, Month 36; Brain Image Database, Month 18; Surgical Pathology Ontology, Month 18; Breast Cancer Data Resource, Month 48 |

| |For each of the pilot projects: degree of involvement by stakeholders; assessment of the degree to which research questions can be answered more accurately with than without the interfacing |Report on the degree of overlap between the parts of the systems that are to be made semantically interoperable; representation of the data dictionaries in RT-compatible format |

|Aim 2 |# of visitors to the exchange; # of papers published in which the exchange is mentioned; # of data-elements and types available; # of grants using data from the exchange |Granting of permission to release a new data set; encoding of data set using appropriate ontologies; demonstration of queries that yield results of clinical/translational interest; demonstration of queries that cut across multiple sources of information |

|Aim 3 |# of people, projects, training events, etc. covered (completeness and accuracy of data, from sampling); # of visits to the RDTE website (usability of website and usefulness of data, from survey) |V.0 of information model for the RDTE: Month 3; RDTE core, with basic data input forms: Month 6; source data systems for the RDTE and their data models identified: Month 6; RDTE V.0 input forms and basic reports: Month 12; RDTE V.1, connection to most source data systems and many reports: Months 24, 36, 48, 60 |

|Aim 4 |Education outcome measures: see Section 11.6.10. |Certificate program in place by end of Year 2; initial cohort of graduates by end of Year 3. |

4A.9 Innovation and Contributions to the National CTSA Consortium

Innovation. We believe that the ontologies and associated resources and methods that we will create, test, and implement have the potential to transform the way large clinical and translational datasets are integrated and analyzed. We believe that our work on integration of clinical data originating in different hospitals and clinics has the potential to bring benefits both to continuity of care and to data aggregation, for example for purposes of identifying subjects for clinical trials, or in allowing the formulation of queries against combined clinical and basic-science data. The overarching vision underlying Aim 1 is to create the scientific basis for the Electronic Health Record of the future, which we see as a computational resource that is able to interoperate not only with the workflows in a hospital or lab or physician’s practice and with billing systems but also with the entire spectrum of biological and translational data sources. The idea is to bring to the clinical context recording processes already commonplace in domains such as banking, credit, and parcel delivery, and also in the tracking of medication consignments and solid organs by means of Radio Frequency Identification (RFID) and similar technologies.

Contributions to the National CTSA Consortium. All ontology content created within this key function, and all associated software tools and annotations, including the innovative resource discovery based tracking and evaluation system (RDTE), will be made freely available to the National CTSA Consortium. The ontologies will in addition be exported to the i2b2 hive used by multiple CTSA Bioinformatics Key Functions.

Key personnel of this key function already interact with ontologists in various CTSA bioinformatics sub-committees and working groups, above all through our collaborations with Richard Scheuermann and William Hogan (Directors of Biomedical Informatics, respectively, of the Dallas and Arkansas CTSAs), through our work with the GO and other OBO ontologies, and with our collaborators Drs Mark Musen and Chris Chute in the National Center for Biomedical Ontology (Musen letter of support). We anticipate expanding such interactions through contributions to relevant National Consortium Bioinformatics committees and working groups. In 2010 Smith edited with Richard Scheuermann of the University of Texas SW Medical Center CTSA a special issue of the Journal of Biomedical Informatics on the topic of Clinical and Translational Ontologies.

Baltimore Conference

CTSAconnect

4A.10 Transforming Research in the BTC and in the National CTSA Consortium

Mission: Our mission is to contribute to the realization of the BTC’s goal of serving as an integrated academic home for outstanding clinical and translational science through the provision and use of innovative methods and tools to advance consistency of data representation, thereby creating new opportunities for information-driven research.

Vision: Our vision is one in which all biomedical data will be annotated using high-quality ontologies, so that these data become semantically interoperable in a way that makes possible a new kind of virtual clinical research.

-----------------------

|Table 4A-2. Sample data items stored in the RDTE |

|Person isMentoredBy Person |

|Person submitted Document for Purpose on Date |

|Person participatesIn Event |

|DataSource administeredBy LegalEntity |

|Equipment ownedBy LegalEntity |

|Equipment measures Quality |

|Person uses Equipment for Purpose in Project on Date |

|Person collaboratesWith Person in Project |
