ISA-TAB proposal



ISA-TAB Proposal – draft v0.1

This document, part of the , is a working draft authored by an initial group (1). Many issues still need to be resolved. Comments and suggestions should be sent to Philippe Rocca-Serra: rocca@ebi.ac.uk. A project page and mailing list are being created to take this work forward.

This work builds on the paradigm developed for MAGE-TAB, a tab-delimited format for exchanging microarray data (2). Readers are expected to have an in-depth knowledge of the MAGE-TAB specification (3) before reading this document.

1. Background

1.1 Purpose

1.2 Rationale

1.3 Definitions

1.4 ISA-TAB Structure – overview and examples

1.4.1 Reference file

1.4.2 Investigation file

1.4.3 Study file

1.4.4 Assay file

1.5 Relations with MAGE-TAB and biomedical tabular formats

1.6 Minimal content and terminology

2 References

3 ISA-TAB Structure - details

3.1 Reference file

3.1.1 Contact Section

3.1.2 Protocol Section

3.1.3 Factor Section

3.1.4 Measurements/Endpoints Section

3.1.5 Publication Section

3.1.6 Ontology Source Section

3.1.7 ISA-TAB Files Section

3.2 Investigation file

3.3 Study file

3.3.1 First Section

3.3.2 Second Section

3.4 Assay file

3.4.1 First Section

3.4.1 Second Section

3.4.1.1 Technology Type: DNA microarray

3.4.1.2 Technology Type: Gel Electrophoresis

3.4.1.3 Technology Type: Mass Spectrometry

3.4.1.4 Technology Type: NMR Spectroscopy

4 Annex

1. Background

1.1 Purpose

This document presents the first working draft of the Investigation / Study / Assay (ISA) tab-delimited (TAB) format proposal: a general framework with which to communicate both the metadata (contact details, sample characteristics, technologies used, etc.) and the associated data files from an experiment. ISA-TAB is a superset of MAGE-TAB (see section 1.5). It builds on this existing paradigm and shares the same motivation for the use of spreadsheets.

With this proposal it is our intention to address the pressing need of a group of collaborative repositories for a common framework for transcriptomics-, proteomics- and metabol/nomics-based experiments (hereafter referred to as ‘omics-based’ experiments). However, it is not our intention to ‘compete’ against XML-based formats, whether existing or under development, such as the Functional Genomics Experiment Markup Language (FuGE-ML, 4, 5). In its final form, ISA-TAB could be seen as a framework to communicate omics-based experiments while the suite of FuGE-ML based modules required to fully describe such experiments is under development. When these modules become available, ISA-TAB could continue to serve those with little or no bioinformatics support, and could also act as a user-friendly presentation layer for the XML-based formats (via an XSL transformation), as demonstrated by the HTML rendering of MAGE-ML documents (6).

1.2 Rationale

The rationale behind this initial work stems from the current requirement for a common framework for omics-based experiments. Such a framework will fulfill the needs of:

• The BioMAP project at EBI (7) which will create a common submission framework for ArrayExpress (8), PRIDE (9), and in the near future, a metabolomics repository;

• A group of collaborative repositories (10,11); some committed to pipelining omics-based experimental data into EBI public repositories; others willing to exchange data or to enable their user base to import data from public repositories into their local systems.

It is envisaged that the ISA-TAB will also be tested for genomics- and metagenomics-based experiments (12), along with various conventional assays (often associated with omics-based experiments).

1.3 Definitions

Investigation, Study and Assay are the three key entities (10) around which the ISA-TAB framework is built. They assist in structuring and classifying information relevant to both the sample and the different technologies employed. Study is the central unit, containing information on samples, their characteristics and any treatments applied. A Study has associated Assays, which are tests performed either on material taken from the sample or on the whole initial sample, and which produce qualitative or quantitative measurements (data). An Assay can be characterized as the smallest complete unit of experimentation. For example: one hybridization equals one assay; each technical replicate represents an additional assay; one LC-MS run equals one assay; a single clinical chemistry assay is (of course) one assay; an n-plex multiplexed microarray equals n assays; and a MALDI MS chip with n spots could perform up to n assays (i.e. if all spots are analyzed). Investigation is a higher-order object that helps to group related Studies.

It should be noted that the word ‘experiment’ has been deliberately avoided. A comparison of ArrayExpress and PRIDE revealed that ‘experiment’ is used to refer to objects at different levels of granularity in each; i.e. to refer to a set of multiple related hybridizations in ArrayExpress, but only a single gel-based separation run in PRIDE.

Following the abstractions proposed here, an experiment in ArrayExpress would be equivalent to a Study.

The choice of Study as the central unit of the ISA-TAB proposal is supported by its use in existing biomedical formats, such as the Study Data Tabulation Model (SDTM), which encompasses both the Standard for Exchange of Nonclinical Data (SEND, 13) and the Clinical Data Interchange Standards Consortium (CDISC, 14). SDTM has been endorsed by the US Food and Drug Administration (FDA) as the preferred way to organize, structure and format both clinical and nonclinical (toxicological) data submissions (15, 16). See also section 1.5.

1.4 ISA-TAB Structure – overview and examples

The ISA-TAB uses a number of files to capture the information:

• Reference file;

• Investigation file;

• Study file;

• Assay file (with associated data files and other relevant files).

These files are described briefly in the subsections below and more fully in section 3. For submission or transfer, files can be packaged into an ISArchive as shown in Figure 1. Work is ongoing to define a set of rules to regulate the creation of such files.

Currently, example files are provided in , including:

• Two examples of a Study with Assays:

o

o

• One example where the two Studies above are grouped under one Investigation (e.g. when these are part of the same collaborative project or data from each Study are compared)

o

1.4.1 Reference file

The Reference File records all declarative information referenced from the other ISA-TAB files. This file covers information not only about contacts, protocols and equipment, but also terminologies (controlled vocabularies or ontologies) and other annotation resources.

1.4.2 Investigation file

The Investigation file is intended to contain only a small amount of information, because its role is simply to group related Studies as appropriate. For this reason, it is optional and only becomes necessary when two or more Study files are created. The need for this flexible solution was clear in several use cases. In the toxicogenomics domain, for example, acute toxicity studies are followed by long term toxicity studies and in vitro toxicity studies. For clarity, these would be linked to the same Investigation. Another example comes from the environmental genomics domain, where several studies carried out in the same area can be usefully related under the same Investigation. The limitations that stem from the lack of such an object can be seen in ArrayExpress, where related experiments cannot be explicitly linked to one another. MAGE-TAB does have an Investigation Description Format (IDF) file, but there ‘Investigation’ is used as a synonym for experiment and is therefore equivalent to a single Study.

1.4.3 Study file

The Study file is the central file, containing information on the samples studied, their source(s), the sampling methodology, sample characteristics and any treatments or manipulations performed. We acknowledge that in some cases it can be hard to define which information should belong to the Study and which to the Assay. More explanation is offered in the next section; and a set of rules to regulate the creation of such files is under development, as stated above.

1.4.4 Assay file

The Assay file contains information about a protocol and further information generated through its execution (using material from the sample, or the whole initial sample), including references to data files (whether raw, processed or normalized). In the case of microarray assays, the Array Design Format (ADF) file and the Final Gene Expression Data Matrix (FGDM) are also referenced. As stated previously, it can be hard to determine whether particular sample treatments and manipulations belong in the Study or the Assay file. In general, treatments or manipulations performed immediately prior to executing the assay protocol itself, such as protein or nucleic acid extraction or labeling, should be described in the Assay file.

1.5 Relations with MAGE-TAB and biomedical tabular formats

ISA-TAB is a superset of MAGE-TAB. The paradigm and syntax have been maintained as far as possible, although improvements have been suggested in a few places. However, while the ADF and the FGDM files have been left intact, the others have been refactored, resulting in a more general framework suitable for use with different technologies:

• The Reference file contains many of the fields from the IDF component of MAGE-TAB; however, it is not equivalent to that file.

• The Investigation in the IDF is used as a synonym for ‘Experiment’ (sensu ArrayExpress).

• The content of the Sample and Data Relationship Format (SDRF) file has been divided between the Study and Assay files, with the Study file containing contextualizing information for Assays as described above.

Where omics-based technologies are used in clinical or nonclinical studies, ISA-TAB will complement existing biomedical formats such as the SDTM. It is inevitable that some information will be duplicated between those two frameworks, but this is not generally seen as a major problem. Ultimately a reference system will be needed to link the two files, but we have deliberately left that feature out of this first proposal.

1.6 Minimal content and terminology

Other important issues include deciding on the ‘minimum information’ that the ISA-TAB files should require and on the terminology to be used in each field. These issues are beyond the scope of this proposal but are the focus of related efforts. ‘Minimum information’ checklists are under development both by individual communities for their particular domains of interest and collaboratively through the Minimum Information for Biological and Biomedical Investigations (MIBBI, 17) project. The Ontology for Biomedical Investigations (OBI, 18) will provide both ‘universal’ terms that are applicable across various biological and technological domains and ‘domain-specific’ terms that are relevant only to a particular domain.

The terms used here to describe each ISA-TAB field are not necessarily final, having been created primarily for the purpose of explaining the proposal. These requirements will be submitted to the larger OBI community, to be refined and then explicitly defined, and will be presented in future versions.

2 References

1. NET project members:

2. Rayner TF, Rocca-Serra P, Spellman PT, Causton HC, Farne A, Holloway E, Irizarry RA, Liu J, Maier DS, Miller M, Petersen K, Quackenbush J, Sherlock G, Stoeckert CJ Jr, White J, Whetzel PL, Wymore F, Parkinson H, Sarkans U, Ball CA, Brazma A. A simple spreadsheet-based, MIAME-supportive format for microarray data: MAGE-TAB. BMC Bioinformatics. 2006 Nov 6;7:489.

3. MAGE-TAB specification:

4. Jones AR, Miller M, Aebersold R, Apweiler R, Ball CA, Brazma A, Degreef J, Hardy N, Hermjakob H, Hubbard SJ, Hussey P, Igra M, Jenkins H, Julian RK Jr, Laursen K, Oliver SG, Paton NW, Sansone SA, Sarkans U, Stoeckert CJ Jr, Taylor CF, Whetzel PL, White JA, Spellman P, Pizarro A. The Functional Genomics Experiment model (FuGE): an extensible framework for standards in functional genomics. Nat Biotechnol. 2007 Oct;25(10):1127-1133.

5. FuGE working group:

6. HTML rendering of MAGE-ML documents:

7. BioMAP project:

8. ArrayExpress: ebi.ac.uk/arrayexpress

9. PRIDE: ebi.ac.uk/pride/

10. Reporting Structure for Biological Investigation (RSBI) working group:

11. Sansone SA, Rocca-Serra P, Tong W, Fostel J, Morrison N, Jones AR; RSBI Members. A strategy capitalizing on synergies: the Reporting Structure for Biological Investigation (RSBI) working group. OMICS. 2006 Summer;10(2):164-71.

12. Genomic Standards Consortium (GSC):

13. SEND:

14. CDISC:

15. FDA Data Standard Council:

16. Pharmacogenomic Data Submissions - Companion Guidance

17. MIBBI:

18. OBI:

Figure 1

For submission or transfer, files can be packaged into an ISArchive, as shown in this figure.

3 ISA-TAB Structure - details

Each file has a predefined structure, with fields organized on a per-column or per-row basis. These files are described in detail in the subsections below.

3.1 Reference file

In this file the fields are organized on a per-column basis and divided into sections, as described in detail below. Example:

|Contacts | | |

|Person Last Name |Griffin |Fleckenstein |

|Person First Name |Julian |Scott |

|Person Mid Initial | | |

|Person Email |jlg40@mole.bio.cam.ac.uk | |

|Person Phone |44*(0)*1223766002 | |

|Person Fax | | |

|Person Address |University of Cambridge |Imperial College London |

|Person Affiliation |Department of Molecular Biology |Genetics and Genomics Research Institute |

|Person Role |submitter;investigator |investigator |

|Person Role Source REF |OBI |OBI |

| | | |

|Protocols | | |

|Protocol Name |standard procedure 1 |grifin procedure 2 |

|Protocol Type |animal procedure |nucleic_acid_extraction |

|Protocol Type Term Source REF |OBI | |

|Protocol Description |All animal procedures conformed to Home Office, UK, guidelines for animal welfare. Male Wistar rats (n = 3 for each time point; control animals fed control diet for the same time period; Charles River UK Ltd.) were fed either standard laboratory chow, or chow supplemented with 1% orotic acid (Sigma Aldrich, UK) ad libitum. Rats were killed by cervical dislocation at days 0, 1, 3 and 14, and the left lateral lobe of the liver excised. Tissues were snap frozen and stored at -80 °C. |Total RNA was extracted by RNA Isolation Kit (Stratagene) from the livers of Wistar rats at day 0 (n = 3), day 1 and 3 (n = 2), and day 14 (n = 3). |

|Protocol Contact | | |

|Protocol Parameter |diet;population density |Extracted Product;Amplification |

|Protocol Parameter Type Term Source REF | | |

|Protocol Instruments | | |

|Protocol Instrument Name | | |

|Protocol Instrument Component | | |

|Protocol Instrument Component Term Source REF | | |

|Protocol Instrument Parameter | | |

|Protocol Instrument Parameter Term Source REF | | |

|Protocol Software | | |

|Protocol Software Name | | |

|Protocol Software Version | | |

|Processing Method Parameter | | |

|Processing Method Parameter Term Source REF | | |

| | | |

|Factors | | |

|Factor Name |Time |Treatment |

|Factor Type | | |

|Factor Type Term Source REF |OBI |OBI |

| | | |

|Measurements/Endpoints | | |

|Measurements/Endpoints Name |Gene Expression |Metabolite Characterization |

|Measurements/Endpoints Term Source REF |OBI |OBI |

| | | |

|Publication | | |

|PubMed ID |17203948 | |

|Publication DOI |10.1021/pr0601640 | |

|Publication Author list |Griffin JL, Scott J, Nicholson JK. | |

|Publication Title |The influence of pharmacogenetics on fatty liver disease in the wistar and kyoto rats: a combined transcriptomic and metabonomic study. | |

|Publication Status |indexed for MEDLINE | |

| | | |

|Ontology Source Reference | | |

|Term Source Name |CTO |MO |

|Term Source File |

| |ail.cgi?cell |tologies/MGEDontology.php |

|Term Source Version | |1.3.0.1 |

|Term Source Description |The Cell Type Ontology |The Microarray Ontology |

| | | |

|ISA-TAB files |study-1.txt;study-1-assay-1-Mx.txt;study-1-assay-2-Tx.txt;study-1-assay-3-ClinChem.txt | |
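The per-column layout of the example above can be illustrated with a minimal sketch. It assumes a tab-delimited section in which the first cell of each row is a field name and each later column describes one entity (here, one contact). The function name `section_to_records` and the inline sample are hypothetical and are not part of the proposal.

```python
import csv
import io

def section_to_records(tsv_text):
    """Turn rows of (field name, value 1, value 2, ...) into one dict per column."""
    rows = [r for r in csv.reader(io.StringIO(tsv_text), delimiter="\t") if r]
    width = max(len(r) for r in rows) - 1  # number of entity columns
    records = [{} for _ in range(width)]
    for row in rows:
        field, values = row[0], row[1:]
        values += [""] * (width - len(values))  # pad rows that have fewer cells
        for record, value in zip(records, values):
            record[field] = value
    return records

# Hypothetical sample mirroring a few rows of the Contacts section.
sample = (
    "Person Last Name\tGriffin\tFleckenstein\n"
    "Person First Name\tJulian\tScott\n"
    "Person Role\tsubmitter;investigator\tinvestigator\n"
)
contacts = section_to_records(sample)
roles_of_first_contact = contacts[0]["Person Role"].split(";")
```

Each dictionary then holds one contact, and multi-valued cells such as the roles can be split on the semicolon separator.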

3.1.1 Contact Section

Person Last Name

The last name of each person associated with the investigation or study.

Person First Name

The first name of each person associated with the investigation or study.

Person Mid Initials

The middle initials of each person associated with the investigation or study.

Person Email

The email address of each person associated with the investigation or study. Email could be used as an internal identifier to reference persons within the ISA-TAB files. Although this would be effective and practical, relying on email to track persons may raise privacy issues.

Person Phone

The telephone number of each person associated with the investigation or study.

Person Fax

The fax number of each person associated with the investigation or study.

Person Address

The street address of each person associated with the investigation or study.

Person Affiliation

The organization affiliation for each person associated with the investigation or study.

Person Roles

The role(s) performed by each person. Terms for this field should come from OBI. Multiple annotations or values attached to one person may be provided by using a semicolon (";") as a separator, for example: "submitter;funder;sponsor".

Person Roles Term Source REF

Source REF must match one of the Term Source Names declared in the Ontology Source Section, described below.

3.1.2 Protocol Section

Protocol Name

The names of the protocols used within the ISA-TAB document. These will be referenced in the "Protocol REF" columns of the Study and Assay files. Protocol names are used as identifiers within the ISA-TAB document and can also be accession values. In such cases, decisions about how to deal with the protocol information are left to data curators and tool implementers. For instance, an importer tool could be designed so that, when an existing protocol is referenced by a public accession, only those fields that are non-empty in the ISA-TAB document are updated in the target repository.

Protocol Type

The type of the protocol. Terms for this field should come from OBI.

Protocol Type Term Source REF

Source REF must match one of the Term Source Names declared in the Ontology Source Section, described below.

Protocol Description

A free-text description of the protocol. This text is included in a single tab-delimited field. If you wish to include tab or newline characters as part of this text, you must enclose the whole text within double quotes (").
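The quoting rule can be illustrated with Python's standard `csv` module, which encloses a field in double quotes when it contains the delimiter or a newline. This is only a sketch: the sample description is invented, and the way embedded double quotes are escaped (doubling, the `csv` default) is an assumption, since the proposal does not specify it.

```python
import csv
import io

# Invented free-text description containing both a newline and a tab.
description = "Step 1: extract RNA.\nStep 2:\tlabel the extract."

buffer = io.StringIO()
writer = csv.writer(
    buffer, delimiter="\t", quoting=csv.QUOTE_MINIMAL, lineterminator="\n"
)
writer.writerow(["Protocol Description", description])
encoded = buffer.getvalue()  # the description is now enclosed in double quotes

# Reading the text back recovers the original field, embedded tab and newline included.
decoded = next(csv.reader(io.StringIO(encoded), delimiter="\t"))
```

Fields without tabs or newlines are written unquoted, so the rest of the file stays readable in a spreadsheet or text editor.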

Protocol Contact

If used, the contact should be declared in the Contact section and referenced here.

Protocol Parameters

A semicolon-delimited (";") list of parameter names, used as identifiers within the ISA-TAB document. These names are used in the Study and Assay files (as "Parameter Value []" headings) to list the values used for each protocol parameter. Terms for this field should come from OBI.
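As a small illustration of how declared parameter names relate to column headings, the sketch below derives "Parameter Value [...]" headings from a semicolon-delimited list. The helper name `parameter_headers` is hypothetical, and trimming whitespace and skipping empty entries are assumptions rather than rules stated in this proposal.

```python
def parameter_headers(protocol_parameters):
    """Build one "Parameter Value [name]" heading per declared parameter name."""
    names = [n.strip() for n in protocol_parameters.split(";") if n.strip()]
    return ["Parameter Value [{}]".format(n) for n in names]

# Parameter list taken from the example Reference file.
headers = parameter_headers("diet;population density")
```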

Protocol Parameter Term Source REF

Source REF must match one of the Term Source Names declared in the Ontology Source Section, described below.

Protocol Instrument Name

The instrument used by the protocol.

Protocol Instrument Component

Indicates a key part of an instrument set-up. Terms for this field should come from OBI (or PSI-MS or MSI_NMR).

Protocol Instrument Component Term Source REF

Source REF must match one of the Term Source Names declared in the Ontology Source Section, described below.

Protocol Instrument Parameter

To indicate an important parameter attached to the Instrument. Terms for this field should come from OBI.

Protocol Instrument Parameter Term Source REF

Source REF must match one of the Term Source Names declared in the Ontology Source Section, described below.

Protocol Software Name

The name of the software used in a protocol.

Protocol Software Version

The version of the software used in a protocol.

3.1.3 Factor Section

Factor Name

The names of the Factors used in the Study and/or Assay files. Factors should correspond to the independent variables that experimentalists manipulate in order to perturb a biological system. An assay is then devised to measure the system's response to the perturbation by following a response variable (also known as a dependent variable).

Factor Type

The study factor type should be supplied to allow Factors to be classified into categories. Terms for this field should come from OBI.

Factor Type Term Source REF

Source REF must match one of the Term Source Names declared in the Ontology Source Section, described below.

3.1.4 Measurements/Endpoints Section

Measurements/Endpoint Name

Declares and lists all the response variables (also known as dependent variables) that will be assessed or quantified, e.g. gene expression, protein expression, and clinical chemistry endpoints. Terms for this field should come from OBI.

Measurements/Endpoint Term Source REF

Source REF must match one of the Term Source Names declared in the Ontology Source Section, described below. Note that measurement variables could be declared either in this Section or at the level of the Assay Section. This is open to discussion.

3.1.5 Publication Section

PubMed ID

The PubMed IDs of the publication(s) associated with this investigation (where available).

Publication DOI

A Digital Object Identifier (DOI) for each publication (where available).

Publication Author List

The list of authors associated with each publication.

Publication Title

The title of each publication associated with the investigation.

Publication Status

A term describing the status of each publication (e.g. "submitted", "in preparation", "published"). Terms for this field should come from OBI.

Publication Status Term Source REF

Source REF must match one of the Term Source Names declared in the Ontology Source Section, described below.

3.1.6 Ontology Source Section

This section is also taken from the MAGE-TAB specification. Note that not all term sources are ontologies; they also include controlled vocabularies. We recommend using the ontologies listed under the OBO Foundry to maximize interoperability of the resources.

Term Source Name

The names of the Term Sources (ontologies or databases) used within the ISA-TAB document. This name will be used in all corresponding "Term Source REF" fields. Examples: OBI, GO, DO. Used as an identifier within the ISA-TAB document.
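Because every "Term Source REF" value has to match a declared Term Source Name, a validator could check this mechanically. The sketch below is one possible check under stated assumptions: the function name `undeclared_sources` is hypothetical, and empty REF cells are simply ignored.

```python
def undeclared_sources(declared_names, term_source_refs):
    """Return the Term Source REF values that match no declared Term Source Name."""
    declared = set(declared_names)
    return sorted({ref for ref in term_source_refs if ref and ref not in declared})

# Hypothetical input: two declared sources and the REF values found in a document.
declared = ["CTO", "MO"]
used = ["MO", "OBI", "", "MO"]
missing = undeclared_sources(declared, used)
```

With these sample values the check reports OBI, since it is referenced but not declared as a Term Source Name.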

Term Source Namespace Location

A file name or valid pointer to an official resource to allow cross validation and version tracking of terms used in a submission.

Term Source Version

The version of the Term Source used throughout the ISA-TAB document.

3.1.7 ISA-TAB Files Section

A field listing all the tab-separated files that make up the ISA-TAB document. The purpose of this listing is to allow the archive to be validated. For each study, provide a semicolon-separated (";") list of file names.
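One way the listing could support validation is to confirm that every named file is present in the archive. The following is a minimal sketch, assuming the archive has been unpacked into a directory; the helper `missing_files` and the temporary directory used for the demonstration are hypothetical.

```python
import tempfile
from pathlib import Path

def missing_files(listing, archive_dir):
    """Return the listed file names that are not present in archive_dir."""
    names = [n.strip() for n in listing.split(";") if n.strip()]
    return [n for n in names if not (Path(archive_dir) / n).is_file()]

# Demonstration with a throwaway directory containing only the Study file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "study-1.txt").write_text("")
    listing = "study-1.txt;study-1-assay-1-Mx.txt"
    result = missing_files(listing, tmp)
```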

3.2 Investigation file

This file has fields organized to report information on a per-column basis. Since it is an optional component, the Investigation File has a very simple lightweight structure as detailed below.

Investigation Title

A concise name given to the investigation.

Investigation Description

A textual description of the investigation.

Date of Investigation Submission

To provide the date on which the investigation was reported to the repository.

Investigation Public Release Date

To provide the date on which the investigation should be released publicly.

Investigation Contact

The contact should be declared in the Contact section of the Reference file and referenced here.

PubMed ID REF

The PubMed IDs of the publication(s) associated with this investigation (where available).

Publication DOI REF

A Digital Object Identifier (DOI) for each publication (where available).

Study File Names

The names of the Study file component(s).

Assay File Names

The names of the Assay file component(s).

3.3 Study file

In this file the fields are grouped in two sections. In the First Section the fields are organized on a per-column basis, and in the Second Section on a per-row basis. Example:

|Study Identifier |Generated by database only, or temporarily supplied by users | | |

|Study Title |The Influence of Pharmacogenetics on Fatty Liver Disease in the Wistar and Kyoto Rats: A Combined Transcriptomic and Metabonomic Study | | |

|Study Description |Analysis of liver tissue from rats exposed to orotic acid for 1, 3, and 14 days was performed by DNA microarrays and high resolution 1H NMR spectroscopy based metabonomics of both tissue extracts and intact tissue (n = 3). | | |

|Study Design |time course design | | |

|Study Design Term Source REF |OBI | | |

|Contact |Jules Grifin | | |

|PubMed ID REF |123434 | | |

|Publication DOI REF |10.1021/pr0601640 | | |

|Date of Study Submission |DD/MM/YYYY | | |

|Study Public Release Date |13/12/2006 | | |

| | | | |

|Source Name |Characteristics[Material Type] |Term Source REF |Characteristics[Organism] |

|Study1.animal1 |whole_organism |MO |{Term Name} Rattus norvegicus |

|Study1.animal2 |whole_organism |MO |{Term Name} Rattus norvegicus |

|Study1.animal3 |whole_organism |MO |{Term Name} Rattus norvegicus |

|Study1.animal4 |whole_organism |MO |{Term Name} Rattus norvegicus |

|Study1.animal5 |whole_organism |MO |{Term Name} Rattus norvegicus |

|Study1.animal6 |whole_organism |MO |{Term Name} Rattus norvegicus |

|Study1.animal7 |whole_organism |MO |{Term Name} Rattus norvegicus |

|Study1.animal8 |whole_organism |MO |{Term Name} Rattus norvegicus |

|Study1.animal9 |whole_organism |MO |{Term Name} Rattus norvegicus |

|Study1.animal10 |whole_organism |MO |{Term Name} Rattus norvegicus |

|Study1.animal11 |whole_organism |MO |{Term Name} Rattus norvegicus |

|Study1.animal12 |whole_organism |MO |{Term Name} Rattus norvegicus |

3.3.1 First Section

Study Identifier

A unique identifier: either a temporary identifier supplied by users or one generated by a repository or database.

It could be (but need not be) an identifier complying with the LSID specification.

Study Title

A concise phrase used to encapsulate the purpose and goal of the study.

Study Description

A textual description of the study, including sections such as objectives or goals.

Study Design

A controlled term allowing classification of the study. Terms for this field should come from OBI.

Study Design Term Source REF

Study Design Term Source REF must match one of the Term Source Names declared in the Ontology Source Section of the Reference file.

Contact

The contact should be declared in the Contact section of the Reference file and referenced here.

PubMed ID REF

The PubMed IDs of the publication(s) associated with this study (where available).

Publication DOI REF

A Digital Object Identifier (DOI) for each publication (where available).

Study Public Release Date

To provide the date on which the study should be released publicly.

3.3.2 Second Section

Source Name

Sources are the starting biological material used in a study. Source items can be qualified using the following headers: Characteristics [], Term Source REF, Unit, Provider, Description, and Comment [].

Sample Name

Samples represent major outputs resulting from a protocol application but which cannot be treated as an Extract or a Labeled Extract. Sample items can be qualified using the following headers: Characteristics [], Term Source REF, Unit and Comment [].

Characteristics []

Used as a qualifying field following Source Name or Sample Name. This column contains terms describing each material according to the characteristics category indicated in the column header. For example, a column headed "Characteristics [OrganismPart]" would contain individual OrganismPart terms. These terms may be user-defined (the default), from an external ontology source (indicated using a Term Source REF column), or a measurement (indicated using a Unit [] column).

Protocol REF

This column contains references to Protocol Names defined in the Reference file, or accession numbers of protocols already present in public repositories, such as ArrayExpress or PRIDE. Valid qualifying headers for a Protocol REF item are Parameter Value [], Performer, Date, and Comment [] (which is optional).

Factor Value []

The Factor Value column provides the actual values of the independent variables manipulated by the experimentalists. These study variables, known as Factors, should have been declared in the Reference file, and the Factor Name should be repeated between the square brackets of the column header. To fully qualify the Factor Values, the following headers can be used to refine annotation and description: Term Source REF, Unit and Unit Term Source REF. The latter two fields enable full description of numerical values. Finally, a Factor Value should match either a biomaterial Characteristics value or a Protocol Parameter.
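The bracketed column headings used in this section can be split into a heading kind and the name between the square brackets. The sketch below is one possible approach using a regular expression; the function name `parse_header` and the decision to accept headings with or without a space before the bracket are assumptions made for illustration.

```python
import re

# Heading kinds that carry a name in square brackets, as described in this section.
HEADER = re.compile(
    r"^(Characteristics|Factor Value|Parameter Value|Comment)\s*\[(.*)\]$"
)

def parse_header(header):
    """Return (kind, bracketed name), or (header, None) for plain headings."""
    text = header.strip()
    match = HEADER.match(text)
    if match is None:
        return text, None
    return match.group(1), match.group(2).strip()

parsed = [
    parse_header(h)
    for h in ("Source Name", "Characteristics[Material Type]", "Factor Value [Time]")
]
```

A tool could then check that each name parsed from a "Factor Value" heading corresponds to a Factor Name declared in the Reference file.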

3.4 Assay file

In this file the fields are grouped in two sections. In the First Section the fields are organized on a per-column basis, and in the Second Section on a per-row basis.

3.4.1 First Section

Study Identifier

This allows cross-referencing the Study file to ensure efficient information tracking.

Assay Measurement/Endpoint Type

This field qualifies the endpoint, i.e. what is being measured. Examples are gene expression, protein expression, methylation status, hepatic function and DNA damage. Terms for this field should come from OBI.

Assay Measurement/Endpoint Term Source REF

Source REF must match one of the Term Source Names declared in the Ontology Source Section of the Reference file.

Technology Type

Describes the kind of technology used to perform an assay. Examples are DNA microarray hybridization, which can be used to monitor gene expression or genotype, and mass spectrometry, which can be used for protein identification or metabolite profiling. Terms for this field should come from OBI.

Technology Type Term Source REF

Source REF must match one of the Term Source Names declared in the Ontology Source Section of the Reference file.

Contacts

The contact should be declared in the Contact section of the Reference file and referenced here.

3.4.1 Second Section

This section depends on the Assay Measurement/Endpoint Type and Technology Type fields. The following subsections provide a list of fields focusing on microarray, gel electrophoresis and mass spectrometry. However, additional work is required to complete this section and to define the format of the resulting data matrices for the different technologies, as has been done for microarrays (FGDM in MAGE-TAB) and for peptide and protein identification.

3.4.1.1 Technology Type: DNA microarray

When dealing with DNA microarray technology, the allowed fields essentially correspond to those defined by the MAGE-TAB format (3).

Extract Name

Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Extract material. Valid qualifying headers for an Extract item are Characteristics [], Material Type, Description and Comment [].

Labeled Extract Name

Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Labeled Extract material.

Valid qualifying headers for a Labeled Extract item are Label (which is mandatory) and Characteristics [], Material Type, Description and Comment [] (which are optional).

Material Type

Used as an attribute column following Source Name, Sample Name, Extract Name, or Labeled Extract Name. This column contains terms describing the type of each material. Examples: whole_organism, organism_part, cell, total_RNA. The valid qualifying header for the Material Type item is Term Source REF. Terms for this field should come from OBI (for ArrayExpress submissions this term should be an instance of MaterialType from the MGED Ontology).

Label

Used as an attribute column following Labeled Extract Name to indicate the compound linked to an Extract to create the Labeled Extract. Examples: Cy3, Cy5, biotin, alexa_546.

The valid qualifying header for the Label item is Term Source REF. Terms for this field should come from OBI (for ArrayExpress submissions this term should be an instance of LabelCompound from the MGED Ontology).

Hybridization Name

Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Hybridization. Valid qualifying headers for the Hybridization Name item are Array Design REF (which is mandatory) and Comment [] (which is optional).

Array Design REF

This column contains references to the array design used for individual hybridizations. For ArrayExpress submissions this should be a valid accession number, e.g. "A-AFFY-33", but for the purpose of data exchange it should be an unambiguous name, such as the commercial name HG-U133A-2 in the case of an Affymetrix array.

The valid qualifying header for the Array Design REF item is Comment [] (which is optional).

The values in this field are used as identifiers. They must match the references provided in the array description file (ADF). Submitters, curators and software tools should consider the use of public accessions for this value.

Scan Name

Used as an identifier within the ISA-TAB document. This optional column contains user-defined names for each Scan event. Valid qualifying headers for the Scan Name item are Description and Comment [] (which are optional).

Image File

This optional column contains a list of image files, one for each row of the Assay file, linking these image files to their respective hybridizations. Note that ArrayExpress does not store image data due to size constraints on the database. However, in the context of an infrastructure intending to use the format for data exchange purposes, this column is valuable for including links to image files stored on a local web server. The valid qualifying header for the Image File item is Comment [] (which is optional).

Normalization Name

Used as an identifier within the ISA-TAB document. This optional column contains user-defined names for each Normalization event. Valid qualifying headers for the Normalization Name item are Description and Comment [] (which are optional).

Array Data File

This column contains a list of raw data files, one for each row of the Assay file, linking these data files to their respective hybridizations. The valid qualifying header for the Array Data File item is Comment [] (which is optional).

Derived Array Data File

This column contains a list of processed data files, one for each row of the Assay file, linking these data files to their respective hybridizations. The valid qualifying header for the Derived Array Data File item is Comment [] (which is optional).

Array Data Matrix File

This column contains a list of raw data matrix files, where data from multiple hybridizations are stored in a single file and mapped to individual hybridizations via the Data Matrix format itself. The valid qualifying header for the Array Data Matrix File item is Comment [] (which is optional).

Derived Array Data Matrix File

This column contains a list of processed data matrix files, where data from multiple hybridizations is stored in a single file, and the data mapped to each hybridization (or scan, or normalization) via the Data Matrix format itself.

Valid qualifying header for Derived Array Data Matrix File item is Comment [] (which is optional)
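The qualifying-header rules given for the columns above lend themselves to an automated check of an Assay file header row. The sketch below, in Python, transcribes those rules into a lookup table and reports unexpected or missing qualifiers. The function names are hypothetical, Term Source REF columns are skipped because they qualify the attribute immediately before them, and the Factor Value entry anticipates the qualifiers listed for that column. This is an illustration under those assumptions, not a complete validator.

```python
import re

# (mandatory, optional) qualifying headers per column, transcribed from the
# field descriptions of the DNA microarray subsection.
MATERIAL = {"Characteristics", "Material Type", "Description", "Comment"}
QUALIFIERS = {
    "Extract Name": (set(), MATERIAL),
    "Labeled Extract Name": ({"Label"}, MATERIAL),
    "Hybridization Name": ({"Array Design REF"}, {"Comment"}),
    "Scan Name": (set(), {"Description", "Comment"}),
    "Normalization Name": (set(), {"Description", "Comment"}),
    "Factor Value": (set(), {"Unit", "Unit Term Source REF"}),
}
for file_column in ("Image File", "Array Data File", "Derived Array Data File",
                    "Array Data Matrix File", "Derived Array Data Matrix File"):
    QUALIFIERS[file_column] = (set(), {"Comment"})


def base_name(header):
    """Strip a bracketed suffix: 'Characteristics [Organism]' -> 'Characteristics'."""
    return re.sub(r"\s*\[.*\]\s*$", "", header).strip()


def check_qualifiers(headers):
    """Return a list of problems found in one header row."""
    groups = []  # (column name, [qualifier base names that follow it])
    for header in headers:
        name = base_name(header)
        if name == "Term Source REF":
            continue  # qualifies the preceding attribute; not checked here
        if name in QUALIFIERS:
            groups.append((name, []))
        elif groups:
            groups[-1][1].append(name)
    problems = []
    for column, found in groups:
        mandatory, optional = QUALIFIERS[column]
        for name in found:
            if name not in mandatory | optional:
                problems.append(f"{column}: unexpected qualifier {name}")
        for name in sorted(mandatory - set(found)):
            problems.append(f"{column}: missing mandatory {name}")
    return problems


# Hypothetical header row in which Array Design REF has been forgotten.
row = ["Extract Name", "Material Type", "Term Source REF",
       "Labeled Extract Name", "Label", "Hybridization Name",
       "Array Data File", "Comment [md5]"]
print(check_qualifiers(row))
```

For the sample row the only problem reported is the missing mandatory Array Design REF after Hybridization Name.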

Factor Value []

The Factor Value column provides the actual values of the independent variables manipulated by the experimentalists. These study variables, also known as Factors, should have been declared in the Reference file, and the Factor Name should be recalled between the square brackets of the column header. To fully qualify the Factor Values, the following headers can be used to refine annotation and description: Term Source REF, Unit and Unit Term Source REF. The latter two fields enable full description of numerical values. For usability purposes, Factor Values declared in the Study file could be cascaded to the Assay file to facilitate association between data files and Factor Values.
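The bracket convention for Factor Value headers can be illustrated with a short Python sketch. It extracts the Factor Name from each "Factor Value [...]" header and lists the names that were not declared in the Reference file. The function names, the header row and the declared Factor list are assumptions made for illustration only.

```python
import re

FACTOR_HEADER = re.compile(r"^Factor Value\s*\[(?P<name>[^\]]*)\]$")


def factor_names(headers):
    """Return the Factor Names recalled in 'Factor Value [...]' column headers."""
    names = []
    for header in headers:
        match = FACTOR_HEADER.match(header.strip())
        if match:
            names.append(match.group("name").strip())
    return names


def undeclared_factors(headers, declared_factor_names):
    """Return Factor Names used in headers but absent from the declared list."""
    declared = set(declared_factor_names)
    return [name for name in factor_names(headers) if name not in declared]


# Hypothetical header row; only 'dose' is declared in the Reference file.
headers = ["Hybridization Name", "Factor Value [dose]", "Unit",
           "Unit Term Source REF", "Factor Value [compound]", "Term Source REF"]
print(factor_names(headers))                  # prints ['dose', 'compound']
print(undeclared_factors(headers, ["dose"]))  # prints ['compound']
```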

3.4.1.2 Technology Type: Gel Electrophoresis

Extract Name

Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Extract material. Valid qualifying headers for Extract item are Characteristics[], Material Type, Description, Comment[]

Labeled Extract Name (where relevant)

Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Labeled Extract. Valid qualifying headers for LabeledExtract item are Label (which is mandatory) and Characteristics [], Material Type, Description, Comment [].

Material Type

Used as an attribute column following Sample Name, Extract Name, or Labeled Extract Name. This column contains terms describing the type of each material. Examples: whole_organism, organism_part, cell, fraction.

Label

Used as an attribute column following Labeled Extract Name to indicate the compound linked to an Extract to create the Labeled Extract. Examples: Cy2, Cy3, Cy5. A valid qualifying header for Label item is Comment []

Electrophoresis Gel Name

Used as an identifier within the ISA-TAB document. This column contains user-defined names for each electrophoresis gel. Valid qualifying headers for the Electrophoresis Gel Name item are Protocol, Parameter [] and Comment [] (which is optional). For specific 2D applications, the following headers can be used instead:

[First Dimension Gel=Isoelectrofocusing]

[Second Dimension Gel=SDS-PAGE]

Scan Name

Used as an identifier within the ISA-TAB document. This optional column contains user-defined names for each Scan event. Valid qualifying headers for Scan Name item are Protocol, Parameter [], Performer, Date and Comment [] (which is optional)

Normalization Name

Used as an identifier within the ISA-TAB document. This optional column contains user-defined names for each Normalization event. Valid qualifying headers for Normalization Name item are Protocol, Parameter [], Performer, Date and Comment [] (which is optional)

Image File

This optional column contains a list of image files, one for each row of the Assay file, linking these image files to their respective electrophoresis events.

Raw Data File

This column contains a list of raw data files, one for each row of the Assay file, linking these data files to their respective gels. The valid qualifying header for the Raw Data File item is Comment [] (which is optional).

Processed Data File

This column contains a list of processed data files, one for each row of the Assay file, linking these data files to their respective gel runs. The valid qualifying header for the Processed Data File item is Comment [] (which is optional).

Spot Picking File

This column contains the names of files holding protein spot coordinates and metadata for use by spot picking instruments, typically for downstream analysis by mass spectrometry.

Factor Value []

The Factor Value column provides the actual values of the independent variables manipulated by the experimentalists. These study variables, also known as Factors, should have been declared in the Reference file, and the Factor Name should be recalled between the square brackets of the column header. To fully qualify the Factor Values, the following headers can be used to refine annotation and description: Term Source REF, Unit and Unit Term Source REF. The latter two fields enable full description of numerical values. For usability purposes, Factor Values declared in the Study file could be cascaded to the Assay file to facilitate association between data files and Factor Values.
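This draft does not specify how Factor Values would be cascaded from the Study file to the Assay file. One possible reading is sketched below in Python, under the assumption that rows of both files have been read into dictionaries and share a key column, here taken to be Sample Name. The function name, the choice of key and the sample rows are hypothetical, and qualifying columns such as Unit are ignored for brevity.

```python
def cascade_factor_values(study_rows, assay_rows, key="Sample Name"):
    """Copy 'Factor Value [...]' cells from Study rows onto matching Assay rows."""
    factors_by_key = {}
    for row in study_rows:
        factors_by_key[row[key]] = {
            column: value
            for column, value in row.items()
            if column.startswith("Factor Value")
        }
    cascaded = []
    for row in assay_rows:
        merged = dict(row)
        # Values already present in the Assay row take precedence.
        for column, value in factors_by_key.get(row.get(key), {}).items():
            merged.setdefault(column, value)
        cascaded.append(merged)
    return cascaded


# Hypothetical Study and Assay rows.
study = [
    {"Sample Name": "S1", "Factor Value [dose]": "10"},
    {"Sample Name": "S2", "Factor Value [dose]": "0"},
]
assay = [
    {"Sample Name": "S1", "Electrophoresis Gel Name": "gel1", "Raw Data File": "gel1.raw"},
]
print(cascade_factor_values(study, assay))
```

The input rows are left unchanged; each returned Assay row is a copy that carries the Factor Values of its sample alongside the data file names.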

3.4.1.3 Technology Type: Mass Spectrometry

Extract Name

Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Extract material. Valid qualifying headers for Extract item are Characteristics [], Material Type, Description, Comment []

Labeled Extract Name (where relevant)

Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Labeled Extract material. Valid qualifying headers for LabeledExtract item are Label (which is mandatory) and Characteristics [], Material Type, Description, Comment [] (which are optional)

Material Type

Used as an attribute column following Source Name, Sample Name, Extract Name, or Labeled Extract Name. This column contains terms describing the type of each material. Examples: column fraction, gel excised spot.

Label

Used as an attribute column following Labeled Extract Name to indicate the compound linked to an Extract to create the Labeled Extract. Examples: P33, C14. A valid qualifying header for the Label item is Term Source REF.

Mass Spectrometry Run Name

Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Mass Spectrometry run. Valid qualifying headers are Protocol, Parameter [], Performer, Date and Comment [] (which is optional).

The following columns can be used to annotate Mass Spectrometry Run Name columns:

Analyzer type

A field to report additional information about the analyzer. Terms should come from a controlled terminology. A valid qualifying header is Term Source REF.

Detector

A field to report additional information about the detector. Terms should come from a controlled terminology. A valid qualifying header is Term Source REF.

Raw Spectral Data File

This column contains a list of raw data files, one for each row of the Assay file.

Processed Spectral Data File

This column contains a list of processed data files, one for each row of the Assay file.

Normalized Spectral Data File

This column contains a list of normalized data files, one for each row of the Assay file.

When Mass Spectrometry is used in proteomics, the following information will be required.

Peptides File

This data file should be formatted following the PSI-MS specifications and according to PRIDE submission requirements (8).

Protein File

This data file should be formatted following the PSI-MS specifications and according to PRIDE submission requirements (8).

PTMs File

This data file should be formatted following the PSI-MS specifications and according to PRIDE submission requirements (8).

PTM Codes File

This data file should be formatted following the PSI-MS specifications and according to PRIDE submission requirements (8).

Factor Value []

The Factor Value column provides the actual values of the independent variables manipulated by the experimentalists. These study variables, also known as Factors, should have been declared in the Reference file, and the Factor Name should be recalled between the square brackets of the column header. To fully qualify the Factor Values, the following headers can be used to refine annotation and description: Term Source REF, Unit and Unit Term Source REF. The latter two fields enable full description of numerical values. For usability purposes, Factor Values declared in the Study file could be cascaded to the Assay file to facilitate association between data files and Factor Values.

When Mass Spectrometry is used in metabol/nomics, a list of requirements is under development.
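To make the column layout of this subsection concrete, the following Python sketch writes a small Mass Spectrometry Assay table as tab-delimited text. The column headers are taken from the field descriptions above, but their order, the row values, the file names and the "PSI-MS" Term Source Name are all invented for illustration and should not be read as requirements of the format.

```python
import csv
import io

# Headers from this subsection; the ordering shown here is an assumption.
HEADERS = [
    "Extract Name",
    "Material Type",
    "Mass Spectrometry Run Name",
    "Analyzer type",
    "Term Source REF",
    "Detector",
    "Term Source REF",
    "Raw Spectral Data File",
    "Processed Spectral Data File",
    "Factor Value [dose]",
    "Unit",
]

# Hypothetical row: every value below is made up.
ROWS = [
    ["extract1", "column fraction", "run1", "time-of-flight", "PSI-MS",
     "microchannel plate detector", "PSI-MS", "run1.raw",
     "run1_processed.txt", "10", "mg/kg"],
]


def write_assay_table(headers, rows):
    """Serialize a header row and data rows as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(headers)
    for row in rows:
        if len(row) != len(headers):
            raise ValueError("row length does not match header row")
        writer.writerow(row)
    return buffer.getvalue()


print(write_assay_table(HEADERS, ROWS))
```

Rows are kept as lists rather than dictionaries because a header such as Term Source REF can legitimately appear more than once, each occurrence qualifying the column immediately before it.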

3.4.1.4 Technology Type: NMR Spectroscopy

Extract Name

Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Extract material. Valid qualifying headers for Extract item are Characteristics [], Material Type, Description, Comment []

Material Type

Used as an attribute column following Source Name, Sample Name, Extract Name, or Labeled Extract Name. This column contains terms describing the type of each material. Examples: whole_organism, organism_part, cell, protein_extract. The valid qualifying header for the Material Type item is Term Source REF.

NMR Run Name

Used as an identifier within the ISA-TAB document. This column contains user-defined names for each NMR run. The following columns can be used to annotate NMR Run Name columns:

Instrument

This column contains a reference to instruments declared in the protocol section.

Free Induction Decay Data File

This column contains a list of raw data files, one for each row of the Assay file. The valid qualifying header for the Free Induction Decay Data File item is Comment [] (which is optional). Refer to the Annex for a list of data formats generated by the most commonly used NMR instruments.

Acquisition Parameter Data File [NMR pulse sequence]

This column contains a list of files detailing acquisition parameters; in particular, a file containing the NMR pulse sequence must be provided. Refer to the Annex for a list of data formats generated by the most commonly used NMR instruments.

Processed Spectral Data File

This column contains a list of processed spectral data files, one for each row of the Assay file. The valid qualifying header for the Processed Spectral Data File item is Comment [] (which is optional).

Normalized Spectral Data File

This column contains a list of normalized spectral data files, one for each row of the Assay file. The valid qualifying header for the Normalized Spectral Data File item is Comment [] (which is optional).

Factor Value []

The Factor Value column provides the actual values of the independent variables manipulated by the experimentalists. These study variables, also known as Factors, should have been declared in the Reference file, and the Factor Name should be recalled between the square brackets of the column header. To fully qualify the Factor Values, the following headers can be used to refine annotation and description: Term Source REF, Unit and Unit Term Source REF. The latter two fields enable full description of numerical values. For usability purposes, Factor Values declared in the Study file could be cascaded to the Assay file to facilitate association between data files and Factor Values.

When NMR Spectroscopy is used in metabol/nomics, the requirements for the final list of metabolites (identification and quantification) are under development.

4 Annex

Partial list of NMR instrument output formats that would be valid for reporting NMR spectroscopy raw data (FID) and acquisition metadata.

|Vendor Data Format |Application |Spectrum File |Required Parameter Files |Optional Parameter Files |
|Bruker UXNMR/XWIN-NMR |1D NMR |fid, 1r |acqus, procs |title, intrng |
|Bruker UXNMR/XWIN-NMR |2D NMR |ser, 2rr |acqus, acqu2s, procs, proc2s |title |
|JCAMP-DX |1D NMR |*.dx; *.jdx |- |- |
|JCAMP-DX |2D NMR |*.dx; *.jdx |- |- |
|JEOL EX/GX |1D NMR |*.gxd |*.gxp |- |
|JEOL AL95 |2D NMR |*.als |- |- |
|JEOL Alpha |2D NMR |*.nmfid, *.nmf, *.nmdata, *.nmd |- |- |
|Varian FDF |1D NMR |*.fdf |procpar |- |
|Varian FDF |2D NMR |fid0001.fdf |procpar |- |
|Varian VNMR peaks |2D NMR |*.txt |- |- |
|Varian VNMR |1D NMR |fid, data, phasefile |procpar |text |
|Varian VNMR |2D NMR |fid, phasefile |procpar |text |
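The Annex table can serve as a lookup when checking that an NMR submission includes the parameter files required for its data format. The Python sketch below transcribes a subset of the table and reports which required parameter files are absent from a list of file names. The function name and the sample file list are hypothetical, and a blank vendor cell in the table is read as repeating the vendor of the row above.

```python
from fnmatch import fnmatchcase

# Required parameter files per (vendor data format, application), transcribed
# from a subset of the Annex table. Entries are literal names or glob patterns.
REQUIRED_PARAMETER_FILES = {
    ("Bruker UXNMR/XWIN-NMR", "1D NMR"): ["acqus", "procs"],
    ("Bruker UXNMR/XWIN-NMR", "2D NMR"): ["acqus", "acqu2s", "procs", "proc2s"],
    ("JEOL EX/GX", "1D NMR"): ["*.gxp"],
    ("Varian VNMR", "1D NMR"): ["procpar"],
    ("Varian VNMR", "2D NMR"): ["procpar"],
}


def missing_parameter_files(vendor_format, application, file_names):
    """Return required parameter file patterns matched by none of file_names."""
    required = REQUIRED_PARAMETER_FILES[(vendor_format, application)]
    return [
        pattern
        for pattern in required
        if not any(fnmatchcase(name, pattern) for name in file_names)
    ]


# Hypothetical Bruker 2D submission lacking the second-dimension files.
submitted = ["ser", "acqus", "procs"]
print(missing_parameter_files("Bruker UXNMR/XWIN-NMR", "2D NMR", submitted))
# prints ['acqu2s', 'proc2s']
```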

[Figure: structure of an ISArchive. It comprises an Investigation file, a Reference file, Study files (Study 1, Study 2) each with their Assay(s), an ADF (for microarray applications only) and all data files resulting from Assays, including raw, processed and normalized data files. The Investigation file is optional and only required to group two or more related Studies, as in this example.]
