Introduction: - Duke University



Supplementary Material for:

GSEA: A Gene Set Approach to Analyzing Molecular Profiles

Author List

February 7, 2005.

Contents of Supplemental Material:

1 Additional figures and tables for examples

2 Detailed description of the GSEA method

2.1 Description of complementary statistics

2.2 Description of GSEA output

2.3 Theoretical properties of gene tag and sample label permutation

3 Additional applications of GSEA

3.1 Diabetes

3.2 Downs

4 Summary of top enrichment results for examples in paper

4.1 Gender S1

4.2 Gender S2

4.3 P53 S2

4.4 P53 S3

4.5 Leukemia S1

4.6 Leukemia S2

4.7 Lung A S2

4.8 Lung B S2

5 Defining gene sets and gene set databases

6 Running GSEA with the GSEAPACK R package

7 Running GSEA under GenePattern

1 Additional figures and tables for examples

This section includes supplemental figures SF0-SF2 and tables ST1-ST5 not included in the main body of the paper.


Figure SF0. This figure compares the empirical null distributions for 5 selected gene sets (Diabetes example) before and after scaling normalization. The normalization divides each null (and observed) ES by the mean of that gene set's positive or negative null scores, according to the score's sign. This aligns the null distributions of gene sets of different sizes before multiple hypothesis testing, and is motivated by the asymptotic multiplicative scaling of the Kolmogorov-Smirnov distribution as a function of gene set size.
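A minimal Python sketch of this sign-dependent scaling for a single gene set, assuming the observed ES and the permutation null ES values are already available as numbers. The function name `normalize_scores` is hypothetical, and treating a score of exactly zero as positive is an assumption of the sketch, not something stated in this supplement.

```python
import numpy as np

def normalize_scores(observed_es, null_es):
    """Scale the observed and null enrichment scores of one gene set.

    Positive scores are divided by the mean of the positive null scores;
    negative scores are divided by the absolute mean of the negative null
    scores, so every score keeps its sign. (Zero is treated as positive:
    an assumption of this sketch.)
    """
    null_es = np.asarray(null_es, dtype=float)
    pos_mean = null_es[null_es >= 0].mean()
    neg_mean = np.abs(null_es[null_es < 0].mean())

    def scale(x):
        x = np.asarray(x, dtype=float)
        return np.where(x >= 0, x / pos_mean, x / neg_mean)

    return scale(observed_es), scale(null_es)
```

After scaling, the positive null scores of every gene set average to 1 and the negative ones to -1, which is what lets null distributions from sets of different sizes be pooled. If a gene set has no null scores of one sign, the corresponding mean is undefined (NumPy returns NaN); the sketch does not handle that case.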


Figure SF1. This figure compares the empirical null and observed score distributions in the Diabetes example, before and after normalization (i.e., scaling so that the areas under the positive and the negative parts of each density are each equal to one), for a randomly generated collection of 1000 gene sets (top) and for the functional gene sets of the S2 database (bottom). The random gene sets obtain roughly equal numbers of positive and negative enrichment scores, so normalizing positive and negative scores separately makes little difference. In contrast, with the S2 gene sets a larger number of sets attain negative scores, reflecting the fact that curated and experimental gene sets are not necessarily balanced with respect to every phenotype. Normalizing positive and negative scores independently helps to reduce this natural imbalance when observed and null score distributions are compared. A similar imbalance can arise from the data itself, e.g., when the numbers of genes whose expression is positively and negatively correlated with the phenotype are unequal.
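The sign-separated normalization in this caption can be illustrated with a short Python sketch that builds a histogram whose negative and positive halves each integrate to one. The function name `sign_normalized_histogram` is hypothetical, the sketch assumes the bin edges include 0 so that no bin straddles both signs, and it is not the plotting code used to produce the figure.

```python
import numpy as np

def sign_normalized_histogram(scores, bins):
    """Histogram whose negative and positive halves each have unit area.

    Assumes `bins` contains an edge at 0, so each bin is entirely
    negative or entirely non-negative.
    """
    counts, edges = np.histogram(np.asarray(scores, dtype=float),
                                 bins=np.asarray(bins, dtype=float))
    widths = np.diff(edges)
    centers = (edges[:-1] + edges[1:]) / 2
    density = np.zeros_like(widths)
    # Normalize each sign separately; an empty side stays all zeros.
    for side in (centers < 0, centers >= 0):
        area = (counts[side] * widths[side]).sum()
        if area > 0:
            density[side] = counts[side] / area
    return density, edges
```

By contrast, `np.histogram(..., density=True)` scales the whole histogram to unit area, so a surplus of negative scores would show up as a taller negative half instead of being factored out.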


Figure SF3. This figure shows the enrichment plots for the chr21 gene set in the Down syndrome dataset using the un-weighted (p=0), weighted (p=1) and over-weighted (p=2) enrichment statistics. We used GSEA to analyze gene expression profiles from the bone marrow of individuals with Down syndrome (DS, n=14) and of control individuals (n=25) [Aravind add ref]. When we probe the dataset with GSEA using the un-weighted (p=0) statistic and the set of all 243 genes on chr21, we find a small enrichment signal. For p=1 the enrichment score is much higher, but the set is still not significant after adjusting for multiple hypothesis testing (FDR = 0.8). For p=2 the set achieves an even higher score and significance (FDR ...
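The role of the exponent p can be made concrete with a sketch of the weighted running-sum statistic in the form used in the published GSEA method: walking down the ranked list, each gene in the set adds |r_j|^p (normalized so the hits sum to 1), where r_j is the gene's correlation with the phenotype, and each gene not in the set subtracts 1/(N - N_H). This is an illustrative Python sketch, not the GSEAPACK implementation; the function name `enrichment_score` and its arguments are hypothetical, and the input is assumed to be already sorted by decreasing correlation.

```python
import numpy as np

def enrichment_score(correlations, in_set, p=1.0):
    """Weighted running-sum enrichment score for one gene set.

    correlations: per-gene correlation with the phenotype, sorted in
                  decreasing order.
    in_set:       boolean mask marking the genes that belong to the set.
    p:            weight exponent (0 = un-weighted, 1 = weighted,
                  2 = over-weighted).
    """
    r = np.asarray(correlations, dtype=float)
    hit = np.asarray(in_set, dtype=bool)
    n, n_hit = r.size, hit.sum()
    weights = np.abs(r) ** p  # with p = 0 every gene gets weight 1
    # Hit steps are proportional to |r|^p and sum to 1 over the set.
    hit_steps = np.where(hit, weights, 0.0) / weights[hit].sum()
    # Miss steps are 1 / (N - N_H) and sum to 1 over the non-set genes.
    miss_steps = np.where(hit, 0.0, 1.0 / (n - n_hit))
    running = np.cumsum(hit_steps - miss_steps)
    # ES is the running-sum value farthest from zero, keeping its sign.
    return running[np.argmax(np.abs(running))]
```

On a toy ranked list with correlations [0.9, 0.5, 0.1, -0.2, -0.8] and set members at positions 1 and 3, the score rises from about 0.67 at p=0 to 0.9 at p=1 and about 0.99 at p=2, because larger p lets strongly correlated members dominate the running sum, in line with the trend described for chr21.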
