Dose-Response Analysis of RNA-Seq Profiles in Archival ...



Dose-Response Analysis of RNA-Seq Profiles in Archival Formalin-Fixed ParaffinEmbedded (FFPE) Samples Susan D. Hester*, Virunya Bhat?, Brian N. Chorley*, Gleta Carswell*, Wendell Jones?, Leah C. Wehmas*, Charles E. Wood*§ *National Health and Environmental Effects Research Laboratory, U.S. Environmental Protection Agency, Research Triangle Park, NC; ?NSF International, Ann Arbor, MI; ?Genomics-Bioinformatics, Expression Analysis Genomic Services, Q2 Solutions, Durham, NC §To whom correspondence should be addressed at MD-B105-03, 109 T.W. Alexander Drive, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27709 USA; Email: wood.charles@ Running Title: RNA-seq dose-response analysis in FFPE samples Key Words: RNA-seq, transcriptomics, dose-response, FFPE, benchmark dose, archival resources ABSTRACT [250 out of 250] Use of archival resources has been limited to date by inconsistent methods for genomic profiling of degraded RNA from formalin-fixed paraffin-embedded (FFPE) samples. RNA-sequencing offers a promising way to address this problem. Here we evaluated transcriptomic dose responses using RNA-sequencing in paired FFPE and frozen (FROZ) samples from two archival studies in mice, one <2 years old and the other >20 years old. Experimental treatments included 3 different doses of di(2-ethylhexyl)phthalate or dichloroacetic acid for the recently archived and older studies, respectively. Total RNA was ribo-depleted and sequenced using the Illumina HiSeq platform. In the recently archived study, FFPE samples had 35% lower total counts compared to FROZ samples but high concordance in fold-change values of differentially expressed genes (DEGs) (r2 = 0.99), highly enriched pathways (90% overlap with FROZ), and benchmark dose estimates for preselected target genes (2% difference vs FROZ). In contrast, older FFPE samples had markedly lower total counts (3% of FROZ) and poor concordance in global DEGs and pathways. However, counts from FFPE and FROZ samples still positively correlated (r2 = 0.84 across all transcripts) and showed comparable dose responses for more highly expressed target genes. These findings highlight potential applications and issues in using RNA-sequencing data from FFPE samples. Recently archived FFPE samples were highly similar to FROZ samples in sequencing quality metrics, DEG profiles, and dose-response parameters, while further methods development is needed for older lower-quality FFPE samples. This work should help advance the use of archival resources in chemical safety and translational science. Abbreviations: Akaike’s Information Criterion (AIC); Analysis of variance (ANOVA); adverse outcome pathway (AOP); transcriptional Benchmark Dose (BMDT); apical Benchmark Dose (BMDA); dichloroacetic acid (DCA); differentially expressed gene (DEG); di(2-ethylhexyl) phthalate (DEHP); Environmental Protection Agency (EPA); Expression Analysis (EA); External RNA Controls Consortium (ERCC); false discovery rate (FDR); formalin-fixed paraffin-embedded (FFPE); fresh-frozen (FROZ); Gene-Specific Analysis (GSA); Ingenuity Pathway Analysis (IPA); mode of action (MOA); quantitative polymerase chain reaction (qPCR); principal component analysis (PCA); palmitoyl-CoA oxidase (PCO); peroxisome proliferator-activated receptor-alpha (PPARα); RNA Integrity Number (RIN); RNA-sequencing (RNA-seq); reads per kilobase of transcript per million mapped reads (RPKM); ribosomal RNA (rRNA); RNA-Seq by Expectation Maximization (RSEM); Spliced Transcripts Alignment to a Reference (STAR) INTRODUCTION High-throughput whole-genome sequencing is a rapidly advancing field that provides unique opportunities in toxicological, translational, and clinical science. One of the most promising new applications of this technology is genomic profiling of archival samples. Biorepositories have immense potential value in characterizing molecular targets of environmental chemicals and pharmaceuticals without the need for de novo studies. Most samples in tissue banks are stored as formalin-fixed paraffin-embedded (FFPE) blocks. It has been estimated that over 1 billion FFPE samples exist globally across government, academic, industry, medical and veterinary school, and hospital archives (Blow 2007) and that more than 20 million FFPE samples are archived annually just in the U.S. (Waldron et al. 2012). To date, however, use of these resources for genomic research has been limited by a lack of methods for generating reliable quantitative data (Bass et al. 2014; Greytak et al. 2015; von Ahlfen et al. 2007). The major issue in FFPE genomics is RNA quality. The standard fixative used for preserving tissue samples is formaldehyde, which may alter nucleic acids through fragmentation, protein cross-links, adducts, and other chemical modifications (Adiconis et al. 2013; Bass et al. 2014; Greytak et al. 2015; Kashofer et al. 2013; Turashvili et al. 2012; Evers et al. 2011a; Karmakar et al. 2015). Tissue processing and embedding can further degrade RNA from FFPE tissues (Evers et al. 2011b). The cumulative effect is a highly fragmented and potentially “non-functional” mix of RNA species with low integrity, typically < 2.5 using the RNA Integrity Number (RIN) index (Webster et al. 2015). Importantly, the degree of RNA degradation in FFPE tissues can vary widely based on tissue type, time to fixation, fixation protocol, and age of the FFPE block, despite similar RIN values (Webster et al. 2015; Greytak et al. 2015; Ludyga et al. 2012; Ribeiro-Silva et al. 2007). Global transcriptomic sequencing (RNA-seq) offers a novel way to address many of the problems associated with degraded RNA. For high-quality fresh-frozen (FROZ) samples, RNAseq has been shown to provide a potentially more quantitative, unbiased, and comprehensive approach to measuring transcript levels compared to quantitative polymerase chain reaction (qPCR) and chip-based genomic analyses (Wang et al. 2014; Zhao et al. 2014). RNA-seq also offers distinct advantages when using fragmented RNA from FFPE tissue samples, as shown in several recent studies (Webster et al. 2015; Auerbach et al. 2014; Hedegaard et al. 2014). This work has established that RNA-seq can provide genomic profiles in at least some types of FFPE samples that are highly concordant with paired FROZ samples (Webster et al. 2015; Mittempergher et al. 2011; Zhao et al. 2014). However, prior analyses used samples from only a single treatment group (toxicology studies) or from case surveys (clinical studies). To date there has not been an evaluation of RNA-seq data from FFPE samples using quantitative doseresponse metrics. In addition, it is not clear how age in block may affect the quality of RNA-seq data from older FFPE or FROZ samples. These are important issues given the need for more standardized approaches to evaluating transcriptional responses, potential differences that may exist across archival samples due to variability in fixation protocols and storage, and the fact that conventional methods to assess RNA quality do not distinguish between high- and low-quality FFPE RNA for sequencing. In a previous study, we examined the effects of formalin fixation time and RNA preparation methods on RNA-seq profiles from FFPE liver tissues (Webster et al. 2015). This work showed that RNA-seq following ribosomal RNA (rRNA) depletion can provide strong correlation in RNA profiles between recently collected FFPE and FROZ samples. Here we applied current RNA-seq methods to evaluate dose-related responses in FFPE liver samples. We examined transcriptomic responses for two reference agents, di-2-ethylhexyl phthalate (DEHP) and dichloroacetic acid (DCA), which induce distinctive hepatic effects in mice. Both chemicals disrupt classical pathways of mitochondrial oxidative phosphorylation and fatty acid metabolism (Corton et al. 2014; U.S. EPA 2003), leading to upregulation of defined sets of target genes that can be used in dose-response analyses, including transcriptional benchmark dose (BMDT) estimation (Bhat et al. 2013; Lake et al. 2016). Our study included both recent (< 2-year) and aged (> 20-year) samples, allowing us to also evaluate the effects of block age on RNA-seq metrics. The goal of this work was to inform new ways to access genomic information from archival samples. MATERIALS AND METHODS Study Design Archival samples for this work came from two short-term studies in mice, one recent (Study 1, < 2 years old) and one older (Study 2, ~21 years old). Target agents were DEHP (CAS No: 117-81-7; 99.5% purity, Chem Service, West Chester, PA) and DCA (CAS 79-43-6, Aldrich, Milwaukee, WI) for the recent and older studies, respectively. DEHP was administered in the feed for 7 days at dietary doses of 0, 1500 (1.5k), 3000 (3k), and 6000 (6k) ppm (as described in Lake et al. 2016), while DCA was administered in the drinking water for 6 days at doses of 0, 1.0, 2.0, and 3.5 g/L (as described in DeAngelo et al. 1991). Both studies were conducted in 4 week-old male B6C3F1 mice maintained under standard housing conditions at the U.S. Environmental Protection Agency (U.S. EPA) AAALAC-accredited animal facility (Research Triangle Park, NC). Animal care and use procedures were conducted under protocols approved by the Institutional Animal Care and Use Committee of the U.S. EPA. Paired FROZ and FFPE liver samples were collected in 2013 for the DEHP study and 1994 for the DCA study. A total of 80 sample pairs were used for Study 1 (n=16 FROZ, n=16 FFPE, n=4 per dose group) and Study 2 (n=24 FROZ, n=24 FFPE, n=6 per dose group). FFPE samples were fixed initially in either fresh 4% paraformaldehyde (i.e., methanol-free 10% formalin) (Study 1) or commercial 10% buffered formalin (Study 2) for ~18 hours followed by 70% ethanol. Both the paraformaldehyde and formalin solutions contained approximately the same amount of formaldehyde (3-4% weight/volume). After histologic processing (within 18 months of collection for both studies), FFPE blocks were either stored at room temperature in a research laboratory (Study 1) or at ambient temperature in a warehouse setting (U.S. EPA Archives in Arlington, VA) (Study 2). At the time of RNA isolation, tissue age in block was ~16 months for Study 1 and ~21 years for Study 2. Corresponding FROZ samples were collected in liquid nitrogen and stored at -70 to -80°C. FROZ samples from both studies came from left lateral, caudate, and right medial lobes (mixed), while FFPE blocks contained either left lateral, caudate, and right medial liver lobes (Study 1) or left lateral and right medial lobes (Study 2). RNA Isolation Total RNA was isolated from FROZ samples using procedures described previously (Lake et al. 2016). Briefly, ~20 mg of liver tissue per sample was homogenized in RNAzol?RT (Molecular Research Center, Cincinnati, OH), purified with the RNeasy MinElute column protocol (Qiagen GmbH, Hilden, Germany), evaluated for integrity using an Agilent 2100 Bioanalyzer (Agilent Technologies GmbH, Berlin, Germany), and quantitated using a Qubit fluorometer (ThermoFisher Scientific, Waltham, MA). For FFPE samples, two sections 10-μm in thickness were collected from each tissue block using a Leica RM 2155 microtome (Leica Biosystems, Buffalo Grove, IL); treated with deparaffinization solution, lysis buffer, and protease (Qiagen); incubated at 56°C for 15 minutes and 80°C for 15 minutes; and then treated with DNAse followed by ethanol. Samples were then transferred onto an RNeasy MinElute spin column (Qiagen GmbH, Hilden, Germany), and total RNA was eluted in 15 μl of buffer and quantitated as for FROZ samples. Library Preparation and Sequencing Total RNA was converted into cDNA libraries using the Illumina TruSeq Stranded Total RNA with Ribo-Zero Gold Library Prep Kit (#RS-122-2303, Illumina, San Diego, CA). Briefly, rRNA was removed from 100 ng of total RNA by hybridization to a biotinylated probe selective for rRNA species, followed by streptavidin bead binding, purification, and fragmentation by heat in the presence of divalent cations. Double-stranded cDNA was generated by synthesizing the second strand with dUTP in place of dTTP. A single ‘A’ base was added to the 3’ end to facilitate ligation of sequencing adapters, which contained a single ‘T’ base overhang. Adapterligated cDNA was amplified by polymerase chain reaction (PCR) to increase the amount of sequence-ready library. During this amplification the polymerase stalls when it encounters a ‘U’ base, rendering the second strand a poor template. The first strand thus serves as template for amplified sequence, preserving strand information. Final cDNA libraries were analyzed for size distribution and integrity using an Agilent Bioanalyzer (DNA 1000 kit #5067-1504), quantitated by qPCR (KAPA Library Quant Kit #KK4824, KAPA Biosystems, Wilmington, MA), and normalized to 2 nM in preparation for sequencing. Samples were sequenced at Expression Analysis (EA) (EA Genomic Services, Q2 Solutions – a Quintiles Quest Joint Venture, Durham, NC) using the Illumina HiSeq platform. Pooled libraries were bound to the surface of a 2x50bp-paired end flow cell, and each bound template molecule was clonally amplified up to 1000-fold to create individual clusters. Four fluorescentlylabeled nucleotides were then passed over the surface of the flow cell and incorporated into each nucleic acid chain. Each nucleotide label acts as a terminator for polymerization, ensuring that a single base is added to each nascent chain during each cycle. Fluorescence was measured during each cycle to identify the base that was added to each cluster. The dye is then enzymatically removed to allow incorporation of the next nucleotide during the subsequent cycle. Samples for Studies 1 and 2 were run in independent batches. RNA-Seq Analysis After sequencing, basecall files were converted into FASTQ output files using Illumina CASAVA software and other tools developed at EA (available at ). FASTQ data were demultiplexed, and sequencing adapters and other low quality bases were clipped to prepare reads for alignment. Reads were aligned to sequences from the External RNA Controls Consortium (ERCC) to assess the success of library construction and sequencing. A subset of the reads (~1 million) was aligned to other spiked-in control sequences (PhiX and other Illumina controls used during library preparation), residual sequences (globin and rRNA), and poly-A/T sequences that persisted after clipping. Reads were also aligned to a sampling of intergenic regions to assess the level of DNA contamination. Quality parameters for input RNA and RNA-seq data were evaluated for all samples (Table 1, Supplemental Table 1). One FROZ sample in Study 1 (1.5k ppm dose group) was identified as an outlier based on global principal component analysis (PCA) and highly variable genome coverage of RNA-seq data (> 4-fold higher than other samples) and dropped from subsequent analyses (Supplemental Fig. 1A). Two different RNA-seq analysis pipelines were used to generate data matrices, RNA-Seq by Expectation Maximization (RSEM) v9 (Li and Dewey 2011), which was run at EA, and Partek Flow NGS? v4.0.15 (Partek Inc., St. Louis, MO) (Li et al. 2010), which was run at EPA. Both pipelines used Spliced Transcripts Alignment to a Reference (STAR) v2.4 for alignment and Expectation-Maximization algorithm for quantitation of gene counts. Global PCA plots and dissimilarity dendrograms were based on log-transformed upper quartile-normalized gene counts generated by RSEM. All other transcript-level analyses were conducted using the Partek Flow pipeline. Briefly, clipped FASTQ files were aligned to the Mus musculus reference genome (GRCm38/mm10) and transcriptome (RefSeq transcript 2014-10-17). Differentially expressed genes (DEGs) were then assessed using Partek Gene-Specific Analysis (GSA) with a false discovery rate (FDR)-adjusted p value < 0.05. Reads were evaluated using both raw gene counts and reads per kilobase of exon per million mapped reads (RPKM)-normalized counts (Mortazavi et al. 2008; Li et al. 2014a) within the GSA algorithm. Subsequent analyses were used to evaluate genic distribution mapping and target genes responses, including regression plots, PCA by treatment group, pathway analyses, individual gene counts, and benchmark dose (BMD) analyses. Raw (i.e., non-RPKMnormalized) counts for Acot and Cyp4a series genes are presented in supplemental materials for reference. For correlation analysis of dose groups in Study 1, normalized counts were segregated into matrices for FROZ and FFPE pairs and reduced to those transcripts with average RPKM values > 10. Only transcripts remaining in both matrices were considered for further analysis. The log2 fold-change (FC) ratio (average treated to average control) was calculated for all common DEGs at each dose level and analyzed by simple regression between FROZ and FFPE samples. Pathways analysis of DEGs was conducted using Ingenuity Pathway Analysis (IPA) Knowledgebase (Qiagen Ingenuity, 2014). Full RNA-seq FASTQ datasets are available through the Gene Expression Omnibus (ncbi.nlm.geo/) Accession Number GSE78962. Dose-Response Modeling Benchmark dose analysis was used to estimate the threshold dose at which specific genes were differentially altered from controls. Transcriptional benchmark dose (BMDT) analysis of liver RNA-seq data was conducted using BMDExpress version 1.4.1 (Thomas et al. 2007; Yang et al. 2007) () with modification. Briefly, the estimated consumed intake (mg/kg-day) served as the dose metric for both DEHP and DCA. Normalized linear RNA read counts for Acot and Cyp4a series genes were fit as continuous data to the Hill, linear, 2° polynomial, and power dose response models. Each model was run assuming constant variance and the benchmark response factor was set to 1.349 multiplied by the standard deviation in the control animals to estimate a BMDT with a 10% increase in tail area (Thomas et al. 2007). For best-fit model selection, a nested likelihood ratio test was performed on the linear and polynomial models. If the more complex model provided a significantly improved fit (p < 0.05), the more complex model was selected. The simpler model was selected if the more complex model did not provide a significantly improved fit (Posada and Buckley 2004). Models with a global goodness-of-fit p value < 0.1 were excluded. To select the best globally-fitting model with the least complexity, the Akaike’s Information Criterion (AIC) for the selected polynomial model was then compared with the AIC for the Hill and power models. The model with the lowest AIC was selected as the final model. In cases where the ‘k’ slope parameter for the Hill model was out of bounds, the Hill model was excluded from the final selection. Unacceptable fits were considered probe sets with a BMDT value greater than the highest dose and/or a global goodness-of-fit p value < 0.1. Other Statistical Analyses Other data were analyzed using SAS statistical package v9.4 (SAS Institute, Cary, NC) and Microsoft Excel (Redmond, WA). Continuous data were first screened for normality and homogeneity of variance. Analysis of variance (ANOVA) was used to determine group differences for data meeting these criteria. For nonparametric data, group differences were evaluated using a Kruskal-Wallis ANOVA followed by Wilcoxon rank-sum pairwise tests. Pairwise comparisons included each treatment group versus the respective control group within each sample type (i.e., FROZ or FFPE). A Bonferroni multiple-comparison correction was applied to all pairwise p values to adjust for the number of direct comparisons within each treatment arm. Dose response trends in count number were evaluated using a linear-trend test. Counts between FFPE and FROZ conditions were analyzed using simple regression analysis. RESULTS Case Study 1: DEHP (< 2 year-old samples) RNA-seq quality metrics. Mean RIN values were higher for FROZ (7.5) compared to FFPE (2.2) samples (p < 0.01), while 260/280 ratios did not differ significantly by sample type (Table 1). Total clusters, reads, read quality and length, and genes detected were comparable between FFPE and FROZ samples (< 5% difference for all) (Table 1, Supplemental Table 1). The percent of reads aligned to either the transcriptome or genome was high for both sample types but significantly lower for FFPE (86%) compared to FROZ (95%) samples (p < 0.01). One of the most notable differences was in the number of reads aligned to the transcriptome, which was 29% lower for FFPE (32 million) compared to FROZ (45 million) samples (p < 0.01). Other differences included a 5-fold higher overall rate of base insertions or deletions in FFPE (0.046%) compared to FROZ (0.009%) samples and modestly higher C/T ratio among mapped bases. Global transcriptional responses. Principal component and cluster analyses showed distinct grouping of FFPE and FROZ samples (Fig. 1A; Supplemental Fig. 1A, 1B). However, variability among individual samples (Fig. 1A) and between control and DEHP-treated groups (Supplemental Fig. 1A) was similar within each preservation method, indicating a consistent bias for FFPE samples. Unsupervised cluster analysis showed low overall dissimilarity (1 - correlation < 0.05) across all samples (Supplemental Fig. 1B). Globally, there was a modest decrease in the percent of reads fully aligned within exonic regions in FFPE (36%) compared to FROZ (50%) samples (and corresponding increase in the percent of intronic and intergenic alignments), but these changes were relatively stable across individual samples (Fig. 1B). When stratified by dose group, total aligned reads were 24-36% lower and total (raw) counts were 3447% lower in FFPE compared to FROZ samples (Supplemental Fig. 1C). Differentially expressed genes. A total of 2300 unique DEGs were identified with RNA-seq analysis of FROZ samples compared to 2581 (+12%) in FFPE samples (Supplemental Table 2). Of the FROZ DEGs, 1680 (73%) were represented in FFPE DEGs. Overlap between FFPE and FROZ DEGs ranged from 70% to 86% across DEHP dose groups. DEGs represented in both FROZ and FFPE samples with > 1.5 FC from control and FDR-adjusted p < 0.05 showed a strong positive relationship in log2 FC values observed at each dose level of DEHP (r2 > 0.98 for all) (Fig. 1C), indicating consistent patterns of gene induction and repression between FROZ and FFPE samples. Pathway level responses. Signaling pathways altered by DEHP were evaluated in IPA using DEG lists identified independently in FROZ and FFPE samples. Nine out of the 10 highestranked canonical pathways were the same in both FROZ and FFPE samples, and the ratio of DEGs to total genes in each pathway was highly correlated (r2 = 0.91) (Table 2). Enriched pathways included mitochondrial dysfunction, oxidative phosphorylation, LPS/IL-1-mediated inhibition of RXR function, and fatty acid β-oxidation, consistent with known cellular effects of PPARα activation (Corton et al. 2014). Target gene responses. We next examined concordance in RNA-seq counts for individual genes in the Acot and Cyp4a gene families, which are established targets of DEHP (Lake et al. 2016). Total raw count numbers were consistently lower for target genes in FFPE compared to FROZ samples, similar to global differences in aligned reads and total counts. This decrease was marginally higher in DEHP-treated groups (-38%, -45%, and -41% for 1.5k, 3k, and 6k doses, respectively) compared to controls (-31%) across all 21 Acot and Cyp4a targets represented (Supplemental Table 3). Highly concordant dose relationships were maintained across high-, medium-, and low-expressed transcripts, represented here by Acot1, Acot2, and Acot3, respectively, when analyzed as RPKM-normalized counts (Fig. 2A), raw counts, total raw counts divided by all total counts (per sample), or FC difference in counts from controls. RPKMnormalized count numbers were strongly correlated between FROZ and FFPE pairs (r2 > 0.99 for Acot1, Acot2, and Acot3). Similar dose response patterns between FROZ and FFPE samples were also observed for genes in the Cyp4a series (Supplemental Table 3), including Cyp4a10, Cyp4a14, and Cyp4a31, which were preselected as high-, medium-, and low-expressed targets, respectively. These genes increased with DEHP dose and showed high dynamic range in counts and FC values and highly concordant dose-response patterns between FROZ and FFPE samples (Fig. 2B). Benchmark dose estimates of target pathways and genes. Transcriptional BMDs were determined as a measure of potency for key pathways and genes altered by DEHP treatment. FROZ and FFPE samples had similar BMDT estimates for known target pathways of DEHP (Corton et al. 2014), including PPARα activation (-5% for FFPE), fatty acid β-oxidation (+8% for FFPE), and cholesterol biosynthesis (-14% for FFPE) (Supplemental Fig. 2). BMDT estimates were also similar for reference gene targets, including Acot (-8% for FFPE on average) and Cyp4a (+25% for FFPE on average) series (Fig. 3). Transcriptional dose responses in both FFPE and FROZ samples at 7 days were compared to estimated potency for the induction of early apical effects at 7 and 28 days and for tumorigenic effects at 2 years (Wood et al. 2014; Lake et al. 2016). Lower BMDT estimates for Acot and Cyp4a gene targets were 43 mg/kg-day for FROZ samples and 57 mg/kg-day for FFPE samples, compared to 37 mg/kg-day for BMDA estimates of early apical events (Supplemental Table 4). BMDT and BMDA estimates were also concordant for individual biochemical and cellular parameters. For example, the BMDA for palmitoyl-CoA oxidase (PCO) activity at 28 days was 114 mg/kg-day, which was similar to the BMDT values for Acot1 at 7 days (119 and 87 mg/kgday for FROZ and FFPE samples, respectively). The Acot1 gene codes for an acyl-CoA thioesterase enzyme with high PCO hydrolase activity, highlighting alignment between genomic and non-genomic responses. The BMDT estimates for the most sensitive gene target at 7 days, Cyp4a10 (43 and 57 mg/kg-day for FROZ and FFPE samples, respectively), were also comparable to those for hepatocellular carcinoma (71 mg/kg-day) and hepatocellular adenoma or carcinoma (35 mg/kg-day) induced by DEHP at 2 years, likely via PPARα activation (Corton et al. 2014; Wood et al. 2014). Case Study 2: DCA (21 year-old samples) RNA-seq quality metrics. Overall RNA quality was lower in Study 2 compared to Study 1 for both FFPE and FROZ samples. For FROZ samples, Study 2 had lower RINs (3.1) compared to Study 1 (7.5) and lower total reads (-30%), mapped reads (-18%), and reads aligned to the transcriptome (-39%) on RNA-seq (Table 1, Supplemental Table 1). However, the 21 year-old FROZ samples still had high overall read alignment (96%) and mapping (95%) and comparable values to Study 1 for read quality score, number of genes detected, and % reads mapped (< 5% difference for all). Post-alignment gene region distributions were highly stable in the older FROZ samples with 49% of the total reads mapping to exonic regions. Within Study 2, FFPE and FROZ samples differed markedly in total reads (-30%), genes detected (-67%), mapped reads (-79%), and reads aligned to the transcriptome (-88%). Similar to Study 1, there was also a 5-fold greater rate of insertions and deletions in FFPE compared to FROZ samples. Of note, there were 3 FROZ samples in the 2.0 g/l dose group that had low total reads, alignments, and counts (~70% lower than overall average); reasons for this discrepancy are unclear but all 3 low-read samples were run on the same sequencer lane as 5 FFPE samples, raising the possibility that lower-quality FFPE RNA may result in lower complexity of the sequencing library (Head et al. 2015) or interfere in some other way with sequencing reactions of higher-quality RNA when run together. Global transcriptional responses. Global analyses further highlighted the large differences between older FROZ and FFPE samples. FROZ samples had distinct grouping by dose and relatively small within-group variation, while FFPE samples showed a clear loss of dose group separation on PCA (Supplemental Fig. 3A). Dissimilarity (1 - correlation) on cluster analysis was < 0.05 for FROZ samples and > 0.6 for FFPE samples, in contrast to Study 1 (Supplemental Fig. 3B). Decreases in total raw counts in FFPE compared to FROZ samples were consistent across dose groups as in Study 1, but there was far greater magnitude of change (-96 to 98% across all groups) (Supplemental Fig. 3C). Despite these marked differences between sample types, there was positive correlation in overall counts across all transcripts detected (r2 = 0.84) (Fig. 4A). This correlation was only slightly lower for all genes with < 100 total counts (r2 = 0.78) but absent at lower count thresholds (e.g., r2 = 0.12 at < 20 counts) (Supplemental Fig. 4). The distribution of aligned reads to fully exonic regions was moderately lower in FFPE samples (33%) compared FROZ samples (49%) and did not differ across dose groups (Fig. 4B). Differentially expressed genes and pathways. A total of 6402 unique DEGs were identified on RNA-seq analysis of FROZ samples compared to 188 in FFPE samples (-97%). FROZ DEG numbers were 946, 4374, and 5114 for 1.0, 2.0, and 3.5 g/l DCA groups, respectively. Of the FROZ DEGs, only 76 (1%) were represented in FFPE DEGs. Pathway level analysis of FROZ DEGs for DCA showed enrichment for many of the same pathways seen for DEHP in Study 1, including mitochondrial dysfunction and oxidative phosphorylation (data not shown). However, no concordance was noted among the highest-ranked pathways between older FROZ and FFPE samples in Study 2. Target gene responses. For DCA targets, we again selected Acot and Cyp4a series genes based on pilot microarray data (not shown here) and previously published evidence that DCA may induce fatty acid metabolizing genes in the liver (Corton et al. 2014, Yoo et al. 2015). Raw target gene counts were dramatically lower in FFPE compared to FROZ samples (-99% overall), to the point that only two preselected targets (Cyp4a10 and Cyp4a14) had > 100 counts/group (Supplemental Table 4). For the latter genes, there were significant dose trends in FFPE samples (p < 0.001) and positive correlation between FROZ and FFPE pairs (r2 ≥ 0.67 for both) (Fig. 4C). Benchmark dose estimates of target genes. Acceptable BMDT estimates were identified for all preselected Acot (267-366 mg/kg-day) and Cyp4a (277-284 mg/kg-day) gene targets in FROZ samples. In FFPE samples, acceptable fits were identified only for Acot2 (420 mg/kg-day) (data not shown) and Cyp4a10 and Cyp4a14 (290-305 mg/kg-day) (Fig. 5A, 5B). Targets with unacceptable fits (BMDT > high dose) all had low count numbers (< 75 total raw counts/group), while the two genes with the highest number of counts, Cyp4a10 and Cyp4a14, had BMDT values in FFPE samples that were similar to those identified in FROZ samples (+4% and +7%, respectively). DISCUSSION Archival FFPE samples represent a vast resource that to date has been largely unused for genomic research. The goal of this study was to evaluate transcriptomic dose responses in FFPE samples using current RNA-seq methods. Our findings demonstrate strong quantitative concordance between recently archived FFPE and FROZ samples, including high correlation in expression changes across and within dose groups, DEG number and overlap, target pathways, and BMDT estimates from RNA-seq profiles. Older FFPE samples (> 20 years old) had markedly lower read counts but still showed a strong positive relationship with FROZ samples in global counts and dose responses for highly expressed target genes. These findings indicate that RNAseq profiles from FFPE samples can be used in pathway discovery, development, and doseresponse analysis, but that additional methods may be needed to improve RNA/data quality for older specimens. Our findings highlight several considerations when using RNA-seq in FFPE samples. The first issue is age in block, which clearly impacted RNA quality and dose responses observed on RNA-seq. We found large differences in read coverage and other metrics between recently archived and older FFPE samples and higher variability across older FFPE samples, despite similar fixation conditions and consistent processing and storage. (All older FFPE samples were kept in the same cardboard box for > 20 years.) Prior genomic studies have reported mixed effects of storage time or age in block. Several microarray and qPCR studies found lower RNA integrity and reduced amplification or hybridization for FFPE samples stored ≥ 2 years (reviewed in Bass et al. 2014), while more recent RNA-seq studies have not reported clear effects of age in block. In Webster et al. (2015), ribo-depleted RNA from 8-, 19-, and 26-yearold control FFPE blocks had similar genomic distribution of mapped reads and percent GC content. However, samples did cluster by age, and the percent of reads mapping to exonic regions (~10-15%) was lower than recently archived (33-40%) and older (22-50%) FFPE samples analyzed here. Archival samples used in Webster et al. (2015) were kept in formalin for extended periods of time before processing (up to 6 months), and it is possible that much of the age-dependent RNA degradation had already occurred by 8 years but was not readily apparent given that no treatment responses were evaluated. In Hedegaard et al. (2014), storage time (remarkably) did not reduce concordance in RNA-seq expression between FROZ and FFPE human cancer specimens stored for 2 to 13 years. The authors also reported genic distributions for older FFPE cancer samples up to 20 years old, which had 10-38% of reads mapped to exonic regions and comparable heterogeneity to that observed in older FFPE samples here. While more work is needed to identify age or storage factors, it is clear from our results that time in block should be considered when using older or variably aged specimens and that comprehensive quality control data should be presented for any archival RNA-seq experiment. A second issue is lower aligned reads/counts. Our data from recently archived FFPE samples showed decreased aligned reads to the transcriptome (-29%) and total counts for all genes (-35%) compared to FROZ samples but consistent patterns across control and treatment groups, so that no clear bias in dose responses was observed. Given the dynamic range of count data, we also did not observe any loss of sensitivity, at least for the primary target genes surveyed. In contrast, older FFPE samples had dramatically lower aligned reads to the transcriptome (-88%) and total gene counts (-97%), raising the question of minimum count numbers required for valid quantitation. A recent report by the SEQC/MAQC-III Consortium estimated that approximately 16 counts are needed for detection and quantitation of a given transcript (2014). On a per sample basis, this equates to at least 96 counts per group in Study 2, which fits with the observed sharp decline in correlation between FROZ and FFPE samples below 100 total raw counts (at a transcript or dose group level). For BMDT analysis, similar estimates were observed between FROZ and FFPE samples for high-expressed target genes such as Cyp4a10 and Cyp4a14 (801 and 1180 counts/group, respectively) but not for other targets (all < 150 counts/group), suggesting a minimum threshold range of ~200-1000 counts/group may be needed for reliable BMDT results. A related issue is the extent to which greater read depths may compensate for decreased count numbers in lower-quality FFPE samples. The SEQC/MAQC-III Consortium (2014) reported that 10 million mapped fragments were generally required for differential expression analysis of more strongly expressed genes. In the current study, average reads aligned to the transcriptome far exceeded this threshold for recently archived FFPE samples (31.7 million) but not older FFPE samples (3.4 million), likely explaining the poor concordance in DEGs and target pathways. Interestingly, the SEQC/MAQC-III Consortium continued to identify new (lowexpressed) genes even at extremely high read depths of > 1 billion fragments, suggesting that greater depth of sequencing (beyond the target read depth of 25-30 million used here) may improve gene detection and count metrics in lower-quality samples. Several recent studies have demonstrated the usefulness of BMDT estimates for short-term genomic markers in predicting longer-term phenotypic outcomes (Thomas et al. 2013; Bhat et al. 2013; Geter et al. 2014; Lake et al. 2016). Here, in recently archived samples the 7-day BMDT potency estimates for marker genes and pathways were highly similar between FROZ and FFPE samples and concordant with previous BMDA estimates for endpoints such as PCO activity at 4 weeks and hepatocellular carcinoma at 2 years (Wood et al. 2014). In contrast, BMDT plots from older lower-quality FFPE samples from Study 2 had higher variability, overlapping error bars, and linear slopes. Still, BMDT estimates for the most highly expressed target genes (Cyp4a10 and Cyp4a14) had acceptable fits and were only 4-7% different between FROZ and FFPE samples. This work supports that idea that RNA-seq data from FFPE samples can be used to identify chemical/drug targets and pathways and readily adapted to BMDT analyses, even when there is some loss of genomic signal. Findings presented here add to current guidance for using RNA-seq with archival FFPE samples. General pre-analytical considerations include a minimal postmortem interval to fixation, RNase-free conditions during sectioning, reducing excess paraffin prior to RNA extraction, and short fixation time in formalin (≤ 24 hours if possible) (Bass et al. 2014; Greytak et al. 2015; Webster et al. 2015). A ribo-depletion protocol is recommended for library preparation (Li et al. 2014b; Webster et al. 2015). Poly-A selection of mRNAs can bias sequence data from degraded samples and decrease sequencing efficiency due to rRNAs, which comprise > 90% of total RNA samples. It is also recommended to optimize any new FFPE sequencing protocols using a matched fresh or frozen cohort (Greytak et al. 2015) and to use storage conditions with low temperature variation and ambient moisture levels (Waldron et al. 2012). Because of variability across older FFPE samples, higher sample sizes (≥ 6/group) and target sequencing depth (> 30 million reads) may also be warranted for lower-quality FFPE RNA. Our results indicate that FFPE samples may vary widely in their suitability for quantitative genomic analyses. Currently, however, there are no clear metrics for evaluating the quality of RNA from FFPE samples, given that RIN values are all typically < 2.5. This issue is important for determining the adequacy of degraded samples for RNA-seq, the amount of RNA to use (given “non-functional” RNA), and the most appropriate analytical approaches for sample sets with variable quality. Findings from older samples in this study also highlight the need for better methods to improve the quality of RNA from formalin-fixed samples prior to sequencing. Previous studies have proposed different buffer conditions (Evers et al. 2011a) or biofunctional catalysts (Karmakar et al. 2015) to “demodify” RNA damaged by formalin fixation, but to date no studies have specifically examined RNA-seq profiles using these methods. Finally, there is a need going forward for better bioinformatics approaches to enhance analysis of RNA-seq data generated from lower-quality FFPE samples. Latter methods may include modified data normalization protocols, exclusion of small transfer RNA and pre-rRNA species, and other corrections to reduce systematic variation (Li et al. 2014a). Toxicological science is moving toward more prospective evaluations based on quantitative pathway-based models (Simon et al. 2014; Meek et al. 2014). These approaches require clear linkage between early molecular endpoints and the ultimate adverse health outcome of interest. Genomic data provide an important biological calibrator for relating early and late effects and for bridging data across different model systems. Archival FFPE samples represent an important resource for addressing these gaps, since in many cases the pathologic or phenotypic outcome is already known (Waldron et al. 2012). By facilitating the use of samples in-hand, the methods described here should support “higher-throughput” target identification and development. Other applications of FFPE genomics include read-across between preclinical and clinical samples, dose modeling, and biomarker development and validation. SUPPLEMENTARY DATA Supplementary data are available online at . FUNDING This work was supported by the U.S. EPA Office of Research and Development. DISCLOSURES Wendell Jones is employed by EA Genomic Services, Q2 Solutions. ACKNOWLEDGEMENTS The authors would like to thank staff at EA for generation of RNA-seq data and bioinformatics support; Dr. William Ward for assistance with RNA-seq data analyses; and Dr. Russell S. Thomas and Dr. J. Christopher Corton for critical review of this manuscript. The authors also thank Dr. Anthony B. DeAngelo and Michael H. George for providing samples for the 21 yearold study. The research described in this article has been reviewed by the U.S. EPA and approved for publication. Approval does not signify that the contents necessarily reflect the views and the policies of the Agency. Mention of trade names or commercial products does not constitute endorsement or recommendation for use. AUTHOR CONTRIBUTIONS Author contributions include the following: S.D.H. and C.E.W. designed and coordinated the study and drafted the manuscript; S.D.H., W.J., L.C.W., and C.E.W. performed statistical analyses; V.B. conducted benchmark dose analyses; and B.N.C. and G.C. assisted with sample preparation, data management, and interpretation of results. All authors read and approved the final manuscript. REFERENCES Adiconis, X., Borges-Rivera, D., Satija, R., DeLuca, D.S., Busby, M.A., Berlin, A.M., Sivachenko, A., Thompson, D.A., Wysoker, A., Fennell, T., Gnirke, A., Pochet, N., Regev, A., and Levin, J.Z. (2013). Comparative analysis of RNA sequencing methods for degraded or lowinput samples. Nat Methods 10(7), 623-9. Auerbach, S.S., Phadke, D.P., Mav, D., Holmgren, S., Gao, Y., Xie, B., Shin, J.H., Shah, R.R., Merrick, B.A., and Tice, R.R. (2015). RNA-Seq-based toxicogenomic assessment of fresh frozen and formalin-fixed tissues yields similar mechanistic insights. J Appl Toxicol 35(7), 766-80. Bass, B.P., Engel, K.B., Greytak, S.R., and Moore, H.M. (2014). A review of preanalytical factors affecting molecular, protein, and morphological analysis of formalin-fixed, paraffinembedded (FFPE) tissue: how well do you know your FFPE specimen? Arch Pathol Lab Med 138(11), 1520-30. Bhat, V.S., Hester, S.D., Nesnow, S., and Eastmond, D.A. (2013). Concordance of transcriptional and apical benchmark dose levels for conazole-induced liver effects in mice. Toxicol Sci 136(1), 205-15. Blow, N. (2007). Tissue preparation: Tissue issues. Nature 448(7156), 959-63. Corton, J.C., Cunningham, M.L., Hummer, B.T., Lau, C., Meek, B., Peters, J.M., Popp, J.A., Rhomberg, L., Seed, J., and Klaunig, J.E. (2014). Mode of action framework analysis for receptormediated toxicity: The peroxisome proliferator-activated receptor alpha (PPARα) as a case study. Critic Rev Toxicol 44, 1-49. David, R.M., Moore, M.R., Cifone, M.A., Finney D.C., and Guest, D. (1999). Chronic peroxisome proliferation and hepatomegaly associated with the hepatocellular tumorigenesis of di (2ethylhexyl) phthalate and the effects of recovery. Toxicol Sci 50, 195-205. DeAngelo, A.B., Daniel, F.B., Stober, J.A., and Olson, G.R. (1991). The carcinogenicity of dichloroacetic acid in the male B6C3F1 mouse. Fundam Appl Toxicol 16, 337-47. Evers, D.L., Fowler, C.B., Cunningham, B.R., Mason, J.T., and O'Leary, T.J. (2011a). The effect of formaldehyde fixation on RNA: optimization of formaldehyde adduct removal. J Mol Diagn 13(3), 282-8. Evers, D.L., He, J., Kim, Y.H., Mason, J.T., and O'Leary, T.J. (2011b). Paraffin embedding contributes to RNA aggregation, reduced RNA yield, and low RNA quality. J Mol Diagn 13(6), 687-94. Gallego Romero, I., Pai, A.A., Tung, J., and Gilad, Y. (2014). RNA-seq: impact of RNA degradation on transcript quantification. BMC Biol. 12, 42. Geter, D.R., Bhat, V.S., Gollapudi, B.B., Sura, R., and Hester, S.D. (2014). Dose-response modeling of early molecular and cellular key events in the CAR-mediated hepatocarcinogenesis pathway. Toxicol Sci 138(2), 425-45. Greytak, S.R., Engel, K.B., Bass, B.P., and Moore, H.M. (2015). Accuracy of Molecular Data Generated with FFPE Biospecimens: Lessons from the Literature. Cancer Res 75(8), 1541-7. Head, S.R., Komori, H.K., LaMere, S.A., Whisenant, T., Van Nieuwerburgh, F., Salomon, D.R., and Ordoukhanian, P. (2014). Library construction for next-generation sequencing: overviews and challenges. Biotechniques 56(2), 61-77. Hedegaard, J., Thorsen, K., Lund, M.K., Hein, A.M., Hamilton-Dutoit, S.J., Vang, S., Nordentoft, I., Birkenkamp-Demtr?der, K., Kruh?ffer, M., Hager, H., et al. (2014). Nextgeneration sequencing of RNA and DNA isolated from paired fresh-frozen and formalin-fixed paraffin-embedded samples of human cancer and normal tissue. PLoS One 9(5), e98187. Ludyga, N., Grünwald, B., Azimzadeh, O., Englert, S., H?fler, H., Tapio, S., and Aubele, M. (2012). Nucleic acids from long-term preserved FFPE tissues are suitable for downstream analyses. Virchows Arch 460(2), 131-40. Meek, M.E., Boobis, A., Cote, I., Dellarco, V., Fotakis, G., Munn, S., Seed, J., and Vickers, C. (2014). New developments in the evolution and application of the WHO/IPCS framework on mode of action/species concordance analysis. J Applied Toxicol 34(1), 1-18. Mittempergher, L., de Ronde, J.J., Nieuwland, M., Kerkhoven, R.M., Simon, I., Rutgers, E.J., Wessels, L.F., and Van't Veer, L.J. (2011). Gene expression profiles from formalin fixed paraffin embedded breast cancer tissue are largely comparable to fresh frozen matched tissue. PLoS One 6(2), e17163. Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L., and Wold, B. (2008). Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5(7), 621-8. Karmakar, S., Harcourt, E.M., Hewings, D.S., Scherer, F., Lovejoy, A.F., Kurtz, D.M., Ehrenschwender, T., Barandun, L.J., Roost, C., Alizadeh, A.A., and Kool, E.T. (2015). Organocatalytic removal of formaldehyde adducts from RNA and DNA bases. Nat Chem 7(9), 752-8. Kashofer, K., Viertler, C., Pichler, M., and Zatloukal, K. (2013). Quality control of RNA preservation and extraction from paraffin-embedded tissue: implications for RT-PCR and microarray analysis. PLoS One 8(7), e70714. Lake, A.D., Wood, C.E., Bhat, V.S., Chorley, B.N., Carswell, G.K., Sey, Y., Kenyon, E., Padmos, B., Moore, T.M., Tennant, A., Schmid, J.M., George, B.J., Ross, D., Hughes, M.F., Corton, J.C., Simmons, J.E., McQueen, C.A., and Hester, S.D. (2016). Dose and Effect Thresholds for Early Key Events in a PPARα–Mediated Mode of Action. Toxicol Sci 149, 312-25. Li, B., Ruotti, V., Stewart, R.M., Thompson, J.A., and Dewey, C.N. (2010). RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics 26, 493-500. Li, B., and Dewey, C.N. (2011). RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323. Li, S., Labaj, P.P., Zumbo, P., Sykacek, P., Shi, W., Shi, L., Phan, J., Wu, P.Y., Wang, M., Wang, C., et al. (2014a). Detecting and correcting systematic variation in large-scale RNA sequencing data. Nat Biotechnol 32, 888-95. Li, S., Tighe, S.W., Nicolet, C.M., Grove, D., Levy, S., Farmerie, W., Viale, A., Wright, C., Schweitzer, P.A., Gao, Y., et al. (2014b). Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study. Nat Biotechnol 32, 915-25. Mittempergher, L., de Ronde, J.J., Nieuwland, M., Kerkhoven, R.M., Simon, I., Rutgers, E.J., Wessels, L.F., and Van't Veer, L.J. (2011). Gene expression profiles from formalin fixed paraffin embedded breast cancer tissue are largely comparable to fresh frozen matched tissue. PLoS One 6(2), e17163. Parrish, J.M., Austin, E.W., Stevens, D.K., Kinder, D.H., Bull, R.J. (1996). Haloacetate-induced oxidative damage to DNA in the liver of male B6C3F1 mice. Toxicology 110(1-3), 103-11. Posada, D., and Buckley, T.R. (2004). Model selection and model averaging in phylogenetics: advantages of akaike information criterion and bayesian approaches over likelihood ratio tests. Syst Biol 53(5), 793-808. Ribeiro-Silva, A., Zhang, H., and Jeffrey, S.S. (2007). RNA extraction from ten year old formalin-fixed paraffin-embedded breast cancer samples: a comparison of column purification and magnetic bead-based technologies. BMC Mol Biol 8, 118. SEQC/MAQC-III Consortium. (2014). A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nat Biotechnol 32, 903-14. Simon, T.W., Simons, S.S., Preston, R.J., Boobis, A.R., Cohen, S.M., Doerrer, N.G., FennerCrisp, P.A., McMullin, T.S., McQueen, C.A., and Rowlands, J.C. (2014). The use of mode of action information in risk assessment: Quantitative key events/dose-response framework for modeling the dose-response for key events. Crit Rev Toxicol 44(S3), 17-43. Thai, S.F., Allen, J.W., DeAngelo, A.B., George, M.H., and Fuscoe, J.C. (2003). Altered gene expression in mouse livers after dichloroacetic acid exposure. Mutat Res 543(2), 167-80. Thomas, R.S., Pluta, L., Yang, L., and Halsey, T.A. (2007). Application of genomic biomarkers to predict increased lung tumor incidence in 2-year rodent cancer bioassays. Toxicol Sci 97(1), 55-64. Thomas, R.S., O'Connell, T.M., Pluta, L., Wolfinger, R.D., Yang, L., and Page, T.J. (2007). A comparison of transcriptomic and metabonomic technologies for identifying biomarkers predictive of two-year rodent cancer bioassays. Toxicol Sci 96(1), 40-6. Thomas, R.S., Wesselkamper, S.C., Wang, N.C., Zhao, Q.J., Petersen, D.D., Lambert, J.C., Cote, I., Yang, L., Healy, E., Black, M.B., Clewell, H.J., Allen, B.C., and Andersen, M.E. (2013). Temporal Concordance between Apical and Transcriptional Points of Departure for Chemical Risk Assessment. Toxicol Sci 134(1), 180-94. Turashvili, G., Yang, W., McKinney, S., Kalloger, S., Gale, N., Ng, Y., Chow, K., Bell, L., Lorette, J., Carrier, M., Luk, M., Aparicio, S., Huntsman, D., and Yip, S. (2012). Nucleic acid quantity and quality from paraffin blocks: defining optimal fixation, processing and DNA/RNA extraction techniques. Exp Mol Pathol 92(1), 33-43. U.S. EPA. (2003). Toxicological review of dichloroacetic acid in support of summary information on Integrated Risk Information System (IRIS). National Center for Environmental Assessment, Washington, D.C. EPA/635/R-03/007. Available at: . Accessed Mar 2016. von Ahlfen, S., Missel, A., Bendrat, K., and Schlimpberger, M. (2007). Determinants of RNA quality from FFPE samples. PLoS ONE 2, e1261. Waldron, L., Simpson, P., Parmigiani, G., and Huttenhower, C. (2012). Report on emerging technologies for translational bioinformatics: a symposium on gene expression profiling for archival tissues. BMC Cancer 12(1), 124-7. Wang, C., Gong, B., Bushel, P.R., Thierry-Mieg, J., Thierry-Mieg, D., Xu, J., Fang, H., Hong, H., Shen, J., Su, Z., et al. (2014). The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance. Nat Biotechnol 32, 926-32. Webster, A.F., Zumbo, P., Fostel, J., Gandara, J., Hester, S.D., Recio, L., Williams, A., Wood, C.E., Yauk, C.L., and Mason, C. (2015). Mining the archives: a cross-platform analysis of gene expression profiles in archival formalin-fixed paraffin-embedded (FFPE) tissue. Toxicol Sci 148(2), 460-72. Wood, C.E., Jokinen, M.P., Johnson, C.L., Olson, G.R., Hester, S., George, M., Chorley, B.N., Carswell, G., Carter, J.H., Wood, C.R., Bhat, V.S., Corton, J.C., and DeAngelo, A.B. (2014). Comparative Time Course Profiles of Phthalate Stereoisomers in Mice. Toxicol Sci 139(1), 21-34. Yang, L., Allen, B.C., and Thomas, R.S. (2007). BMDExpress: a software tool for the benchmark dose analyses of genomic data. BMC Genomics 8, 387. Yoo, H.S., Bradford, B.U., Kosyk, O., Shymonyak, S., Uehara, T., Collins, L.B., Bodnar, W.M., Ball, L.M., Gold, A., and Rusyn, I. (2015). Comparative analysis of the relationship between trichloroethylene metabolism and tissue-specific toxicity among inbred mouse strains: liver effects. J Toxicol Environ Health A 78(1), 15-31. Zhao, W., He, X., Hoadley, K.A., Parker, J.S., Hayes, D.N., and Perou, C.M. (2014). Comparison of RNA-Seq by poly (A) capture, ribosomal RNA depletion, and DNA microarray for expression profiling. BMC Genomics 15, 419. TABLES Table 1. RNA and RNA-seq quality measures. Table 2. Concordance in top-ranked canonical pathways between FFPE and FROZ samples following DEHP exposure in Study 1. FIGURES Figure 1. Concordance in RNA-seq profiles between 2 year-old FFPE and FROZ samples. A, Principal component analysis (PCA) using RSEM-based output shows a consistent shift in FFPE compared to FROZ samples. Control FROZ samples are located at the upper right (o). The asterisk (*) indicates single FROZ outlier sample with high coverage. B, Gene distribution by individual sample shows modest shifts in read alignment to different genomic regions for FFPE samples but general consistency within each sample type. C, Regression plots show high correlation in log2 fold-change (FC) values for common differentially expressed genes between FFPE and FROZ samples, stratified by DEHP dose group. Figure 2. Concordance in dose response of individual target genes in 2 year-old FFPE and FROZ samples. RPKM-normalized counts for preselected genes in Acot (A) and Cyp4a (B) series genes show high concordance between sample types. Bars indicate mean ± SEM. Asterisks (*) indicate p < 0.001 for linear trend tests. Figure 3. Transcriptional BMD estimates for target Acot series genes in 2 year-old FFPE and FROZ samples. Best-fit models and BMDT values were similar between FROZ (A) and FFPE (B) samples for Acot1, Acot2, and Acot3. Figure 4. Concordance in RNA-seq profiles and dose responses in 21 year-old samples. A, Simple regression of log2-transformed counts for FROZ and FFPE samples across all transcripts detected shows a strong association by sample type. B, Gene distribution by individual sample shows variation in reads aligned to exonic regions among FFPE samples. C, Normalized count data for Cyp4a gene targets are plotted for FROZ and FFPE samples and FROZ x FFPE correlation. Asterisks (*) indicate p < 0.001 for linear trend tests. Figure 5. Transcriptional benchmark doses (BMDT) estimates for target Cyp4a series genes in Study 2. Average BMDT values were similar for more highly expressed genes (Cyp4a10 and Cyp4a14) in FROZ (A) compared to FFPE (B) samples. SUPPLEMENTARY MATERIAL Supplemental Table 1. RNA-seq quality parameters (not included in Table 1). Supplemental Table 2. Differentially expressed genes between FFPE and FROZ samples in Study 1. Supplemental Table 3. Total raw RNA-seq counts for all genes represented in the Acot and Cyp4a series. Supplemental Table 4. Apical benchmark dose estimates following DEHP exposure. Supplemental Table 5. Total raw counts on RNA-seq for all Acot and Cyp4a series genes following DCA exposure in FFPE and FROZ samples in Study 2. Supplemental Figure 1. Global profiles of RNA-seq data from paired 2 year-old FFPE and FROZ samples in Study 1. A, Unsupervised PCA of RNA-seq data using Partek Flow shows a similar relationship between control and DEHP dose groups for FROZ and FFPE samples, highlighted by blue and red clouds, respectively. B, Hierarchial cluster dendrogram shows grouping of samples by preservation type and treatment group. Overall dissimilarity index was low (1 – correlation < 0.05). The asterisk (*) indicates single outlier sample with high coverage. C, Total reads aligned to the transcriptome (left) and total raw counts (right) were lower in FFPE compared to FROZ samples, but these shifts were relatively consistent across dose groups. Values indicate mean ± SEM. Supplemental Figure 2. Pathway BMDT estimates (mg/kg-day) for FROZ and FFPE samples following DEHP exposure in Study 1. DEGs and pathways were identified using Partek GSA and IPA, respectively. Fold-change differences in count numbers for DEGs in a given pathway were used to generate BMDT values. The top and bottom of each box correspond the median (quartile 2, Q2) and lower quartile (Q1) BMD for the pathway, respectively, while bottom of each whisker (Min) is the lowest value (for a given gene) in that pathway that had an acceptable fit. Supplemental Figure 3. Global profiles of RNA-seq data from paired 21 year-old FFPE and FROZ samples in Study 2. A, Unsupervised PCA of RNA-seq data shows discrete clustering by dose group for FROZ (left) but not FFPE (right) samples. B, Hierarchical cluster dendrogram shows high dissimilarity in FFPE samples (dissimilarity index > 0.6), in contrast to FROZ samples (dissimilarity index < 0.05). C, Total reads aligned to the transcriptome (left) and total counts (right) were markedly lower in FFPE compared to FROZ samples across all dose groups. Values indicate mean ± SEM. Supplemental Figure 4. Simple regression analysis of log2-transformed counts between FROZ and FFPE samples in Study 2. Transcripts were limited to those with total counts across all groups < 100 (left) and < 20 (right). Table 1. RNA and RNA-seq quality measures.1Study 1 (< 2 years old)Study 2 (> 20 years old)FROZFFPEFROZFFPE(n=16)(n=16)(n=24)(n=24)mean(SD)mean(SD)mean(SD)mean(SD)RNA 260/2802.04(0.02)2.01(0.04)1.72 (0.05)1.88 (0.04)b,cRNA integrity number7.45(0.58)2.21(0.28)b3.09 (0.49)2.41 (0.50)b,cTotal reads (in millions)273.9(12.7)72.2(10.0)51.8 (14.2)36.1 (23.5)cMean read length (bp)350.3(0.2)50.1(0.3)a50.9 (0.2)50.6 (0.2)b,cQuality score436.9(0.7)36.6(0.4)35.6 (1.6)33.3 (1.4)b,cNumber of genes detected13784(551)13409(535)13170 (795)4364 (3528)b,c% All reads mapped598.8(0.5)91.0(2.4)b95.0 (2.3)31.3 (20.4)b,cReads aligned to transcriptome (in millions44.9(7.9)31.7(4.6)b27.5 (8.1)3.4 (4.4)b,c% Alignment to transcriptome760.7(1.7)44.1(3.8)b52.4 (3.5)7.0 (7.0)b,cMapped reads (in millions)870.5(12.3)62.6(9.1)60.0 (17.4)12.9 (15.4)b,c% Reads aligned995.3(1.2)86.7(2.8)b95.9 (2.2)31.6 (20.6)b,cInsertion rate (%)100.0031(0.0004)0.0236(0.0030)b0.0041 (0.0005)0.0201 (0.0124)bDeletion rate (%)110.0060(0.0015)0.0219(0.0044)b0.0037 (0.0009)0.0185 (0.0076)b1abcLetters indicate adjusted p < 0.05 or p < 0.01 compared to respective FROZ group or p < 0.01 compared to the Study 1 FFPE group. 2 Total reads: Average number of reads per sample (in millions) 3 Mean read length: The average base pair length of each read, typically shorter than the max because of sequence clipping 4 Quality score: Average read quality for reads 1 and 2 5 % All reads mapped: Percent of reads mapped either to the transcriptome or the genome 6 Reads aligned to transcriptome: The number of reads aligned to the transcriptome (in millions) 7 Percent alignment to transcriptome: The percent of reads aligned to the transcriptome 8 Mapped reads: The number of reads aligned to either the transcriptome or the genome (in millions) 9 Percent reads aligned: The percent of reads aligned to the transcriptome or the genome 10 Insertion rate: The total number of inserted bases, divided by the total number of mapped bases 11 Deletion rate: The total number of deleted bases, divided by the total number of mapped basesTable 2. Concordance between FROZ and FFPE samples in canonical pathways altered by DEHP exposure in Study 1.1 FROZFFPEPathwayrank -log (p value)ratiorank -log (p value)ratioMitochondrial Dysfunction123.60.43229.20.49Oxidative Phosphorylation223.50.53130.50.62FXR/RXR Activation315.20.4188.60.34LPS/IL-1 Mediated Inhibition of RXR Function414.20.32312.40.32Acute Phase Response Signaling514.20.35107.70.29LXR/RXR Activation612.40.38266.00.30Superpathway of Cholesterol Biosynthesis711.90.70411.20.70Fatty Acid β-oxidation I811.90.67510.00.63Valine Degradation I98.60.7298.10.72Protein Ubiquitination Pathway108.50.2569.50.271 Ingenuity Pathway Analysis (IPA) was performed on genes differentially expressed between Control group and any DEHP group. The most statistically significant canonical pathways identified in the FROZ RNA-seq DEG list are shown according to their -log (p value) and ratio of list genes found in each pathway over the total number of genes in that pathway. A p value of 0.05 corresponds to a -log (p value) of 1.30.0326136Study 1 (< 2 years old)FROZFFPEStudy 2 (> 20 years old)FROZFFPE(n=16)(n=16)(n=24)(n=24)-22859387963Supplemental Table 1. RNA-seq quality measures (not included in Table 1).1mean(SD)mean(SD)mean(SD)mean(SD)Total clusters (in millions)32.0(9.4)30.4(7.8)25.9 (7.1)18.1 (11.8)% clipped reads23.5(1.2)5.0(1.3)2.9 (3.7)40.8 (13.0)b,dClipped reads due to length (in millions)30.84(0.26)1.34(0.35)a0.12 (1.2)1.25 (0.87)bReads filtered on quality (in millions)0.45(0.18)0.39(0.10)0.37 (1.6)0.48 (0.09)a,c% rRNA43.6(0.8)19.6(3.0)b6.2 (2.1)11.4 (11.2)d% Genes detected544.9(1.8)43.6(1.8)42.1 (2.5)14.0 (11.3)b,dMean mapping quality628.0(3.9)37.8(3.8)b177.9 (7.7)161.8 (16.0)a,d% Transcriptome mapped762.8(1.7)46.2(3.9)b53.9 (2)10.3 (7.8)b,d% (G+C) aligned843.0(0.8)44.2(1.0)a46.9 (0.4)50.3 (2.6)b,d(C/T) ratio90.82(0.03)0.88(0.04)a0.95 (0.02)0.98 (0.07)d% Mapped bases 'A'24.9(0.6)22.6(1.5)a23.8 (1.1)41.7 (7.3)b,d 'C'23.6(0.4)25.9(1.5)a24.3 (0.6)18.9 (3.2)b,d 'G'23.5(0.6)25.3(1.3)23.9 (0.5)17.9 (3.4)b,d 'T'28.0(0.4)26.2(1.3)28.0 (0.5)21.6 (1.7)b,dLetters indicate adjusted ap < 0.05 or bp < 0.01 compared to respective FROZ group or cp < 0.05 or dp < 0.01compared to the Study 1 FFPE group.% Clipped reads: Percent of reads removed because of clippingClipped reads: Number of reads removed due to short length (<25 bp) after quality clipping% rRNA: Percent of reads aligning to ribosomal RNAs% Genes detected out of 30743 total genes queried for Study 1 and 31252 genes queried for Study 2 6 Mean mapping quality: A measure of how well a read aligned to the sequence; mean of quality values; subject to skewing because of ambiguous reads.% Transcriptome mapped: Percentage of reads that are mapped to the transcriptome%(G+C) Aligned: Percentage of 'G' and 'C' bases that are aligned(C/T) Ratio: Ratio of 'C' to 'T' basesSupplemental Table 2. Differentially expressed genes between FFPE and FROZ samples in Study 1.FROZ vs FFPE # overlap % overlap Con vs DEHP 1.5k623105653886%Con vs DEHP 3k17291733121270%Con vs DEHP 6k16061857118774%Con vs any group23002581168073%FROZ1FFPE1. Total raw RNA-seq counts for all genes represented in the Acot and Cyp4a series.1FROZFFPE% difference (FROZ vs FFPE)DEHP (ppm) >Con1.5k3k6kCon1.5k3k6kCon1.5k3k6kAcot154410863256333217340577671667220741-26-29-35-36Acot212942741136317942863292849113290-33-23-25-26Acot3791440513749466297329772841-22-32-42-43Acot4753270166186471682233448964713-9-14-26-27Acot50452302723na-50411Acot61940647711223666-42-45-44-14Acot71511833052981641121651338-39-46-55Acot83866848811086281493663715-27-28-25-34Acot9180156231261828180112-54-48-65-57Acot1070556010625223443-65-61-43-59Acot11361231200292353241185269-24-8-8Acot128071978315054131065174691583447598-36-29-45-42Acot131212149920301704605699859804-50-53-58-53Cyp4a104577172649328730359400296799143174396197850-35-43-47-45Cyp4a12a1910630679358533482212848234122575425106-33-24-28-28Cyp4a12b1607858211292105921097722296368773-32-16-15-17Cyp4a1427462691255452635798822260164828287232319891-18-39-47-45Cyp4a29123210010-100-100-50Cyp4a30b335530400-100-20-100Cyp4a311815364967910909109387260087401-40-28-38-32Cyp4a3223268875121061267021217339923410088-9-17-24-20Sum total425025271921010511108676729333328769555674620460-31-38-45-431 Dose levels for DEHP are expressed in ppm of feed.EndpointUnitBMDA (mg/kg-day)ReferenceRelative liver weight (7d)mg/kg body weight48Lake et al. 2016Hepatocellular cytoplasmic alteration (7d)incidence116Lake et al. 2016Liver palmitoyl CoA (PCO) activity (28d)nmol/min/mg*10-6114Wood et al. 2014Liver cell proliferation (28d)labeling index37Wood et al. 2014Hepatocellular carcinoma (2yr)incidence71Wood et al. 2014Hepatocellular carcinoma or adenoma (2yr)incidence35Wood et al. 2014-22097629003. Apical benchmark dose estimates following dietary DEHP exposure. . Total raw counts for all Acot and Cyp4a series genes detected in Study 2.1 RNA-seq FROZRNA-seq FFPE% difference (FROZ vs FFPE)DCA dose >Con1.02.03.5Con1.02.03.5Con1.02.03.5Acot136544738231746533876126-99-92-98-99Acot21041381746805977161-93-95-100-99Acot315244831256404635-100-85-99-100Acot47125721308930549463-99-98-100-99Acot5004160002nana-100-85Acot64138772660010-100-100-98-100Acot75733013673762410-100-99-100-100Acot84863865877991023-100-100-100-100Acot922525622326802800-100-89-100-100Acot1022040400-100100na-100Acot113732512252744020-99-100-99-100Acot1270726795611714165467929102-99-99-100-99Acot131366127994218979047-99-100-100-100Cyp4a101432243897765342653213059601816-99-87-99-99Cyp4a12a37230401872454140799508462235305-99-99-99-99Cyp4a12b217231204589572319632881-99-98-99-99Cyp4a1442411351231884611443056813252795-93-50-99-99Cyp4a2900100000nana-100naCyp4a30b12611170000-100-100-100-100Cyp4a3117631742931609034513957-98-86-97-100Cyp4a32209823005750185533457156186-98-98-97-99Sum total5487859993276041950437690167429685639-99-97-99-991 Dose levels for DCA are expressed in g/l of drinking water.Supplemental Fig. 2 ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download