Table of Contents - Monash University



Software Documentation
Version 0.9
January 2019
Linden Gearing, Helen Cumming, Ross Chapman, Alex Finkel, Isaac Woodhouse, Kevin Luu, Sam Forster and Paul Hertzog
Centre for Innate Immunity and Infectious Diseases, Hudson Institute of Medical Research

Table of Contents

Table of Contents
Table of figures
Introduction
Scan
Enrichment analysis
Background gene list selection
Enrichment calculations
Proximal enrichment analysis
Acquisition and Installation
System requirements
Prerequisite installation
Windows/Mac/Linux
Java installation
Program Acquisition
Installing the Program
Windows
Linux
Mac
GUI workflow
Scan
Importing a gene list
Importing transcription factor models
Selecting a deficit
The user interface
Site interface
Transcription factor panel
Sliders
Saving data and images
Enrichment analysis
Enrichment interface
Enrichment plot
Saving data and images
Subsequent enrichments
Proximal enrichment analysis
Selecting sites of interest
Performing a new scan and enrichment
Command line workflow
General parameters
Required:
Optional:
Scan parameters
Input files:
Parameters:
Output files:
Enrichment parameters
Input files:
Parameters:
Output files:
References

Table of figures

Figure 1: Program workflow
Figure 2: Transcription factor motif
Figure 3: Site prediction
Figure 4: Enrichment statistics
Figure 5: Proximal enrichment analysis
Figure 9: The start panel
Figure 10: Scan load box
Figure 11: JASPAR matrix format
Figure 12: The promoter panel user interface
Figure 13: The site interface
Figure 14: Options for transcription factor and site display
Figure 15: The enrichment load box
Figure 16: Enrichment interface
Figure 17: Interactive enrichment scatter plot
Figure 18: The proximal enrichment load box

Introduction

Welcome to CiiiDER, a software package for predicting and analysing transcription factor binding sites.

The CiiiDER workflow consists of two types of analysis (Figure 1):

- Identification of potential transcription factor binding sites in regulatory regions (Scan)
- Identification of over- or under-represented transcription factors compared to a background (Enrichment)

This user guide gives some background information about these analyses and explains how to interact with the graphical user interface (GUI) and how to run analyses from the command line.

Figure 1: Program workflow
Analyses can be performed using the graphical user interface or from the command line.

Scan

CiiiDER uses an implementation of the MATCH algorithm (Kel et al., 2003) to predict transcription factor binding sites in a query set of DNA sequences. The scan produces a map showing the location of all potential sites across the sequences.

Transcription factors are represented by position frequency matrices (PFMs; Figure 2) in either JASPAR


(Khan et al., 2018) or TRANSFAC


(Matys et al., 2006) format. The mapping of matrix to sequence is performed with a user-defined deficit between 0 and 1, which determines the rigour of the mapping (0 for a perfect match; 1 for no match).

Figure 2: Transcription factor motif
Position frequency matrix and sequence logo representation of a Mafb transcription factor motif from JASPAR (MA0117.2), showing the highly conserved core region (black bar) and the variable flanking regions.

Transcription factors often have a highly conserved core binding region, flanked by areas of higher variability. CiiiDER defines the core as the five most conserved consecutive bases (calculated using the sum of information vector values; see Kel et al., 2003).

To predict binding sites, CiiiDER splits sequences into overlapping regions of five bases and first matches these smaller regions against the core of the transcription factor model. If a core match is found, the window is extended to the full length of the transcription factor binding site matrix and a whole-matrix match is calculated. A potential site must have both a core match and a matrix match below the deficit cut-off.

CiiiDER runs these calculations in parallel across DNA sequences, using threading techniques to employ multiple processors, which greatly increases the overall analysis speed.

Figure 3: Site prediction
The JASPAR Sox10 transcription factor PFM (MA0442.1) is shown.

Enrichment analysis

Although CiiiDER can efficiently identify putative transcription factor binding sites, there are many binding sites present in any given DNA sequence. Some of these are false positives, randomly generated in the DNA sequence, that are not bound by the transcription factor. Others may be true binding sites in certain biological contexts, but not others.
By comparing the predicted binding sites present in a list of co-regulated genes to those found in an appropriate background list of genes, CiiiDER can identify which transcription factors are over- or under-represented and may therefore be playing important roles in regulating the genes of interest.

Background gene list selection

Selection of the background gene set is important, as it can vastly alter the specificity of the answers that CiiiDER can provide. Multiple enrichment analyses can be performed using different backgrounds to ask different questions.

It is best practice to ensure that the background is as close to the co-regulated gene set as possible. As an example, when analysing genes that change in macrophages in response to bacterial infection, the best background would be genes that were expressed in macrophages but were unchanged during infection; the transcription factors identified are then likely to be solely involved in regulating those genes altered in response to infection. A background containing genes randomly selected from the whole genome would be less appropriate, as the enriched transcription factors identified would include those required for normal gene expression in the macrophage as well as those involved in the response to infection.

If the co-regulated gene set comes from a microarray or RNA-seq experiment, then it is best practice to construct the background from genes that are expressed and have very low fold change. Larger gene sets increase the power of the calculations. In general, we recommend using at least 100 genes of interest. The background gene list should be at least as large as the co-regulated list.

Enrichment calculations

P-values

Significantly over- and under-represented transcription factors can be found using two statistics. The main test is a Fisher’s exact test, which gives the gene coverage P-value (Figure 4A).
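As a rough illustration of the gene coverage statistic, the sketch below computes a two-sided Fisher's exact test from the numbers of bound and unbound genes in the search and background sets. It is a minimal pure-Python sketch, not CiiiDER's own code: the function name `gene_coverage_p_value`, its parameter names and the example counts are hypothetical, and the two-sided convention used here (summing every table that is no more probable than the observed one) is an assumption, since this guide does not state which alternative CiiiDER uses.

```python
from math import comb


def gene_coverage_p_value(search_bound, search_total, background_bound, background_total):
    """Two-sided Fisher's exact test on a 2x2 table of bound/unbound genes.

    Hypothetical helper for illustration; not part of CiiiDER.
    """
    total_genes = search_total + background_total
    total_bound = search_bound + background_bound

    # Possible numbers of bound search genes given the fixed table margins.
    lowest = max(0, total_bound - background_total)
    highest = min(total_bound, search_total)

    def weight(bound_in_search):
        # Hypergeometric numerator: ways to distribute the bound genes so that
        # `bound_in_search` of them fall in the search set.
        return comb(total_bound, bound_in_search) * comb(
            total_genes - total_bound, search_total - bound_in_search
        )

    observed = weight(search_bound)
    # Sum every table that is no more probable than the observed one
    # (integer arithmetic, so the comparison is exact).
    extreme = sum(w for w in map(weight, range(lowest, highest + 1)) if w <= observed)
    return extreme / comb(total_genes, search_total)


# Made-up counts: sites predicted in 8 of 10 search genes and 1 of 6 background genes.
p = gene_coverage_p_value(8, 10, 1, 6)
print(f"gene coverage P-value: {p:.4f}")
```

Only whether each gene has at least one predicted site enters this calculation, which is why a single gene carrying many copies of a site cannot drive the result.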
The Fisher’s exact test compares the numbers of sequences that are bound and unbound by a transcription factor in the search and background gene sets. This statistic is important because it distinguishes the situation where a single gene in the co-regulated gene set contains many copies of the transcription factor site while the majority of the genes contain none, from the more interesting situation where the majority of the genes contain binding sites. P-values are calculated for each transcription factor at different deficit cut-offs.

The other test is a Mann-Whitney U test, which compares the distributions of the number of sites per gene in the search and background gene sets (Figure 4B). The resulting P-value is referred to as the site count P-value.

Figure 4: Enrichment statistics
Two statistics can be used to compare search and background genes. The gene coverage P-value uses a Fisher’s exact test on the numbers of bound and unbound genes. The site count P-value uses a Mann-Whitney U test on the distribution of sites per gene.

Enrichment and proportion bound

Two additional calculations are made for the enrichment results, either at a defined deficit or at the deficit that gives the most significant gene coverage P-value. If a given transcription factor has binding sites in n_S out of N_S search genes and in n_B out of N_B background genes then:

$$\log_2(\text{Enrichment}) = \log_2\left(\frac{n_S + \tfrac{1}{2}}{N_S + \tfrac{1}{2}} \div \frac{n_B + \tfrac{1}{2}}{N_B + \tfrac{1}{2}}\right)$$

This is greater than zero if the transcription factor is over-represented, occurring in a greater proportion of search genes than of background genes, and less than zero if the transcription factor is under-represented. The larger the absolute value, the greater the level of over- or under-representation. Note that 1/2 is added to the numerators and denominators to avoid zeroes.

$$\text{Average } \log_2(\text{Proportion Bound}) = \tfrac{1}{2}\log_2\left(\frac{n_S + \tfrac{1}{2}}{N_S + \tfrac{1}{2}}\right) + \tfrac{1}{2}\log_2\left(\frac{n_B + \tfrac{1}{2}}{N_B + \tfrac{1}{2}}\right)$$

This is equal to zero if the transcription factor site is predicted in every gene in both gene sets; otherwise it is less than zero (e.g. -1 if sites are present in 1/2 of the genes, -2 if in 1/4 of the genes).

Proximal enrichment analysis

It is possible to extend the analysis described above to examine whether transcription factor sites are enriched near other transcription factor sites (Figure 5), which may indicate that they are acting co-operatively. Transcription factor sites of interest are identified in a search gene set and a background gene set. The regions surrounding these sites are then scanned with different transcription factor models and an enrichment analysis is performed to find over- or under-represented factors in these windows. Note that this analysis is currently only available from the GUI, not the command line.

Figure 5: Proximal enrichment analysis
Regions surrounding transcription factor binding sites of interest are assessed for enrichment of other transcription factors, compared to a background.

Acquisition and Installation

System requirements

CiiiDER works best on desktop computers.

Prerequisite installation

In order to run the CiiiDER JAR file, the computer must have the Java Runtime Environment (JRE) installed (most computers will already have it). CiiiDER requires JRE version 1.6 or above. To determine which version is installed on your computer, try the Java website version test. Note that this test does not work with some browsers (such as Google Chrome).

Windows/Mac/Linux

Alternatively, if you are comfortable using the command line, simply type the following into your terminal or command line window:

    java -version

If Java is installed you will see output similar to:

    java version "1.8.0_60"
    Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
    Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)

To open the command prompt in Windows, press the ‘Windows’ + R shortcut to bring up the ‘Run’ dialogue, then type in ‘cmd’ and click OK. In the command prompt, type “java -version”.
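For scripted setups, the same check can be automated. The sketch below runs `java -version` and compares the reported version with the JRE 1.6 minimum stated above. It is only an illustration under a few assumptions: the helper names `parse_java_version` and `installed_java_version` are hypothetical, `java` is assumed to be on the PATH, and the parser assumes the version is the quoted string that follows the word "version", as in the example output shown above.

```python
import re
import subprocess

MINIMUM_VERSION = (1, 6)  # JRE 1.6 or above, as required by CiiiDER


def parse_java_version(version_output):
    """Return (major, minor) from `java -version` output, or None if absent."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2) or 0)


def installed_java_version():
    """Run `java -version`; return (major, minor), or None if Java cannot be run."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except OSError:
        # Raised when the `java` executable is missing or cannot be started.
        return None
    # Java prints its version banner to stderr rather than stdout.
    return parse_java_version(result.stderr + result.stdout)


if __name__ == "__main__":
    version = installed_java_version()
    if version is None:
        print("Java was not found on the PATH.")
    elif version >= MINIMUM_VERSION:
        print(f"Java {version[0]}.{version[1]} meets the JRE 1.6 requirement.")
    else:
        print(f"Java {version[0]}.{version[1]} is older than JRE 1.6.")
```

Comparing versions as tuples means that both the older "1.x" numbering and the newer single-number scheme (for example "11.0.2") satisfy the 1.6 minimum.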
Java installation

If you do not already have the JRE, or the correct version of it, on your machine, it can be downloaded from the Java website, along with installation instructions.

Program Acquisition

Download from . Mouse and human genomes can also be pre-packaged with CiiiDER.

GUI workflow

This section provides a guide to using the graphical user interface. Open the CiiiDER program (or run the JAR file) and press the “START” button to commence a new analysis.

Figure 9: The start panel

Scan

A new dialog box will open requesting a gene list, transcription factors and a deficit. The “Run Scan” button will begin the site prediction for the chosen gene list; the “Run Enrichment” button can be used to queue an enrichment analysis to immediately follow this site prediction, but this is not essential (see page 20).

Figure 10: Scan load box

Importing a gene list

Gene lists can be loaded from a file or pasted into the text box. FASTA sequences, gene symbols, Ensembl IDs, GTF and BED files are all acceptable. DNA sequences in FASTA format require no further input; otherwise, the program requires a genome to extract DNA sequences flanking the transcription start site of each gene (or sequences surrounding the start position given in a GTF file or the midpoint of BED file regions). The default sequence extraction takes 1,500 bases upstream and 500 bases downstream.

Available genomes are stored in the Genomes folder within the CiiiDER program folder and appear in a dropdown menu. The most recent Ensembl genomes and annotations for key species will be available to download from the CiiiDER website. If no files are present in the Genomes folder or the “Advanced selection…” option is chosen, a genome file (FASTA format) and an annotation file (GTF or GLM format) must be loaded (these can be obtained from the Ensembl FTP website).

Importing transcription factor models

PFM transcription factor models are required to perform the scan.
CiiiDER provides the JASPAR CORE non-redundant vertebrate transcription factors (Mathelier et al., 2016) and PFMs from a large SELEX experimental dataset
(Jolma et al., 2010). These files are stored in the Matrices folder within the CiiiDER program folder and are available for selection in a dropdown menu.

Alternatively, any file containing transcription factors in JASPAR or TRANSFAC format can be loaded. Other transcription factors can be obtained from the JASPAR website, including redundant vertebrate models and PFMs for different phyla. The current TRANSFAC models require a licence, although older versions are freely available.

Other transcription factor models

Transcription factors from other sources may need to be reformatted to work with CiiiDER. PFMs in JASPAR format are preceded by an annotation line beginning with the “>” greater-than symbol (as for FASTA sequences), which is immediately followed by a unique transcription factor identifier, then a space (or tab) and the transcription factor name.

Each row of the PFM represents a base (ordered A, C, G and T) and each column represents a position within the binding site; each entry is the count or frequency of that base at that position (columns are separated by a space or a tab). Only the numbers are essential; the square brackets and the base letters are optional.

    >MA0004.1 Arnt
    A [ 4 19  0  0  0  0 ]
    C [16  0 20  0  0  0 ]
    G [ 0  1  0 20  0 20 ]
    T [ 0  0  0  0 20  0 ]

Figure 11: JASPAR matrix format

Selecting a deficit

In order to predict whether a sequence contains a site for a transcription factor, CiiiDER compares the DNA sequence to the PFM and generates a similarity score. If the DNA sequence matches the PFM perfectly then the deficit value is 0. Transcription factor binding sites are variable and rarely match the models perfectly, so it is useful to allow imperfect matches as well.

Any deficit cut-off between 0 and 1 is accepted. For general scans we recommend using deficit values between 0.1 and 0.2, especially for large datasets (with hundreds of genes and transcription factors).
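To make the matrix format of Figure 11 and the idea of a deficit concrete, here is a minimal Python sketch. It is not CiiiDER's code: `parse_jaspar` and `simple_deficit` are hypothetical names, and the deficit is computed under a simplifying assumption, namely one minus the site's summed counts rescaled between the worst and best possible sums for the matrix. CiiiDER's own similarity score may weight positions differently, so values from a real scan need not match.

```python
import re

BASES = "ACGT"

# The example matrix from Figure 11.
ARNT = """>MA0004.1 Arnt
A [ 4 19  0  0  0  0 ]
C [16  0 20  0  0  0 ]
G [ 0  1  0 20  0 20 ]
T [ 0  0  0  0 20  0 ]
"""


def parse_jaspar(text):
    """Parse a single JASPAR-format PFM into (identifier, name, rows)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected an annotation line starting with '>'")
    # The identifier and name are separated by a space or a tab.
    parts = lines[0][1:].split(None, 1)
    identifier = parts[0] if parts else ""
    name = parts[1] if len(parts) > 1 else ""
    rows = {}
    for base, line in zip(BASES, lines[1:5]):
        # Only the numbers matter; base letters and brackets are skipped.
        rows[base] = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", line)]
    if len(rows) != 4 or len({len(r) for r in rows.values()}) != 1:
        raise ValueError("expected four rows (A, C, G, T) of equal length")
    return identifier, name, rows


def simple_deficit(rows, site):
    """ASSUMED definition: 0 for the best-scoring site, 1 for the worst."""
    site = site.upper()
    if len(site) != len(rows["A"]):
        raise ValueError("site length must equal the matrix width")
    score = best = worst = 0.0
    for i, base in enumerate(site):
        column = [rows[b][i] for b in BASES]
        score += rows[base][i]
        best += max(column)
        worst += min(column)
    if best == worst:
        raise ValueError("matrix carries no information")
    return 1.0 - (score - worst) / (best - worst)


identifier, name, rows = parse_jaspar(ARNT)
print(identifier, name)                 # MA0004.1 Arnt
print(simple_deficit(rows, "CACGTG"))   # 0.0, a perfect match
print(simple_deficit(rows, "AACGTG"))   # about 0.104, one weaker position
```

Under this simplified definition, a cut-off of 0.15 would accept both example sites, while a cut-off of 0.1 would accept only the perfect match.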
It is important to note that the higher the deficit, the less likely it is that a sequence contains a true site, so the choice of deficit is a balance between maximising the number of true positive sites and restricting the number of false positive sites.

The user interface

Once the analysis is complete, the promoter panel will be displayed. The basic layout of the user interface is the same for the different stages of analysis. At the top of the screen there are a number of options to help navigate the project and save data and images at any stage during the various analyses.

The interface is interactive, allowing the user to change parameters and perform new analyses quickly and easily, as well as compare multiple analyses by clicking between separate panels.

Figure 12: The promoter panel user interface

Menus and toolbar

The toolbar provides the user with the tools to manage their projects, to save data and images, and to begin new analyses.

File. Contains options to begin new projects, open existing projects, save data and images, and alter general properties.

    New Project: begins a new project.
    Open Project: opens an existing saved project from a file chosen by the user.
    Save Project: saves the project in its current form in a “.cdr” file that can be opened again later.
    Export Project: saves all the data from the existing project in its current state, including all images, analysis reports and parameters used in the analysis. Importantly, it does not save the project.
    Save Image: saves the current image.
    Project Information: shows files and parameters used in the current project.
    Settings: this is where general visual properties are set, as well as the number of computer processes you wish to use (by default this is one fewer than the number available, so this option is only useful if you wish to limit the processing power CiiiDER can access). The user can also specify how tall each promoter appears in the image (the default value is 30 pixels) and how many transcription factors can be visualised at one time (the default value is 10).
    Exit: closes the program, with an option to save the current project.

Several of these options are available as shortcuts on the toolbar, along with links to commence further analyses: Scan, Enrichment and Proximal Enrichment Analysis (see later).

Tab Panels

These tabs represent each of the current analyses that are part of the project. Note that although there can be multiple Enrichment and Proximal Enrichment analyses, there can be only one gene list and one promoter panel. Performing site identification again (with new matrices or with a different deficit) removes all other panels.

Gene List Panel

This panel includes the list of genes and their sequences that are currently being used in the site identification and enrichment analyses. If the user has simply provided a list of gene symbols then they can view the complete DNA sequences for the regions they have specified. At the bottom of the list are any genes for which the sequence could not be found.

Site interface

This is where the predicted transcription factor binding sites are displayed on the current gene list (Figure 13). The top of the panel contains a scale bar to show the position of each transcription factor binding site, relative either to the transcription start site (if applicable) or to the 5’ end of the sequence.
The site interface is interactive to allow the creation of an appropriate image for publication or display.

Figure 13: The site interface

There are several options for displaying or reordering sequences:

    Click – select a sequence
    Press and drag – move a single sequence up or down
    Ctrl or shift click – select multiple sequences
    Right click – show options:
        Set height
        Order alphabetically
        Order by number of (displayed) sites (descending)
        Order by number of (displayed) transcription factors (descending)
        Show or hide selected promoters
        Show all promoters

Transcription factor panel

This is where the predicted transcription factors are displayed (Figure 14). The top section contains the transcription factors that are currently being shown on the site interface, while those below are currently hidden from view. Ten transcription factors are displayed by default (this can be changed under File: Properties). Double-clicking on a transcription factor sends it from one list to the other.

Several options are available by right-clicking on a displayed transcription factor:

    Choose colour
    Hide – moves the transcription factor to the hidden list
    Hide all other transcription factors – only sites for the selected transcription factor are displayed
    Save genes with this transcription factor – saves a text file containing the names of all genes containing predicted binding sites
    Show only promoters with this transcription factor – to show all promoters again, right click on the site interface

Figure 14: Options for transcription factor and site display.

Sliders

The transcription factor panel contains two sliders and text boxes for changing the image (Figure 14).
For each slider the user can enter a value in the text box or select it on the slider using the mouse.

    Zoom level slider: This changes the size of the promoters as they appear on the screen, allowing the user to see the overall patterns or zoom in to closely inspect a single site.
    Deficit cut-off slider: This slider can be used to restrict the displayed transcription factor sites to those with a lower deficit, i.e. those sites that better match their transcription factor model. Note that a new scan must be performed to view sites at a higher deficit than the one used for the scan.

Saving data and images

The “Save Data” and “Save Image” buttons are used to save information related to the currently selected panel. After a scan, the positions and scores for every predicted site can be saved in CSV format and an image of the current state of the site interface can be saved in JPG, PNG or GIF format.

Enrichment analysis

To begin an enrichment analysis, select the “Enrichment” button in the toolbar; this opens the enrichment load box for selecting a background gene list (Figure 15). This background can be provided in the same formats as the current gene list (see the scan load box). If necessary, the genome, upstream and downstream parameters will be populated with the previous values.

It is possible to enter a name for the analysis, since multiple enrichments may be run within the project; by default, the analysis will be named after the background gene list file or numbered sequentially. At this stage a gene coverage enrichment P-value can also be set, which determines the transcription factor sites that will be displayed (the default value is 0.05, but this can also be changed later). The “Run” button will commence the analysis.
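As an illustration of how a gene coverage P-value threshold such as 0.05 might be applied, the Python sketch below compares the number of search and background genes that contain at least one predicted site for a transcription factor. It assumes a two-sided Fisher's exact test on the resulting 2×2 table. That choice of test, the function names and the example counts are assumptions made for this sketch, and the Enrichment calculations section remains the reference for what CiiiDER computes.

```python
from math import comb, log2


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that x search genes contain a site.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(p for p in map(prob, range(low, high + 1))
               if p <= observed * (1 + 1e-9))


def coverage_summary(search_with, search_total, bg_with, bg_total):
    """Summarise gene coverage; assumes both 'with site' counts are non-zero."""
    search_prop = search_with / search_total
    bg_prop = bg_with / bg_total
    return {
        # Positive values mean over-representation in the search gene list.
        "log2_enrichment": log2(search_prop / bg_prop),
        "p_value": fisher_exact_two_sided(
            search_with, search_total - search_with,
            bg_with, bg_total - bg_with),
    }


# Hypothetical counts: sites in 8 of 10 search genes and 1 of 6 background genes.
summary = coverage_summary(8, 10, 1, 6)
displayed = summary["p_value"] < 0.05 and summary["log2_enrichment"] > 0
print(summary, displayed)
```

With these hypothetical counts the transcription factor passes the default 0.05 threshold and counts as over-represented.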
Figure 15: The enrichment load box
Used for supplying a background gene list for an enrichment analysis.

Enrichment interface

The enrichment interface is very similar to the scan interface, with all the same options; however, there is an extra set of sliders and check boxes (Figure 16) that are used to filter the sites that are displayed.

Figure 16: Enrichment interface
The default enrichment interface and optional sliders.

    Most significant deficit: When this check box is ticked, all transcription factors are displayed at the deficit at which they are most significantly enriched (minimising the gene coverage P-value) and the deficit cut-off slider disappears. This is the default.
    Over-represented transcription factors: Transcription factors may be over-represented or under-represented in the search gene list compared to the background. By default, only over-represented factors are displayed.
    Coverage P-value slider: Transcription factors will be displayed if their gene coverage P-value is less than the chosen value. (For descriptions of P-values see page 4.)
    Site count P-value slider: When the Site Count P-value check box is ticked, transcription factors must also meet the site count P-value threshold.

Enrichment plot

The enrichment results for all transcription factors can also be viewed as an interactive HTML graph, created using the plotly JavaScript API (Figure 17). These plots show Log2(Enrichment) versus Average Log2(Proportion Bound).

Figure 17: Interactive enrichment scatter plot
Illustrates the degree of over- or under-representation of each transcription factor and the proportion of genes that contain binding sites.
Transcription factors are coloured according to their gene coverage P-value and whether they are over- or under-represented (transcription factors that do not meet the chosen P-value threshold are shown in grey); the size of each point is also proportional to log10(P-value). Hovering over each point displays its annotation information.

Saving data and images

The following data can be saved from the enrichment panel in CSV format:

    Statistics at the most significant deficit (an HTML graph is also saved)
    Statistics at the current deficit (an HTML graph is also saved)
    Information about the currently displayed (and hidden) sites
    Information about all predicted binding sites

An image of the current state of the site interface can be saved in JPG, PNG or GIF format.

Subsequent enrichments

To run another analysis using a different background, click the “Enrichment” button in the toolbar again. The results for each enrichment are stored in different tabs.

Proximal enrichment analysis

The “Proximal Enrichment Analysis” button on the toolbar launches the proximal enrichment load box (Figure 18).

Figure 18: The proximal enrichment load box

Selecting sites of interest

The first step is to choose a transcription factor of interest. The sequences around the predicted binding sites for this transcription factor are used for a new scan; the window size option defines the length of these sequences. The transcription factor deficit is the cut-off for inclusion of sites in the analysis. If an enrichment analysis has been performed, it is possible to choose the most significant deficit to select the sites.

Performing a new scan and enrichment

A set of transcription factor models is then used for the new scan at the specified deficit. These default to the models that were used for the original scan.
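The extraction of sequences around the sites of interest can be pictured with a short Python sketch. This is an illustration under stated assumptions rather than CiiiDER's implementation: the function name is hypothetical, coordinates are 0-based with an exclusive end, the window is centred on the midpoint of the site, and a window that would run past either end of the sequence is simply clipped.

```python
def extract_window(sequence, site_start, site_end, window_size):
    """Return up to `window_size` bases centred on the site's midpoint."""
    midpoint = (site_start + site_end) // 2
    # Clip at the 5' end so the start never becomes negative.
    start = max(0, midpoint - window_size // 2)
    # Clip at the 3' end; the window may then be shorter than requested.
    end = min(len(sequence), start + window_size)
    return sequence[start:end]


# Hypothetical promoter with one predicted site (CACGTG) at positions 10-16.
promoter = "A" * 10 + "CACGTG" + "T" * 10
print(extract_window(promoter, 10, 16, 10))   # AACACGTGTT
```

Each extracted window would then be scanned with the chosen transcription factor models in the same way as a promoter sequence.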
Once these options have been selected and “Run Enrichment” is pressed, another window opens for selection of the background gene list (this is identical to the usual enrichment load box). The background is first scanned with the transcription factor of interest to find predicted sites, then sequences are extracted around these sites, to be used as background sequences for the enrichment analysis.

The proximal enrichment analysis panel is identical to the enrichment panel, with the same options for saving data and images and generating plots.

Command line workflow

CiiiDER can be run using Terminal (Mac, Linux) or Command Prompt (Windows) without the GUI. Run the JAR using ‘-n’ and supply a configuration (config) file with a “.ini” extension:

    java -jar CiiiDER.jar -n config.ini

The config file provides input file paths, parameters and output file paths for CiiiDER.

Note that the proximal enrichment analysis is not currently implemented in the command line workflow.

General parameters

Required:

    STARTPOINT: Start point of the analysis, either 1 (scan) or 2 (enrichment analysis).
    ENDPOINT: End point of the analysis, either 1 (scan) or 2 (enrichment analysis).

Optional:

    PROCESSORS: Number of processors to use for multithreaded analyses. Defaults to the maximum number available on the computer.
    INPUTFOLDER: File path of a folder containing input files.
    OUTPUTFOLDER: File path of a folder for saving results files.
    DEBUGLOGFILE: File path or name of a debug file for storing progress and errors. Defaults to run.log in the output folder.
    PROJECTOUTPUTFILE: File path to save a CiiiDER project file, which can be opened in the GUI (.cdr extension).

Scan parameters

The scan always requires GENELISTFILENAME and MATRIXFILE. If the scan is also the ENDPOINT, then GENESCANRESULTS is also required.

Input files:

    GENELISTFILENAME: File path for the gene list in FASTA, GTF, BED or TXT format.
    REFERENCEFASTA: Genome sequence file in FASTA format (.fa or .fasta extension). Required for gene names, Ensembl IDs, GTF files and BED files.
    GENELOOKUPMANAGER: Transcription start site reference file (.gtf or .glm extension). Required for gene names and Ensembl IDs.
    MATRIXFILE: Input file path for the transcription factor models in TRANSFAC or JASPAR format.

Parameters:

    DEFICIT: Cut-off for site prediction. Default 0.15.
    UPSTREAMOFFSET: For GTF, BED or TXT format gene lists. Default 1500.
    DOWNSTREAMOFFSET: For GTF, BED or TXT format gene lists. Default 500.

Output files:

    GENESCANRESULTS: Output file path for binding site data. Use a BSL extension to save for further analysis or a TXT extension to obtain a list of sites.

Enrichment parameters

The enrichment analysis always requires GENELISTFILENAME and MATRIXFILE. If enrichment is the STARTPOINT, then results of scans must also be provided as BINDSITEFILENAME and BGBINDSITEFILENAME. If enrichment is the ENDPOINT, then ENRICHMENTOUTPUTFILE is required.

Input files:

    BINDSITEFILENAME: Optional file path for a file containing existing binding site data in BSL format.
    BGGENELISTFILENAME: Input file path for the background gene list in FASTA, GTF or TXT format.
    BGBINDSITEFILENAME: Optional file path for a file containing existing background binding site data in CSL format.

Parameters:

    ENRICHMENTCOVERAGEPVALUE: Default 0.05.
    ENRICHMENTSITEPVALUE: Default 1.0.

Output files:

    ENRICHMENTOUTPUTFILE: Output file path for enrichment results.
    BGGENESCANRESULTS: Optional output file path for the background binding site data in CSL format.

References

Jolma, A., Kivioja, T., Toivonen, J., Cheng, L., Wei, G., Enge, M., Taipale, M., Vaquerizas, J.M., Yan, J., Sillanpaa, M.J., et al. (2010). Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res 20, 861-873.
Kel, A.E., Gossling, E., Reuter, I., Cheremushkin, E., Kel-Margoulis, O.V., and Wingender, E. (2003). MATCH: A tool for searching transcription factor binding sites in DNA sequences.
Nucleic Acids Res 31, 3576-3579.
Khan, A., Fornes, O., Stigliani, A., Gheorghe, M., Castro-Mondragon, J.A., van der Lee, R., Bessy, A., Cheneby, J., Kulkarni, S.R., Tan, G., et al. (2018). JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework. Nucleic Acids Res 46, D260-D266.
Mathelier, A., Fornes, O., Arenillas, D.J., Chen, C.Y., Denay, G., Lee, J., Shi, W., Shyr, C., Tan, G., Worsley-Hunt, R., et al. (2016). JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles. Nucleic Acids Res 44, D110-D115.
Matys, V., Kel-Margoulis, O.V., Fricke, E., Liebich, I., Land, S., Barre-Dirrie, A., Reuter, I., Chekmenev, D., Krull, M., Hornischer, K., et al. (2006). TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res 34, D108-D110.
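For the command line workflow, the Python sketch below checks a set of parameters against the requirements listed under General, Scan and Enrichment parameters and builds the `java -jar CiiiDER.jar -n config.ini` command. The parameter names come from this document; everything else is an assumption for illustration: the helper names, the example file names, and in particular the layout of one `KEY=VALUE` pair per line, which is not specified here and should be checked against a working config file.

```python
def missing_parameters(params):
    """Return required parameter names that are absent from `params`."""
    required = ["STARTPOINT", "ENDPOINT", "GENELISTFILENAME", "MATRIXFILE"]
    start = int(params.get("STARTPOINT", 0))
    end = int(params.get("ENDPOINT", 0))
    if end == 1:
        # A run that stops after the scan must say where to write the sites.
        required.append("GENESCANRESULTS")
    if start == 2:
        # Starting at enrichment needs the results of earlier scans.
        required += ["BINDSITEFILENAME", "BGBINDSITEFILENAME"]
    if end == 2:
        required.append("ENRICHMENTOUTPUTFILE")
    return [name for name in required if name not in params]


def render_config(params):
    """ASSUMED layout: one KEY=VALUE pair per line."""
    return "\n".join(f"{key}={value}" for key, value in params.items()) + "\n"


def build_command(config_path, jar_path="CiiiDER.jar"):
    """Argument list for running CiiiDER without the GUI."""
    return ["java", "-jar", jar_path, "-n", config_path]


# Hypothetical file names; scan followed by enrichment (STARTPOINT 1, ENDPOINT 2).
params = {
    "STARTPOINT": 1,
    "ENDPOINT": 2,
    "GENELISTFILENAME": "search_genes.txt",
    "BGGENELISTFILENAME": "background_genes.txt",
    "MATRIXFILE": "matrices.txt",
    "DEFICIT": 0.15,
    "ENRICHMENTOUTPUTFILE": "enrichment_results.txt",
}
print(missing_parameters(params))
print(render_config(params))
print(" ".join(build_command("config.ini")))
```

Checking the parameter set before launching the JAR catches a missing output path early, instead of after a long scan.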