A study of Molecular Recognition Elements ‘MoREs’



MoRFs

A DATASET OF

MOLECULAR RECOGNITION FEATURES

Amrita Mohan

Submitted to the faculty of the Bioinformatics Graduate Program

in partial fulfillment of the requirements

for the degree

Master of Science

in the School of Informatics,

Indiana University

December 2005

Accepted by the Faculty of Indiana University, in partial

fulfillment of the requirements for the degree of Master of Science

________________________

A. Keith Dunker, PhD, Chair

________________________

Vladimir Uversky, PhD

________________________

Predrag Radivojac, PhD

________________________

Narayanan B. Perumal, PhD

To my parents,

in recognition of their worth

Acknowledgements

The completion of this thesis would not have been possible without the support of many people. Heartfelt thanks go to my research advisor, Keith, for his support and guidance and, above all, for providing me with the financial means to complete this thesis and my Master's degree. Many thanks to Volodya, who, besides guiding my research, sportingly read numerous revisions of this thesis draft and helped me make sense of my confusion. I also thank my committee member, Peru, who unfailingly boosted my morale throughout this study. Last but by no means least, sincere thanks to Pedja, for being an incredible scientific help and a great friend in an advisor's guise.

ABSTRACT

AMRITA MOHAN

MoRFs

A DATASET OF MOLECULAR RECOGNITION FEATURES

The last decade has witnessed numerous proteomic studies that have predicted, and successfully confirmed, the existence of extended structurally flexible regions in protein molecules. In parallel, the last five years of structural bioinformatics have produced an explosion of results on molecular recognition and its importance in protein-protein interactions. This work extends past and ongoing research by looking specifically at the flexibility and disorder found in protein regions involved in molecular recognition, known as Molecular Recognition Elements or Molecular Recognition Features (MoREs or MoRFs, as we call them). MoRFs are relatively short (10–70 residues), loosely structured protein regions within longer sequences that are largely disordered. Interestingly, upon binding to other proteins, MoRFs are able to undergo a disorder-to-order transition. In our interpretation, MoRFs therefore serve as potential binding sites, and binding to a partner protein lends a functional advantage to the whole complex by enabling interaction with its physiological partner. There are at least three basic types of MoRFs: those that form α-helices upon binding, those that form β-strands (in which the peptide forms a β-sheet with additional β-strands provided by the partner protein), and those that form irregular structures when bound. We propose the names α-MoRF (also known as α-MoRE, alpha-helical molecular recognition feature/element), β-MoRF (beta-sheet molecular recognition feature/element) and I-MoRF (irregular molecular recognition feature/element), respectively. The results presented in this work suggest that functionally significant residual structure can exist in MoRF regions prior to the actual binding event. We also demonstrate a pronounced conformational preference for α-helices within MoRF regions.

We believe that the results of this study will improve our understanding of protein-protein interactions, especially those related to molecular recognition, and may pave the way for future work on the prediction of protein binding sites.

We hope that this work demonstrates that, within only a few years of its conception, the concept of intrinsic protein disorder has gained wide importance in the field of protein-protein interactions and can be strongly associated with molecular recognition.

Table of Contents

Acknowledgements

ABSTRACT

Introduction

A. Introduction to subject

B. Importance of this subject

C. Knowledge Gap

Background

A. Relevant research

B. Goal to be tested

C. Intended research goal

Materials & Methods

A. Dataset of MoRFs

Results

A. MoRFs dataset statistics & length distribution

B. Secondary Structure Analysis

C. Amino acid Composition, Charge & Aromatics in MoRFs

(a) Amino acid compositions

(b) Net & Total charges, Aromatics and Proline Content

D. Order-Disorder Predictions and Functional classes

E. Presence of polyproline type II helices & Ramachandran Plot

Conclusions

Discussions

Appendix

A. MoRFs (or MoREs) and their partners

B. MoRF update

References

CURRICULUM VITAE

Introduction

A. Introduction to subject

The traditional view of the protein structure–function relationship holds that protein function depends critically on a well-defined three-dimensional structure. However, recent studies have revealed that the functional state of many proteins and protein domains is intrinsically unstructured [1-14]. This phenomenon has been described for both partially and wholly disordered proteins. Since these first observations, the field of protein disorder, and of the protein functions that arise from disorder, has progressed steadily.

Figure 1: Time-dependent increase in the number of PubMed hits dealing with intrinsically disordered proteins. The following set of keywords has been used to perform this search: intrinsically disordered, intrinsically unstructured, natively unfolded, intrinsically unfolded and intrinsically flexible.

Figure 1 reflects the rapidly growing interest in intrinsically disordered proteins: 110 papers discussing disordered proteins were published in 2004, and as many as 50 such papers were published in the first quarter of 2005 alone [15].

The conformation of natively disordered proteins closely mimics the denatured states of structured proteins [16, 17]. Previous work in structural biology has shown that disordered proteins are common in various proteomes and that their frequency increases with the complexity of the organism [18]. The higher predicted frequency of disorder in eukaryotes compared with prokaryotes and archaea has been suggested to reflect the greater need for cell signaling and regulation in eukaryotes [19-21]. The functional importance of protein disorder is further emphasized by its role in signal transduction, cell-cycle regulation, gene expression and molecular recognition [2-4, 15]. The prevalence and importance of these proteins call for a reassessment of the classical protein structure–function paradigm [1-11, 21].

It has also long been recognized that the formation of protein-protein complexes is probably the most common means by which biological function is achieved. In this report we discuss a specialized subset of these interactions, mediated by ‘Molecular Recognition Elements’ or ‘Molecular Recognition Features’: protein regions that specifically participate in protein-protein interactions. Molecular recognition is the process by which biological entities interact with each other or with small molecules to form specific complexes. In the case of proteins, binding enables the resulting complex to participate in specialized activities and mediate select biochemical functions. Compared with other binding events, signaling-related molecular recognition has three important characteristics: (a) a unique combination of high specificity and low affinity; (b) binding diversity, in which one region specifically recognizes differently shaped partners by structural accommodation at the binding interface; and (c) binding commonality, in which multiple distinct sequences (perhaps with different folds) recognize a common binding site. Molecular recognition is also coupled to more complex, functionally important processes such as protein folding, signal transduction and the assembly of multisubunit and supramolecular structures.

Special cases of molecular recognition have been reported in which one or both partners are highly flexible or wholly disordered prior to binding, and their interaction results in a regularly structured protein complex. Such a complex must form through binding-induced folding, starting from a very large configurational space. Each residue in the complex is then subject to numerous attractive and repulsive forces, which may explain why this phenomenon is hard to analyze and detect experimentally. The complexity of the problem approaches that of protein folding itself [5, 22].

One illustrative example of a functionally important disordered protein that participates in molecular recognition is the following. In 2004, Callaghan et al. [23] showed, experimentally and with the predictors GlobPlot [24] and PONDR [52-54], that the C-terminal domain of full-length RNase E is predominantly disordered. Besides being highly enriched in R, P, G and Q, this domain contains three isolated stretches with an increased propensity to become ordered when bound to RNA, and the second of these stretches was proposed to be the one that actually interacts with RNA.

B. Importance of this subject

The recognition of one protein molecule by another is an important phenomenon in all living systems, and enzyme-substrate binding is a classic example of molecular recognition. The selectivity of recognition arises from the complementary nature of the interacting surfaces, an idea first captured by Fischer's ‘lock-and-key’ concept more than a hundred years ago [41].

However, a more modern view of molecular recognition, called induced fit [42], takes into account that the interacting molecules are flexible and can adapt their shape during the recognition process. Induced fit has been observed experimentally in many protein-ligand interactions.

Our work aims to study proteins that participate in molecular recognition. To this end, we created a dataset of Molecular Recognition Features or Elements (referred to as MoRFs from here onwards) and described some of their characteristic features. We report and discuss the results of several analyses of the MoRF dataset (such as amino acid composition and order-disorder percentages) and compare them with results from representative disordered and globular protein datasets. We believe this analysis will help clarify the physico-chemical and structural differences between molecular recognition elements and ordinary ordered or disordered proteins. Notable differences may enable future characterization and prediction of MoRFs and improve our understanding of the structural changes that accompany the binding of a MoRF to its macromolecular target. A further expected benefit is a more accurate estimation of protein binding sites within MoRFs, which could ultimately lead to the design and development of a MoRF predictor. On the commercial side, we expect these results to facilitate the design of drug compounds that influence molecular recognition.

C. Knowledge Gap

A large body of evidence now supports the functional importance of intrinsic disorder. However, little information is available on the features of MoRFs (such as amino acid composition, order-disorder predisposition, charge and aromaticity). Little is also known about the mechanisms underlying the structural changes in MoRFs during binding. This problem has been difficult to approach experimentally, since the extremely flexible conformations of MoRFs pose a special challenge [40]. Computational approaches may help in such situations. Here, we compared the bound structures of MoRFs with their inherent structural preferences: we collected MoRFs from the PDB and determined their secondary structures in the bound state. We hypothesize that the binding of MoRFs to their partners, accompanied by a disorder-to-order transition, is template-driven rather than a chance event.

Background

A. Relevant research

Molecular Recognition Features (MoRFs) are common in various proteomes and occupy a unique structural and functional niche in which function is a direct consequence of intrinsic disorder. The evidence that intrinsically disordered proteins, which lack a well-defined folded structure, exist in vitro and in vivo is compelling and justifies treating them as a separate class within the protein universe. A number of reviews and papers have reported advances in this rapidly progressing field, focusing mainly on evidence for the unfolded nature of these proteins prior to binding and on the functional benefits of their malleable structural state [1-4, 12-18].

In their unbound form, many intrinsically disordered proteins have traditionally been considered to exist as random coils (non-alpha, non-beta conformations with aperiodic phi and psi angles), since their structures closely mimic the unfolded state of globular proteins in high concentrations of strong denaturants [19-21]. A closer look at natively unfolded proteins and some MoRFs, however, shows that this view is not correct. To begin with, a true random coil does not exist even under the harshest conditions [46, 47]. It is therefore not surprising that many MoRFs have been reported to retain traces of residual structure [12, 17], and that upon interaction with their binding partners they can undergo significant induced folding, or a disorder-to-order transition [1, 2, 12]. Such a recognition mechanism, coupled to folding, has been noted to confer exceptional specificity and versatility [3, 26-28], which helps explain the prevalence of structural disorder in signaling and regulatory proteins [28]. The interaction of MoRFs with their partners therefore highlights the importance of understanding their induced folding. Since effective functioning of MoRFs requires fast formation of the folded state [49], their template-induced folding is a special and interesting case of protein folding. The advantages of this binding mode have been studied in detail for the transcription factor GCN4, whose binding strength correlates with the α-helicity of its critical DNA-binding segment [50, 51]. Note, however, that over-stabilization of a secondary structural element can also decrease the rate constant of complex formation, as was shown for the cyclin-dependent kinase inhibitor p27 (Kip1).

B. Goal to be tested

The goal of this work is to detect signs of inherent secondary structure preferences in MoRFs prior to binding, if any exist, that could influence their final structure in the ordered complex. To this end, MoRF sequences are first assessed with the secondary structure predictor PHD [34, 35], and the predictions are then compared with the bound structures.

C. Intended research goal

Our primary goal in this project is to design and develop a database of MoRFs and to study selected types and examples of molecular recognition elements from it. We also carry out several analyses on this database and compare the results with those from representative disordered (DISPROT [29]) and ordered datasets to look for physico-chemical differences between them. Our ultimate goals are: (a) to facilitate future characterization and prediction of MoRFs; (b) to improve our knowledge of potential binding sites; and (c) to gain further insight into the structural changes that accompany the binding of a MoRF to its macromolecular target. We also hope that these analyses provide a basis for the future design of compounds that influence this process. Eventually, these results should not only associate disorder with MoRFs but also show that the structural disorder observed in MoRFs predisposes them to special functional modes, which either result directly from their fluctuating conformation or are realized via binding to one or several other proteins in a structurally adaptive process.

Materials & Methods

A. Dataset of MoRFs

Using the SEQRES dataset available from the Protein Data Bank (PDB) [30], we collected protein chains shorter than 70 residues that are bound to other proteins of 100 residues or more. We chose chains shorter than 70 residues because such chains are less likely to fold into self-contained globular units before interacting with other proteins. In other words, such chains very likely lack a significant buried surface area and participate in molecular recognition by forming parts of larger protein complexes. Using these criteria, we prepared a starting dataset of 2512 protein chains. The PDB files corresponding to these 2512 chains were downloaded to obtain sequences, secondary structure, and Ramachandran phi and psi angles. The PDB SEQRES dataset contains all protein sequences available in the PDB, including residues observed in a crystal or in solution as well as residues not present in the crystal model (i.e., disordered residues lacking electron density, cloning artifacts, His-tags, etc.). We next removed all chains with ambiguous sequence information (i.e., sequences containing X or Z annotations instead of real amino acids). We also removed chains with 10 or fewer residues, since such short peptides may not be specific to a single longer sequence, which would make the later step of identifying the sequences containing these MoRFs difficult. After these preprocessing steps, 1261 chains remained (approximately 55,000 residues, with an average chain length of 44.9 residues). Finally, removing redundancy among these 1261 chains yielded 372 non-redundant MoRFs.
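The length and ambiguity criteria above can be expressed as a simple filter. The sketch below is illustrative only: it assumes each chain has already been parsed from the PDB into a record holding its SEQRES sequence and the length of its longest partner chain, and the `Chain` type, identifiers and example sequences are hypothetical. Redundancy removal is a separate step.

```python
from dataclasses import dataclass

# Thresholds taken from the text; everything else here is illustrative.
MAX_MORF_LENGTH = 70      # MoRF chains must be shorter than 70 residues
MIN_PARTNER_LENGTH = 100  # partner chains must have 100 residues or more
MIN_MORF_LENGTH = 10      # chains with 10 or fewer residues are discarded
AMBIGUOUS = set("XZ")     # placeholder codes instead of real amino acids


@dataclass
class Chain:
    chain_id: str        # hypothetical identifier
    sequence: str        # SEQRES sequence, one-letter code
    partner_length: int  # length of the longest partner chain in the entry


def is_candidate(chain: Chain) -> bool:
    """Initial selection: a short chain bound to a long partner."""
    return (len(chain.sequence) < MAX_MORF_LENGTH
            and chain.partner_length >= MIN_PARTNER_LENGTH)


def passes_cleanup(chain: Chain) -> bool:
    """Preprocessing: drop ambiguous sequences and very short peptides."""
    ambiguous = any(residue in AMBIGUOUS for residue in chain.sequence)
    return not ambiguous and len(chain.sequence) > MIN_MORF_LENGTH


def select_morf_candidates(chains):
    return [c for c in chains if is_candidate(c) and passes_cleanup(c)]


chains = [
    Chain("demo_A", "SQETFSDLWKLLPEN", 109),  # kept
    Chain("demo_B", "SQETFSDLXKLLPEN", 250),  # contains X: dropped
    Chain("demo_C", "ACDEFGHIK", 300),        # 9 residues: dropped
    Chain("demo_D", "A" * 80, 300),           # 80 residues: too long
    Chain("demo_E", "SQETFSDLWKLLPEN", 60),   # partner too short
]
print([c.chain_id for c in select_morf_candidates(chains)])  # ['demo_A']
```

Keeping the two predicates separate mirrors the two stages reported in Table 3 (initial selection, then cleanup), so the count after each stage can be logged.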

Because these putative MoRFs vary in length, we used Rost's formula [31] to calculate a length-dependent sequence identity threshold for each chain. A preliminary analysis of the redundancy-removal results showed that cluster sizes ranged from a single member to 177 members (thrombin, alpha-thrombin). Figure 2 shows the distribution of cluster sizes within the MoRFs dataset.
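A length-dependent identity cut-off makes short chains require higher identity before they are considered redundant. The sketch below assumes the curve published by Rost (1999), p(L) = n + 480·L^(−0.32·(1 + e^(−L/1000))), with the offset n = 0; the thesis does not state which offset was used, and in Rost's paper L is the alignment length, whereas here the chain length is used, as the text describes. The greedy clustering and the `identity` input (assumed to come from a separate pairwise aligner, not shown) are hypothetical stand-ins for the actual pipeline.

```python
import math


def rost_identity_threshold(length: int, offset: float = 0.0) -> float:
    """Length-dependent percent-identity cut-off in the form published by
    Rost (1999): offset + 480 * L ** (-0.32 * (1 + exp(-L / 1000))).
    offset = 0 is an assumption; the thesis does not state the value used."""
    exponent = -0.32 * (1.0 + math.exp(-length / 1000.0))
    return offset + 480.0 * length ** exponent


def greedy_cluster(lengths, identity):
    """Hypothetical greedy redundancy reduction.

    lengths:  dict chain_id -> chain length
    identity: dict frozenset({a, b}) -> percent identity from a prior alignment
    Longer chains are visited first and become cluster representatives; a chain
    joins the first representative whose identity reaches the cut-off.
    """
    clusters = {}  # representative -> members (representative included)
    for chain in sorted(lengths, key=lengths.get, reverse=True):
        for rep, members in clusters.items():
            cutoff = rost_identity_threshold(min(lengths[rep], lengths[chain]))
            if identity.get(frozenset((rep, chain)), 0.0) >= cutoff:
                members.append(chain)
                break
        else:
            clusters[chain] = [chain]
    return clusters


lengths = {"a": 40, "b": 38, "c": 25}
identity = {
    frozenset(("a", "b")): 95.0,
    frozenset(("a", "c")): 20.0,
    frozenset(("b", "c")): 22.0,
}
clusters = greedy_cluster(lengths, identity)
print(clusters)                                   # {'a': ['a', 'b'], 'c': ['c']}
print(sorted(len(m) for m in clusters.values()))  # cluster sizes: [1, 2]
```

The list of cluster sizes is the quantity whose distribution Figure 2 summarizes; keeping one representative per cluster gives the non-redundant set.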

Using other database references listed in the PDB file of each MoRF (Swiss-Prot [32], PIR [33] and NCBI [71]), we extracted 301 sequences containing these 372 MoRF chains. All but 53 of the MoRFs were found to be fragments of larger sequences. The final task after collecting and processing these MoRFs was to design a database for them, using MySQL as the backend and Perl scripts to load the MoRFs together with information such as secondary structure, binding partner and length. Secondary structure assignments for the MoRFs were made with the DSSP program [34] (results shown in Figure 5). Figure 3 shows several illustrative examples of MoRFs from our dataset.
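The kind of relational layout described above can be sketched in a few lines. This is a hypothetical schema, not the thesis's actual MySQL tables: the table and column names are assumptions, and Python's built-in `sqlite3` is used instead of MySQL and Perl so that the example is self-contained.

```python
import sqlite3

# Hypothetical two-table layout: one row per MoRF, one row per binding partner.
SCHEMA = """
CREATE TABLE morf (
    morf_id     INTEGER PRIMARY KEY,
    pdb_chain   TEXT NOT NULL,      -- e.g. PDB code plus chain letter
    sequence    TEXT NOT NULL,
    dssp_string TEXT,               -- per-residue secondary structure
    length      INTEGER NOT NULL
);
CREATE TABLE partner (
    partner_id  INTEGER PRIMARY KEY,
    morf_id     INTEGER NOT NULL REFERENCES morf(morf_id),
    pdb_chain   TEXT NOT NULL,
    length      INTEGER NOT NULL
);
"""


def build_database(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_morf(conn, pdb_chain, sequence, dssp_string, partners):
    """Insert one MoRF and its (chain, length) partners; return the new id."""
    cur = conn.execute(
        "INSERT INTO morf (pdb_chain, sequence, dssp_string, length) "
        "VALUES (?, ?, ?, ?)",
        (pdb_chain, sequence, dssp_string, len(sequence)),
    )
    morf_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO partner (morf_id, pdb_chain, length) VALUES (?, ?, ?)",
        [(morf_id, chain, length) for chain, length in partners],
    )
    conn.commit()
    return morf_id


conn = build_database()
add_morf(conn, "demo_A", "SQETFSDLWKLLPEN", "CCHHHHHHHHHHCCC", [("demo_X", 109)])
rows = conn.execute(
    "SELECT m.pdb_chain, m.length, p.pdb_chain "
    "FROM morf m JOIN partner p USING (morf_id)"
).fetchall()
print(rows)  # [('demo_A', 15, 'demo_X')]
```

Storing partners in their own table allows one MoRF to have several partners, which matters for MoRFs such as the p53 regions that bind different targets.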

Figure 3: Examples of complexes between MoRFs and their binding partners. The structures (PDB code in parentheses) shown are: (a) α-helical MoRF of p53 bound to MDM2; (b) extended β-MoRF Grim bound to an apoptosis inhibitor; (c) irregular MoRF of p53 bound to cyclin A2; (d) complex MoRF ovomucoid bound to trypsin. The structures were visualized with Swiss-PdbViewer.

Results

A. MoRFs dataset statistics & length distribution

Table 3 lists the number of MoRFs obtained after each data pre-processing step to reach a final non-redundant working dataset.

| Processing step | Number of MoRFs |
| Initial MoRFs obtained using the PDB SEQRES dataset (July 2004) | 2512 |
| Filtering ambiguous data (X, Z); removal of sequences with fewer than 10 residues | 1261 |
| Sequence redundancy removal | 372 |

Table 3: Number of MoRFs after each data processing step

Analysis of MoRF lengths showed that as many as two-thirds of these features are 10 to 20 residues long, which is short compared with other proteins (Figure 4).
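The length distribution behind this statement is a simple histogram. The sketch below bins lengths into 10-residue intervals and computes the fraction falling in a closed range; the example lengths are invented, and treating "between 10 and 20" as inclusive at both ends is an assumption.

```python
from collections import Counter


def length_histogram(lengths, bin_width=10):
    """Count lengths per bin [0, 10), [10, 20), ..., keyed by the lower bound."""
    return Counter((length // bin_width) * bin_width for length in lengths)


def fraction_in_range(lengths, low, high):
    """Fraction of lengths with low <= length <= high (inclusive assumption)."""
    return sum(low <= length <= high for length in lengths) / len(lengths)


lengths = [11, 12, 15, 19, 20, 35]  # hypothetical MoRF lengths
print(length_histogram(lengths))    # Counter({10: 4, 20: 1, 30: 1})
print(fraction_in_range(lengths, 10, 20))
```

Note that a length of exactly 20 falls into the [20, 30) histogram bin but inside the inclusive 10-20 range, so the two summaries can differ slightly at bin edges.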

B. Secondary Structure Analysis

We used the DSSP program [34] to determine secondary structure assignments for each of the 372 MoRFs. DSSP was designed to standardize protein secondary structure assignment: it takes a single PDB entry as input and assigns a secondary structure type (helix, sheet or irregular) to each residue of the protein's sequence.

Of the approximately 9000 residues in this dataset, 27% had α-helical conformation, 12% were β-sheet residues and approximately 48% had an irregular structure. The remaining 13% of the residues had no coordinates in the PDB files, consistent with their being disordered. We compared these results with those from a control dataset of similar size (approximately 9000 residues) consisting of single-chain X-ray structures with a primitive space group (and hence necessarily monomeric). The structures in the control dataset had no missing residues. The results of this analysis are shown in Figure 5.
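The percentages above come from reducing DSSP's per-residue codes to four classes and pooling them over all MoRFs. The sketch below assumes a common reduction (H, G, I to helix; E, B to strand; T, S and blank to irregular); the thesis does not state which mapping was used, and the `?` marker for SEQRES residues without coordinates is hypothetical.

```python
from collections import Counter

# Assumed reduction of DSSP's eight states to the classes used in the text.
DSSP_TO_CLASS = {
    "H": "alpha", "G": "alpha", "I": "alpha",              # alpha, 3-10, pi helices
    "E": "beta", "B": "beta",                              # strand, isolated bridge
    "T": "irregular", "S": "irregular", " ": "irregular",  # turn, bend, coil
}
MISSING = "?"  # hypothetical marker: SEQRES residue without coordinates


def classify(dssp_string: str):
    return ["disorder" if s == MISSING else DSSP_TO_CLASS[s] for s in dssp_string]


def class_percentages(dssp_strings):
    """Percent of all residues (pooled over chains) in each class."""
    counts = Counter()
    for s in dssp_strings:
        counts.update(classify(s))
    total = sum(counts.values())
    return {c: 100.0 * counts[c] / total
            for c in ("alpha", "beta", "irregular", "disorder")}


example = ["??" + " " + "H" * 8 + "TT", " " + "EEEE" + "SS" + "EEEE" + " ??"]
print(class_percentages(example))
```

Pooling residues (rather than averaging per-chain percentages) weights long MoRFs more heavily; which of the two the thesis used is not stated.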

We observe an overall decreased preference for the extended β conformation in MoRFs, which may be explained by their abundance of hydrophilic side chains [2, 3, 12]. Indeed, the most pronounced difference between the secondary structure distributions of bound MoRFs and monomeric proteins is in β-strands (sheet motifs): about 12% in MoRFs versus about 25% in monomers.

To test whether interactions with the partner protein alter the native conformational preferences of MoRFs, we compared predicted secondary structures with the DSSP assignments of the bound structures, using the PHD algorithm [34, 35] to estimate each MoRF's predisposition to form particular secondary structures.

The PHD algorithm combines multiple sequence alignments and neural networks to predict the secondary structure of each residue in a protein sequence. For an input sequence, the method finds its homologs and builds a profile from their multiple sequence alignment; it then feeds this profile into a series of neural networks that output the predictions.

As mentioned earlier, the goal of this exercise was to test our hypothesis that protein complex formation influences or modifies the disordered state of MoRFs. To estimate how partner proteins modify the inherent structural preferences of MoRFs upon binding, we related predicted secondary structures to the conformations observed in the bound state. The results (Table 4) indicate that the inherent secondary structure preferences of MoRFs are largely preserved in the bound state. This resembles the situation in globular proteins, where non-local interactions were found to have a negligible effect on the predictability of secondary structure [68]. The strongest agreement, seen for helices, suggests that these motifs are substantially stable and may exist as preformed structural elements in solution. In contrast, coils can be produced by random sequences almost as well as by the MoRF chains themselves. The correlation between the secondary structure preferences of MoRFs with and without their binding partners can help future analyses of the role of these structural elements.

| DSSP assignment | α-helix | β-sheet | Irregular | Disorder |
| Residues (DSSP) | 2469 | 1118 | 4359 | 1147 |
| Predicted by PHD as H | 74% | 11% | 21% | 18% |
| Predicted by PHD as B | 9% | 55% | 15% | 10% |
| Predicted by PHD as I | 17% | 34% | 64% | 72% |

α-helical residues were predicted more reliably (74% agreement) than β-sheet (55%) or irregular (64%) residues. In addition, a high percentage (72%) of the residues originally assigned as disordered were predicted to be irregular. Extended conformations are predicted less reliably from MoRF sequences, possibly because they are less structurally defined in solution and tend to become ordered only upon binding to the partner. As with the secondary structure assignments, these results were also compared with those from our control dataset of monomeric proteins.
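Each column of Table 4 is a per-class breakdown: for all residues with a given DSSP class, the percentage that the predictor placed in each predicted class. The sketch below computes that breakdown from two residue-aligned strings. It does not run PHD; the inputs are hypothetical, and PHD's own output letters (H, E, L) are assumed to have been relabeled H, B, I beforehand.

```python
from collections import Counter, defaultdict


def agreement_table(observed, predicted, classes=("H", "B", "I")):
    """For each observed class, the percentage of its residues that received
    each predicted class (one column of Table 4 per observed class)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must be aligned residue by residue")
    counts = defaultdict(Counter)
    for obs, pred in zip(observed, predicted):
        counts[obs][pred] += 1
    table = {}
    for obs, row in counts.items():
        total = sum(row.values())
        table[obs] = {cls: round(100.0 * row[cls] / total, 1) for cls in classes}
    return table


# Hypothetical residues: H/B/I from DSSP, D = no coordinates in the PDB file.
observed = "HHHHBBBIIIDD"
predicted = "HHHIBBIIIHII"
for obs_class, row in agreement_table(observed, predicted).items():
    print(obs_class, row)
```

Normalizing within each observed class (rather than over all residues) is what makes the columns of Table 4 sum to 100%.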

A region-wise analysis of the different structural types in MoRFs (Table 5 and Figure 6) identified 1880 regions in total: 269 helical, 381 β-strand, 991 irregular (more than half of all regions) and 239 disordered.

| Region length (residues) | Disordered regions | Helical regions | Extended β regions | Irregular regions |
| 1-9 | 205 | 167 | 376 | 847 |
| 10-19 | 26 | 76 | 5 | 128 |
| 20-29 | 5 | 17 | 0 | 10 |
| 30-69 | 3 | 9 | 0 | 6 |
| Total | 239 | 269 | 381 | 991 |
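Region counts like those in Table 5 can be obtained by splitting each MoRF's per-residue class string into maximal runs and binning the run lengths. The sketch below is a minimal version under that assumption; the one-letter class codes (D, H, B, I) and the example string are hypothetical, and runs longer than 69 residues are ignored because MoRFs are shorter than 70 residues.

```python
from collections import Counter
from itertools import groupby

# Length bins of Table 5 (inclusive, in residues).
BINS = [(1, 9), (10, 19), (20, 29), (30, 69)]


def regions(classes: str):
    """Split a per-residue class string into maximal runs: [(class, length), ...]."""
    return [(cls, sum(1 for _ in run)) for cls, run in groupby(classes)]


def region_table(class_strings):
    """Count regions per (class, length bin) over all MoRFs.
    Runs never span two MoRFs because each string is processed separately."""
    table = Counter()
    for s in class_strings:
        for cls, length in regions(s):
            for low, high in BINS:
                if low <= length <= high:
                    table[(cls, f"{low}-{high}")] += 1
                    break
    return table


morf = "DD" + "H" * 12 + "III" + "B"  # hypothetical: D=disorder, H, B, I
print(regions(morf))  # [('D', 2), ('H', 12), ('I', 3), ('B', 1)]
print(region_table([morf]))
```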

C. Amino acid Composition, Charge & Aromatics in MoRFs

(a) Amino acid compositions

Comparison of the amino acid compositions of monomers and MoRFs shows that MoRFs have increased levels of C, R, S, P and K. Conversely, MoRFs have decreased levels of L, V, F, I, Y and D, a group dominated by hydrophobic residues that favor β-sheet formation (V, I, F and Y) (Figure 7(a)).

Figure 7: (a) Amino acid composition of MoRFs (MoREs) vs. Monomers

Figures 7(b) and 7(c) show the composition of all MoRFs, and of each structural category of MoRFs, relative to the proteins of the control monomeric dataset. Note the significant enrichment of cysteine in the MoRF dataset as a whole compared with the control dataset (Figure 7(b)). Figure 7(c) also shows that cysteine is more prevalent in β-MoRFs than in their α-helical and irregular counterparts. These results suggest that it would be worth probing further for disulfide bonds in MoRF interactions.

Figure 7: (b) Relative amino acid composition of MoRFs w.r.t. Monomers: Y axis shows the fractional difference of the amino acid compositions of MoRFs and Monomers i.e., (MoRFs – Monomers)/Monomers

Figure 7: (c) Relative amino acid composition of different structural types (α-helical, β, and Irregular) of MoRFs w.r.t. Monomers.
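The quantity plotted in Figure 7(b) is the fractional difference (MoRFs − Monomers)/Monomers for each amino acid. The sketch below computes it from two lists of sequences, assuming compositions are pooled over all residues of each dataset (the thesis does not say whether per-sequence averages were used instead); the example sequences are hypothetical, and amino acids absent from the reference set are skipped to avoid division by zero.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences):
    """Fraction of each standard amino acid over all residues in a dataset."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def relative_difference(test_set, reference_set):
    """(test - reference) / reference per amino acid, as in Figure 7(b)."""
    test = composition(test_set)
    ref = composition(reference_set)
    return {aa: (test[aa] - ref[aa]) / ref[aa] for aa in AMINO_ACIDS if ref[aa] > 0}


morfs = ["CCRKAA"]     # hypothetical MoRF sequences
monomers = ["CRKAAL"]  # hypothetical control sequences
diff = relative_difference(morfs, monomers)
print({aa: round(value, 2) for aa, value in diff.items()})
```

A value of +1.0 means the residue is twice as frequent in MoRFs as in monomers; −1.0 means it is absent from MoRFs.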

(b) Net & Total charges, Aromatics and Proline Content

Figure 8: Total and Net charges, Proline Percentage & Aromatics in MoRFs

Figure 8 compares MoRFs (MoREs) and monomeric chains in terms of total charge (K + R + D + E), net charge (K + R - D - E), proline content and aromatic content (F + W + Y). Interestingly, despite comparable total charges in the two classes, MoRFs tend to have a higher net charge than monomers, as is also the case for disordered proteins [2]. The proline content of MoRFs also exceeds that of monomers, which motivated us to look for polyproline II helices in MoRFs in later experiments. MoRFs also show a higher proportion of aromatic amino acids than monomeric proteins. This is plausible, since aromatic side chains tend to make strong and specific interactions [69], which would be expected in proteins involved in molecular recognition.
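The four features of Figure 8 follow directly from residue counts. The sketch below uses the definitions given in the text and assumes each count is normalized by sequence length and then averaged over sequences; the normalization is an assumption, and the signed net charge follows the text's K + R − D − E (some disorder studies use its absolute value instead).

```python
def sequence_features(seq: str) -> dict:
    """Per-sequence fractions following the definitions in the text;
    dividing by sequence length is an assumption."""
    n = len(seq)
    count = {aa: seq.count(aa) for aa in "KRDEPFWY"}
    return {
        "total_charge": (count["K"] + count["R"] + count["D"] + count["E"]) / n,
        "net_charge": (count["K"] + count["R"] - count["D"] - count["E"]) / n,
        "proline": count["P"] / n,
        "aromatic": (count["F"] + count["W"] + count["Y"]) / n,
    }


def dataset_means(sequences):
    """Mean of each feature over a list of sequences."""
    feats = [sequence_features(s) for s in sequences]
    return {k: sum(f[k] for f in feats) / len(feats) for k in feats[0]}


print(sequence_features("KKRDEPFW"))  # hypothetical 8-residue peptide
print(dataset_means(["KKRDEPFW", "AAAAAAAA"]))
```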

D. Order-Disorder Predictions and Functional classes

Order/disorder predictions using the VL-XT [52, 53] and VL3 [54] predictors revealed as much as 65% predicted disorder in sequences containing MoRFs. It is also interesting that as many as 30% of all MoRF residues (2723 residues) were irregular residues predicted to be ordered. These data support the hypothesis that such recognition motifs may be a general feature of disordered proteins. Table 6 lists the distribution of predicted order and disorder across the different secondary structure assignments for the 372 MoRFs, as percentages of all MoRF residues.

| Secondary structure (DSSP) | Predicted disordered (%) | Predicted ordered (%) |
| α residues | 9 | 18 |
| β residues | 2 | 10 |
| Irregular residues | 18 | 30 |
| PDB disorder (missing coordinates) | 7 | 7 |
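Unlike Table 4, Table 6 is a joint distribution: each cell is a percentage of all MoRF residues, so the row totals reproduce the overall secondary structure percentages (for example, 18 + 30 = 48% irregular). The sketch below builds such a table from residue-aligned strings. The 0.5 score cut-off is the conventional PONDR threshold and is assumed here; the class codes (A, B, I, M) and the example scores are hypothetical.

```python
from collections import Counter


def call_order(scores, threshold=0.5):
    """Per-residue call from predictor scores: 'D' if score >= threshold, else 'O'."""
    return "".join("D" if s >= threshold else "O" for s in scores)


def joint_percentages(ss_classes: str, order_calls: str):
    """Percentage of all residues in each (secondary structure, order call) cell.
    ss_classes: e.g. 'A' alpha, 'B' beta, 'I' irregular, 'M' missing in the PDB."""
    if len(ss_classes) != len(order_calls):
        raise ValueError("inputs must be aligned residue by residue")
    counts = Counter(zip(ss_classes, order_calls))
    total = len(ss_classes)
    return {cell: 100.0 * n / total for cell, n in counts.items()}


ss = "AAAAIIII"
calls = call_order([0.2, 0.3, 0.6, 0.7, 0.1, 0.2, 0.3, 0.9])
print(calls)  # OODDOOOD
print(joint_percentages(ss, calls))
```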

Figure 9 shows the distribution of predicted disorder in sequences containing MoRFs using both the predictors VL-XT and VL3.

Previous studies [13, 14], together with the disorder predictions for the MoRF database, suggest that MoRFs are primarily associated with signal transduction, cell-cycle regulation and gene expression, and may therefore often be implicated in various types of cancer [15]. Recent studies have also revealed the high incidence and functional importance of disorder-to-order transitions in endocytosis [66] and in RNA and protein chaperones [67]. The disorder found in these sequences also correlates strongly with sites of post-translational modification: a parallel PROSITE [37] search showed that one third of the MoRFs contain phosphorylation sites and as many as 14% contain myristoylation sites.
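A PROSITE-style motif search can be sketched by translating PROSITE patterns into regular expressions. The converter below handles only the basic syntax (x, x(n), x(n,m), [..], {..}, single residues and the < / > anchors) and is not a complete PROSITE parser. The three patterns are those of PROSITE entries PS00005 (protein kinase C phosphorylation), PS00006 (casein kinase II phosphorylation) and PS00008 (N-myristoylation) as we recall them; they should be checked against the current PROSITE release, and the thesis does not state which phosphorylation patterns were counted. The example sequences are hypothetical.

```python
import re

PKC_PHOSPHO = "[ST]-x-[RK]"                     # PS00005 (as recalled)
CK2_PHOSPHO = "[ST]-x(2)-[DE]"                  # PS00006 (as recalled)
MYRISTYL = "G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}"    # PS00008 (as recalled)

ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        match = ELEMENT.fullmatch(element.strip("<>"))
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                            # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"        # any residue except these
        if high is not None:
            core += "{%s,%s}" % (low, high)
        elif low is not None:
            core += "{%s}" % low
        parts.append(prefix + core + suffix)
    return "".join(parts)


def count_sites(sequence: str, pattern: str) -> int:
    """Number of (possibly overlapping) matches, via a zero-width lookahead."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return sum(1 for _ in regex.finditer(sequence))


def fraction_with_site(sequences, pattern):
    """Fraction of sequences with at least one match."""
    return sum(count_sites(s, pattern) > 0 for s in sequences) / len(sequences)


print(prosite_to_regex(MYRISTYL))  # G[^EDRKHPFYW].{2}[STAGCN][^P]
print(fraction_with_site(["MSAKTGR", "MGAQLSK", "AAAA"], PKC_PHOSPHO))
```

Such short patterns match frequently by chance, so counts of this kind are best read as upper bounds on real modification sites.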

An important observation from these predictions is that two well-known binding regions of p53 (one in the N-terminal domain, which binds MDM2, and one in the C-terminal domain, which binds cyclin A2) coincide with dips in the VL-XT disorder plots and are flanked on either side by predicted disordered regions (Figure 10). Such examples from the MoRF dataset suggest that novel binding regions could be discovered in other proteins containing MoRFs.
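The dips described above were identified by inspecting the prediction plots. One way to make that criterion explicit is sketched below: report short stretches whose disorder scores fall below a cut-off while the residues on both sides are predicted disordered. This is a hypothetical heuristic, not the procedure used in the thesis and not a published MoRF predictor; the cut-off, stretch lengths and flank size are illustrative assumptions.

```python
def find_dips(scores, threshold=0.5, min_len=5, max_len=25, flank=10):
    """Return (start, end) pairs, end exclusive, of stretches whose disorder
    scores are below `threshold`, are min_len..max_len residues long, and are
    flanked on both sides by `flank` residues scoring >= threshold.
    All defaults are illustrative assumptions."""
    scores = list(scores)
    dips, i, n = [], 0, len(scores)
    while i < n:
        if scores[i] < threshold:
            j = i
            while j < n and scores[j] < threshold:
                j += 1
            left = scores[max(0, i - flank):i]
            right = scores[j:j + flank]
            flanked = (len(left) == flank and len(right) == flank
                       and all(s >= threshold for s in left + right))
            if min_len <= j - i <= max_len and flanked:
                dips.append((i, j))
            i = j
        else:
            i += 1
    return dips


# Hypothetical score profile: an 8-residue ordered dip inside a disordered region.
profile = [0.8] * 12 + [0.3] * 8 + [0.9] * 12
print(find_dips(profile))  # [(12, 20)]
```

Requiring full flanks on both sides deliberately excludes dips at the chain termini; relaxing that condition is a design choice that would need validation against known binding regions.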

Using the 201 Swiss-Prot sequences that contain 227 of the MoRFs, we gathered preliminary insights into the general nature and functional classes of MoRF-containing proteins. The results of this analysis are summarized in Table 7.

| Swiss-Prot keyword | Frequency |
| 3D-Structure | 174 |
| Signal | 57 |
| Glycoprotein | 41 |
| Transmembrane | 37 |
| Alternative Splicing | 35 |
| Hydrolase | 25 |
| DNA Binding | 24 |
| Transcription Regulation | 23 |
| Serine Protease Inhibitor | 21 |

Table 7: Top 10 Swiss-Prot functional classes returned for MoRFs
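Keyword frequencies like those in Table 7 are counts of entries carrying each keyword. The sketch below assumes the keywords of each Swiss-Prot entry have already been extracted into a list (parsing is not shown) and counts each keyword at most once per entry; the example entries are hypothetical.

```python
from collections import Counter


def keyword_frequencies(entries, top=10):
    """Count how many entries carry each keyword (once per entry) and
    return the `top` most common as (keyword, count) pairs."""
    counts = Counter()
    for keywords in entries:
        counts.update(set(keywords))
    return counts.most_common(top)


entries = [  # hypothetical keyword lists, one per Swiss-Prot entry
    ["3D-structure", "Signal", "Glycoprotein"],
    ["3D-structure", "Hydrolase"],
    ["3D-structure", "Signal", "Signal"],
]
print(keyword_frequencies(entries, top=2))  # [('3D-structure', 3), ('Signal', 2)]
```

Interpreting such counts as enrichment would additionally require the same tally over a background set of Swiss-Prot entries.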

The high number of hits for keywords such as “Signal”, “Glycoprotein”, “Transmembrane” and “Alternative Splicing” suggests that sequences containing MoRFs are often involved in signaling processes and may have a higher than usual likelihood of being transmembrane proteins or of being alternatively spliced. From these weak associations, we infer that MoRFs themselves may share similar functional characteristics.

These functional capacities are exploited in many molecular settings, which suggests that MoRFs may fulfill many different functions. By considering the mechanistic details common to their various modes of action, one could better understand other, novel functions of MoRFs.

E. Presence of polyproline type II helices & Ramachandran plot

Using the algorithm of Sreerama et al. [56] to detect poly-proline type II (PPII) helices, we identified 53 such peptides (between 4 and 12 residues long) in the MoRF dataset. The presence of these peptides suggests that the extended and rather stiff PPII conformation in MoRFs might explain why the interaction site is exposed. In addition, by extracting the phi and psi angles of each MoRF from its DSSP output, we drew the following Ramachandran plot [70]. The boxed region in the plot marks where the incidence of poly-proline II helices is highest.

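[Figure: Ramachandran plot of phi/psi angles of the MoRF dataset extracted from DSSP output; the boxed region marks the poly-proline II region]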
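The boxed PPII region of the Ramachandran plot can also be applied residue by residue to (phi, psi) pairs taken from the PHI and PSI columns of DSSP output. The sketch below is a simplified angle-window criterion, not the algorithm of Sreerama et al. [56]: it assumes a PPII window centered at phi = −75°, psi = +145° with a hypothetical tolerance of ±25°, and reports runs of 4 to 12 consecutive residues inside that window, the same length range as the peptides counted above. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

Angle = Tuple[Optional[float], Optional[float]]

def _angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def is_ppii(phi: Optional[float], psi: Optional[float],
            center: Tuple[float, float] = (-75.0, 145.0), tol: float = 25.0) -> bool:
    """True if (phi, psi) lies inside the ASSUMED PPII window."""
    if phi is None or psi is None:
        return False
    return (_angular_distance(phi, center[0]) <= tol
            and _angular_distance(psi, center[1]) <= tol)

def ppii_segments(angles: List[Angle], min_len: int = 4,
                  max_len: int = 12) -> List[Tuple[int, int]]:
    """0-based, inclusive (start, end) of runs of PPII residues of allowed length."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    # A (None, None) sentinel closes a run that reaches the end of the chain.
    for i, (phi, psi) in enumerate(list(angles) + [(None, None)]):
        if is_ppii(phi, psi):
            if start is None:
                start = i
        elif start is not None:
            if min_len <= i - start <= max_len:
                segments.append((start, i - 1))
            start = None
    return segments
```

Undefined angles at chain ends (which DSSP reports as 360.0) fall outside the window, or can be passed as None.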

Conclusions

Functional disorder has long been associated with molecular recognition features (MoRFs) that bind RNA, DNA, other proteins or, sometimes, even smaller ligands. The success of disorder-based prediction of phosphorylation sites is also pertinent to this function. Furthermore, the function of many, possibly all, of these MoRFs depends directly on disorder: the disordered segment serves to recognize, solubilize or loosen the structure of its binding partner. The multifaceted functioning of MoRFs (for example, p53, which acts both as an α-helical and as an irregular MoRF; Figures 3(a) and 3(c)) implies that the lack of an ordered structure contributes in many ways to their mechanisms of action. Indeed, their highly malleable structure gives them functional features not found in ordered proteins.

This report presents novel examples and extensions of MoRFs and their features. The advantage of the great conformational freedom of intrinsically disordered proteins or protein fragments is most evident in entropic chains, which can exert a long-range, entropic exclusion of other proteins or cellular constituents in spacer functions [57]. Multidomain proteins are another setting where such regions abound, as their globular domains are often separated by flexible linkers. These linkers facilitate the orientational search and allow the recognition of distant and/or discontinuous determinants on the target [14]. Fully disordered MoRFs also exploit this feature. Their extended structure lets them contact their partner(s) over a binding surface that is large for a protein of their size, so the same interaction potential can be realized by shorter proteins overall, encoded by a more economical genome [26]. In addition, flexibility is instrumental to the assembly process itself, as certain complexes may not be assembled successfully from rigid components.

Another unique consequence of the structural flexibility of MoRFs is their capacity to adapt to the structures of distinct partners, which enables exceptional plasticity in cellular responses. A well-characterized case is the Cdk inhibitor p21Cip1, which can interact with the CycA-Cdk2, CycE-Cdk2 and CycD-Cdk4 complexes [58] and with apoptosis signal-regulating kinase 1 [59] under different conditions. The open, extended structure of MoRFs also increases the speed of interaction. Macromolecular association rates are substantially improved by an initial, relatively non-specific association enabled by flexible (disordered) recognition segments, a mechanism formulated in the “fly-casting” model of molecular recognition [49]. Another prominent feature of MoRFs is their extreme proteolytic sensitivity, which in principle allows effective control via rapid turnover. Indeed, protein disorder prevails in signaling, regulatory and cancer-associated proteins, which are known to be short-lived and subject to rapid turnover [10, 11]. Furthermore, disorder itself constitutes an integral part of the proteasomal destruction signal in two distinct ways. On the one hand, non-ubiquitinated MoRFs may be degraded directly by the 20S proteasome, as shown for p21Cip1 [60] and for tau proteins (microtubule-associated proteins implicated in Alzheimer’s disease) [61]. On the other hand, this mechanism may also play a more subtle regulatory role: the endoproteolytic activity of the proteasome can process disordered segments in multidomain proteins and release the flanking, constitutively activated globular domains [62]. Disorder may also constitute part of the signal to the ubiquitination system itself, as the regions of securin and cyclin B recognized by the ubiquitination machinery have recently been shown to be natively unfolded [63].

Discussion

Our observations suggest that MoRFs, in general, do not have to undergo extensive structural rearrangements to adapt to their partners, as their residual structure already resembles their final conformational state. The importance of such structure in the binding process has been proposed for some MoRFs, such as p27 (Kip1), p53 [58] and GCN4 [64].

The function of MoRFs is often realized via molecular recognition, that is, by binding to a protein, RNA or DNA partner through a disorder-to-order transition [2, 3, 9–11]. On this basis, we suggest that the binding process be considered a special type of protein folding and protein complex formation, since it involves the formation of intermolecular (tertiary structure) contacts between the MoRF and its binding partner and also stabilizes secondary structure elements. A physiologically effective action of MoRFs requires (i) specific and reversible interactions with the partner (for activation and deactivation of the whole complex) and (ii) the ability to fold quickly. By analogy with folding models for globular proteins, two mechanisms for the formation of MoRF structure have been suggested. In the first, the MoRF is completely disordered prior to binding and makes initial contacts almost randomly along its sequence. These contact points then serve as folding sites around which secondary structure elements form, as dictated by the partner. In such a mechanism, the inherent conformational preferences of the intrinsically disordered protein may be overridden by interactions with the partner, resulting in significantly different secondary structure elements in its uncomplexed and bound states. This mechanism can be understood as the a priori formation of long-range interactions that facilitate the formation of subsequent secondary structure elements. The second mechanism involves the early formation of local secondary structure [43]. In this case, the structure of the MoRF is not entirely random and shows features that are also visible in the bound conformation. We believe that transiently or permanently ordered segment(s) present in MoRFs may serve as binding sites for the partner proteins, around which the protein folds.
Based on this, one can hypothesize that a MoRF complex containing a multitude of contact points for its partner can be considered a transient folding state.

Analysis of the distribution of secondary structure elements shows that MoRFs contain more irregular secondary structure, even in the bound state. The abundance of irregular motifs in the bound structures suggests that, although MoRF folding may be template-driven, MoRF partners do not impose large constraints on their structure. Helices were found with comparable frequencies in MoRFs and monomeric proteins, whereas extended (sheet) structures are less common in MoRFs. This deviation may be attributed mainly to the distinct amino acid composition of MoRFs, with increased levels of C, R, S, P and K and decreased levels of L, V, F, I, Y and D. From an evolutionary perspective, evolution in monomeric proteins conserves an amino acid sequence that, after folding, yields a protein with a well-defined function. In the case of MoRFs, evolutionary pressure instead conserves a sequence that initially lacks most signs of regular structure and yet is primed to become ordered as soon as it encounters its macromolecular target(s).
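Compositional bias of this kind is commonly expressed as a relative difference, (f_MoRF − f_ref) / f_ref, for each amino acid, where f is the residue's frequency in the MoRF set or in a reference set such as ordered, monomeric proteins. The sketch below computes that quantity; the choice of reference set and the function names are assumptions for illustration, and the example sequences in the test are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(sequences: Iterable[str]) -> Dict[str, float]:
    """Pooled frequency of each standard amino acid over all sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(res for res in seq.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}

def relative_difference(test: Dict[str, float],
                        reference: Dict[str, float]) -> Dict[str, Optional[float]]:
    """(f_test - f_ref) / f_ref per amino acid; None where f_ref is zero."""
    return {aa: (test[aa] - reference[aa]) / reference[aa] if reference[aa] > 0 else None
            for aa in AMINO_ACIDS}
```

Positive values would correspond to enriched residues (C, R, S, P and K in the analysis above) and negative values to depleted ones.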

The strong conformational preference of MoRFs for helical structural elements also suggests that these elements could be transiently populated in the unbound state. In other words, the conformational space actually accessible to MoRFs is limited rather than expansive, and the number of possible final structures is correspondingly small. This idea agrees well with previously reported observations that MoRFs display signs of residual structure. A restricted choice of available conformational states minimizes the entropic cost of binding. The higher secondary structure prediction rates for MoRF structures also indicate that partner proteins cause minimal disturbance to their pre-existing states. In short, interactions between a MoRF and the contact sites of its partner make the binding enthalpy substantially more favorable, which, together with the reduced entropic penalty, leads to better stabilization of the protein complex.
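The entropic argument above can be made concrete with a schematic Boltzmann-entropy estimate. This is an idealized illustration, not a result of this study: it assumes equally weighted conformations, treats the binding enthalpy as unchanged between the two scenarios, and introduces the symbol Ω for the number of accessible conformations of the MoRF in the free and bound states.

```latex
\Delta G_{\mathrm{bind}} = \Delta H_{\mathrm{bind}} - T\,\Delta S_{\mathrm{bind}},
\qquad
\Delta S_{\mathrm{conf}} = R \ln\frac{\Omega_{\mathrm{bound}}}{\Omega_{\mathrm{free}}} < 0

% Residual structure shrinks the free-state ensemble: \Omega'_{free} < \Omega_{free}
\Delta\Delta G = -T\left(\Delta S'_{\mathrm{conf}} - \Delta S_{\mathrm{conf}}\right)
              = -RT \ln\frac{\Omega_{\mathrm{free}}}{\Omega'_{\mathrm{free}}} < 0

% Example: \Omega'_{free} = \Omega_{free}/10 at T = 298 K, RT \approx 0.592 kcal/mol
\Delta\Delta G = -RT \ln 10 \approx -0.592 \times 2.303 \approx -1.36~\mathrm{kcal\,mol^{-1}}
```

Under these assumptions, a tenfold reduction in the number of accessible unbound conformations would make binding about 1.4 kcal/mol more favorable at room temperature.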

In summary, MoRFs can be regarded as “mixtures” of segments with strong and weak (negligible) secondary structure preferences. These results extend previous assertions that MoRFs possess structural features pertinent to their partner recognition and function.

|No. |MoRF PDB ID |MoRF PDB Name |MoRF Db Ref |MoRF Db |MoRF Start-End |Db Match Start-End |MoRF Partner PDB ID |MoRF Partner PDB Name |

|1 |1a02f |c-Fos |P01100 |SW |1-53 |140-192 |1a02a |N-Fat |

|2 |1a0rg |Transducin |P02698 |SW |1-65 |2-66 |1a0rp |Phosducin |

|3 |1a1rc |Ns4A Protein |P27958 |SW |1-16 |1678-1693 |1a1rb |Ns3 |

|4 |1a2cl |Alpha-Thrombin |P00734 |SW |2-15 |337-350 |1a2ch |Alpha-Thrombin |

|5 |1a2xb |Troponin I |P02643 |SW |1-31 |4-34 |1a2xa |Troponin C |

|6 |1a6ac |Clip |P04233 |SW |1-15 |103-117 |1a6ab |Hla-Dr3 |

|7 |4aahb |Methanol Dehydrogenase |AAA83766 |GB |1-69 |28-96 |4aaha |Methanol Dehydrogenase |

|8 |1ab9a |Gamma-Chymotrypsin |P00766 |SW |1-10 |1-10 |1a9ad |Gamma-Chymotrypsin |

|9 |1an1i |Tryptase Inhibitor |P80424 |SW |1-40 |2-41 |1an1e |Trypsin |

|10 |1aqdc |Hla-A2 |P01892 |SW |1-14 |128-141 |1aqdd |Hla-Dr1 Class II Histocompatibility Protein |

|11 |1avfp |Gastricsin |P20142 |SW |1-21 |17-37 |1avfa |Gastricsin |

|12 |1avoa |11S Regulator |Q06323 |SW |1-60 |4-63 |1avob |11S Regulator |

|13 |1avpb |Adenoviral Proteinase |P24937 |SW |1-11 |240-250 |1avpa |Adenoviral Proteinase |

|14 |1avzc |Fyn Tyrosine Kinase |P06241 |SW |1-57 |85-141 |1avzb |Negative Factor |

|15 |1axcb |P21/Waf1 |P38936 |SW |1-18 |143-160 |1axcc |Pcna |

|16 |1b0nb |Sini Protein |P23308 |SW |1-31 |9-39 |1b0na |Sinr Protein |

|17 |1b33n |Phycobilisome 7.8 Kd Linker Polypeptide |P20116 |SW |1-67 |1-67 |1b33a |Allophycocyanin, Beta Chain |

|18 |1b41b |Fasciculin-2 |P01403 |SW |1-61 |1-61 |1b41a |Acetylcholinesterase |

|19 |1b8hd |DNA Polymerase Fragment |AAA93077 |GB |1-11 |893-903 |1b8hc |DNA Polymerase Processivity Component |

|20 |1be3k |Cytochrome Bc1 Complex |P07552 |SW |1-22 |15-36 |1be3f |Cytochrome Bc1 Complex |

|21 |1bqpb |Lectin |P02867 |SW |1-47 |218-264 |1bqpa |Lectin |

|22 |2btci |Trypsin Inhibitor |P10293 |SW |1-29 |4-32 |2btce |Trypsin |

|23 |1bunb |Beta2-Bungarotoxin |P00989 |SW |1-61 |25-85 |1buna |Beta2-Bungarotoxin |

|24 |1bxlb |Bak Peptide |Q16611 |SW |1-16 |72-87 |1bxla |Bcl-Xl |

|25 |1c04c |Ribosomal Protein L11 |P56210 |SW |1-67 |63-129 |1c04d |Ribosomal Protein L14 |

|26 |1c5wa |Urokinase-Type Plasminogen Activator |P00749 |SW |1-9 |164-172 |1c5wb |Urokinase-Type Plasminogen Activator |

|27 |1c8ob |Ice Inhibitor |P07385 |SW |1-32 |310-341 |1c8oa |Ice Inhibitor |

|28 |1ca9g |Tnf-R2 |P20333 |SW |1-7 |422-428 |1ca9e |Tnf Receptor Associated Factor 2 |

|29 |1ceeb |Wiskott-Aldrich Syndrome Protein Wasp |A55197 |SW |1-59 |230-288 |1ceea |GTP-Binding Rho-Like Protein |

|30 |1cf4b |Activated P21Cdc42Hs Kinase |Q07912 |SW |2-44 |447-489 |1cf4a |Cdc42 Homolog |

|31 |1cffb |Calcium Pump |AAA74511 |GB |1-20 |1100-1119 |1cffa |Calmodulin |

|32 |1clvi |Alpha-Amylase Inhibitor |P80403 |SW |1-32 |1-32 |1clva |Alpha-Amylase |

|33 |1cn3f |Fragment Of Coat Protein Vp2 |P12908 |SW |11-29 |279-297 |1cn3e |Coat Protein Vp1 |

|34 |1cqgb |Ref-1 Peptide |P27695 |SW |1-13 |59-71 |1cqga |Thioredoxin |

|35 |1csba |Cathepsin B |P07858 |SW |1-47 |80-126 |1csbb |Cathepsin B |

|36 |1cvwl |Coagulation Factor Viia (Light Chain) (Des-Gl |P08709 |SW |1-55 |150-204 |1cvwh |Coagulation Factor Viia (Heavy Chain) (Des-G |

|37 |1d4tb |Signaling Lymphocytic Act. Molecule |NP_003028 |GB |1-11 |276-286 |1d4ta |T Cell Signal Transduction Molecule Sap |

|38 |1d6ri |Bowman-Birk Proteinase Inhibitor Precursor |P01055 |SW |1-58 |45-102 |1d6ra |Trypsinogen |

|39 |1d8dp |K-Ras4B Peptide Substrate |P01118 |SW |1-11 |178-188 |1d8db |Farnesyltransferase (Beta Subunit) |

|40 |1dd3c |50S Ribosomal Protein L7/L12 |P29396 |SW |1-32 |1-32 |1dd3b | 50S Ribosomal Protein L7/L12 |

|41 |1deeg |Immunoglobulin G Binding Protein A |P02976 |SW |1-51 |103-153 |1deee |Igm Rf 2A2 |

|42 |1devb |Smad Anchor For Receptor Activation |AAC99462 |GB |1-41 |669-709 |1deva |Mad (Mothers Against Decapentaplegic, Drosop |

|43 |1dm0b |Shiga Toxin B Subunit |XVEBBD |GB |1-69 |21-89 |1dm0a |Shiga Toxin A Subunit |

|44 |1dmlb |DNA Polymerase |P07917 |SW |1-36 |1200-1235 |1dmla |DNA Polymerase Processivity Factor |

|45 |1dp5b |Proteinase Inhibitor Ia3 |P01094 |SW |1-29 |2-30 |1dp6a |Fixl Protein |

|46 |1ds5e |Casein Kinase, Beta Chain |P13862 |SW |1-16 |188-203 |1ds5d |Casein Kinase, Alpha Chain |

|47 |1dtdb |Metallocarboxypeptidase Inhibitor |P81511 |SW |1-61 |20-80 |1dtda |Lipase |

|48 |1e0ab |Serine/Threonine-Protein Kinase Pak-Alpha |P35465 |SW |3-46 |75-118 |1e0aa |G25K GTP-Binding Protein, Placental Isoform |

|49 |1e79i |ATP Synthase Epsilon Chain |P05632 |SW |1-47 |2-48 |1e79h |ATP Synthase Delta Chain |

|50 |1eaic |Chymotrypsin/Elastase Isoinhibitor 1 |P07851 |SW |1-61 |1-61 |1eaib |Elastase |

|51 |1ebdc |Dihydrolipoamide Acetyltransferase |P11961 |SW |1-41 |130-170 |1ebdb |Dihydrolipoamide Dehydrogenase |

|52 |1ed3c |Peptide Mtf-E (13N3E) |P05504 |SW |1-13 |29-41 |1ed3d |Class I Major Histocompatibility Antigen Rt1 |

|53 |1ee5b |Nucleoplasmin |P05221 |SW |1-19 |153-171 |1ee6a |Pectate Lyase |

|54 |1eg4p |Beta-Dystroglycan |Q14118 |SW |1-13 |882-894 |1eg4a |Dystrophin |

|55 |1ej4b |Eukaryotic Translation Initiation Factor 4E B |NP_004086 |GB |1-14 |51-64 |1ej4a |Eukaryotic Initiation Factor 4E |

|56 |1ejhe |Eukaryotic Initiation Factor 4Gii |AAC02903 |GB |1-14 |622-635 |1ejhc |Eukaryotic Initiation Factor 4E |

|57 |1ejop |Fmdv Peptide |AAA42624 |GB |1-13 |136-148 |1ejoh |Igg2A Monoclonal Antibody (Heavy Chain) |

|58 |1ekba |Enteropeptidase |P98072 |SW |1-7 |788-794 |1ekbb |Enteropeptidase |

|59 |1emub |Adenomatous Polyposis Coli Protein |P25054 |SW |1-16 |2034-2049 |1emua |Axin |

|60 |3erdc |Glucocorticoid Receptor Interacting Protein 1 |AAC53151 |GB |1-12 |687-698 |3erdb |Estrogen Receptor Alpha |

|61 |1ezvi |Ubiquinol-Cytochrome C Reductase Complex 7.3|P22289 |SW | |4-58 |1ezvx |Heavy Chain (Vh) Of Fv-Fragment |

|62 |1f02t |Translocated Intimin Receptor |AAC38390 |GB |2-66 |272-336 |1f02i |Intimin |

|63 |1f3jp |Lysozyme C |P00698 |SW |1-14 |29-42 |1f3jd | H-2 Class II Histocompatibility Antigen |

|64 |1f47a |Cell Division Protein Zipa |P06138 |SW |1-17 |367-383 |1f47b |Cell Division Protein Ftsz |

|65 |1f4vd |Flagellar Motor Switch Protein |P06974 |SW |1-16 |1-16 |1f4vc |Chemotaxis Chey Protein |

|66 |1f83b |Synaptobrevin-II |P19065 |SW |1-24 |53-76 |1f83a |Botulinum Neurotoxin Type B |

|67 |1f8vd |Mature Capsid Protein Gamma |AAF71693 |GB |1-26 |362-401 |1f8va |Mature Capsid Protein Beta |

|68 |1f93e |Hepatocyte Nuclear Factor 1-Alpha |P22361 |SW |1-31 |1-31 |1f93d |Dimerization Cofactor Of Hepatocyte Nucl |

|69 |1ffkr |Ribosomal Protein L24E |P14116 |SW |1-53 |4-56 |1ffkt |Ribosomal Protein L30 |

|70 |1fi8d |Ecotin |AAA16410 |GB |1-48 |122-169 |1fi8b |Natural Killer Cell Protease 1 |

|71 |1fizl |Beta-Acrosin Light Chain |P08001 |SW |1-13 |20-32 |1fiza |Beta-Acrosin Heavy Chain |

|72 |1fjgn |30S Ribosomal Protein S14 |P24320 |SW |1-60 |1-60 |1fjgm |30S Ribosomal Protein S13 |

|73 |1fjgv |30S Ribosomal Protein Thx |P32193 |SW |1-24 |2-25 |1fjgt |30S Ribosomal Protein S20 |

|74 |1fjsl |Coagulation Factor Xa |P00742 |SW |1-52 |127-178 |1fjsa |Coagulation Factor Xa |

|75 |1flei |Elafin |P19957 |SW |1-47 |71-117 |1flee | Elastase |

|76 |1fqjc |Retinal Rod Rhodopsin-Sensitive Cgmp 3',5'-C |P04972 |SW |1-38 |50-87 |1fqjd |Chimera Of Guanine Nucleotide-Binding Protei |

|77 |1fs1a |Cyclin A/Cdk2-Associated P19 |AAC50242 |GB |1-41 |109-149 |1fs1d |Cyclin A/Cdk2-Associated P45 |

|78 |1fv1c |Myelin Basic Protein |AAC41944 |GB |1-20 |111-130 |1fv1d |Major Histocompatibility Complex Alpha Chain |

|79 |1g3jb |Tcf3-Cbd (Catenin Binding Domain) |CAA67686 |GB |1-41 |2-52 |1g3jc |Beta-Catenin Armadillo Repeat Region |

|80 |1g5jb |Bad Protein |AAA64465 |GB |1-24 |140-163 |1g5ja |Apoptosis Regulator Bcl-X |

|81 |1g9ii |Bowman-Birk Type Trypsin Inhibitor |P01062 |SW |1-22 |10-31 |1g9ie |Trypsinogen, Cationic |

|82 |1gff3 |Bacteriophage G4 Capsid Proteins Gpf, Gpg, Gp |P03652 |SW |1-12 |14-25 |1gff2 |Bacteriophage G4 Capsid Proteins Gpf, Gpg, G |

|83 |1gg6a |Gamma Chymotrypsin |P00766 |SW |1-10 |1-10 |1gg6b |Gamma Chymotrypsin |

|84 |1gl0i |Protease Inhibitor Lcmi I |P80060 |SW |1-32 |22-53 |1gl0e |Chymotrypsinogen A |

|85 |1gl1i |Protease Inhibitor Lcmi II |P80060 |SW |1-34 |58-91 |1gl1e |Alpha-Chymotrypsin |

|86 |1gngx |Frattide |Q92837 |SW |1-26 |198-223 |1gngb |Glycogen Synthase Kinase-3 Beta |

|87 |1h15c |DNA Polymerase |P03198 |SW |1-14 |628-641 |1h15d |Hla Class II Histocompatibility Antigen |

|88 |1h2ls |Hypoxia-Inducible Factor 1 Alpha |Q16665 |SW |1-22 |795-822 |1h2la |Factor Inhibiting Hif1 |

|89 |1h2sb |Sensory Rhodopsin II Transducer |P42259 |SW |1-60 |23-82 |1h2sa |Sensory Rhodopsin II |

|90 |1h25e |Retinoblastoma-Associated Protein |P06400 |SW |1-10 |869-879 |1h25a |Cyclin A2 |

|91 |1h26e |Cellular Tumor Antigen P53 |P04637 |SW |1-9 |378-386 |1h26d |Cyclin A2 |

|92 |1h28e |Retinoblastoma-Like Protein 1 |P28749 |SW |1-10 |654-663 |1h28d |Cyclin A2 |

|93 |1h27e |Cyclin-Dependent Kinase Inhibitor 1B |P46527 |SW |1-6 |25-35 |  |Cyclin A2 |

|94 |1h6ep |Cytotoxic T-Lymphocyte Protein 4 |P16410 |SW |1-10 |197-206 |1h6ea |Clathrin Coat Assembly Protein Ap50 |

|95 |1h6wb |Bacteriophage T4 Short Tail Fibre |P10930 |SW |1-10 |518-527 |1h6wa |Bacteriophage T4 Short Tail Fibre |

|96 |1h89a |Caat/Enhancer Binding Protein Beta |P17676 |SW |1-64 |273-336 |1h89c |Myb Proto-Oncogene Protein |

|97 |1hesp |P-Selectin Peptide |P16109 |SW |1-9 |814-822 |1hesa |Clathrin Coat Assembly Protein Ap50 |

|98 |1hqrc |Myelin Basic Protein |AAC41944 |GB |1-10 |114-123 |1hqrb |Hla-Dr Beta Chain |

|99 |1hr8o |Cytochrome C Oxidase Polypeptide IV |P04037 |SW |1-13 |7-19 |1hr8h |Mitochondrial Processing Peptidase Beta Subunit|

|100 |1hxei |Hirudin Variant-1 |P01050 |SW |1-10 |55-64 |1hxeh |Thrombin |

|101 |1i4fc |Melanoma-Associated Antigen 4 |P43358 |SW |1-10 |230-239 |1i4fb |Beta-2-Microglobulin |

|102 |1i72b |S-Adenosylmethionine Decarboxylase Beta Chain |P17707 |SW |1-61 |4-67 |1i72a |S-Adenosylmethionine Decarboxylase Alpha Chain |

|103 |1iakp |Mhc Class II I-Ak |P24364 |SW |1-13 |50-62 |1iakb |Mhc Class II I-Ak |

|104 |1icfb |Cathepsin L: Light Chain |P07711 |SW |1-42 |292-333 |1icfc |Cathepsin L: Heavy Chain |

|105 |1ik9c |DNA Ligase IV |XP_007098 |GB |1-28 |755-782 |1ik9b |DNA Repair Protein Xrcc4 |

|106 |1iq5b |Ca2+/Calmodulin Dependent Kinase Kinase |T37317 |PIR |1-24 |334-357 |1iq5b |Calmodulin |

|107 |1iq1a |Importin Alpha-2 Subunit |P52293 |SW |1-7 |47-53 |1iq1c |Importin Alpha-2 Subunit |

|108 |1ivoc |Epidermal Growth Factor |P01133 |SW |1-47 |975-1021 |1ivob |Epidermal Growth Factor Receptor |

|109 |1ixsa |Holliday Junction DNA Helicase Ruva |Q9F1Q3 |SW |1-50 |142-191 |1ixsb |Ruvb |

|110 |1izlf |Photosystem II: Subunit Psbf |NP_682332 |GB |1-30 |14-43 |1izld |Photosystem II: Subunit Psbd |

|111 |1izlk |Photosystem II: Subunit Psbk |Q9F1K9 |SW |1-27 |14-40 |1izld |Photosystem II: Subunit Psbd |

|112 |1j5am |Ribosomal Protein L32 |P49228 |SW |1-58 |2-59 |1j5al | Ribosomal Protein L22 |

|113 |1jacb |Jacalin |P18671 |SW |1-15 |4-18 |1jaca |Jacalin |

|114 |1jb0i |Photosystem 1 Reaction Centre Subunit Viii |P25900 |SW |1-38 |1-38 |1jb0f |Photosystem 1 Reaction Centre Subunit III |

|115 |1jb0m |Photosystem 1 Reaction Centre Subunit Xii |P25903 |SW |1-31 |1-31 |1jb0l |Photosystem 1 Reaction Centre Subunit Xi |

|116 |1jd5b |Cell Death Protein Grim |AAC47727 |GB |1-8 |2-9 |1jd5a |Apoptosis 1 Inhibitor |

|117 |1jd6b |Head Involution Defective Protein |AAA79985 |GB |1-8 |2-9 |1jd6a |Apoptosis 1 Inhibitor |

|118 |1jdph |C-Type Natriuretic Peptide |P23582 |SW |1-18 |109-126 |1jdpb |Atrial Natriuretic Peptide Clearance Recepto |

|119 |1jjoa |Neuroserpin |O35684 |SW |1-40 |25-64 |1jjoc |Neuroserpin |

|120 |1jjoe |Neuroserpin |O35684 |SW |1-31 |367-397 |1jjoc |Neuroserpin |

|121 |1jk8c |Insulin B Peptide |CAA08766 |GB |1-13 |35-47 |1jk8b |Mhc Class II Hla-Dq8 |

|122 |1jkyb |Mitogen-Activated Protein Kinase Kinase 2 |NP_109587 |GB |1-16 |1-16 |1jkya |Lethal Factor |

|123 |1jmtb |Splicing Factor U2Af 65 Kda Subunit |P26368 |SW |1-23 |90-112 |1jmta |Splicing Factor U2Af 35 Kda Subunit |

|124 |1jmua |Protein Mu-1 |AAA47236 |GB |1-33 |10-42 |1jmub |Protein Mu-1 |

|125 |1jotb |Agglutinin |P18676 |SW |1-16 |2-17 |1jota |Agglutinin |

|126 |1jple |Cation-Independent Mannose 6-Phosphate Receptor |P11717 |SW |1-8 |2484-2491 |1jplb |ADP-Ribosylation Factor Binding Protein |

|127 |1jppc |Adenomatous Polyposis Coli Protein |P25054 |SW |1-11 |1021-1031 |1jppb |Beta-Catenin |

|128 |1jqsb |Elongation Factor G |P13551 |SW |1-32 |220-251 |1jqra |DNA Polymerase Beta-Like |

|129 |1ju5b |Crk |Q64010 |SW |1-12 |217-228 |1ju5a |Crk |

|130 |1jwgc |Cation-Independent Mannose-6-Phosphate Receptor |P11717 |SW |1-7 |2485-2491 |1jwgb |ADP-Ribosylation Factor Binding Protein Gga1 |

|131 |1k2dp |Myelin Basic Protein Peptide With 8 Residue |XP_040888 |GB |1-8 |2-9 |1k2db |H-2 Class II Histocompatibility Antigen |

|132 |1k3ab |Insulin Receptor Substrate 1 |P35568 |SW |1-8 |894-901 |1k3aa |Insulin-Like Growth Factor 1 Receptor |

|133 |1k3bc |Dipeptidyl-Peptidase I Heavy Chain |P53634 |SW |1-69 |395-463 |1k3bb |Dipeptidyl-Peptidase I Light Chain |

|134 |1k4wb |Steroid Receptor Coactivator-1 |AAB50242 |GB |1-10 |687-696 |1k4wa |Nuclear Receptor Ror-Beta |

|135 |1ka7b |Peptide N-Y-C |Q13291 |SW |1-12 |275-286 |1ka7a |Sh2 Domain Protein 1A |

|136 |1kcrp |Ps1 Peptide |AAK97192 |GB |1-15 |17-31 |1kcrh |Pc283 Immunoglobulin |

|137 |1kjyb |Regulator Of G-Protein Signaling 14 |O08773 |SW |1-35 |496-530 |1kjyc |Guanine Nucleotide-Binding Protein G(I) |

|138 |1kkqe |Nuclear Receptor Co-Repressor 2 |Q9Y618 |SW |1-19 |2339-2357 |1kkqd |Peroxisome Proliferator Activated Receptor |

|139 |1ko6b |Nuclear Pore Complex Protein Nup98 |P52948 |SW |1-6 |882-887 |1ko6a |Nuclear Pore Complex Protein Nup98 |

|140 |1ky7p |Amphiphysin |P49418 |SW |1-9 |322-330 |1ky7a |Alpha-Adaptin C |

|141 |1kzzb |Traf Family Member-Associated Nf-Kappa-B Activator |Q92844 |SW |1-11 |177-187 |1kzza |Tnf Receptor Associated Factor 3 |

|142 |1l0ab |Traf Family Member-Associated Nf-Kappa-B Activator |Q92844 |SW |1-17 |178-194 |1l0aa |Tnf Receptor Associated Factor 3 |

|143 |1l2wi |Outer Membrane Virulence Protein Yope |P08008 |SW |1-57 |22-78 |1l2wh |Yope Regulator |

|144 |1ld4e |General Control Protein Gcn4 |P03069 |SW |1-28 |250-277 |1ld4d |Coat Protein C |

|145 |1lezb |Map Kinase Kinase 3B |AAB40652 |GB |1-8 |22-29 |1leza |Mitogen-Activated Protein Kinase 14 |

|146 |1lj2c |Eukaryotic Protein Synthesis Initiation Factor |AAC82471 |GB |1-22 |138-159 |1lj2a |Nonstructural RNA-Binding Protein 34 |

|147 |1lq8b |Plasma Serine Protease Inhibitor |P05154 |SW |1-29 |378-406 |1lq8c | Plasma Serine Protease Inhibitor |

|148 |1lt9a |Fibrinogen Alpha/Alpha-E Chain |P02671 |SW |1-65 |145-209 |1lt9b |Fibrinogen Beta Chain |

|149 |1lvbc |Oligopeptide Substrate For The Protease |P04517 |SW |1-10 |2785-2794 |1lvba |Catalytic Domain Of The Nuclear Inclusio |

|150 |1lw6i |Subtilisin-Chymotrypsin Inhibitor-2A |P01053 |SW |1-63 |21-83 |1lw6e |Subtilisin Bpn |

|151 |1m26b |Jacalin, Beta Chain |AAA32678 |GB |1-15 |64-78 |1m26c |Jacalin, Alpha Chain |

|152 |1m2zb |Nuclear Receptor Coactivator 2 |Q15596 |SW |1-21 |734-754 |1m2zd |Glucocorticoid Receptor |

|153 |1mk7a |Integrin Beta3 |AAA67537 |GB |3-13 |739-749 |1mk7b |Talin |

|154 |1moxc |Transforming Growth Factor Alpha |P01135 |SW |1-49 |41-89 |1moxb |Epidermal Growth Factor Receptor |

|155 |1mtpb |Serine Proteinase Inhibitor (Serpin), Chain B |ZP_00059457 |GB |1-35 |385-419 |1mtpa |Serine Proteinase Inhibitor (Serpin), Chain |

|156 |1mvup |P-Glycoprotein |AAA37004 |GB |1-13 |1210-1222 |1mvub |Ig Vdj-Region (Heavy Chain) |

|157 |1mxee |Target Sequence Of Rat Calmodulin-Dependent Protein Kinase I |Q63450 |SW |1-25 |294-318 |1mxeb |Calmodulin |

|158 |1mzwb |U4/U6 Snrnp 60Kda Protein |O43172 |SW |1-31 |107-137 |1mzwa |U-Snrnp-Associated Cyclophilin |

|159 |1n12b |Peptide Corresponding To The N-Terminal Extension |P42190 |SW |1-11 |22-32 |1n12a |Mature Fimbrial Protein Pape |

|160 |1n13a |Pyruvoyl-Dependent Arginine Decarboxylase Beta |Q57764 |SW |1-46 |7-52 |1n13b |Pyruvoyl-Dependent Arginine Decarboxylase |

|161 |1n2dc |Iq2 and Iq3 Motifs From Myo2P, A Class V Myosin |P19524 |SW |1-48 |806-853 |1n2db |Myosin Light Chain |

|162 |1nhdc |Enoyl-Acyl Carrier Reductase |AAK25802 |GB |1-60 |366-425 |1nhdb |Enoyl-Acyl Carrier Reductase |

|163 |1nh2b |Transcription Initiation Factor Iia Large Chain |P32773 |SW |1-46 |3-48 |1nh2a |Transcription Initiation Factor Tfiid |

|164 |1niwb |Nitric-Oxide Synthase, Endothelial |P29474 |SW |1-19 |492-510 |1niwc |Calmodulin |

|165 |2nlla |Retinoic Acid Receptor |P19793 |SW |1-66 |135-200 |2nllb |Thyroid Hormone Receptor |

|166 |1nq7b |Steroid Receptor Coactivator-1 |AAB50242 |GB |1-10 |687-696 |1nq7a |Nuclear Receptor Ror-Beta |

|167 |1nrlc |Nuclear Receptor Coactivator 1 Isoform 3 |NP_671766 |GB |1-15 |682-696 |1nrlb |Orphan Nuclear Receptor Pxr |

|168 |1nwdb |Glutamate Decarboxylase |Q07346 |SW |3-28 |470-495 |1nwda |Calmodulin |

|169 |1nx0c |Calpastatin |P49342 |SW |1-11 |230-240 |1nx0b |Calcium-Dependent Protease, Small Subunit |

|170 |1nx1c |Calpastatin |P49342 |SW |1-11 |230-240 |1nx1b |Calcium-Dependent Protease, Small Subunit |

|171 |1occm |Cytochrome C Oxidase |P10175 |SW |1-43 |25-67 |1occn |Cytochrome C Oxidase |

|172 |1oc0b |Vitronectin |P04004 |SW |1-37 |22-58 |1oc0a |Plasminogen Activator Inhibitor-1 |

|173 |1oj5b |Signal Transducer and Activator Of Transcription |P42226 |SW |1-14 |795-808 |1oj5a |Steroid Receptor Coactivator 1A |

|174 |1ol5b |Restricted Expression Proliferation Associated |Q9ULW0 |SW |1-30 |7-43 |1ol5a |Serine/Threonine Kinase 6 |

|175 |1opib |Splicing Factor Sf1 |CAA03883 |GB |1-13 |13-25 |1opia |Splicing Factor U2Af 65 Kda Subunit |

|176 |1oryb |Flagellin |O67803 |SW |1-40 |479-518 |1orya |Flagellar Protein Flis |

|177 |1osvc |Nuclear Receptor Coactivator 2 |Q61026 |SW |1-12 |741-752 |1osvb |Bile Acid Receptor |

|178 |1ov3c |Flavocytochrome B558 Alpha Polypeptide |NP_000092 |GB |1-11 |150-160 |1ov3b |Neutrophil Cytosol Factor 1 |

|179 |1oxgb |Chymotrypsinogen A |P00766 |SW |1-14 |16-29 |1oxga |Chymotrypsinogen A |

|180 |1pd0b |Copii-Binding Peptide Of The Integral Membrane |Q01590 |SW |1-10 |201-210 |1pd0a |Protein Transport Protein Sec24 |

|181 |1pegp |Histone H3 |P02303 |SW |1-7 |8-14 |1pega |Histone H3 Methyltransferase Dim-5 |

|182 |1pjma |Retinoblastoma-Associated Protein |P06400 |SW |3-19 |860-876 |1pjmb |Importin Alpha-2 Subunit |

|183 |1pjna |Histone-Binding Protein N1/N2 |P06180 |SW |1-20 |532-552 |1pjnb |Importin Alpha-2 Subunit |

|184 |1pu9b |Histone H3 |P02303 |SW |1-15 |8-22 |1pu9b |Hat A1 |

|185 |1pyua |Aspartate 1-Decarboxylase Beta Chain |P31664 |SW |1-24 |1-24 |1pyub |Aspartate 1-Decarboxylase Alpha Chain |

|186 |1qd6a |Outer Membrane Phospholipase (Ompla) |P00631 |SW |1-13 |33-45 |1qd6c |Outer Membrane Phospholipase (Ompla) |

|187 |1qgc5 |Gh-Loop From Virus Capsid Protein Vp1 |AAA42665 |GB |1-24 |133-156 |1qgc4 |Immunoglobulin |

|188 |1qgkb |Importin Alpha-2 Subunit |P52292 |SW |1-44 |11-54 |1qgka |Importin Beta Subunit |

|189 |1qled |Cytochrome C Oxidase |P77921 |SW |1-43 |8-49 |1qlec |Cytochrome C Oxidase Polypeptide III |

|190 |1qsnb |Histone H3 |P02303 |SW |1-11 |10-20 |1qsna |Tgcn5 Histone Acetyl Transferase |

|191 |1r0tb |Ovomucoid |P01004 |SW |1-62 |65-126 |1r0ta |Trypsin |

|192 |1r1rd |Ribonucleotide Reductase R2 Protein |P00453 |SW |1-16 |361-375 |1r1rb |Ribonucleotide Reductase R1 Protein |

|193 |1r2bc |Nuclear Receptor Co-Repressor 2 |Q9Y618 |SW |1-17 |1414-1430 |1r2bb |B-Cell Lymphoma 6 Protein |

|194 |1rk3c |Peroxisome Proliferator-Activated Receptor Binding |Q15648 |SW |1-11 |640-650 |1rk3a |Vitamin D3 Receptor |

|195 |1rk8c |Within The Bgcn Gene Intron Protein |P82804 |SW |1-33 |3-35 |1rk8b |Mago Nashi Protein |

|196 |1rxzb |Flap Structure-Specific Endonuclease |O29975 |SW |1-11 |326-336 |1rxza |DNA Polymerase Sliding Clamp |

|197 |1ry1s |Rhodopsin |O62798 |SW |1-18 |50-67 |1ry1u |Signal Recognition Particle Protein |

|198 |2sebe |Peptide From Collagen II |P02458 |SW |2-11 |1169-1178 |2sebd |Enterotoxin Type B |

|199 |1tbaa |Transcription Initiation Factor Iid 230K Chain |A47371 |PIR |1-67 |11-77 |1tbab | Transcription Initiation Factor Tfiid |

|200 |1tfxc |Tissue Factor Pathway Inhibitor |P10646 |SW |1-58 |121-178 |1tfxb |Trypsin |

|201 |1tiic |Heat Labile Enterotoxin Type Iib |P43528 |SW |1-36 |215-250 |1tiia |Heat Labile Enterotoxin Type Iib |

|202 |1un0c |Nucleoporin Nup2 |P32499 |SW |1-16 |36-51 |1un0b |Importin Alpha Subunit |

|203 |1ur6b |Potential Transcriptional Repressor Not4Hp |O95628 |SW |1-52 |12-63 |1ur6a |Ubiquitin-Conjugating Enzyme E2-17 Kda 2 |

|204 |1viti |Hirudin |P28507 |SW |1-15 |51-65 |1vith |Alpha Thrombin |

|205 |1ycqb |P53 |AAA59989 |GB |1-11 |17-27 |1ycqa |Mdm2 |

|206 |1iwqb |Marcks |P26645 |SW |1-18 |148-165 |1iwqa |Calmodulin |

|207 |1m46b |Iq4 Motif From Myo2P, A Class V Myosin |P19524 |SW |1-25 |854-878 |1m46a |Myosin Light Chain |

|208 |1m5nq |Parathyroid Hormone-Related Protein |P12272 |SW |1-28 |103-130 |1m5ns |Importin Beta-1 Subunit |

|209 |1m93a |Serine Proteinase Inhibitor 2 |P07385 |SW |1-46 |1-46 |1m93b |Serine Proteinase Inhibitor 2 |

|210 |1mqsb |Integral Membrane Protein Sed5 |Q01590 |SW |1-21 |1-21 |1mqsa |Sly1 Protein |

|211 |1n0wb |Breast Cancer Type 2 Susceptibility Protein |P51587 |SW |1-33 |1519-1551 |1n0wa |DNA Repair Protein Rad51 Homolog 1 |

|212 |1n4mc |Transcription Factor E2F2 |Q14209 |SW |1-18 |410-427 |1n4mb |Retinoblastoma Pocket |

|213 |1o6kc |Glycogen Synthase Kinase-3 Beta |P49841 |SW |1-10 |3-12 |1o6ka |Rac-Beta Serine/Threonine Protein Kinase |

|214 |1n64p |Genome Polyprotein Capsid Protein C |P29846 |SW |1-16 |25-40 |1n64h |Fab 19D9D6 Heavy Chain |

|215 |1j19b |16-Mer Peptide From Intercellular Adhesion Molecule |P35330 |SW |1-16 |253-268 |1j19a |Radixin |

|216 |1o9kp |E2F-1 Transcription Factor |Q01094 |SW |1-18 |409-426 |1o9kh |Retinoblastoma Tumour Suppressor Protein |

|217 |1o9ub |Axin Peptide |O15169 |SW |1-18 |383-400 |1o9ua |Glycogen Synthase Kinase-3 Beta |

|218 |1j2jb |ADP-Ribosylation Factor Binding Protein Gga1|Q9UJY5 |SW |1-41 |168-208 |1j2ja |ADP-Ribosylation Factor 1 |

|219 |1nvpb |Transcription Initiation Factor Iia Alpha Cha |P52655 |SW |1-43 |9-51 |1nvpa |Tata Box Binding Protein |

|220 |1oqdk |Tumor Necrosis Factor Receptor Superfamily Member |Q02223 |SW |1-39 |8-46 |1oqdj |Tumor Necrosis Factor Ligand Superfamily Member|

|221 |1oqek |Tumor Necrosis Factor Receptor Superfamily Member |Q96RJ3 |SW |1-31 |16-46 |1oqej |Tumor Necrosis Factor Ligand Superfamily Member|

|222 |1p4qa |Cbp/P300-Interacting Transactivator 2 |Q99967 |SW |1-52 |193-259 |1p4qb |E1A-Associated Protein P300 |

|223 |1p4ub |Rabaptin-5 |Q15276 |SW |1-6 |440-445 |1p4ua |ADP-Ribosylation Factor Binding Protein Gga3 |

|224 |1ozbi |Preprotein Translocase Seca Subunit |P43803 |SW |1-24 |876-899 |1ozbh |Protein-Export Protein Secb |

|225 |1p93e |Nuclear Receptor Coactivator 2 |Q15596 |SW |1-9 |743-751 |1p93b |Glucocorticoid Receptor |

|226 |1pq1b |Bcl2-Like Protein 11 |AAC40030 |GB |1-33 |83-115 |1pq1a |Apoptosis Regulator Bcl-X |

|227 |1q1ta |Large T Antigen |P03070 |SW |1-7 |127-133 |1q1tc |Importin Alpha-2 Subunit |

|228 |1ujjc |C-Terminal Peptide From Beta-Secretase |P56817 |SW |1-7 |495-501 |1ujjb |ADP-Ribosylation Factor Binding Protein Gga1 |

|229 |1q4qk |Nedd2-Like Caspase Cg8091-Pa |NP_524017 |GB |1-8 |115-122 |1q4qj |Apoptosis 1 Inhibitor |

|230 |1q90r |Cytochrome B6-F Complex Iron-Sulfur Subunit |P49728 |SW |1-39 |33-71 |1q90d |Cytochrome B6-F Complex Subunit 4 |

|231 |1r4ae |Golgi Autoantigen, Golgin Subfamily A Member|Q13439 |SW |1-51 |2172-2222 |1r4aa |ADP-Ribosylation Factor-Like Protein 1 |

|232 |1a38p |R18 Peptide (Phcvprdlswldleanmclp) |NF00163012 |PIR |1 - 20 |1 - 20 |1a38b |14-3-3 Protein Zeta |

|233 |1aqcc |Peptide |O73683 |SW |1 – 10 |766 - 775 |1aqcb |X11 |

|234 |1awip |L-Pro10 |Q9JJN2 |SW |1 - 10 |3098 - 3017, 3099 - 3018 |1awib |Profilin |

|235 |1aym4 |Human Rhinovirus 16 Coat Protein |Q8JNV2 |SW |1 – 67 |2 - 69 |1aym3 |Human Rhinovirus 16 Coat Protein |

|236 |1biip |Decameric Peptide |CAB58569 |EMBL |1 - 10 |459 - 468 |1biib |Beta-2 Microglobulin |

|237 |1bjri |Lactoferrin |NF00375716 |PIR |1 - 10 |1 - 10 |1bjre |Proteinase K |

|238 |1bogc |Peptide |NF00514021 |PIR |1 - 10 |1 - 10 |1bogb |Antibody (Cb 4-1) |

|239 |1c9pb |Bdellastasin |P82107 |SW |1 - 59 |1 - 59 |1c9pa |Trypsin |

|240 |1cqti |Pou Domain, Class 2, Associating Factor 1 |NF00110208 |PIR |1 - 44 |1 - 44 |1cqtb | Pou Domain, Class 2, Transcription Factor 1 |

|241 |1dxpc | Nonstructural Protein Ns4A (P4) |NF00235394 |PIR |1 - 16 |1 - 16 |1dxpb |Protease/Helicase Ns3 (P70) |

|242 |1e0fi |Haemadin |Q25163 |SW |1 - 45 |21 - 65 |1e0ff |Thrombin |

|243 |1eb1a |Peptide Inhibitor |NF00866356 |PIR |1 - 10 |1 - 10 |1eb1h |Thrombin Heavy Chain |

|244 |1ebpc |Epo Mimetics Peptide 1 |CAD13109 |EMBL |1 – 19 |234 - 253 |1epbh |Epo Receptor |

|245 |2h1pp |Pa1 |NF00522862 |PIR |1 – 12 |1 -12 |2h1ph |2H1 |

|246 |1heze |Protein L |NF00429845 |PIR |1 – 60 |1 - 60 |1hezd |Heavy Chain Of Ig |

|247 |1hh6c |Pep-4 |NF00505422 |PIR |1 – 11 |1 -11 |1hh6b |Igg2A Kappa Antibody Cb41 (Heavy Chain) |

|248 |2hrpp |Hiv-1 Protease Peptide |Q8ADZ9 |SW |1 - 10 |524 - 533 |2hrpm |Monoclonal Antibody F11.2.32 |

|249 |1ir3b |Peptide Substrate |NF00103224 |PIR |1 – 18 |1 - 18 |1ir3a |Insulin Receptor |

|250 |1juqe |Cation-Dependent Mannose-6-Phosphate Receptor |P20645 |SW |1 - 10 |265 - 277 |1juqa |ADP-Ribosylation Factor Binding Protein |

|251 |1ohzb |Endo-1,4-Beta-Xylanase Y |P16218 |SW |2 - 56 |832 - 891 |1ohza |Cellulosomal Scaffolding Protein A |

|252 |1qnzp |Gp120 |CAA00727 |EMBL |1 -16 |316 - 333 |1qnzh |0.5B Antibody (Heavy Chain) |

|253 |1rkcb |Talin |Q8AWI0 |SW |1 -26 |1944 - 1969 |1rkca |Vinculin |

|254 |1rsup |Strep-Tag II Peptide |CAC22716 |EMBL |1 - 10 |650 - 659 |1rsub |Streptavidin |

|255 |1rtfa |Two Chain Tissue Plasminogen Activator |NF00107636 |PIR |1 - 17 |1 - 17 |1rtfb |Two Chain Tissue Plasminogen Activator |

|256 |1sfii |Sfti-1 |NF00227071 |PIR |1 - 14 |1 - 14 |1sfia |Trypsin |

|257 |1sm3p |Peptide Epitope |NP_877418 |GB |1 – 13 |179 - 191 |1sm3h |Sm3 Antibody |

|258 |1upkb |Ste-20 Related Adaptor |GI:33303905 |GB |1 - 10 |391 - 402 |1upka |Mo25 Protein |

|259 |1uvqc |Orexin |NF01572587 |PIR |1 - 33 | 32 - 63 |1uvqb |Hla Class II Histocompatibility Antigen |

|260 |1x11c |13-Mer Peptide |O73683 |SW |1 - 13 |764 - 776 |1x11a |X11 |

|261 |1n4pm |Fusion Protein Consisting Of Transforming Pro |NF01479846 |PIR |1 - 11 |1 - 11 |1n4pl |Geranyltransferase Type-I Beta Subunit |

|262 |1acyp |Igg1 Fab Fragment (59.1) Complexed With Hiv-1 |NF00927552 |PIR |2 -25 |1 - 23 |1acyh |Igg1 Fab Fragment (59.1) Complexed With Hiv- |

|263 |2achb |Alpha1 Antichymotrypsin - Chain B |Q9UNU9 |SW |1 - 40 |368 - 407 |2acha |Alpha1 Antichymotrypsin - Chain A |

|264 |1apmi |c-AMP-Dependent Protein Kinase (E.C. 2.7.1.37) |GI:530223 |GB |1 -20 |7 - 26 |1apme |c-AMP-Dependent Protein Kinase (E.C. 2.7.1.3 |

|265 |2ap2e |C-Myc Tag and His Tag |GI:28474948 |GB |1 -17 |364 - 380 |2ap2d |Antibody (Heavy Chain) |

|266 |7apib |Modified Alpha-1-Antitrypsin (Modified Alpha |GI:22207050 |GB |1 -36 |613 - 648 |7apia |Modified Alpha-1-Antitrypsin (Modified Alph |

|267 |1ayap |Tyrosine Phosphatase Syp (N-Terminal Sh2 Domain) |GI:189730 |GB |1 - 11 |1006 - 1016 |1ayaa |Tyrosine Phosphatase Syp (N-Terminal Sh2 Domain) |

|268 |1aybp |Tyrosine Phosphatase Syp (N-Terminal Sh2 Domain) |Q28224 |SW |1 - 12 |901 - 912 |1ayba |Tyrosine Phosphatase Syp (N-Terminal Sh2 Domain) |

|269 |1aycp |Tyrosine Phosphatase Syp (N-Terminal Sh2 Doma |NF00959209 |PIR |1 - 11 |1 - 11 |1ayca |Tyrosine Phosphatase Syp (N-Terminal Sh2 Dom |

|270 |1b35d |Cricket Paralysis Virus, Vp4 |Q9IJX3 |SW |1 - 57 |1 - 57 |1b35c |Cricket Paralysis Virus, Vp3 |

|271 |2bbmb |Calmodulin (Calcium-Bound) Complexed With Rab |XP_130630 |GB |1 – 16 |646 - 671 |2bbma |Calmodulin (Calcium-Bound) Complexed With Ra |

|272 |2bbvd |Black Beetle Virus Capsid Protein (Bbv) Compl |P04329 |SW |1 – 44 |364 - 407 |2bbvb |Black Beetle Virus Capsid Protein (Bbv) Comp |

|273 |1bmmi |Hirudin I |GI:2297640 |GB |1 - 10 |383 - 394 |1bmmh |Alpha-Thrombin |

|274 |2bpa3 |Bacteriophage Phix174 Capsid Proteins Gpf, Gp |NF00701276 |PIR |1 – 37 |1 - 37 |2bpa2 |Bacteriophage Phix174 Capsid Proteins Gpf, G |

|275 |1br8p |Peptide |Q7KZ97 |SW |1 - 10 |413 - 424 |1br8i |Antithrombin-III |

|276 |1brbi |Trypsin (E.C. 3.4.21.4) Variant (D189G, G226D |NF00704918 |PIR |1 - 56 |1 - 58 |1brbe |Trypsin (E.C. 3.4.21.4) Variant (D189G, G226 |

|277 |1bsxx |Grip1 |NF00126022 |PIR |1 - 13 |1 - 13 |1bsxb |Thyroid Hormone Receptor Beta |

|278 |1bx2c |Hla-Dr2 |NF00516257 |PIR |1 -15 |240 - 254 |1bx2d |Hla-Dr2 |

|279 |1c3qx |His Tag |GI:5359489 |GB |1 - 12 |1 - 12 |1c3qa |Hydroxyethylthiazole Kinase |

|280 |1cdle |Calmodulin Complexed With Calmodulin-Binding|P11799 |SW |1 -20 |1730 - 1749 |1cdla |Calmodulin Complexed With Calmodulin-Binding |

|281 |1cdmb |Calmodulin Complexed With Calmodulin-Binding|Q9Y2H4 |SW |1 - 25 |339 - 363 |1cdma |Calmodulin Complexed With Calmodulin-Binding |

|282 |1cfsc |Antigen Bound Peptide |NF00531311 |PIR |1 - 11 |1 - 11 |1cfsb |Igg2A Kappa Antibody Cb41 (Heavy Chain) |

|283 |1cgii |Alpha-Chymotrypsinogen Complex With Human Pan |NF00086129 |PIR |1 - 56 |1 - 56 |1cgie |Alpha-Chymotrypsinogen Complex With Human Pa |

|284 |1choi |Alpha-Chymotrypsin (E.C. 3.4.21.1) Complex Wi |C31444 |PIR |1 - 56 |1 - 56 |1choe |Alpha-Chymotrypsin (E.C. 3.4.21.1) Complex W |

|285 |1cjfc |Proline Peptide |Q9Y6V0 |SW |1 - 14 |2336 - 2350 |1cjfa |Human Platelet Profilin |

|286 |1cjqa |Ribonuclease S |GI:15984328 |GB |1 - 15 |173 - 187 |1cjqb |Ribonuclease S |

|287 |1ckkb |Rat Ca2+/Calmodulin Dependent Protein Kinase|Q64572 |SW |1 - 16 |438 - 463 |1ckka |Calmodulin |

|288 |2ck0p |11-Mer |NF01342382 |PIR |1 - 11 |1 -11 |2ck0h |Immunoglobulin |

|289 |2clrc |Human Class I Histocompatibility Antigen (Hla |P27797 |SW |1 - 10 |1 -10 |2clrd |Human Class I Histocompatibility Antigen (Hl |

|290 |1cu4p |Recognition Peptide |NF00502045 |PIR |1 - 10 |1 - 10 |1cu4h |Fab Heavy Chain |

|291 |1d7qb |N-Terminal Histidine Tag |NF00113306 |PIR |1 - 14 |1 - 14 |1d7qa |Translation Initiation Factor 1A |

|292 |1d9kp |Conalbumin Peptide |NF00518514 |PIR |1- 11 |1 - 11 |1d97k |Mhc I-Ak B Chain (Beta Chain) |

|293 |1ddmb |Numb Associate Kinase |Q9U485 |SW |1 – 11 |1439 - 1449 |1ddma |Numb Protein |

|294 |1de7a |Factor Xiii Activation Peptide (28-37) |GI:182837 |GB |1 - 10 |58-67 |1de7k |Alpha-Thrombin (Heavy Chain) |

|295 |1dkde |12-Mer Peptide |NF00691447 |PIR |1 - 12 |1 - 12 |1dkdb |Groel |

|296 |1dlhc |Hla-Dr1 (Dra, Drb1 0101) Human Class II Histo |Q03909 |SW |1 - 13 |327 - 339 |1dlhb |Hla-Dr1 (Dra, Drb1 0101) Human Class II Hist |

|297 |1dwbi |Alpha-Thrombin (E.C. 3.4.21.5) Complex With ( |GI:2297640 |GB |1 - 11 |384 - 394 |1dwbh |Alpha-Thrombin (E.C. 3.4.21.5) Complex With |

|298 |1ehkc |Ba3-Type Cytochrome-C Oxidase |P82543 |SW |1 - 33 |2 - 34 |1ehkb |Ba3-Type Cytochrome-C Oxidase |

|299 |4er4i |Endothia Aspartic Proteinase (Endothiapepsin) |NF00646473 |PIR |1 - 10 |1 - 10 |4er4e |Endothia Aspartic Proteinase (Endothiapepsin |

|300 |2f58p |Hiv-1 Gp120 |NF00498472 |PIR |1 - 11 |1 - 11 |2f58h |Igg1 Fab 58.2 Antibody (Heavy Chain) |

|301 |3f58p |Cyclic Peptide (Gp120) |NF00528581 |PIR |1 - 10 |1 - 10 |3f58h |Immunoglobulin Gamma I (58.2) |

|302 |1fccc |Immunoglobin Fc (Igg1) Complexed With Protein |NF00155672 |PIR |1 - 57 |47 - 103 |1fcca |Immunoglobin Fc (Igg1) Complexed With Protei |

|303 |1fiwl |Beta-Acrosin Light Chain |Q9GL10 |SW |1 - 22 |18 - 39 |1fiwa |Beta-Acrosin Heavy Chain |

|304 |1fptp |Igg2A Fab Fragment (C3) Complexed With Poliov |Q84865 |SW |1 - 18 |678 - 695 |1fptl |Igg2A Fab Fragment (C3) Complexed With Polio |

|305 |1g0yi |Antagonist Peptide Af10847 |NF00130104 |PIR |1 - 21 |1 - 21 |1g0yr |Interleukin-1 Receptor, Type I |

|306 |1gagb |Bisubstrate Peptide Inhibitor |NF00138001 |PIR |1 - 13 |1 - 13 |1gaga |Insulin Receptor, Tyrosine Kinase Domain |

|307 |1ggip |Igg2A Fab Fragment (50.1) Complex With 16-Res |NF00927547 |PIR |2 - 17 |2 - 17 |1ggim |Igg2A Fab Fragment (50.1) Complex With 16-Re |

|308 |1hagi |Prethrombin2 (E.C. 3.4.21.5) Complexed With H |GI:2297634 |GB | |389 - 398 |1hage |Prethrombin2 (E.C. 3.4.21.5) Complexed With |

|309 |1hhhc |Human Class I Histocompatibility Antigen (Hla |GI:50838420 |GB |1 - 10 |237 - 246 |1hhhb |Human Class I Histocompatibility Antigen (Hl |

|310 |1hleb |Horse Leukocyte Elastase Inhibitor (Hlei) - C |P05619 |SW |1 - 31 |349 - 379 |1hlea |Horse Leukocyte Elastase Inhibitor (Hlei |

|311 |1hqqe |Mp-2 |NF01417697 |PIR |1 - 14 |1 - 14 |1hqqa |Streptavidin |

|312 |1hrti |Alpha-Thrombin (E.C. 3.4.21.5) Complex With H |GI:1568172 |GB |1 - 60 |2 - 61 |1hrth |Alpha-Thrombin (E.C. 3.4.21.5) Complex With |

|313 |1htlc |Heat Labile Enterotoxin (Lt) Mutant With Val|GI:412520 |GB |1 -49 |210 - 258 |1htla |Heat Labile Enterotoxin (Lt) Mutant With Val |

|314 |1htma |Hemagglutinin Ectodomain (Soluble Fragment, T |GI:413463 |GB |1 - 27 |140 - 166 |1htmb |Hemagglutinin Ectodomain (Soluble Fragment, |

|315 |1hxlc |Mp-2 |NF01324821 |PIR |1 - 14 |1 - 14 |1hxlb |Streptavidin |

|316 |1hy2e |Mp-1 |NF01324820 |PIR |1 - 12 |1 - 12 |1hy2d |Streptavidin |

|317 |1i8ic |Epidermal Growth Factor Receptor, Egfrviii Pe |NF00926580 |PIR |1 - 12 |1 - 12 |1i8ib |Epidermal Growth Factor Receptor Antibody Mr |

|318 |2igfp |Igg1 Fab' Fragment (B13I2) Complex With Pepti |P02247 |SW |1 – 30 |68 - 97 |2igfh |Igg1 Fab' Fragment (B13I2) Complex With Pept |

|319 |1jf1c |Decameric Peptide Ligand From The Mart-1/Mela |GI:32260204 |GB |1 - 10 |10 - 19 |1jf1b |Beta-2-Microglobulin |

|320 |1jgdc |Peptide S10R |NF01333403 |PIR |1 - 10 |1 - 10 |1jgdb |Beta-2-Microglobulin |

|321 |1jn5c |Fg-Repeat |Q86XD3 |SW |1 - 10 |1836 - 1847 |1jn5b |Tap |

|322 |1jpfc |Lcmv Peptidic Epitope Gp276 |Q9WA79 |SW |1 - 11 |281 - 291 |1jpfb |Beta-2-Microglobulin |

|323 |1juip |10-Mer Peptide |NF01057159 |PIR |1 - 10 |1 - 10 |1juid |Concanavalin A |

|324 |1jycp |15-Mer Peptide |NF01059711 |PIR |1 - 15 |1 - 15 |1jycd |Concanavalin A |

|325 |1jyip |12-Mer Peptide |NF01059710 |PIR |1 - 12 |1 - 12 |1jyid |Concanavalin A |

|326 |1klqb |Mad2-Binding Peptide |NF00866633 |PIR |1 - 12 |1 - 12 |1klqa |Mitotic Spindle Assembly Checkpoint Protein |

|327 |1klgc |Triosephosphate Isomerase Peptide |NF01019839 |PIR |1 - 15 |1 - 15 |1klgd |Enterotoxin Type C-3 |

|328 |1ktrm |Peptide Linker |GI:668327 |GB |1 - 20 |258 - 277, 263 - 282, 268 - 287 |1ktrh |Anti-His Tag Antibody 3D5 Variable Heavy Cha |

|329 |1l6xb |Minimized B-Domain Of Protein A Z34C |NF00945281 |PIR |1 - 34 |1 - 34 |1l6xa |Immunoglobulin Gamma-1 Heavy Chain Constant |

|330 |1lcjb |Phosphopeptide Epq(Phospho)Yeeipiyl |P03079 |SW |1 - 11 |321 - 331 |1lcja |P56Lck Tyrosine Kinase |

|331 |1lewb |Myocyte-Specific Enhancer Factor 2A |Q03414 |SW |1 - 12 |308 - 319 |1lewa |Mitogen-Activated Protein Kinase 14 |

|332 |1mcvi |Hei-Toe I |NF01188315 |PIR |1 - 28 |1 - 28 |1mcva |Elastase 1 |

|333 |1mdib |Thioredoxin Mutant With Cys 35 Replaced By Al |O13075 |SW |1 - 13 |81 - 93 |1mdia |Thioredoxin Mutant With Cys 35 Replaced By A |

|334 |1mujc |Clip Peptide |NF01197858 |PIR |1 - 36 |1 - 36 |1mujb |H-2 Class II Histocompatibility Antigen, A B |

|335 |1nrnr |Alpha-Thrombin (E.C. 3.4.21.5) Non-Covalently |P56488 |SW |1 - 39 |30 - 68 |1nrnh |Alpha-Thrombin (E.C. 3.4.21.5) Non-Covalentl |

|336 |1ntvb |Apolipoprotein E Receptor-2 Peptide |XP_342878 |GB |1 - 10 |1078 - 1087 |1ntva |Disabled Homolog 1 |

|337 |1om9p |15-Mer Peptide Fragment Of P56 |Q9D8L5 |SW |2 - 16 |2 - 16 |1om9a |ADP-Ribosylation Factor Binding Protein Gga1 |

|338 |1or8b |Substrate Peptide |XP_407876 |GB |1 – 19 |346 - 364 |1or8a |Protein Arginine N-Methyltransferase 1 |

|339 |1orhb |Substrate Peptide |Q7S480 |SW |1 - 10 |466 - 475, 498 - 507 |1orha |Protein Arginine N-Methyltransferase 1 |

|340 |1ou8c |Synthetic Ssra Peptide |NF01422527 |PIR |1 – 11 |1 - 11 |1ou8b |Stringent Starvation Protein B Homolog |

|341 |1ox1b |11-Mer Peptide |NF01756611 |PIR |1 - 11 |1 - 11 |1ox1a |Trypsinogen, Cationic |

|342 |2pldb |Phospholipase C-Gamma-1 (E.C. 3.1.4.11) (C-Te |GI:189730 |GB |1 - 10 |918 - 1029 |2plda |Phospholipase C-Gamma-1 (E.C. 3.1.4.11) (C-T |

|343 |1pnuf |50S Ribosomal Protein L9 |NF01342110 |PIR |1 - 52 |1 - 52 |1pnue |50S Ribosomal Protein L6 |

|344 |1pnu4 |50S Ribosomal Protein L36 |Q9RSK0 |SW |2 - 36 |2 - 36 |1pnu5 |50S Ribosomal Protein L1P |

|345 |1pwwc |Lf20 |NF01571451 |PIR |1 - 20 |1 - 20 |1pwwb |Lethal Factor |

|346 |1qc6c |Phe-Glu-Phe-Pro-Pro-Pro-Pro-Thr-Asp-Glu-Glu |S20887 |PIR |1 - 10 |198-208 |1qc6a |Evh1 Domain From Ena/Vasp-Like Protein |

|347 |1qrja |His Tag |GI:13275534 |GB |1 - 15 |8 - 22 |1qrjb |Htlv-I Capsid Protein |

|348 |1r17c |Fibrinopeptide B |NF01479229 |PIR |1 - 16 |1 - 16 |1r17b |Fibrinogen-Binding Protein Sdrg |

|349 |1rxmb |Consensus Fen-1 Peptide |NF01571458 |PIR |1 - 12 |1 - 12 |1rxma |DNA Polymerase Sliding Clamp |

|350 |1s9vc |Alpha-I Gliadin |NF01683718 |PIR |1 - 11 |1 - 11 |1s9vb |Hla Class II Histocompatibility Antigen, Dq( |

|351 |1scma |Myosin (Regulatory Domain) - Chain A |Q17042 |SW | 1 -60 |780 - 839 |1scmb |Myosin (Regulatory Domain) - Chain B |

|352 |1sdzb |Reaper |Q24475 |SW |1 - 10 |2 - 11 |1sdza |Apoptosis 1 Inhibitor |

|353 |4sgbi |Serine Proteinase B Complex With The Potato I |P01080 |SW |1 -51 |55 - 105 |4sgbe |Serine Proteinase B Complex With The Potato |

|354 |1srnb |Semisynthetic Ribonuclease A (RNase 1-118(Col |GI:387884 |GB |2 -15 |143 - 156 |1srna |Semisynthetic Ribonuclease A (RNase 1-118(Co |

|355 |3srnb |Semisynthetic Ribonuclease A Mutant With Asp|NF00159871 |PIR |1 - 10 |114 - 124 |3srna |Semisynthetic Ribonuclease A Mutant With Asp |

|356 |4srnb |Semisynthetic Ribonuclease A Mutant With Asp|NF00159749 |PIR |1 - 10 |114 - 124 |4srna |Semisynthetic Ribonuclease A Mutant With Asp |

|357 |1ssab |Ribonuclease A (Residues 1 - 118) Complexed W |NF00945476 |PIR |1 - 14 |1 - 14 |1ssaa |Ribonuclease A (Residues 1 - 118) Complexed |

|358 |1sscb |Ribonuclease A (Semisynthetic) Crystallized F |GI:387884 |GB |1 - 10 |146 - 156 |1ssca |Ribonuclease A (Residues 1 - 118) Complexed |

|359 |1tetp |Igg1 Monoclonal Fab Fragment (Te33) Complex W |GI:209556 |GB |1 - 15 |78 - 92 |1teth |Igg1 Monoclonal Fab Fragment (Te33) Complex |

|360 |1thri |Alpha-Thrombin (E.C. 3.4.21.5) Complex With H |GI:490635 |GB |1 - 13 |67 - 79 |1thrh |Alpha-Thrombin (E.C. 3.4.21.5) Complex With |

|361 |1tmcc |Truncated Human Class I Histocompatibility An |Q99MQ0 |SW |1 - 10 |351 - 360 |1tmcb |Truncated Human Class I Histocompatibility A |

|362 |1tmf4 |Theiler's Murine Encephalomyelitis Virus Coat |NF01026525 |PIR |1 - 31 |1 - 31 |1tmf3 |Theiler's Murine Encephalomyelitis Virus Coa |

|363 |1vf5e |Protein Pet L |P83795 |SW |1 - 32 |1 - 32 |1vf5d |Rieske Iron-Sulfur Protein |

|364 |1vf5h |Protein Pet N |P83798 |SW |1 - 29 |1 - 29 |1vf5n |Cytochrome B6 |

|365 |1vppx |Peptide V108 |NF00088534 |PIR |1 -20 |1 - 20 |1vppw |Vascular Endothelial Growth Factor |

|366 |1m0fb |Scaffolding Protein B |NF01151402 |PIR |1 - 60 |1 - 60 |1m0fg |Major Spike Protein G |

|367 |1n6eb |Dqtqkaaaeltff |NF01138087 |PIR |1 - 13 |1 - 13 |1n6ec |Tricorn Protease |

|368 |1p4bp |Gcn4(7P-14P) Peptide |NF01743773 |PIR |1 - 12 |1 - 12 |1p4bh |Antibody Variable Heavy Chain |

|369 |1ow6d |Paxillin |XP_341094 |GB |1 – 13 |284 - 296 |1ow6c |Focal Adhesion Kinase 1 |

|370 |1ow8d |Paxillin |XP_341094 |GB |1 - 13 |163 - 175 |1ow6c |Focal Adhesion Kinase 1 |

|371 |1r5ve |Artificial Peptide |NF01572515 |PIR |1 - 13 |1 - 13 |1r5vb |Mhc H2-Ie-Beta |

|372 |1r5we |Artificial Peptide |NF01676397 |PIR |1 - 13 |1 - 13 |1r5wd |Mhc H2-Ie-Beta |
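Each row of the Appendix A table pairs a residue range in the PDB MoRF chain with the corresponding range in the source database entry (SW, PIR, GB or EMBL). The Python sketch below shows how such a pair could be used to locate a MoRF residue in the database sequence. It assumes that the two numberings differ by a constant offset and that range cells are written as `start - end`. The helper names `parse_range`, `to_database_position` and `lengths_agree` are hypothetical and were not part of the workflow used in this study.

```python
import re

# Matches a residue range such as "1-67", "1 - 10", "1 -11" or "1 – 10" (en dash).
RANGE = re.compile(r"(\d+)\s*[-–]\s*(\d+)")


def parse_range(cell: str) -> tuple[int, int]:
    """Return (start, end) for the first range in a table cell.

    Cells that list several ranges (e.g. "258 - 277, 263 - 282") are reduced
    to their first range; that simplification is an assumption of this sketch.
    """
    m = RANGE.search(cell)
    if m is None:
        raise ValueError(f"no residue range in {cell!r}")
    return int(m.group(1)), int(m.group(2))


def to_database_position(residue: int, pdb_cell: str, db_cell: str) -> int:
    """Map a residue number in the MoRF chain to the database sequence,
    assuming the two numberings differ by a constant offset."""
    pdb_start, pdb_end = parse_range(pdb_cell)
    db_start, _ = parse_range(db_cell)
    if not pdb_start <= residue <= pdb_end:
        raise ValueError(f"residue {residue} lies outside {pdb_cell!r}")
    return db_start + (residue - pdb_start)


def lengths_agree(pdb_cell: str, db_cell: str) -> bool:
    """True when both ranges span the same number of residues."""
    a, b = parse_range(pdb_cell)
    c, d = parse_range(db_cell)
    return b - a == d - c


# Row 205 (1ycqb, p53 bound to Mdm2): MoRF residues 1-11 correspond to 17-27.
print(to_database_position(1, "1-11", "17-27"))   # 17
print(to_database_position(11, "1-11", "17-27"))  # 27
# Row 244: the two ranges differ in length by one residue.
print(lengths_agree("1 – 19", "234 - 253"))       # False
```

A constant-offset mapping cannot hold for rows where `lengths_agree` returns `False`, such as row 244. Those rows would need to be checked against the structure rather than converted mechanically.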

|Dataset stage |July 2004 |October 2005 |

|MoRF chains from PDB SEQRES |2512 |4410 |

|After filtering (chains with fewer than 10 residues, ambiguous amino acids, etc. removed) |1261 |1937 |

|Non-redundant MoRFs |372 |486 |
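The MoRF update table summarises a three-stage pipeline. Chains are first collected from PDB SEQRES records, then chains shorter than 10 residues or containing ambiguous amino acids are removed, and finally a non-redundant set is built. The Python sketch below covers only the filtering stage. It assumes that any letter outside the 20 standard amino acids (for example B, Z, X, U or O) counts as "ambiguous", because the table says only "ambiguous amino acids etc". The chain IDs and sequences in the example are hypothetical.

```python
# The 20 standard amino acids; any other letter (B, Z, X, U, O, ...) is treated
# as ambiguous or non-standard, which is an assumption of this sketch.
STANDARD = frozenset("ACDEFGHIKLMNPQRSTVWY")
MIN_LENGTH = 10  # from the table: chains with fewer than 10 residues are dropped


def keep_chain(seq: str, min_length: int = MIN_LENGTH) -> bool:
    """Keep a SEQRES chain if it is long enough and uses only standard residues."""
    seq = seq.upper()
    return len(seq) >= min_length and set(seq) <= STANDARD


def filter_chains(chains: dict[str, str]) -> dict[str, str]:
    """Apply keep_chain to a mapping of (hypothetical) chain IDs to sequences."""
    return {chain_id: seq for chain_id, seq in chains.items() if keep_chain(seq)}


example = {
    "chain_a": "ACDEFGHIKL",  # 10 standard residues: kept
    "chain_b": "ACDEFGHIK",   # 9 residues: too short
    "chain_c": "ACDEFGHIKX",  # contains X: ambiguous
}
print(sorted(filter_chains(example)))  # ['chain_a']
```

Redundancy removal, the last stage in the table, is not sketched here because the table does not state how it was carried out.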

References

Wright, P. E., and Dyson, H. J. (1999) Intrinsically unstructured proteins: Re-assessing the protein structure-function paradigm, J. Mol. Biol. 293, 321-331.

Uversky VN, Gillespie JR, Fink AL. 2000. Why are "natively unfolded" proteins unstructured under physiologic conditions? Proteins 41: 415-427

Dunker AK, Lawson JD, Brown CJ, Williams RM, Romero P, Oh JS, Oldfield CJ, Campen AM, Ratliff CM, Hipps KW, Ausio J, Nissen MS, Reeves R, Kang C, Kissinger CR, Bailey RW, Griswold MD, Chiu W, Garner EC, Obradovic Z. 2001. Intrinsically disordered protein. J Mol Graph Model 19: 26-59

Dunker AK, Obradovic Z. 2001. The protein trinity--linking function and disorder. Nat Biotechnol 19: 805-806

Demchenko AP. 2001. Recognition between flexible protein molecules: induced and assisted folding. J Mol Recognit 14: 42-61

Namba K. 2001. Roles of partly unfolded conformations in macromolecular self-assembly. Genes Cells 6: 1-12

Dunker AK, Brown CJ, Lawson JD, Iakoucheva LM, Obradovic Z. 2002. Intrinsic disorder and protein function. Biochemistry 41: 6573-6582

Dunker AK, Brown CJ, Obradovic Z. 2002. Identification and functions of usefully disordered proteins. Adv Protein Chem 62: 25-49

Dunker, A. K., Obradovic, Z., Romero, P., Garner, E. C., and Brown, C. J. (2000) Intrinsic protein disorder in complete genomes, Genome Inform. Ser. Workshop Genome Inform. 11, 161-171.

Ward, J.J., Sodhi, J.S., McGuffin, L.J., Buxton, B.F. and Jones, D.T. (2004) Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J. Mol. Biol. 337, 635–645.

Iakoucheva LM, Brown CJ, Lawson JD, Obradovic Z, Dunker AK. 2002. Intrinsic disorder in cell-signaling and cancer-associated proteins. J Mol Biol 323: 573-584

Tompa P. 2002. Intrinsically unstructured proteins. Trends Biochem Sci 27: 527-533

Fink AL. 2005. Natively unfolded proteins. Curr Opin Struct Biol 15: 35-41

Dyson HJ, Wright PE. 2005. Intrinsically unstructured proteins and their functions. Nat Rev Mol Cell Biol 6: 197-208

Dunker A.K., Cortese M.S., Romero P., Iakoucheva L.M., Uversky V.N. (2005) Flexible nets: The roles of intrinsic disorder in protein interaction networks. FEBS Journal. (In press).

Uversky V.N., Oldfield, C., Dunker, A.K. (2005) Showing your ID: Intrinsic disorder as an ID for recognition, regulation and cell signalling. J. Mol. Recognition 18 (5) 343-384.

Uversky VN. 2002. Natively unfolded proteins: a point where biology waits for physics. Protein Sci 11: 739-756

Uversky V.N. (2003) Protein folding revisited. A polypeptide chain at the folding – misfolding – non-folding crossroads: Which way to go? Cell. Mol. Life Sci. 60 (9) 1852-1871.

Oldfield CJ, Cheng Y, Cortese MS, Brown CJ, Uversky VN, Dunker AK. 2005. Comparing and combining predictors of mostly disordered proteins. Biochemistry 44: 1989-2000

Liu, J. and Rost, B. (2001). Comparing function and structure between entire proteomes. Protein Sci 10: 1970-1979

Vucetic S, Brown CJ, Dunker AK, Obradovic Z. 2003. Flavors of protein disorder. Proteins 52: 573-584

Callaghan, A.J., Aurikko, J.P., Ilag, L.L., Gunter Grossmann, J., Chandran, V., Kuhnel, K., Poljak, L., Carpousis, A.J., Robinson, C.V., Symmons, M.F. 2004. Studies of the RNA degradosome-organizing domain of the Escherichia coli ribonuclease RNase E. J. Mol. Biol. 340: 965-979

Linding R, Russell RB, Neduva V, Gibson TJ. 2003. GlobPlot: exploring protein sequences for globularity and disorder. Nucleic Acids Res 31: 3701-3708

Demchenko AP. 2001. Recognition between flexible protein molecules: induced and assisted folding. J Mol Recognit 14: 42-61

Namba K. 2001. Roles of partly unfolded conformations in macromolecular self-assembly. Genes Cells 6: 1-12

Dunker AK, Brown CJ, Lawson JD, Iakoucheva LM, Obradovic Z. 2002. Intrinsic disorder and protein function. Biochemistry 41: 6573-6582

Gunasekaran K, Tsai CJ, Kumar S, Zanuy D, Nussinov R. 2003. Extended disordered proteins: targeting function with less scaffold. Trends Biochem Sci 28: 81-85

Dyson HJ, Wright PE. 2005. Intrinsically unstructured proteins and their functions. Nat Rev Mol Cell Biol 6: 197-208

Dyson HJ, Wright PE. 2002. Coupling of folding and binding for unstructured proteins. Curr Opin Struct Biol 12: 54-60

Vucetic S, Obradovic Z, Vacic V, Radivojac P, Peng K, Iakoucheva LM, Cortese MS, Lawson JD, Brown CJ, Sikes JG, Newton CD, and Dunker AK. 2005. "DisProt: A Database of Protein Disorder." Bioinformatics 21:137-140.

H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne: The Protein Data Bank. Nucleic Acids Research, 28 pp. 235-242 (2000).

B Rost (1999) Twilight zone of protein sequence alignments. Protein Engineering, 12, 85-94

Bairoch A., Apweiler R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Research 28:45-48 (2000).

Cathy H. Wu, Lai-Su L. Yeh, Hongzhan Huang, Leslie Arminski, Jorge Castro-Alvear, Yongxing Chen, Zhang-Zhi Hu, Robert S. Ledley, Panagiotis Kourtesis, Baris E. Suzek, C. R. Vinayaka, Jian Zhang, and Winona C. Barker. The Protein Information Resource. Nucleic Acids Research, 31: 345-347, 2003.

Kabsch W. & Sander C. (1983) Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577-2637.

Rost, Burkhard; Sander, Chris: Prediction of protein structure at better than 70% accuracy. J. Mol. Biol., 1993, Vol. 232, pp. 584-599.

Rost, Burkhard; Sander, Chris: Combining evolutionary information and neural networks to predict protein secondary structure. Proteins, 1994, Vol. 19, pp. 55-72.

Falquet L., Pagni M., Bucher P., Hulo N., Sigrist C.J., Hofmann K., Bairoch A. The PROSITE database, its status in 2002. Nucleic Acids Research 30:235-238 (2002).

Garner,E., Cannon,P., Romero,P., Obradovic,Z. and Dunker,A. (1998) Predicting disordered regions from amino acid sequence: common themes despite differing structural characterization. Genome Inform Ser Workshop Genome Inform, 9, 201–213.

Garner, E., Romero, P., Dunker, A., Brown, C. and Obradovic, Z. (1999) Predicting binding regions within disordered proteins. Genome Inform Ser Workshop Genome Inform, 10, 41–50.

Dyson, H. J. & Wright, P. E. (2001). Nuclear magnetic resonance methods for elucidation of structure and dynamics in disordered states. Methods Enzymol. 339, 258–270.

Fischer E, "Einfluss der Configuration auf die Wirkung der Enzyme" Ber. Dt. Chem. Ges. 27, 2985-2993 (1894).

Koshland D.E. (1958). Application of a theory of enzyme specificity to protein synthesis. Proceedings of the National Academy of Sciences USA, 44(2), 98-104.

Wootton, J. C. (1994) Sequences with “unusual” amino acid compositions, Curr. Opin. Struct. Biol. 4, 413-421.

Kim, T. D., Ryu, H. J., Cho, H. I., Yang, C. H., and Kim, J. (2000) Thermal behavior of proteins: Heat-resistant proteins and their heat-induced secondary structural changes, Biochemistry 39, 14839-14846.

Schweers, O., Schonbrunn-Hanebeck, E., Marx, A., and Mandelkow, E. (1994) Structural studies of tau protein and Alzheimer paired helical filaments show no evidence for β-structure, J. Biol Chem. 269, 24290-24297.

Gast, K., Damaschun, H., Eckert, K., Schulze-Forster, K., Maurer, H. R., Muller-Frohne, M., Zirwer, D., Czarnecki, J., and Damaschun, G. (1995) Prothymosin α: A biologically active protein with random coil conformation, Biochemistry 34, 13211-13218.

Shortle, D. & Ackerman, M. S. (2001). Persistence of native-like topology in a denatured protein in 8 M urea. Science, 293, 487–489.

Shortle, D. (1996) The denatured state (the other half of the folding equation) and its role in protein stability, FASEB J. 10, 27-34.

Tompa, P. (2003) The functional benefits of protein disorder, J. Mol. Struct. 666-667, 361-371.

Shoemaker, B. A., Portman, J. J. & Wolynes, P. G. (2000). Speeding molecular recognition by using the folding funnel: the fly-casting mechanism. Proc. Natl Acad. Sci. USA, 97, 8868–8873.

Zitzewitz, J. A., Ibarra-Molero, B., Fishel, D. R., Terry, K. L. & Matthews, C. R. (2000). Preformed secondary structure drives the association reaction of GCN4-p1, a model coiled-coil system. J. Mol. Biol. 296, 1105–1116.

Hollenbeck, J. J., McClain, D. L. & Oakley, M. G. (2002). The role of helix stabilizing residues in GCN4 basic region folding and DNA binding. Protein Science. 11, 2740–2747.

Li X, Romero P, Rani M, Dunker AK, Obradovic Z: Predicting Protein Disorder for N-, C-, and Internal Regions. Genome Inform Ser Workshop Genome Inform 1999, 10:30-40.

Romero P, Obradovic Z, Li X, Garner EC, Brown CJ, Dunker AK: Sequence complexity of disordered protein. Proteins 2001, 42:38-48.

Romero P, Obradovic Z, Dunker K: Sequence Data Analysis for Long Disordered Regions Prediction in the Calcineurin Family. Genome Inform Ser Workshop Genome Inform 1997, 8:110-124.

Obradovic Z., Peng K., Vucetic S., Radivojac P., Brown C. and Dunker A.K., Predicting intrinsic disorder from amino acid sequence (2003). Proteins 53 (S6); 566-572.

Sreerama, N., and Woody, R.W. (1994) Biochemistry 33, 10022-10025.

Mukhopadhyay, R., and Hoh, J. H. (2001) AFM force measurements on microtubule-associated proteins: The projection domain exerts a long-range repulsive force, FEBS Lett. 505, 374-378.

Sherr C J, Roberts J M. Inhibitors of mammalian G1 cyclin-dependent kinases. Genes Dev. 1995; 9:1149–1163.

Kanamoto T, Mota MA, Takeda K, Rubin LL, Miyazono K, Ichijo H, Bazenet CE: Role of apoptosis signal-regulating kinase in regulation of the c-Jun N-terminal kinase pathway and apoptosis in sympathetic neurons. Mol Cell Biol 2000, 20:196-204.

Sheaff, R.J., Singer, J.D., Swanger, J., Smitherman, M., Roberts, J.M. and Clurman, B.E. (2000) Proteasomal turnover of p21Cip1 does not require p21Cip1 ubiquitination. Mol. Cell 5, 403–410.

David, D.C., Layfield, R., Serpell, L., Narain, Y., Goedert, M. and Spillantini, M.G. (2002) Proteasomal degradation of tau protein. J. Neurochem. 83, 176–185.

Liu, C.W., Corboy, M.J., DeMartino, G.N. and Thomas, P.J. (2003) Endoproteolytic activity of the proteasome. Science 299, 408–411.

Cox, C.J., Dutta, K., Petri, E.T., Hwang, W.C., Lin, Y., Pascal, S.M. and Basavappa, R. (2002) The regions of securin and cyclin B proteins recognized by the ubiquitination machinery are natively unfolded. FEBS Lett. 527, 303–308.

Hinnebusch, A. G., and G. R. Fink. 1983. Positive regulation in the general control of Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 80:5374-5378

Kim, P. S. & Baldwin, R. L. (1982). Specific intermediates in the folding reactions of small proteins and the mechanism of protein folding. Annu. Rev. Biochem. 51, 459–489.

Dafforn, T.R. and Smith, C.J. (2004) Natively unfolded domains in endocytosis: hooks, lines and linkers. EMBO Rep. 5, 1046–1052.

Tompa, P. and Csermely, P. (2004) The role of structural disorder in the function of RNA and protein chaperones. FASEB J. 18, 1169–1175.

Fiser, A., Dosztanyi, Z. & Simon, I. (1997). The role of long-range interactions in defining the secondary structure of proteins is overestimated. Comput. Appl. Biosci. 13, 297–301.

Burley, S.K. and Petsko, G.A. (1985) Aromatic-aromatic interaction: A mechanism of protein structure stabilization. Science 229, 23-28.

G. N. Ramachandran and V. Sasisekharan (1968) Adv. Protein Chem. 23, 283-437.

The NCBI handbook [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; 2002 Oct. Available from

CURRICULUM VITAE

AMRITA MOHAN

ammohan@indiana.edu

EDUCATION

2005 – Present PhD student, Informatics, Indiana University, Bloomington

2003 - 2005 Master of Science in Bioinformatics, Indiana University, IUPUI

1999 - 2003 Bachelor of Info. Technology, University of Delhi, India

RESEARCH/PROFESSIONAL EXPERIENCE

1. May ’05 – Aug ’05 Intern, Rosetta Inpharmatics - Merck, Seattle, WA, USA

2. Aug ’03 – Aug ’05 Research Assistant, Center for Computational Biology & Bioinformatics, IUPUI, IN, USA

3. Jun ’02 – Aug ’02 Intern, Institute of Advanced Biosciences, ‘E-Cell Lab’, Japan

4. Jun ’01 – May ’02 Project Trainee, Center for Biochemical Technology (under Council for Scientific & Industrial Research), New Delhi, India

5. Jul ’98 – Apr ’99 Experimental project, “Gene probe for detection of chronic myelogenous leukemia”, India

6. Jul ’98 – Apr ’99 Experimental project, “Gene expression for breast cancer”, India

POSTERS & RESEARCH PUBLICATIONS

1. Poster Presentation: First Annual Indiana Bioinformatics Conference, Department of Biochemistry & Molecular Biology Poster Session, IUPUI, May 27, 2004.

Amrita Mohan, Predrag Radivojac and Keith Dunker

MoREs: Molecular Recognition Elements

2. Poster Presentation: Research Day, Department of Biochemistry & Molecular Biology Poster Session, IUPUI, September 30, 2005.

Amrita Mohan, Predrag Radivojac and Keith Dunker

MoREs: Molecular Recognition Elements

3. (Publication in preparation)

MoRFs: A dataset of Molecular Recognition Features

Amrita Mohan, Predrag Radivojac and Keith Dunker


Figure 4: (a) Secondary structure distribution of residues in MoRFs

(b) Secondary structure distribution in Monomers

Table 4: PHD secondary structure prediction accuracies for MoREs

Table 6: Order-Disorder statistics for different classes of MoRFs

Appendix B: MoRF Update

372 Non-redundant MoRFs

(Source: PDB Seqres July 2004)

Chart labels (two pie-chart panels): Sheet/Strand 25%, Irregular 42%, Helices 33%; Irregular 48%, Sheet/Strand 12%, Helices 27%, Disorder 13%

Figure 2: Frequency distribution of the number of homologous MoRF sequences for the 372 non-redundant MoRFs [x axis: # of MoRF sequences (# of clusters); y axis: # of homologous members (cluster members)]

Total # of clusters = 372

Appendix A: Molecular Recognition Features (or Elements) and their partners

Figure 9: Disorder distribution in MoRFs using VL-XT & VL3 predictors

Figure 9: Ramachandran Plot for MoRFs

Master’s Thesis Committee

Table 5: Region wise distribution in different structural types of MoRFs

Figure 6: Histogram for region-wise distribution in MoRFs

Figure 4: Length distribution in MoREs dataset.

α-helical MoRF: p53 bound to MDM2

Irregular MoRF (I-MoRF): p53 bound to cyclin A2

Figure 10: VLXT disorder prediction for the p53 protein. Residues 17-27 at the N-terminus bind MDM2, and residues 378-386 at the C-terminus bind cyclin A2. Both regions coincide with visible dips in the VLXT prediction plot. This suggests that such dips, combined with knowledge of known MoRFs, could help locate novel binding sites in other proteins.
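The Figure 10 caption links MoRFs to local dips in a per-residue disorder profile. The Python sketch below shows one way such dips could be located in a list of scores. The 0.5 threshold, the minimum run length `min_len` and the function name `find_dips` are illustrative assumptions and are not parameters of VL-XT or of this study. A simple threshold rule like this one is only a starting point, not a MoRF predictor.

```python
def find_dips(scores: list[float], threshold: float = 0.5,
              min_len: int = 5) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) ranges where the per-residue
    disorder score stays below `threshold` for at least `min_len` residues
    and the run is flanked on both sides by residues scoring >= threshold."""
    dips = []
    start = None  # 0-based index where the current low-scoring run began
    for i, score in enumerate(scores):
        if score < threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run ended at i - 1. Require a disordered left flank
            # (start > 0) and a minimum length; residue i is the right flank.
            if start > 0 and i - start >= min_len:
                dips.append((start + 1, i))
            start = None
    # A run still open at the end of the sequence has no right flank: ignored.
    return dips


# Toy profile: disordered, then a 6-residue dip, then disordered again.
profile = [0.9] * 5 + [0.2] * 6 + [0.8] * 4
print(find_dips(profile))  # [(6, 11)]
```

Requiring disordered residues on both sides restricts the output to dips inside a disordered context, which is the pattern the caption describes for the two p53 binding regions.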

Disorder 0%
