Bolten, Eva, Schliep, Alexander, Schneckener, Sebastian, Schomburg, Dietmar and Schrader, Rainer
(2001).
Clustering Protein Sequences - Structure Prediction by transitive homology.
Bioinformatics, 17 (10).
pp. 935-941.
Oxford University Press.
PDF: zaik2000-383.pdf - Draft Version (151kB)
Abstract
It is widely believed that for two proteins A and B, a sequence identity above some threshold implies structural similarity. It is not fully understood whether, when the sequence similarity between A and B is below this threshold, the existence of a third protein with sufficiently high sequence similarity to both A and B suffices for inferring structural similarity of A and B, that is, whether transitivity holds. We examined the protein sequences in the SwissProt database. Their similarity was determined using the Smith-Waterman algorithm. These data were transformed into a directed graph in which protein sequences constitute vertices. A directed edge was drawn from vertex A to vertex B if the sequences A and B showed similarity above a fixed threshold. A length-dependent scaling of the alignment scores gives us a criterion for avoiding clustering errors due to multi-domain proteins. To deal with the resulting large graphs we have developed a very efficient library. Methods include both a novel graph-based clustering algorithm capable of handling multi-domain proteins and cluster comparison algorithms. The parameters of the above algorithms were fine-tuned using SCOP as a test set. We will present our algorithmic advances, which yield a 24 percent improvement over pair-wise comparisons, statistics of the clusterings obtained, and general methodology relevant for testing our hypothesis.
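The graph construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' library or clustering algorithm: the function names, the toy scores, and in particular the scaling rule (raw score divided by the length of the source sequence) are assumptions made for illustration, since the abstract does not specify the length-dependent scaling. The sketch builds a directed graph from pairwise scores and lists pairs that are linked only through an intermediate sequence, which are the candidates for transitive homology.

```python
from collections import defaultdict


def build_similarity_graph(scores, lengths, threshold):
    """Build a directed graph as a dict of adjacency sets.

    scores:  dict mapping (a, b) -> raw pairwise alignment score
    lengths: dict mapping sequence id -> sequence length
    An edge src -> dst is drawn when score / lengths[src] >= threshold.
    NOTE: this scaling is an assumed stand-in, not the paper's formula.
    """
    graph = defaultdict(set)
    for (a, b), score in scores.items():
        # One raw score per pair, but each direction is scaled separately.
        for src, dst in ((a, b), (b, a)):
            if score / lengths[src] >= threshold:
                graph[src].add(dst)
    return graph


def transitive_candidates(graph):
    """Return pairs (a, c) with a -> b -> c for some b, but no edge a -> c."""
    found = set()
    for a, neighbours in list(graph.items()):
        for b in neighbours:
            for c in graph.get(b, ()):
                if c != a and c not in neighbours:
                    found.add((a, c))
    return found


# Hypothetical toy data: D is a long (multi-domain-like) sequence.
lengths = {"A": 100, "B": 100, "C": 100, "D": 400}
scores = {("A", "B"): 80, ("B", "C"): 80, ("A", "C"): 20, ("B", "D"): 80}

graph = build_similarity_graph(scores, lengths, threshold=0.5)
candidates = transitive_candidates(graph)
print(sorted(candidates))
```

In this toy data A and C are not directly connected but both connect to B, so (A, C) is a transitive candidate. Because each direction is scaled by its own source length, B points to the long sequence D while D does not point back, which is one way a directed graph can keep a multi-domain protein from merging unrelated clusters.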
| Item Type: | Article |
| Creators: | Bolten, Eva; Schliep, Alexander; Schneckener, Sebastian; Schomburg, Dietmar; Schrader, Rainer |
| URN: | urn:nbn:de:hbz:38-548539 |
| Journal or Publication Title: | Bioinformatics |
| Volume: | 17 |
| Number: | 10 |
| Page Range: | pp. 935-941 |
| Date: | 2001 |
| Publisher: | Oxford University Press |
| Language: | English |
| Faculty: | Faculty of Mathematics and Natural Sciences |
| Divisions: | Faculty of Mathematics and Natural Sciences > Department of Mathematics and Computer Science > Institute of Computer Science |
| Subjects: | Data processing Computer science |
| Refereed: | No |
| URI: | http://kups.ub.uni-koeln.de/id/eprint/54853 |