Bolten, Eva, Schliep, Alexander, Schneckener, Sebastian, Schomburg, Dietmar and Schrader, Rainer (2001). Clustering Protein Sequences - Structure Prediction by transitive homology. Bioinformatics, 17 (10). pp. 935-941. Oxford University Press.

zaik2000-383.pdf - Draft Version


Abstract

It is widely believed that, for two proteins A and B, sequence identity above some threshold implies structural similarity. It is not fully understood whether transitivity holds: that is, when the sequence similarity between A and B is below this threshold, whether the existence of a third protein with sufficiently high sequence similarity to both A and B suffices to infer structural similarity of A and B. We examined the protein sequences in the SwissProt database and determined their similarity using the Smith-Waterman algorithm. These data were transformed into a directed graph in which protein sequences constitute the vertices. A directed edge was drawn from vertex A to vertex B if the sequences A and B showed similarity above a fixed threshold. A length-dependent scaling of the alignment scores gives us a criterion for avoiding clustering errors due to multi-domain proteins. To deal with the resulting large graphs, we have developed a very efficient library. Methods include both a novel graph-based clustering algorithm capable of handling multi-domain proteins and cluster comparison algorithms. The parameters of these algorithms were fine-tuned using SCOP as a test set. We present our algorithmic advances, which yield a 24 percent improvement over pair-wise comparisons, statistics of the clusterings obtained, and general methodology relevant for testing our hypothesis.
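
The graph construction and the transitivity question described in the abstract can be illustrated with a minimal sketch. It is not the authors' library or clustering algorithm. The function names, the score dictionary and the threshold value are hypothetical. The scores are assumed to be alignment scores that have already been scaled, since the abstract does not give the scaling formula.

```python
from collections import defaultdict


def build_graph(scores, threshold):
    """Directed graph with an edge a -> b when scores[(a, b)] exceeds threshold.

    `scores` maps ordered pairs of sequence identifiers to (already scaled)
    alignment scores; both the mapping and the threshold are illustrative.
    """
    graph = defaultdict(set)
    for (a, b), score in scores.items():
        if a != b and score > threshold:
            graph[a].add(b)
    return graph


def transitive_candidates(graph):
    """Pairs (a, b) lacking a direct edge a -> b but linked by a path a -> c -> b.

    Returns a dict mapping each such pair to the set of intermediates c.
    """
    candidates = {}
    for a, mids in graph.items():
        for c in mids:
            for b in graph.get(c, ()):
                if b != a and b not in mids:
                    candidates.setdefault((a, b), set()).add(c)
    return candidates


# Toy data: A and B are each similar to C, but not directly to each other.
scores = {
    ("A", "C"): 120.0, ("C", "A"): 118.0,
    ("C", "B"): 95.0, ("B", "C"): 97.0,
    ("A", "B"): 30.0, ("B", "A"): 28.0,
}
graph = build_graph(scores, threshold=80.0)
candidates = transitive_candidates(graph)
```

In this toy example the pairs (A, B) and (B, A) are reported with C as the intermediate. Whether such a pair really shares structure is the hypothesis the paper tests against SCOP; the sketch only enumerates the candidates.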

Item Type: Journal Article
Creators:
Creators                 Email        ORCID        ORCID Put Code
Bolten, Eva              UNSPECIFIED  UNSPECIFIED  UNSPECIFIED
Schliep, Alexander       UNSPECIFIED  UNSPECIFIED  UNSPECIFIED
Schneckener, Sebastian   UNSPECIFIED  UNSPECIFIED  UNSPECIFIED
Schomburg, Dietmar       UNSPECIFIED  UNSPECIFIED  UNSPECIFIED
Schrader, Rainer         UNSPECIFIED  UNSPECIFIED  UNSPECIFIED
URN: urn:nbn:de:hbz:38-548539
Journal or Publication Title: Bioinformatics
Volume: 17
Number: 10
Page Range: pp. 935-941
Date: 2001
Publisher: Oxford University Press
Language: English
Faculty: Faculty of Mathematics and Natural Sciences
Divisions: Faculty of Mathematics and Natural Sciences > Department of Mathematics and Computer Science > Institute of Computer Science
Subjects: Data processing Computer science
Refereed: No
URI: http://kups.ub.uni-koeln.de/id/eprint/54853
