Bolten, Eva, Schliep, Alexander, Schneckener, Sebastian, Schomburg, Dietmar and Schrader, Rainer (2001). Clustering Protein Sequences - Structure Prediction by transitive homology. Bioinformatics, 17 (10). pp. 935-941. Oxford University Press.

Full text: zaik2000-383.pdf - Draft Version (PDF, 151kB)

Abstract

It is widely believed that for two proteins A and B a sequence identity above some threshold implies structural similarity. It is not fully understood whether, when the sequence similarity between A and B is below this threshold, the existence of a third protein with sufficiently high sequence similarity to both A and B suffices for inferring structural similarity of A and B, that is, whether transitivity holds. We examined the protein sequences in the SwissProt database. Their similarity was determined using the Smith-Waterman algorithm. These data were transformed into a directed graph in which protein sequences constitute vertices. A directed edge was drawn from vertex A to vertex B if the sequences A and B showed similarity above a fixed threshold. A length-dependent scaling of the alignment scores gives us a criterion to avoid clustering errors due to multi-domain proteins. To deal with the resulting large graphs we have developed a very efficient library. Methods include both a novel graph-based clustering algorithm capable of handling multi-domain proteins and cluster comparison algorithms. The parameters of these algorithms were fine-tuned using SCOP as a test set. We present our algorithmic advances, which yield a 24 percent improvement over pair-wise comparisons, statistics of the clusterings obtained, and general methodology relevant for testing our hypothesis.
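
The graph construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' library or their clustering algorithm. The scaling rule (raw score divided by the length of the source sequence), the threshold value, the toy scores, and the names build_graph and reachable are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import deque

def build_graph(lengths, scores, threshold):
    """Build a directed similarity graph.

    lengths: dict mapping sequence name -> sequence length
    scores:  dict mapping an unordered pair (a, b) -> raw alignment score
    An edge src -> dst is added when the scaled score reaches the threshold.
    ASSUMPTION: the scaling is raw_score / len(src); the paper's actual
    length-dependent scaling is not given in the abstract.
    """
    graph = {name: set() for name in lengths}
    for (a, b), raw in scores.items():
        for src, dst in ((a, b), (b, a)):
            if raw / lengths[src] >= threshold:
                graph[src].add(dst)
    return graph

def reachable(graph, start, max_steps):
    """Return vertices reachable from start via at most max_steps edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_steps:
            continue
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    seen.discard(start)
    return seen

# Toy data (hypothetical): D stands in for a long multi-domain protein.
lengths = {"A": 100, "B": 100, "C": 100, "D": 400}
scores = {("A", "C"): 60, ("C", "B"): 60, ("A", "B"): 20, ("A", "D"): 60}
graph = build_graph(lengths, scores, threshold=0.5)

direct = reachable(graph, "A", 1)      # pair-wise neighbours of A
transitive = reachable(graph, "A", 2)  # allows one intermediate protein
```

In this toy example B is not a direct neighbour of A but becomes reachable through the intermediate C, which is the transitivity question the abstract poses. Under the assumed scaling the edge between A and D exists only in the direction A to D, so nothing is reachable from D; this shows one way a directed, length-scaled graph could keep a multi-domain protein from merging unrelated clusters.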

Item Type: Article
Creators: Bolten, Eva; Schliep, Alexander; Schneckener, Sebastian; Schomburg, Dietmar; Schrader, Rainer (Email, ORCID and ORCID Put Code: UNSPECIFIED for all creators)
URN: urn:nbn:de:hbz:38-548539
Journal or Publication Title: Bioinformatics
Volume: 17
Number: 10
Page Range: pp. 935-941
Date: 2001
Publisher: Oxford University Press
Language: English
Faculty: Faculty of Mathematics and Natural Sciences
Divisions: Faculty of Mathematics and Natural Sciences > Department of Mathematics and Computer Science > Institute of Computer Science
Subjects: Data processing Computer science
Refereed: No
URI: http://kups.ub.uni-koeln.de/id/eprint/54853
