Pokotylo, Oleksii (2016). Depth- and Potential-Based Supervised Learning. PhD thesis, Universität zu Köln.
PDF: Oleksii_Pokotylo_PhD_thesis.pdf (22 MB). Provided under the CC license: Creative Commons Attribution Non-commercial Share Alike.
Abstract
The task of supervised learning is to define a data-based rule by which new objects are assigned to one of the classes. For this, a training data set containing objects with known class membership is used. In this thesis, two procedures for supervised classification are introduced. The first procedure is based on potential functions. The potential of a class is defined as a kernel density estimate multiplied by the class's prior probability. The method transforms the data to a potential-potential (pot-pot) plot, where each data point is mapped to the vector of its potentials, similarly to the DD-plot. Both the separation of the classes and the classification of new data points are performed on this plot, so that the bias in the kernel density estimates caused by insufficiently adapted multivariate kernels is compensated by a flexible classifier on the pot-pot plot. The proposed method has been implemented in the R package ddalpha, software designed to combine the experience of the user with recent theoretical and computational achievements in the area of data depth and depth-based classification. It implements various depth functions and classifiers for multivariate and functional data under one roof. The package can be extended with user-defined depth methods and separators. The second classification procedure focuses on the centers of the classes and is based on data depth. The classifier adds a depth term to the objective function of the Bayes classifier, so that the cost of misclassifying a point depends not only on its membership in a class but also on its centrality within that class. Correct classification of more central points is thus enforced, while outliers are underweighted. The proposed objective function may also be used instead of the usual average misclassification rate to evaluate the performance of other classifiers. The thesis also contains a new algorithm for the exact calculation of the Oja median.
It modifies the algorithm of Ronkainen, Oja and Orponen (2003) by employing bounded regions that contain the median. The new algorithm is faster and has lower computational complexity than the previous one. It has been implemented as part of the R package OjaNP.
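The pot-pot transformation described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the ddalpha implementation: it assumes an isotropic Gaussian kernel with a hypothetical common bandwidth `h`, takes empirical class frequencies as the prior probabilities, and, in place of the flexible classifiers used in the thesis, fits only a line through the origin of a two-class pot-pot plot, parametrised on the log scale to avoid numerical underflow. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Classes = Sequence[Sequence[Point]]


def log_kde(x: Point, sample: Sequence[Point], h: float) -> float:
    """Log of a Gaussian kernel density estimate at x with isotropic bandwidth h."""
    d = len(x)
    exponents = [
        -sum((xi - si) ** 2 for xi, si in zip(x, s)) / (2.0 * h * h) for s in sample
    ]
    m = max(exponents)  # log-sum-exp: avoids underflow for points far from the sample
    log_sum = m + math.log(sum(math.exp(e - m) for e in exponents))
    log_norm = math.log(len(sample)) + d * math.log(h) + 0.5 * d * math.log(2.0 * math.pi)
    return log_sum - log_norm


def log_potentials(x: Point, classes: Classes, h: float) -> List[float]:
    """Pot-pot coordinates of x on the log scale: log(prior_j) + log(KDE_j(x)).

    Assumption: the priors are the empirical class frequencies.
    """
    n = sum(len(c) for c in classes)
    return [math.log(len(c) / n) + log_kde(x, c, h) for c in classes]


def fit_threshold(classes: Classes, h: float) -> float:
    """Fit the line p_0 = exp(t) * p_1 through the origin of a two-class pot-pot plot.

    Candidates for t are 0 (the maximum-potential rule) and the midpoints between
    sorted training log-ratios (resubstitution, no leave-one-out). The candidate
    with the fewest training errors wins; ties go to the threshold closest to 0.
    """
    scored = []  # (log p_0 - log p_1, true label) for every training point
    for label, c in enumerate(classes):
        for x in c:
            lp = log_potentials(x, classes, h)
            scored.append((lp[0] - lp[1], label))
    ratios = sorted(r for r, _ in scored)
    candidates = [0.0] + [(a + b) / 2.0 for a, b in zip(ratios, ratios[1:])]

    def training_errors(t: float) -> int:
        return sum((0 if r > t else 1) != y for r, y in scored)

    return min(candidates, key=lambda t: (training_errors(t), abs(t)))


def classify(x: Point, classes: Classes, h: float, t: float) -> int:
    """Class 0 if log p_0 - log p_1 > t, i.e. p_0 > exp(t) * p_1; otherwise class 1."""
    lp = log_potentials(x, classes, h)
    return 0 if lp[0] - lp[1] > t else 1


# Usage with two well-separated toy classes and a hypothetical bandwidth h = 0.7.
class0 = [(0.0, 0.0), (0.5, 0.2), (-0.3, 0.4), (0.2, -0.5)]
class1 = [(3.0, 3.0), (3.4, 2.8), (2.7, 3.3), (3.1, 3.6)]
t = fit_threshold([class0, class1], h=0.7)
print(classify((0.1, 0.1), [class0, class1], 0.7, t))  # 0
print(classify((3.0, 3.1), [class0, class1], 0.7, t))  # 1
```

With `t = 0` the rule reduces to assigning a point to the class of maximal potential; letting the data choose `t` is a one-parameter version of the idea that a classifier on the pot-pot plot can correct for biased kernel density estimates.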
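The idea of a misclassification cost that grows with a point's centrality in its class can be illustrated with a depth-weighted error rate. The abstract does not state the exact form of the objective, so the per-point weight `1 + lam * depth` below is a hypothetical choice, and spatial depth is used only as one convenient depth function; both are assumptions made for illustration, and the function names are hypothetical. With `lam = 0` the score reduces to the usual average misclassification rate.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def spatial_depth(x: Point, sample: Sequence[Point]) -> float:
    """Spatial depth: 1 - || mean of unit vectors (x - X_i) / ||x - X_i|| ||, in [0, 1]."""
    acc = [0.0] * len(x)
    for s in sample:
        diff = [xi - si for xi, si in zip(x, s)]
        norm = math.sqrt(sum(v * v for v in diff))
        if norm > 0.0:  # a sample point equal to x contributes a zero vector
            for k, v in enumerate(diff):
                acc[k] += v / norm
    return 1.0 - math.sqrt(sum(a * a for a in acc)) / len(sample)


def depth_weighted_error(
    points: Sequence[Point],
    labels: Sequence[int],
    preds: Sequence[int],
    classes: Sequence[Sequence[Point]],
    lam: float = 1.0,
) -> float:
    """Share of the total cost lost to misclassified points.

    Hypothetical cost form: each point costs 1 + lam * depth(x | true class), so
    misclassifying a central point is penalised more than misclassifying an outlier.
    """
    total = wrong = 0.0
    for x, y, p in zip(points, labels, preds):
        w = 1.0 + lam * spatial_depth(x, classes[y])
        total += w
        if p != y:
            wrong += w
    return wrong / total


# Usage: the same number of errors, but on a central point vs. on an outlier.
class0 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (5.0, 5.0)]
class1 = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
labels = [0] * len(class0)
miss_center = [1, 0, 0, 0, 0, 0]   # the central point (0, 0) is misclassified
miss_outlier = [0, 0, 0, 0, 0, 1]  # the outlier (5, 5) is misclassified
e_center = depth_weighted_error(class0, labels, miss_center, [class0, class1])
e_outlier = depth_weighted_error(class0, labels, miss_outlier, [class0, class1])
print(e_center > e_outlier)  # True
```

Both predictions have the same plain misclassification rate of 1/6, but the depth-weighted score separates them, which is the kind of distinction the abstract suggests using when comparing classifiers.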
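For bivariate data, the Oja median minimises the sum of the areas of the triangles formed by a candidate point and every pair of observations. This objective is convex and piecewise linear, and its pieces are bounded by the lines through pairs of data points, so a minimiser can be found among the intersection points of those lines. The sketch below is a deliberately naive brute-force search over all such intersections, meant only to show what an exact calculation involves. It is neither the algorithm of Ronkainen, Oja and Orponen (2003) nor the bounded-region method of the thesis, it assumes the data are not all collinear, and its function names are hypothetical.

```python
import itertools
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Line = Tuple[float, float, float]  # coefficients (a, b, c) of a*x + b*y = c


def oja_objective(m: Point, data: Sequence[Point]) -> float:
    """Sum of the areas of the triangles (m, X_i, X_j) over all pairs i < j."""
    mx, my = m
    total = 0.0
    for (xi, yi), (xj, yj) in itertools.combinations(data, 2):
        total += 0.5 * abs((xi - mx) * (yj - my) - (xj - mx) * (yi - my))
    return total


def line_through(p: Point, q: Point) -> Line:
    """Line a*x + b*y = c passing through the data points p and q."""
    a = q[1] - p[1]
    b = p[0] - q[0]
    return a, b, a * p[0] + b * p[1]


def intersect(l1: Line, l2: Line, eps: float = 1e-12) -> Optional[Point]:
    """Intersection of two lines by Cramer's rule, or None if they are parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if abs(det) < eps:
        return None
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det


def oja_median_bruteforce(data: Sequence[Point]) -> Point:
    """Naive exact bivariate Oja median: the best vertex of the line arrangement."""
    lines = [line_through(p, q) for p, q in itertools.combinations(data, 2)]
    candidates: List[Point] = []
    for l1, l2 in itertools.combinations(lines, 2):
        pt = intersect(l1, l2)
        if pt is not None:
            candidates.append(pt)
    if not candidates:
        raise ValueError("all data points are collinear")
    return min(candidates, key=lambda m: oja_objective(m, data))


# Usage: for the corners of a square the Oja median is its centre.
square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(oja_median_bruteforce(square))  # (1.0, 1.0)
```

This search visits O(n^4) candidate points and evaluates an O(n^2) objective at each, which illustrates why restricting the search to regions known to contain the median pays off. When the minimiser is not unique, the sketch simply returns one vertex of the minimising set.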
Item Type: Thesis (PhD thesis)
Corporate Creators: Cologne Graduate School in Management, Economics and Social Sciences
URN: urn:nbn:de:hbz:38-71088
Date: 17 October 2016
Language: English
Faculty: Faculty of Management, Economics and Social Sciences
Divisions: Faculty of Management, Economics and Social Sciences > Economics > Econometrics and Statistics > Professorship for Statistics and Econometrics
Subjects: Data processing, Computer science; General statistics
Date of oral exam: 17 October 2016
Funders: Cologne Graduate School in Management, Economics and Social Sciences
Refereed: Yes
URI: http://kups.ub.uni-koeln.de/id/eprint/7108