Pokotylo, Oleksii (2016). Depth- and Potential-Based Supervised Learning. PhD thesis, Universität zu Köln.

Oleksii_Pokotylo_PhD_thesis.pdf
Made available under the CC license: Creative Commons Attribution Non-commercial Share Alike.

Download (22MB)

Abstract

The task of supervised learning is to define a data-based rule by which new objects are assigned to one of the classes. For this purpose, a training data set is used that contains objects with known class membership. In this thesis, two procedures for supervised classification are introduced.

The first procedure is based on potential functions. The potential of a class is defined as a kernel density estimate multiplied by the class's prior probability. The method transforms the data to a potential-potential (pot-pot) plot, where each data point is mapped to a vector of potentials, similarly to the DD-plot. Separation of the classes, as well as classification of new data points, is performed on this plot. Thus, the bias in kernel density estimates due to insufficiently adapted multivariate kernels is compensated by a flexible classifier on the pot-pot plot. The proposed method has been implemented in the R package ddalpha, software that aims to combine the user's experience with recent theoretical and computational achievements in the area of data depth and depth-based classification. It implements various depth functions and classifiers for multivariate and functional data under one roof. The package can be extended with user-defined depth methods and separators.

The second classification procedure focuses on the centers of the classes and is based on data depth. The classifier adds a depth term to the objective function of the Bayes classifier, so that the cost of misclassifying a point depends not only on its class membership but also on its centrality in that class. Classification of more central points is enforced, while outliers are underweighted. The proposed objective function may also be used to evaluate the performance of other classifiers instead of the usual average misclassification rate.

The thesis also contains a new algorithm for the exact calculation of the Oja median. It modifies the algorithm of Ronkainen, Oja and Orponen (2003) by employing bounded regions which contain the median. The new algorithm is faster and has lower complexity than the previous one. It has been implemented as part of the R package OjaNP.
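
The pot-pot mapping described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation and not the ddalpha API: the function names, the isotropic Gaussian kernel, the single fixed bandwidth `h`, and the toy data are all assumptions made for illustration. In place of the flexible classifier on the pot-pot plot, the sketch uses the simplest possible separator, which assigns a point to the class with the largest potential.

```python
import math

def gaussian_kernel(u, h):
    """Isotropic Gaussian kernel with bandwidth h (an assumed kernel choice)."""
    d = len(u)
    sq_norm = sum(c * c for c in u)
    return math.exp(-sq_norm / (2 * h * h)) / ((2 * math.pi) ** (d / 2) * h ** d)

def potential(x, class_points, prior, h):
    """Potential of a class at x: prior probability times kernel density estimate."""
    kde = sum(
        gaussian_kernel([a - b for a, b in zip(x, p)], h) for p in class_points
    ) / len(class_points)
    return prior * kde

def pot_pot(x, classes, h=1.0):
    """Map x to its vector of potentials, one coordinate per class."""
    n_total = sum(len(pts) for pts in classes)
    # Priors are estimated by relative class frequencies (an assumption).
    return [potential(x, pts, len(pts) / n_total, h) for pts in classes]

def classify(x, classes, h=1.0):
    """Stand-in separator: pick the class with the largest potential."""
    pots = pot_pot(x, classes, h)
    return max(range(len(pots)), key=pots.__getitem__)

# Hypothetical toy training data: two well-separated classes in the plane.
class_a = [(0.0, 0.0), (0.5, 0.2), (-0.3, 0.4)]
class_b = [(3.0, 3.0), (3.4, 2.7), (2.8, 3.3)]

print(pot_pot((0.1, 0.1), [class_a, class_b]))
print(classify((0.1, 0.1), [class_a, class_b]))
print(classify((3.0, 3.1), [class_a, class_b]))
```

With two classes, the maximum-potential rule corresponds to separating the pot-pot plot along its diagonal. The approach in the abstract replaces that fixed line with a flexible separator fitted on the plot.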

Item Type: Thesis (PhD thesis)
Creators: Pokotylo, Oleksii (Email: UNSPECIFIED; ORCID: UNSPECIFIED; ORCID Put Code: UNSPECIFIED)
Corporate Creators: Cologne Graduate School in Management, Economics and Social Sciences
URN: urn:nbn:de:hbz:38-71088
Date: 17 October 2016
Language: English
Faculty: Faculty of Management, Economics and Social Sciences
Divisions: Faculty of Management, Economics and Social Sciences > Economics > Econometrics and Statistics > Professorship for Statistics and Econometrics
Subjects: Data processing Computer science; General statistics
Uncontrolled Keywords (English): supervised classification, data depth, kernel density estimates, bandwidth choice, potential functions, DD-plot, ddalpha, Oja median, OjaNP
Date of oral exam: 17 October 2016
Referees: Mosler, Karl (Prof. Dr.); Breitung, Jörg (Prof. Dr.)
Funders: Cologne Graduate School in Management, Economics and Social Sciences
Refereed: Yes
URI: http://kups.ub.uni-koeln.de/id/eprint/7108
