Universität zu Köln

Depth- and Potential-Based Supervised Learning

Pokotylo, Oleksii (2016) Depth- and Potential-Based Supervised Learning. PhD thesis, Universität zu Köln.

Available under License Creative Commons Attribution Non-commercial Share Alike.


    Abstract

    The task of supervised learning is to define a data-based rule by which new objects are assigned to one of several classes. For this purpose, a training data set is used that contains objects with known class membership. In this thesis, two procedures for supervised classification are introduced.

    The first procedure is based on potential functions. The potential of a class is defined as a kernel density estimate multiplied by the class's prior probability. The method transforms the data to a potential-potential (pot-pot) plot, where each data point is mapped to a vector of potentials, similarly to the DD-plot. Separation of the classes, as well as classification of new data points, is performed on this plot. In this way, the bias in kernel density estimates that arises from insufficiently adapted multivariate kernels is compensated by a flexible classifier on the pot-pot plot. The proposed method has been implemented in the R-package ddalpha, a software package that aims to combine the experience of the applied user with recent theoretical and computational achievements in the area of data depth and depth-based classification. It implements various depth functions and classifiers for multivariate and functional data under one roof. The package can be extended with user-defined depth methods and separators.

    The second classification procedure focuses on the centers of the classes and is based on data depth. The classifier adds a depth term to the objective function of the Bayes classifier, so that the cost of misclassifying a point depends not only on its class membership but also on its centrality in that class. Correct classification of more central points is enforced, while outliers are downweighted. The proposed objective function may also be used to evaluate the performance of other classifiers, in place of the usual average misclassification rate.

    The thesis also contains a new algorithm for the exact calculation of the Oja median. It modifies the algorithm of Ronkainen, Oja and Orponen (2003) by employing bounded regions which contain the median. The new algorithm is faster and has lower complexity than the previous one. It has been implemented as part of the R-package OjaNP.
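
    To illustrate the pot-pot transform described in the abstract, the following is a minimal sketch in Python with NumPy. It is not the ddalpha implementation or its API. All function names are hypothetical, the isotropic Gaussian kernel and the fixed bandwidth of 0.5 are assumptions, priors are estimated as class proportions, and the "largest potential wins" rule is only a simple stand-in for the flexible separator that the thesis fits on the pot-pot plot.

```python
import numpy as np


def gaussian_kde(points, sample, bandwidth):
    """Isotropic Gaussian kernel density estimate of `sample`, evaluated at `points`."""
    points = np.atleast_2d(points)
    n, d = sample.shape
    # Pairwise squared distances between evaluation points and sample points.
    diff = points[:, None, :] - sample[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    # Normalising constant of a d-dimensional Gaussian with covariance h^2 * I.
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2)).sum(axis=1) / (n * norm)


def potentials(points, classes, bandwidth=0.5):
    """Map each point to its vector of potentials (prior * KDE), one entry per class."""
    total = sum(len(c) for c in classes)
    # Assumption: the prior of a class is its share of the training data.
    cols = [len(c) / total * gaussian_kde(points, c, bandwidth) for c in classes]
    return np.column_stack(cols)


def classify(points, classes, bandwidth=0.5):
    """Stand-in separator: assign each point to the class with the largest potential."""
    return np.argmax(potentials(points, classes, bandwidth), axis=1)


# Hypothetical training data: two well-separated Gaussian classes in the plane.
rng = np.random.default_rng(0)
class_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
class_b = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(100, 2))

new_points = np.array([[0.0, 0.0], [4.0, 4.0]])
pot_pot = potentials(new_points, [class_a, class_b])  # coordinates on the pot-pot plot
labels = classify(new_points, [class_a, class_b])     # class index per new point
```

    Each row of `pot_pot` is one point's position on the pot-pot plot. Replacing `classify` with a more flexible rule fitted to these coordinates is where, according to the abstract, the bias of the kernel estimates is compensated.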

    Item Type: Thesis (PhD thesis)
    Creators: Pokotylo, Oleksii
    Corporate Creators: Cologne Graduate School in Management, Economics and Social Sciences
    URN: urn:nbn:de:hbz:38-71088
    Subjects: Data processing, Computer science; General statistics
    Uncontrolled Keywords (English): supervised classification, data depth, kernel density estimates, bandwidth choice, potential functions, DD-plot, ddalpha, Oja median, OjaNP
    Faculty: Wirtschafts- u. Sozialwissenschaftliche Fakultät
    Divisions: Wirtschafts- u. Sozialwissenschaftliche Fakultät > Institut für Ökonometrie und Statistik
    Language: English
    Date: 17 October 2016
    Date Type: Publication
    Date of oral exam: 17 October 2016
    Full Text Status: Public
    Date Deposited: 24 Jan 2017 15:48:12
    Referees:
    Mosler, Karl (Prof. Dr.)
    Breitung, Jörg (Prof. Dr.)
    URI: http://kups.ub.uni-koeln.de/id/eprint/7108
