Kernel Methods for Classification with Irregularly Sampled and Contaminated Data.

Designing a classifier consists of two stages: feature extraction and classifier learning. For better performance, the nature, characteristics, or underlying structure of the data should be taken into account in either stage. In this thesis, we present kernel methods for classification with irregularly sampled and contaminated data.

First, we propose a feature extraction method for irregularly sampled data. Irregularly sampled data often arise in medical applications, where the vital signs of patients are monitored based on the severity of their condition and the availability of nursing staff. In particular, we consider an ICU (intensive care unit) admission prediction problem for a post-operative patient with possible sepsis. The experimental results show that the proposed features, when paired with kernel methods, have more discriminating power than those used by clinicians.

Second, we consider the one-class classification problem with contaminated data, where the majority of the data comes from a "nominal" distribution and a small fraction comes from an outlying distribution. We deal with this problem by robustly estimating the nominal density (or a level set thereof) from the contaminated data. Our proposed density estimator achieves robustness by combining a traditional kernel density estimator (KDE) with ideas from classical M-estimation. The robustness of the density estimator is demonstrated with a representer theorem, the influence function, and experimental results.

Third, we propose a kernel classifier that optimizes the L_2 distances between "difference of densities". Like a support vector machine (SVM), the classifier is sparse and results from solving a quadratic program. We also provide statistical performance guarantees for the proposed L_2 kernel classifier in the form of a finite sample oracle inequality, and strong consistency in the sense of both integrated squared error (ISE) and probability of error.

Year of publication:	2011
Authors:	Kim, Joo Seuk
Subject:	Classification | Kernel Methods | Contaminated Data | Machine Learning | Computer Science | Electrical Engineering | Engineering

More details

Type of publication:	Other
Language:	English
Source:	BASE

Persistent link: https://www.econbiz.de/10009482954