Multi-View Feature Extraction Based On Canonical Analysis
Summary:
Canonical Correlation Analysis (CCA) identifies linear dependencies between two sets of variables. CCA has recently become popular in machine learning as multi-view datasets, which consist of different representations coming from different sources or modalities, have become more common. This thesis presents our efforts to improve the robustness and discriminative ability of CCA. CCA uses the views as complex labels to guide the search for maximally correlated projection vectors (covariates), and as a result it can overfit the training data. Although ensemble approaches have been used effectively to avoid overfitting in classification and clustering techniques, an ensemble approach has not yet been formulated for CCA. In this thesis, we propose an ensemble method that obtains a final set of covariates by combining multiple sets of covariates extracted from subsamples. Experimental results on various datasets demonstrate the usefulness of the ensemble CCA approach. The correlated features extracted by CCA may not be class-discriminative, since CCA does not use the class labels. This thesis introduces a method that finds features that are both correlated and discriminative. The proposed method uses two alternating multi-layer perceptrons, each with a linear hidden layer, that learn to predict both the class labels and each other's outputs. The experimental results show that the features found by the proposed method achieve significantly higher classification accuracies. Another contribution of this thesis is the use of CCA to improve a filter feature selection algorithm. We also present our work on ensemble classification and clustering for multi-view datasets.
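As a concrete illustration of what the covariates mentioned above are, the following is a minimal sketch of standard two-view CCA in NumPy, computed by whitening each view and taking a singular value decomposition of the whitened cross-covariance. It is not the ensemble or perceptron-based method proposed in the thesis. The function name `cca`, the ridge term `reg`, and the synthetic data are assumptions made only for this example.

```python
import numpy as np


def cca(X, Y, reg=1e-6):
    """Plain CCA sketch (hypothetical helper, not the thesis's method).

    X: (n, p) array, Y: (n, q) array, rows are paired samples.
    Returns canonical correlations (descending) and projection
    matrices Wx (p, k) and Wy (q, k), with k = min(p, q).
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Sample covariances; `reg` is an assumed small ridge term that
    # keeps the within-view covariances positive definite.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    # Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # Whitened cross-covariance T = Lx^{-1} Sxy Ly^{-T}.
    T = np.linalg.solve(Lx, Sxy)
    T = np.linalg.solve(Ly, T.T).T

    # Singular values of T are the canonical correlations.
    U, corrs, Vt = np.linalg.svd(T, full_matrices=False)

    # Map the singular vectors back to the original coordinates.
    Wx = np.linalg.solve(Lx.T, U)
    Wy = np.linalg.solve(Ly.T, Vt.T)
    return corrs, Wx, Wy


# Synthetic two-view data sharing one latent signal z (illustrative only).
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 2))])
Y = np.hstack([rng.normal(size=(500, 2)), -z + 0.1 * rng.normal(size=(500, 1))])

corrs, Wx, Wy = cca(X, Y)
print(corrs)  # the leading value should be close to 1, the rest much smaller
```

Because the projections are fitted to maximize correlation on the training sample, the trailing correlations in such a run are typically above zero even though the remaining columns are independent noise, which is one way to see the overfitting tendency the abstract refers to.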